Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_stat_quantiles_stream_fixed (g01an)

## Purpose

nag_stat_quantiles_stream_fixed (g01an) finds approximate quantiles from a data stream of known size using an out-of-core algorithm.

## Syntax

[ind, np, qv, rcomm, icomm, ifail] = g01an(ind, n, rv, eps, q, rcomm, icomm, 'nb', nb, 'nq', nq, 'lrcomm', lrcomm, 'licomm', licomm)
[ind, np, qv, rcomm, icomm, ifail] = nag_stat_quantiles_stream_fixed(ind, n, rv, eps, q, rcomm, icomm, 'nb', nb, 'nq', nq, 'lrcomm', lrcomm, 'licomm', licomm)

## Description

A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the 0.5$0.5$ quantile because half the values are less than or equal to it.
nag_stat_quantiles_stream_fixed (g01an) uses a slightly modified version of an algorithm described in a paper by Zhang and Wang (2007) to determine ε$\epsilon$-approximate quantiles of a data stream of n$n$ real values, where n$n$ is known. Given any quantile q[0.0,1.0]$q\in \left[0.0,1.0\right]$, an ε$\epsilon$-approximate quantile is defined as an element in the data stream whose rank falls within [(qε)n,(q + ε)n] $\left[\left(q-\epsilon \right)n,\left(q+\epsilon \right)n\right]$. In case of more than one ε$\epsilon$-approximate quantile being available, the one closest to qn$qn$ is returned.

## References

Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29

## Parameters

### Compulsory Input Parameters

1:     ind – int64int32nag_int scalar
Indicates the action required in the current call to nag_stat_quantiles_stream_fixed (g01an).
ind = 0${\mathbf{ind}}=0$
Return the required length of rcomm and icomm in icomm(1)${\mathbf{icomm}}\left(1\right)$ and icomm(2)${\mathbf{icomm}}\left(2\right)$ respectively. n and eps must be set and licomm must be at least 2$2$.
ind = 1${\mathbf{ind}}=1$
Initialise the communication arrays and process the first nb values from the data stream as supplied in rv.
ind = 2${\mathbf{ind}}=2$
Process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter nag_stat_quantiles_stream_fixed (g01an) with all other parameters unchanged.
ind = 3${\mathbf{ind}}=3$
Calculate the nq ε$\epsilon$-approximate quantiles specified in q. The calling program must set q and nq and re-enter nag_stat_quantiles_stream_fixed (g01an) with all other parameters unchanged. This option can be chosen only when npexp(1.0) / eps${\mathbf{np}}\ge ⌈\mathrm{exp}\left(1.0\right)/{\mathbf{eps}}⌉$.
Constraint: on entry ind = 0${\mathbf{ind}}=0$, 1$1$, 2$2$ or 3$3$.
2:     n – int64int32nag_int scalar
n$n$, the total number of values in the data stream.
Constraint: n > 0${\mathbf{n}}>0$.
3:     rv( : $:$) – double array
Note: the dimension of the array rv must be at least nb${\mathbf{nb}}$ if ind = 1${\mathbf{ind}}=1$ or 2$2$.
If ind = 1${\mathbf{ind}}=1$ or 2$2$, the vector containing the current block of data, otherwise rv is not referenced.
4:     eps – double scalar
Approximation factor ε$\epsilon$.
Constraint: epsexp(1.0) / n ​ and ​eps1.0${\mathbf{eps}}\ge \mathrm{exp}\left(1.0\right)/{\mathbf{n}}\text{​ and ​}{\mathbf{eps}}\le 1.0$.
5:     q( : $:$) – double array
Note: the dimension of the array q must be at least nq${\mathbf{nq}}$ if ind = 3${\mathbf{ind}}=3$.
If ind = 3${\mathbf{ind}}=3$, the quantiles to be calculated, otherwise q is not referenced. Note that q(i) = 0.0${\mathbf{q}}\left(i\right)=0.0$, corresponds to the minimum value and q(i) = 1.0${\mathbf{q}}\left(i\right)=1.0$ to the maximum value.
Constraint: if ind = 3${\mathbf{ind}}=3$, 0.0q(i)1.0$0.0\le {\mathbf{q}}\left(\mathit{i}\right)\le 1.0$, for i = 1,2,,nq$\mathit{i}=1,2,\dots ,{\mathbf{nq}}$.
6:     rcomm(lrcomm) – double array
lrcomm, the dimension of the array, must satisfy the constraint if ind0${\mathbf{ind}}\ne 0$, lrcomm must be at least equal to the value returned in icomm(1)${\mathbf{icomm}}\left(1\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than x + 2 × min (x, x / 2.0 + 1 ) × log2(n / x + 1.0) + 1 $x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×{\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$, where x = max (1, log(eps × n) / eps ) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$.
Constraint: if ind0${\mathbf{ind}}\ne 0$, lrcomm must be at least equal to the value returned in icomm(1)${\mathbf{icomm}}\left(1\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than x + 2 × min (x, x / 2.0 + 1 ) × log2(n / x + 1.0) + 1 $x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×{\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$, where x = max (1, log(eps × n) / eps ) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$.
7:     icomm(licomm) – int64int32nag_int array
licomm, the dimension of the array, must satisfy the constraint
• if ind = 0${\mathbf{ind}}=0$, licomm2${\mathbf{licomm}}\ge 2$;
• otherwise licomm must be at least equal to the value returned in icomm(2)${\mathbf{icomm}}\left(2\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than 2 × ( x + 2 × min (x, x / 2.0 + 1 ) × y) + y + 6 $2×\left(x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×y\right)+y+6$, where x = max (1,log(eps × n) / eps) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$ and y = log2(n / x + 1.0) + 1 $y={\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$.
Constraints:
• if ind = 0${\mathbf{ind}}=0$, licomm2${\mathbf{licomm}}\ge 2$;
• otherwise licomm must be at least equal to the value returned in icomm(2)${\mathbf{icomm}}\left(2\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than 2 × ( x + 2 × min (x, x / 2.0 + 1 ) × y) + y + 6 $2×\left(x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×y\right)+y+6$, where x = max (1,log(eps × n) / eps) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$ and y = log2(n / x + 1.0) + 1 $y={\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$.

### Optional Input Parameters

1:     nb – int64int32nag_int scalar
Default: The dimension of the array rv.
If ind = 1${\mathbf{ind}}=1$ or 2$2$, the size of the current block of data. The size of blocks of data in array rv can vary; therefore nb can change between calls to nag_stat_quantiles_stream_fixed (g01an).
Constraint: if ind = 1${\mathbf{ind}}=1$ or 2$2$, nb > 0${\mathbf{nb}}>0$.
2:     nq – int64int32nag_int scalar
Default: The dimension of the array q.
If ind = 3${\mathbf{ind}}=3$, the number of quantiles requested, otherwise nq is not referenced.
Constraint: if ind = 3${\mathbf{ind}}=3$, nq > 0${\mathbf{nq}}>0$.
3:     lrcomm – int64int32nag_int scalar
Default: The dimension of the array rcomm.
Constraint: if ind0${\mathbf{ind}}\ne 0$, lrcomm must be at least equal to the value returned in icomm(1)${\mathbf{icomm}}\left(1\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than x + 2 × min (x, x / 2.0 + 1 ) × log2(n / x + 1.0) + 1 $x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×{\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$, where x = max (1, log(eps × n) / eps ) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$.
4:     licomm – int64int32nag_int scalar
Default: The dimension of the array icomm.
Constraints:
• if ind = 0${\mathbf{ind}}=0$, licomm2${\mathbf{licomm}}\ge 2$;
• otherwise licomm must be at least equal to the value returned in icomm(2)${\mathbf{icomm}}\left(2\right)$ by a call to nag_stat_quantiles_stream_fixed (g01an) with ind = 0${\mathbf{ind}}=0$. This will not be more than 2 × ( x + 2 × min (x, x / 2.0 + 1 ) × y) + y + 6 $2×\left(x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×y\right)+y+6$, where x = max (1,log(eps × n) / eps) $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$ and y = log2(n / x + 1.0) + 1 $y={\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$.

None.

### Output Parameters

1:     ind – int64int32nag_int scalar
Indicates output from a successful call.
ind = 1${\mathbf{ind}}=1$
Lengths of rcomm and icomm have been returned in icomm(1)${\mathbf{icomm}}\left(1\right)$ and icomm(2)${\mathbf{icomm}}\left(2\right)$ respectively.
ind = 2${\mathbf{ind}}=2$
nag_stat_quantiles_stream_fixed (g01an) has processed np data points and expects to be called again with additional data (i.e., np < n${\mathbf{np}}<{\mathbf{n}}$).
ind = 3${\mathbf{ind}}=3$
nag_stat_quantiles_stream_fixed (g01an) has returned the requested ε$\epsilon$-approximate quantiles in qv. These quantiles are based on np data points.
ind = 4${\mathbf{ind}}=4$
Routine has processed all n data points (i.e., np = n${\mathbf{np}}={\mathbf{n}}$).
2:     np – int64int32nag_int scalar
The number of elements processed so far.
3:     qv( : $:$) – double array
Note: the dimension of the array qv must be at least nq${\mathbf{nq}}$ if ind = 3${\mathbf{ind}}=3$.
If ind = 3${\mathbf{ind}}=3$, qv(i)${\mathbf{qv}}\left(i\right)$ contains the ε$\epsilon$-approximate quantiles specified by the value provided in q(i)${\mathbf{q}}\left(i\right)$.
4:     rcomm(lrcomm) – double array
Communication array, used to store information between calls to nag_stat_quantiles_stream_fixed (g01an).
5:     icomm(licomm) – int64int32nag_int array
Communication array, used to store information between calls to nag_stat_quantiles_stream_fixed (g01an).
6:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).
As an out-of-core function nag_stat_quantiles_stream_fixed (g01an) will only perform certain parameter checks when a data checkpoint (including completion of data input) is signaled. As such it will usually be inappropriate to halt program execution when an error is detected since any errors may be subsequently resolved without losing any processing already carried out. Therefore setting ifail to a value of 1​ or ​1$-1\text{​ or ​}1$ is recommended. If the output of error messages is undesirable, then the value 1$1$ is recommended. When the value 1​ or ​1$-\mathbf{1}\text{​ or ​}\mathbf{1}$ is used it is essential to test the value of ifail on exit.

## Error Indicators and Warnings

Errors or warnings detected by the function:
ifail = 1${\mathbf{ifail}}=1$
Constraint: ind = 0${\mathbf{ind}}=0$, 1$1$, 2$2$ or 3$3$.
ifail = 2${\mathbf{ifail}}=2$
Constraint: n > 0${\mathbf{n}}>0$.
ifail = 3${\mathbf{ifail}}=3$
Constraint: exp(1.0) / neps1.0$\mathrm{exp}\left(1.0\right)/{\mathbf{n}}\le {\mathbf{eps}}\le 1.0$.
ifail = 4${\mathbf{ifail}}=4$
Constraint: if ind = 1${\mathbf{ind}}=1$ or 2$2$ then nb > 0${\mathbf{nb}}>0$.
ifail = 5${\mathbf{ifail}}=5$
On entry, licomm is too small:
ifail = 6${\mathbf{ifail}}=6$
On entry, lrcomm is too small:
ifail = 7${\mathbf{ifail}}=7$
Number of data elements streamed, _$_$ is not sufficient for a quantile query when .
Supply more data or reprocess the data with a higher eps value.
ifail = 8${\mathbf{ifail}}=8$
Constraint: if ind = 3${\mathbf{ind}}=3$ then nq > 0${\mathbf{nq}}>0$.
ifail = 9${\mathbf{ifail}}=9$
Constraint: if ind = 3${\mathbf{ind}}=3$ then 0.0q(i)1.0$0.0\le {\mathbf{q}}\left(i\right)\le 1.0$ for all i$i$.
ifail = 999${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

Not applicable.

The average time taken by nag_stat_quantiles_stream_fixed (g01an) is n log(1 / εlog(εn)) ${\mathbf{n}}\mathrm{log}\left(1/\epsilon \mathrm{log}\left(\epsilon {\mathbf{n}}\right)\right)$.

## Example

```function nag_stat_quantiles_stream_fixed_example
n = int64(60);
eps = 0.2;
q = [0.25; 0.5; 1.0];
rv = { ...
[34.01, 57.95, 44.88, 22.04, 28.84,  4.43,  0.32, 20.82, ...
20.53, 13.08, 7.99,  54.03, 23.21, 26.73, 39.72,  0.97],
[39.05, 38.78, 19.38, 51.34, 24.08, 12.41, 58.11, 35.90, ...
40.38, 27.41, 19.80,  6.02, 45.33, 36.34, 43.14, 53.84 , ...
39.49,  9.04, 36.74, 58.72, 59.95, 15.41, 33.05, 39.54],
[33.24, 58.67, 54.12, 39.48, 43.73, 24.15, 55.72,  8.87],
[40.47, 46.18, 20.36,  6.95, 36.86, 49.24, 56.83, 43.87, ...
29.86, 22.49, 25.29, 33.17]};
% Call NAG routine for the first time to obtain lengths of communication arrays
ind = int64(0);
icomm = zeros(2, 1, 'int64');
[ind, np, qv, rcomm, icomm, ifail] = ...
nag_stat_quantiles_stream_fixed(ind, n, 0, eps, q, 0, icomm);
% Allocate communication arrays
rcomm = zeros(icomm(1), 1);
icomm = zeros(icomm(2), 1, 'int64');

for i=1:4
% Repeat calls to NAG routine for every dataset block in rv until n
% observations have been passed.
[ind, np, qv, rcomm, icomm, ifail] = ...
nag_stat_quantiles_stream_fixed(ind, n, rv{i}, eps, q, rcomm, icomm);
if ifail ~= 0
break;
end
end

if ifail == 0
% Call NAG routine again to calculate quantiles specified in vector q
ind = int64(3);
[ind, np, qv, rcomm, icomm, ifail] = ...
nag_stat_quantiles_stream_fixed(ind, n, 0, eps, q, rcomm, icomm);
else
fprintf('\nAn error occurred processing the %dth set of observations\n', i);
fprintf('ifail = %d\n', ifail);
end

% Print the results
if ifail == 0
fprintf('\nInput Data:\n %d observations\n eps = %5.2f\n\n', n, eps);
fprintf('Quantile     Result\n');
for i = 1:numel(q)
fprintf('%7.2f     %7.2f\n', q(i), qv(i));
end
else
fprintf('\nAn error occurred calculating the quantiles, ifail = %d\n', ifail);
end
```
```

Input Data:
60 observations
eps =  0.20

Quantile     Result
0.25       22.49
0.50       36.86
1.00       59.95

```
```function g01an_example
n = int64(60);
eps = 0.2;
q = [0.25; 0.5; 1.0];
rv = { ...
[34.01, 57.95, 44.88, 22.04, 28.84,  4.43,  0.32, 20.82, ...
20.53, 13.08, 7.99,  54.03, 23.21, 26.73, 39.72,  0.97],
[39.05, 38.78, 19.38, 51.34, 24.08, 12.41, 58.11, 35.90, ...
40.38, 27.41, 19.80,  6.02, 45.33, 36.34, 43.14, 53.84 , ...
39.49,  9.04, 36.74, 58.72, 59.95, 15.41, 33.05, 39.54],
[33.24, 58.67, 54.12, 39.48, 43.73, 24.15, 55.72,  8.87],
[40.47, 46.18, 20.36,  6.95, 36.86, 49.24, 56.83, 43.87, ...
29.86, 22.49, 25.29, 33.17]};
% Call NAG routine for the first time to obtain lengths of communication arrays
ind = int64(0);
icomm = zeros(2, 1, 'int64');
[ind, np, qv, rcomm, icomm, ifail] = g01an(ind, n, 0, eps, q, 0, icomm);
% Allocate communication arrays
rcomm = zeros(icomm(1), 1);
icomm = zeros(icomm(2), 1, 'int64');

for i=1:4
% Repeat calls to NAG routine for every dataset block in rv until n
% observations have been passed.
[ind, np, qv, rcomm, icomm, ifail] = g01an(ind, n, rv{i}, eps, q, rcomm, icomm);
if ifail ~= 0
break;
end
end

if ifail == 0
% Call NAG routine again to calculate quantiles specified in vector q
ind = int64(3);
[ind, np, qv, rcomm, icomm, ifail] = g01an(ind, n, 0, eps, q, rcomm, icomm);
else
fprintf('\nAn error occurred processing the %dth set of observations\n', i);
fprintf('ifail = %d\n', ifail);
end

% Print the results
if ifail == 0
fprintf('\nInput Data:\n %d observations\n eps = %5.2f\n\n', n, eps);
fprintf('Quantile     Result\n');
for i = 1:numel(q)
fprintf('%7.2f     %7.2f\n', q(i), qv(i));
end
else
fprintf('\nAn error occurred calculating the quantiles, ifail = %d\n', ifail);
end
```
```

Input Data:
60 observations
eps =  0.20

Quantile     Result
0.25       22.49
0.50       36.86
1.00       59.95

```