# NAG FL Interfaceg01anf (quantiles_​stream_​fixed)

## ▸▿ Contents

Settings help

FL Name Style:

FL Specification Language:

## 1Purpose

g01anf finds approximate quantiles from a data stream of known size using an out-of-core algorithm.

## 2Specification

Fortran Interface
 Subroutine g01anf ( ind, n, rv, nb, eps, np, q, qv, nq,
 Integer, Intent (In) :: n, nb, nq, lrcomm, licomm Integer, Intent (Inout) :: ind, icomm(licomm), ifail Integer, Intent (Out) :: np Real (Kind=nag_wp), Intent (In) :: rv(*), eps, q(*) Real (Kind=nag_wp), Intent (Inout) :: qv(*), rcomm(lrcomm)
#include <nag.h>
 void g01anf_ (Integer *ind, const Integer *n, const double rv[], const Integer *nb, const double *eps, Integer *np, const double q[], double qv[], const Integer *nq, double rcomm[], const Integer *lrcomm, Integer icomm[], const Integer *licomm, Integer *ifail)
The routine may be called by the names g01anf or nagf_stat_quantiles_stream_fixed.

## 3Description

A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the $0.5$ quantile because half the values are less than or equal to it.
g01anf uses a slightly modified version of an algorithm described in a paper by Zhang and Wang (2007) to determine $\epsilon$-approximate quantiles of a data stream of $n$ real values, where $n$ is known. Given any quantile $q\in \left[0.0,1.0\right]$, an $\epsilon$-approximate quantile is defined as an element in the data stream whose rank falls within $\left[\left(q-\epsilon \right)n,\left(q+\epsilon \right)n\right]$. In case of more than one $\epsilon$-approximate quantile being available, the one closest to $qn$ is returned.

## 4References

Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29

## 5Arguments

1: $\mathbf{ind}$Integer Input/Output
On entry: indicates the action required in the current call to g01anf.
${\mathbf{ind}}=0$
Return the required length of rcomm and icomm in ${\mathbf{icomm}}\left(1\right)$ and ${\mathbf{icomm}}\left(2\right)$ respectively. n and eps must be set and licomm must be at least $2$.
${\mathbf{ind}}=1$
Initialise the communication arrays and process the first nb values from the data stream as supplied in rv.
${\mathbf{ind}}=2$
Process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter g01anf with all other parameters unchanged.
${\mathbf{ind}}=3$
Calculate the nq $\epsilon$-approximate quantiles specified in q. The calling program must set q and nq and re-enter g01anf with all other parameters unchanged. This option can be chosen only when ${\mathbf{np}}\ge ⌈\mathrm{exp}\left(1.0\right)/{\mathbf{eps}}⌉$.
On exit: indicates output from a successful call.
${\mathbf{ind}}=1$
Lengths of rcomm and icomm have been returned in ${\mathbf{icomm}}\left(1\right)$ and ${\mathbf{icomm}}\left(2\right)$ respectively.
${\mathbf{ind}}=2$
g01anf has processed np data points and expects to be called again with additional data (i.e., ${\mathbf{np}}<{\mathbf{n}}$).
${\mathbf{ind}}=3$
g01anf has returned the requested $\epsilon$-approximate quantiles in qv. These quantiles are based on np data points.
${\mathbf{ind}}=4$
Routine has processed all n data points (i.e., ${\mathbf{np}}={\mathbf{n}}$).
Constraint: on entry ${\mathbf{ind}}=0$, $1$, $2$ or $3$.
2: $\mathbf{n}$Integer Input
On entry: $n$, the total number of values in the data stream.
Constraint: ${\mathbf{n}}>0$.
3: $\mathbf{rv}\left(*\right)$Real (Kind=nag_wp) array Input
Note: the dimension of the array rv must be at least ${\mathbf{nb}}$ if ${\mathbf{ind}}=1$ or $2$.
On entry: if ${\mathbf{ind}}=1$ or $2$, the vector containing the current block of data, otherwise rv is not referenced.
4: $\mathbf{nb}$Integer Input
On entry: if ${\mathbf{ind}}=1$ or $2$, the size of the current block of data. The size of blocks of data in array rv can vary;, therefore, nb can change between calls to g01anf.
Constraint: if ${\mathbf{ind}}=1$ or $2$, ${\mathbf{nb}}>0$.
5: $\mathbf{eps}$Real (Kind=nag_wp) Input
On entry: approximation factor $\epsilon$.
Constraint: ${\mathbf{eps}}\ge \mathrm{exp}\left(1.0\right)/{\mathbf{n}}\text{​ and ​}{\mathbf{eps}}\le 1.0$.
6: $\mathbf{np}$Integer Output
On exit: the number of elements processed so far.
7: $\mathbf{q}\left(*\right)$Real (Kind=nag_wp) array Input
Note: the dimension of the array q must be at least ${\mathbf{nq}}$ if ${\mathbf{ind}}=3$.
On entry: if ${\mathbf{ind}}=3$, the quantiles to be calculated, otherwise q is not referenced. Note that ${\mathbf{q}}\left(i\right)=0.0$, corresponds to the minimum value and ${\mathbf{q}}\left(i\right)=1.0$ to the maximum value.
Constraint: if ${\mathbf{ind}}=3$, $0.0\le {\mathbf{q}}\left(\mathit{i}\right)\le 1.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nq}}$.
8: $\mathbf{qv}\left(*\right)$Real (Kind=nag_wp) array Output
Note: the dimension of the array qv must be at least ${\mathbf{nq}}$ if ${\mathbf{ind}}=3$.
On exit: if ${\mathbf{ind}}=3$, ${\mathbf{qv}}\left(i\right)$ contains the $\epsilon$-approximate quantiles specified by the value provided in ${\mathbf{q}}\left(i\right)$.
9: $\mathbf{nq}$Integer Input
On entry: if ${\mathbf{ind}}=3$, the number of quantiles requested, otherwise nq is not referenced.
Constraint: if ${\mathbf{ind}}=3$, ${\mathbf{nq}}>0$.
10: $\mathbf{rcomm}\left({\mathbf{lrcomm}}\right)$Real (Kind=nag_wp) array Communication Array
11: $\mathbf{lrcomm}$Integer Input
On entry: the dimension of the array rcomm as declared in the (sub)program from which g01anf is called.
Constraint: if ${\mathbf{ind}}\ne 0$, lrcomm must be at least equal to the value returned in ${\mathbf{icomm}}\left(1\right)$ by a call to g01anf with ${\mathbf{ind}}=0$. This will not be more than $x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×{\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$, where $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$.
12: $\mathbf{icomm}\left({\mathbf{licomm}}\right)$Integer array Communication Array
13: $\mathbf{licomm}$Integer Input
On entry: the dimension of the array icomm as declared in the (sub)program from which g01anf is called.
Constraints:
• if ${\mathbf{ind}}=0$, ${\mathbf{licomm}}\ge 2$;
• otherwise licomm must be at least equal to the value returned in ${\mathbf{icomm}}\left(2\right)$ by a call to g01anf with ${\mathbf{ind}}=0$. This will not be more than $2×\left(x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×y\right)+y+6$, where $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$ and $y={\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$.
14: $\mathbf{ifail}$Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 7 in the Introduction to the NAG Library CL Interface for details.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error (see Section 6).
As an out-of-core routine g01anf will only perform certain argument checks when a data checkpoint (including completion of data input) is signaled. As such it will usually be inappropriate to halt program execution when an error is detected since any errors may be subsequently resolved without losing any processing already carried out. Therefore, setting ifail to a value of $-1$ or $1$ is recommended. If the output of error messages is undesirable, the value $1$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.

## 6Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{ind}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ind}}=0$, $1$, $2$ or $3$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}>0$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{eps}}=⟨\mathit{\text{value}}⟩$.
Constraint: $\mathrm{exp}\left(1.0\right)/{\mathbf{n}}\le {\mathbf{eps}}\le 1.0$.
${\mathbf{ifail}}=4$
On entry, ${\mathbf{ind}}=1$ or $2$ and ${\mathbf{nb}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{ind}}=1$ or $2$ then ${\mathbf{nb}}>0$.
${\mathbf{ifail}}=5$
On entry, licomm is too small: ${\mathbf{licomm}}=⟨\mathit{\text{value}}⟩$.
${\mathbf{ifail}}=6$
On entry, lrcomm is too small: ${\mathbf{lrcomm}}=⟨\mathit{\text{value}}⟩$.
${\mathbf{ifail}}=7$
Number of data elements streamed, $⟨\mathit{\text{value}}⟩$ is not sufficient for a quantile query when ${\mathbf{eps}}=⟨\mathit{\text{value}}⟩$.
Supply more data or reprocess the data with a higher eps value.
${\mathbf{ifail}}=8$
On entry, ${\mathbf{ind}}=3$ and ${\mathbf{nq}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{ind}}=3$ then ${\mathbf{nq}}>0$.
${\mathbf{ifail}}=9$
On entry, ${\mathbf{ind}}=3$ and ${\mathbf{q}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{ind}}=3$ then $0.0\le {\mathbf{q}}\left(i\right)\le 1.0$ for all $i$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

Not applicable.

## 8Parallelism and Performance

g01anf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

The average time taken by g01anf is ${\mathbf{n}}\mathrm{log}\left(1/\epsilon \mathrm{log}\left(\epsilon {\mathbf{n}}\right)\right)$.

## 10Example

This example calculates $\epsilon$-approximate quantile for $q=0.25$, $0.5$ and $1.0$ for a data stream of $60$ values. The stream is read in four blocks of varying size.

### 10.1Program Text

Program Text (g01anfe.f90)

### 10.2Program Data

Program Data (g01anfe.d)

### 10.3Program Results

Program Results (g01anfe.r)