naginterfaces.library.stat.quantiles_stream_arbitrary

naginterfaces.library.stat.quantiles_stream_arbitrary(ind, rv, nb, eps, q, nq, rcomm, icomm)

quantiles_stream_arbitrary finds approximate quantiles from a large arbitrary-sized data stream using an out-of-core algorithm.

For full information please refer to the NAG Library document for g01ap

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g01/g01apf.html

Parameters
ind : int

On initial entry: must be set to 0.

On entry: indicates the action required in the current call to quantiles_stream_arbitrary.

ind = 0: Initialize the communication arrays and attempt to process the first nb values from the data stream. eps, rv and nb must be set and licomm must be at least 10.

ind = 1: Attempt to process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter quantiles_stream_arbitrary with all other parameters unchanged.

ind = 2: Continue calculation following the reallocation of either or both of the communication arrays rcomm and icomm.

ind = 3: Calculate the nq ε-approximate quantiles specified in q. The calling program must set q and nq and re-enter quantiles_stream_arbitrary with all other parameters unchanged. This option can be chosen only when np ≥ ⌈exp(1.0)/eps⌉.
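The precondition on the quantile query can be checked before re-entering the routine. The following is a minimal sketch, assuming the threshold is np ≥ ⌈exp(1.0)/eps⌉ as given in the g01ap document; the helper name min_points_for_query is hypothetical and is not part of naginterfaces.

```python
import math


def min_points_for_query(eps):
    """Smallest number of streamed points assumed to permit a quantile query.

    Assumption: the routine requires np >= ceil(exp(1.0) / eps).
    """
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must satisfy 0.0 < eps <= 1.0")
    return math.ceil(math.exp(1.0) / eps)


# A tighter approximation factor needs more data before a query is allowed.
for eps in (0.5, 0.1, 0.01):
    print(eps, min_points_for_query(eps))
```

A caller could compare the returned np against this value and keep streaming blocks until the threshold is reached.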

rv : float, array-like, shape (:)

Note: the required length for this argument is determined as follows: if ind = 0, 1 or 2: nb; otherwise: 0.

If ind = 0, 1 or 2, the vector containing the current block of data, otherwise rv is not referenced.

nb : int

If ind = 0, 1 or 2, the size of the current block of data. The size of blocks of data in array rv can vary; therefore, nb can change between calls to quantiles_stream_arbitrary.
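Because the block size may change from call to call, a stream of unknown length can be cut into blocks whose final block is shorter than the rest. The following is a minimal sketch; iter_blocks and max_block are hypothetical names, not part of naginterfaces. Each yielded block would be passed as rv with nb set to its length.

```python
from itertools import islice


def iter_blocks(stream, max_block):
    """Yield non-empty lists of at most max_block values from any iterable."""
    if max_block <= 0:
        raise ValueError("max_block must be positive")
    it = iter(stream)
    while True:
        block = list(islice(it, max_block))
        if not block:
            return  # stream exhausted; never yield an empty block
        yield block


# Seven values in blocks of at most three: the last block has length one.
for block in iter_blocks(range(7), 3):
    print(len(block), block)
```

Empty blocks are never yielded, so nb is always positive for the calls that process data.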

eps : float

Approximation factor ε.

q : float, array-like, shape (:)

Note: the required length for this argument is determined as follows: if ind = 3: nq; otherwise: 0.

If ind = 3, the quantiles to be calculated, otherwise q is not referenced. Note that 0.0 ≤ q[i] ≤ 1.0; q[i] = 0.0 corresponds to the minimum value and q[i] = 1.0 to the maximum value.

nq : int

If ind = 3, the number of quantiles requested, otherwise nq is not referenced.

rcomm : float, ndarray, shape (lrcomm), modified in place

On entry: if ind = 1 or 2 then the first l elements of rcomm as supplied to quantiles_stream_arbitrary must be identical to the first l elements of rcomm returned from the last call to quantiles_stream_arbitrary, where l is the value of lrcomm used in the last call. In other words, the contents of rcomm must not be altered between calls to this function. If rcomm needs to be reallocated then its contents must be preserved. If ind = 0 then rcomm need not be set.

On exit: rcomm holds information required by subsequent calls to quantiles_stream_arbitrary.

icomm : int, ndarray, shape (licomm), modified in place

On entry: if ind = 1 or 2 then the first l elements of icomm as supplied to quantiles_stream_arbitrary must be identical to the first l elements of icomm returned from the last call to quantiles_stream_arbitrary, where l is the value of licomm used in the last call. In other words, the contents of icomm must not be altered between calls to this function. If icomm needs to be reallocated then its contents must be preserved. If ind = 0 then icomm need not be set.

On exit: icomm[0] holds the minimum required length for rcomm and icomm[1] holds the minimum required length for icomm. The remaining elements of icomm are used for communication between subsequent calls to quantiles_stream_arbitrary.

Returns
ind : int

Indicates output from the call.

ind = 1: quantiles_stream_arbitrary has processed np data points and expects to be called again with additional data.

ind = 2: One or both of the communication arrays rcomm and icomm are too small. The new minimum lengths of rcomm and icomm have been returned in icomm[0] and icomm[1] respectively. If the new minimum length is greater than the current length then the corresponding communication array needs to be reallocated, its contents preserved and quantiles_stream_arbitrary called again with all other parameters unchanged.

If there is more data to be processed, it is recommended that rcomm and icomm are made significantly bigger than the minimum to limit the number of reallocations.

ind = 3: quantiles_stream_arbitrary has returned the requested nq ε-approximate quantiles in qv. These quantiles are based on np data points.
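When a reallocation is requested, the new length of each communication array has to be chosen by the caller. The following is a minimal sketch of one possible policy; planned_length and the headroom factor of 2.0 are illustrative assumptions, not values prescribed by the library. The existing contents must still be copied into the start of the newly allocated array before re-entering the routine.

```python
import math


def planned_length(current_len, required_len, more_data_expected, headroom=2.0):
    """Return the length to allocate for one communication array.

    current_len:  length of the array as currently allocated
    required_len: minimum length reported by the routine
    headroom:     hypothetical growth factor used when more data will follow
    """
    if required_len <= current_len:
        return current_len  # already large enough; no reallocation needed
    if more_data_expected:
        # Over-allocate to limit the number of future reallocations.
        return int(math.ceil(required_len * headroom))
    return required_len


print(planned_length(100, 150, more_data_expected=True))
print(planned_length(100, 150, more_data_expected=False))
```

The same function can be applied independently to rcomm and icomm, since each has its own reported minimum length.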

np : int

n, the number of elements processed so far.

qv : float, ndarray, shape (:)

If ind = 3 on exit, qv[i] contains the ε-approximate quantile specified by the value provided in q[i].

Raises
NagValueError
(errno )

On entry, ind = ⟨value⟩.

Constraint: ind = 0, 1, 2 or 3.

(errno )

On entry, eps = ⟨value⟩.

Constraint: 0.0 < eps ≤ 1.0.

(errno )

On entry, ind = 0, 1 or 2 and nb = ⟨value⟩.

Constraint: if ind = 0, 1 or 2 then nb > 0.

(errno )

The contents of rcomm have been altered between calls to this function.

(errno )

The contents of icomm have been altered between calls to this function.

(errno )

Number of data elements streamed, ⟨value⟩, is not sufficient for a quantile query when eps = ⟨value⟩.

Supply more data or reprocess the data with a higher eps value.

(errno )

On entry, ind = 3 and nq = ⟨value⟩.

Constraint: if ind = 3 then nq > 0.

(errno )

On entry, ind = 3 and q[⟨value⟩] = ⟨value⟩.

Constraint: if ind = 3 then 0.0 ≤ q[i] ≤ 1.0 for all i.
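The argument constraints can be mirrored in a pre-flight check so that mistakes surface before the library call. The following is a minimal sketch under stated assumptions: the constraints used are ind in {0, 1, 2, 3}, 0.0 < eps ≤ 1.0, nb > 0 when ind is 0, 1 or 2, and nq > 0 with every q[i] in [0.0, 1.0] when ind is 3. check_arguments is a hypothetical helper that raises a plain ValueError; it does not reproduce NagValueError or its errno values.

```python
def check_arguments(ind, nb, eps, q):
    """Raise ValueError if the arguments break the assumed constraints."""
    if ind not in (0, 1, 2, 3):
        raise ValueError(f"ind = {ind}; expected 0, 1, 2 or 3")
    if not 0.0 < eps <= 1.0:
        raise ValueError(f"eps = {eps}; expected 0.0 < eps <= 1.0")
    if ind in (0, 1, 2) and nb <= 0:
        raise ValueError(f"nb = {nb}; expected nb > 0 when ind is 0, 1 or 2")
    if ind == 3:
        if len(q) == 0:
            raise ValueError("nq = 0; expected nq > 0 when ind is 3")
        for i, qi in enumerate(q):
            if not 0.0 <= qi <= 1.0:
                raise ValueError(f"q[{i}] = {qi}; expected 0.0 <= q[i] <= 1.0")


# A valid query for the minimum, median and maximum passes silently.
check_arguments(3, 0, 0.1, [0.0, 0.5, 1.0])
```

The check on nb is skipped for ind = 3 because rv and nb are not referenced for a quantile query.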

Notes

A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the 0.5 quantile because half the values are less than or equal to it.

quantiles_stream_arbitrary uses a slightly modified version of an algorithm described in a paper by Zhang and Wang (2007) to determine ε-approximate quantiles of a large arbitrary-sized data stream of real values, where ε is a user-defined approximation factor. Let n denote the number of data elements processed so far. Then, given any quantile q ∈ [0.0, 1.0], an ε-approximate quantile is defined as an element in the data stream whose rank falls within [(q − ε)n, (q + ε)n]. If more than one ε-approximate quantile is available, the one closest to qn is used.
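The definition can be made concrete by checking a candidate value against fully sorted data. The following is a minimal sketch, assuming ranks are counted from 1 in ascending order and the rank window is [(q − ε)n, (q + ε)n]; is_eps_approximate is a hypothetical helper for small in-memory data and does not reflect how the streaming algorithm works internally.

```python
from bisect import bisect_left, bisect_right


def is_eps_approximate(data, value, q, eps):
    """Return True if value is an eps-approximate q-quantile of data."""
    ordered = sorted(data)
    n = len(ordered)
    # Ranks (1-based) occupied by value; duplicates occupy a range of ranks.
    first = bisect_left(ordered, value) + 1
    last = bisect_right(ordered, value)
    if last < first:
        return False  # value does not occur in the data
    lo, hi = (q - eps) * n, (q + eps) * n
    # Accept if any rank held by value lies inside the window [lo, hi].
    return first <= hi and last >= lo


data = list(range(1, 101))  # each value equals its own rank
print(is_eps_approximate(data, 45, q=0.5, eps=0.1))
print(is_eps_approximate(data, 70, q=0.5, eps=0.1))
```

With n = 100, q = 0.5 and ε = 0.1, any element with rank between 40 and 60 qualifies as an approximate median under this definition.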

References

Zhang, Q and Wang, W, 2007, A fast algorithm for approximate quantiles in high speed data streams, Proceedings of the 19th International Conference on Scientific and Statistical Database Management, IEEE Computer Society, 29