# NAG Toolbox: nag_stat_summary_onevar (g01at)

## Purpose

nag_stat_summary_onevar (g01at) calculates the mean, standard deviation, coefficients of skewness and kurtosis, and the maximum and minimum values for a set of (optionally weighted) data. The input data can be split into arbitrarily sized blocks, allowing large datasets to be summarised.

## Syntax

[pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = g01at(x, 'nb', nb, 'wt', wt, 'pn', pn, 'rcomm', rcomm)
[pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = nag_stat_summary_onevar(x, 'nb', nb, 'wt', wt, 'pn', pn, 'rcomm', rcomm)

## Description

Given a sample of $n$ observations, denoted by $x=\left\{{x}_{i}:i=1,2,\dots ,n\right\}$ and a set of non-negative weights, $w=\left\{{w}_{i}:i=1,2,\dots ,n\right\}$, nag_stat_summary_onevar (g01at) calculates a number of quantities:
(a) Mean
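 $\stackrel{-}{x}=\frac{\sum_{i=1}^{n}{w}_{i}{x}_{i}}{W}\text{, where }W=\sum_{i=1}^{n}{w}_{i}.$
(b) Standard deviation
 ${s}_{2}=\sqrt{\frac{\sum_{i=1}^{n}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{2}}{d}}\text{, where }d=W-\frac{\sum_{i=1}^{n}{w}_{i}^{2}}{W}.$
(c) Coefficient of skewness
 ${s}_{3}=\frac{\sum_{i=1}^{n}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{3}}{d\,{s}_{2}^{3}}.$
(d) Coefficient of kurtosis
 ${s}_{4}=\frac{\sum_{i=1}^{n}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{4}}{d\,{s}_{2}^{4}}-3.$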
(e) Maximum and minimum elements, with ${w}_{i}\ne 0$.
These quantities are calculated using the one pass algorithm of West (1979).
For large datasets, or where not all of the data is available at the same time, $x$ and $w$ can be split into arbitrarily sized blocks and nag_stat_summary_onevar (g01at) called multiple times.
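
The formulas above can be checked on a small sample in plain MATLAB. The following is a minimal sketch, not the NAG implementation: the data values are hypothetical, the first part evaluates (a) to (e) directly in two passes, and the loop is a West-style one-pass update of the mean and the weighted sum of squares only (skewness and kurtosis are not updated here).

```matlab
% Hypothetical small weighted sample (not taken from the NAG example).
x = [2.0; -1.5; 0.5; 4.0; 3.0];
w = [1.0;  2.0; 0.0; 0.5; 1.5];

% Direct (two-pass) evaluation of formulas (a)-(d).
W    = sum(w);
xbar = sum(w .* x) / W;
d    = W - sum(w.^2) / W;
s2   = sqrt(sum(w .* (x - xbar).^2) / d);
s3   = sum(w .* (x - xbar).^3) / (d * s2^3);
s4   = sum(w .* (x - xbar).^4) / (d * s2^4) - 3;

% (e) Extremes are taken over observations with non-zero weight only.
xmin = min(x(w ~= 0));
xmax = max(x(w ~= 0));

% One-pass West-style update: m is the running mean, t the running
% weighted sum of squared deviations, sumw the running sum of weights.
sumw = 0; m = 0; t = 0;
for i = 1:numel(x)
    if w(i) > 0
        q    = x(i) - m;
        tmp  = sumw + w(i);
        r    = q * w(i) / tmp;
        m    = m + r;
        t    = t + r * sumw * q;
        sumw = tmp;
    end
end
sd_onepass = sqrt(t / d);   % should agree with s2 up to rounding
```

Because the loop only carries `sumw`, `m` and `t` forward, it can be stopped after any observation and resumed later, which is what makes block-wise processing possible.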

## References

West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left({\mathbf{nb}}\right)$ – double array
The current block of observations, corresponding to ${x}_{\mathit{i}}$, for $\mathit{i}=k+1,\dots ,k+b$, where $k$ is the number of observations processed so far and $b$ is the size of the current block of data.

### Optional Input Parameters

1:     $\mathrm{nb}$ – int64/int32/nag_int scalar
Default: the dimension of the array x.
$b$, the number of observations in the current block of data. The size of the block of data supplied in x and wt can vary; therefore nb can change between calls to nag_stat_summary_onevar (g01at).
Constraint: ${\mathbf{nb}}\ge 0$.
2:     $\mathrm{wt}\left(:\right)$ – double array
The dimension of the array wt must be at least ${\mathbf{nb}}$ if $\mathit{iwt}=1$.
If $\mathit{iwt}=1$, wt must contain the user-supplied weights corresponding to the block of data supplied in x, that is ${w}_{\mathit{i}}$, for $\mathit{i}=k+1,\dots ,k+b$.
Constraint: if $\mathit{iwt}=1$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nb}}$.
3:     $\mathrm{pn}$ – int64/int32/nag_int scalar
Default: $0$
The number of valid observations processed so far, that is the number of observations with ${w}_{i}>0$, for $\mathit{i}=1,2,\dots ,k$. On the first call to nag_stat_summary_onevar (g01at), or when starting to summarise a new dataset, pn must be set to $0$.
If ${\mathbf{pn}}\ne 0$, it must be the same value as returned by the last call to nag_stat_summary_onevar (g01at).
4:     $\mathrm{rcomm}\left(20\right)$ – double array
Communication array, used to store information between calls to nag_stat_summary_onevar (g01at). If ${\mathbf{pn}}=0$, rcomm need not be initialized, otherwise it must be unchanged since the last call to this function.
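
The interplay of nb, pn and rcomm across calls can be sketched as a loop over blocks. This is an illustrative sketch only: the data and the block boundaries in `edges` are hypothetical, the data are unweighted, and the calls to g01at are guarded so that the script still runs when the NAG Toolbox is not installed.

```matlab
% Hypothetical data, split into three blocks of unequal size.
x = (1:10)';
edges = [0 3 7 10];            % block j holds x(edges(j)+1 : edges(j+1))
nblocks = numel(edges) - 1;
blocks = cell(nblocks, 1);
for j = 1:nblocks
    blocks{j} = x(edges(j)+1 : edges(j+1));
end

% Only attempt the calls when g01at is on the MATLAB path.
if exist('g01at', 'file')
    for j = 1:nblocks
        if j == 1
            % First call: pn takes its default of 0, so rcomm is not needed.
            [pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = ...
                g01at(blocks{j});
        else
            % Later calls: pass pn and rcomm back exactly as returned.
            [pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = ...
                g01at(blocks{j}, 'pn', pn, 'rcomm', rcomm);
        end
    end
    fprintf('%d valid observations, mean %.4f\n', pn, xmean);
end
```

Since nb defaults to the dimension of x, the block size is picked up from each block automatically and may differ from call to call.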

### Output Parameters

1:     $\mathrm{pn}$ – int64/int32/nag_int scalar
Default: $0$
The updated number of valid observations processed, that is the number of observations with ${w}_{i}>0$, for $\mathit{i}=1,2,\dots ,k+b$.
2:     $\mathrm{xmean}$ – double scalar
$\stackrel{-}{x}$, the mean of the first $k+b$ observations.
3:     $\mathrm{xsd}$ – double scalar
${s}_{2}$, the standard deviation of the first $k+b$ observations.
4:     $\mathrm{xskew}$ – double scalar
${s}_{3}$, the coefficient of skewness for the first $k+b$ observations.
5:     $\mathrm{xkurt}$ – double scalar
${s}_{4}$, the coefficient of kurtosis for the first $k+b$ observations.
6:     $\mathrm{xmin}$ – double scalar
The smallest value in the first $k+b$ observations.
7:     $\mathrm{xmax}$ – double scalar
The largest value in the first $k+b$ observations.
8:     $\mathrm{rcomm}\left(20\right)$ – double array
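The updated communication array. The first five elements of rcomm hold information that may be of interest:
 $\begin{array}{lcl}{\mathbf{rcomm}}\left(1\right)&=&\sum_{i=1}^{k+b}{w}_{i}\\ {\mathbf{rcomm}}\left(2\right)&=&{\left(\sum_{i=1}^{k+b}{w}_{i}\right)}^{2}-\sum_{i=1}^{k+b}{w}_{i}^{2}\\ {\mathbf{rcomm}}\left(3\right)&=&\sum_{i=1}^{k+b}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{2}\\ {\mathbf{rcomm}}\left(4\right)&=&\sum_{i=1}^{k+b}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{3}\\ {\mathbf{rcomm}}\left(5\right)&=&\sum_{i=1}^{k+b}{w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{4}\end{array}$
The remaining elements of rcomm are used as workspace and so are undefined.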
9:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
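
The first five elements of rcomm are enough to recover xsd, xskew and xkurt, because the denominator $d$ from the Description equals the second element divided by the first. The sketch below illustrates this in plain MATLAB under stated assumptions: the sample is hypothetical, and the vector `r` is built by hand to mimic the documented layout of the first five elements of rcomm rather than being returned by g01at.

```matlab
% Hypothetical weighted sample.
x = [2.0; -1.5; 0.5; 4.0; 3.0];
w = [1.0;  2.0; 0.0; 0.5; 1.5];
xbar = sum(w .* x) / sum(w);

% r mimics the documented meaning of rcomm(1:5).
r = zeros(5, 1);
r(1) = sum(w);                       % sum of weights, W
r(2) = sum(w)^2 - sum(w.^2);         % W^2 minus the sum of squared weights
r(3) = sum(w .* (x - xbar).^2);      % weighted sum of squared deviations
r(4) = sum(w .* (x - xbar).^3);      % weighted sum of cubed deviations
r(5) = sum(w .* (x - xbar).^4);      % weighted sum of fourth-power deviations

% d = W - sum(w.^2)/W = (W^2 - sum(w.^2))/W = r(2)/r(1)
d    = r(2) / r(1);
sd   = sqrt(r(3) / d);
skew = r(4) / (d * sd^3);
kurt = r(5) / (d * sd^4) - 3;
```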

## Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=11$
Constraint: ${\mathbf{nb}}\ge 0$.
${\mathbf{ifail}}=31$
Constraint: $\mathit{iwt}=0$ or $1$.
${\mathbf{ifail}}=41$
Constraint: if $\mathit{iwt}=1$ then ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nb}}$.
${\mathbf{ifail}}=51$
Constraint: ${\mathbf{pn}}\ge 0$.
${\mathbf{ifail}}=52$
Constraint: if ${\mathbf{pn}}>0$, pn must be unchanged since the previous call.
W  ${\mathbf{ifail}}=53$
On entry, the number of valid observations is zero.
W  ${\mathbf{ifail}}=71$
On exit we were unable to calculate xskew or xkurt. A value of $0$ has been returned.
W  ${\mathbf{ifail}}=72$
On exit we were unable to calculate xsd, xskew or xkurt. A value of $0$ has been returned.
${\mathbf{ifail}}=121$
rcomm has been corrupted between calls.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
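
Because the warning cases return normally with a non-zero ifail, a caller may want to inspect ifail before using the results. The following is a minimal sketch of such a check: the value assigned to `ifail` is hypothetical (as if returned by a call to g01at), and the `status` strings are illustrative paraphrases of the messages listed above, not text produced by the library.

```matlab
ifail = 71;   % hypothetical value, as if returned by a call to g01at

switch ifail
    case 0
        status = 'ok';
    case 53
        status = 'warning: the number of valid observations is zero';
    case 71
        status = 'warning: xskew and xkurt could not be computed; returned as 0';
    case 72
        status = 'warning: xsd, xskew and xkurt could not be computed; returned as 0';
    otherwise
        % Non-warning cases are documented as raising an error instead,
        % so this branch is a defensive fallback only.
        status = sprintf('error: ifail = %d', ifail);
end
fprintf('%s\n', status);
```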

## Accuracy

Not applicable.

## Further Comments

Both nag_stat_summary_onevar (g01at) and nag_stat_summary_onevar_combine (g01au) consolidate results from multiple summaries. Whereas the former can only be used to combine summaries calculated sequentially, the latter combines summaries calculated in an arbitrary order allowing, for example, summaries calculated on different processing units to be combined.
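
To illustrate why summaries computed independently can be merged at all, the sketch below combines the weighted mean and weighted sum of squared deviations of two subsets using a standard pairwise update. It is written in plain MATLAB under stated assumptions: the data, the split into subsets `ia` and `ib`, and the helper `summ` are hypothetical, and nothing here reflects the interface or the internals of nag_stat_summary_onevar_combine (g01au).

```matlab
% Hypothetical weighted sample, split into two subsets.
x  = [2.0; -1.5; 0.5; 4.0; 3.0; -2.0];
w  = [1.0;  2.0; 1.0; 0.5; 1.5;  1.0];
ia = 1:2;
ib = 3:6;

% Hypothetical helper: sum of weights W, weighted mean m and weighted
% sum of squared deviations T for one subset.
summ = @(xs, ws) struct('W', sum(ws), ...
                        'm', sum(ws .* xs) / sum(ws), ...
                        'T', sum(ws .* (xs - sum(ws .* xs) / sum(ws)).^2));

a = summ(x(ia), w(ia));
b = summ(x(ib), w(ib));

% Pairwise combination; the result does not depend on which subset is a or b.
W     = a.W + b.W;
delta = b.m - a.m;
m     = a.m + delta * b.W / W;
T     = a.T + b.T + delta^2 * a.W * b.W / W;

% Reference values computed from the full sample in one go.
full = summ(x, w);
```

The combined `m` and `T` should match `full.m` and `full.T` up to rounding, regardless of how the sample is partitioned.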

## Example

This example summarises some simulated data. The data is supplied in three blocks, the first consisting of $21$ observations, the second $51$ observations and the last $28$ observations.
```
function g01at_example

fprintf('g01at example results\n\n');

x1 = [-0.62; -1.92; -1.72; -6.35;  2.00;  7.65;  6.15;
3.81;  4.87; -0.51;  6.88; -5.85; -0.72;  0.66;
2.23; -1.61; -0.15; -1.15; -8.74; -3.94;  3.61];
wt1 = [4.91;  0.25;  3.90;  3.75;  1.17;  3.19;  2.66;
0.02;  3.59;  3.63;  4.83;  3.72;  1.72;  0.78;
4.74;  1.72;  3.94;  1.33;  0.51;  2.40;  3.90];
x2 = [-0.66; -2.39; -6.25;  1.23;  2.27; -2.27; 10.12;
8.29; -2.99;  8.71; -0.74;  0.02;  1.22;  1.70;
4.30;  2.99; -0.83; -1.00;  6.57;  2.32; -3.47;
-1.41; -5.26;  0.53;  1.80;  4.79; -3.04;  1.20;
-3.21; -3.75;  0.86;  1.27; -5.95; -5.27;  1.63;
3.59; -0.01; -1.38; -4.71; -4.82;  3.55;  0.46;
2.57;  1.76; -4.05;  1.23; -1.99;  3.20; -0.65;
8.42; -6.01];
x3 = [ 1.13; -8.86;  5.92; -1.71; -3.99;  6.57; -2.01;
-2.29; -1.11;  7.14;  4.84; -4.44; -3.32; 10.25;
-2.11;  8.02; -7.31;  2.80; -1.20;  1.01;  1.37;
-2.28;  1.28; -3.95;  3.43; -0.61; 4.85; -0.11];

rcomm = zeros(20,1);

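% First call: pn is left at its default of 0, which starts a new summary.
% Only this first block is supplied with weights.
[pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = ...
    g01at(x1, 'wt', wt1);
% Later calls: pass pn and rcomm back exactly as returned by the previous
% call. The blocks x2 and x3 are supplied without weights.
[pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = ...
    g01at(x2, 'pn', pn, 'rcomm', rcomm);
[pn, xmean, xsd, xskew, xkurt, xmin, xmax, rcomm, ifail] = ...
    g01at(x3, 'pn', pn, 'rcomm', rcomm);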

% Display the results
fprintf('Data supplied in 3 blocks\n');
fprintf('%d valid observations\n', pn);
fprintf('Mean          %13.2f\n', xmean);
fprintf('Std devn      %13.2f\n', xsd);
fprintf('Skewness      %13.2f\n', xskew);
fprintf('Kurtosis      %13.2f\n', xkurt);
fprintf('Minimum       %13.2f\n', xmin);
fprintf('Maximum       %13.2f\n', xmax);

```
```
g01at example results

Data supplied in 3 blocks
100 valid observations
Mean                   0.51
Std devn               4.24
Skewness               0.18
Kurtosis              -0.59
Minimum               -8.86
Maximum               10.25
```