# Source code for naginterfaces.library.smooth

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 28.4 smooth Chapter.

smooth - Smoothing in Statistics

This module is concerned with methods for smoothing data.
Included are methods for density estimation, smoothing time series data, and statistical applications of splines.
These methods may also be viewed as nonparametric modelling.

Functionality Index
-------------------

**Compute smoothed data sequence**

running median smoothers: :meth:data_runningmedian

**Fit cubic smoothing spline**

smoothing parameter estimated: :meth:fit_spline_parest

smoothing parameter given: :meth:fit_spline

**Kernel density estimation**

Gaussian kernel, thread safe: :meth:kerndens_gauss

Reorder data to give ordered distinct observations: :meth:data_order

For full information please refer to the NAG Library document

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10intro.html
"""

[docs]def fit_spline(mode, x, y, rho, c, comm, wt=None):
r"""
fit_spline fits a cubic smoothing spline for a given smoothing parameter.

.. _g10ab-py2-py-doc:

For full information please refer to the NAG Library document for g10ab

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10abf.html

.. _g10ab-py2-py-parameters:

**Parameters**
**mode** : str, length 1
Indicates in which mode the function is to be used.

:math:\mathrm{mode} = \texttt{'P'}

Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to fit_spline or when the coefficients are not required.

:math:\mathrm{mode} = \texttt{'Q'}

Fitting only is performed. Initialization must have been performed previously by a call to fit_spline with :math:\mathrm{mode} = \texttt{'P'}. This quick fit may be called repeatedly with different values of :math:\mathrm{rho} without re-initialization.

:math:\mathrm{mode} = \texttt{'F'}

Initialization and full fitting is performed and the function coefficients are calculated.

**x** : float, array-like, shape :math:\left(n\right)
The distinct and ordered values :math:x_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**y** : float, array-like, shape :math:\left(n\right)
The values :math:y_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**rho** : float
:math:\rho, the smoothing parameter.

**c** : float, array-like, shape :math:\left(n-1, 3\right)
If :math:\mathrm{mode} = \texttt{'Q'}, :math:\mathrm{c} must be unaltered from the previous call to fit_spline with :math:\mathrm{mode} = \texttt{'P'}. Otherwise :math:\mathrm{c} need not be set.

**comm** : dict, communication object, modified in place
Communication structure.

On initial entry: need not be set.

**wt** : None or float, array-like, shape :math:\left(n\right), optional
If :math:\textit{weight} = \texttt{'W'}, :math:\mathrm{wt} must contain the :math:n weights. Otherwise :math:\mathrm{wt} is not referenced and unit weights are assumed.

**Returns**
**yhat** : float, ndarray, shape :math:\left(n\right)
The fitted values, :math:\hat{y}_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**c** : float, ndarray, shape :math:\left(n-1, 3\right)
If :math:\mathrm{mode} = \texttt{'F'}, :math:\mathrm{c} contains the spline coefficients. More precisely, the value of the spline at :math:t is given by :math:\left(\left(\mathrm{c}[i-1,2]\times d+\mathrm{c}[i-1,1]\right)\times d+\mathrm{c}[i-1,0]\right)\times d+\hat{y}_i, where :math:x_i\leq t < x_{{i+1}} and :math:d = t-x_i.

If :math:\mathrm{mode} = \texttt{'P'} or :math:\texttt{'Q'}, :math:\mathrm{c} contains information that will be used in a subsequent call to fit_spline with :math:\mathrm{mode} = \texttt{'Q'}.

The (weighted) residual sum of squares.

**df** : float
The residual degrees of freedom.

**res** : float, ndarray, shape :math:\left(n\right)
The (weighted) residuals, :math:r_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**h** : float, ndarray, shape :math:\left(n\right)
The leverages, :math:h_{{\textit{i}\textit{i}}}, for :math:\textit{i} = 1,2,\ldots,n.

.. _g10ab-py2-py-errors:

**Raises**
**NagValueError**
(errno :math:1)
On entry, :math:\mathrm{mode} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{mode} = \texttt{'P'}, :math:\texttt{'Q'} or :math:\texttt{'F'}.

(errno :math:1)
On entry, :math:\mathrm{rho} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{rho}\geq 0.0.

(errno :math:1)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:n\geq 3.

(errno :math:2)
On entry, at least one element of :math:\mathrm{wt}\leq 0.0.

(errno :math:3)
On entry, :math:\mathrm{x} is not a strictly ordered array.

.. _g10ab-py2-py-notes:

**Notes**
fit_spline fits a cubic smoothing spline to a set of :math:n observations (:math:x_i, :math:y_i), for :math:i = 1,2,\ldots,n.
The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable.

Cubic smoothing splines arise as the unique real-valued solution function :math:f, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:

.. math::
\sum_{{i = 1}}^nw_i{\left(y_i-f\left(x_i\right)\right)}^2+\rho \int_{{-\infty }}^{\infty }{\left(f^{{\prime \prime }}\left(x\right)\right)}^2{dx}\text{,}

where :math:w_i is the (optional) weight for the :math:i\ th observation and :math:\rho is the smoothing parameter.
This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve.
The value of the smoothing parameter :math:\rho weights these two aspects; larger values of :math:\rho give a smoother fitted curve but, in general, a poorer fit.
For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).

The fitted values, :math:\hat{y} = \left(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\right)^\mathrm{T}, and weighted residuals, :math:r_i, can be written as

.. math::

for a matrix :math:H.
The residual degrees of freedom for the spline is :math:\mathrm{trace}\left(I-H\right) and the diagonal elements of :math:H, :math:h_{{ii}}, are the leverages.

The parameter :math:\rho can be chosen in a number of ways.
The fit can be inspected for a number of different values of :math:\rho.
Alternatively the degrees of freedom for the spline, which determines the value of :math:\rho, can be specified, or the (generalized) cross-validation can be minimized to give :math:\rho; see :meth:fit_spline_parest for further details.

fit_spline requires the :math:x_i to be strictly increasing.
If two or more observations have the same :math:x_i-value then they should be replaced by a single observation with :math:y_i equal to the (weighted) mean of the :math:y values and weight, :math:w_i, equal to the sum of the weights.
This operation can be performed by :meth:data_order.

The computation is split into three phases.

(i) Compute matrices needed to fit spline.

(#) Fit spline for a given value of :math:\rho.

(#) Compute spline coefficients.

When fitting the spline for several different values of :math:\rho, phase \(1) need only be carried out once and then phase \(2) repeated for different values of :math:\rho.
If the spline is being fitted as part of an iterative weighted least squares procedure phases \(1) and \(2) have to be repeated for each set of weights.
In either case, phase \(3) will often only have to be performed after the final fit has been computed.

The algorithm is based on Hutchinson (1986).

.. _g10ab-py2-py-references:

**References**
Hastie, T J and Tibshirani, R J, 1990, Generalized Additive Models, Chapman and Hall

Hutchinson, M F, 1986, Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines, ACM Trans. Math. Software (12), 150--153

Hutchinson, M F and de Hoog, F R, 1985, Smoothing noisy data with spline functions, Numer. Math. (47), 99--106

Reinsch, C H, 1967, Smoothing by spline functions, Numer. Math. (10), 177--183
"""
raise NotImplementedError

[docs]def fit_spline_parest(method, x, y, crit, wt=None, u=0.0, tol=0.0, maxcal=0):
r"""
fit_spline_parest estimates the values of the smoothing parameter and fits a cubic smoothing spline to a set of data.

.. _g10ac-py2-py-doc:

For full information please refer to the NAG Library document for g10ac

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10acf.html

.. _g10ac-py2-py-parameters:

**Parameters**
**method** : str, length 1
Indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value.

:math:\mathrm{method} = \texttt{'C'}

Cross-validation is used.

:math:\mathrm{method} = \texttt{'D'}

The degrees of freedom are specified.

:math:\mathrm{method} = \texttt{'G'}

Generalized cross-validation is used.

**x** : float, array-like, shape :math:\left(n\right)
The distinct and ordered values :math:x_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**y** : float, array-like, shape :math:\left(n\right)
The values :math:y_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**crit** : float
If :math:\mathrm{method} = \texttt{'D'}, the required degrees of freedom for the spline.

If :math:\mathrm{method} = \texttt{'C'} or :math:\texttt{'G'}, :math:\mathrm{crit} need not be set.

**wt** : None or float, array-like, shape :math:\left(n\right), optional
If :math:\textit{weight} = \texttt{'W'}, :math:\mathrm{wt} must contain the :math:n weights. Otherwise :math:\mathrm{wt} is not referenced and unit weights are assumed.

**u** : float, optional
The upper bound on the smoothing parameter. If :math:\mathrm{u}\leq \mathrm{tol}, :math:\mathrm{u} = 1000.0 will be used instead. See Further Comments <https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10acf.html#fcomments>__ for details on how this argument is used.

**tol** : float, optional
The accuracy to which the smoothing parameter :math:\mathrm{rho} is required. :math:\mathrm{tol} should preferably be not much less than :math:\sqrt{\epsilon }, where :math:\epsilon is the machine precision. If :math:\mathrm{tol} < \epsilon, :math:\mathrm{tol} = \sqrt{\epsilon } will be used instead.

**maxcal** : int, optional
The maximum number of spline evaluations to be used in finding the value of :math:\rho. If :math:\mathrm{maxcal} < 3, :math:\mathrm{maxcal} = 100 will be used instead.

**Returns**
**yhat** : float, ndarray, shape :math:\left(n\right)
The fitted values, :math:\hat{y}_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**c** : float, ndarray, shape :math:\left(n-1, 3\right)
The spline coefficients. More precisely, the value of the spline approximation at :math:t is given by :math:\left(\left(\mathrm{c}[i-1,2]\times d+\mathrm{c}[i-1,1]\right)\times d+\mathrm{c}[i-1,0]\right)\times d+\hat{y}_i, where :math:x_i\leq t < x_{{i+1}} and :math:d = t-x_i.

The (weighted) residual sum of squares.

**df** : float
The residual degrees of freedom. If :math:\mathrm{method} = \texttt{'D'} this will be :math:n-\mathrm{crit} to the required accuracy.

**res** : float, ndarray, shape :math:\left(n\right)
The (weighted) residuals, :math:r_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**h** : float, ndarray, shape :math:\left(n\right)
The leverages, :math:h_{{\textit{i}\textit{i}}}, for :math:\textit{i} = 1,2,\ldots,n.

**crit** : float
If :math:\mathrm{method} = \texttt{'C'}, the value of the cross-validation, or if :math:\mathrm{method} = \texttt{'G'}, the value of the generalized cross-validation function, evaluated at the value of :math:\rho returned in :math:\mathrm{rho}.

**rho** : float
The smoothing parameter, :math:\rho.

.. _g10ac-py2-py-errors:

**Raises**
**NagValueError**
(errno :math:1)
On entry, :math:\mathrm{crit} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: if :math:\mathrm{method} = \texttt{'D'}, :math:\mathrm{crit}\leq n.

(errno :math:1)
On entry, :math:\mathrm{crit} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: if :math:\mathrm{method} = \texttt{'D'}, :math:\mathrm{crit} > 2.0.

(errno :math:1)
On entry, :math:\mathrm{method} is not valid: :math:\mathrm{method} = \langle\mathit{\boldsymbol{value}}\rangle.

(errno :math:1)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:n\geq 3.

(errno :math:2)
On entry, at least one element of :math:\mathrm{wt}\leq 0.0.

(errno :math:3)
On entry, :math:\mathrm{x} is not a strictly ordered array.

(errno :math:4)
For the specified degrees of freedom, :math:\mathrm{rho} > \mathrm{u}: :math:\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle.

**Warns**
**NagAlgorithmicWarning**
(errno :math:5)
Accuracy of :math:\mathrm{tol} cannot be achieved: :math:\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle.

(errno :math:6)
:math:\mathrm{maxcal} iterations have been performed.

(errno :math:7)
Optimum value of :math:\mathrm{rho} lies above :math:\mathrm{u}: :math:\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle.

.. _g10ac-py2-py-notes:

**Notes**
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For a set of :math:n observations :math:\left(x_{\textit{i}}, y_{\textit{i}}\right), for :math:\textit{i} = 1,2,\ldots,n, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.

Cubic smoothing splines arise as the unique real-valued solution function :math:f, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes

.. math::
\sum_{{i = 1}}^nw_i{\left(y_i-f\left(x_i\right)\right)}^2+\rho \int_{{-\infty }}^{\infty }{\left(f^{{\prime \prime }}\left(x\right)\right)}^2{dx}\text{,}

where :math:w_i is the (optional) weight for the :math:i\ th observation and :math:\rho is the smoothing parameter.
This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve.
The value of the smoothing parameter :math:\rho weights these two aspects; larger values of :math:\rho give a smoother fitted curve but, in general, a poorer fit.
For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).

The fitted values, :math:\hat{y} = \left(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\right)^\mathrm{T}, and weighted residuals, :math:r_i, can be written as:

.. math::

for a matrix :math:H.
The residual degrees of freedom for the spline is :math:\mathrm{trace}\left(I-H\right) and the diagonal elements of :math:H are the leverages.

The parameter :math:\rho can be estimated in a number of ways.

(i) The degrees of freedom for the spline can be specified, i.e., find :math:\rho such that :math:\mathrm{trace}\left(H\right) = \nu_0 for given :math:\nu_0.

(#) Minimize the cross-validation (CV), i.e., find :math:\rho such that the CV is minimized, where

.. math::
\mathrm{CV} = \frac{1}{{\sum_{{i = 1}}^nw_i}}{\sum_{{i = 1}}^n}\left[\frac{r_i}{{1-h_{{ii}}}}\right]^2\text{.}

(#) Minimize the generalized cross-validation (GCV), i.e., find :math:\rho such that the GCV is minimized, where

.. math::
\mathrm{GCV} = \frac{n^2}{{{\sum_{{i = 1}}^n}w_i}}\left[\frac{{{\sum_{{i = 1}}^n}r_i^2}}{{\left({\sum_{{i = 1}}^n}\left(1-h_{{ii}}\right)\right)^2}}\right]\text{.}

fit_spline_parest requires the :math:x_i to be strictly increasing.
If two or more observations have the same :math:x_i value then they should be replaced by a single observation with :math:y_i equal to the (weighted) mean of the :math:y values and weight, :math:w_i, equal to the sum of the weights.
This operation can be performed by :meth:data_order.

The algorithm is based on Hutchinson (1986). :meth:roots.contfn_brent_rcomm <naginterfaces.library.roots.contfn_brent_rcomm> is used to solve for :math:\rho given :math:\nu_0 and the method of :meth:opt.one_var_func <naginterfaces.library.opt.one_var_func> is used to minimize the GCV or CV.

.. _g10ac-py2-py-references:

**References**
Hastie, T J and Tibshirani, R J, 1990, Generalized Additive Models, Chapman and Hall

Hutchinson, M F, 1986, Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines, ACM Trans. Math. Software (12), 150--153

Hutchinson, M F and de Hoog, F R, 1985, Smoothing noisy data with spline functions, Numer. Math. (47), 99--106

Reinsch, C H, 1967, Smoothing by spline functions, Numer. Math. (10), 177--183
"""
raise NotImplementedError

[docs]def kerndens_gauss(x, comm, wtype=2, window=1.0, slo=None, shi=None, ns=512):
r"""
kerndens_gauss performs kernel density estimation using a Gaussian kernel.

.. _g10bb-py2-py-doc:

For full information please refer to the NAG Library document for g10bb

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10bbf.html

.. _g10bb-py2-py-parameters:

**Parameters**
**x** : float, array-like, shape :math:\left(n\right)
:math:x_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

On a continuation call, :math:\mathrm{x} must be unchanged since the last call to kerndens_gauss.

**comm** : dict, communication object, modified in place
Communication structure.

If not initialized on entry then the values of :math:Y_l are to be calculated by this call to kerndens_gauss.

Otherwise, this is a continuation call and it is assumed that the values of :math:Y_l were calculated by a previous call to this function and the relevant information is stored in :math:\mathrm{comm}\ ['rcomm'].

**wtype** : int, optional
How the window width, :math:h, is to be calculated:

:math:\mathrm{wtype} = 1

:math:h is supplied in :math:\mathrm{window}.

:math:\mathrm{wtype} = 2

:math:h is to be calculated from the data, with

.. math::
h = m\times \left(\frac{{0.9\times \mathrm{min}\left({q_{75}-q_{25}}, \sigma \right)}}{n^{0.2}}\right)

where :math:q_{75}-q_{25} is the inter-quartile range and :math:\sigma the standard deviation of the sample, :math:x, and :math:m is a multipler supplied in :math:\mathrm{window}. The :math:25\% and :math:75\% quartiles, :math:q_{25} and :math:q_{75}, are calculated using :meth:stat.quantiles <naginterfaces.library.stat.quantiles>. This is the 'rule-of-thumb' suggested by Silverman (1990).

**window** : float, optional
If :math:\mathrm{wtype} = 1, then :math:h, the window width. Otherwise, :math:m, the multiplier used in the calculation of :math:h.

**slo** : None or float, optional
Note: if this argument is **None** then a default value will be used, determined as follows: :math:3.0.

If :math:\mathrm{slo} < \mathrm{shi} then :math:a, the lower limit of the interval on which the estimate is calculated. Otherwise, :math:a and :math:b, the lower and upper limits of the interval, are calculated as follows:

.. math::
\begin{array}{ccc}a& = &\textit{min}_i\left\{x_i\right\}-\mathrm{slo}\times h\\b& = &\textit{max}_i\left\{x_i\right\}+\mathrm{slo}\times h\end{array}

where :math:h is the window width.

For most applications :math:a should be at least three window widths below the lowest data point.

On a continuation call, a supplied :math:\mathrm{slo} will be ignored and the appropriate value from the previous call to kerndens_gauss will be extracted from :math:\mathrm{comm}\ ['rcomm'].

**shi** : None or float, optional
Note: if this argument is **None** then a default value will be used, determined as follows: :math:0.0.

If :math:\mathrm{slo} < \mathrm{shi} then :math:b, the upper limit of the interval on which the estimate is calculated. Otherwise a value for :math:b is calculated from the data as stated in the description of :math:\mathrm{slo} and the value supplied in :math:\mathrm{shi} is not used.

For most applications :math:b should be at least three window widths above the highest data point.

On a continuation call, a supplied :math:\mathrm{shi} will be ignored and the appropriate value from the previous call to kerndens_gauss will be extracted from :math:\mathrm{comm}\ ['rcomm'].

**ns** : int, optional
:math:n_s, the number of points at which the estimate is calculated.

On a continuation call, :math:\mathrm{ns} must be unchanged since the last call to kerndens_gauss.

**Returns**
**window** : float
:math:h, the window width actually used.

**slo** : float
:math:a, the lower limit actually used.

**shi** : float
:math:b, the upper limit actually used.

**smooth** : float, ndarray, shape :math:\left(\mathrm{ns}\right)
:math:\hat{f}\left(t_{\textit{l}}\right), for :math:\textit{l} = 1,2,\ldots,n_s, the :math:n_s values of the density estimate.

**t** : float, ndarray, shape :math:\left(\mathrm{ns}\right)
:math:t_{\textit{l}}, for :math:\textit{l} = 1,2,\ldots,n_s, the points at which the estimate is calculated.

.. _g10bb-py2-py-errors:

**Raises**
**NagValueError**
(errno :math:11)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:n > 0.

(errno :math:12)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

On entry at previous call, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: on a continuation call, :math:\textit{n} must be unchanged since previous call.

(errno :math:31)
On entry, :math:\mathrm{wtype} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{wtype} = 1 or :math:2.

(errno :math:41)
On entry, :math:\mathrm{window} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{window} > 0.0.

(errno :math:51)
On entry, :math:\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle.

On exit from previous call, :math:\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: on a continuation call, :math:\mathrm{slo} must be the same value returned by the previous call.

(errno :math:62)
On entry, :math:\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle.

On exit from previous call, :math:\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: on a continuation call, :math:\mathrm{shi} must be the same value returned by the previous call.

(errno :math:71)
On entry, :math:\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{ns}\geq 2.

(errno :math:74)
On entry, :math:\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle.

On entry at previous call, :math:\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: on a continuation call, :math:\mathrm{ns} must be unchanged since previous call.

(errno :math:111)
:math:\mathrm{comm}\ ['rcomm'] has been corrupted between calls.

**Warns**
**NagAlgorithmicWarning**
(errno :math:61)
On entry, :math:\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle and :math:\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle.

On entry, :math:\mathrm{min}\left(\mathrm{x}\right) = \langle\mathit{\boldsymbol{value}}\rangle and :math:\mathrm{max}\left(\mathrm{x}\right) = \langle\mathit{\boldsymbol{value}}\rangle.

Expected values of at least :math:\langle\mathit{\boldsymbol{value}}\rangle and :math:\langle\mathit{\boldsymbol{value}}\rangle for :math:\mathrm{slo} and :math:\mathrm{shi}.

.. _g10bb-py2-py-notes:

**Notes**
Given a sample of :math:n observations, :math:x_1,x_2,\ldots,x_n, from a distribution with unknown density function, :math:f\left(x\right), an estimate of the density function, :math:\hat{f}\left(x\right), may be required.
The simplest form of density estimator is the histogram.
This may be defined by:

.. math::
\hat{f}\left(x\right) = \frac{1}{{nh}}n_j\text{, }\quad a+\left(j-1\right)h < x < a+jh\text{, }\quad j = 1,2,\ldots,n_s\text{,}

where :math:n_j is the number of observations falling in the interval :math:a+\left(j-1\right)h to :math:a+jh, :math:a is the lower bound to the histogram, :math:b = n_sh is the upper bound and :math:n_s is the total number of intervals.
The value :math:h is known as the window width.
To produce a smoother density estimate a kernel method can be used.
A kernel function, :math:K\left(t\right), satisfies the conditions:

.. math::

The kernel density estimator is then defined as

.. math::
\hat{f}\left(x\right) = \frac{1}{{nh}}\sum_{{i = 1}}^nK\left(\frac{{x-x_i}}{h}\right)\text{.}

The choice of :math:K is usually not important but to ease the computational burden use can be made of the Gaussian kernel defined as

.. math::
K\left(t\right) = \frac{1}{\sqrt{2\pi }}e^{{-t^2/2}}\text{.}

The smoothness of the estimator depends on the window width :math:h.
The larger the value of :math:h the smoother the density estimate.
The value of :math:h can be chosen by examining plots of the smoothed density for different values of :math:h or by using cross-validation methods (see Silverman (1990)).

Silverman (1982) and Silverman (1990) show how the Gaussian kernel density estimator can be computed using a fast Fourier transform (FFT).
In order to compute the kernel density estimate over the range :math:a to :math:b the following steps are required.

(i) Discretize the data to give :math:n_s equally spaced points :math:t_l with weights :math:\xi_l (see Jones and Lotwick (1984)).

(#) Compute the FFT of the weights :math:\xi_l to give :math:Y_l.

(#) Compute :math:\zeta_l = e^{{-\frac{1}{2}h^2s_l^2}}Y_l where :math:s_l = 2\pi l/\left(b-a\right).

(#) Find the inverse FFT of :math:\zeta_l to give :math:\hat{f}\left(x\right).

To compute the kernel density estimate for further values of :math:h only steps \(iii) and (iv) need be repeated.

.. _g10bb-py2-py-references:

**References**
Jones, M C and Lotwick, H W, 1984, Remark AS R50. A remark on algorithm AS 176. Kernel density estimation using the Fast Fourier Transform, Appl. Statist. (33), 120--122

Silverman, B W, 1982, Algorithm AS 176. Kernel density estimation using the fast Fourier transform, Appl. Statist. (31), 93--99

Silverman, B W, 1990, Density Estimation, Chapman and Hall
"""
raise NotImplementedError

[docs]def data_runningmedian(itype, y):
r"""
data_runningmedian computes a smoothed data sequence using running median smoothers.

.. _g10ca-py2-py-doc:

For full information please refer to the NAG Library document for g10ca

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10caf.html

.. _g10ca-py2-py-parameters:

**Parameters**
**itype** : int
Specifies the method to be used.

If :math:\mathrm{itype} = 0, 4253H,twice is used.

If :math:\mathrm{itype} = 1, 3RSSH,twice is used.

**y** : float, array-like, shape :math:\left(n\right)
The sample observations.

**Returns**
**smooth** : float, ndarray, shape :math:\left(n\right)
Contains the smooth.

**rough** : float, ndarray, shape :math:\left(n\right)
Contains the rough.

.. _g10ca-py2-py-errors:

**Raises**
**NagValueError**
(errno :math:1)
On entry, :math:\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{itype} = 0 or :math:1.

(errno :math:2)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:n > 6.

.. _g10ca-py2-py-notes:

**Notes**
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given a sequence of :math:n observations recorded at equally spaced intervals, data_runningmedian fits a smooth curve through the data using one of two smoothers.
The two smoothers are based on the use of running medians and averages to summarise overlapping segments.
The fit and the residuals are called the smooth and the rough respectively.
They obey the following:

.. math::
\mathrm{Data} = \mathrm{Smooth}+\mathrm{Rough}\text{.}

The two smoothers are:

(1) 4253H,twice consisting of a running median of :math:4, then :math:2, then :math:5, then :math:3 followed by hanning. Hanning is a running weighted average, the weights being :math:1/4, :math:1/2 and :math:1/4. The result of this smoothing is then reroughed by computing residuals, applying the same smoother to them and adding the result to the smooth of the first pass.

(#) 3RSSH,twice consisting of a running median of :math:3, two splitting operations named S to improve the smooth sequence, each of which is followed by a running median of :math:3, and finally hanning. The end points are dealt with using the method described by Velleman and Hoaglin (1981). The full smoother 3RSSH,twice is produced by reroughing as described above.

The compound smoother 4253H,twice is recommended.
The smoother 3RSSH,twice is popular when calculating by hand as it requires simpler computations and is included for comparison purposes.

.. _g10ca-py2-py-references:

**References**
Tukey, J W, 1977, Exploratory Data Analysis, Addison--Wesley

Velleman, P F and Hoaglin, D C, 1981, Applications, Basics, and Computing of Exploratory Data Analysis, Duxbury Press, Boston, MA
"""
raise NotImplementedError

[docs]def data_order(x, y, wt=None):
r"""
data_order orders and weights data which is entered unsequentially, weighted or unweighted.

.. _g10za-py2-py-doc:

For full information please refer to the NAG Library document for g10za

https://www.nag.com/numeric/nl/nagdoc_28.4/flhtml/g10/g10zaf.html

.. _g10za-py2-py-parameters:

**Parameters**
**x** : float, array-like, shape :math:\left(n\right)
The values, :math:x_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**y** : float, array-like, shape :math:\left(n\right)
The values :math:y_{\textit{i}}, for :math:\textit{i} = 1,2,\ldots,n.

**wt** : None or float, array-like, shape :math:\left(n\right), optional
If :math:\textit{weight} = \texttt{'W'}, :math:\mathrm{wt} must contain the :math:n weights. Otherwise :math:\mathrm{wt} is not referenced and unit weights are assumed.

**Returns**
**nord** : int
The number of distinct observations.

**xord** : float, ndarray, shape :math:\left(n\right)
The first :math:\mathrm{nord} elements contain the ordered and distinct :math:x_i.

**yord** : float, ndarray, shape :math:\left(n\right)
The first :math:\mathrm{nord} elements contain the values :math:y^{\prime } corresponding to the values in :math:\mathrm{xord}.

**wtord** : float, ndarray, shape :math:\left(n\right)
The first :math:\mathrm{nord} elements contain the values :math:w^{\prime } corresponding to the values of :math:\mathrm{xord} and :math:\mathrm{yord}.

The within group sum of squares for tied observations.

.. _g10za-py2-py-errors:

**Raises**
**NagValueError**
(errno :math:1)
On entry, :math:\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\textit{weight} = \texttt{'W'} or :math:\texttt{'U'}.

(errno :math:1)
On entry, :math:n = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:n\geq 1.

(errno :math:2)
On entry, all weights are zero.

(errno :math:2)
On entry, :math:i = \langle\mathit{\boldsymbol{value}}\rangle and :math:\mathrm{wt}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle.

Constraint: :math:\mathrm{wt}[i-1] \geq 0.0.

.. _g10za-py2-py-notes:

**Notes**
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given a set of observations :math:\left(x_i, y_i\right), for :math:i = 1,2,\ldots,n, with corresponding weights :math:w_i, data_order rearranges the observations so that the :math:x_i are in ascending order.

For any equal :math:x_i in the ordered set, say :math:x_j = x_{{j+1}} = \cdots = x_{{j+k}}, a single observation :math:x_j is returned with a corresponding :math:y^{\prime } and :math:w^{\prime }, calculated as

.. math::
w^{\prime } = \sum_{{l = 0}}^kw_{{i+l}}

and

.. math::
y^{\prime } = \frac{{\sum_{{l = 0}}^kw_{{i+l}}y_{{i+l}}}}{{w^{\prime }}}\text{.}

Observations with zero weight are ignored.
If no weights are supplied by you, then unit weights are assumed; that is :math:w_{\textit{i}} = 1, for :math:\textit{i} = 1,2,\ldots,n.

In addition, the within group sum of squares is computed for the tied observations using West's algorithm (see West (1979)).

.. _g10za-py2-py-references:

**References**
Draper, N R and Smith, H, 1985, Applied Regression Analysis, (2nd Edition), Wiley

West, D H D, 1979, Updating mean and variance estimates: An improved method, Comm. ACM (22), 532--555
"""
raise NotImplementedError