Source code for naginterfaces.library.univar

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 28.5 `univar` Chapter.

``univar`` - Univariate Estimation

This module deals with the estimation of unknown parameters of a univariate distribution.
It includes both point and interval estimation using maximum likelihood and robust methods.

See Also
--------
``naginterfaces.library.examples.univar`` :
    This subpackage contains examples for the ``univar`` module.
    See also the :ref:`library_univar_ex` subsection.

Functionality Index
-------------------

2 sample :math:`t`-test: :meth:`ttest_2normal`

**Confidence intervals for parameters**

  binomial distribution: :meth:`ci_binomial`

  Poisson distribution: :meth:`ci_poisson`

**Maximum likelihood estimation of parameters**

  Normal distribution, grouped and/or censored data: :meth:`estim_normal`

  Weibull distribution: :meth:`estim_weibull`

**Outlier detection**

  Peirce

    raw data or single variance supplied: :meth:`outlier_peirce_1var`

    two variances supplied: :meth:`outlier_peirce_2var`

**Parameter estimates**

  generalized Pareto distribution: :meth:`estim_genpareto`

**Robust estimation**

  confidence intervals

    one sample: :meth:`robust_1var_ci`

    two samples: :meth:`robust_2var_ci`

  median, median absolute deviation and robust standard deviation: :meth:`robust_1var_median`

  :math:`M`-estimates for location and scale parameters

    standard weight functions: :meth:`robust_1var_mestim`

    trimmed and winsorized means and estimates of their variance: :meth:`robust_1var_trimmed`

    user-defined weight functions: :meth:`robust_1var_mestim_wgt`

For full information please refer to the NAG Library document

https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07intro.html
"""

# NAG Copyright 2017-2022.

[docs]def ci_binomial(n, k, clevel): r""" ``ci_binomial`` computes a confidence interval for the parameter :math:`p` (the probability of a success) of a binomial distribution. .. _g07aa-py2-py-doc: For full information please refer to the NAG Library document for g07aa https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07aaf.html .. _g07aa-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of trials. **k** : int :math:`k`, the number of successes. **clevel** : float The confidence level, :math:`\left(1-\alpha \right)`, for two-sided interval estimate. For example :math:`\mathrm{clevel} = 0.95` will give a :math:`95\%` confidence interval. **Returns** **pl** : float The lower limit, :math:`p_l`, of the confidence interval. **pu** : float The upper limit, :math:`p_u`, of the confidence interval. .. _g07aa-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{k} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n}\geq \mathrm{k}`. (`errno` :math:`1`) On entry, :math:`\mathrm{k} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{k} \geq 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} \geq 1`. (`errno` :math:`2`) When using the relationship with the gamma distribution the series to calculate the gamma probabilities has failed to converge. .. _g07aa-py2-py-notes: **Notes** Given the number of trials, :math:`n`, and the number of successes, :math:`k`, this function computes a :math:`100\left(1-\alpha \right)\%` confidence interval for :math:`p`, the probability parameter of a binomial distribution with probability function, .. math:: f\left(x\right) = \begin{pmatrix}n\\x\end{pmatrix}p^x\left(1-p\right)^{{n-x}}\text{, }\quad x = 0,1,\ldots,n\text{,} where :math:`\alpha` is in the interval :math:`\left(0, 1\right)`. Let the confidence interval be denoted by [:math:`p_l,p_u`]. The point estimate for :math:`p` is :math:`\hat{p} = k/n`. The lower and upper confidence limits :math:`p_l` and :math:`p_u` are estimated by the solutions to the equations; .. math:: \sum_{{x = k}}^n\begin{pmatrix}n\\x\end{pmatrix}p_l^x\left(1-p_l\right)^{{n-x}} = \alpha /2\text{,} .. math:: \sum_{{x = 0}}^k\begin{pmatrix}n\\x\end{pmatrix}p_u^x\left(1-p_u\right)^{{n-x}} = \alpha /2\text{.} Three different methods are used depending on the number of trials, :math:`n`, and the number of successes, :math:`k`. (1) If :math:`\mathrm{max}\left(k, {n-k}\right) < 10^6`. The relationship between the beta and binomial distributions (see page 38 of Hastings and Peacock (1975)) is used to derive the equivalent equations, .. math:: \begin{array}{lcl}p_l& = &\beta_{{k,n-k+1,\alpha /2}}\text{,}\\&&\\p_u& = &\beta_{{k+1,n-k,1-\alpha /2}}\text{,}\end{array} where :math:`\beta_{{a,b,\delta }}` is the deviate associated with the lower tail probability, :math:`\delta`, of the beta distribution with parameters :math:`a` and :math:`b`. These beta deviates are computed using :meth:`stat.inv_cdf_beta <naginterfaces.library.stat.inv_cdf_beta>`. (#) If :math:`\mathrm{max}\left(k, {n-k}\right)\geq 10^6` and :math:`\mathrm{min}\left(k, {n-k}\right)\leq 1000`. The binomial variate with parameters :math:`n` and :math:`p` is approximated by a Poisson variate with mean :math:`np`, see page 38 of Hastings and Peacock (1975). The relationship between the Poisson and :math:`\chi^2`-distributions (see page 112 of Hastings and Peacock (1975)) is used to derive the following equations; .. math:: \begin{array}{lcl}p_l& = &\frac{1}{{2n}}\chi_{{2k,\alpha /2}}^2\text{,}\\&&\\p_u& = &\frac{1}{{2n}}\chi_{{2k+2,1-\alpha /2}}^2\text{,}\end{array} where :math:`\chi_{{\delta,\nu }}^2` is the deviate associated with the lower tail probability, :math:`\delta`, of the :math:`\chi^2`-distribution with :math:`\nu` degrees of freedom. In turn the relationship between the :math:`\chi^2`-distribution and the gamma distribution (see page 70 of Hastings and Peacock (1975)) yields the following equivalent equations; .. math:: \begin{array}{lcl}p_l& = &\frac{1}{{2n}}\gamma_{{k,2\text{;}\alpha /2}}\text{,}\\&&\\p_u& = &\frac{1}{{2n}}\gamma_{{k+1,2\text{;}1-\alpha /2}}\text{,}\end{array} where :math:`\gamma_{{\alpha,\beta \text{;}\delta }}` is the deviate associated with the lower tail probability, :math:`\delta`, of the gamma distribution with shape parameter :math:`\alpha` and scale parameter :math:`\beta`. These deviates are computed using :meth:`stat.inv_cdf_gamma <naginterfaces.library.stat.inv_cdf_gamma>`. (#) If :math:`\mathrm{max}\left(k, {n-k}\right) > 10^6` and :math:`\mathrm{min}\left(k, {n-k}\right) > 1000`. The binomial variate with parameters :math:`n` and :math:`p` is approximated by a Normal variate with mean :math:`np` and variance :math:`np\left(1-p\right)`, see page 38 of Hastings and Peacock (1975). The approximate lower and upper confidence limits :math:`p_l` and :math:`p_u` are the solutions to the equations; .. math:: \begin{array}{lcl}\frac{{k-np_l}}{{\sqrt{np_l\left(1-p_l\right)}}}& = &z_{{1-\alpha /2}}\text{,}\\&&\\&&\\\frac{{k-np_u}}{{\sqrt{np_u\left(1-p_u\right)}}}& = &z_{{\alpha /2}}\text{,}\end{array} where :math:`z_{\delta }` is the deviate associated with the lower tail probability, :math:`\delta`, of the standard Normal distribution. These equations are solved using a quadratic equation solver (:meth:`zeros.quadratic_real <naginterfaces.library.zeros.quadratic_real>`). .. _g07aa-py2-py-references: **References** Hastings, N A J and Peacock, J B, 1975, `Statistical Distributions`, Butterworth Snedecor, G W and Cochran, W G, 1967, `Statistical Methods`, Iowa State University Press """ raise NotImplementedError
[docs]def ci_poisson(n, xmean, clevel): r""" ``ci_poisson`` computes a confidence interval for the mean parameter of the Poisson distribution. .. _g07ab-py2-py-doc: For full information please refer to the NAG Library document for g07ab https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07abf.html .. _g07ab-py2-py-parameters: **Parameters** **n** : int :math:`n`, the sample size. **xmean** : float The sample mean, :math:`\bar{x}`. **clevel** : float The confidence level, :math:`\left(1-\alpha \right)`, for two-sided interval estimate. For example :math:`\mathrm{clevel} = 0.95` gives a :math:`95\%` confidence interval. **Returns** **tl** : float The lower limit, :math:`\theta_l`, of the confidence interval. **tu** : float The upper limit, :math:`\theta_u`, of the confidence interval. .. _g07ab-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{xmean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{xmean}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} > 0`. (`errno` :math:`2`) When using the relationship with the gamma distribution the series to calculate the gamma probabilities has failed to converge. .. _g07ab-py2-py-notes: **Notes** Given a random sample of size :math:`n`, denoted by :math:`x_1,x_2,\ldots,x_n`, from a Poisson distribution with probability function .. math:: p\left(x\right) = e^{{-\theta }}\frac{\theta^x}{{x!}}\text{, }\quad x = 0,1,2,\ldots the point estimate, :math:`\hat{\theta }`, for :math:`\theta` is the sample mean, :math:`\bar{x}`. Given :math:`n` and :math:`\bar{x}` this function computes a :math:`100\left(1-\alpha \right)\%` confidence interval for the parameter :math:`\theta`, denoted by [:math:`\theta_l,\theta_u`], where :math:`\alpha` is in the interval :math:`\left(0, 1\right)`. The lower and upper confidence limits are estimated by the solutions to the equations .. math:: \begin{array}{c}e^{{-n\theta_l}}\sum_{{x = T}}^{\infty } \frac{\left(n\theta_l\right)^x}{{x!}} = \frac{\alpha }{2}\text{,}\\\\\\e^{{-n\theta_u}}\sum_{{x = 0}}^T\frac{\left(n\theta_u\right)^x}{{x!}} = \frac{\alpha }{2}\text{,}\end{array} where :math:`T = \sum_{{i = 1}}^nx_i = n\hat{\theta }`. The relationship between the Poisson distribution and the :math:`\chi^2`-distribution (see page 112 of Hastings and Peacock (1975)) is used to derive the equations .. math:: \begin{array}{l}\theta_l = \frac{1}{{2n}}\chi_{{2T,\alpha /2}}^2\text{,}\\\\\theta_u = \frac{1}{{2n}}\chi_{{2T+2,1-\alpha /2}}^2\text{,}\end{array} where :math:`\chi_{{\nu,p}}^2` is the deviate associated with the lower tail probability :math:`p` of the :math:`\chi^2`-distribution with :math:`\nu` degrees of freedom. In turn the relationship between the :math:`\chi^2`-distribution and the gamma distribution (see page 70 of Hastings and Peacock (1975)) yields the following equivalent equations; .. math:: \begin{array}{l}\theta_l = \frac{1}{{2n}}\gamma_{{T,2\text{;}\alpha /2}}\text{,}\\\\\theta_u = \frac{1}{{2n}}\gamma_{{T+1,2\text{;}1-\alpha /2}}\text{,}\end{array} where :math:`\gamma_{{\alpha,\beta \text{;}\delta }}` is the deviate associated with the lower tail probability, :math:`\delta`, of the gamma distribution with shape parameter :math:`\alpha` and scale parameter :math:`\beta`. These deviates are computed using :meth:`stat.inv_cdf_gamma <naginterfaces.library.stat.inv_cdf_gamma>`. .. _g07ab-py2-py-references: **References** Hastings, N A J and Peacock, J B, 1975, `Statistical Distributions`, Butterworth Snedecor, G W and Cochran, W G, 1967, `Statistical Methods`, Iowa State University Press """ raise NotImplementedError
[docs]def estim_normal(method, x, xc, ic, xmu, xsig, tol, maxit): r""" ``estim_normal`` computes maximum likelihood estimates and their standard errors for parameters of the Normal distribution from grouped and/or censored data. .. _g07bb-py2-py-doc: For full information please refer to the NAG Library document for g07bb https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html .. _g07bb-py2-py-parameters: **Parameters** **method** : str, length 1 Indicates whether the Newton--Raphson or :math:`EM` algorithm should be used. If :math:`\mathrm{method} = \texttt{'N'}`, the Newton--Raphson algorithm is used. If :math:`\mathrm{method} = \texttt{'E'}`, the :math:`EM` algorithm is used. **x** : float, array-like, shape :math:`\left(n\right)` The observations :math:`x_{\textit{i}}`, :math:`L_{\textit{i}}` or :math:`U_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. If the observation is exactly specified -- the exact value, :math:`x_i`. If the observation is right-censored -- the lower value, :math:`L_i`. If the observation is left-censored -- the upper value, :math:`U_i`. If the observation is interval-censored -- the lower or upper value, :math:`L_i` or :math:`U_i`, (see :math:`\mathrm{xc}`). **xc** : float, array-like, shape :math:`\left(n\right)` If the :math:`\textit{j}`\ th observation, for :math:`\textit{j} = 1,2,\ldots,n` is an interval-censored observation then :math:`\mathrm{xc}[j-1]` should contain the complementary value to :math:`\mathrm{x}[j-1]`, that is, if :math:`\mathrm{x}[j-1] < \mathrm{xc}[j-1]`, then :math:`\mathrm{xc}[j-1]` contains upper value, :math:`U_i`, and if :math:`\mathrm{x}[j-1] > \mathrm{xc}[j-1]`, then :math:`\mathrm{xc}[j-1]` contains lower value, :math:`L_i`. Otherwise if the :math:`j`\ th observation is exact or right - or left-censored :math:`\mathrm{xc}[j-1]` need not be set. Note: if :math:`\mathrm{x}[j-1] = \mathrm{xc}[j-1]` then the observation is ignored. **ic** : int, array-like, shape :math:`\left(n\right)` :math:`\mathrm{ic}[\textit{i}-1]` contains the censoring codes for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{ic}[i-1] = 0`, the observation is exactly specified. If :math:`\mathrm{ic}[i-1] = 1`, the observation is right-censored. If :math:`\mathrm{ic}[i-1] = 2`, the observation is left-censored. If :math:`\mathrm{ic}[i-1] = 3`, the observation is interval-censored. **xmu** : float If :math:`\mathrm{xsig} > 0.0` the initial estimate of the mean, :math:`\mu`; otherwise :math:`\mathrm{xmu}` need not be set. **xsig** : float Specifies whether an initial estimate of :math:`\mu` and :math:`\sigma` are to be supplied. :math:`\mathrm{xsig} > 0.0` :math:`\mathrm{xsig}` is the initial estimate of :math:`\sigma` and :math:`\mathrm{xmu}` must contain an initial estimate of :math:`\mu`. :math:`\mathrm{xsig}\leq 0.0` Initial estimates of :math:`\mathrm{xmu}` and :math:`\mathrm{xsig}` are calculated internally from: (a) the exact observations, if the number of exactly specified observations is :math:`\text{}\geq 2`; or (#) the interval-censored observations; if the number of interval-censored observations is :math:`\text{}\geq 1`; or (#) they are set to :math:`0.0` and :math:`1.0` respectively. **tol** : float The relative precision required for the final estimates of :math:`\mu` and :math:`\sigma`. Convergence is assumed when the absolute relative changes in the estimates of both :math:`\mu` and :math:`\sigma` are less than :math:`\mathrm{tol}`. If :math:`\mathrm{tol} = 0.0`, a relative precision of :math:`0.000005` is used. **maxit** : int The maximum number of iterations. If :math:`\mathrm{maxit}\leq 0`, a value of :math:`25` is used. **Returns** **xmu** : float The maximum likelihood estimate, :math:`\hat{\mu }`, of :math:`\mu`. **xsig** : float The maximum likelihood estimate, :math:`\hat{\sigma }`, of :math:`\sigma`. **sexmu** : float The estimate of the standard error of :math:`\hat{\mu }`. **sexsig** : float The estimate of the standard error of :math:`\hat{\sigma }`. **corr** : float The estimate of the correlation between :math:`\hat{\mu }` and :math:`\hat{\sigma }`. **dev** : float The maximized log-likelihood, :math:`L\left({\hat{\mu }}, {\hat{\sigma }}\right)`. **nobs** : int, ndarray, shape :math:`\left(4\right)` The number of the different types of each observation; :math:`\mathrm{nobs}[0]` contains number of right-censored observations. :math:`\mathrm{nobs}[1]` contains number of left-censored observations. :math:`\mathrm{nobs}[2]` contains number of interval-censored observations. :math:`\mathrm{nobs}[3]` contains number of exactly specified observations. **nit** : int The number of iterations performed. .. _g07bb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`\mathrm{method} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{method} = \texttt{'N'}` or :math:`\texttt{'E'}`. (`errno` :math:`1`) On entry, effective number of observations :math:`\text{} < 2`. (`errno` :math:`1`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0`, :math:`1`, :math:`2` or :math:`3`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\text{machine precision} < \mathrm{tol}\leq 1.0` or :math:`\mathrm{tol} = 0.0`. (`errno` :math:`2`) The chosen method has not converged in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. (`errno` :math:`3`) The process has diverged. (`errno` :math:`3`) The EM process has failed. (`errno` :math:`4`) Standard errors cannot be computed. .. _g07bb-py2-py-notes: **Notes** A sample of size :math:`n` is taken from a Normal distribution with mean :math:`\mu` and variance :math:`\sigma^2` and consists of grouped and/or censored data. Each of the :math:`n` observations is known by a pair of values :math:`\left(L_i, U_i\right)` such that: .. math:: L_i\leq x_i\leq U_i\text{.} The data is represented as particular cases of this form: exactly specified observations occur when :math:`L_i = U_i = x_i`, right-censored observations, known only by a lower bound, occur when :math:`U_i→\infty`, left-censored observations, known only by a upper bound, occur when :math:`L_i→{-\infty }`, and interval-censored observations when :math:`L_i < x_i < U_i`. Let the set :math:`A` identify the exactly specified observations, sets :math:`B` and :math:`C` identify the observations censored on the right and left respectively, and set :math:`D` identify the observations confined between two finite limits. Also let there be :math:`r` exactly specified observations, i.e., the number in :math:`A`. The probability density function for the standard Normal distribution is .. math:: Z\left(x\right) = \frac{1}{\sqrt{2\pi }}\mathrm{exp}\left(-\frac{1}{2}x^2\right)\text{, }\quad {-\infty } < x < \infty and the cumulative distribution function is .. math:: P\left(X\right) = 1-Q\left(X\right) = \int_{{-\infty }}^XZ\left(x\right){dx}\text{.} The log-likelihood of the sample can be written as: .. math:: L\left(\mu, \sigma \right) = -r\log\left(\sigma \right)-\frac{1}{2}\sum_A\left\{\left(x_i-\mu \right)/\sigma \right\}^2+\sum_B\log\left(Q\left(l_i\right)\right)+\sum_C\log\left(P\left(u_i\right)\right)+\sum_D\log\left(p_i\right) where :math:`p_i = P\left(u_i\right)-P\left(l_i\right)` and :math:`u_i = \left(U_i-\mu \right)/\sigma \text{, }\quad l_i = \left(L_i-\mu \right)/\sigma`. Let .. math:: S\left(x_i\right) = \frac{{Z\left(x_i\right)}}{{Q\left(x_i\right)}}\text{, }\quad S_1\left(l_i, u_i\right) = \frac{{Z\left(l_i\right)-Z\left(u_i\right)}}{p_i} and .. math:: S_2\left(l_i, u_i\right) = \frac{{u_iZ\left(u_i\right)-l_iZ\left(l_i\right)}}{p_i}\text{,} then the first derivatives of the log-likelihood can be written as: .. math:: \frac{{\partial L\left(\mu, \sigma \right)}}{{\partial \mu }} = L_1\left(\mu, \sigma \right) = \sigma^{-2}\sum_A\left(x_i-\mu \right)+\sigma^{-1}\sum_BS\left(l_i\right)-\sigma^{-1}\sum_CS\left(-u_i\right)+\sigma^{-1}\sum_DS_1\left(l_i, u_i\right) and .. math:: \frac{{\partial L\left(\mu, \sigma \right)}}{{\partial \sigma }} = L_2\left(\mu, \sigma \right) = -r\sigma^{-1}+\sigma^{-3}\sum_A\left(x_i-\mu \right)^2+\sigma^{-1}\sum_Bl_iS\left(l_i\right)-\sigma^{-1}\sum_Cu_iS\left(-u_i\right) .. math:: -\sigma^{-1}\sum_DS_2\left(l_i, u_i\right) The maximum likelihood estimates, :math:`\hat{\mu }` and :math:`\hat{\sigma }`, are the solution to the equations: .. math:: L_1\left({\hat{\mu }}, {\hat{\sigma }}\right) = 0 and .. math:: L_2\left({\hat{\mu }}, {\hat{\sigma }}\right) = 0 and if the second derivatives :math:`\frac{{\partial^2L}}{{\partial^2\mu }}`, :math:`\frac{{\partial^2L}}{{{\partial \mu }{\partial \sigma }}}` and :math:`\frac{{\partial^2L}}{{\partial^2\sigma }}` are denoted by :math:`L_{11}`, :math:`L_{12}` and :math:`L_{22}` respectively, then estimates of the standard errors of :math:`\hat{\mu }` and :math:`\hat{\sigma }` are given by: .. math:: \mathrm{se}\left(\hat{\mu }\right) = \sqrt{\frac{{-L_{22}}}{{L_{11}L_{22}-L_{12}^2}}}\text{, }\quad \mathrm{se}\left(\hat{\sigma }\right) = \sqrt{\frac{{-L_{11}}}{{L_{11}L_{22}-L_{12}^2}}} and an estimate of the correlation of :math:`\hat{\mu }` and :math:`\hat{\sigma }` is given by: .. math:: \frac{L_{12}}{{\sqrt{L_{12}L_{22}}}}\text{.} To obtain the maximum likelihood estimates the equations `(1) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn1>`__ and `(2) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn1>`__ can be solved using either the Newton--Raphson method or the Expectation-maximization :math:`\left(EM\right)` algorithm of Dempster `et al.` (1977). **Newton--Raphson Method** This consists of using approximate estimates :math:`\tilde{\mu }` and :math:`\tilde{\sigma }` to obtain improved estimates :math:`\tilde{\mu }+\delta \tilde{\mu }` and :math:`\tilde{\sigma }+\delta \tilde{\sigma }` by solving .. math:: \begin{array}{l}\delta \tilde{\mu }L_{11}+\delta \tilde{\sigma }L_{12}+L_1 = 0\text{,}\\\\\delta \tilde{\mu }L_{12}+\delta \tilde{\sigma }L_{22}+L_2 = 0\text{,}\end{array} for the corrections :math:`\delta \tilde{\mu }` and :math:`\delta \tilde{\sigma }`. **EM Algorithm** The expectation step consists of constructing the variable :math:`w_i` as follows: .. math:: \text{if }\quad i \in A\text{, }\quad w_i = x_i .. math:: \text{if }\quad i \in B\text{, }\quad w_i = E\left(x_i | x_i > L_i\right) = \mu +\sigma S\left(l_i\right) .. math:: \text{if }\quad i \in C\text{, }\quad w_i = E\left(x_i | x_i < U_i\right) = \mu -\sigma S\left(-u_i\right) .. math:: \text{if }\quad i \in D\text{, }\quad w_i = E\left(x_i | L_i < x_i < U_i\right) = \mu +\sigma S_1\left(l_i, u_i\right) the maximization step consists of substituting `(3) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__, `(4) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__, `(5) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__ and `(6) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__ into `(1) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn1>`__ and `(2) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn1>`__ giving: .. math:: \hat{\mu } = \sum_{{i = 1}}^n\hat{w}_i/n and .. math:: \hat{\sigma }^2 = \sum_{{i = 1}}^n\left(\hat{w}_i-\hat{\mu }\right)^2/\left\{r+\sum_BT\left(\hat{l}_i\right)+\sum_CT\left(-\hat{u}_i\right)+\sum_DT_1\left(\hat{l}_i, \hat{u}_i\right)\right\} where .. math:: T\left(x\right) = S\left(x\right)\left\{S\left(x\right)-x\right\}\text{, }\quad T_1\left(l, u\right) = S_1^2\left(l, u\right)+S_2\left(l, u\right) and where :math:`\hat{w}_i`, :math:`\hat{l}_i` and :math:`\hat{u}_i` are :math:`w_i`, :math:`l_i` and :math:`u_i` evaluated at :math:`\hat{\mu }` and :math:`\hat{\sigma }`. Equations `(3) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__ and `(8) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__ are the basis of the :math:`EM` iterative procedure for finding :math:`\hat{\mu }` and :math:`\hat{\sigma }^2`. The procedure consists of alternately estimating :math:`\hat{\mu }` and :math:`\hat{\sigma }^2` using `(7) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn7>`__ and `(8) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn7>`__ and estimating :math:`\left\{\hat{w}_i\right\}` using `(3) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__ and `(6) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bbf.html#eqn3>`__. In choosing between the two methods a general rule is that the Newton--Raphson method converges more quickly but requires good initial estimates whereas the :math:`EM` algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton--Raphson method to be used. If there are a high proportion of censored observations then the :math:`EM` algorithm should be used and if high accuracy is required the subsequent use of the Newton--Raphson method to refine the estimates obtained from the :math:`EM` algorithm should be considered. .. _g07bb-py2-py-references: **References** Dempster, A P, Laird, N M and Rubin, D B, 1977, `Maximum likelihood from incomplete data via the` :math:`{EM}` `algorithm (with discussion)`, J. Roy. Statist. Soc. Ser. B (39), 1--38 Swan, A V, 1969, `Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data`, Appl. Statist. (18), 110--114 Wolynetz, M S, 1979, `Maximum likelihood estimation from confined and censored normal data`, Appl. Statist. (28), 185--195 """ raise NotImplementedError
[docs]def estim_weibull(cens, x, ic, gamma, tol, maxit): r""" ``estim_weibull`` computes maximum likelihood estimates for parameters of the Weibull distribution from data which may be right-censored. .. _g07be-py2-py-doc: For full information please refer to the NAG Library document for g07be https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bef.html .. _g07be-py2-py-parameters: **Parameters** **cens** : str, length 1 Indicates whether the data is censored or non-censored. :math:`\mathrm{cens} = \texttt{'N'}` Each observation is assumed to be exactly specified. :math:`\mathrm{ic}` is not referenced. :math:`\mathrm{cens} = \texttt{'C'}` Each observation is censored according to the value contained in :math:`\mathrm{ic}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n`. **x** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{x}[\textit{i}-1]` contains the :math:`\textit{i}`\ th observation, :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **ic** : int, array-like, shape :math:`\left(:\right)` Note: the required length for this argument is determined as follows: if :math:`\mathrm{cens}=\texttt{'C'}`: :math:`n`; otherwise: :math:`1`. If :math:`\mathrm{cens} = \texttt{'C'}`, :math:`\mathrm{ic}[\textit{i}-1]` contains the censoring codes for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{ic}[i-1] = 0`, the :math:`i`\ th observation is exactly specified. If :math:`\mathrm{ic}[i-1] = 1`, the :math:`i`\ th observation is right-censored. If :math:`\mathrm{cens} = \texttt{'N'}`, :math:`\mathrm{ic}` is not referenced. **gamma** : float Indicates whether an initial estimate of :math:`\gamma` is provided. If :math:`\mathrm{gamma} > 0.0`, it is taken as the initial estimate of :math:`\gamma` and an initial estimate of :math:`\beta` is calculated from this value of :math:`\gamma`. If :math:`\mathrm{gamma}\leq 0.0`, initial estimates of :math:`\gamma` and :math:`\beta` are calculated, internally, providing the data contains at least two distinct exact observations. (If there are only two distinct exact observations, the largest observation must not be exactly specified.) See `Further Comments <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bef.html#fcomments>`__ for further details. **tol** : float The relative precision required for the final estimates of :math:`\beta` and :math:`\gamma`. Convergence is assumed when the absolute relative changes in the estimates of both :math:`\beta` and :math:`\gamma` are less than :math:`\mathrm{tol}`. If :math:`\mathrm{tol} = 0.0`, a relative precision of :math:`0.000005` is used. **maxit** : int The maximum number of iterations allowed. If :math:`\mathrm{maxit}\leq 0`, a value of :math:`25` is used. **Returns** **beta** : float The maximum likelihood estimate, :math:`\hat{\beta }`, of :math:`\beta`. **gamma** : float Contains the maximum likelihood estimate, :math:`\hat{\gamma }`, of :math:`\gamma`. **sebeta** : float An estimate of the standard error of :math:`\hat{\beta }`. **segam** : float An estimate of the standard error of :math:`\hat{\gamma }`. **corr** : float An estimate of the correlation between :math:`\hat{\beta }` and :math:`\hat{\gamma }`. **dev** : float The maximized kernel log-likelihood, :math:`L\left({\hat{\beta }}, {\hat{\gamma }}\right)`. **nit** : int The number of iterations performed. .. _g07be-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\text{machine precision} < \mathrm{tol}\leq 1.0` or :math:`\mathrm{tol} = 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{cens} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{cens} = \texttt{'N'}` or :math:`\texttt{'C'}`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 1`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0` or :math:`1`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{x}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{x}[i-1] > 0.0`. (`errno` :math:`3`) Unable to calculate initial values. (`errno` :math:`3`) On entry, there are no exactly specified observations. (`errno` :math:`4`) The chosen method has not converged in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. (`errno` :math:`5`) The process has diverged. (`errno` :math:`5`) Hessian matrix of the Newton--Raphson process is singular. (`errno` :math:`6`) Potential overflow detected. .. _g07be-py2-py-notes: **Notes** ``estim_weibull`` computes maximum likelihood estimates of the parameters of the Weibull distribution from exact or right-censored data. For :math:`n` realizations, :math:`y_i`, from a Weibull distribution a value :math:`x_i` is observed such that .. math:: x_i\leq y_i\text{.} There are two situations: (a) exactly specified observations, when :math:`x_i = y_i` (#) right-censored observations, known by a lower bound, when :math:`x_i < y_i`. The probability density function of the Weibull distribution, and hence the contribution of an exactly specified observation to the likelihood, is given by: .. math:: f\left({x;\lambda }, \gamma \right) = \lambda \gamma x^{{\gamma -1}}\mathrm{exp}\left(-\lambda x^{\gamma }\right)\text{, }\quad x > 0\text{, for }\lambda,\gamma > 0\text{;} while the survival function of the Weibull distribution, and hence the contribution of a right-censored observation to the likelihood, is given by: .. math:: S\left({x;\lambda }, \gamma \right) = \mathrm{exp}\left(-\lambda x^{\gamma }\right)\text{, }\quad x > 0\text{, for }\lambda,\gamma > 0\text{.} If :math:`d` of the :math:`n` observations are exactly specified and indicated by :math:`i \in D` and the remaining :math:`\left(n-d\right)` are right-censored, then the likelihood function, :math:`\text{Like }\left(\lambda, \gamma \right)` is given by .. math:: \textit{Like}\left(\lambda, \gamma \right)\propto {\left(\lambda \gamma \right)}^d\left(\prod_{{i \in D}}x_i^{{\gamma -1}}\right)\mathrm{exp}\left(-\lambda \sum_{{i = 1}}^nx_i^{\gamma }\right)\text{.} To avoid possible numerical instability a different parameterisation :math:`\beta,\gamma` is used, with :math:`\beta = \log\left(\lambda \right)`. The kernel log-likelihood function, :math:`L\left(\beta, \gamma \right)`, is then: .. math:: L\left(\beta, \gamma \right) = d\log\left(\gamma \right)+d\beta +\left(\gamma -1\right)\sum_{{i \in D}}\log\left(x_i\right)-e^{\beta }\sum_{{i = 1}}^nx_i^{\gamma }\text{.} If the derivatives :math:`\frac{{\partial L}}{{\partial \beta }}`, :math:`\frac{{\partial L}}{{\partial \gamma }}`, :math:`\frac{{\partial^2L}}{{{\partial \beta }^2}}`, :math:`\frac{{\partial^2L}}{{{\partial \beta }{\partial \gamma }}}` and :math:`\frac{{\partial^2L}}{{{\partial \gamma }^2}}` are denoted by :math:`L_1`, :math:`L_2`, :math:`L_{11}`, :math:`L_{12}` and :math:`L_{22}`, respectively, then the maximum likelihood estimates, :math:`\hat{\beta }` and :math:`\hat{\gamma }`, are the solution to the equations: .. math:: L_1\left({\hat{\beta }}, {\hat{\gamma }}\right) = 0 and .. math:: L_2\left({\hat{\beta }}, {\hat{\gamma }}\right) = 0 Estimates of the asymptotic standard errors of :math:`\hat{\beta }` and :math:`\hat{\gamma }` are given by: .. math:: \mathrm{se}\left(\hat{\beta }\right) = \sqrt{\frac{{-L_{22}}}{{L_{11}L_{22}-L_{12}^2}}}\text{, }\quad \mathrm{se}\left(\hat{\gamma }\right) = \sqrt{\frac{{-L_{11}}}{{L_{11}L_{22}-L_{12}^2}}}\text{.} An estimate of the correlation coefficient of :math:`\hat{\beta }` and :math:`\hat{\gamma }` is given by: .. math:: \frac{L_{12}}{{\sqrt{L_{12}L_{22}}}}\text{.} **Note:** if an estimate of the original parameter :math:`\lambda` is required, then .. math:: \hat{\lambda } = \mathrm{exp}\left(\hat{\beta }\right)\quad \text{ and }\quad \mathrm{se}\left(\hat{\lambda }\right) = \hat{\lambda }\mathrm{se}\left(\hat{\beta }\right)\text{.} The equations `(1) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bef.html#eqn1>`__ and `(2) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bef.html#eqn1>`__ are solved by the Newton--Raphson iterative method with adjustments made to ensure that :math:`\hat{\gamma } > 0.0`. .. _g07be-py2-py-references: **References** Gross, A J and Clark, V A, 1975, `Survival Distributions: Reliability Applications in the Biomedical Sciences`, Wiley Kalbfleisch, J D and Prentice, R L, 1980, `The Statistical Analysis of Failure Time Data`, Wiley See Also -------- :meth:`naginterfaces.library.examples.univar.estim_weibull_ex.main` """ raise NotImplementedError
[docs]def estim_genpareto(y, optopt): r""" ``estim_genpareto`` estimates parameter values for the generalized Pareto distribution by using either moments or maximum likelihood. .. _g07bf-py2-py-doc: For full information please refer to the NAG Library document for g07bf https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07bff.html .. _g07bf-py2-py-parameters: **Parameters** **y** : float, array-like, shape :math:`\left(n\right)` The :math:`n` observations :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, assumed to follow a generalized Pareto distribution. **optopt** : int Determines the method of estimation, set: :math:`\mathrm{optopt} = -2` For the method of probability-weighted moments. :math:`\mathrm{optopt} = -1` For the method of moments. :math:`\mathrm{optopt} = 1` For maximum likelihood with starting values given by the method of moments estimates. :math:`\mathrm{optopt} = 2` For maximum likelihood with starting values given by the method of probability-weighted moments. **Returns** **xi** : float The parameter estimate :math:`\hat{\xi }`. **beta** : float The parameter estimate :math:`\hat{\beta }`. **asvc** : float, ndarray, shape :math:`\left(4\right)` The variance-covariance of the asymptotic Normal distribution of :math:`\hat{\xi }` and :math:`\hat{\beta }`. :math:`\mathrm{asvc}[0]` contains the variance of :math:`\hat{\xi }`; :math:`\mathrm{asvc}[3]` contains the variance of :math:`\hat{\beta }`; :math:`\mathrm{asvc}[1]` and :math:`\mathrm{asvc}[2]` contain the covariance of :math:`\hat{\xi }` and :math:`\hat{\beta }`. **obsvc** : float, ndarray, shape :math:`\left(4\right)` If maximum likelihood estimates are requested, the observed variance-covariance of :math:`\hat{\xi }` and :math:`\hat{\beta }`. :math:`\mathrm{obsvc}[0]` contains the variance of :math:`\hat{\xi }`; :math:`\mathrm{obsvc}[3]` contains the variance of :math:`\hat{\beta }`; :math:`\mathrm{obsvc}[1]` and :math:`\mathrm{obsvc}[2]` contain the covariance of :math:`\hat{\xi }` and :math:`\hat{\beta }`. **ll** : float If maximum likelihood estimates are requested, :math:`\mathrm{ll}` contains the log-likelihood value :math:`L` at the end of the optimization; otherwise :math:`\mathrm{ll}` is set to :math:`{-1.0}`. .. _g07bf-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{y}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{y}[i-1]\geq 0.0` for all :math:`i`. (`errno` :math:`3`) On entry, :math:`\mathrm{optopt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{optopt} = -2`, :math:`-1`, :math:`1` or :math:`2`. (`errno` :math:`9`) The optimization of log-likelihood failed to converge; no maximum likelihood estimates are returned. Try using the other maximum likelihood option by resetting :math:`\mathrm{optopt}`. If this also fails, moments-based estimates can be returned by an appropriate setting of :math:`\mathrm{optopt}`. (`errno` :math:`10`) Variance of data in :math:`\mathrm{y}` is too low for method of moments optimization. (`errno` :math:`11`) The sum of :math:`\mathrm{y}` is zero within machine precision. **Warns** **NagAlgorithmicWarning** (`errno` :math:`6`) The asymptotic distribution is not available for the returned parameter estimates. (`errno` :math:`7`) The distribution of maximum likelihood estimates cannot be calculated for the returned parameter estimates because the Hessian matrix could not be inverted. (`errno` :math:`8`) The asymptotic distribution of parameter estimates is invalid and the distribution of maximum likelihood estimates cannot be calculated for the returned parameter estimates because the Hessian matrix could not be inverted. .. _g07bf-py2-py-notes: **Notes** Let the distribution function of a set of :math:`n` observations .. math:: y_i\text{, }\quad i = 1,2,\ldots,n be given by the generalized Pareto distribution: .. math:: F\left(y\right) = \left\{\begin{array}{cc} 1 - \left(1+\frac{{\xi y}}{\beta }\right)^{{-1/\xi }} \text{,} &\xi \neq 0\\ 1-e^{{-\frac{y}{\beta }}} \text{,} &\xi = 0\text{;}\end{array}\right. where :math:`\beta > 0` and :math:`y\geq 0`, when :math:`\xi \geq 0`; :math:`0\leq y\leq -\frac{\beta }{\xi }`, when :math:`\xi < 0`. Estimates :math:`\hat{\xi }` and :math:`\hat{\beta }` of the parameters :math:`\xi` and :math:`\beta` are calculated by using one of: method of moments (MOM); probability-weighted moments (PWM); maximum likelihood estimates (MLE) that seek to maximize the log-likelihood: .. math:: L = {-n}\mathrm{ln}\left({\hat{\beta }-\left(1+\frac{1}{\hat{\xi }}\right)}\right)\sum_{1}^{n}{\mathrm{ln}\left(1+\frac{{\hat{\xi }y_i}}{\hat{\beta }}\right)}\text{.} The variances and covariance of the asymptotic Normal distribution of parameter estimates :math:`\hat{\xi }` and :math:`\hat{\beta }` are returned if :math:`\hat{\xi }` satisfies: :math:`\hat{\xi } < \frac{1}{4}` for the MOM; :math:`\hat{\xi } < \frac{1}{2}` for the PWM method; :math:`\hat{\xi } < -\frac{1}{2}` for the MLE method. If the MLE option is exercised, the observed variances and covariance of :math:`\hat{\xi }` and :math:`\hat{\beta }` is returned, given by the negative inverse Hessian of :math:`L`. .. _g07bf-py2-py-references: **References** Hosking, J R M and Wallis, J R, 1987, `Parameter and quantile estimation for the generalized Pareto distribution`, Technometrics (29(3)) McNeil, A J, Frey, R and Embrechts, P, 2005, `Quantitative Risk Management`, Princeton University Press """ raise NotImplementedError
[docs]def ttest_2normal(tail, equal, nx, ny, xmean, ymean, xstd, ystd, clevel): r""" ``ttest_2normal`` computes a :math:`t`-test statistic to test for a difference in means between two Normal populations, together with a confidence interval for the difference between the means. .. _g07ca-py2-py-doc: For full information please refer to the NAG Library document for g07ca https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07caf.html .. _g07ca-py2-py-parameters: **Parameters** **tail** : str, length 1 Indicates which tail probability is to be calculated, and thus which alternative hypothesis is to be used. :math:`\mathrm{tail} = \texttt{'T'}` The two tail probability, i.e., :math:`H_1:\mu_x\neq \mu_y`. :math:`\mathrm{tail} = \texttt{'U'}` The upper tail probability, i.e., :math:`H_1:\mu_x > \mu_y`. :math:`\mathrm{tail} = \texttt{'L'}` The lower tail probability, i.e., :math:`H_1:\mu_x < \mu_y`. **equal** : str, length 1 Indicates whether the population variances are assumed to be equal or not. :math:`\mathrm{equal} = \texttt{'E'}` The population variances are assumed to be equal, that is :math:`\sigma_x^2 = \sigma_y^2`. :math:`\mathrm{equal} = \texttt{'U'}` The population variances are not assumed to be equal. **nx** : int :math:`n_x`, the size of the :math:`X` sample. **ny** : int :math:`n_y`, the size of the :math:`Y` sample. **xmean** : float :math:`\bar{x}`, the mean of the :math:`X` sample. **ymean** : float :math:`\bar{y}`, the mean of the :math:`Y` sample. **xstd** : float :math:`s_x`, the standard deviation of the :math:`X` sample. **ystd** : float :math:`s_y`, the standard deviation of the :math:`Y` sample. **clevel** : float The confidence level, :math:`1-\alpha`, for the specified tail. For example :math:`\mathrm{clevel} = 0.95` will give a :math:`95\%` confidence interval. **Returns** **t** : float Contains the test statistic, :math:`t_{\mathrm{obs}}` or :math:`t_{\mathrm{obs}}^{\prime }`. **df** : float Contains the degrees of freedom for the test statistic. **prob** : float Contains the significance level, that is the tail probability, :math:`p`, as defined by :math:`\mathrm{tail}`. **dl** : float Contains the lower confidence limit for :math:`\mu_x-\mu_y`. **du** : float Contains the upper confidence limit for :math:`\mu_x-\mu_y`. .. _g07ca-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{ystd} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ystd} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{xstd} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{xstd} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{ny} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ny} \geq 2`. (`errno` :math:`1`) On entry, :math:`\mathrm{nx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{nx} \geq 2`. (`errno` :math:`1`) On entry, :math:`\mathrm{equal} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{equal} = \texttt{'E'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{tail} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tail} = \texttt{'T'}`, :math:`\texttt{'U'}` or :math:`\texttt{'L'}`. .. _g07ca-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` Consider two independent samples, denoted by :math:`X` and :math:`Y`, of size :math:`n_x` and :math:`n_y` drawn from two Normal populations with means :math:`\mu_x` and :math:`\mu_y`, and variances :math:`\sigma_x^2` and :math:`\sigma_y^2` respectively. Denote the sample means by :math:`\bar{x}` and :math:`\bar{y}` and the sample variances by :math:`s_x^2` and :math:`s_y^2` respectively. ``ttest_2normal`` calculates a test statistic and its significance level to test the null hypothesis :math:`H_0:\mu_x = \mu_y`, together with upper and lower confidence limits for :math:`\mu_x-\mu_y`. The test used depends on whether or not the two population variances are assumed to be equal. (1) It is assumed that the two variances are equal, that is :math:`\sigma_x^2 = \sigma_y^2`. The test used is the two sample :math:`t`-test. The test statistic :math:`t` is defined by; .. math:: t_{\mathrm{obs}} = \frac{{\bar{x}-\bar{y}}}{{s\sqrt{\left(1/n_x\right)+\left(1/n_y\right)}}} where .. math:: s^2 = \frac{{\left(n_x-1\right)s_x^2+\left(n_y-1\right)s_y^2}}{{n_x+n_y-2}} is the pooled variance of the two samples. Under the null hypothesis :math:`H_0` this test statistic has a :math:`t`-distribution with :math:`\left(n_x+n_y-2\right)` degrees of freedom. The test of :math:`H_0` is carried out against one of three possible alternatives; :math:`H_1:\mu_x\neq \mu_y`; the significance level, :math:`p = P\left(t\geq \left\lvert t_{\mathrm{obs}}\right\rvert \right)`, i.e., a two tailed probability. :math:`H_1:\mu_x > \mu_y`; the significance level, :math:`p = P\left(t\geq t_{\mathrm{obs}}\right)`, i.e., an upper tail probability. :math:`H_1:\mu_x < \mu_y`; the significance level, :math:`p = P\left(t\leq t_{\mathrm{obs}}\right)`, i.e., a lower tail probability. Upper and lower :math:`100\left(1-\alpha \right)\%` confidence limits for :math:`\mu_x-\mu_y` are calculated as: .. math:: \left(\bar{x}-\bar{y}\right)\pm t_{{1-\alpha /2}}s\sqrt{\left(1/n_x\right)+\left(1/n_y\right)}\text{.} where :math:`t_{{1-\alpha /2}}` is the :math:`100\left(1-\alpha /2\right)` percentage point of the :math:`t`-distribution with (:math:`n_x+n_y-2`) degrees of freedom. (#) It is not assumed that the two variances are equal. If the population variances are not equal the usual two sample :math:`t`-statistic no longer has a :math:`t`-distribution and an approximate test is used. This problem is often referred to as the Behrens--Fisher problem, see Kendall and Stuart (1969). The test used here is based on Satterthwaites procedure. To test the null hypothesis the test statistic :math:`t^{\prime }` is used where .. math:: t_{\mathrm{obs}}^{\prime } = \frac{{\bar{x}-\bar{y}}}{{\mathrm{se}\left(\bar{x}-\bar{y}\right)}} where :math:`\mathrm{se}\left(\bar{x}-\bar{y}\right) = \sqrt{\frac{{s_x^2}}{n_x}+\frac{{s_y^2}}{n_y}}`. A :math:`t`-distribution with :math:`f` degrees of freedom is used to approximate the distribution of :math:`t^{\prime }` where .. math:: f = \frac{{\mathrm{se}\left(\left(\bar{x}-\bar{y}\right)^4\right)}}{{\frac{\left(s_x^2/n_x\right)^2}{\left(n_x-1\right)}+\frac{\left(s_y^2/n_y\right)^2}{\left(n_y-1\right)}}}\text{.} The test of :math:`H_0` is carried out against one of the three alternative hypotheses described above, replacing :math:`t` by :math:`t^{\prime }` and :math:`t_{\mathrm{obs}}` by :math:`t_{\mathrm{obs}}^{\prime }`. Upper and lower :math:`100\left(1-\alpha \right)\%` confidence limits for :math:`\mu_x-\mu_y` are calculated as: .. math:: \left(\bar{x}-\bar{y}\right)\pm t_{{1-\alpha /2}}\mathrm{se}\left(x-\bar{y}\right)\text{.} where :math:`t_{{1-\alpha /2}}` is the :math:`100\left(1-\alpha /2\right)` percentage point of the :math:`t`-distribution with :math:`f` degrees of freedom. .. _g07ca-py2-py-references: **References** Johnson, M G and Kotz, A, 1969, `The Encyclopedia of Statistics` (2), Griffin Kendall, M G and Stuart, A, 1969, `The Advanced Theory of Statistics (Volume 1)`, (3rd Edition), Griffin Snedecor, G W and Cochran, W G, 1967, `Statistical Methods`, Iowa State University Press """ raise NotImplementedError
[docs]def robust_1var_median(x): r""" ``robust_1var_median`` finds the median, median absolute deviation, and a robust estimate of the standard deviation for a set of ungrouped data. .. _g07da-py2-py-doc: For full information please refer to the NAG Library document for g07da https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07daf.html .. _g07da-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` The vector of observations, :math:`x_1,x_2,\ldots,x_n`. **Returns** **y** : float, ndarray, shape :math:`\left(n\right)` The observations sorted into ascending order. **xme** : float The median, :math:`\theta_{\mathrm{med}}`. **xmd** : float The median absolute deviation, :math:`\sigma_{\mathrm{med}}`. **xsd** : float The robust estimate of the standard deviation, :math:`\sigma_{\mathrm{med}}^{\prime }`. .. _g07da-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. .. _g07da-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` The data consists of a sample of size :math:`n`, denoted by :math:`x_1,x_2,\ldots,x_n`, drawn from a random variable :math:`X`. ``robust_1var_median`` first computes the median, .. math:: \theta_{\mathrm{med}} = \mathrm{med}_i\left\{x_i\right\}\text{,} and from this the median absolute deviation can be computed, .. math:: \sigma_{\mathrm{med}} = \mathrm{med}_i\left\{\left\lvert x_i-\theta_{\mathrm{med}}\right\rvert \right\}\text{.} Finally, a robust estimate of the standard deviation is computed, .. math:: \sigma_{\mathrm{med}}^{\prime } = \sigma_{\mathrm{med}}/\Phi^{-1}\left(0.75\right) where :math:`\Phi^{-1}\left(0.75\right)` is the value of the inverse standard Normal function at the point :math:`0.75`. ``robust_1var_median`` is based upon function LTMDDV within the ROBETH library, see Marazzi (1987). .. _g07da-py2-py-references: **References** Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Subroutines for robust estimation of location and scale in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
[docs]def robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, maxit=50): r""" ``robust_1var_mestim`` computes an :math:`M`-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm. .. _g07db-py2-py-doc: For full information please refer to the NAG Library document for g07db https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07dbf.html .. _g07db-py2-py-parameters: **Parameters** **isigma** : int The value assigned to :math:`\mathrm{isigma}` determines whether :math:`\hat{\sigma }` is to be simultaneously estimated. :math:`\mathrm{isigma} = 0` The estimation of :math:`\hat{\sigma }` is bypassed and :math:`\mathrm{sigma}` is set equal to :math:`\sigma_c`. :math:`\mathrm{isigma} = 1` :math:`\hat{\sigma }` is estimated simultaneously. **x** : float, array-like, shape :math:`\left(n\right)` The vector of observations, :math:`x_1,x_2,\ldots,x_n`. **ipsi** : int Which :math:`\psi` function is to be used. :math:`\mathrm{ipsi} = 0` :math:`\psi \left(t\right) = t`. :math:`\mathrm{ipsi} = 1` Huber's function. :math:`\mathrm{ipsi} = 2` Hampel's piecewise linear function. :math:`\mathrm{ipsi} = 3` Andrew's sine wave, :math:`\mathrm{ipsi} = 4` Tukey's bi-weight. **c** : float If :math:`\mathrm{ipsi} = 1`, :math:`\mathrm{c}` must specify the parameter, :math:`c`, of Huber's :math:`\psi` function. :math:`\mathrm{c}` is not referenced if :math:`\mathrm{ipsi}\neq 1`. **h1** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` must specify the parameters, :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **h2** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` must specify the parameters, :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **h3** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` must specify the parameters, :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. :math:`\mathrm{h1}`, :math:`\mathrm{h2}` and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **dchi** : float :math:`d`, the parameter of the :math:`\chi` function. :math:`\mathrm{dchi}` is not referenced if :math:`\mathrm{ipsi} = 0`. **theta** : float If :math:`\mathrm{sigma} > 0` then :math:`\mathrm{theta}` must be set to the required starting value of the estimation of the location parameter :math:`\hat{\theta }`. A reasonable initial value for :math:`\hat{\theta }` will often be the sample mean or median. **sigma** : float The role of :math:`\mathrm{sigma}` depends on the value assigned to :math:`\mathrm{isigma}`, as follows: if :math:`\mathrm{isigma} = 1`, :math:`\mathrm{sigma}` must be assigned a value which determines the values of the starting points for the calculations of :math:`\hat{\theta }` and :math:`\hat{\sigma }`. If :math:`\mathrm{sigma}\leq 0.0` then ``robust_1var_mestim`` will determine the starting points of :math:`\hat{\theta }` and :math:`\hat{\sigma }`. Otherwise the value assigned to :math:`\mathrm{sigma}` will be taken as the starting point for :math:`\hat{\sigma }`, and :math:`\mathrm{theta}` must be assigned a value before entry, see above; if :math:`\mathrm{isigma} = 0`, :math:`\mathrm{sigma}` must be assigned a value which determines the value of :math:`\sigma_c`, which is held fixed during the iterations, and the starting value for the calculation of :math:`\hat{\theta }`. If :math:`\mathrm{sigma}\leq 0`, ``robust_1var_mestim`` will determine the value of :math:`\sigma_c` as the median absolute deviation adjusted to reduce bias (see :meth:`robust_1var_median`) and the starting point for :math:`\hat{\theta }`. Otherwise, the value assigned to :math:`\mathrm{sigma}` will be taken as the value of :math:`\sigma_c` and :math:`\mathrm{theta}` must be assigned a relevant value before entry, see above. **tol** : float The relative precision for the final estimates. Convergence is assumed when the increments for :math:`\mathrm{theta}`, and :math:`\mathrm{sigma}` are less than :math:`\mathrm{tol}\times \mathrm{max}\left(1.0, \sigma_{{k-1}}\right)`. **maxit** : int, optional The maximum number of iterations that should be used during the estimation. **Returns** **theta** : float The :math:`M`-estimate of the location parameter, :math:`\hat{\theta }`. **sigma** : float Contains the :math:`M`-estimate of the scale parameter, :math:`\hat{\sigma }`, if :math:`\mathrm{isigma}` was assigned the value :math:`1` on entry, otherwise :math:`\mathrm{sigma}` will contain the initial fixed value :math:`\sigma_c`. **rs** : float, ndarray, shape :math:`\left(n\right)` The Winsorized residuals. **nit** : int The number of iterations that were used during the estimation. **wrk** : float, ndarray, shape :math:`\left(n\right)` If :math:`\mathrm{sigma}\leq 0.0` on entry, :math:`\mathrm{wrk}` will contain the :math:`n` observations in ascending order. .. _g07db-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{ipsi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ipsi} = 0`, :math:`1`, :math:`2`, :math:`3` or :math:`4`. (`errno` :math:`1`) On entry, :math:`\mathrm{isigma} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isigma} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{dchi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{dchi} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{h1} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{h2} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{h3} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{h1}\leq \mathrm{h2}\leq \mathrm{h3}` and :math:`\mathrm{h3} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{c} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{c} > 0.0`. (`errno` :math:`3`) All elements of :math:`\mathrm{x}` are equal. (`errno` :math:`4`) Current estimate of :math:`\mathrm{sigma}` is zero or negative: :math:`\mathrm{sigma} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`5`) Number of iterations required exceeds :math:`\mathrm{maxit}`: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`6`) All winsorized residuals are zero. .. _g07db-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` The data consists of a sample of size :math:`n`, denoted by :math:`x_1,x_2,\ldots,x_n`, drawn from a random variable :math:`X`. The :math:`x_i` are assumed to be independent with an unknown distribution function of the form .. math:: F\left(\left(x_i-\theta \right)/\sigma \right) where :math:`\theta` is a location parameter, and :math:`\sigma` is a scale parameter. :math:`M`-estimators of :math:`\theta` and :math:`\sigma` are given by the solution to the following system of equations: .. math:: \sum_{{i = 1}}^n\psi \left(\left(x_i-\hat{\theta }\right)/\hat{\sigma }\right) = 0 .. math:: \sum_{{i = 1}}^n\chi \left(\left(x_i-\hat{\theta }\right)/\hat{\sigma }\right) = \left(n-1\right)\beta where :math:`\psi` and :math:`\chi` are given functions, and :math:`\beta` is a constant, such that :math:`\hat{\sigma }` is an unbiased estimator when :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n` has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for :math:`\hat{\theta }` using an assigned value of :math:`\sigma = \sigma_c`. The values of :math:`\psi \left(\frac{{x_i-\hat{\theta }}}{\hat{\sigma }}\right)\hat{\sigma }` are known as the Winsorized residuals. The following functions are available for :math:`\psi` and :math:`\chi` in ``robust_1var_mestim``: (a) **Null Weights** .. rst-class:: nag-rules-none nag-align-left +-------------------------------------------+ |:math:`\psi \left(t\right) = t` | +-------------------------------------------+ |:math:`\chi \left(t\right) = \frac{t^2}{2}`| +-------------------------------------------+ Use of these null functions leads to the mean and standard deviation of the data. (#) **Huber's Function** .. rst-class:: nag-rules-none nag-align-left +------------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi \left(t\right) = \mathrm{max}\left({-c}, \mathrm{min}\left(c, t\right)\right)`| | +------------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{\left\lvert t\right\rvert^2}{2}` |:math:`\left\lvert t\right\rvert \leq d`| +------------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{d^2}{2}` |:math:`\left\lvert t\right\rvert > d` | +------------------------------------------------------------------------------------------+----------------------------------------+ (#) **Hampel's Piecewise Linear Function** .. rst-class:: nag-rules-none nag-align-left +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi_{{h_1,h_2,h_3}}\left(t\right) = -\psi_{{h_1,h_2,h_3}}\left(-t\right)` | | +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi_{{h_1,h_2,h_3}}\left(t\right) = t` |:math:`0\leq t\leq h_1` | +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi_{{h_1,h_2,h_3}}\left(t\right) = h_1` |:math:`h_1\leq t\leq h_2` | +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi_{{h_1,h_2,h_3}}\left(t\right) = h_1\left(h_3-t\right)/\left(h_3-h_2\right)`|:math:`h_2\leq t\leq h_3` | +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\psi_{{h_1,h_2,h_3}}\left(t\right) = 0` |:math:`t > h_3` | +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{\left\lvert t\right\rvert^2}{2}` |:math:`\left\lvert t\right\rvert \leq d`| +---------------------------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{d^2}{2}` |:math:`\left\lvert t\right\rvert > d` | +---------------------------------------------------------------------------------------+----------------------------------------+ (#) **Andrew's Sine Wave Function** .. rst-class:: nag-rules-none nag-align-left +-------------------------------------------------------------------+----------------------------------------+ |:math:`\psi \left(t\right) = \sin\left(t\right)` |:math:`{-\pi }\leq t\leq \pi` | +-------------------------------------------------------------------+----------------------------------------+ |:math:`\psi \left(t\right) = 0` |otherwise | +-------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{\left\lvert t\right\rvert^2}{2}`|:math:`\left\lvert t\right\rvert \leq d`| +-------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{d^2}{2}` |:math:`\left\lvert t\right\rvert > d` | +-------------------------------------------------------------------+----------------------------------------+ (#) **Tukey's Bi-weight** .. rst-class:: nag-rules-none nag-align-left +-------------------------------------------------------------------+----------------------------------------+ |:math:`\psi \left(t\right) = t\left(1-t^2\right)^2` |:math:`\left\lvert t\right\rvert \leq 1`| +-------------------------------------------------------------------+----------------------------------------+ |:math:`\psi \left(t\right) = 0` |otherwise | +-------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{\left\lvert t\right\rvert^2}{2}`|:math:`\left\lvert t\right\rvert \leq d`| +-------------------------------------------------------------------+----------------------------------------+ |:math:`\chi \left(t\right) = \frac{d^2}{2}` |:math:`\left\lvert t\right\rvert > d` | +-------------------------------------------------------------------+----------------------------------------+ where :math:`c`, :math:`h_1`, :math:`h_2`, :math:`h_3` and :math:`d` are constants. Equations `(1) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07dbf.html#eqn1>`__ and `(2) <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07dbf.html#eqn1>`__ are solved by a simple iterative procedure suggested by Huber: .. math:: \hat{\sigma }_k = \sqrt{\frac{1}{{\beta \left(n-1\right)}}\left(\sum_{{i = 1}}^n\chi \left(\frac{{x_i-\hat{\theta }_{{k-1}}}}{\hat{\sigma }_{{k-1}}}\right)\right)\hat{\sigma }_{{k-1}}^2} and .. math:: \hat{\theta }_k = \hat{\theta }_{{k-1}}+\frac{1}{n}\sum_{{i = 1}}^n\psi \left(\frac{{x_i-\hat{\theta }_{{k-1}}}}{\hat{\sigma }_k}\right)\hat{\sigma }_k or .. math:: \hat{\sigma }_k = \sigma_c\text{, if }\quad \sigma \quad \text{ is fixed.} The initial values for :math:`\hat{\theta }` and :math:`\hat{\sigma }` may either be user-supplied or calculated within ``robust_1var_mestim`` as the sample median and an estimate of :math:`\sigma` based on the median absolute deviation respectively. ``robust_1var_mestim`` is based upon function LYHALG within the ROBETH library, see Marazzi (1987). .. _g07db-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Subroutines for robust estimation of location and scale in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
[docs]def robust_1var_mestim_wgt(chi, psi, isigma, x, beta, theta, sigma, tol, maxit=50, data=None): r""" ``robust_1var_mestim_wgt`` computes an :math:`M`-estimate of location with (optional) simultaneous estimation of scale, where you provide the weight functions. .. _g07dc-py2-py-doc: For full information please refer to the NAG Library document for g07dc https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07dcf.html .. _g07dc-py2-py-parameters: **Parameters** **chi** : callable retval = chi(t, data=None) :math:`\mathrm{chi}` must return the value of the weight function :math:`\chi` for a given value of its argument. The value of :math:`\chi` must be non-negative. **Parameters** **t** : float The argument for which :math:`\mathrm{chi}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of the weight function :math:`\phi` evaluated at :math:`\mathrm{t}`. **psi** : callable retval = psi(t, data=None) :math:`\mathrm{psi}` must return the value of the weight function :math:`\psi` for a given value of its argument. **Parameters** **t** : float The argument for which :math:`\mathrm{psi}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of the weight function :math:`\psi` evaluated at :math:`\mathrm{t}`. **isigma** : int The value assigned to :math:`\mathrm{isigma}` determines whether :math:`\hat{\sigma }` is to be simultaneously estimated. :math:`\mathrm{isigma} = 0` The estimation of :math:`\hat{\sigma }` is bypassed and :math:`\mathrm{sigma}` is set equal to :math:`\sigma_c`. :math:`\mathrm{isigma} = 1` :math:`\hat{\sigma }` is estimated simultaneously. **x** : float, array-like, shape :math:`\left(n\right)` The vector of observations, :math:`x_1,x_2,\ldots,x_n`. **beta** : float The value of the constant :math:`\beta` of the chosen :math:`\mathrm{chi}` function. **theta** : float If :math:`\mathrm{sigma} > 0`, :math:`\mathrm{theta}` must be set to the required starting value of the estimate of the location parameter :math:`\hat{\theta }`. A reasonable initial value for :math:`\hat{\theta }` will often be the sample mean or median. **sigma** : float The role of :math:`\mathrm{sigma}` depends on the value assigned to :math:`\mathrm{isigma}` as follows. If :math:`\mathrm{isigma} = 1`, :math:`\mathrm{sigma}` must be assigned a value which determines the values of the starting points for the calculation of :math:`\hat{\theta }` and :math:`\hat{\sigma }`. If :math:`\mathrm{sigma}\leq 0.0`, ``robust_1var_mestim_wgt`` will determine the starting points of :math:`\hat{\theta }` and :math:`\hat{\sigma }`. Otherwise, the value assigned to :math:`\mathrm{sigma}` will be taken as the starting point for :math:`\hat{\sigma }`, and :math:`\mathrm{theta}` must be assigned a relevant value before entry, see above. If :math:`\mathrm{isigma} = 0`, :math:`\mathrm{sigma}` must be assigned a value which determines the values of :math:`\sigma_c`, which is held fixed during the iterations, and the starting value for the calculation of :math:`\hat{\theta }`. If :math:`\mathrm{sigma}\leq 0`, ``robust_1var_mestim_wgt`` will determine the value of :math:`\sigma_c` as the median absolute deviation adjusted to reduce bias (see :meth:`robust_1var_median`) and the starting point for :math:`\theta`. Otherwise, the value assigned to :math:`\mathrm{sigma}` will be taken as the value of :math:`\sigma_c` and :math:`\mathrm{theta}` must be assigned a relevant value before entry, see above. **tol** : float The relative precision for the final estimates. Convergence is assumed when the increments for :math:`\mathrm{theta}`, and :math:`\mathrm{sigma}` are less than :math:`\mathrm{tol}\times \mathrm{max}\left(1.0, \sigma_{{k-1}}\right)`. **maxit** : int, optional The maximum number of iterations that should be used during the estimation. **data** : arbitrary, optional User-communication data for callback functions. **Returns** **theta** : float The :math:`M`-estimate of the location parameter :math:`\hat{\theta }`. **sigma** : float The :math:`M`-estimate of the scale parameter :math:`\hat{\sigma }`, if :math:`\mathrm{isigma}` was assigned the value :math:`1` on entry, otherwise :math:`\mathrm{sigma}` will contain the initial fixed value :math:`\sigma_c`. **rs** : float, ndarray, shape :math:`\left(n\right)` The Winsorized residuals. **nit** : int The number of iterations that were used during the estimation. **wrk** : float, ndarray, shape :math:`\left(n\right)` If :math:`\mathrm{sigma}\leq 0.0` on entry, :math:`\mathrm{wrk}` will contain the :math:`n` observations in ascending order. .. _g07dc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{isigma} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isigma} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{beta} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{beta} > 0.0`. (`errno` :math:`3`) All elements of :math:`\mathrm{x}` are equal. (`errno` :math:`4`) Current estimate of :math:`\mathrm{sigma}` is zero or negative: :math:`\mathrm{sigma} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`5`) Number of iterations required exceeds :math:`\mathrm{maxit}`: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`6`) All winsorized residuals are zero. (`errno` :math:`7`) The :math:`\mathrm{chi}` function returned a negative value: :math:`\mathrm{chi} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g07dc-py2-py-notes: **Notes** The data consists of a sample of size :math:`n`, denoted by :math:`x_1,x_2,\ldots,x_n`, drawn from a random variable :math:`X`. The :math:`x_i` are assumed to be independent with an unknown distribution function of the form, .. math:: F\left(\left(x_i-\theta \right)/\sigma \right) where :math:`\theta` is a location parameter, and :math:`\sigma` is a scale parameter. :math:`M`-estimators of :math:`\theta` and :math:`\sigma` are given by the solution to the following system of equations; .. math:: \begin{array}{lcl}\sum_{{i = 1}}^n\psi \left(\left(x_i-\hat{\theta }\right)/\hat{\sigma }\right)& = &0\\&&\\&&\\\sum_{{i = 1}}^n\chi \left(\left(x_i-\hat{\theta }\right)/\hat{\sigma }\right)& = &\left(n-1\right)\beta \end{array} where :math:`\psi` and :math:`\chi` are user-supplied weight functions, and :math:`\beta` is a constant. Optionally the second equation can be omitted and the first equation is solved for :math:`\hat{\theta }` using an assigned value of :math:`\sigma = \sigma_c`. The constant :math:`\beta` should be chosen so that :math:`\hat{\sigma }` is an unbiased estimator when :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n` has a Normal distribution. To achieve this the value of :math:`\beta` is calculated as: .. math:: \beta = E\left(\chi \right) = \int_{{-\infty }}^{\infty }\chi \left(z\right)\frac{1}{\sqrt{2\pi }}\mathrm{exp}\left\{\frac{{-z^2}}{2}\right\}{dz} The values of :math:`\psi \left(\frac{{x_i-\hat{\theta }}}{\hat{\sigma }}\right)\hat{\sigma }` are known as the Winsorized residuals. The equations are solved by a simple iterative procedure, suggested by Huber: .. math:: \hat{\sigma }_k = \sqrt{\frac{1}{{\beta \left(n-1\right)}}\left(\sum_{{i = 1}}^n\chi \left(\frac{{x_i-\hat{\theta }_{{k-1}}}}{\hat{\sigma }_{{k-1}}}\right)\right)\hat{\sigma }_{{k-1}}^2} and .. math:: \hat{\theta }_k = \hat{\theta }_{{k-1}}+\frac{1}{n}\sum_{{i = 1}}^n\psi \left(\frac{{x_i-\hat{\theta }_{{k-1}}}}{\hat{\sigma }_k}\right)\hat{\sigma }_k or .. math:: \hat{\sigma }_k = \sigma_c if :math:`\sigma` is fixed. The initial values for :math:`\hat{\theta }` and :math:`\hat{\sigma }` may be user-supplied or calculated within :meth:`robust_1var_mestim` as the sample median and an estimate of :math:`\sigma` based on the median absolute deviation respectively. ``robust_1var_mestim_wgt`` is based upon function LYHALG within the ROBETH library, see Marazzi (1987). .. _g07dc-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Subroutines for robust estimation of location and scale in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
[docs]def robust_1var_trimmed(x, alpha): r""" ``robust_1var_trimmed`` calculates the trimmed and Winsorized means of a sample and estimates of the variances of the two means. .. _g07dd-py2-py-doc: For full information please refer to the NAG Library document for g07dd https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07ddf.html .. _g07dd-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` The sample observations, :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **alpha** : float :math:`\alpha`, the proportion of observations to be trimmed at each end of the sorted sample. **Returns** **tmean** : float The :math:`\alpha`-trimmed mean, :math:`\bar{x}_t`. **wmean** : float The :math:`\alpha`-Winsorized mean, :math:`\bar{x}_w`. **tvar** : float Contains an estimate of the variance of the trimmed mean. **wvar** : float Contains an estimate of the variance of the Winsorized mean. **k** : int Contains the number of observations trimmed at each end, :math:`k`. **sx** : float, ndarray, shape :math:`\left(n\right)` Contains the sample observations sorted into ascending order. .. _g07dd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \leq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0\leq \mathrm{alpha} < 0.5`. .. _g07dd-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``robust_1var_trimmed`` calculates the :math:`\alpha`-trimmed mean and :math:`\alpha`-Winsorized mean for a given :math:`\alpha`, as described below. Let :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n` represent the :math:`n` sample observations sorted into ascending order. Let :math:`k = \left[\alpha n\right]` where :math:`\left[y\right]` represents the integer nearest to :math:`y`; if :math:`2k = n` then :math:`k` is reduced by :math:`1`. Then the trimmed mean is defined as: .. math:: \bar{x}_t = \frac{1}{{n-2k}}\sum_{{i = k+1}}^{{n-k}}x_i\text{,} and the Winsorized mean is defined as: .. math:: \bar{x}_w = \frac{1}{n}\left(\sum_{{i = k+1}}^{{n-k}}x_i+kx_{{k+1}}+kx_{{n-k}}\right)\text{.} ``robust_1var_trimmed`` then calculates the Winsorized variance about the trimmed and Winsorized means respectively and divides by :math:`n` to obtain estimates of the variances of the above two means. Thus we have; .. math:: \text{Estimate of }\mathrm{var}\left(\bar{x}_t\right) = \frac{1}{n^2}\left(\sum_{{i = k+1}}^{{n-k}}\left(x_i-\bar{x}_t\right)^2+k\left(x_{{k+1}}-\bar{x}_t\right)^2+k\left(x_{{n-k}}-\bar{x}_t\right)^2\right) and .. math:: \text{Estimate of }\mathrm{var}\left(\bar{x}_w\right) = \frac{1}{n^2}\left(\sum_{{i = k+1}}^{{n-k}}\left(x_i-\bar{x}_w\right)^2+k\left(x_{{k+1}}-\bar{x}_w\right)^2+k\left(x_{{n-k}}-\bar{x}_w\right)^2\right)\text{.} .. _g07dd-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley """ raise NotImplementedError
[docs]def robust_1var_ci(method, x, clevel): r""" ``robust_1var_ci`` computes a rank based (nonparametric) estimate and confidence interval for the location parameter of a single population. .. _g07ea-py2-py-doc: For full information please refer to the NAG Library document for g07ea https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07eaf.html .. _g07ea-py2-py-parameters: **Parameters** **method** : str, length 1 Specifies the method to be used. :math:`\mathrm{method} = \texttt{'E'}` The exact algorithm is used. :math:`\mathrm{method} = \texttt{'A'}` The iterative algorithm is used. **x** : float, array-like, shape :math:`\left(n\right)` The sample observations, :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **clevel** : float The confidence interval desired. For example, for a :math:`95\%` confidence interval set :math:`\mathrm{clevel} = 0.95`. **Returns** **theta** : float The estimate of the location, :math:`\hat{\theta }`. **thetal** : float The estimate of the lower limit of the confidence interval, :math:`\theta_l`. **thetau** : float The estimate of the upper limit of the confidence interval, :math:`\theta_u`. **estcl** : float An estimate of the actual percentage confidence of the interval found, as a proportion between :math:`\left(0.0, 1.0\right)`. **wlower** : float The upper value of the Wilcoxon test statistic, :math:`W_u`, corresponding to the lower limit of the confidence interval. **wupper** : float The lower value of the Wilcoxon test statistic, :math:`W_l`, corresponding to the upper limit of the confidence interval. .. _g07ea-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`\mathrm{method} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{method} = \texttt{'E'}` or :math:`\texttt{'A'}`. (`errno` :math:`2`) Not enough information to compute an interval estimate since the whole sample is identical. The common value is returned in :math:`\mathrm{theta}`, :math:`\mathrm{thetal}` and :math:`\mathrm{thetau}`. (`errno` :math:`3`) The iterative procedure used to estimate :math:`\theta` has not converged. (`errno` :math:`3`) The iterative procedure used to estimate, :math:`\theta_u`, the upper confidence limit has not converged. (`errno` :math:`3`) The iterative procedure used to estimate, :math:`\theta_l`, the lower confidence limit has not converged. .. _g07ea-py2-py-notes: **Notes** Consider a vector of independent observations, :math:`x = \left(x_1,x_2,\ldots,x_n\right)^\mathrm{T}` with unknown common symmetric density :math:`f\left(x_i-\theta \right)`. ``robust_1var_ci`` computes the Hodges--Lehmann location estimator (see Lehmann (1975)) of the centre of symmetry :math:`\theta`, together with an associated confidence interval. The Hodges--Lehmann estimate is defined as .. math:: \hat{\theta } = \mathrm{median}\left\{\frac{{x_i+x_j}}{2},1\leq i\leq j\leq n\right\}\text{.} Let :math:`m = \left(n\left(n+1\right)\right)/2` and let :math:`a_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,m` denote the :math:`m` ordered averages :math:`\left(x_i+x_j\right)/2` for :math:`1\leq i\leq j\leq n`. Then if :math:`m` is odd, :math:`\hat{\theta } = a_k` where :math:`k = \left(m+1\right)/2`; if :math:`m` is even, :math:`\hat{\theta } = \left(a_k+a_{{k+1}}\right)/2` where :math:`k = m/2`. This estimator arises from inverting the one-sample Wilcoxon signed-rank test statistic, :math:`W\left(x-\theta_0\right)`, for testing the hypothesis that :math:`\theta = \theta_0`. Effectively :math:`W\left(x-\theta_0\right)` is a monotonically decreasing step function of :math:`\theta_0` with .. math:: \begin{array}{c}\text{mean }\left(W\right) = \mu = \frac{{n\left(n+1\right)}}{4}\text{,}\\\\\\\mathrm{var}\left(W\right) = \sigma^2 = \frac{{n\left(n+1\right)\left(2n+1\right)}}{24}\text{.}\end{array} The estimate :math:`\hat{\theta }` is the solution to the equation :math:`W\left(x-\hat{\theta }\right) = \mu`; two methods are available for solving this equation. These methods avoid the computation of all the ordered averages :math:`a_k`; this is because for large :math:`n` both the storage requirements and the computation time would be excessive. The first is an exact method based on a set partitioning procedure on the set of all ordered averages :math:`\left(x_i+x_j\right)/2` for :math:`i\leq j`. This is based on the algorithm proposed by Monahan (1984). The second is an iterative algorithm, based on the Illinois method which is a modification of the `regula falsi` method, see McKean and Ryan (1977). This algorithm has proved suitable for the function :math:`W\left(x-\theta_0\right)` which is asymptotically linear as a function of :math:`\theta_0`. The confidence interval limits are also based on the inversion of the Wilcoxon test statistic. Given a desired percentage for the confidence interval, :math:`1-\alpha`, expressed as a proportion between :math:`0` and :math:`1`, initial estimates for the lower and upper confidence limits of the Wilcoxon statistic are found from .. math:: W_l = \mu -0.5+\left(\sigma \Phi^{-1}\left(\alpha /2\right)\right) and .. math:: W_u = \mu +0.5+\left(\sigma \Phi^{-1}\left(1-\alpha /2\right)\right)\text{,} where :math:`\Phi^{-1}` is the inverse cumulative Normal distribution function. :math:`W_l` and :math:`W_u` are rounded to the nearest integer values. These estimates are then refined using an exact method if :math:`n\leq 80`, and a Normal approximation otherwise, to find :math:`W_l` and :math:`W_u` satisfying .. math:: \begin{array}{l}P\left(W\leq W_l\right)\leq \alpha /2\\P\left(W\leq W_l+1\right) > \alpha /2\end{array} and .. math:: \begin{array}{l}P\left(W\geq W_u\right)\leq \alpha /2\\P\left(W\geq W_u-1\right) > \alpha /2\text{.}\end{array} Let :math:`W_u = m-k`; then :math:`\theta_l = a_{{k+1}}`. This is the largest value :math:`\theta_l` such that :math:`W\left(x-\theta_l\right) = W_u`. Let :math:`W_l = k`; then :math:`\theta_u = a_{{m-k}}`. This is the smallest value :math:`\theta_u` such that :math:`W\left(x-\theta_u\right) = W_l`. As in the case of :math:`\hat{\theta }`, these equations may be solved using either the exact or the iterative methods to find the values :math:`\theta_l` and :math:`\theta_u`. Then :math:`\left(\theta_l, \theta_u\right)` is the confidence interval for :math:`\theta`. The confidence interval is thus defined by those values of :math:`\theta_0` such that the null hypothesis, :math:`\theta = \theta_0`, is not rejected by the Wilcoxon signed-rank test at the :math:`\left(100\times \alpha \right)\%` level. .. _g07ea-py2-py-references: **References** Lehmann, E L, 1975, `Nonparametrics: Statistical Methods Based on Ranks`, Holden--Day Marazzi, A, 1987, `Subroutines for robust estimation of location and scale in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Médecine Sociale et Préventive, Lausanne McKean, J W and Ryan, T A, 1977, `Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem`, ACM Trans. Math. Software (10), 183--185 Monahan, J F, 1984, `Algorithm 616: Fast computation of the Hodges--Lehman location estimator`, ACM Trans. Math. Software (10), 265--270 """ raise NotImplementedError
[docs]def robust_2var_ci(method, x, y, clevel): r""" ``robust_2var_ci`` calculates a rank based (nonparametric) estimate and confidence interval for the difference in location between two independent populations. .. _g07eb-py2-py-doc: For full information please refer to the NAG Library document for g07eb https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07ebf.html .. _g07eb-py2-py-parameters: **Parameters** **method** : str, length 1 Specifies the method to be used. :math:`\mathrm{method} = \texttt{'E'}` The exact algorithm is used. :math:`\mathrm{method} = \texttt{'A'}` The iterative algorithm is used. **x** : float, array-like, shape :math:`\left(n\right)` The observations of the first sample, :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(m\right)` The observations of the second sample, :math:`y_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. **clevel** : float The confidence interval required, :math:`1-\alpha`; e.g., for a :math:`95\%` confidence interval set :math:`\mathrm{clevel} = 0.95`. **Returns** **theta** : float The estimate of the difference in the location of the two populations, :math:`\hat{\theta }`. **thetal** : float The estimate of the lower limit of the confidence interval, :math:`\theta_l`. **thetau** : float The estimate of the upper limit of the confidence interval, :math:`\theta_u`. **estcl** : float An estimate of the actual percentage confidence of the interval found, as a proportion between :math:`\left(0.0, 1.0\right)`. **ulower** : float The value of the Mann--Whitney :math:`U` statistic corresponding to the lower confidence limit, :math:`U_l`. **uupper** : float The value of the Mann--Whitney :math:`U` statistic corresponding to the upper confidence limit, :math:`U_u`. .. _g07eb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m\geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{method} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{method} = \texttt{'E'}` or :math:`\texttt{'A'}`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`2`) Not enough information to compute an interval estimate since each sample has identical values. The common difference is returned in :math:`\mathrm{theta}`, :math:`\mathrm{thetal}` and :math:`\mathrm{thetau}`. (`errno` :math:`3`) The iterative procedure used to estimate :math:`\theta` has not converged. (`errno` :math:`3`) The iterative procedure used to estimate, :math:`\theta_u`, the upper confidence limit has not converged. (`errno` :math:`3`) The iterative procedure used to estimate, :math:`\theta_l`, the lower confidence limit has not converged. .. _g07eb-py2-py-notes: **Notes** Consider two random samples from two populations which have the same continuous distribution except for a shift in the location. Let the random sample, :math:`x = \left(x_1,x_2,\ldots,x_n\right)^\mathrm{T}`, have distribution :math:`F\left(x\right)` and the random sample, :math:`y = \left(y_1,y_2,\ldots,y_m\right)^\mathrm{T}`, have distribution :math:`F\left(x-\theta \right)`. ``robust_2var_ci`` finds a point estimate, :math:`\hat{\theta }`, of the difference in location :math:`\theta` together with an associated confidence interval. The estimates are based on the ordered differences :math:`y_j-x_i`. The estimate :math:`\hat{\theta }` is defined by .. math:: \hat{\theta } = \mathrm{median}\left\{y_j-x_i\text{, }\quad i = 1,2,\ldots,n\text{;}j = 1,2,\ldots,m\right\}\text{.} Let :math:`d_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,nm`, denote the :math:`nm` (ascendingly) ordered differences :math:`y_{\textit{j}}-x_{\textit{i}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. Then if :math:`nm` is odd, :math:`\hat{\theta } = d_k` where :math:`k = \left(nm-1\right)/2`; if :math:`nm` is even, :math:`\hat{\theta } = \left(d_k+d_{{k+1}}\right)/2` where :math:`k = nm/2`. This estimator arises from inverting the two sample Mann--Whitney rank test statistic, :math:`U\left(\theta_0\right)`, for testing the hypothesis that :math:`\theta = \theta_0`. Thus :math:`U\left(\theta_0\right)` is the value of the Mann--Whitney :math:`U` statistic for the two independent samples :math:`\left\{\left(x_i+\theta_0\right)\text{, for }i = 1,2,\ldots,n\right\}` and :math:`\left\{y_j\text{, for }j = 1,2,\ldots,m\right\}`. Effectively :math:`U\left(\theta_0\right)` is a monotonically increasing step function of :math:`\theta_0` with .. math:: \begin{array}{c}\text{mean }\left(U\right) = \mu = \frac{{nm}}{2}\text{,}\\\\\mathrm{var}\left(U\right) = \sigma^2\frac{{nm\left(n+m+1\right)}}{12}\text{.}\end{array} The estimate :math:`\hat{\theta }` is the solution to the equation :math:`U\left(\hat{\theta }\right) = \mu`; two methods are available for solving this equation. These methods avoid the computation of all the ordered differences :math:`d_k`; this is because for large :math:`n` and :math:`m` both the storage requirements and the computation time would be high. The first is an exact method based on a set partitioning procedure on the set of all differences :math:`y_{\textit{j}}-x_{\textit{i}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. This is adapted from the algorithm proposed by Monahan (1984) for the computation of the Hodges--Lehmann estimator for a single population. The second is an iterative algorithm, based on the Illinois method which is a modification of the `regula falsi` method, see McKean and Ryan (1977). This algorithm has proved suitable for the function :math:`U\left(\theta_0\right)` which is asymptotically linear as a function of :math:`\theta_0`. The confidence interval limits are also based on the inversion of the Mann--Whitney test statistic. Given a desired percentage for the confidence interval, :math:`1-\alpha`, expressed as a proportion between :math:`0.0` and :math:`1.0` initial estimates of the upper and lower confidence limits for the Mann--Whitney :math:`U` statistic are found; .. math:: \begin{array}{c}U_l = \mu -0.5+\left(\sigma \times \Phi^{-1}\left(\alpha /2\right)\right)\\\\U_u = \mu +0.5+\left(\sigma \times \Phi^{-1}\left(\left(1-\alpha \right)/2\right)\right)\end{array} where :math:`\Phi^{-1}` is the inverse cumulative Normal distribution function. :math:`U_l` and :math:`U_u` are rounded to the nearest integer values. These estimates are refined using an exact method, without taking ties into account, if :math:`n+m\leq 40` and :math:`\mathrm{max}\left(n, m\right)\leq 30` and a Normal approximation otherwise, to find :math:`U_l` and :math:`U_u` satisfying .. math:: \begin{array}{l}P\left(U\leq U_l\right)\leq \alpha /2\\P\left(U\leq U_l+1\right) > \alpha /2\end{array} and .. math:: \begin{array}{l}P\left(U\geq U_u\right)\leq \alpha /2\\P\left(U\geq U_u-1\right) > \alpha /2\text{.}\end{array} The function :math:`U\left(\theta_0\right)` is a monotonically increasing step function. It is the number of times a score in the second sample, :math:`y_j`, precedes a score in the first sample, :math:`x_i+\theta`, where we only count a half if a score in the second sample actually equals a score in the first. Let :math:`U_l = k`; then :math:`\theta_l = d_{{k+1}}`. This is the largest value :math:`\theta_l` such that :math:`U\left(\theta_l\right) = U_l`. Let :math:`U_u = nm-k`; then :math:`\theta_u = d_{{nm-k}}`. This is the smallest value :math:`\theta_u` such that :math:`U\left(\theta_u\right) = U_u`. As in the case of :math:`\hat{\theta }`, these equations may be solved using either the exact or iterative methods to find the values :math:`\theta_l` and :math:`\theta_u`. Then :math:`\left(\theta_l, \theta_u\right)` is the confidence interval for :math:`\theta`. The confidence interval is thus defined by those values of :math:`\theta_0` such that the null hypothesis, :math:`\theta = \theta_0`, is not rejected by the Mann--Whitney two sample rank test at the :math:`\left(100\times \alpha \right)\%` level. .. _g07eb-py2-py-references: **References** Lehmann, E L, 1975, `Nonparametrics: Statistical Methods Based on Ranks`, Holden--Day McKean, J W and Ryan, T A, 1977, `Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem`, ACM Trans. Math. Software (10), 183--185 Monahan, J F, 1984, `Algorithm 616: Fast computation of the Hodges--Lehman location estimator`, ACM Trans. Math. Software (10), 265--270 """ raise NotImplementedError
[docs]def outlier_peirce_1var(p, y, ldiff, mean=0.0, var=0.0): r""" ``outlier_peirce_1var`` identifies outlying values using Peirce's criterion. .. _g07ga-py2-py-doc: For full information please refer to the NAG Library document for g07ga https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07gaf.html .. _g07ga-py2-py-parameters: **Parameters** **p** : int :math:`p`, the number of parameters in the model used in obtaining the :math:`y`. If :math:`y` is an observed set of values, as opposed to the residuals from fitting a model with :math:`p` parameters, then :math:`p` should be set to :math:`1`, i.e., as if a model just containing the mean had been used. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the data being tested. **ldiff** : int The maximum number of values to be returned in arrays :math:`\mathrm{diff}` and :math:`\mathrm{llamb}`. If :math:`\mathrm{ldiff}\leq 0`, arrays :math:`\mathrm{diff}` and :math:`\mathrm{llamb}` are not referenced. **mean** : float, optional If :math:`\mathrm{var} > 0.0`, :math:`\mathrm{mean}` must contain :math:`\mu`, the mean of :math:`y`, otherwise :math:`\mathrm{mean}` is not referenced and the mean is calculated from the data supplied in :math:`\mathrm{y}`. **var** : float, optional If :math:`\mathrm{var} > 0.0`, :math:`\mathrm{var}` must contain :math:`\sigma^2`, the variance of :math:`y`, otherwise the variance is calculated from the data supplied in :math:`\mathrm{y}`. **Returns** **iout** : int, ndarray, shape :math:`\left(n\right)` The indices of the values in :math:`\mathrm{y}` sorted in descending order of the absolute difference from the mean, therefore, :math:`\left\lvert \mathrm{y}[ \mathrm{iout}[\textit{i}-2] -1]-\mu \right\rvert \geq \left\lvert \mathrm{y}[ \mathrm{iout}[\textit{i}-1] -1]-\mu \right\rvert`, for :math:`\textit{i} = 2,3,\ldots,n`. **niout** : int The number of potential outliers. The indices for these potential outliers are held in the first :math:`\mathrm{niout}` elements of :math:`\mathrm{iout}`. By construction there can be at most :math:`n-\mathrm{p}-1` values flagged as outliers. **diff** : float, ndarray, shape :math:`\left(\mathrm{ldiff}\right)` :math:`\mathrm{diff}[\textit{i}-1]` holds :math:`\left\lvert y-\mu \right\rvert -\sigma^2z` for observation :math:`\mathrm{y}[\mathrm{iout}[\textit{i}-1]-1]`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{min}\left(\mathrm{ldiff}, {\mathrm{niout}+1}, {n-\mathrm{p}-1}\right)`. **llamb** : float, ndarray, shape :math:`\left(\mathrm{ldiff}\right)` :math:`\mathrm{llamb}[\textit{i}-1]` holds :math:`\log\left(\lambda^2\right)` for observation :math:`\mathrm{y}[\mathrm{iout}[\textit{i}-1]-1]`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{min}\left(\mathrm{ldiff}, {\mathrm{niout}+1}, {n-\mathrm{p}-1}\right)`. .. _g07ga-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 3`. (`errno` :math:`2`) On entry, :math:`\mathrm{p} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{p}\leq n-2`. .. _g07ga-py2-py-notes: **Notes** ``outlier_peirce_1var`` flags outlying values in data using Peirce's criterion. Let :math:`y` denote a vector of :math:`n` observations (for example the residuals) obtained from a model with :math:`p` parameters, :math:`m` denote the number of potential outlying values, :math:`\mu` and :math:`\sigma^2` denote the mean and variance of :math:`y` respectively, :math:`\tilde{y}` denote a vector of length :math:`n-m` constructed by dropping the :math:`m` values from :math:`y` with the largest value of :math:`\left\lvert y_i-\mu \right\rvert`, :math:`\tilde{\sigma }^2` denote the (unknown) variance of :math:`\tilde{y}`, :math:`\lambda` denote the ratio of :math:`\tilde{\sigma }` and :math:`\sigma` with :math:`\lambda = \frac{\tilde{\sigma }}{\sigma }`. Peirce's method flags :math:`y_i` as a potential outlier if :math:`\left\lvert y_i-\mu \right\rvert \geq x`, where :math:`x = \sigma^2z` and :math:`z` is obtained from the solution of .. math:: R^m = \lambda^{{m-n}}\frac{{m^m\left(n-m\right)^{{n-m}}}}{n^n} where .. math:: R = 2\mathrm{exp}\left(\left(\frac{{z^2-1}}{2}\right)\left(1-\Phi \left(z\right)\right)\right) and :math:`\Phi` is the cumulative distribution function for the standard Normal distribution. As :math:`\tilde{\sigma }^2` is unknown an assumption is made that the relationship between :math:`\tilde{\sigma }^2` and :math:`\sigma^2`, hence :math:`\lambda`, depends only on the sum of squares of the rejected observations and the ratio estimated as .. math:: \lambda^2 = \frac{{n-p-mz^2}}{{n-p-m}} which gives .. math:: z^2 = 1+\frac{{n-p-m}}{m}\left(1-\lambda^2\right) A value for the cutoff :math:`x` is calculated iteratively. An initial value of :math:`R = 0.2` is used and a value of :math:`\lambda` is estimated using equation `[equation] <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07gaf.html#eqna>`__. Equation `[equation] <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07gaf.html#eqnc>`__ is then used to obtain an estimate of :math:`z` and then equation `[equation] <https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07gaf.html#eqnd>`__ is used to get a new estimate for :math:`R`. This process is then repeated until the relative change in :math:`z` between consecutive iterations is :math:`\text{}\leq \sqrt{\epsilon }`, where :math:`\epsilon` is machine precision. By construction, the cutoff for testing for :math:`m+1` potential outliers is less than the cutoff for testing for :math:`m` potential outliers. Therefore, Peirce's criterion is used in sequence with the existence of a single potential outlier being investigated first. If one is found, the existence of two potential outliers is investigated etc. If one of a duplicate series of observations is flagged as an outlier, then all of them are flagged as outliers. .. _g07ga-py2-py-references: **References** Gould, B A, 1855, `On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application`, The Astronomical Journal (45) Peirce, B, 1852, `Criterion for the rejection of doubtful observations`, The Astronomical Journal (45) """ raise NotImplementedError
[docs]def outlier_peirce_2var(n, e, var1, var2): r""" ``outlier_peirce_2var`` returns a flag indicating whether a single data point is an outlier as defined by Peirce's criterion. .. _g07gb-py2-py-doc: For full information please refer to the NAG Library document for g07gb https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g07/g07gbf.html .. _g07gb-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of observations. **e** : float :math:`\tilde{e}`, the value being tested. **var1** : float :math:`\sigma^2`, the residual variance on fitting model :math:`M` to :math:`y`. **var2** : float :math:`\tilde{\sigma }^2`, the residual variance on fitting model :math:`M` to :math:`\tilde{y}`. **Returns** **outl** : bool :math:`\mathbf{True}` if the value being tested is an outlier. **x** : float An estimated value of :math:`x`, the cutoff that indicates an outlier. **lx** : float :math:`l`, the lower limit for :math:`x`. **ux** : float :math:`u`, the upper limit for :math:`x`. .. _g07gb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n}\geq 3`. (`errno` :math:`3`) On entry, :math:`\mathrm{var1} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{var1} > 0.0`. (`errno` :math:`4`) On entry, :math:`\mathrm{var2} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{var2} > 0.0`. (`errno` :math:`4`) On entry, :math:`\mathrm{var1} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{var2} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{var2} < \mathrm{var1}`. .. _g07gb-py2-py-notes: **Notes** ``outlier_peirce_2var`` tests a potential outlying value using Peirce's criterion. Let :math:`e` denote a vector of :math:`n` residuals with mean zero and variance :math:`\sigma^2` obtained from fitting some model :math:`M` to a series of data :math:`y`, :math:`\tilde{e}` denote the largest absolute residual in :math:`e`, i.e., :math:`\left\lvert \tilde{e}\right\rvert \geq \left\lvert e_i\right\rvert` for all :math:`i`, and let :math:`\tilde{y}` denote the data series :math:`y` with the observation corresponding to :math:`\tilde{e}` having been omitted, :math:`\tilde{\sigma }^2` denote the residual variance on fitting model :math:`M` to :math:`\tilde{y}`, :math:`\lambda` denote the ratio of :math:`\tilde{\sigma }` and :math:`\sigma` with :math:`\lambda = \frac{\tilde{\sigma }}{\sigma }`. Peirce's method flags :math:`\tilde{e}` as a potential outlier if :math:`\left\lvert \tilde{e}\right\rvert \geq x`, where :math:`x = \sigma^2z` and :math:`z` is obtained from the solution of .. math:: R = \lambda^{{1-n}}\frac{{\left(n-1\right)^{{n-1}}}}{n^n} where .. math:: R = 2\mathrm{exp}\left(\left(\frac{{z^2-1}}{2}\right)\left(1-\Phi \left(z\right)\right)\right) and :math:`\Phi` is the cumulative distribution function for the standard Normal distribution. Unlike :meth:`outlier_peirce_1var`, both :math:`\sigma^2` and :math:`\tilde{\sigma }^2` must be supplied and, therefore, no assumptions are made about the nature of the relationship between these two quantities. Only a single potential outlier is tested for at a time. This function uses an algorithm described in :meth:`opt.one_var_func <naginterfaces.library.opt.one_var_func>` to refine a lower, :math:`l`, and upper, :math:`u`, limit for :math:`x`. This refinement stops when :math:`\left\lvert \tilde{e}\right\rvert < l` or :math:`\left\lvert \tilde{e}\right\rvert > u`. .. _g07gb-py2-py-references: **References** Gould, B A, 1855, `On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application`, The Astronomical Journal (45) Peirce, B, 1852, `Criterion for the rejection of doubtful observations`, The Astronomical Journal (45) """ raise NotImplementedError