

NAG Toolbox Chapter Introduction

G07 — Univariate Estimation

Scope of the Chapter

This chapter deals with the estimation of unknown parameters of a univariate distribution. It includes both point and interval estimation using maximum likelihood and robust methods.

Background to the Problems

Statistical inference is concerned with making inferences about a population using the observed part of the population, called a sample. The population can usually be described using a probability model which will be written in terms of some unknown parameters. For example, the hours of relief given by a drug may be assumed to follow a Normal distribution with mean $\mu$ and variance $\sigma^2$; it is then required to make inferences about the parameters, $\mu$ and $\sigma^2$, on the basis of an observed sample of relief times.
There are two main aspects of statistical inference: the estimation of the parameters and the testing of hypotheses about the parameters. In the example above, the value of the parameter $\sigma^2$ may be estimated and the hypothesis that $\mu \ge 3$ tested. This chapter is mainly concerned with estimation but the test of a hypothesis about a parameter is often closely linked to its estimation. Tests of hypotheses which are not linked closely to estimation are given in the chapter on nonparametric statistics (Chapter G08).
There are two types of estimation to be considered in this chapter: point estimation and interval estimation. Point estimation is when a single value is obtained as the best estimate of the parameter. However, as this estimate will be based on only one of a large number of possible samples, it can be seen that if a different sample were taken, a different estimate would be obtained. The distribution of the estimate across all the possible samples is known as the sampling distribution. The sampling distribution contains information on the performance of the estimator, and enables estimators to be compared. For example, a good estimator would have a sampling distribution with mean equal to the true value of the parameter; that is, it should be an unbiased estimator; also the variance of the sampling distribution should be as small as possible. When considering a parameter estimate it is important to consider its variability as measured by its variance, or more often the square root of the variance, the standard error.
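The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an illustration under assumed values (Normal data with mean 10, standard deviation 2, samples of size 5); the helper names are hypothetical and are not NAG functions. It draws many samples, applies an estimator to each, and summarizes the resulting estimates: the sample mean appears unbiased with standard error close to $\sigma/\sqrt{n}$, while the variance estimate with divisor $n$ is biased downwards.

```python
import math
import random
import statistics

def sampling_distribution(estimator, mu, sigma, n, n_samples, seed=0):
    """Apply `estimator` to many simulated Normal samples of size n."""
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]

def variance_n_divisor(sample):
    """Sample variance with divisor n (a biased estimator of sigma^2)."""
    m = statistics.fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

# Assumed illustrative values, not taken from the text.
mu, sigma, n = 10.0, 2.0, 5
means = sampling_distribution(statistics.fmean, mu, sigma, n, 20000)
variances = sampling_distribution(variance_n_divisor, mu, sigma, n, 20000)

mean_of_means = statistics.fmean(means)         # close to mu (unbiased)
std_error = statistics.pstdev(means)            # close to sigma / sqrt(n)
mean_of_variances = statistics.fmean(variances) # close to (n - 1) / n * sigma^2

print(mean_of_means, std_error, sigma / math.sqrt(n))
print(mean_of_variances, (n - 1) / n * sigma ** 2)
```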
The sampling distribution can be used to find interval estimates or confidence intervals for the parameter. A confidence interval is an interval calculated from the sample so that its distribution, as given by the sampling distribution, is such that it contains the true value of the parameter with a certain probability.
Estimates will be functions of the observed sample and these functions are known as estimators. It is usually more convenient for the estimator to be based on statistics from the sample rather than on all the individual observations. If these statistics contain all the relevant information then they are known as sufficient statistics. There are several ways of obtaining the estimators; these include least squares, the method of moments, and maximum likelihood. Least squares estimation requires no knowledge of the distributional form of the error apart from its mean and variance matrix, whereas the method of maximum likelihood is mainly applicable to situations in which the true distribution is known apart from the values of a finite number of unknown parameters. Note that under the assumption of Normality, least squares estimation is equivalent to maximum likelihood estimation. Least squares is often used in regression analysis as described in Chapter G02, and maximum likelihood is described below.
Estimators derived from least squares or maximum likelihood will often be greatly affected by the presence of extreme or unusual observations. Estimators that are designed to be less affected are known as robust estimators.

Maximum Likelihood Estimation

Let $X_i$ be a univariate random variable with probability density function
$$f_{X_i}(x_i;\theta),$$
where $\theta$ is a vector of length $p$ consisting of the unknown parameters. For example, a Normal distribution with mean $\theta_1$ and standard deviation $\theta_2$ has probability density function
$$\frac{1}{\sqrt{2\pi}\,\theta_2}\exp\left(-\frac{1}{2}\left(\frac{x_i-\theta_1}{\theta_2}\right)^2\right).$$
The likelihood for a sample of $n$ independent observations is
$$\mathrm{Like}=\prod_{i=1}^{n} f_{X_i}(x_i;\theta),$$
where $x_i$ is the observed value of $X_i$. If each $X_i$ has an identical distribution, this reduces to
$$\mathrm{Like}=\prod_{i=1}^{n} f_{X}(x_i;\theta), \tag{1}$$
and the log-likelihood is
$$\log(\mathrm{Like})=L=\sum_{i=1}^{n}\log(f_X(x_i;\theta)). \tag{2}$$
The maximum likelihood estimates ($\hat\theta$) of $\theta$ are the values of $\theta$ that maximize (1) and (2). If the range of $X$ is independent of the parameters, then $\hat\theta$ can usually be found as the solution to
$$\sum_{i=1}^{n}\frac{\partial}{\partial\hat\theta_j}\log(f_X(x_i;\hat\theta))=\frac{\partial L}{\partial\hat\theta_j}=0,\quad j=1,2,\ldots,p. \tag{3}$$
Note that $\partial L/\partial\theta_j$ is known as the efficient score.
Maximum likelihood estimators possess several important properties.
(a) Maximum likelihood estimators are functions of the sufficient statistics.
(b) Maximum likelihood estimators are (under certain conditions) consistent. That is, the estimator converges in probability to the true value as the sample size increases. Note that for small samples the maximum likelihood estimator may be biased.
(c) For maximum likelihood estimators found as a solution to (3), subject to certain conditions, it follows that
$$E\left(\frac{\partial L}{\partial\theta}\right)=0, \tag{4}$$
and
$$I(\theta)=-E\left(\frac{\partial^2 L}{\partial\theta^2}\right)=E\left(\left(\frac{\partial L}{\partial\theta}\right)^2\right), \tag{5}$$
and then that $\hat\theta$ is asymptotically Normal with mean vector $\theta_0$ and variance-covariance matrix $I(\theta_0)^{-1}$, where $\theta_0$ denotes the true value of $\theta$. The matrix $I(\theta)$ is known as the information matrix and $I(\theta_0)^{-1}$ is known as the Cramér–Rao lower bound for the variance of an estimator of $\theta$.
For example, if we consider a sample, $x_1,x_2,\ldots,x_n$, of size $n$ drawn from a Normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$ then we have
$$L=\log(\mathrm{Like}(\mu,\sigma^2;x))=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}$$
and thus
$$\frac{\partial L}{\partial\mu}=\sum_{i=1}^{n}\frac{x_i-\mu}{\sigma^2}$$
and
$$\frac{\partial L}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^4}.$$
Then equating these two equations to zero and solving gives the maximum likelihood estimates
$$\hat\mu=\bar{x}$$
and
$$\hat\sigma^2=\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{n}.$$
These maximum likelihood estimates are asymptotically Normal with mean vector $a$, where
$$a^{T}=(\mu,\sigma^2),$$
and covariance matrix $C$. To obtain $C$ we find the second derivatives of $L$ with respect to $\mu$ and $\sigma^2$ as follows:
$$\frac{\partial^2 L}{\partial\mu^2}=-\frac{n}{\sigma^2},$$
$$\frac{\partial^2 L}{\partial(\sigma^2)^2}=\frac{n}{2\sigma^4}-\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{\sigma^6},$$
$$\frac{\partial^2 L}{\partial\mu\,\partial\sigma^2}=\frac{\partial^2 L}{\partial\sigma^2\,\partial\mu}=-\frac{n(\bar{x}-\mu)}{\sigma^4}.$$
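Then
$$C^{-1}=-E\begin{pmatrix}\dfrac{\partial^2 L}{\partial\mu^2} & \dfrac{\partial^2 L}{\partial\sigma^2\,\partial\mu}\\[2ex] \dfrac{\partial^2 L}{\partial\mu\,\partial\sigma^2} & \dfrac{\partial^2 L}{\partial(\sigma^2)^2}\end{pmatrix}=\begin{pmatrix} n/\sigma^2 & 0\\ 0 & n/2\sigma^4\end{pmatrix}$$
so that
$$C=\begin{pmatrix}\sigma^2/n & 0\\ 0 & 2\sigma^4/n\end{pmatrix}.$$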
To obtain an estimate of $C$ the matrix may be evaluated at the maximum likelihood estimates.
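As a small numerical illustration of the Normal example, the following sketch computes the maximum likelihood estimates of the mean and variance and the estimated covariance matrix with diagonal entries $\hat\sigma^2/n$ and $2\hat\sigma^4/n$, from which approximate standard errors follow. The function name `normal_mle` and the data are hypothetical and chosen only for illustration.

```python
import math

def normal_mle(x):
    """ML estimates of (mu, sigma^2) for a Normal sample and the
    estimated asymptotic covariance matrix C = diag(s2/n, 2*s2^2/n)."""
    n = len(x)
    mu_hat = sum(x) / n
    var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n  # divisor n, not n - 1
    cov = [[var_hat / n, 0.0],
           [0.0, 2.0 * var_hat ** 2 / n]]
    return mu_hat, var_hat, cov

# Hypothetical data.
mu_hat, var_hat, cov = normal_mle([1.0, 2.0, 3.0, 4.0, 5.0])
se_mu = math.sqrt(cov[0][0])    # approximate standard error of mu_hat
se_var = math.sqrt(cov[1][1])   # approximate standard error of var_hat
print(mu_hat, var_hat, se_mu, se_var)
```

These standard errors rest on the asymptotic result, so for a sample this small they are only a rough guide.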
It may not always be possible to find maximum likelihood estimates in a convenient closed form, and in these cases iterative numerical methods, such as the Newton–Raphson procedure or the EM algorithm (expectation maximization), will be necessary to compute the maximum likelihood estimates. Their asymptotic variances and covariances may then be found by substituting the estimates into the second derivatives. Note that it may be difficult to find the expected value of the second derivatives required for the variance-covariance matrix and in these cases the observed value of the second derivatives is often used.
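A case without a closed-form solution is the Weibull distribution. The sketch below is one possible Newton–Raphson iteration and is not claimed to be the method used by any NAG function. It assumes complete (uncensored) positive data and the parameterization $f(x)=(k/\lambda)(x/\lambda)^{k-1}\exp(-(x/\lambda)^k)$; eliminating $\lambda$ from the likelihood equations leaves a single equation $g(k)=0$ in the shape $k$, which is solved iteratively. All names are hypothetical.

```python
import math
import random

def weibull_mle(x, k0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the Weibull shape k (then scale) from
    complete data, solving
        g(k) = sum(x^k log x) / sum(x^k) - 1/k - mean(log x) = 0."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(x)
    k = k0
    for _ in range(max_iter):
        xk = [xi ** k for xi in x]
        s0 = sum(xk)
        s1 = sum(a * b for a, b in zip(xk, logs))
        s2 = sum(a * b * b for a, b in zip(xk, logs))
        g = s1 / s0 - 1.0 / k - mean_log
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)  # g'(k) > 0
        step = g / dg
        while k - step <= 0.0:   # safeguard: keep the shape positive
            step /= 2.0
        k -= step
        if abs(step) < tol:
            break
    scale = (sum(xi ** k for xi in x) / len(x)) ** (1.0 / k)
    return k, scale

# Simulated data with assumed true scale 2.0 and shape 1.5.
rng = random.Random(1)
data = [rng.weibullvariate(2.0, 1.5) for _ in range(2000)]
shape_hat, scale_hat = weibull_mle(data)
print(shape_hat, scale_hat)
```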
The use of maximum likelihood estimation allows the construction of generalized likelihood ratio tests. Consider two models in which model 1 has $p$ parameters and model 2 is a sub-model (nested model) of model 1 with $q<p$ parameters, that is, model 1 has an extra $p-q$ parameters. If $\lambda=2(l_1-l_2)$, where $l_1$ is the maximized log-likelihood function for model 1 and $l_2$ is the maximized log-likelihood function for model 2, then under the hypothesis that model 2 is correct, $\lambda$ is asymptotically distributed as a $\chi^2$ variable with $p-q$ degrees of freedom. This result provides a useful method for performing hypothesis tests on the parameters. Alternatively, tests exist based on the asymptotic Normality of the estimator and the efficient score; see page 315 of Cox and Hinkley (1974).
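A minimal sketch of such a test, under assumptions chosen for illustration: model 1 is a Normal distribution with both mean and variance free (two parameters), and model 2 fixes the mean at a hypothesized value `mu0` (one parameter). Twice the difference in maximized log-likelihoods is then referred to a $\chi^2$ distribution with one degree of freedom, whose upper tail can be written with the complementary error function. The function names are hypothetical.

```python
import math

def normal_loglik_max(x, mu=None):
    """Maximized Normal log-likelihood. The variance is always estimated;
    the mean is estimated unless a fixed value `mu` is supplied."""
    n = len(x)
    m = sum(x) / n if mu is None else mu
    var_hat = sum((xi - m) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)

def lr_test_mean(x, mu0):
    """Likelihood ratio statistic 2*(l1 - l2) for H0: mean = mu0 and its
    asymptotic chi-squared (1 degree of freedom) p-value."""
    l1 = normal_loglik_max(x)          # model 1: mean and variance free
    l2 = normal_loglik_max(x, mu=mu0)  # model 2: mean fixed at mu0
    stat = 2.0 * (l1 - l2)
    # For chi-squared with 1 d.f., P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

stat, p_value = lr_test_mean([1.0, 2.0, 3.0, 4.0, 5.0], mu0=0.0)
print(stat, p_value)
```

With so few observations the asymptotic approximation is crude; the example only shows the mechanics.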

Confidence Intervals

Suppose we can find a function, $t(x,\theta)$, of the sample $x$ and the unknown parameter $\theta$, whose distribution does not depend on $\theta$, and which is a monotonic (say decreasing) function in $\theta$ for each $x$. Then we can find $t_1$ such that $P(t_1\le t(x,\theta))=1-\alpha$ no matter what $\theta$ happens to be. The function $t(x,\theta)$ is known as a pivotal quantity. Since the function is monotonic the statement that $t_1\le t(x,\theta)$ may be rewritten as $\theta\le\theta_1(x)$; see Figure 1. The statistic $\theta_1(x)$ will vary from sample to sample and if we assert that $\theta\le\theta_1(x)$ for any sample values which arise, we will be right in a proportion $1-\alpha$ of the cases, in the long run or on average. We call $\theta_1(x)$ a $1-\alpha$ upper confidence limit for $\theta$.
Figure 1
We have considered only an upper confidence limit. The above idea may be generalized to a two-sided confidence interval where two quantities, $t_0$ and $t_1$, are found such that for all $\theta$, $P(t_1\le t(x,\theta)\le t_0)=1-\alpha$. This interval may be rewritten as $\theta_0(x)\le\theta\le\theta_1(x)$. Thus if we assert that $\theta$ lies in the interval $[\theta_0(x),\theta_1(x)]$ we will be right on average in a proportion $1-\alpha$ of the times under repeated sampling.
Hypothesis (significance) tests on the parameters may be used to find these confidence limits. For example, if we observe a value, $k$, from a binomial distribution, with known parameter $n$ and unknown parameter $p$, then to find the lower confidence limit we find $p_l$ such that the probability that the null hypothesis $H_0$: $p=p_l$ (against the one-sided alternative that $p>p_l$) will be rejected is less than or equal to $\alpha/2$. Thus for a binomial random variable, $B$, with parameters $n$ and $p_l$ we require that $P(B\ge k)\le\alpha/2$. The upper confidence limit, $p_u$, can be constructed in a similar way.
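The binomial construction can be sketched numerically. The code below is an illustration only, not the algorithm of nag_univar_ci_binomial (g07aa): it finds the lower limit as the value of $p$ at which the upper tail probability $P(B\ge k)$ equals $\alpha/2$, and the upper limit as the value at which the lower tail probability $P(B\le k)$ equals $\alpha/2$, using simple bisection. All function names are hypothetical.

```python
import math

def binom_upper_tail(k, n, p):
    """P(B >= k) for B ~ Binomial(n, p); increasing in p."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

def solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, with f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binomial_ci(k, n, alpha=0.05):
    """Two-sided exact confidence interval for p given k successes in n."""
    # Lower limit: P(B >= k | p_l) = alpha / 2 (taken as 0 when k = 0).
    p_l = 0.0 if k == 0 else solve_increasing(
        lambda p: binom_upper_tail(k, n, p), alpha / 2.0)
    # Upper limit: P(B <= k | p_u) = alpha / 2, that is
    # P(B >= k + 1 | p_u) = 1 - alpha / 2 (taken as 1 when k = n).
    p_u = 1.0 if k == n else solve_increasing(
        lambda p: binom_upper_tail(k + 1, n, p), 1.0 - alpha / 2.0)
    return p_l, p_u

p_l, p_u = binomial_ci(3, 10)
print(p_l, p_u)
```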
For large samples the asymptotic Normality of the maximum likelihood estimates discussed above is used to construct confidence intervals for the unknown parameters.

Robust Estimation

For particular cases the probability density function can be written as
$$f_{X_i}(x_i;\theta)=\frac{1}{\theta_2}\,g\left(\frac{x_i-\theta_1}{\theta_2}\right)$$
for a suitable function $g$; then $\theta_1$ is known as a location parameter and $\theta_2$, usually written as $\sigma$, is known as a scale parameter. This is true of the Normal distribution.
If $\theta_1$ is a location parameter, as described above, then equation (3) becomes
$$\sum_{i=1}^{n}\psi\left(\frac{x_i-\hat\theta_1}{\hat\sigma}\right)=0, \tag{6}$$
where $\psi(z)=-\dfrac{d}{dz}\log(g(z))$.
For the scale parameter $\sigma$ (or $\sigma^2$) the equation is
$$\sum_{i=1}^{n}\chi\left(\frac{x_i-\hat\theta_1}{\hat\sigma}\right)=n/2, \tag{7}$$
where $\chi(z)=z\psi(z)/2$.
For the Normal distribution $\psi(z)=z$ and $\chi(z)=z^2/2$. Thus, the maximum likelihood estimates for $\theta_1$ and $\sigma^2$ are the sample mean and variance with the $n$ divisor respectively. As the latter is biased, (7) can be replaced by
$$\sum_{i=1}^{n}\chi\left(\frac{x_i-\hat\theta_1}{\hat\sigma}\right)=(n-1)\beta, \tag{8}$$
where $\beta$ is a suitable constant, which for the Normal $\chi$ function is $\frac{1}{2}$.
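To see why the constant is one half in the Normal case, substitute $\chi(z)=z^2/2$ into (8):

$$\sum_{i=1}^{n}\frac{(x_i-\hat\theta_1)^2}{2\hat\sigma^2}=\frac{n-1}{2}\quad\Longrightarrow\quad\hat\sigma^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat\theta_1)^2,$$

which, with $\hat\theta_1=\bar{x}$, is the usual unbiased sample variance with the $n-1$ divisor.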
The influence of an observation on the estimates depends on the form of the $\psi$ and $\chi$ functions. For a discussion of influence, see Hampel et al. (1986) and Huber (1981). The influence of extreme values can be reduced by bounding the values of the $\psi$- and $\chi$-functions. One suggestion due to Huber (1981) is
$$\psi(z)=\begin{cases}-C, & z<-C,\\ z, & |z|\le C,\\ C, & z>C.\end{cases}$$
Figure 2
Redescending $\psi$-functions are often considered; these give zero values to $\psi(z)$ for large positive or negative values of $z$. Hampel et al. (1986) suggested
$$\psi(z)=\begin{cases}-\psi(-z), & z<0,\\ z, & 0\le z\le h_1,\\ h_1, & h_1\le z\le h_2,\\ h_1(h_3-z)/(h_3-h_2), & h_2\le z\le h_3,\\ 0, & z>h_3.\end{cases}$$
Figure 3
Usually a $\chi$-function based on Huber's $\psi$-function is used: $\chi=\psi^2/2$. Estimators based on such bounded $\psi$-functions are known as M-estimators, and provide one type of robust estimator.
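The two $\psi$-functions and a simple location M-estimate can be sketched as follows. This is a simplified illustration and not the algorithm of nag_univar_robust_1var_mestim (g07db): the scale is held fixed at the scaled median absolute deviation rather than estimated jointly through a $\chi$-function, the location equation is solved by iterative reweighting, and the default constants ($C=1.5$; $h_1,h_2,h_3=1.5,3.0,4.5$) are assumptions chosen for the example.

```python
import statistics

def huber_psi(z, c=1.5):
    """Huber's psi: linear on [-c, c], constant outside."""
    return max(-c, min(c, z))

def hampel_psi(z, h1=1.5, h2=3.0, h3=4.5):
    """Hampel's redescending piecewise-linear psi (odd in z)."""
    a = abs(z)
    if a <= h1:
        val = a
    elif a <= h2:
        val = h1
    elif a <= h3:
        val = h1 * (h3 - a) / (h3 - h2)
    else:
        val = 0.0
    return val if z >= 0 else -val

def m_estimate_location(x, psi=huber_psi, tol=1e-12, max_iter=500):
    """Solve sum(psi((x_i - theta) / sigma)) = 0 for theta, with sigma
    fixed at the scaled MAD, by iteratively reweighted averaging."""
    theta = statistics.median(x)
    sigma = 1.4826 * statistics.median([abs(xi - theta) for xi in x])
    if sigma == 0.0:
        raise ValueError("scale estimate is zero")
    for _ in range(max_iter):
        z = [(xi - theta) / sigma for xi in x]
        w = [psi(zi) / zi if zi != 0.0 else 1.0 for zi in z]
        new_theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, sigma

# Hypothetical data with one gross outlier.
theta_hat, sigma_hat = m_estimate_location([1.0, 2.0, 3.0, 4.0, 100.0])
print(theta_hat, sigma_hat)
```

For these data the sample mean is 22, whereas the Huber estimate stays close to the bulk of the observations.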
Other robust estimators for the location parameter are
(i) the sample median,
(ii) the trimmed mean, i.e., the mean calculated after the extreme values have been removed from the sample,
(iii) the winsorized mean, i.e., the mean calculated after the extreme values of the sample have been replaced by other more moderate values from the sample.
For the scale parameter, alternative estimators are
(i) the median absolute deviation scaled to produce an estimator which is unbiased in the case of data coming from a Normal distribution,
(ii) the winsorized variance, i.e., the variance calculated after the extreme values of the sample have been replaced by other more moderate values from the sample.
For a general discussion of robust estimation, see Hampel et al. (1986) and Huber (1981).
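The simpler estimators listed above can be sketched in a few lines. The helper names are hypothetical, `k` denotes the number of observations trimmed or replaced at each end, and 1.4826 is the usual approximate factor that makes the median absolute deviation consistent for the standard deviation under Normality.

```python
import statistics

def trimmed_mean(x, k):
    """Mean after removing the k smallest and k largest observations."""
    s = sorted(x)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    core = s[k:len(s) - k]
    return sum(core) / len(core)

def winsorized_sample(x, k):
    """Replace the k smallest (largest) values by the nearest retained value."""
    s = sorted(x)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    return [s[k]] * k + s[k:len(s) - k] + [s[len(s) - k - 1]] * k

def winsorized_mean(x, k):
    w = winsorized_sample(x, k)
    return sum(w) / len(w)

def scaled_mad(x):
    """Median absolute deviation scaled for the Normal distribution."""
    med = statistics.median(x)
    return 1.4826 * statistics.median([abs(xi - med) for xi in x])

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical data with an outlier
print(statistics.median(data), scaled_mad(data))
print(trimmed_mean(data, 1), winsorized_mean(data, 1))
```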

Robust Confidence Intervals

In the section on Confidence Intervals it was shown how tests of hypotheses can be used to find confidence intervals. That approach uses a parametric test that requires the assumption that the data used in the computation of the confidence interval has a known distribution. As an alternative, a more robust confidence interval can be found by replacing the parametric test by a nonparametric test. In the case of the confidence interval for the location parameter, a Wilcoxon test statistic can be used, and for the difference in location, computed from two samples, a Mann–Whitney test statistic can be used.
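One common way to invert the Wilcoxon signed-rank test for a single sample uses the Walsh averages $(x_i+x_j)/2$, $i\le j$. The sketch below is an illustration under stated assumptions and is not claimed to be the method of nag_univar_robust_1var_ci (g07ea): it assumes a continuous symmetric distribution with no ties, uses the exact null distribution of the signed-rank statistic, and returns a conservative interval whose end points are order statistics of the Walsh averages. The function names are hypothetical.

```python
def signed_rank_cdf(n):
    """Exact null P(T+ <= t), t = 0..n(n+1)/2, of the Wilcoxon
    signed-rank statistic for n observations (no ties assumed)."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total      # counts[s]: subsets of ranks summing to s
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / 2 ** n)
    return cdf

def wilcoxon_location_ci(x, alpha=0.05):
    """Confidence interval for the centre of symmetry, with coverage
    at least 1 - alpha, from ordered Walsh averages."""
    n = len(x)
    walsh = sorted((x[i] + x[j]) / 2.0 for i in range(n) for j in range(i, n))
    cdf = signed_rank_cdf(n)
    c = -1                           # largest c with P(T+ <= c) <= alpha/2
    while c + 1 < len(cdf) and cdf[c + 1] <= alpha / 2.0:
        c += 1
    if c < 0:
        raise ValueError("sample too small for this confidence level")
    return walsh[c], walsh[len(walsh) - 1 - c]

# Hypothetical data.
lower, upper = wilcoxon_location_ci([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(lower, upper)
```

Because the signed-rank statistic is discrete, the achieved confidence level is generally somewhat higher than the nominal $1-\alpha$.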

Recommendations on Choice and Use of Available Functions

Maximum Likelihood Estimation and Confidence Intervals
nag_univar_ci_binomial (g07aa) provides a confidence interval for the parameter $p$ of the binomial distribution.
nag_univar_ci_poisson (g07ab) provides a confidence interval for the mean parameter of the Poisson distribution.
nag_univar_estim_normal (g07bb) provides maximum likelihood estimates and their standard errors for the parameters of the Normal distribution from grouped and/or censored data.
nag_univar_estim_weibull (g07be) provides maximum likelihood estimates and their standard errors for the parameters of the Weibull distribution from data which may be right-censored.
nag_univar_estim_genpareto (g07bf) provides maximum likelihood estimates and their standard errors for the parameters of the generalized Pareto distribution.
nag_univar_ttest_2normal (g07ca) provides a $t$-test statistic to test for a difference in means between two Normal populations, together with a confidence interval for the difference between the means.
Robust Estimation
nag_univar_robust_1var_mestim (g07db) provides M-estimates for location and, optionally, scale using four common forms of the $\psi$-function.
nag_univar_robust_1var_mestim_wgt (g07dc) produces the M-estimates for location and, optionally, scale but for user-supplied $\psi$- and $\chi$-functions.
nag_univar_robust_1var_median (g07da) provides the sample median, median absolute deviation, and the scaled value of the median absolute deviation.
nag_univar_robust_1var_trimmed (g07dd) provides the trimmed mean and winsorized mean together with estimates of their variance based on a winsorized variance.
Robust Interval Estimation
nag_univar_robust_1var_ci (g07ea) produces a rank-based confidence interval for location.
nag_univar_robust_2var_ci (g07eb) produces a rank-based confidence interval for the difference in location between two populations.
Outlier Detection
This chapter provides two functions for identifying potential outlying values, nag_univar_outlier_peirce_1var (g07ga) and nag_univar_outlier_peirce_2var (g07gb). Many of the model fitting functions, for example those in Chapters G02 and G13, also return vectors of residuals which can be used to aid in the identification of outlying values.

Functionality Index

2 sample t-test nag_univar_ttest_2normal (g07ca)
Confidence intervals for parameters, 
    binomial distribution nag_univar_ci_binomial (g07aa)
    Poisson distribution nag_univar_ci_poisson (g07ab)
Maximum likelihood estimation of parameters, 
    Normal distribution, grouped and/or censored data nag_univar_estim_normal (g07bb)
    Weibull distribution nag_univar_estim_weibull (g07be)
Outlier detection, 
    Peirce, 
        raw data or single variance supplied nag_univar_outlier_peirce_1var (g07ga)
        two variances supplied nag_univar_outlier_peirce_2var (g07gb)
Parameter estimates, 
    generalized Pareto distribution nag_univar_estim_genpareto (g07bf)
Robust estimation, 
    confidence intervals, 
        one sample nag_univar_robust_1var_ci (g07ea)
        two samples nag_univar_robust_2var_ci (g07eb)
    median, median absolute deviation and robust standard deviation nag_univar_robust_1var_median (g07da)
    M-estimates for location and scale parameters, 
        standard weight functions nag_univar_robust_1var_mestim (g07db)
        trimmed and winsorized means and estimates of their variance nag_univar_robust_1var_trimmed (g07dd)
        user-defined weight functions nag_univar_robust_1var_mestim_wgt (g07dc)

References

Cox D R and Hinkley D V (1974) Theoretical Statistics Chapman and Hall
Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Silvey S D (1975) Statistical Inference Chapman and Hall


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013