hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_univar_estim_normal (g07bb)

Purpose

nag_univar_estim_normal (g07bb) computes maximum likelihood estimates and their standard errors for parameters of the Normal distribution from grouped and/or censored data.

Syntax

[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = g07bb(method, x, xc, ic, xmu, xsig, tol, maxit, 'n', n)
[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = nag_univar_estim_normal(method, x, xc, ic, xmu, xsig, tol, maxit, 'n', n)

Description

A sample of size nn is taken from a Normal distribution with mean μμ and variance σ2σ2 and consists of grouped and/or censored data. Each of the nn observations is known by a pair of values (Li,Ui)(Li,Ui) such that:
LixiUi.
LixiUi.
The data is represented as particular cases of this form:
Let the set AA identify the exactly specified observations, sets BB and CC identify the observations censored on the right and left respectively, and set DD identify the observations confined between two finite limits. Also let there be rr exactly specified observations, i.e., the number in AA. The probability density function for the standard Normal distribution is
Z(x) = 1/(sqrt(2π))exp((1/2)x2) ,   < x <
Z(x)=12πexp(-12x2) ,  -<x<
and the cumulative distribution function is
X
P(X) = 1Q(X) = Z(x)dx.
P(X)= 1-Q(X)=-XZ(x)dx.
The log-likelihood of the sample can be written as:
L (μ,σ) = r logσ (1/2) A {(xiμ) / σ}2 + B log(Q(li)) + C log(P(ui)) + D log(pi)
L (μ,σ) =-r logσ - 1 2 A { ( x i -μ ) / σ } 2 +B log( Q ( l i ) ) + C log( P ( u i ) ) + D log( p i )
where pi = P(ui)P(li)pi=P(ui)-P(li) and ui = (Uiμ) / σ,  li = (Liμ) / σui=(Ui-μ)/σ,  li=(Li-μ)/σ.
Let
S(xi) = (Z(xi))/(Q(xi)),  S1(li,ui) = (Z(li)Z(ui))/(pi)
S(xi)=Z(xi) Q(xi) ,  S1(li,ui)=Z(li)-Z(ui)pi
and
S2(li,ui) = (uiZ(ui)liZ(li))/(pi),
S2(li,ui)=uiZ(ui)-liZ(li)pi,
then the first derivatives of the log-likelihood can be written as:
(L(μ,σ))/(μ) = L1(μ,σ) = σ2A(xiμ) + σ1BS(li)σ1CS(ui) + σ1DS1(li,ui)
L(μ,σ) μ =L1(μ,σ)=σ-2A(xi-μ)+σ-1BS(li)-σ-1CS(-ui)+σ-1DS1(li,ui)
and
(L(μ,σ))/(σ) = L2(μ,σ) = rσ1 + σ3A(xiμ)2 + σ1BliS(li)σ1CuiS(ui)
L(μ,σ) σ =L2(μ,σ)=-rσ-1+σ-3A (xi-μ) 2+σ-1BliS(li)-σ-1CuiS(-ui)
σ1DS2(li,ui)
-σ-1DS2(li,ui)
The maximum likelihood estimates, μ̂μ^ and σ̂σ^, are the solution to the equations:
L1(μ̂,σ̂) = 0
L1(μ^,σ^)=0
(1)
and
L2(μ̂,σ̂) = 0
L2(μ^,σ^)=0
(2)
and if the second derivatives (2 L)/(2 μ) 2 L 2 μ , (2 L)/(μ σ) 2 L μ σ  and (2L)/(2σ) 2L 2σ  are denoted by L11L11, L12L12 and L22L22 respectively, then estimates of the standard errors of μ̂μ^ and σ̂σ^ are given by:
se(μ̂) = sqrt((L22)/(L11L22L122)),  se(σ̂) = sqrt((L11)/(L11L22L122))
se(μ^)=-L22 L11L22-L122 ,  se(σ^)=-L11 L11L22-L122
and an estimate of the correlation of μ̂μ^ and σ̂σ^ is given by:
(L12)/(sqrt(L12L22)).
L12L12L22 .
To obtain the maximum likelihood estimates the equations (1) and (2) can be solved using either the Newton–Raphson method or the Expectation-maximization (EM)(EM) algorithm of Dempster et al. (1977).
Newton–Raphson Method
This consists of using approximate estimates μ̃μ~ and σ̃σ~ to obtain improved estimates μ̃ + δμ̃μ~+δμ~ and σ̃ + δσ̃σ~+δσ~ by solving
δμ̃L11 + δσ̃L12 + L1 = 0,
δμ̃L12 + δσ̃L22 + L2 = 0,
δμ~L11+δσ~L12+L1=0, δμ~L12+δσ~L22+L2=0,
for the corrections δμ̃δμ~ and δσ̃δσ~.
EM Algorithm
The expectation step consists of constructing the variable wiwi as follows:
if   iA,   wi = xi E (Li < xi < Ui) = μ + σ S1 (li,ui)
if   iA,   wi= xi E ( Li< xi< Ui )= μ+σ S1 (li,ui)
(3)
if   iB,   wi = E (xixi > Li) = μ + σS (li) S1 (li,ui)
if   iB,   wi= E ( xi xi> Li )=μ+σS (li) S1 (li,ui)
(4)
if   iC,   wi = E (xixi < Ui) = μσS (ui) (li,ui)
if   iC,   wi= E ( xi xi< Ui )=μ-σS (-ui) (li,ui)
(5)
if   iD,   wi = E (xiLi < xi < Ui) = μ + σ S1 (li,ui)
if   iD,   wi= E ( xi Li< xi< Ui )=μ+σ S1 (li,ui)
(6)
the maximization step consists of substituting (3), (4), (5) and (6) into (1) and (2) giving:
n
μ̂ = i / n
i = 1
μ^=i=1nw^i/n
(7)
and
n
σ̂2 = (iμ̂)2 / {r + BT(i) + CT(i) + DT1(i,i)}
i = 1
σ^2=i=1n(w^i-μ^)2/ {r+BT(l^i)+CT(-u^i)+DT1(l^i,u^i)}
(8)
where
T(x) = S(x){S(x)x} ,   T1(l,u) = S12(l,u) + S2(l,u)
T(x)=S(x){S(x)-x} ,   T1(l,u)=S12(l,u)+S2(l,u)
and where iw^i, il^i and iu^i are wiwi, lili and uiui evaluated at μ̂μ^ and σ̂σ^. Equations (3) to (8) are the basis of the EMEM iterative procedure for finding μ̂μ^ and σ̂2σ^2. The procedure consists of alternately estimating μ̂μ^ and σ̂2σ^2 using (7) and (8) and estimating {i}{w^i} using (3) to (6).
In choosing between the two methods a general rule is that the Newton–Raphson method converges more quickly but requires good initial estimates whereas the EMEM algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton–Raphson method to be used. If there are a high proportion of censored observations then the EMEM algorithm should be used and if high accuracy is required the subsequent use of the Newton–Raphson method to refine the estimates obtained from the EMEM algorithm should be considered.

References

Dempster A P, Laird N M and Rubin D B (1977) Maximum likelihood from incomplete data via the EMEM algorithm (with discussion) J. Roy. Statist. Soc. Ser. B 39 1–38
Swan A V (1969) Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data Appl. Statist. 18 110–114
Wolynetz M S (1979) Maximum likelihood estimation from confined and censored normal data Appl. Statist. 28 185–195

Parameters

Compulsory Input Parameters

1:     method – string (length ≥ 1)
Indicates whether the Newton–Raphson or EMEM algorithm should be used.
If method = 'N'method='N', then the Newton–Raphson algorithm is used.
If method = 'E'method='E', then the EMEM algorithm is used.
Constraint: method = 'N'method='N' or 'E''E'.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n2n2.
The observations xixi, LiLi or UiUi, for i = 1,2,,ni=1,2,,n.
If the observation is exactly specified – the exact value, xixi.
If the observation is right-censored – the lower value, LiLi.
If the observation is left-censored – the upper value, UiUi.
If the observation is interval-censored – the lower or upper value, LiLi or UiUi, (see xc).
3:     xc(n) – double array
n, the dimension of the array, must satisfy the constraint n2n2.
If the jjth observation, for j = 1,2,,nj=1,2,,n is an interval-censored observation then xc(j)xcj should contain the complementary value to x(j)xj, that is, if x(j) < xc(j)xj<xcj, then xc(j)xcj contains upper value, UiUi, and if x(j) > xc(j)xj>xcj, then xc(j)xcj contains lower value, LiLi. Otherwise if the jjth observation is exact or right- or left-censored xc(j)xcj need not be set.
Note: if x(j) = xc(j)xj=xcj then the observation is ignored.
4:     ic(n) – int64int32nag_int array
n, the dimension of the array, must satisfy the constraint n2n2.
ic(i)ici contains the censoring codes for the iith observation, for i = 1,2,,ni=1,2,,n.
If ic(i) = 0ici=0, the observation is exactly specified.
If ic(i) = 1ici=1, the observation is right-censored.
If ic(i) = 2ici=2, the observation is left-censored.
If ic(i) = 3ici=3, the observation is interval-censored.
Constraint: ic(i) = 0ici=0, 11, 22 or 33, for i = 1,2,,ni=1,2,,n.
5:     xmu – double scalar
If xsig > 0.0xsig>0.0 the initial estimate of the mean, μμ; otherwise xmu need not be set.
6:     xsig – double scalar
Specifies whether an initial estimate of μμ and σσ are to be supplied.
xsig > 0.0xsig>0.0
xsig is the initial estimate of σσ and xmu must contain an initial estimate of μμ.
xsig0.0xsig0.0
Initial estimates of xmu and xsig are calculated internally from:
(a) the exact observations, if the number of exactly specified observations is 22; or
(b) the interval-censored observations; if the number of interval-censored observations is 11; or
(c) they are set to 0.00.0 and 1.01.0 respectively.
7:     tol – double scalar
The relative precision required for the final estimates of μμ and σσ. Convergence is assumed when the absolute relative changes in the estimates of both μμ and σσ are less than tol.
If tol = 0.0tol=0.0, then a relative precision of 0.0000050.000005 is used.
Constraint: machine precision < tol1.0machine precision<tol1.0 or tol = 0.0tol=0.0.
8:     maxit – int64int32nag_int scalar
The maximum number of iterations.
If maxit0maxit0, then a value of 2525 is used.

Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the arrays x, xc, ic. (An error is raised if these dimensions are not equal.)
nn, the number of observations.
Constraint: n2n2.

Input Parameters Omitted from the MATLAB Interface

wk

Output Parameters

1:     xmu – double scalar
The maximum likelihood estimate, μ̂μ^, of μμ.
2:     xsig – double scalar
The maximum likelihood estimate, σ̂σ^, of σσ.
3:     sexmu – double scalar
The estimate of the standard error of μ̂μ^.
4:     sexsig – double scalar
The estimate of the standard error of σ̂σ^.
5:     corr – double scalar
The estimate of the correlation between μ̂μ^ and σ̂σ^.
6:     dev – double scalar
The maximized log-likelihood, L(μ̂,σ̂)L(μ^,σ^).
7:     nobs(44) – int64int32nag_int array
The number of the different types of each observation;
nobs(1)nobs1 contains number of right-censored observations.
nobs(2)nobs2 contains number of left-censored observations.
nobs(3)nobs3 contains number of interval-censored observations.
nobs(4)nobs4 contains number of exactly specified observations.
8:     nit – int64int32nag_int scalar
The number of iterations performed.
9:     ifail – int64int32nag_int scalar
ifail = 0ifail=0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1ifail=1
On entry,method'N'method'N' or 'E''E',
orn < 2n<2,
oric(i)0ici0, 11, 22 or 33, for some ii,
ortol < 0.0tol<0.0,
or0.0 < tol < machine precision0.0<tol<machine precision,
ortol > 1.0tol>1.0.
  ifail = 2ifail=2
The chosen method failed to converge in maxit iterations. You should either increase tol or maxit or, if using the EMEM algorithm try using the Newton–Raphson method with initial values those returned by the current call to nag_univar_estim_normal (g07bb). All returned values will be reasonable approximations to the correct results if maxit is not very small.
  ifail = 3ifail=3
The chosen method is diverging. This will be due to poor initial values. You should try different initial values.
  ifail = 4ifail=4
nag_univar_estim_normal (g07bb) was unable to calculate the standard errors. This can be caused by the method starting to diverge when the maximum number of iterations was reached.

Accuracy

The accuracy is controlled by the parameter tol.
If high precision is requested with the EMEM algorithm then there is a possibility that, due to the slow convergence, before the correct solution has been reached the increments of μ̂μ^ and σ̂σ^ may be smaller than tol and the process will prematurely assume convergence.

Further Comments

The process is deemed divergent if three successive increments of μμ or σσ increase.

Example

function nag_univar_estim_normal_example
method = 'N';
x = [4.5;
     5.4;
     3.9;
     5.1;
     4.6;
     4.8;
     2.9;
     6.3;
     5.5;
     4.6;
     4.1;
     5.2;
     3.2;
     4;
     3.1;
     5.1;
     3.8;
     2.2];
xc = [0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     2.5];
ic = [int64(0);0;0;0;0;0;0;0;0;0;0;0;1;1;1;2;2;3];
xmu = 4;
xsig = 1;
tol = 5e-05;
maxit = int64(50);
[xmuOut, xsigOut, sexmu, sexsig, corr, dev, nobs, nit, ifail] = ...
    nag_univar_estim_normal(method, x, xc, ic, xmu, xsig, tol, maxit)
 

xmuOut =

    4.4924


xsigOut =

    1.0196


sexmu =

    0.2606


sexsig =

    0.1940


corr =

    0.0160


dev =

  -22.2817


nobs =

                    3
                    2
                    1
                   12


nit =

                    5


ifail =

                    0


function g07bb_example
method = 'N';
x = [4.5;
     5.4;
     3.9;
     5.1;
     4.6;
     4.8;
     2.9;
     6.3;
     5.5;
     4.6;
     4.1;
     5.2;
     3.2;
     4;
     3.1;
     5.1;
     3.8;
     2.2];
xc = [0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     0;
     2.5];
ic = [int64(0);0;0;0;0;0;0;0;0;0;0;0;1;1;1;2;2;3];
xmu = 4;
xsig = 1;
tol = 5e-05;
maxit = int64(50);
[xmuOut, xsigOut, sexmu, sexsig, corr, dev, nobs, nit, ifail] = ...
    g07bb(method, x, xc, ic, xmu, xsig, tol, maxit)
 

xmuOut =

    4.4924


xsigOut =

    1.0196


sexmu =

    0.2606


sexsig =

    0.1940


corr =

    0.0160


dev =

  -22.2817


nobs =

                    3
                    2
                    1
                   12


nit =

                    5


ifail =

                    0



PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013