hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_univar_estim_normal (g07bb)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_univar_estim_normal (g07bb) computes maximum likelihood estimates and their standard errors for arguments of the Normal distribution from grouped and/or censored data.

Syntax

[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = g07bb(method, x, xc, ic, xmu, xsig, tol, maxit, 'n', n)
[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = nag_univar_estim_normal(method, x, xc, ic, xmu, xsig, tol, maxit, 'n', n)

Description

A sample of size n is taken from a Normal distribution with mean μ and variance σ2 and consists of grouped and/or censored data. Each of the n observations is known by a pair of values Li,Ui such that:
LixiUi.  
The data is represented as particular cases of this form:
Let the set A identify the exactly specified observations, sets B and C identify the observations censored on the right and left respectively, and set D identify the observations confined between two finite limits. Also let there be r exactly specified observations, i.e., the number in A. The probability density function for the standard Normal distribution is
Zx=12πexp-12x2 ,  -<x<  
and the cumulative distribution function is
PX= 1-QX=-XZxdx.  
The log-likelihood of the sample can be written as:
L μ,σ =-r logσ - 1 2 A x i -μ / σ 2 +B log Q l i + C log P u i + D log p i  
where pi=Pui-Pli and ui=Ui-μ/σ,  li=Li-μ/σ.
Let
Sxi=Zxi Qxi ,  S1li,ui=Zli-Zuipi  
and
S2li,ui=uiZui-liZlipi,  
then the first derivatives of the log-likelihood can be written as:
Lμ,σ μ =L1μ,σ=σ-2Axi-μ+σ-1BSli-σ-1CS-ui+σ-1DS1li,ui  
and
Lμ,σ σ =L2μ,σ=-rσ-1+σ-3A xi-μ 2+σ-1BliSli-σ-1CuiS-ui  
-σ-1DS2li,ui  
The maximum likelihood estimates, μ^ and σ^, are the solution to the equations:
L1μ^,σ^=0 (1)
and
L2μ^,σ^=0 (2)
and if the second derivatives 2 L 2 μ , 2 L μ σ  and 2L 2σ  are denoted by L11, L12 and L22 respectively, then estimates of the standard errors of μ^ and σ^ are given by:
seμ^=-L22 L11L22-L122 ,  seσ^=-L11 L11L22-L122  
and an estimate of the correlation of μ^ and σ^ is given by:
L12L12L22 .  
To obtain the maximum likelihood estimates the equations (1) and (2) can be solved using either the Newton–Raphson method or the Expectation-maximization EM algorithm of Dempster et al. (1977).
Newton–Raphson Method
This consists of using approximate estimates μ~ and σ~ to obtain improved estimates μ~+δμ~ and σ~+δσ~ by solving
δμ~L11+δσ~L12+L1=0, δμ~L12+δσ~L22+L2=0,  
for the corrections δμ~ and δσ~.
EM Algorithm
The expectation step consists of constructing the variable wi as follows:
if   iA,   wi= xi E Li< xi< Ui = μ+σ S1 li,ui (3)
if   iB,   wi= E xi xi> Li =μ+σS li S1 li,ui (4)
if   iC,   wi= E xi xi< Ui =μ-σS -ui li,ui (5)
if   iD,   wi= E xi Li< xi< Ui =μ+σ S1 li,ui (6)
the maximization step consists of substituting (3), (4), (5) and (6) into (1) and (2) giving:
μ^=i=1nw^i/n (7)
and
σ^2=i=1nw^i-μ^2/ r+BTl^i+CT-u^i+DT1l^i,u^i (8)
where
Tx=SxSx-x ,   T1l,u=S12l,u+S2l,u  
and where w^i, l^i and u^i are wi, li and ui evaluated at μ^ and σ^. Equations (3) to (8) are the basis of the EM iterative procedure for finding μ^ and σ^2. The procedure consists of alternately estimating μ^ and σ^2 using (7) and (8) and estimating w^i using (3) to (6).
In choosing between the two methods a general rule is that the Newton–Raphson method converges more quickly but requires good initial estimates whereas the EM algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton–Raphson method to be used. If there are a high proportion of censored observations then the EM algorithm should be used and if high accuracy is required the subsequent use of the Newton–Raphson method to refine the estimates obtained from the EM algorithm should be considered.

References

Dempster A P, Laird N M and Rubin D B (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion) J. Roy. Statist. Soc. Ser. B 39 1–38
Swan A V (1969) Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data Appl. Statist. 18 110–114
Wolynetz M S (1979) Maximum likelihood estimation from confined and censored normal data Appl. Statist. 28 185–195

Parameters

Compulsory Input Parameters

1:     method – string (length ≥ 1)
Indicates whether the Newton–Raphson or EM algorithm should be used.
If method='N', then the Newton–Raphson algorithm is used.
If method='E', then the EM algorithm is used.
Constraint: method='N' or 'E'.
2:     xn – double array
The observations xi, Li or Ui, for i=1,2,,n.
If the observation is exactly specified – the exact value, xi.
If the observation is right-censored – the lower value, Li.
If the observation is left-censored – the upper value, Ui.
If the observation is interval-censored – the lower or upper value, Li or Ui, (see xc).
3:     xcn – double array
If the jth observation, for j=1,2,,n is an interval-censored observation then xcj should contain the complementary value to xj, that is, if xj<xcj, then xcj contains upper value, Ui, and if xj>xcj, then xcj contains lower value, Li. Otherwise if the jth observation is exact or right- or left-censored xcj need not be set.
Note: if xj=xcj then the observation is ignored.
4:     icn int64int32nag_int array
ici contains the censoring codes for the ith observation, for i=1,2,,n.
If ici=0, the observation is exactly specified.
If ici=1, the observation is right-censored.
If ici=2, the observation is left-censored.
If ici=3, the observation is interval-censored.
Constraint: ici=0, 1, 2 or 3, for i=1,2,,n.
5:     xmu – double scalar
If xsig>0.0 the initial estimate of the mean, μ; otherwise xmu need not be set.
6:     xsig – double scalar
Specifies whether an initial estimate of μ and σ are to be supplied.
xsig>0.0
xsig is the initial estimate of σ and xmu must contain an initial estimate of μ.
xsig0.0
Initial estimates of xmu and xsig are calculated internally from:
(a) the exact observations, if the number of exactly specified observations is 2; or
(b) the interval-censored observations; if the number of interval-censored observations is 1; or
(c) they are set to 0.0 and 1.0 respectively.
7:     tol – double scalar
The relative precision required for the final estimates of μ and σ. Convergence is assumed when the absolute relative changes in the estimates of both μ and σ are less than tol.
If tol=0.0, then a relative precision of 0.000005 is used.
Constraint: machine precision<tol1.0 or tol=0.0.
8:     maxit int64int32nag_int scalar
The maximum number of iterations.
If maxit0, then a value of 25 is used.

Optional Input Parameters

1:     n int64int32nag_int scalar
Default: the dimension of the arrays x, xc, ic. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraint: n2.

Output Parameters

1:     xmu – double scalar
The maximum likelihood estimate, μ^, of μ.
2:     xsig – double scalar
The maximum likelihood estimate, σ^, of σ.
3:     sexmu – double scalar
The estimate of the standard error of μ^.
4:     sexsig – double scalar
The estimate of the standard error of σ^.
5:     corr – double scalar
The estimate of the correlation between μ^ and σ^.
6:     dev – double scalar
The maximized log-likelihood, Lμ^,σ^.
7:     nobs4 int64int32nag_int array
The number of the different types of each observation;
nobs1 contains number of right-censored observations.
nobs2 contains number of left-censored observations.
nobs3 contains number of interval-censored observations.
nobs4 contains number of exactly specified observations.
8:     nit int64int32nag_int scalar
The number of iterations performed.
9:     ifail int64int32nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
On entry,method'N' or 'E',
orn<2,
orici0, 1, 2 or 3, for some i,
ortol<0.0,
or0.0<tol<machine precision,
ortol>1.0.
   ifail=2
The chosen method failed to converge in maxit iterations. You should either increase tol or maxit or, if using the EM algorithm try using the Newton–Raphson method with initial values those returned by the current call to nag_univar_estim_normal (g07bb). All returned values will be reasonable approximations to the correct results if maxit is not very small.
   ifail=3
The chosen method is diverging. This will be due to poor initial values. You should try different initial values.
   ifail=4
nag_univar_estim_normal (g07bb) was unable to calculate the standard errors. This can be caused by the method starting to diverge when the maximum number of iterations was reached.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

The accuracy is controlled by the argument tol.
If high precision is requested with the EM algorithm then there is a possibility that, due to the slow convergence, before the correct solution has been reached the increments of μ^ and σ^ may be smaller than tol and the process will prematurely assume convergence.

Further Comments

The process is deemed divergent if three successive increments of μ or σ increase.

Example

A sample of 18 observations and their censoring codes are read in and the Newton–Raphson method used to compute the estimates.
function g07bb_example


fprintf('g07bb example results\n\n');

% Data
x = [4.5;     5.4;    3.9;     5.1;     4.6;     4.8;
     2.9;     6.3;    5.5;     4.6;     4.1;     5.2;
     3.2;     4;      3.1;     5.1;     3.8;     2.2];

n  = numel(x);
xc = zeros(n,1);
ic = zeros(n,1,'int64');
xc(n,1) = 2.5;
ic(n-5:n,1) = [1; 1; 1; 2; 2; 3];

% Parameters
method = 'N';
xmu  = 4;
xsig = 1;
tol  = 5e-05;
maxit = int64(50);

% Calculate estimates
[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = ...
g07bb( ...
       method, x, xc, ic, xmu, xsig, tol, maxit);

% Display results
fprintf(' Mean                                     = %8.4f\n', xmu);
fprintf(' Standard deviation                       = %8.4f\n', xsig);
fprintf(' Standard error of mean                   = %8.4f\n', sexmu);
fprintf(' Standard error of sigma                  = %8.4f\n', sexsig);
fprintf(' Correlation coefficient                  = %8.4f\n', corr);
fprintf(' Number of right censored observations    = %3d\n', nobs(1));
fprintf(' Number of left censored observations     = %3d\n', nobs(2));
fprintf(' Number of interval censored observations = %3d\n', nobs(3));
fprintf(' Number of exactly specified observations = %3d\n', nobs(4));
fprintf(' Number of iterations                     = %3d\n', nit);
fprintf(' Log-likelihood                           = %8.4f\n', dev);


g07bb example results

 Mean                                     =   4.4924
 Standard deviation                       =   1.0196
 Standard error of mean                   =   0.2606
 Standard error of sigma                  =   0.1940
 Correlation coefficient                  =   0.0160
 Number of right censored observations    =   3
 Number of left censored observations     =   2
 Number of interval censored observations =   1
 Number of exactly specified observations =  12
 Number of iterations                     =   5
 Log-likelihood                           = -22.2817

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015