

NAG Toolbox: nag_correg_mixeff_ml (g02jb)

 Contents

    1  Purpose
    2  Syntax
    3  Description
    4  References
    5  Parameters
    6  Error Indicators and Warnings
    7  Accuracy
    8  Further Comments
    9  Example

Purpose

nag_correg_mixeff_ml (g02jb) fits a linear mixed effects regression model using maximum likelihood (ML).

Syntax

[nff, nrf, df, ml, b, se, gamma, warn, ifail] = g02jb(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, dat, fint, rint, lb, gamma, 'n', n, 'ncol', ncol, 'nfv', nfv, 'nrv', nrv, 'maxit', maxit, 'tol', tol)
[nff, nrf, df, ml, b, se, gamma, warn, ifail] = nag_correg_mixeff_ml(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, dat, fint, rint, lb, gamma, 'n', n, 'ncol', ncol, 'nfv', nfv, 'nrv', nrv, 'maxit', maxit, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: maxit and tol were made optional
At Mark 22: n was made optional

Description

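nag_correg_mixeff_ml (g02jb) fits a model of the form:
$$y = X\beta + Z\nu + \epsilon$$
where $y$ is a vector of $n$ observations on the dependent variable, $X$ is a known $n \times p$ design matrix for the fixed independent variables, $\beta$ is a vector of $p$ unknown fixed effects, $Z$ is a known $n \times q$ design matrix for the random independent variables, $\nu$ is a vector of $q$ unknown random effects, and $\epsilon$ is a vector of $n$ unknown random errors.
Both $\nu$ and $\epsilon$ are assumed to have a Gaussian distribution with expectation zero and
$$\mathrm{Var}\begin{bmatrix} \nu \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \le q$ groups, with each group being identically distributed with expectation zero and variance $\sigma_i^2$. The diagonal elements of matrix $G$ therefore take one of the values $\{\sigma_i^2 : i = 1, 2, \ldots, g\}$, depending on which group the associated random variable belongs to.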
The model therefore contains three sets of unknowns: the fixed effects, $\beta$, the random effects, $\nu$, and a vector of $g+1$ variance components, $\gamma$, where $\gamma = \{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{g-1}^2, \sigma_g^2, \sigma_R^2\}$. Rather than working directly with $\gamma$, nag_correg_mixeff_ml (g02jb) uses an iterative process to estimate $\gamma^* = \{\sigma_1^2/\sigma_R^2, \sigma_2^2/\sigma_R^2, \ldots, \sigma_{g-1}^2/\sigma_R^2, \sigma_g^2/\sigma_R^2, 1\}$. Because the estimation is iterative, a set of initial values, $\gamma_0$, for $\gamma^*$ is required. nag_correg_mixeff_ml (g02jb) allows these initial values either to be supplied by you or to be calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972).
nag_correg_mixeff_ml (g02jb) fits the model using a quasi-Newton algorithm to maximize the log-likelihood function:
$$-2 l_R = \log|V| + n \log(r^T V^{-1} r) + \log(2\pi/n)$$
where
$$V = Z G Z^T + R, \quad r = y - Xb \quad \text{and} \quad b = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$
Once the final estimates for $\gamma^*$ have been obtained, the value of $\sigma_R^2$ is given by:
$$\sigma_R^2 = (r^T V^{-1} r)/(n - p).$$
Case weights, $W_c$, can be incorporated into the model by replacing $X^T X$ and $Z^T Z$ with $X^T W_c X$ and $Z^T W_c Z$ respectively, for a diagonal weight matrix $W_c$.
The log-likelihood, $l_R$, is calculated using the sweep algorithm detailed in Wolfinger et al. (1994).
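The quantities in the formulas above can be evaluated directly for a small dense problem. The following is a minimal sketch, not the routine's implementation: it uses plain linear algebra rather than the sweep algorithm, the toy data are hypothetical, and it assumes that V is formed from the ratios in γ* (that is, with σR2 factored out so that R becomes the identity), which is a reading of the text rather than something it states.

```matlab
% Toy data (hypothetical): 6 observations, intercept-only fixed part,
% one random factor with 3 levels, so q = 3 and g = 1.
y = [1; 2; 4; 3; 6; 5];
X = ones(6, 1);
Z = kron(eye(3), ones(2, 1));
gstar = 0.5;                         % trial value of sigma_1^2 / sigma_R^2

[n, p] = size(X);

% V = Z G Z' + R with sigma_R^2 factored out (assumption, see lead-in)
V = gstar * (Z * Z') + eye(n);

% Generalized least squares estimate b and residual r = y - X b
b = (X' * (V \ X)) \ (X' * (V \ y));
r = y - X * b;
rVr = r' * (V \ r);

% log|V| from the Cholesky factor (V is symmetric positive definite)
logdetV = 2 * sum(log(diag(chol(V))));

% -2 * log-likelihood as written in the Description, and sigma_R^2
neg2l   = logdetV + n * log(rVr) + log(2 * pi / n);
sigmaR2 = rVr / (n - p);

fprintf('b = %.4f, -2l = %.4f, sigma_R^2 = %.4f\n', b, neg2l, sigmaR2);
```

A quasi-Newton search would repeat this evaluation for different trial values of gstar; the sketch only shows a single evaluation.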

References

Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems J. Am. Stat. Assoc. 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM J. Sci. Statist. Comput. 15 1294–1310

Parameters

Compulsory Input Parameters

1:     nvpr – int64 scalar
If rint = 1 and svid ≠ 0, nvpr is the number of variance components being estimated minus 2 (that is, g − 1); otherwise nvpr = g.
If nrv = 0, nvpr is not referenced.
Constraint: if nrv ≠ 0, 1 ≤ nvpr ≤ nrv.
2:     levels(ncol) – int64 array
levels(i) contains the number of levels associated with the ith variable of the data matrix dat. If this variable is continuous or binary (i.e., only takes the values zero or one) then levels(i) should be 1; if the variable is discrete then levels(i) is the number of levels associated with it and dat(j,i) is assumed to take the values 1 to levels(i), for j = 1, 2, …, n.
Constraint: levels(i) ≥ 1, for i = 1, 2, …, ncol.
3:     yvid – int64 scalar
The column of dat holding the dependent variable, y.
Constraint: 1 ≤ yvid ≤ ncol.
4:     fvid(nfv) – int64 array
The columns of the data matrix dat holding the fixed independent variables, with fvid(i) holding the column number corresponding to the ith fixed variable.
Constraint: 1 ≤ fvid(i) ≤ ncol, for i = 1, 2, …, nfv.
5:     rvid(nrv) – int64 array
The columns of the data matrix dat holding the random independent variables, with rvid(i) holding the column number corresponding to the ith random variable.
Constraint: 1 ≤ rvid(i) ≤ ncol, for i = 1, 2, …, nrv.
6:     svid – int64 scalar
The column of dat holding the subject variable.
If svid = 0, no subject variable is used.
Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random effects. Letting the notation $Z_1 \times Z_S$ denote the interaction between variables $Z_1$ and $Z_S$, fitting a model with rint = 0, random effects $Z_1 + Z_2$ and subject variable $Z_S$ is equivalent to fitting a model with random effects $Z_1 \times Z_S + Z_2 \times Z_S$ and no subject variable. If rint = 1 the model is equivalent to fitting $Z_S + Z_1 \times Z_S + Z_2 \times Z_S$ and no subject variable.
Constraint: 0 ≤ svid ≤ ncol.
7:     cwid – int64 scalar
The column of dat holding the case weights.
If cwid = 0, no weights are used.
Constraint: 0 ≤ cwid ≤ ncol.
8:     vpr(nrv) – int64 array
vpr(i) holds a flag indicating the variance of the ith random variable. The variance of the ith random variable is $\sigma_j^2$, where j = vpr(i) + 1 if rint = 1 and svid ≠ 0, and j = vpr(i) otherwise. Random variables with the same value of j are assumed to be taken from the same distribution.
Constraint: 1 ≤ vpr(i) ≤ nvpr, for i = 1, 2, …, nrv.
9:     dat(lddat,ncol) – double array
lddat, the first dimension of the array, must satisfy the constraint lddat ≥ n.
Array containing all of the data. For the ith observation:
  • dat(i,yvid) holds the dependent variable, y;
  • if cwid ≠ 0, dat(i,cwid) holds the case weights;
  • if svid ≠ 0, dat(i,svid) holds the subject variable.
The remaining columns hold the values of the independent variables.
Constraints:
  • if cwid ≠ 0, dat(i,cwid) ≥ 0.0;
  • if levels(j) ≠ 1, 1 ≤ dat(i,j) ≤ levels(j).
10:   fint – int64 scalar
Flag indicating whether a fixed intercept is included (fint = 1).
Constraint: fint = 0 or 1.
11:   rint – int64 scalar
Flag indicating whether a random intercept is included (rint = 1).
If svid = 0, rint is not referenced.
Constraint: rint = 0 or 1.
12:   lb – int64 scalar
The size of the array b.
Constraint: $\mathrm{lb} \ge \mathrm{fint} + \sum_{i=1}^{\mathrm{nfv}} \max(\mathrm{levels}(\mathrm{fvid}(i)) - 1, 1) + L_S \times \left(\mathrm{rint} + \sum_{i=1}^{\mathrm{nrv}} \mathrm{levels}(\mathrm{rvid}(i))\right)$, where $L_S$ = levels(svid) if svid ≠ 0 and 1 otherwise.
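The lower bound on lb can be computed from the model specification before calling the routine. The sketch below is illustrative only; it uses the model settings from the Example section, and the variable names nFixed and nRandom are hypothetical helpers, not part of the interface.

```matlab
% Model specification taken from the Example section
levels = [1; 4; 3; 2; 3];
fvid = [3; 4; 5];
rvid = 3;
svid = 2;
fint = 1;
rint = 1;

% LS is the number of subject levels, or 1 when no subject variable is used
if svid ~= 0
  LS = levels(svid);
else
  LS = 1;
end

% Fixed part: intercept plus max(levels - 1, 1) columns per fixed variable
nFixed = fint + sum(max(levels(fvid) - 1, 1));

% Random part: (intercept + all levels of each random variable) per subject
nRandom = LS * (rint + sum(levels(rvid)));

lb = int64(nFixed + nRandom);
fprintf('minimum lb = %d\n', lb);
```

For this model the result agrees with the 6 fixed and 16 random effects listed in the example output.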
13:   gamma(nvpr+2) – double array
Holds the initial values of the variance components, $\gamma_0$, with gamma(i) the initial value for $\sigma_i^2/\sigma_R^2$, for i = 1, 2, …, g. If rint = 1 and svid ≠ 0, g = nvpr + 1, else g = nvpr.
If gamma(1) = −1.0, the remaining elements of gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
Constraint: gamma(1) = −1.0 or gamma(i) ≥ 0.0, for i = 1, 2, …, g.

Optional Input Parameters

1:     n – int64 scalar
Default: the first dimension of the array dat.
n, the number of observations.
Constraint: n ≥ 1.
2:     ncol – int64 scalar
Default: the dimension of the array levels and the second dimension of the array dat. (An error is raised if these dimensions are not equal.)
The number of columns in the data matrix, dat.
Constraint: ncol ≥ 1.
3:     nfv – int64 scalar
Default: the dimension of the array fvid.
The number of independent variables in the model which are to be treated as being fixed.
Constraint: 0 ≤ nfv < ncol.
4:     nrv – int64 scalar
Default: the dimension of the arrays rvid, vpr. (An error is raised if these dimensions are not equal.)
The number of independent variables in the model which are to be treated as being random.
Constraints:
  • 0 ≤ nrv < ncol;
  • nrv + rint > 0.
5:     maxit – int64 scalar
Default: −1
The maximum number of iterations.
If maxit < 0, the default value of 100 is used.
If maxit = 0, the parameter estimates $(\beta, \nu)$ and corresponding standard errors are calculated based on the value of $\gamma_0$ supplied in gamma.
6:     tol – double scalar
Default: 0
The tolerance used to assess convergence.
If tol ≤ 0.0, the default value of $\epsilon^{0.7}$ is used, where $\epsilon$ is the machine precision.

Output Parameters

1:     nff – int64 scalar
The number of fixed effects estimated (i.e., the number of columns, p, in the design matrix X).
2:     nrf – int64 scalar
The number of random effects estimated (i.e., the number of columns, q, in the design matrix Z).
3:     df – int64 scalar
The degrees of freedom.
4:     ml – double scalar
$-2 l_R(\hat{\gamma})$, where $l_R$ is the log of the maximum likelihood calculated at $\hat{\gamma}$, the estimated variance components returned in gamma.
5:     b(lb) – double array
The parameter estimates, $(\beta, \nu)$, with the first nff elements of b containing the fixed effect parameter estimates, $\beta$, and the next nrf elements of b containing the random effect parameter estimates, $\nu$.
Fixed effects
If fint = 1, b(1) contains the estimate of the fixed intercept. Let $L_i$ denote the number of levels associated with the ith fixed variable, that is $L_i$ = levels(fvid(i)). Define
  • if fint = 1, $F_1 = 2$, else if fint = 0, $F_1 = 1$;
  • $F_{i+1} = F_i + \max(L_i - 1, 1)$, $i \ge 1$.
Then for i = 1, 2, …, nfv:
  • if $L_i > 1$, b($F_i + j - 2$) contains the parameter estimate for the jth level of the ith fixed variable, for j = 2, 3, …, $L_i$;
  • if $L_i \le 1$, b($F_i$) contains the parameter estimate for the ith fixed variable.
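The fixed-effect layout can be traced with a few lines of MATLAB. This is a minimal sketch of the recurrence for the fixed-effect offsets as described above, using the fixed variables from the Example section; the variable names L, F and idx are illustrative only.

```matlab
% Fixed-effect specification taken from the Example section
levels = [1; 4; 3; 2; 3];
fvid = [3; 4; 5];
fint = 1;

L = levels(fvid);                 % L(i) = levels(fvid(i))
nfv = numel(fvid);

% F(1) = 2 if a fixed intercept is present, otherwise 1
F = zeros(nfv + 1, 1);
F(1) = 1 + fint;
for i = 1:nfv
  F(i+1) = F(i) + max(L(i) - 1, 1);
end

% Positions in b occupied by each fixed variable
for i = 1:nfv
  if L(i) > 1
    idx = F(i) + (2:L(i)) - 2;    % levels j = 2, ..., L(i)
  else
    idx = F(i);                   % continuous or binary variable
  end
  fprintf('fixed variable %d -> b(%s)\n', i, mat2str(idx));
end
```

Here the last offset minus one, F(nfv+1) − 1, gives the number of fixed-effect slots implied by the recurrence, which can be compared with the returned nff.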
Random effects
Redefine $L_i$ to denote the number of levels associated with the ith random variable, that is $L_i$ = levels(rvid(i)). Define
  • if rint = 1, $R_1 = 2$, else if rint = 0, $R_1 = 1$;
  • $R_{i+1} = R_i + L_i$, $i \ge 1$.
Then for i = 1, 2, …, nrv:
  • if svid = 0,
    • if $L_i > 1$, b(nff + $R_i$ + j − 1) contains the parameter estimate for the jth level of the ith random variable, for j = 1, 2, …, $L_i$;
    • if $L_i \le 1$, b(nff + $R_i$) contains the parameter estimate for the ith random variable;
  • if svid ≠ 0,
    • let $L_S$ denote the number of levels associated with the subject variable, that is $L_S$ = levels(svid);
    • if $L_i > 1$, b(nff + (s − 1)$L_S$ + $R_i$ + j − 1) contains the parameter estimate for the interaction between the sth level of the subject variable and the jth level of the ith random variable, for s = 1, 2, …, $L_S$ and j = 1, 2, …, $L_i$;
    • if $L_i \le 1$, b(nff + (s − 1)$L_S$ + $R_i$) contains the parameter estimate for the interaction between the sth level of the subject variable and the ith random variable, for s = 1, 2, …, $L_S$;
    • if rint = 1, b(nff + 1) contains the estimate of the random intercept.
6:     se(lb) – double array
The standard errors of the parameter estimates given in b.
7:     gamma(nvpr+2) – double array
gamma(i), for i = 1, 2, …, g, holds the final estimate of $\sigma_i^2$ and gamma(g+1) holds the final estimate for $\sigma_R^2$.
8:     warn – int64 scalar
Is set to 1 if a variance component was estimated to be a negative value during the fitting process. Otherwise warn is set to 0.
If warn = 1, the negative estimate is set to zero and the estimation process is allowed to continue.
9:     ifail – int64 scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
On entry, n < 2,
or ncol < 1,
or lddat < n,
or yvid < 1 or yvid > ncol,
or cwid < 0 or cwid > ncol,
or nfv < 0 or nfv ≥ ncol,
or fint ≠ 0 and fint ≠ 1,
or nrv < 0 or nrv > ncol or nrv + rint < 1,
or nvpr < 0 or nvpr > nrv,
or rint ≠ 0 and rint ≠ 1,
or svid < 0 or svid > ncol,
or lb is too small.
   ifail=2
On entry, levels(i) < 1, for at least one i,
or fvid(i) < 1 or fvid(i) > ncol, for at least one i,
or rvid(i) < 1 or rvid(i) > ncol, for at least one i,
or vpr(i) < 1 or vpr(i) > nvpr, for at least one i,
or at least one discrete variable in array dat has a value greater than that specified in levels,
or gamma(i) < 0, for at least one i, and gamma(1) ≠ −1.
   ifail=3
Degrees of freedom < 1. The number of parameters exceeds the effective number of observations.
   ifail=4
The function failed to converge to the specified tolerance in maxit iterations. See Further Comments for advice.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

The accuracy of the results can be adjusted through the use of the tol argument.

Further Comments

Wherever possible any block structure present in the design matrix Z should be modelled through a subject variable, specified via svid, rather than being explicitly entered into dat.
nag_correg_mixeff_ml (g02jb) uses an iterative process to fit the specified model, and for some problems this process may fail to converge (see ifail=4). If the function fails to converge, the maximum number of iterations (see maxit) or the tolerance (see tol) may need to be increased, or a different starting estimate can be supplied in gamma. Alternatively, the model can be fitted using restricted maximum likelihood (see nag_correg_mixeff_reml (g02ja)) or using the noniterative MIVQUE0.
To fit the model just using MIVQUE0, the first element of gamma should be set to -1 and maxit should be set to zero.
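A MIVQUE0-only fit could be requested as sketched below. The call follows the Syntax section, with the data and model settings rebuilt from the Example section; the names gamma0 and gammaOut are illustrative. The call itself requires the NAG Toolbox and has not been run here, so it is guarded and the snippet only prepares the arguments when g02jb is unavailable.

```matlab
% Data from the Example section, rebuilt compactly:
% columns are y, subject, and three factors.
y    = [56 50 39 30 36 33 32 31 15 30 35 17 ...
        41 36 35 25 28 30 24 27 19 25 30 18]';
subj = repmat(kron((1:4)', ones(3, 1)), 2, 1);
f1   = repmat((1:3)', 8, 1);
f2   = kron((1:2)', ones(12, 1));
f3   = [ones(12, 1); repmat((1:3)', 4, 1)];
dat  = [y, subj, f1, f2, f3];

% Model information, as in the Example section
levels = int64([1; 4; 3; 2; 3]);
yvid = int64(1);  fvid = int64([3; 4; 5]);  rvid = int64(3);
svid = int64(2);  cwid = int64(0);
fint = int64(1);  rint = int64(1);
vpr  = int64(1);  nvpr = int64(1);
lb   = int64(22);          % 6 fixed + 16 random effects for this model

gamma0 = [-1; 0; 0];       % gamma(1) = -1: initial values from MIVQUE0
maxit  = int64(0);         % zero iterations: stop at the MIVQUE0 values

if ~isempty(which('g02jb'))
  [nff, nrf, df, ml, b, se, gammaOut, warn, ifail] = ...
      g02jb(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, ...
            dat, fint, rint, lb, gamma0, 'maxit', maxit);
else
  fprintf('g02jb not found: the NAG Toolbox is needed to run the fit.\n');
end
```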
Although the quasi-Newton algorithm used in nag_correg_mixeff_ml (g02jb) tends to require more iterations before converging than the Newton–Raphson algorithm recommended by Wolfinger et al. (1994), it does not require the second derivatives of the likelihood function to be calculated and consequently takes significantly less time per iteration.

Example

The following dataset is taken from Stroup (1989) and arises from a balanced split-plot design with the whole plots arranged in a randomized complete block design.
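In this example the full design matrix for the random independent variable, $Z$, is given by:
$$Z = \left(\begin{array}{*{16}{c}}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1
\end{array}\right)
= \begin{pmatrix}
A & 0 & 0 & 0 \\
0 & A & 0 & 0 \\
0 & 0 & A & 0 \\
0 & 0 & 0 & A \\
A & 0 & 0 & 0 \\
0 & A & 0 & 0 \\
0 & 0 & A & 0 \\
0 & 0 & 0 & A
\end{pmatrix}, \tag{1}$$
where
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$
The block structure evident in (1) is modelled by specifying a four-level subject variable, taking the values 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4. The first column of 1s is added to $A$ by setting rint = 1. The remaining columns of $A$ are specified by a three-level factor, taking the values 1, 2, 3, 1, 2, 3, 1, ….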
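The correspondence between the subject variable, the three-level factor and the block form in (1) can be checked numerically. The sketch below builds Z explicitly from the two variables and compares it with the stacked block-diagonal arrangement of A; it is for illustration only, since the routine does not need Z to be formed, and the variable names are hypothetical.

```matlab
% Subject variable (4 levels) and three-level factor, as described above
subj = repmat(kron((1:4)', ones(3, 1)), 2, 1);
fac  = repmat((1:3)', 8, 1);
n = numel(subj);

% Indicator (dummy) columns for each variable
S = double(bsxfun(@eq, subj, 1:4));   % n-by-4 subject indicators
F = double(bsxfun(@eq, fac, 1:3));    % n-by-3 factor indicators

% Each row of Z is the interaction of the subject indicator with
% [random intercept, factor indicators] (the rint = 1 case)
Z = zeros(n, 16);
for i = 1:n
  Z(i, :) = kron(S(i, :), [1, F(i, :)]);
end

% Compare with the block form in (1)
A = [1 1 0 0; 1 0 1 0; 1 0 0 1];
B = blkdiag(A, A, A, A);
fprintf('Z matches the block form: %d\n', isequal(Z, [B; B]));
```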
function g02jb_example


fprintf('g02jb example results\n\n');

dat = [56, 1, 1, 1, 1;
       50, 1, 2, 1, 1;
       39, 1, 3, 1, 1;
       30, 2, 1, 1, 1;
       36, 2, 2, 1, 1;
       33, 2, 3, 1, 1;
       32, 3, 1, 1, 1;
       31, 3, 2, 1, 1;
       15, 3, 3, 1, 1;
       30, 4, 1, 1, 1;
       35, 4, 2, 1, 1;
       17, 4, 3, 1, 1;
       41, 1, 1, 2, 1;
       36, 1, 2, 2, 2;
       35, 1, 3, 2, 3;
       25, 2, 1, 2, 1;
       28, 2, 2, 2, 2;
       30, 2, 3, 2, 3;
       24, 3, 1, 2, 1;
       27, 3, 2, 2, 2;
       19, 3, 3, 2, 3;
       25, 4, 1, 2, 1;
       30, 4, 2, 2, 2;
       18, 4, 3, 2, 3];
[n,ncol] = size(dat);

% Number of levels in each variable
levels = [int64(1);4;3;2;3];

% Model information
yvid =  int64(1);
fvid = [int64(3);  4;  5];
rvid = [int64(3)];
svid =  int64(2);
cwid =  int64(0);
fint =  int64(1);
rint =  int64(1);

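% Calculate lb (a continuous or binary fixed variable, with levels = 1,
% still needs one element of b, hence max(levels - 1, 1))
lb = (rint + sum(levels(rvid)))*prod(levels(svid)) + ...
     fint + sum(max(levels(fvid) - 1, 1));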

% Variance component
vpr = [int64(1)];
nvpr = int64(numel(vpr));

% Initial gamma
gamma = [1; 1; 0];

% Fit the linear mixed effects regression model
[nff, nrf, df, ml, b, se, gamma, warn, ifail] = ...
g02jb( ...
       nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, ...
       dat, fint, rint, lb, gamma);

% Display results
if warn
   fprintf(['Warning: At least one variance component was ', ...
            'estimated to be negative and then reset to zero\n\n']);
end
fprintf('Fixed effects (Estimate and Standard Deviation)\n\n');
k = 1;
if fint==1
  fprintf('Intercept%15s%10.4f%10.4f\n', ' ',b(k), se(k));
  k = k + 1;
end
for i = 1:numel(fvid)
  for j = 1:levels(fvid(i))
    if levels(fvid(i))==1 || j>1
      fprintf('Variable %3d Level %3d: %10.4f%10.4f\n', i, j, b(k), se(k));
      k = k + 1;
    end
  end
end
fprintf('\nRandom Effects (Estimate and Standard Deviation\n\n');
if svid==0
  for i = 1:numel(rvid)
    for j = 1:levels(rvid(i))
      fprintf('Variable %4d Level %4d: %10.4f %10.4f\n', i, j, b(k), se(k));
      k = k + 1;
    end
  end
else
  for l = 1:levels(svid)
    if (rint==1)
      fprintf('Intercept for Subject Level %4d:%12s%10.4f%10.4f\n', ...
              l, ' ', b(k), se(k));
      k = k + 1;
    end
    for i = 1:numel(rvid)
      for j = 1:levels(rvid(i))
        fprintf('Subject Level %4d Variable %4d Level %4d: %10.4f%10.4f\n', ...
               l, i, j, b(k), se(k));
        k = k + 1;
      end
    end
  end
end
fprintf('\n Variance Components\n');
for i = 1:nvpr+rint
  fprintf('%4d%10.4f\n',i,gamma(i));
end
fprintf('\nsigma^2           = %10.4f\n', gamma(nvpr+rint+1));
fprintf('-2log likelihood  = %10.4f\n', ml);
fprintf('DF                = %16d\n', df);


g02jb example results

Fixed effects (Estimate and Standard Deviation)

Intercept                  37.0000    4.0421
Variable   1 Level   2:     1.0000    3.0461
Variable   1 Level   3:   -11.0000    3.0461
Variable   2 Level   2:    -8.2500    1.8736
Variable   3 Level   2:     0.5000    2.6497
Variable   3 Level   3:     7.7500    2.6497

Random Effects (Estimate and Standard Deviation

Intercept for Subject Level    1:               10.7631    3.8855
Subject Level    1 Variable    1 Level    1:     3.7276    2.6268
Subject Level    1 Variable    1 Level    2:    -1.4476    2.6268
Subject Level    1 Variable    1 Level    3:     0.3733    2.6268
Intercept for Subject Level    2:               -0.5269    3.8855
Subject Level    2 Variable    1 Level    1:    -3.7171    2.6268
Subject Level    2 Variable    1 Level    2:    -1.2253    2.6268
Subject Level    2 Variable    1 Level    3:     4.8125    2.6268
Intercept for Subject Level    3:               -5.6450    3.8855
Subject Level    3 Variable    1 Level    1:     0.5903    2.6268
Subject Level    3 Variable    1 Level    2:     0.3987    2.6268
Subject Level    3 Variable    1 Level    3:    -2.3806    2.6268
Intercept for Subject Level    4:               -4.5912    3.8855
Subject Level    4 Variable    1 Level    1:    -0.6009    2.6268
Subject Level    4 Variable    1 Level    2:     2.2742    2.6268
Subject Level    4 Variable    1 Level    3:    -2.8052    2.6268

 Variance Components
   1   46.7969
   2   11.5365

sigma^2           =     7.0208
-2log likelihood  =   141.6877
DF                =               16


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015