NAG Library Routine Document

g02jaf (mixeff_reml)

1
Purpose

g02jaf fits a linear mixed effects regression model using restricted maximum likelihood (REML).

2
Specification

Fortran Interface
Subroutine g02jaf ( n, ncol, lddat, dat, levels, yvid, cwid, nfv, fvid, fint, nrv, rvid, nvpr, vpr, rint, svid, gamma, nff, nrf, df, reml, lb, b, se, maxit, tol, warn, ifail)
Integer, Intent (In):: n, ncol, lddat, levels(ncol), yvid, cwid, nfv, fvid(nfv), fint, nrv, rvid(nrv), nvpr, vpr(nrv), rint, svid, lb, maxit
Integer, Intent (Inout):: ifail
Integer, Intent (Out):: nff, nrf, df, warn
Real (Kind=nag_wp), Intent (In):: dat(lddat,ncol), tol
Real (Kind=nag_wp), Intent (Inout):: gamma(nvpr+2)
Real (Kind=nag_wp), Intent (Out):: reml, b(lb), se(lb)
C Header Interface
#include <nagmk26.h>
void  g02jaf_ (const Integer *n, const Integer *ncol, const Integer *lddat, const double dat[], const Integer levels[], const Integer *yvid, const Integer *cwid, const Integer *nfv, const Integer fvid[], const Integer *fint, const Integer *nrv, const Integer rvid[], const Integer *nvpr, const Integer vpr[], const Integer *rint, const Integer *svid, double gamma[], Integer *nff, Integer *nrf, Integer *df, double *reml, const Integer *lb, double b[], double se[], const Integer *maxit, const double *tol, Integer *warn, Integer *ifail)

3
Description

g02jaf fits a model of the form:
y=Xβ+Zν+ε  
where and
Both ν  and ε  are assumed to have a Gaussian distribution with expectation zero and
Var ν ε = G 0 0 R  
where R= σ R 2 I , I  is the n×n  identity matrix and G  is a diagonal matrix. It is assumed that the random variables, Z , can be subdivided into g q  groups with each group being identically distributed with expectations zero and variance σi2 . The diagonal elements of matrix G  therefore take one of the values σi2 : i=1,2,,g , depending on which group the associated random variable belongs to.
The model therefore contains three sets of unknowns, the fixed effects, β , the random effects ν  and a vector of g+1  variance components, γ , where γ = σ12,σ22,, σ g-1 2 ,σg2,σR2 . Rather than working directly with γ , g02jaf uses an iterative process to estimate γ* = σ12 / σR2 , σ22 / σR2 ,, σg-12 / σR2 , σg2 / σR2 ,1 . Due to the iterative nature of the estimation a set of initial values, γ0 , for γ*  is required. g02jaf allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972).
g02jaf fits the model using a quasi-Newton algorithm to maximize the restricted log-likelihood function:
-2 l R = log V + n-p log r V-1 r + log X V-1 X + n-p 1+ log 2 π / n-p  
where
V = ZG Z + R,   r=y-Xb   and   b = X V-1 X -1 X V-1 y .  
Once the final estimates for γ *  have been obtained, the value of σR2  is given by:
σR2 = r V-1 r / n - p .  
Case weights, Wc , can be incorporated into the model by replacing XX  and ZZ  with XWcX  and ZWcZ  respectively, for a diagonal weight matrix Wc .
The log-likelihood, lR, is calculated using the sweep algorithm detailed in Wolfinger et al. (1994).

4
References

Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems JASA 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310

5
Arguments

1:     n – IntegerInput
On entry: n, the number of observations.
Constraint: n1.
2:     ncol – IntegerInput
On entry: the number of columns in the data matrix, dat.
Constraint: ncol1.
3:     lddat – IntegerInput
On entry: the first dimension of the array dat as declared in the (sub)program from which g02jaf is called.
Constraint: lddatn.
4:     datlddatncol – Real (Kind=nag_wp) arrayInput
On entry: array containing all of the data. For the ith observation:
  • datiyvid holds the dependent variable, y;
  • if cwid0, daticwid holds the case weights;
  • if svid0, datisvid holds the subject variable.
The remaining columns hold the values of the independent variables.
Constraints:
  • if cwid0, daticwid0.0;
  • if levelsj1, 1datijlevelsj.
5:     levelsncol – Integer arrayInput
On entry: levelsi contains the number of levels associated with the ith variable of the data matrix dat. If this variable is continuous or binary (i.e., only takes the values zero or one) then levelsi should be 1; if the variable is discrete then levelsi is the number of levels associated with it and datji is assumed to take the values 1 to levelsi, for j=1,2,,n.
Constraint: levelsi1, for i=1,2,,ncol.
6:     yvid – IntegerInput
On entry: the column of dat holding the dependent, y, variable.
Constraint: 1yvidncol.
7:     cwid – IntegerInput
On entry: the column of dat holding the case weights.
If cwid=0, no weights are used.
Constraint: 0cwidncol.
8:     nfv – IntegerInput
On entry: the number of independent variables in the model which are to be treated as being fixed.
Constraint: 0nfv<ncol.
9:     fvidnfv – Integer arrayInput
On entry: the columns of the data matrix dat holding the fixed independent variables with fvidi  holding the column number corresponding to the i th fixed variable.
Constraint: 1fvidincol, for i=1,2,,nfv.
10:   fint – IntegerInput
On entry: flag indicating whether a fixed intercept is included (fint=1).
Constraint: fint=0 or 1.
11:   nrv – IntegerInput
On entry: the number of independent variables in the model which are to be treated as being random.
Constraints:
  • 0nrv<ncol;
  • nrv+rint>0.
12:   rvidnrv – Integer arrayInput
On entry: the columns of the data matrix dat holding the random independent variables with rvidi  holding the column number corresponding to the i th random variable.
Constraint: 1rvidincol, for i=1,2,,nrv.
13:   nvpr – IntegerInput
On entry: if rint=1 and svid0, nvpr is the number of variance components being estimated-2, (g-1), else nvpr=g.
If nrv=0, nvpr is not referenced.
Constraint: if nrv0, 1nvprnrv.
14:   vprnrv – Integer arrayInput
On entry: vpri  holds a flag indicating the variance of the i th random variable. The variance of the i th random variable is σ j 2 , where j = vpri + 1  if rint=1 and svid0 and j = vpri  otherwise. Random variables with the same value of j are assumed to be taken from the same distribution.
Constraint: 1vprinvpr, for i=1,2,,nrv.
15:   rint – IntegerInput
On entry: flag indicating whether a random intercept is included (rint=1).
If svid=0, rint is not referenced.
Constraint: rint=0 or 1.
16:   svid – IntegerInput
On entry: the column of dat holding the subject variable.
If svid=0, no subject variable is used.
Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random-effects. Letting the notation Z1 × ZS  denote the interaction between variables Z1  and ZS , fitting a model with rint = 0 , random-effects Z1 + Z2  and subject variable ZS  is equivalent to fitting a model with random-effects Z1 × ZS + Z2 × ZS  and no subject variable. If rint = 1  the model is equivalent to fitting ZS + Z1 × ZS + Z2 × ZS  and no subject variable.
Constraint: 0svidncol.
17:   gammanvpr+2 – Real (Kind=nag_wp) arrayInput/Output
On entry: holds the initial values of the variance components, γ0 , with gammai the initial value for σi2/σR2, for i=1,2,,g. If rint=1 and svid0, g=nvpr+1, else g=nvpr.
If gamma1=-1.0, the remaining elements of gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
On exit: gammai, for i=1,2,,g, holds the final estimate of σi2 and gammag+1 holds the final estimate for σR2.
Constraint: gamma1=-1.0  or  gammai0.0, for i=1,2,,g.
18:   nff – IntegerOutput
On exit: the number of fixed effects estimated (i.e., the number of columns, p, in the design matrix X).
19:   nrf – IntegerOutput
On exit: the number of random effects estimated (i.e., the number of columns, q, in the design matrix Z).
20:   df – IntegerOutput
On exit: the degrees of freedom.
21:   reml – Real (Kind=nag_wp)Output
On exit: - 2 lR γ^  where lR  is the log of the restricted maximum likelihood calculated at γ^ , the estimated variance components returned in gamma.
22:   lb – IntegerInput
On entry: the size of the array b.
Constraint: lb fint + i=1 nfv maxlevelsfvidi-1,1 + LS × rint + i=1 nrv levelsrvidi  where LS = levelssvid  if svid0 and 1 otherwise.
23:   blb – Real (Kind=nag_wp) arrayOutput
On exit: the parameter estimates, β,ν, with the first nff elements of b containing the fixed effect parameter estimates, β and the next nrf elements of b containing the random effect parameter estimates, ν.
Fixed effects
If fint=1, b1 contains the estimate of the fixed intercept. Let Li  denote the number of levels associated with the ith fixed variable, that is Li = levels fvidi . Define
  • if fint=1, F1 = 2  else if fint=0, F1=1 ;
  • F i+1 = Fi + maxLi-1,1 , i1 .
Then for i=1,2,,nfv:
  • if Li > 1 , b Fi+j-2  contains the parameter estimate for the jth level of the ith fixed variable, for j=2,3,,Li;
  • if Li 1 , bFi  contains the parameter estimate for the ith fixed variable.
Random effects
Redefining Li  to denote the number of levels associated with the ith random variable, that is Li = levels rvidi . Define
  • if rint=1, R1 = 2  else if rint=0, R1=1 ;
    R i+1 = Ri + Li , i1 .
Then for i = 1 , 2 , , nrv :
  • if svid=0,
    • if Li > 1 , b nff + Ri +j-1  contains the parameter estimate for the jth level of the ith random variable, for j=1,2,,Li;
    • if Li 1 , b nff + Ri  contains the parameter estimate for the ith random variable;
  • if svid 0 ,
    • let LS  denote the number of levels associated with the subject variable, that is LS = levels svid ;
    • if Li > 1 , b nff + s-1 LS + Ri + j - 1  contains the parameter estimate for the interaction between the sth level of the subject variable and the jth level of the ith random variable, for s=1,2,,LS and j=1,2,,Li;
    • if Li 1 , b nff + s-1 LS + Ri  contains the parameter estimate for the interaction between the sth level of the subject variable and the ith random variable, for s=1,2,,LS;
    • if rint=1, bnff+1  contains the estimate of the random intercept.
24:   selb – Real (Kind=nag_wp) arrayOutput
On exit: the standard errors of the parameter estimates given in b.
25:   maxit – IntegerInput
On entry: the maximum number of iterations.
If maxit < 0 , the default value of 100  is used.
If maxit=0, the parameter estimates β,ν  and corresponding standard errors are calculated based on the value of γ0  supplied in gamma.
26:   tol – Real (Kind=nag_wp)Input
On entry: the tolerance used to assess convergence.
If tol0.0, the default value of ε0.7 is used, where ε is the machine precision.
27:   warn – IntegerOutput
On exit: is set to 1  if a variance component was estimated to be a negative value during the fitting process. Otherwise warn is set to 0 .
If warn=1, the negative estimate is set to zero and the estimation process allowed to continue.
28:   ifail – IntegerInput/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this argument, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, cwid=value and ncol=value.
Constraint: 0cwidncol and any supplied weights must be 0.0.
On entry, fint=value.
Constraint: fint=0 or 1.
On entry, lb too small: lb=value.
On entry, lddat=value and n=value.
Constraint: lddatn.
On entry, n=value.
Constraint: n1.
On entry, n=value.
Constraint: number of observations with nonzero weights must be greater than one.
On entry, ncol=value.
Constraint: ncol1.
On entry, nfv=value and ncol=value.
Constraint: 0nfv<ncol.
On entry, nrv=value and ncol=value.
Constraint: 0nrv<ncol and nrv+rint>0.
On entry, nvpr=value and nrv=value.
Constraint: 0nvprnrv and (nrv0 or nvpr1).
On entry, rint=value.
Constraint: rint=0 or 1.
On entry, svid=value and ncol=value.
Constraint: 0svidncol.
On entry, yvid=value and ncol=value.
Constraint: 1yvidncol.
ifail=2
On entry, gammai<0.0, for at least one i.
On entry, invalid data: categorical variable with value greater than that specified in levels.
On entry, levelsi<1, for at least one i.
On entry, ncol=value.
Constraint: 1fvidincol, for all i.
On entry, ncol=value.
Constraint: 1rvidincol, for all i.
On entry, nvpr=value.
Constraint: 1vprinvpr, for all i.
ifail=3
Degrees of freedom <1: df=value.
This is due to the number of parameters exceeding the effective number of observations.
ifail=4
Routine failed to converge in maxit iterations: maxit=value.
See Section 10 for advice.
Routine failed to converge to specified tolerance: tol=value.
See Section 10 for advice.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.

7
Accuracy

The accuracy of the results can be adjusted through the use of the tol argument.

8
Parallelism and Performance

g02jaf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02jaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9
Further Comments

Wherever possible any block structure present in the design matrix Z should be modelled through a subject variable, specified via svid, rather than being explicitly entered into dat.
g02jaf uses an iterative process to fit the specified model and for some problems this process may fail to converge (see ifail=4). If the routine fails to converge then the maximum number of iterations (see maxit) or tolerance (see tol) may require increasing; try a different starting estimate in gamma. Alternatively, the model can be fit using maximum likelihood (see g02jbf) or using the noniterative MIVQUE0.
To fit the model just using MIVQUE0, the first element of gamma should be set to -1.0 and maxit should be set to zero.
Although the quasi-Newton algorithm used in g02jaf tends to require more iterations before converging compared to the Newton–Raphson algorithm recommended by Wolfinger et al. (1994), it does not require the second derivatives of the likelihood function to be calculated and consequentially takes significantly less time per iteration.

10
Example

The following dataset is taken from Stroup (1989) and arises from a balanced split-plot design with the whole plots arranged in a randomized complete block-design.
In this example the full design matrix for the random independent variable, Z , is given by:
Z = 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1  
= A 0 0 0 0 A 0 0 0 0 A 0 0 0 0 A A 0 0 0 0 A 0 0 0 0 A 0 0 0 0 A , (1)
where
A = 1 1 0 0 1 0 1 0 1 0 0 1 .  
The block structure evident in (1) is modelled by specifying a four-level subject variable, taking the values 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 4 , 4 , 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 4 , 4 . The first column of 1s  is added to A  by setting rint=1. The remaining columns of A are specified by a three level factor, taking the values, 1 , 2 , 3 , 1 , 2 , 3 , 1 , .

10.1
Program Text

Program Text (g02jafe.f90)

10.2
Program Data

Program Data (g02jafe.d)

10.3
Program Results

Program Results (g02jafe.r)