where
$R={\sigma}_{R}^{2}I$, $I$ is the $n\times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g\le q$ groups with each group being identically distributed with expectation zero and variance ${\sigma}_{i}^{2}$. The diagonal elements of matrix $G$, therefore, take one of the values $\{{\sigma}_{i}^{2}:i=1,2,\dots ,g\}$, depending on which group the associated random variable belongs to.
The model, therefore, contains three sets of unknowns: the fixed effects $\beta $, the random effects $\nu $ and a vector of $g+1$ variance components $\gamma $, where
$\gamma =\{{\sigma}_{1}^{2},{\sigma}_{2}^{2},\dots ,{\sigma}_{g-1}^{2},{\sigma}_{g}^{2},{\sigma}_{R}^{2}\}$.
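The structure of $G$ described above can be sketched in a few lines of C. This is an illustrative helper only, not part of the NAG interface: the function name, the 0-based `group` array and the `sigma2` array are all assumptions introduced for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Fill g_diag[0..q-1] with the diagonal of G.
 * group[j]  : hypothetical 0-based group index of the j-th random effect
 * sigma2[i] : variance of group i, for i = 0..g-1 */
void build_g_diagonal(size_t q, size_t g, const size_t *group,
                      const double *sigma2, double *g_diag)
{
    for (size_t j = 0; j < q; ++j) {
        assert(group[j] < g); /* every effect must belong to a valid group */
        g_diag[j] = sigma2[group[j]];
    }
}
```

With $q=5$ effects in $g=3$ groups, each diagonal element is simply the variance of the group that the effect belongs to.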
Case weights can be incorporated into the model by replacing $X$ and $Z$ with ${W}_{c}^{1/2}X$ and ${W}_{c}^{1/2}Z$ respectively where ${W}_{c}$ is a diagonal weight matrix.
The design matrices, $X$ and $Z$, are constructed from an $n\times {m}_{d}$ data matrix, $D$, a description of the fixed independent variables, ${\mathcal{M}}_{f}$, and a description of the random independent variables, ${\mathcal{M}}_{r}$. See Section 11 for further details.
4 References
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310
5 Arguments
1: $\mathbf{hlmm}$ – void ** Input/Output
On entry: must be set to NULL or, alternatively, an existing G22 handle may be supplied in which case g02jfc will destroy the supplied G22 handle as if g22zac had been called.
On exit: holds a G22 handle to the internal data structure containing a description of the model. You must not change the G22 handle other than through the functions in Chapters G02 or G22.
2: $\mathbf{hddesc}$ – void * Input
On entry: a G22 handle to the internal data structure containing a description of the data matrix, $D$, as returned in hddesc by g22ybc.
3: $\mathbf{hfixed}$ – void * Input
On entry: a G22 handle to the internal data structure containing a description of the fixed part of the model ${\mathcal{M}}_{f}$ as returned in hform by g22yac.
If hfixed is NULL then the model is assumed not to have a fixed part.
4: $\mathbf{nrndm}$ – Integer Input
On entry: the number of elements used to describe the random part of the model.
5: $\mathbf{hrndm}\left[{\mathbf{nrndm}}\right]$ – void * Input
On entry: a series of G22 handles to internal data structures containing a description of the random part of the model ${\mathcal{M}}_{r}$ as returned in hform by g22yac. If ${\mathbf{nrndm}}=0$, hrndm is not referenced and may be NULL.
6: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of observations in the dataset, $D$.
Constraint:
$1\le {\mathbf{n}}\le {n}_{d}$, where ${n}_{d}$ is the value supplied in nobs when hddesc was created.
On entry: optionally, the diagonal elements of the weight matrix ${W}_{c}$.
If ${\mathbf{wt}}\left[i-1\right]=0.0$, the $i$th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.
If weights are not provided then wt must be set to NULL, and the effective number of observations is $n$.
Constraint:
if ${\mathbf{wt}}\phantom{\rule{0.25em}{0ex}}\text{is not}\phantom{\rule{0.25em}{0ex}}\mathbf{NULL}$, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$
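The rule for the effective number of observations can be sketched as follows. The helper name is hypothetical and the sketch only covers the weight rule stated above; it does not account for observations dropped because of missing values.

```c
#include <assert.h>
#include <stddef.h>

/* Count the observations that enter the model under the weight rule:
 * wt == NULL means no weights were supplied, so all n observations count;
 * otherwise only observations with a nonzero weight count. */
size_t effective_nobs(size_t n, const double *wt)
{
    assert(n >= 1); /* mirrors the constraint 1 <= n */
    if (wt == NULL)
        return n;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (wt[i] != 0.0)
            ++count;
    return count;
}
```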
Note: the $(i,j)$th element of the matrix is stored in ${\mathbf{dat}}\left[(j-1)\times {\mathbf{pddat}}+i-1\right]$.
On entry: the data matrix, $D$. By default,
${D}_{ij}$, the $\mathit{i}$th value for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,{m}_{d}$, should be supplied in ${\mathbf{dat}}\left[\left(j-1\right)\times {\mathbf{pddat}}+i-1\right]$.
If the optional parameter ${\mathbf{Storage\; Order}}$, described in g22ybc, is set to $\mathrm{VAROBS}$, ${D}_{ij}$ should be supplied in ${\mathbf{dat}}\left[\left(i-1\right)\times {\mathbf{pddat}}+j-1\right]$.
If any of ${y}_{i}$, ${w}_{i}$ or ${D}_{ij}$, for a variable $j$ used in the model, is NaN (Not A Number) then that value is treated as missing and the whole observation is excluded from the analysis.
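The two storage layouts and the missing-value rule can be illustrated with a small sketch. The function names and the `varobs` flag are assumptions made for the example. For simplicity the check scans every variable of an observation, whereas the text above only requires checking variables used in the model.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of D_ij in dat for 1-based i (observation) and j (variable).
 * varobs != 0 selects the layout used when Storage Order is VAROBS. */
size_t dat_index(size_t i, size_t j, size_t pddat, int varobs)
{
    assert(i >= 1 && j >= 1);
    return varobs ? (i - 1) * pddat + (j - 1)
                  : (j - 1) * pddat + (i - 1);
}

/* Return 1 if any of the md values of observation i is NaN, else 0. */
int observation_has_missing(const double *dat, size_t i, size_t md,
                            size_t pddat, int varobs)
{
    for (size_t j = 1; j <= md; ++j) {
        double v = dat[dat_index(i, j, pddat, varobs)];
        if (v != v) /* NaN is the only value that differs from itself */
            return 1;
    }
    return 0;
}
```

Note that the default layout needs a stride of at least $n$, while the VAROBS layout needs a stride of at least ${m}_{d}$.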
10: $\mathbf{pddat}$ – Integer Input
On entry: the stride separating matrix row elements in the array dat.
Constraints:
if the optional parameter ${\mathbf{Storage\; Order}}$, described in g22ybc, is set to $\mathrm{VAROBS}$, ${\mathbf{pddat}}\ge {m}_{d}$;
otherwise ${\mathbf{pddat}}\ge n$.
11: $\mathbf{sddat}$ – Integer Input
Constraints:
if the optional parameter ${\mathbf{Storage\; Order}}$, described in g22ybc, is set to $\mathrm{VAROBS}$, ${\mathbf{sddat}}\ge n$;
otherwise ${\mathbf{sddat}}\ge {m}_{d}$.
12: $\mathbf{fnlsv}$ – Integer * Output
On exit: the number of levels for the overall subject variable in ${\mathcal{M}}_{f}$. If there is no overall subject variable, ${\mathbf{fnlsv}}=1$.
13: $\mathbf{nff}$ – Integer * Output
On exit: the number of fixed effects estimated in each of the fnlsv subject blocks. The number of columns, $p$, in the design matrix $X$ is given by $p={\mathbf{nff}}\times {\mathbf{fnlsv}}$.
14: $\mathbf{rnlsv}$ – Integer * Output
On exit: the number of levels for the overall subject variable in ${\mathcal{M}}_{r}$. If there is no overall subject variable, ${\mathbf{rnlsv}}=1$.
15: $\mathbf{nrf}$ – Integer * Output
On exit: the number of random effects estimated in each of the rnlsv subject blocks. The number of columns, $q$, in the design matrix $Z$ is given by $q={\mathbf{nrf}}\times {\mathbf{rnlsv}}$.
16: $\mathbf{nvpr}$ – Integer * Output
On exit: $g$, the number of variance components being estimated (excluding the overall variance, ${\sigma}_{R}^{2}$). This is defined by the number of terms in the random part of the model, ${\mathcal{M}}_{r}$ (see Section 11 for details).
On exit: a communication array as required by the functions g02jgc or g02jhc.
If licomm or lrcomm are too small and ${\mathbf{licomm}}\ge 2$, then ${\mathbf{fail}}\mathbf{.}\mathbf{code}=$NE_ARRAY_SIZE and ${\mathbf{icomm}}\left[0\right]$ holds the minimum required value for licomm and ${\mathbf{icomm}}\left[1\right]$ holds the minimum required value for lrcomm.
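The size-query behaviour described above suggests a two-pass calling pattern: call once with a two-element icomm to learn the required sizes, then allocate and call again. The sketch below demonstrates the pattern against `mock_setup`, a hypothetical stand-in written for this example. It is not the g02jfc signature, and the required sizes of 40 and 120 are made up.

```c
#include <assert.h>
#include <stdlib.h>

enum { SETUP_OK = 0, SETUP_ARRAY_SIZE = 1 };

/* Hypothetical stand-in that mimics the documented sizing behaviour. */
static int mock_setup(int *icomm, int licomm, double *rcomm, int lrcomm)
{
    const int need_i = 40, need_r = 120; /* made-up requirements */
    (void)rcomm;
    if (licomm < need_i || lrcomm < need_r) {
        if (licomm >= 2) { /* report the minimum sizes */
            icomm[0] = need_i;
            icomm[1] = need_r;
        }
        return SETUP_ARRAY_SIZE;
    }
    return SETUP_OK;
}

/* Pass 1 queries the sizes; pass 2 repeats the call with full arrays.
 * Returns 0 on success; the caller owns and frees the returned arrays. */
int setup_with_query(int **icomm_out, double **rcomm_out,
                     int *licomm_out, int *lrcomm_out)
{
    assert(icomm_out && rcomm_out && licomm_out && lrcomm_out);
    int query[2] = {0, 0};
    if (mock_setup(query, 2, NULL, 0) != SETUP_ARRAY_SIZE)
        return -1;
    int licomm = query[0], lrcomm = query[1];
    int *icomm = malloc((size_t)licomm * sizeof *icomm);
    double *rcomm = malloc((size_t)lrcomm * sizeof *rcomm);
    if (icomm == NULL || rcomm == NULL ||
        mock_setup(icomm, licomm, rcomm, lrcomm) != SETUP_OK) {
        free(icomm);
        free(rcomm);
        return -1;
    }
    *icomm_out = icomm;
    *rcomm_out = rcomm;
    *licomm_out = licomm;
    *lrcomm_out = lrcomm;
    return 0;
}
```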
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
6 Error Indicators and Warnings
NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_ARRAY_SIZE
On entry, ${\mathbf{licomm}}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{lrcomm}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{licomm}}\ge \langle \mathit{\text{value}}\rangle$ and ${\mathbf{lrcomm}}\ge \langle \mathit{\text{value}}\rangle$. icomm is not large enough to hold the minimum array sizes.
On entry, ${m}_{d}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{pddat}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{pddat}}\ge {m}_{d}$.
On entry, ${m}_{d}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{sddat}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{sddat}}\ge {m}_{d}$.
On entry, $n=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{pddat}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{pddat}}\ge n$.
On entry, $n=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{sddat}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{sddat}}\ge n$.
NE_BAD_PARAM
On entry, argument $\langle \mathit{\text{value}}\rangle$ had an illegal value.
NE_FIELD_UNKNOWN
A variable name used when creating hfixed is not present in hddesc. Variable name: $\langle \mathit{\text{value}}\rangle$.
A variable name used when creating hrndm is not present in hddesc. Variable name: $\langle \mathit{\text{value}}\rangle$.
hfixed is not a G22 handle as generated by g22yac.
$i=\langle \mathit{\text{value}}\rangle$. ${\mathbf{hrndm}}\left[i-1\right]$ has not been initialized or is corrupt.
$i=\langle \mathit{\text{value}}\rangle$. ${\mathbf{hrndm}}\left[i-1\right]$ is not a G22 handle as generated by g22yac.
On entry, hlmm is not NULL or a recognised G22 handle.
NE_INT
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n}}\ge 1$.
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle$ and ${n}_{d}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{n}}\le {n}_{d}$, where ${n}_{d}$ is the value supplied in nobs when hddesc was created.
On entry, no observations due to zero weights or missing values.
On entry, ${\mathbf{nrndm}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{nrndm}}\ge 0$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_REAL_ARRAY
On entry, column $j$ of the data matrix, $D$, is not consistent with information supplied in hddesc, $j=\langle \mathit{\text{value}}\rangle$.
On entry, $i=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{wt}}\left[i-1\right]=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{wt}}\left[i-1\right]\ge 0.0$.
NE_ZERO_VARS
No model has been specified.
NW_ARRAY_SIZE
On entry, ${\mathbf{licomm}}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{lrcomm}}=\langle \mathit{\text{value}}\rangle$. Constraint: ${\mathbf{licomm}}\ge \langle \mathit{\text{value}}\rangle$ and ${\mathbf{lrcomm}}\ge \langle \mathit{\text{value}}\rangle$. The minimum array sizes for licomm and lrcomm are held in the first two elements of icomm respectively.
NW_POTENTIAL_PROBLEM
Column $j$ of the data matrix, $D$, required rounding more than expected when being treated as a categorical variable, $j=\langle \mathit{\text{value}}\rangle$.
All output is returned using the rounded value(s).
The fixed part of the model contains categorical variables, but no intercept or main effects terms have been requested.
7 Accuracy
Not applicable.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g02jfc makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
None.
10 Example
This example fits a random effects model with three random submodels and two fixed effects to a simulated dataset with $90$ observations and $12$ variables. The model is fit using maximum likelihood (ML). Standard labels for the parameter estimates and variance components are obtained from g22ydc. See g02jhc for an example of how to construct custom labels.
11.1 Fixed Effects Design Matrix, $\mathit{X}$
The fixed effects design matrix, $X$, is constructed from the data matrix $D$ and ${\mathcal{M}}_{f}$, as encoded in hfixed. Details of the construction are described in Section 3 in g22yac and Section 3 in g22ycc.
It is possible to store the cross-product matrix, ${X}^{\mathrm{T}}X$ in a block diagonal form if ${\mathcal{M}}_{f}$ contains an overall subject effect, ${S}_{f}$. In this context ${S}_{f}$ is defined as a main effect or interaction term that is contained in all other terms. For example, if ${\mathcal{M}}_{f}$ simplifies to ${V}_{1}.{V}_{4}+{V}_{1}.{V}_{2}.{V}_{4}+{V}_{1}.{V}_{2}.{V}_{3}.{V}_{4}$, then ${S}_{f}={V}_{1}.{V}_{4}$. If it is advantageous to do so, g02jfc will make use of this block diagonal structure and fnlsv will be set to the number of levels in ${S}_{f}$, otherwise ${\mathbf{fnlsv}}=1$.
11.2 Random Effects Design Matrix, $\mathit{Z}$
The random effects design matrix, $Z$, is constructed from the data matrix $D$ and ${\mathcal{M}}_{r}$ which is made up of nrndm submodels, ${\mathcal{M}}_{ri}$, where ${\mathcal{M}}_{ri}$ is encoded in ${\mathbf{hrndm}}\left[i-1\right]$. Each submodel is made up of two parts, the random effects and a subject term. The random effects are specified as described in Section 3 in g22yac and the subject term is specified via the g22yac optional parameter ${\mathbf{Subject}}$. The design matrix $Z$ is constructed as described in Section 3 in g22ycc using a model constructed from the nrndm submodels. As an example, if there were $3$ submodels:
-1+V07+V08+V09 / SUBJECT = V13
-1+V05+V06 / SUBJECT = V11.V12
V03+V04 / SUBJECT = V10.V11.V12
then $Z$ would be constructed as if g22ycc was called using the model
V07.V13+V08.V13+V09.V13+V05.V11.V12+V06.V11.V12+V03.V10.V11.V12+V04.V10.V11.V12+V10.V11.V12
It should be noted that unless specified otherwise (by the inclusion of -1) a submodel will contain an intercept. This results in a term corresponding to the subject term being included in the combined model (V10.V11.V12 in this instance).
Each term in the expanded model corresponds to a variance component, so in this case, $g=8$.
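The count $g=8$ can be checked by expanding the three submodels mechanically: each random effect is crossed with its subject term, and a submodel that keeps its intercept also contributes the subject term itself. The sketch below is a simplified illustration with hypothetical names (`submodel`, `expand_submodels`). It assumes each submodel lists only simple effects and it does not reproduce how g22ycc parses formulas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TERM 64

typedef struct {
    int intercept;          /* 0 when the submodel contains -1 */
    size_t neffects;        /* number of random effects listed */
    const char *effects[4]; /* effect names, e.g. "V07" */
    const char *subject;    /* subject term, e.g. "V11.V12" */
} submodel;

/* Write the terms of the combined model into terms[] and return the
 * number of terms written (at most max_terms). */
size_t expand_submodels(const submodel *sm, size_t nsm,
                        char terms[][MAX_TERM], size_t max_terms)
{
    assert(sm != NULL);
    size_t k = 0;
    /* effect-by-subject interaction terms */
    for (size_t s = 0; s < nsm; ++s)
        for (size_t e = 0; e < sm[s].neffects && k < max_terms; ++e)
            snprintf(terms[k++], MAX_TERM, "%s.%s",
                     sm[s].effects[e], sm[s].subject);
    /* an intercept contributes the subject term on its own */
    for (size_t s = 0; s < nsm; ++s)
        if (sm[s].intercept && k < max_terms)
            snprintf(terms[k++], MAX_TERM, "%s", sm[s].subject);
    return k;
}
```

Applied to the three submodels above, this yields seven effect-by-subject terms plus the intercept term V10.V11.V12, giving eight terms in total.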
When constructing $Z$ all contrast information specified when the submodels are constructed in calls to g22yac is ignored and dummy variables are used throughout.
It is possible to store the cross-product matrix, ${Z}^{\mathrm{T}}Z$ in a block diagonal form if ${\mathcal{M}}_{r}$ contains an overall subject effect, ${S}_{r}$. In this context ${S}_{r}$ is defined as a main effect or interaction term that is contained in all other subject terms. For example, if the random effects model is constructed from $3$ submodels with subject terms ${V}_{1}.{V}_{4}$, ${V}_{1}.{V}_{2}.{V}_{4}$ and ${V}_{1}.{V}_{2}.{V}_{3}.{V}_{4}$, then ${S}_{r}={V}_{1}.{V}_{4}$ and rnlsv will be set to the number of levels in ${S}_{r}$, otherwise ${\mathbf{rnlsv}}=1$.
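The definition of an overall subject effect can be read literally as a search for a term that is contained in every other term. The sketch below encodes each term as a bitmask of the variables it involves (bit $k-1$ set for ${V}_{k}$). The encoding and the function name are assumptions for illustration and say nothing about how g02jfc represents terms internally.

```c
#include <assert.h>
#include <stddef.h>

/* Return the bitmask of a term that is a subset of every term in the
 * list, or 0 if no such term exists. */
unsigned overall_subject(const unsigned *terms, size_t nterms)
{
    assert(terms != NULL || nterms == 0);
    for (size_t c = 0; c < nterms; ++c) {
        int contained = 1;
        for (size_t t = 0; t < nterms; ++t) {
            /* terms[c] is contained in terms[t] if no bit is lost */
            if ((terms[c] & terms[t]) != terms[c]) {
                contained = 0;
                break;
            }
        }
        if (contained)
            return terms[c];
    }
    return 0;
}
```

For the subject terms ${V}_{1}.{V}_{4}$, ${V}_{1}.{V}_{2}.{V}_{4}$ and ${V}_{1}.{V}_{2}.{V}_{3}.{V}_{4}$ the masks are 9, 11 and 15, and the function returns 9, the mask of ${V}_{1}.{V}_{4}$.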
12 Optional Parameters
As well as the optional parameters common to all G22 handles described in g22zmc and g22znc, a number of additional optional parameters can be specified for a G22 handle holding the description of a linear mixed model, as returned by g02jfc in hlmm.
Each writeable optional parameter has an associated default value; to set any of them to a non-default value, use g22zmc. The value of any optional parameter can be queried using g22znc.
Most of the optional parameters described in this section are related to the behaviour of g02jhc when fitting the model. These descriptions should, therefore, be read in conjunction with the documentation for that function.
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available. A full description of each optional parameter is provided in Section 12.1.
A lower bound for the elements of ${\gamma}^{*}$, where ${\gamma}^{*}=\gamma /{\sigma}_{R}^{2}$.
Gamma Upper Bound
$r$
Default $\text{}={10}^{20}$
An upper bound for the elements of ${\gamma}^{*}$, where ${\gamma}^{*}=\gamma /{\sigma}_{R}^{2}$.
Initial Distance
$r$
Default $\text{}=100000.0$
The initial distance from the solution.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, g02jhc passes ${\mathbf{Initial\; Distance}}$ to the solver as
stepmx.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, this option is ignored.
Initial Value Strategy
$i$
Default $\text{}=\text{special}$
Controls how g02jhc will choose the initial values for the variance components, $\gamma $, if not supplied.
${\mathbf{Initial\; Value\; Strategy}}=0$
The MIVQUE0 estimates of the variance components based on the likelihood specified by ${\mathbf{Likelihood}}$ are used.
${\mathbf{Initial\; Value\; Strategy}}=1$
The MIVQUE0 estimates based on the maximum likelihood are used, irrespective of the value of ${\mathbf{Likelihood}}$.
See Rao (1972) for a description of the minimum variance quadratic unbiased estimators (MIVQUE0).
By default, for small problems, ${\mathbf{Initial\; Value\; Strategy}}=0$ and for large problems ${\mathbf{Initial\; Value\; Strategy}}=1$.
Constraint:
${\mathbf{Initial\; Value\; Strategy}}=0$ or $1$.
Likelihood
$a$
Default $\text{}=\mathrm{REML}$
${\mathbf{Likelihood}}$ defines whether g02jhc will use the restricted maximum likelihood (REML) or the maximum likelihood (ML) when fitting the model.
Constraint:
${\mathbf{Likelihood}}=\mathrm{REML}$ or $\mathrm{ML}$.
Linear Minimization Accuracy
$r$
Default $\text{}=0.9$
The accuracy of the linear minimizations.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, g02jhc passes ${\mathbf{Linear\; Minimization\; Accuracy}}$ to the solver as
eta.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, this option is ignored.
Line Search Tolerance
$r$
Default $\text{}=0.9$
The line search tolerance.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, this option is ignored.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Line\; Search\; Tolerance}}$ to the solver as
${\mathbf{Line\; Search\; Tolerance}}$.
List
NoList
Default
Optional parameter ${\mathbf{List}}$ enables printing of each optional parameter specification as it is supplied. ${\mathbf{NoList}}$ suppresses this printing.
Major Iteration Limit
$i$
Default $\text{}=\text{special}$
The number of major iterations.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, g02jhc passes ${\mathbf{Major\; Iteration\; Limit}}$ to the solver as
maxcal.
In this case, the default value used is $1000$.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Major\; Iteration\; Limit}}$ to the solver as
${\mathbf{Major\; Iteration\; Limit}}$.
In this case, the default value used is $\mathrm{max}\phantom{\rule{0.125em}{0ex}}(50,3\times g)$, where $g$ is the number of variance components being estimated (excluding the overall variance, ${\sigma}_{R}^{2}$).
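The solver-dependent default can be summarised in a short sketch. The enum and function names are hypothetical; only the values $1000$ and $\mathrm{max}(50,3\times g)$ come from the description above.

```c
#include <assert.h>

typedef enum { SOLVER_E04LB, SOLVER_E04UC } solver_kind;

/* Default Major Iteration Limit for g variance components
 * (excluding the overall variance). */
long default_major_iteration_limit(solver_kind solver, long g)
{
    assert(g >= 0);
    if (solver == SOLVER_E04LB)
        return 1000;
    long three_g = 3 * g;
    return three_g > 50 ? three_g : 50;
}
```

For the model with $g=8$ discussed in Section 11.2, the default is $1000$ under E04LB and $50$ under E04UC, because $3\times 8=24<50$.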
Major Print Level
$i$
Default $\text{}=\text{special}$
The frequency with which monitoring information is output to ${\mathbf{Unit\; Number}}$.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, g02jhc passes ${\mathbf{Major\; Print\; Level}}$ to the solver as
iprint.
In this case, the default value used is $-1$ and hence no monitoring information will be output.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Major\; Print\; Level}}$ to the solver as
${\mathbf{Major\; Print\; Level}}$.
In this case, the default value used is $0$ and hence no monitoring information will be output.
Maximum Number of Threads
$i$
Default $\text{}=\text{special}$
Controls the maximum number of threads used by g02jhc in a multithreaded library. By default, the maximum number of available threads is used.
In a library that is not multithreaded, this option has no effect.
Minor Iteration Limit
$i$
Default $\text{}=\text{special}$
When ${\mathbf{Solver}}=\mathrm{E04LB}$, this option is ignored.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Minor\; Iteration\; Limit}}$ to the solver as
${\mathbf{Minor\; Iteration\; Limit}}$.
In this case, the default value used is $\mathrm{max}\phantom{\rule{0.125em}{0ex}}(50,3\times g)$, where $g$ is the number of variance components being estimated (excluding the overall variance, ${\sigma}_{R}^{2}$).
Minor Print Level
$i$
Default $\text{}=0$
The frequency with which additional monitoring information is output to ${\mathbf{Unit\; Number}}$.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, this option is ignored.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Minor\; Print\; Level}}$ to the solver as
${\mathbf{Minor\; Print\; Level}}$.
The default value of $0$ means that no additional monitoring information will be output.
Optimality Tolerance
$r$
When ${\mathbf{Solver}}=\mathrm{E04LB}$, this option is ignored.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, g02jhc passes ${\mathbf{Optimality\; Tolerance}}$ to the solver as
${\mathbf{Optimality\; Tolerance}}$.
Parallelisation Strategy
$i$
Default $\text{}=\text{special}$
If ${\mathbf{Maximum\; Number\; of\; Threads}}>0$ then ${\mathbf{Parallelisation\; Strategy}}$
controls how g02jhc is parallelised in a multithreaded library.
${\mathbf{Parallelisation\; Strategy}}=1$
g02jhc will attempt to parallelise operations involving $Z$, even if ${\mathbf{rnlsv}}=1$.
${\mathbf{Parallelisation\; Strategy}}=2$
g02jhc will only attempt to parallelise operations involving $Z$, if ${\mathbf{rnlsv}}>1$.
By default, ${\mathbf{Parallelisation\; Strategy}}=1$. However, for some models or datasets this may be slower than using ${\mathbf{Parallelisation\; Strategy}}=2$ when ${\mathbf{rnlsv}}=1$.
In a library that is not multithreaded, this option has no effect.
Constraint:
${\mathbf{Parallelisation\; Strategy}}=1$ or $2$.
Solution Accuracy
$r$
Default $\text{}=0.0$
The accuracy to which the solution is required.
When ${\mathbf{Solver}}=\mathrm{E04LB}$, g02jhc passes ${\mathbf{Solution\; Accuracy}}$ to the solver as
xtol.
When ${\mathbf{Solver}}=\mathrm{E04UC}$, this option is ignored.
Solver
$a$
Default $\text{}=\text{special}$
Controls which solver g02jhc will use when fitting the model. By default, ${\mathbf{Solver}}=\mathrm{E04LB}$ is used for small problems and ${\mathbf{Solver}}=\mathrm{E04UC}$ otherwise.
If ${\mathbf{Solver}}=\mathrm{E04LB}$, then the solver used is the one implemented in e04lbc and if ${\mathbf{Solver}}=\mathrm{E04UC}$, then the solver used is the one implemented in e04ucc.
Constraint:
${\mathbf{Solver}}=\mathrm{E04LB}$ or $\mathrm{E04UC}$.
Sweep Tolerance
$r$
Default $\text{}=\text{special}$
The sweep tolerance used by g02jhc when performing the sweep operation (see Wolfinger et al. (1994)). The default value used is ${\mathbf{Sweep\; Tolerance}}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}(\epsilon ,\epsilon \times \left({\displaystyle \underset{i}{\mathrm{max}}}\phantom{\rule{0.25em}{0ex}}{\left({Z}^{\mathrm{T}}Z\right)}_{ii}\right))$, where $\epsilon =\sqrt{\mathit{machine\; precision}}$.
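The default can be sketched numerically as below. The sketch assumes that the diagonal in the formula is that of the cross-product matrix ${Z}^{\mathrm{T}}Z$, and that $Z$ is held column-major with leading dimension `ldz`. The function name and the storage layout are illustrative assumptions. The caller supplies $\epsilon$, the square root of machine precision, so that the sketch does not commit to a particular definition of machine precision.

```c
#include <assert.h>
#include <stddef.h>

/* Default sweep tolerance max(eps, eps * max_i (Z^T Z)_ii).
 * z   : n-by-q matrix, column-major, leading dimension ldz >= n
 * eps : square root of machine precision, supplied by the caller */
double default_sweep_tolerance(const double *z, size_t n, size_t q,
                               size_t ldz, double eps)
{
    assert(ldz >= n);
    double max_diag = 0.0;
    for (size_t j = 0; j < q; ++j) {
        double ss = 0.0; /* (Z^T Z)_jj is the sum of squares of column j */
        for (size_t i = 0; i < n; ++i)
            ss += z[j * ldz + i] * z[j * ldz + i];
        if (ss > max_diag)
            max_diag = ss;
    }
    double scaled = eps * max_diag;
    return scaled > eps ? scaled : eps;
}
```

When every diagonal element is below one, the tolerance falls back to $\epsilon$ itself.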