NAG Library Routine Document
g22ydf (lm_submodel)
1
Purpose
g22ydf produces labels for the columns of a design matrix and for the model parameters, together with a vector of column inclusion flags suitable for use with routines in
Chapter G02. This allows submodels to be fit using the same design matrix.
2
Specification
Fortran Interface
Subroutine g22ydf ( 
hform, hxdesc, intcpt, ip, lisx, isx, lplab, plab, lvinfo, vinfo, ifail) 
Integer, Intent (In) :: lisx, lplab, lvinfo
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: ip, isx(lisx), vinfo(lvinfo)
Character (*), Intent (Out) :: intcpt, plab(lplab)
Type (c_ptr), Intent (In) :: hform, hxdesc

C Header Interface
#include <nagmk26.h>
void 
g22ydf_ (void **hform, void **hxdesc, char *intcpt, Integer *ip, const Integer *lisx, Integer isx[], const Integer *lplab, char plab[], const Integer *lvinfo, Integer vinfo[], Integer *ifail, const Charlen length_intcpt, const Charlen length_plab) 

3
Description
g22ydf is a utility routine for use with
g22yaf,
g22ybf and
g22ycf. It can be used to construct labels for the columns of an
$n\times {m}_{x}$ design matrix,
$X$, created by
g22ycf, and to return additional input vectors and flags required by a number of NAG Library model fitting routines.
Many of the analysis routines that require a design matrix to be supplied allow submodels to be defined through the use of a vector of ones or zeros indicating whether a column of
$X$ should be included or excluded from the analyses (see for example
isx in
g02daf or
g02gaf). This allows nested models to be fit without having to reconstruct the design matrix for each analysis.
Let
$\mathcal{M}$ denote a model constructed by
g22yaf,
$D$ a data matrix as described by
g22ybf and
$X$ be the corresponding design matrix constructed by
g22ycf from
$\mathcal{M}$ and
$D$. A different model,
${\mathcal{M}}_{S}$, is a submodel of
$\mathcal{M}$ if each term in
${\mathcal{M}}_{S}$, including the mean effect (intercept term), is also present in
$\mathcal{M}$.
If ${\mathcal{M}}_{S}$ is a submodel of $\mathcal{M}$, you can fit ${\mathcal{M}}_{S}$ to $D$ using a design matrix whose columns are a subset of the columns of $X$.
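To make the column-subset idea concrete, the following sketch copies the flagged columns of a design matrix into a smaller matrix. It is illustrative only: the helper name `select_columns`, the column-major storage and the plain `int` flag vector are assumptions made for this example. The analysis routines mentioned above accept the flags directly, so no such copy is needed in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the columns of x flagged in isx into xs (hypothetical helper).
 * x  : n-by-mx matrix, column-major (element (i,j) at x[i + j*n])
 * isx: mx flags, 1 = keep column, 0 = drop column
 * xs : output, must have room for n * (number of flagged columns)
 * Returns the number of columns copied. */
static size_t select_columns(const double *x, size_t n, size_t mx,
                             const int *isx, double *xs)
{
    size_t kept = 0;
    assert(x != NULL && isx != NULL && xs != NULL);
    for (size_t j = 0; j < mx; ++j) {
        if (isx[j] != 1)
            continue;               /* column j is not in the submodel */
        for (size_t i = 0; i < n; ++i)
            xs[i + kept * n] = x[i + j * n];
        ++kept;
    }
    return kept;
}
```

Calling `select_columns` with flags `{1, 0, 1}` on a three-column matrix keeps the first and third columns, in their original order.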
4
References
None.
5
Arguments
 1: $\mathbf{hform}$ – Type (c_ptr) Input

On entry: a G22 handle to the internal data structure containing a description of the required (sub)model
${\mathcal{M}}_{S}$, as returned in
hform by
g22yaf.
 2: $\mathbf{hxdesc}$ – Type (c_ptr) Input

On entry: a G22 handle to the internal data structure containing a description of the design matrix,
$X$, as returned in
hxdesc by
g22ycf.
 3: $\mathbf{intcpt}$ – Character(*) Output

On exit: if
${\mathbf{intcpt}}=\text{'M'}$, in order to fit the model
${\mathcal{M}}_{S}$ to
$D$ using
$X$, any analysis routine should include an implicit mean effect (intercept term).
If ${\mathbf{intcpt}}=\text{'Z'}$, either ${\mathcal{M}}_{S}$ does not include a mean effect or the mean effect has been explicitly included in the design matrix, so no implicit mean effect should be added by the analysis routine.
 4: $\mathbf{ip}$ – Integer Output

On exit:
$p$, the number of parameters in the model specified in
hform, including the intercept if one is present.
If ${\mathbf{lisx}}\ne 0$, then $p={\sum}_{i=1}^{{m}_{x}}{\mathbf{isx}}\left(i\right)$ when ${\mathbf{intcpt}}=\text{'Z'}$, and $p={\sum}_{i=1}^{{m}_{x}}{\mathbf{isx}}\left(i\right)+1$ otherwise.
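The relation between $p$, isx and intcpt can be checked with a few lines of C. This is a minimal sketch: the helper name `count_parameters` and the use of a single character for intcpt are assumptions for illustration, not part of the library interface.

```c
#include <assert.h>
#include <stddef.h>

/* Number of model parameters implied by the inclusion flags
 * (hypothetical helper mirroring the relation stated for ip).
 * isx   : mx flags, each 0 or 1
 * intcpt: 'M' if the analysis routine adds an implicit mean effect,
 *         'Z' otherwise */
static int count_parameters(const int *isx, int mx, char intcpt)
{
    int p = 0;
    assert(isx != NULL && (intcpt == 'M' || intcpt == 'Z'));
    for (int j = 0; j < mx; ++j)
        p += isx[j];                /* one parameter per included column */
    if (intcpt == 'M')
        ++p;                        /* plus the implicit mean effect */
    return p;
}
```

With flags `{1, 0, 1, 1}` the function gives 3 for `'Z'` and 4 for `'M'`.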
 5: $\mathbf{lisx}$ – Integer Input

On entry: the length of
isx.
Constraint:
${\mathbf{lisx}}=0$ or ${\mathbf{lisx}}\ge {m}_{x}$, where ${m}_{x}$ is the number of columns in the design matrix $X$.
 6: $\mathbf{isx}\left({\mathbf{lisx}}\right)$ – Integer array Output

On exit: if
${\mathbf{lisx}}\ne 0$, an array indicating which columns of the design matrix form the model specified in
hform.
 ${\mathbf{isx}}\left(j\right)=0$
 The $j$th column of the design matrix, $X$, should not be included in the analysis.
 ${\mathbf{isx}}\left(j\right)=1$
 The $j$th column of the design matrix, $X$, should be included in the analysis.
If
${\mathbf{lisx}}=0$,
isx is not referenced.
 7: $\mathbf{lplab}$ – Integer Input

On entry: the length of
plab.
As $p\le {m}_{x}+1$, if labels are required, using ${\mathbf{lplab}}={m}_{x}+1$ will always be sufficient.
Constraint:
${\mathbf{lplab}}=0$ or ${\mathbf{lplab}}\ge p$.
 8: $\mathbf{plab}\left({\mathbf{lplab}}\right)$ – Character(*) array Output

On exit: if
${\mathbf{lplab}}\ne 0$, the names associated with the
$p$ parameters in the model.
If
${\mathbf{intcpt}}=\text{'Z'}$, the labels in
plab are also the labels for the columns of design matrix used in the analysis.
If ${\mathbf{intcpt}}=\text{'M'}$, elements ${\mathbf{plab}}\left(2\right)$ to ${\mathbf{plab}}\left(p\right)$ are the corresponding column labels.
If a mean effect is present in ${\mathcal{M}}_{S}$, the corresponding label is always in ${\mathbf{plab}}\left(1\right)$.
If
${\mathbf{lplab}}=0$,
plab is not referenced.
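When plab is received through the C header interface it arrives as a flat `char` buffer, with the element width passed as `length_plab`. The sketch below assumes the usual Fortran convention of fixed-width, blank-padded elements with no NUL terminators, and extracts one label as a C string. The helper name `get_label` and the sample labels in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the idx-th (0-based) fixed-width, blank-padded label out of a flat
 * Fortran-style character array into a NUL-terminated C string.
 * plab : number_of_labels * width characters, no terminators
 * width: length of each element (length_plab in the C header interface)
 * out  : buffer of at least width + 1 characters
 * Returns the length of the label after trailing blanks are removed. */
static size_t get_label(const char *plab, size_t width, size_t idx, char *out)
{
    size_t len = width;
    assert(plab != NULL && out != NULL);
    memcpy(out, plab + idx * width, width);
    while (len > 0 && out[len - 1] == ' ')
        --len;                      /* strip trailing blanks */
    out[len] = '\0';
    return len;
}
```

For a buffer holding three 10-character elements, `get_label(buf, 10, 1, out)` returns the second label with its padding removed.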
 9: $\mathbf{lvinfo}$ – Integer Input

On entry: the length of
vinfo.
Let
${n}_{T}$ denote the number of terms in
${\mathcal{M}}_{S}$,
${n}_{Tt}$ denote the number of variables in the
$t$th term and
${m}_{xt}$ denote the number of columns of
$X$ corresponding to the
$t$th term. The required size of
vinfo, denoted
$a$, is given by:
If the model includes a mean effect,
$a$ should be incremented by one.
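As a cross-check on this size, the decoding procedure given for vinfo stores one element ($b$) for each model parameter, followed by three elements (${v}_{i}$, ${l}_{i}$ and ${c}_{i}$) for each of the $b$ variables involved. Assuming every column belonging to the $t$th term is encoded in this way, the count works out as shown below. This is a derivation from that procedure, not a quotation of the library's formula:

$$a=\sum_{t=1}^{{n}_{T}}{m}_{xt}\left(1+3{n}_{Tt}\right)$$

with one further element (a single $b=0$ entry) when a mean effect is present. For instance, under this assumption a model with a mean effect, one main effect occupying a single column (${n}_{Tt}=1$, ${m}_{xt}=1$) and one two-way interaction occupying three columns (${n}_{Tt}=2$, ${m}_{xt}=3$) would need $1+1\times 4+3\times 7=26$ elements.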
The values
${n}_{T}$,
${n}_{Tt}$ and
${m}_{xt}$ are not trivial to calculate as they require the formula describing the model to be fully expanded and the contrast / dummy variable encoding to be known. Therefore, if
lisx,
lplab or
lvinfo are too small and
${\mathbf{lvinfo}}\ge 3$,
${\mathbf{ifail}}={\mathbf{92}}$ is returned and the required sizes for these arrays are returned in
${\mathbf{vinfo}}\left(1\right)$,
${\mathbf{vinfo}}\left(2\right)$ and
${\mathbf{vinfo}}\left(3\right)$ respectively.
Constraint:
${\mathbf{lvinfo}}=0$ or ${\mathbf{lvinfo}}\ge a$.
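Because the required sizes are awkward to compute by hand, a two-pass calling pattern is a natural fit: probe with a three-element vinfo, read the sizes reported with ${\mathbf{ifail}}={\mathbf{92}}$, allocate, and call again. The sketch below shows only the first pass. It does not call the NAG routine: `stub_size_check` is a hypothetical stand-in that models just the size-reporting path described above, and the sizes it reports are made-up numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the routine's size reporting (NOT the NAG routine itself).
 * It models only the ifail = 92 path; the required sizes below are
 * made-up numbers for illustration. */
static void stub_size_check(int lisx, int lplab, int lvinfo,
                            int vinfo[], int *ifail)
{
    const int need_isx = 5, need_plab = 6, need_vinfo = 21;
    int too_small = (lisx != 0 && lisx < need_isx) ||
                    (lplab != 0 && lplab < need_plab) ||
                    (lvinfo != 0 && lvinfo < need_vinfo);
    if (too_small && lvinfo >= 3) {
        vinfo[0] = need_isx;    /* vinfo(1): required lisx   */
        vinfo[1] = need_plab;   /* vinfo(2): required lplab  */
        vinfo[2] = need_vinfo;  /* vinfo(3): required lvinfo */
        *ifail = 92;
    } else {
        *ifail = 0;             /* other exit codes are not modelled */
    }
}

/* First pass of a two-pass call: probe with a 3-element vinfo and read
 * back the sizes to allocate before calling again.
 * Returns 1 if sizes were reported, 0 otherwise. */
static int query_sizes(int *lisx, int *lplab, int *lvinfo)
{
    int probe[3] = {0, 0, 0};
    int ifail = 1;   /* soft exit: the caller must test ifail afterwards */
    assert(lisx != NULL && lplab != NULL && lvinfo != NULL);
    stub_size_check(0, 0, 3, probe, &ifail);
    if (ifail != 92)
        return 0;    /* no size report: success or a different error */
    *lisx = probe[0];
    *lplab = probe[1];
    *lvinfo = probe[2];
    return 1;
}
```

In real use the stub would be replaced by a call to the routine itself, with ifail set on entry so that execution is not halted when the sizes are reported.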
 10: $\mathbf{vinfo}\left({\mathbf{lvinfo}}\right)$ – Integer array Output

On exit: if
${\mathbf{lvinfo}}\ne 0$, information encoding a description of the parameters in the model.
The encoding information can be extracted as follows:
(i) 
Set $k=1$. 
(ii) 
Iterate $j$ from $1$ to $p$.
1. 
Set $b={\mathbf{vinfo}}\left(k\right)$. 
2. 
Increment $k$. 
3. 
Iterate $i$ from $1$ to $b$.
(a) 
Set ${v}_{i}={\mathbf{vinfo}}\left(k\right)$. 
(b) 
Set ${l}_{i}={\mathbf{vinfo}}\left(k+1\right)$. 
(c) 
Set ${c}_{i}={\mathbf{vinfo}}\left(k+2\right)$. 
(d) 
Increment $k$ by $3$. 

4. 
The $j$th model parameter corresponds to the interaction between the $b$ variables held in columns ${v}_{1},{v}_{2},\dots ,{v}_{b}$ of $D$. Therefore, $b=1$ indicates a main effect, $b=2$ a two-way interaction, etc.
If $b=0$, the $j$th model parameter corresponds to the mean effect.
If ${l}_{i}=0$, the corresponding variable ${v}_{i}$ is binary, ordinal or continuous. Otherwise, ${l}_{i}$ is the level for the corresponding variable for model parameter $j$.
${c}_{i}$ is a numeric flag indicating the contrast used in the case of a categorical variable, with ${c}_{i}=0$ indicating that dummy variables were used for variable ${v}_{i}$ in this term. The remaining six types of contrast (treatment contrasts with respect to the first and last levels, sum contrasts with respect to the first and last levels, Helmert contrasts and polynomial contrasts), as described in g22ycf, are identified by the integers one to six respectively. 

If
${\mathbf{lvinfo}}=0$,
vinfo is not referenced.
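The extraction steps (i) and (ii) translate directly into a loop. The following is a minimal sketch in C, assuming vinfo has been copied into a 0-based `int` array, so that ${\mathbf{vinfo}}\left(k\right)$ in the text is `vinfo[k - 1]`. The helper name `decode_vinfo` and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Walk the vinfo encoding for p parameters, following steps (i)-(ii).
 * Prints one line per parameter and returns the number of elements read. */
static int decode_vinfo(const int *vinfo, int p)
{
    int k = 0;
    assert(vinfo != NULL);
    for (int j = 1; j <= p; ++j) {
        int b = vinfo[k++];         /* number of variables in parameter j */
        if (b == 0) {
            printf("parameter %d: mean effect\n", j);
            continue;
        }
        printf("parameter %d:", j);
        for (int i = 0; i < b; ++i) {
            int v = vinfo[k];       /* column of D holding the variable */
            int l = vinfo[k + 1];   /* level; 0 if binary, ordinal or continuous */
            int c = vinfo[k + 2];   /* contrast code; 0 = dummy variables */
            k += 3;
            printf(" [variable %d, level %d, contrast %d]", v, l, c);
        }
        printf("\n");
    }
    return k;
}
```

For example, the made-up array `{0, 1, 2, 0, 0, 2, 2, 0, 0, 3, 2, 1}` with $p=3$ decodes as a mean effect, a main effect of the variable in column 2, and an interaction between the variables in columns 2 and 3, reading 12 elements in total.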
 11: $\mathbf{ifail}$ – Integer Input/Output

On entry:
ifail must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this argument you should refer to
Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $-\mathbf{1}\text{ or }\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$-1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=11$

hform has not been initialized or is corrupt.
 ${\mathbf{ifail}}=12$

hform is not a G22 handle as generated by
g22yaf.
 ${\mathbf{ifail}}=13$

A variable name used when creating
hform is not present in
hxdesc.
Variable name:
$\langle\mathit{\text{value}}\rangle$.
 ${\mathbf{ifail}}=14$

The model and the design matrix are not consistent. The design matrix was constructed in the presence of a mean effect and the model does not include a mean effect.
 ${\mathbf{ifail}}=15$

The model and the design matrix are not consistent. The model includes a term not present in the design matrix.
Term: $\langle\mathit{\text{value}}\rangle$.
 ${\mathbf{ifail}}=16$

The model and the design matrix are not consistent.
Term: $\langle\mathit{\text{value}}\rangle$.
This is likely due to the design matrix being constructed in the presence of either a mean effect or main effect that is not present in the model.
 ${\mathbf{ifail}}=17$

The model and the design matrix are not consistent. The model specifies different contrasts to those used when the design matrix was constructed. The contrasts specified in
hform will be ignored.
 ${\mathbf{ifail}}=21$

hxdesc has not been initialized or is corrupt.
 ${\mathbf{ifail}}=22$

hxdesc is not a G22 handle as generated by
g22ycf.
 ${\mathbf{ifail}}=41$

On entry, ${\mathbf{lisx}}=\langle\mathit{\text{value}}\rangle$ and ${m}_{x}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{lisx}}=0$ or ${\mathbf{lisx}}\ge {m}_{x}$.
 ${\mathbf{ifail}}=71$

On entry, ${\mathbf{lplab}}=\langle\mathit{\text{value}}\rangle$ and $p=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{lplab}}=0$ or ${\mathbf{lplab}}\ge p$.
 ${\mathbf{ifail}}=81$

On entry,
plab is too short to hold the parameter labels. Long labels will be truncated.
The longest parameter label is
$\langle\mathit{\text{value}}\rangle$.
 ${\mathbf{ifail}}=91$

On entry,
lvinfo is too small.
${\mathbf{lvinfo}}=\langle\mathit{\text{value}}\rangle$.
Constraint:
${\mathbf{lvinfo}}=0$ or
${\mathbf{lvinfo}}\ge \langle\mathit{\text{value}}\rangle$.
 ${\mathbf{ifail}}=92$

On entry, one or more of
lisx,
lplab or
lvinfo are nonzero, but too small.
Minimum values are zero, or
$\langle\mathit{\text{value}}\rangle$,
$\langle\mathit{\text{value}}\rangle$ and
$\langle\mathit{\text{value}}\rangle$ respectively.
The minimum values are returned in the first three elements of
vinfo.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 3.9 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
See
Section 3.8 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
See
Section 3.7 in How to Use the NAG Library and its Documentation for further information.
7
Accuracy
Not applicable.
8
Parallelism and Performance
g22ydf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
None.
10
Example
This example performs a linear regression using
g02daf. The linear regression model is defined via a text string which is parsed using
g22yaf and the design matrix associated with the model is generated using
g22ycf. A submodel is then fit using the same design matrix.
Default parameter labels, as returned in
plab, are used for both models. An example of using the information returned in
vinfo to construct more verbose parameter labels is given in
Section 10 in
g22ybf.
See also the examples for
g22yaf and
g22ycf.
10.1
Program Text
Program Text (g22ydfe.f90)
10.2
Program Data
Program Data (g22ydfe.d)
10.3
Program Results
Program Results (g22ydfe.r)