# NAG C Library Function Document

## nag_blgm_lm_submodel (g22ydc)

Note: please be advised that this function is classed as ‘experimental’ and its interface may be developed further in the future. Please see Section 3.1.1 in How to Use the NAG Library and its Documentation for further information.

## 1 Purpose

nag_blgm_lm_submodel (g22ydc) produces labels for the columns of a design matrix, labels for the model parameters and a vector of column inclusion flags suitable for use with functions in Chapter g02, thus allowing submodels to be fit using the same design matrix.

## 2 Specification

 #include <nag.h>
 #include <nagg22.h>
 void nag_blgm_lm_submodel (void *hform, void *hxdesc, Nag_IncludeIntercept intcpt, Integer *ip, Integer lisx, Integer isx[], Integer lplab, const char *plab[], Integer lenlab, Integer lvinfo, Integer vinfo[], NagError *fail)

## 3 Description

nag_blgm_lm_submodel (g22ydc) is a utility function for use with nag_blgm_lm_formula (g22yac), nag_blgm_lm_describe_data (g22ybc) and nag_blgm_lm_design_matrix (g22ycc). It can be used to construct labels for the columns of an $n\times {m}_{x}$ design matrix, $X$, created by nag_blgm_lm_design_matrix (g22ycc) and to return additional input vectors and flags required by a number of NAG Library model fitting functions.
Many of the analysis functions that require a design matrix to be supplied allow submodels to be defined through a vector of ones and zeros indicating whether each column of $X$ should be included in or excluded from the analysis (see for example sx in nag_regsn_mult_linear (g02dac) or nag_glm_normal (g02gac)). This allows nested models to be fit without having to reconstruct the design matrix for each analysis.
Let $\mathcal{M}$ denote a model constructed by nag_blgm_lm_formula (g22yac), $D$ a data matrix as described by nag_blgm_lm_describe_data (g22ybc) and $X$ the corresponding design matrix constructed by nag_blgm_lm_design_matrix (g22ycc) from $\mathcal{M}$ and $D$. A different model, ${\mathcal{M}}_{S}$, is a submodel of $\mathcal{M}$ if each term in ${\mathcal{M}}_{S}$, including the mean effect (intercept term), is also present in $\mathcal{M}$.
If ${\mathcal{M}}_{S}$ is a submodel of $\mathcal{M}$, you can fit ${\mathcal{M}}_{S}$ to $D$ using a design matrix whose columns are a subset of the columns of $X$.
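The column-subset relationship can be illustrated with a minimal sketch in plain C. The helpers `select_columns` and `count_parameters` are hypothetical and are not part of the NAG Library. The sketch assumes a row-major matrix and a vector of 0/1 inclusion flags of the kind returned in isx. In practice the analysis functions accept the flag vector directly, so no copy of $X$ is needed; the copy here only makes the subset explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a NAG Library function.
   Copies the columns of the n-by-mx row-major matrix x whose flag in
   include[] is 1 into xs (also row-major) and returns the number of
   columns copied. xs must have room for n*mx doubles in the worst case. */
size_t select_columns(const double *x, size_t n, size_t mx,
                      const int include[], double *xs)
{
    assert(x != NULL && include != NULL && xs != NULL);

    /* First pass: count the selected columns. */
    size_t q = 0;
    for (size_t j = 0; j < mx; ++j)
        if (include[j] == 1)
            ++q;

    /* Second pass: copy the selected columns row by row. */
    for (size_t i = 0; i < n; ++i) {
        size_t c = 0;
        for (size_t j = 0; j < mx; ++j)
            if (include[j] == 1)
                xs[i * q + c++] = x[i * mx + j];
    }
    return q;
}

/* Number of parameters in the submodel: one per selected column, plus
   one if the analysis function adds an implicit intercept. */
size_t count_parameters(const int include[], size_t mx, int implicit_intercept)
{
    assert(include != NULL);
    size_t p = implicit_intercept ? 1 : 0;
    for (size_t j = 0; j < mx; ++j)
        if (include[j] == 1)
            ++p;
    return p;
}
```

For a $2\times 3$ matrix with flags `{1, 0, 1}`, `select_columns` keeps the first and third columns, and `count_parameters` gives three parameters with an implicit intercept or two without.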

## 4 References

None.

## 5 Arguments

1:    $\mathbf{hform}$ void * Input
On entry: a G22 handle to the internal data structure containing a description of the required (sub)model ${\mathcal{M}}_{S}$, as returned in hform by nag_blgm_lm_formula (g22yac).
2:    $\mathbf{hxdesc}$ void * Input
On entry: a G22 handle to the internal data structure containing a description of the design matrix, $X$, as returned in hxdesc by nag_blgm_lm_design_matrix (g22ycc).
3:    $\mathbf{intcpt}$ Nag_IncludeIntercept * Output
On exit: if ${\mathbf{intcpt}}=\mathrm{Nag\_Intercept}$, any analysis function used to fit the model ${\mathcal{M}}_{S}$ to $D$ using $X$ should include an implicit mean effect (intercept term).
If ${\mathbf{intcpt}}=\mathrm{Nag\_NoIntercept}$, either ${\mathcal{M}}_{S}$ does not include a mean effect or the mean effect has been explicitly included in the design matrix.
4:    $\mathbf{ip}$ Integer * Output
On exit: $p$, the number of parameters in the model specified in hform, including the intercept if one is present.
If ${\mathbf{lisx}}\ne 0$, then $p={\sum }_{i=1}^{{m}_{x}}{\mathbf{isx}}\left[i-1\right]$ when ${\mathbf{intcpt}}=\mathrm{Nag\_NoIntercept}$, and $p={\sum }_{i=1}^{{m}_{x}}{\mathbf{isx}}\left[i-1\right]+1$ otherwise.
5:    $\mathbf{lisx}$ Integer Input
On entry: length of isx.
Constraint: ${\mathbf{lisx}}=0$ or ${\mathbf{lisx}}\ge {m}_{x}$, where ${m}_{x}$ is the number of columns in the design matrix $X$.
6:    $\mathbf{isx}\left[{\mathbf{lisx}}\right]$ Integer Output
On exit: if ${\mathbf{lisx}}\ne 0$, an array indicating which columns of the design matrix form the model specified in hform.
${\mathbf{isx}}\left[j-1\right]=0$
The $j$th column of the design matrix, $X$, should not be included in the analysis.
${\mathbf{isx}}\left[j-1\right]=1$
The $j$th column of the design matrix, $X$, should be included in the analysis.
If ${\mathbf{lisx}}=0$, isx is not referenced and may be NULL.
7:    $\mathbf{lplab}$ Integer Input
On entry: the length of plab.
As $p\le {m}_{x}+1$, if labels are required, using ${\mathbf{lplab}}={m}_{x}+1$ will always be sufficient.
Constraint: ${\mathbf{lplab}}=0$ or ${\mathbf{lplab}}\ge p$.
8:    $\mathbf{plab}\left[{\mathbf{lplab}}\right]$ const char * Output
On exit: if ${\mathbf{lplab}}\ne 0$, the names associated with the $p$ parameters in the model.
If ${\mathbf{intcpt}}=\mathrm{Nag\_NoIntercept}$, the labels in plab are also the labels for the columns of the design matrix used in the analysis.
If ${\mathbf{intcpt}}=\mathrm{Nag\_Intercept}$, elements ${\mathbf{plab}}\left[1\right]$ to ${\mathbf{plab}}\left[p-1\right]$ are the corresponding column labels.
If a mean effect is present in ${\mathcal{M}}_{S}$, the corresponding label is always in ${\mathbf{plab}}\left[0\right]$.
If ${\mathbf{lplab}}=0$, plab is not referenced and may be NULL.
Note: each element of plab must be a string of length at least ${\mathbf{lenlab}}-1$.
9:    $\mathbf{lenlab}$ Integer Input
On entry: length of the strings allocated in plab. At most ${\mathbf{lenlab}}-1$ non-null characters will be written into each element of plab.
Constraint: if plab is not NULL, ${\mathbf{lenlab}}\ge 1$.
10:  $\mathbf{lvinfo}$ Integer Input
On entry: the length of vinfo.
Let ${n}_{T}$ denote the number of terms in ${\mathcal{M}}_{S}$, ${n}_{Tt}$ denote the number of variables in the $t$th term and ${m}_{xt}$ denote the number of columns of $X$ corresponding to the $t$th term. The required size of vinfo, denoted $a$, is given by:
 $a={\sum }_{t=1}^{{n}_{T}}{m}_{xt}\left(1+3{n}_{Tt}\right).$
If the model includes a mean effect, $a$ should be incremented by one.
The values ${n}_{T}$, ${n}_{Tt}$ and ${m}_{xt}$ are not trivial to calculate as they require the formula describing the model to be fully expanded and the contrast / dummy variable encoding to be known. Therefore, if lisx, lplab or lvinfo are too small and ${\mathbf{lvinfo}}\ge 3$, ${\mathbf{fail}}\mathbf{.}\mathbf{code}=$ NW_ARRAY_SIZE is returned and the required sizes for these arrays are returned in ${\mathbf{vinfo}}\left[0\right]$, ${\mathbf{vinfo}}\left[1\right]$ and ${\mathbf{vinfo}}\left[2\right]$ respectively.
Constraint: ${\mathbf{lvinfo}}=0$ or ${\mathbf{lvinfo}}\ge a$.
11:  $\mathbf{vinfo}\left[{\mathbf{lvinfo}}\right]$ Integer Output
On exit: if ${\mathbf{lvinfo}}\ne 0$, information encoding a description of the parameters in the model.
The encoding information can be extracted as follows:
(i) Set $k=1$.
(ii) Iterate $j$ from $1$ to $p$.
1. Set $b={\mathbf{vinfo}}\left[k-1\right]$.
2. Increment $k$.
3. Iterate $i$ from $1$ to $b$.
 (a) Set ${v}_{i}={\mathbf{vinfo}}\left[k-1\right]$.
 (b) Set ${l}_{i}={\mathbf{vinfo}}\left[k\right]$.
 (c) Set ${c}_{i}={\mathbf{vinfo}}\left[k+1\right]$.
 (d) Increment $k$ by $3$.
4. The $j$th model parameter corresponds to the interaction between the $b$ variables held in columns ${v}_{1},{v}_{2},\dots ,{v}_{b}$ of $D$. Therefore, $b=1$ indicates a main effect, $b=2$ a two-way interaction, etc.
If $b=0$, the $j$th model parameter corresponds to the mean effect.
If ${l}_{i}=0$, the corresponding variable ${v}_{i}$ is binary, ordinal or continuous. Otherwise, ${l}_{i}$ is the level for the corresponding variable for model parameter $j$.
${c}_{i}$ is a numeric flag indicating the contrast used in the case of a categorical variable, with ${c}_{i}=0$ indicating that dummy variables were used for variable ${v}_{i}$ in this term. The remaining six types of contrast (treatment contrasts with respect to the first and last levels, sum contrasts with respect to the first and last levels, Helmert contrasts and polynomial contrasts), as described in nag_blgm_lm_design_matrix (g22ycc), are identified by the integers one to six respectively.
If ${\mathbf{lvinfo}}=0$, vinfo is not referenced and may be NULL.
12:  $\mathbf{fail}$ NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
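The size formula given for lvinfo and the extraction steps given for vinfo can be sketched in plain C as follows. This is an illustration only. The names `ParamInfo`, `MAX_VARS`, `vinfo_size` and `decode_vinfo` are hypothetical and not part of the NAG Library, plain `int` stands in for the library's Integer type, and the indexing is zero-based, so `k` here is one less than the $k$ used in the steps above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARS 8 /* assumed bound on variables per term, for this sketch only */

typedef struct {
    int nvars;              /* b: 0 = mean effect, 1 = main effect, 2 = two-way, ... */
    int var[MAX_VARS];      /* v_i: column of D holding the i-th variable */
    int level[MAX_VARS];    /* l_i: level, or 0 if binary, ordinal or continuous */
    int contrast[MAX_VARS]; /* c_i: 0 = dummy variables, 1 to 6 = contrast type */
} ParamInfo;

/* Required length of vinfo: sum over terms of m_xt * (1 + 3 * n_Tt),
   plus one if the model includes a mean effect. */
long vinfo_size(const int n_vars[], const int n_cols[], size_t n_terms,
                int has_mean)
{
    long a = has_mean ? 1 : 0;
    for (size_t t = 0; t < n_terms; ++t)
        a += (long)n_cols[t] * (1 + 3 * (long)n_vars[t]);
    return a;
}

/* Walks vinfo for p parameters and fills out[0..p-1].
   Returns the number of elements consumed, or -1 if a parameter
   involves more than MAX_VARS variables. */
long decode_vinfo(const int vinfo[], int p, ParamInfo out[])
{
    assert(vinfo != NULL && out != NULL && p >= 0);
    long k = 0;
    for (int j = 0; j < p; ++j) {
        int b = vinfo[k++];          /* steps 1 and 2 */
        if (b < 0 || b > MAX_VARS)
            return -1;
        out[j].nvars = b;
        for (int i = 0; i < b; ++i) { /* step 3 */
            out[j].var[i] = vinfo[k];
            out[j].level[i] = vinfo[k + 1];
            out[j].contrast[i] = vinfo[k + 2];
            k += 3;
        }
    }
    return k;
}
```

For a made-up model with a mean effect, a continuous main effect (one column), a categorical main effect (two columns) and their interaction (two columns), `vinfo_size` gives $1+1\times 4+2\times 4+2\times 7=27$, which matches the number of elements `decode_vinfo` consumes when walking the six parameters.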

## 6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_FIELD_UNKNOWN
A variable name used when creating hform is not present in hxdesc.
Variable name: $〈\mathit{\text{value}}〉$.
NE_HANDLE
hform has not been initialized or is corrupt.
hform is not a G22 handle as generated by nag_blgm_lm_formula (g22yac).
hxdesc has not been initialized or is corrupt.
hxdesc is not a G22 handle as generated by nag_blgm_lm_design_matrix (g22ycc).
NE_INT
On entry, ${\mathbf{lenlab}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lenlab}}\ge 1$.
On entry, ${\mathbf{lisx}}=〈\mathit{\text{value}}〉$ and ${m}_{x}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lisx}}=0$ or ${\mathbf{lisx}}\ge {m}_{x}$.
On entry, ${\mathbf{lplab}}=〈\mathit{\text{value}}〉$ and $p=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lplab}}=0$ or ${\mathbf{lplab}}\ge p$.
On entry, lvinfo is too small.
${\mathbf{lvinfo}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lvinfo}}=0$ or ${\mathbf{lvinfo}}\ge 〈\mathit{\text{value}}〉$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_NOT_CONS
The model and the design matrix are not consistent.
Term: $〈\mathit{\text{value}}〉$.
This is likely due to the design matrix being constructed in the presence of either a mean effect or main effect that is not present in the model.
The model and the design matrix are not consistent. The design matrix was constructed in the presence of a mean effect and the model does not include a mean effect.
The model and the design matrix are not consistent. The model includes a term not present in the design matrix.
Term: $〈\mathit{\text{value}}〉$.
NW_ARRAY_SIZE
On entry, one or more of lisx, lplab or lvinfo are nonzero, but too small.
Minimum values are zero, or $〈\mathit{\text{value}}〉$, $〈\mathit{\text{value}}〉$ and $〈\mathit{\text{value}}〉$ respectively.
The minimum values are returned in the first three elements of vinfo.
NW_NOT_CONS
The model and the design matrix are not consistent. The model specifies different contrasts to those used when the design matrix was constructed. The contrasts specified in hform will be ignored.
NW_TRUNCATED
On entry, plab is too short to hold the parameter labels. Long labels will be truncated.
The longest parameter label is $〈\mathit{\text{value}}〉$.
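The truncation behaviour described for NW_TRUNCATED and lenlab (at most ${\mathbf{lenlab}}-1$ non-null characters written per label) can be mimicked with standard C. The helper `copy_label` below is hypothetical and only mirrors the documented semantics; it is not how the library itself writes labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a NAG Library function.
   Copies label into dst, writing at most lenlab - 1 non-null characters
   followed by a null terminator. Returns 1 if the label was truncated
   and 0 otherwise. Requires lenlab >= 1. */
int copy_label(char *dst, size_t lenlab, const char *label)
{
    assert(dst != NULL && label != NULL && lenlab >= 1);
    /* snprintf returns the length the full string would have needed. */
    int needed = snprintf(dst, lenlab, "%s", label);
    return needed >= 0 && (size_t)needed >= lenlab;
}
```

With `lenlab` set to 5, a nine-character label such as `"Intercept"` is cut to `"Inte"` and the helper reports truncation, whereas `"Age"` fits unchanged.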

## 7 Accuracy

Not applicable.

## 8 Parallelism and Performance

nag_blgm_lm_submodel (g22ydc) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

None.

## 10 Example

This example performs a linear regression using nag_regsn_mult_linear (g02dac). The linear regression model is defined via a text string which is parsed using nag_blgm_lm_formula (g22yac) and the design matrix associated with the model is generated using nag_blgm_lm_design_matrix (g22ycc). A submodel is then fit using the same design matrix.
Default parameter labels, as returned in plab, are used for both models. An example of using the information returned in vinfo to construct more verbose parameter labels is given in Section 10 in nag_blgm_lm_describe_data (g22ybc).