# Source code for naginterfaces.library.blgm

```
# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 29.0 `blgm` Chapter.
``blgm`` - Linear Model Specification
The functions in this module provide a mechanism for specifying a linear model using a text based modelling language and are intended to be used in conjunction with the model fitting functions from other modules, for example submodule :mod:`~naginterfaces.library.correg`.
See Also
--------
``naginterfaces.library.examples.blgm`` :
This subpackage contains examples for the ``blgm`` module.
See also the :ref:`library_blgm_ex` subsection.
Functionality Index
-------------------
**Linear model**
construct design matrix: :meth:`lm_design_matrix`
data description: :meth:`lm_describe_data`
nested model: :meth:`lm_submodel`
specification from formula string: :meth:`lm_formula`
**Service functions**
destroy a G22 handle: :meth:`handle_free`
general option getting function: :meth:`optget`
general option setting function: :meth:`optset`
For full information please refer to the NAG Library document
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22intro.html
"""
# NAG Copyright 2017-2023.
def lm_formula(hform, formula):
r"""
``lm_formula`` parses a text string containing a formula specifying a linear model and outputs a G22 handle to an internal data structure.
This G22 handle can then be passed to various functions in submodule ``blgm``.
In particular, the G22 handle can be passed to :meth:`lm_design_matrix` to produce a design matrix or :meth:`lm_submodel` to produce a vector of column inclusion flags suitable for use with functions in submodule :mod:`~naginterfaces.library.correg`.
Note: this function uses optional algorithmic parameters, see also: :meth:`optset`, :meth:`optget`.
.. _g22ya-py2-py-doc:
For full information please refer to the NAG Library document for g22ya
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22yaf.html
.. _g22ya-py2-py-parameters:
**Parameters**
**hform** : Handle, modified in place
`On entry`: must be set to a null Handle. Alternatively, an existing G22 handle may be supplied, in which case this function will destroy the supplied G22 handle as if :meth:`handle_free` had been called.
`On exit`: holds a G22 handle to the internal data structure containing a description of the model :math:`\mathcal{M}` as specified in :math:`\mathrm{formula}`. You **must not** change the G22 handle other than through functions in submodule ``blgm``.
**formula** : str
A string containing the formula specifying :math:`\mathcal{M}`. See :ref:`Notes <g22ya-py2-py-notes>` for details on the allowed model syntax.
.. _g22ya-py2-py-other_params:
**Other Parameters**
**'Contrast'** : str
Default :math:`\text{} = \texttt{'FIRST'}`
This argument controls the default contrasts used for the categorical independent variables appearing in the model.
Six types of contrast, plus dummy variables, are available:
'FIRST'
Treatment contrasts relative to the first level of the variable will be used.
'LAST'
Treatment contrasts relative to the last level of the variable will be used.
'SUM FIRST'
Sum contrasts relative to the first level of the variable will be used.
'SUM LAST'
Sum contrasts relative to the last level of the variable will be used.
'HELMERT'
Helmert contrasts will be used.
'POLYNOMIAL'
Polynomial contrasts will be used.
'DUMMY'
Dummy variables will be used rather than a contrast.
See :meth:`lm_design_matrix` for more information on contrasts, their effect on the design matrix and how they are constructed.
This argument may have an `instance identifier` associated with it (see :meth:`optset` and :meth:`optget`).
The `instance identifier` must be the name of one of the variables appearing in the model supplied in :math:`\mathrm{formula}` when the G22 handle was created.
For example, `CONTRAST : VAR1 = HELMERT` would set Helmert contrasts for the variable named `VAR1`.
If no `instance identifier` is specified, the default contrast for all categorical variables in the model is changed, otherwise only the default contrast for the named variable is changed.
In some situations it might be necessary for a variable to use a different contrast, depending on where it appears in the model formula.
In order to allow contrasts to be specified on a term by term basis the :math:`@` operator can be used in the model formula.
The syntax for this operator is :math:`V_j@c`, where :math:`c` is one of: `F`, `L`, `SF`, `SL`, `H`, `P` or `D`, corresponding to treatment contrasts relative to the first and last levels, sum contrasts relative to the first and last levels, Helmert contrasts, polynomial contrasts or dummy variables respectively.
If the contrast has not been explicitly specified via the :math:`@` operator, the value obtained from the option 'Contrast' is used.
For example, setting :math:`\mathrm{formula}` to `VAR1 + VAR1@H.VAR2@P + VAR2@H.VAR3` specifies that the variable named `VAR1` should use the default contrasts in the first term and Helmert contrasts in the second term.
The variable named `VAR2` should use polynomial contrasts in the second term and Helmert contrasts in the third term.
The variable named `VAR3` should use the default contrasts in the third term.
**'Explicit Mean'** : str
Default :math:`\text{} = \texttt{'NO'}`
If :math:`\text{'Explicit Mean'} = \texttt{'YES'}`, any mean effect included in the model will be explicitly added to the design matrix, :math:`X`, as a column of :math:`1`\ s.
If :math:`\text{'Explicit Mean'} = \texttt{'NO'}`, it is assumed that the function to which :math:`X` will be passed treats the mean effect as a special case; see, for example, :math:`\textit{mean}` in :meth:`correg.linregm_fit <naginterfaces.library.correg.linregm_fit>`.
**'Formula'** : str
This argument returns a verbose version of the model formula specified in :math:`\mathrm{formula}`, expanded and simplified to only contain variable names, the operators :math:`+` and :math:`.` and any contrast identifiers present.
**'Storage Order'** : str
Default :math:`\text{} = \texttt{'OBSVAR'}`
This option controls how the design matrix, :math:`X`, should be stored in its output array and only has an effect if the design matrix is being constructed using :meth:`lm_design_matrix`.
If :math:`\text{'Storage Order'} = \texttt{'OBSVAR'}`, :math:`X_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the design matrix is stored in :math:`{\textit{x}}[i-1,j-1]`.
If :math:`\text{'Storage Order'} = \texttt{'VAROBS'}`, :math:`X_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the design matrix is stored in :math:`{\textit{x}}[j-1,i-1]`.
Here, :math:`\textit{x}` is the output argument of the same name in :meth:`lm_design_matrix`.
**'Subject'** : str
This argument gives the subject terms associated with the :math:`\mathrm{formula}` in a linear mixed effects model.
The supplied value must consist of a single term, representing either a single independent variable, or a single interaction term between two or more independent variables.
None of the variables in the subject term may also appear in the model formula.
.. _g22ya-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
On entry, :math:`\mathrm{hform}` is not a null Handle or a recognised G22 handle.
(`errno` :math:`21`)
The formula contained a mismatched parenthesis.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`22`)
An operator was missing.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`23`)
Invalid use of an operator.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`24`)
Invalid specification for the power operator.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`25`)
Invalid specification for the colon operator.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`26`)
Invalid specification for the mean.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`27`)
Invalid variable name.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`28`)
Missing variable name.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`29`)
After processing, the model contains no terms.
(`errno` :math:`30`)
An invalid contrast specifier has been supplied.
The position in the formula string of the error is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`41`)
On entry, an invalid :math:`\textit{option}` was supplied in :math:`\mathrm{formula}`.
(`errno` :math:`42`)
On entry, an :math:`\textit{option}` was supplied in :math:`\mathrm{formula}`, but the expected delimiter ':math:`=`' was not found.
(`errno` :math:`43`)
On entry, an :math:`\textit{option}` was supplied in :math:`\mathrm{formula}`, but the supplied :math:`\textit{optval}` was invalid.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`31`)
A term contained a repeated variable with a different contrast specifier.
.. _g22ya-py2-py-notes:
**Notes**
**Background**
Let :math:`D` denote a data matrix with :math:`n` observations on :math:`m_d` independent variables, denoted :math:`V_1,V_2,\ldots,V_{m_d}`.
Let :math:`y` denote a vector of :math:`n` observations on a dependent variable.
A linear model, :math:`\mathcal{M}`, as the term is used in this function, expresses a relationship between the independent variables, :math:`V_j`, and the dependent variable.
This relationship can be expressed as a series of additive terms :math:`T_1+T_2 + \cdots`, with each term, :math:`T_t`, representing either a single independent variable :math:`V_j`, called the main effect of :math:`V_j`, or the interaction between two or more independent variables.
An interaction term, denoted here using the :math:`.` operator, allows the effect of an independent variable on the dependent variable to depend on the value of one or more other independent variables.
As an example, the three-way interaction between :math:`V_1,V_2` and :math:`V_3` is denoted :math:`V_1.V_2.V_3` and describes a situation where the effect of one of these three variables is influenced by the value of the other two.
This function takes a description of :math:`\mathcal{M}`, supplied as a text string containing a formula, and outputs a G22 handle to an internal data structure.
This G22 handle can then be passed to :meth:`lm_design_matrix` to produce a design matrix for use in analysis functions from other modules, for example the regression functions of submodule :mod:`~naginterfaces.library.correg`.
A more detailed description of what is meant by a G22 handle can be found in `the G22 Introduction <https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22intro.html#backsec_handles>`__.
**Syntax**
In its most verbose form :math:`\mathcal{M}` can be described by one or more variable names, :math:`V_j`, and the two operators, :math:`+` and :math:`.`.
In order to allow a wide variety of models to be specified compactly this syntax is extended to six operators (:math:`+`, :math:`.`, :math:`*`, :math:`-`, :math:`:`, :math:`\hat{}`) and parentheses.
A formula describing the model is supplied to ``lm_formula`` via a character string which must obey the following rules:
(1) Variables can be denoted by arbitrary names, as long as:
    (i) The names used are a subset of those supplied to :meth:`lm_describe_data` when describing :math:`D`.
    (ii) The names do not contain any of the characters in :math:`+.*-:\hat{} ()@`.
(2) The :math:`.` operator denotes an interaction between two or more variables or terms, with :math:`V_1.V_2.V_3` denoting the three-way interaction between :math:`V_1`, :math:`V_2` and :math:`V_3`.
(3) A term in :math:`\mathcal{M}` can contain one or more variable names, separated using the :math:`.` operator, i.e., a term can be either a main effect or an interaction term between two or more variables.
    (i) If a variable appears in an interaction term more than once, all subsequent appearances, after the first, are ignored; therefore, :math:`V_1.V_2.V_1` is the same as :math:`V_1.V_2`.
    (ii) The ordering of the variables in an interaction term is ignored when comparing terms; therefore, :math:`V_1.V_2` is the same as :math:`V_2.V_1`. However, this ordering may have an effect when the resulting G22 handle is passed to another function, for example :meth:`lm_design_matrix`.
    (iii) Applying the :math:`.` operator to two terms appends one to the other, for example, if :math:`T_1 = V_1.V_2` and :math:`T_2 = V_3.V_4`, :math:`T_1.T_2 = V_1.V_2.V_3.V_4`.
(4) The :math:`+` operator allows additional terms to be included in :math:`\mathcal{M}`; therefore, :math:`T_1+T_2` is a model that includes terms :math:`T_1` and :math:`T_2`.
    (i) If a term is added to :math:`\mathcal{M}` more than once, all subsequent appearances, after the first, are ignored; therefore, :math:`T_1+T_2+T_1` is the same as :math:`T_1+T_2`.
    (ii) The ordering of the terms is ignored whilst parsing the formula; therefore, :math:`T_1+T_2` is the same as :math:`T_2+T_1`. However, this ordering may have an effect when the resulting G22 handle is passed to another function, for example :meth:`lm_design_matrix`.
    (iii) Internally, the terms are reordered so that all main effects come first, followed by two-way interactions, then three-way interactions, etc. The ordering within each of these categories is preserved.
(5) The :math:`*` operator can be used as a shorthand notation denoting the main effects and all interactions between the variables involved. Therefore, :math:`T_1*T_2` is equivalent to :math:`T_1+T_2+T_1.T_2` and :math:`T_1*T_2*T_3` is equivalent to :math:`T_1+T_2+T_3+T_1.T_2+T_1.T_3+T_2.T_3+T_1.T_2.T_3`.
(6) The :math:`-` operator removes a term from :math:`\mathcal{M}`; therefore, :math:`T_1*T_2*T_3-T_1.T_2.T_3` is equivalent to :math:`T_1+T_2+T_3+T_1.T_2+T_1.T_3+T_2.T_3`, as the three-way interaction, :math:`T_1.T_2.T_3`, which would otherwise be present due to :math:`T_1*T_2*T_3`, has been removed.
(7) The :math:`:` operator is a shorthand way of specifying a series of variables, with :math:`V_1:V_j` being equivalent to :math:`V_1+V_2 + \cdots +V_j`.
    (i) This operator can only be used if the variable names end in a numeric; therefore, :math:`\text{VAR2}:\text{VAR4}` would be valid, but :math:`\text{FVAR}:\text{LVAR}` would not.
    (ii) The root part of both variable names (i.e., the part before the trailing numeric, so :math:`\text{VAR}` in the valid example above) must be the same.
    (iii) The trailing numeric parts of the two variable names must be in ascending order.
(8) The :math:`\hat{}` operator is a shorthand notation for a series of :math:`*` operators. :math:`\left(T_1+T_2+T_3\right)\hat{} 2` is equivalent to :math:`\left(T_1+T_2+T_3\right)*\left(T_1+T_2+T_3\right)` which in turn is equivalent to :math:`T_1+T_2+T_3+T_1.T_2+T_1.T_3+T_2.T_3`.
    (i) This notation is present primarily for use with the :math:`:` operator in examples of the form :math:`\left(V_1:V_5\right)\hat{} 3`, which specifies a model containing the main effects for variables :math:`V_1` to :math:`V_5` as well as all two- and three-way interactions.
    (ii) Using the :math:`\hat{}` operator on a single term has no effect; therefore, :math:`T_2\hat{} 2` is the same as :math:`T_2`.
`Precedence`
Each operator has an associated default precedence, but this can be overridden through the use of parentheses.
The default precedence is:
(1) The :math:`:` operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, :math:`V_1+V_3:V_6*V_7` is equivalent to :math:`V_1+\left(V_3+V_4+V_5+V_6\right)*V_7`.
(2) The :math:`\hat{}` operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, :math:`\left(T_1+T_2+T_3\right)\hat{} 2.T_4` is equivalent to :math:`\left(\left(T_1+T_2+T_3\right)\hat{} 2\right).T_4`, which is equivalent to :math:`T_1.T_4+T_2.T_4+T_3.T_4+T_1.T_2.T_4+T_1.T_3.T_4+T_2.T_3.T_4`.
(3) The :math:`.` operator, so :math:`T_1*T_2.T_3` is equivalent to :math:`T_1*\left(T_2.T_3\right)`.
(4) The :math:`*` operator.
    (i) When using parentheses with the :math:`*` or :math:`.` operators the usual rules of multiplication apply; therefore, :math:`\left(T_1+T_3.T_4\right).\left(T_5+T_7\right)` is equivalent to :math:`T_1.T_5+T_1.T_7+T_3.T_4.T_5+T_3.T_4.T_7` and :math:`\left(T_1+T_3.T_4\right)*\left(T_5+T_7\right)` is equivalent to :math:`T_1+T_5+T_7+T_3.T_4+T_1.T_5+T_1.T_7+T_3.T_4.T_5+T_3.T_4.T_7`.
    (ii) Syntax of the following form is invalid: :math:`T_1o\left(T_2\right)oT_3`, where :math:`o` indicates an operator, unless one or more of those operators are :math:`+` and/or :math:`-`. Therefore, :math:`T_1.\left(T_2+T_3\right)*T_4` is invalid, whilst :math:`T_1.\left(T_2+T_3\right)+T_4` is valid.
(5) The :math:`+` and :math:`-` operators have equal precedence.
    (i) If the terms associated with a :math:`-` operator do not occur in the current expression they are ignored; therefore, :math:`T_1+\left(T_2-T_1\right)` is equivalent to :math:`T_1+T_2`. The :math:`\left(T_2-T_1\right)` part of the expression is calculated first and results in :math:`T_2`, as the :math:`T_1` term does not exist in this particular sub-expression and so cannot be removed.
`Mean Effect / Intercept Term`
A mean effect (or intercept term) can be explicitly added to a formula by specifying :math:`1` and can be explicitly excluded from the formula by specifying :math:`-1`.
For example, :math:`1+V_1+V_2` indicates a model with the main effects of two variables and a mean effect, whereas :math:`V_1+V_2-1` denotes the same model, but without the mean effect.
The mean indicator can appear anywhere in the formula string as long as it is not contained within parentheses.
If the mean effect is not explicitly mentioned in the model formula, the model is assumed to include a mean effect.
**Optional Parameters**
``lm_formula`` accepts a number of optional parameters described in :ref:`Other Parameters <g22ya-py2-py-other_params>`.
Usually these parameters are set via a call to :meth:`optset`; however, when specifying a subject term in a mixed effects linear regression model it is often more convenient to supply the information along with the rest of the formula.
Therefore, writeable optional parameters can be set via the :math:`\mathrm{formula}` argument.
The delimiter :math:`/` must be used between the main formula and the optional parameter.
For example, supplying a formula of the form :math:`V_1+V_2/\text{SUBJECT} = V_3.V_4`, would specify a model formula of :math:`V_1+V_2` and set the optional parameter 'Subject' to :math:`V_3.V_4`.
See Also
--------
:meth:`naginterfaces.library.examples.blgm.lm_formula_ex.main`
:meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main`
:meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main`
"""
raise NotImplementedError
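# ---------------------------------------------------------------------------
# Illustrative sketch only. It is NOT part of naginterfaces, does not call the
# NAG Library and is not how the library parses formulae. It is a minimal
# pure-Python model of the operator rules listed in the Notes of lm_formula,
# assuming a term is a frozenset of variable names and a model is a list of
# terms. Every helper name below (var, plus, dot, star, minus, power, colon,
# show) is hypothetical.
# ---------------------------------------------------------------------------
import re


def _unique(terms):
    # Keep the first appearance of each term (T1+T2+T1 is the same as T1+T2).
    out = []
    for term in terms:
        if term not in out:
            out.append(term)
    return out


def var(name):
    # A model holding the main effect of a single variable.
    return [frozenset([name])]


def plus(a, b):
    # '+' operator: include the terms of both models.
    return _unique(a + b)


def dot(a, b):
    # '.' operator: interact every term of a with every term of b. Repeated
    # variables collapse, so V1.V2.V1 is the same as V1.V2.
    return _unique([s | t for s in a for t in b])


def star(a, b):
    # '*' operator: a*b is equivalent to a + b + a.b.
    return plus(plus(a, b), dot(a, b))


def minus(a, b):
    # '-' operator: remove terms; terms that are not present are ignored.
    return [t for t in a if t not in b]


def power(a, k):
    # '^' operator: shorthand for a series of '*' operators.
    out = a
    for _ in range(k - 1):
        out = star(out, a)
    return out


def colon(first, last):
    # ':' operator: VAR2:VAR4 is equivalent to VAR2+VAR3+VAR4. Both names must
    # share a root and end in numerics that are in ascending order.
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if m1 is None or m2 is None or m1.group(1) != m2.group(1):
        raise ValueError("invalid specification for the colon operator")
    lo, hi = int(m1.group(2)), int(m2.group(2))
    if lo > hi:
        raise ValueError("invalid specification for the colon operator")
    return [frozenset([m1.group(1) + str(i)]) for i in range(lo, hi + 1)]


def show(model):
    # Main effects first, then two-way interactions, and so on. sorted() is
    # stable, so the ordering within each category is preserved. Variable
    # names inside a term are sorted purely for display.
    return "+".join(".".join(sorted(t)) for t in sorted(model, key=len))


v1, v2, v3 = var("V1"), var("V2"), var("V3")
# V1*V2*V3 expands to all main effects and interactions.
print(show(star(star(v1, v2), v3)))
# Removing the three-way interaction with '-'.
print(show(minus(star(star(v1, v2), v3), dot(dot(v1, v2), v3))))
# (V1:V5)^3 holds 5 main effects, 10 two-way and 10 three-way interactions.
print(len(power(colon("V1", "V5"), 3)))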
def lm_describe_data(hddesc, nobs, levels, vnames=None):
r"""
``lm_describe_data`` describes a data matrix.
Note: this function uses optional algorithmic parameters, see also: :meth:`optset`, :meth:`optget`.
.. _g22yb-py2-py-doc:
For full information please refer to the NAG Library document for g22yb
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22ybf.html
.. _g22yb-py2-py-parameters:
**Parameters**
**hddesc** : Handle, modified in place
`On entry`: must be set to a null Handle. Alternatively, an existing G22 handle may be supplied, in which case this function will destroy the supplied G22 handle as if :meth:`handle_free` had been called.
`On exit`: holds a G22 handle to the internal data structure containing a description of the data matrix, :math:`D`. You **must not** change the G22 handle other than through the functions in submodule ``blgm``.
**nobs** : int
:math:`n`, the number of observations in the data matrix, :math:`D`.
**levels** : int, array-like, shape :math:`\left(\textit{nvar}\right)`
:math:`\mathrm{levels}[\textit{j}-1]` contains the number of levels associated with the :math:`\textit{j}`\ th variable of the data matrix, for :math:`\textit{j} = 1,2,\ldots,\textit{nvar}`.
If the :math:`j`\ th variable is binary, ordinal or continuous, :math:`\mathrm{levels}[j-1]` should be set to :math:`1`; otherwise :math:`\mathrm{levels}[j-1]` should be set to the number of levels associated with the :math:`j`\ th variable, and the corresponding column of the data matrix is assumed to take values from :math:`1` to :math:`\mathrm{levels}[j-1]`.
**vnames** : None or str, array-like, shape :math:`\left(\textit{lvnames}\right)`, optional
If :math:`\mathrm{vnames}` is not **None**, :math:`\mathrm{vnames}[\textit{j}-1]` must contain the name of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,\textit{nvar}`.
The names supplied in :math:`\mathrm{vnames}` should be at most :math:`50` characters long and be unique.
If a name longer than :math:`50` characters is supplied it will be truncated.
Variable names must not contain any of the characters +.*-:^()@.
.. _g22yb-py2-py-other_params:
**Other Parameters**
**'Number of Observations'** : int
:math:`n`, the number of observations in the data matrix.
**'Number of Variables'** : int
If queried, this option will return :math:`m_d`, the number of variables in the data matrix.
**'Storage Order'** : str
Default :math:`\text{} = \texttt{'OBSVAR'}`
This option states how the data matrix, :math:`D`, will be stored in its input array.
If :math:`\text{'Storage Order'} = \texttt{'OBSVAR'}`, :math:`D_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the data matrix is stored in :math:`{\textit{dat}}[i-1,j-1]`.
If :math:`\text{'Storage Order'} = \texttt{'VAROBS'}`, :math:`D_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the data matrix is stored in :math:`{\textit{dat}}[j-1,i-1]`.
Here, :math:`\textit{dat}` is the input argument of the same name in :meth:`lm_design_matrix`.
.. _g22yb-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
On entry, :math:`\mathrm{hddesc}` is not a null Handle or a recognised G22 handle.
(`errno` :math:`21`)
On entry, :math:`\mathrm{nobs} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nobs}\geq 0`.
(`errno` :math:`31`)
On entry, :math:`\textit{nvar} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nvar}\geq 0`.
(`errno` :math:`41`)
On entry, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{levels}[j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{levels}[\textit{j}-1]\geq 1`.
(`errno` :math:`51`)
On entry, :math:`\textit{lvnames} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nvar} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{lvnames} = 0` or :math:`\textit{nvar}`.
(`errno` :math:`61`)
On entry, variable name :math:`i` contains one or more invalid characters, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`62`)
On entry, variable names :math:`i` and :math:`j` are not unique, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`63`)
On entry, variable names :math:`i` and :math:`j` are not unique (possibly due to truncation), :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.
Maximum variable name length is :math:`50`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`64`)
At least one variable name was truncated to :math:`50` characters. Each truncated name is unique and will be used in all output.
.. _g22yb-py2-py-notes:
**Notes**
Let :math:`D` denote a data matrix with :math:`n` observations on :math:`m_d` independent variables, denoted :math:`V_1,V_2,\ldots,V_{m_d}`.
The :math:`j`\ th independent variable, :math:`V_j`, can be classified as either binary, categorical, ordinal or continuous, where:
Binary
:math:`V_j` can take the value :math:`1` or :math:`0`.
Categorical
:math:`V_j` can take one of :math:`L_j` distinct values or levels. Each level represents a discrete category but does not necessarily imply an ordering. The value used to represent each level is, therefore, arbitrary and, by convention and for convenience, is taken to be the integers from :math:`1` to :math:`L_j`.
Ordinal
As with a categorical variable :math:`V_j` can take one of :math:`L_j` distinct values or levels. However, unlike a categorical variable, the levels of an ordinal variable imply an ordering and hence the value used to represent each level is not arbitrary. For example, :math:`V_j = 4` implies a value that is twice as large as :math:`V_j = 2`.
Continuous
:math:`V_j` can take any real value.
``lm_describe_data`` returns a G22 handle containing a description of a data matrix, :math:`D`.
The data matrix makes no distinction between binary, ordinal or continuous variables.
A name can also be assigned to each variable.
If names are not supplied then the default vector of names, :math:`\left\{\text{'V1'},\text{'V2'},\ldots \right\}`, is used.
See Also
--------
:meth:`naginterfaces.library.examples.blgm.lm_formula_ex.main`
:meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main`
:meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main`
"""
raise NotImplementedError
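# ---------------------------------------------------------------------------
# Illustrative sketch only. It is NOT part of naginterfaces and does not call
# the NAG Library. It mirrors, in plain Python, the constraints documented
# for the levels and vnames arguments of lm_describe_data: at least one
# level per variable, default names 'V1', 'V2', ..., truncation to 50
# characters, no forbidden characters and unique names. The helper name
# resolve_vnames and its error messages are hypothetical.
# ---------------------------------------------------------------------------
MAX_NAME_LEN = 50
FORBIDDEN_CHARS = set("+.*-:^()@")


def resolve_vnames(levels, vnames=None):
    # One entry of levels per variable in the data matrix.
    nvar = len(levels)
    for j, nlev in enumerate(levels, start=1):
        if nlev < 1:
            raise ValueError("variable %d: levels must be >= 1" % j)
    if vnames is None:
        # Default vector of names when none are supplied.
        return ["V%d" % j for j in range(1, nvar + 1)]
    if len(vnames) != nvar:
        raise ValueError("vnames must be None or hold one name per variable")
    # Names longer than 50 characters are truncated.
    names = [name[:MAX_NAME_LEN] for name in vnames]
    for j, name in enumerate(names, start=1):
        if FORBIDDEN_CHARS & set(name):
            raise ValueError("variable name %d contains invalid characters" % j)
    if len(set(names)) != nvar:
        raise ValueError("variable names are not unique")
    return names


# Three variables: the second is categorical with three levels.
print(resolve_vnames([1, 3, 1]))
print(resolve_vnames([1, 2], ["AGE", "SEX"]))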
def lm_design_matrix(hform, hddesc, dat, hxdesc):
r"""
``lm_design_matrix`` generates a design matrix from a data matrix and model description.
Note: this function uses optional algorithmic parameters, see also: :meth:`optset`, :meth:`optget`.
.. _g22yc-py2-py-doc:
For full information please refer to the NAG Library document for g22yc
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22ycf.html
.. _g22yc-py2-py-parameters:
**Parameters**
**hform** : Handle
A G22 handle to the internal data structure containing a description of the model :math:`\mathcal{M}` as returned in :math:`\textit{hform}` by :meth:`lm_formula`.
**hddesc** : Handle
A G22 handle to the internal data structure containing a description of the data matrix, :math:`D` as returned in :math:`\textit{hddesc}` by :meth:`lm_describe_data`.
**dat** : float, array-like, shape :math:`\left(:, :\right)`
The data matrix, :math:`D`. By default :math:`D_{{ij}}`, the :math:`\textit{i}`\ th value for the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m_d`, for :math:`\textit{i} = 1,2,\ldots,n`, should be supplied in :math:`\mathrm{dat}[i-1,j-1]`.
If the option 'Storage Order', described in :meth:`lm_describe_data`, is set to 'VAROBS', :math:`D_{{ij}}` should be supplied in :math:`\mathrm{dat}[j-1,i-1]`.
**hxdesc** : Handle, modified in place
`On entry`: must be set to a null Handle. Alternatively, an existing G22 handle may be supplied, in which case this function will destroy the supplied G22 handle as if :meth:`handle_free` had been called.
`On exit`: holds a G22 handle to the internal data structure containing a description of the design matrix, :math:`X`. You **must not** change the G22 handle other than through the functions in submodule ``blgm``.
**Returns**
**x** : float, ndarray, shape :math:`\left(:, :\right)`
The design matrix, :math:`X`. By default :math:`X_{{ij}}`, the :math:`\textit{i}`\ th value for the :math:`\textit{j}`\ th column, for :math:`\textit{j} = 1,2,\ldots,m_x`, for :math:`\textit{i} = 1,2,\ldots,n`, is returned in :math:`\mathrm{x}[i-1,j-1]`.
If the option 'Storage Order', described in :meth:`lm_formula`, is set to 'VAROBS', :math:`X_{{ij}}` is returned in :math:`\mathrm{x}[j-1,i-1]`.
.. _g22yc-py2-py-other_params:
**Other Parameters**
**'Formula'** : str
This option returns a verbose formula string describing the model, :math:`\mathcal{M}`, used to create the design matrix.
This formula will only contain variable names, the operators ':math:`+`' and ':math:`.`' and any contrast identifiers present.
**'Min Number of Columns'** : int
This option returns the minimum number of columns required to hold the design matrix, :math:`X`.
In most cases :math:`\text{'Min Number of Columns'} = \text{'Number of Columns'}`.
The one exception is when :math:`\mathrm{errno}` = 71, that is, the size of :math:`\mathrm{x}` was too small but the data matrix given in :math:`\mathrm{dat}` can be used as the design matrix.
In this case, :math:`\text{'Number of Columns'} = m_x = m_d` and :math:`\text{'Min Number of Columns'}` holds the number of columns that would be required if only the relevant parts of :math:`\mathrm{dat}` were copied into a new array.
**'Number of Columns'** : int
This option returns :math:`m_x`, the number of columns in the design matrix.
**'Number of Observations'** : int
This option returns :math:`n`, the number of observations in the design matrix.
**'Storage Order'** : str
This option returns how the design matrix, :math:`X`, is stored in :math:`\mathrm{x}`.
If :math:`\text{'Storage Order'} = \texttt{'OBSVAR'}`, :math:`X_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the design matrix is stored in :math:`\mathrm{x}[i-1,j-1]`.
If :math:`\text{'Storage Order'} = \texttt{'VAROBS'}`, :math:`X_{{ij}}`, the value for the :math:`j`\ th variable of the :math:`i`\ th observation of the design matrix is stored in :math:`\mathrm{x}[j-1,i-1]`.
It should be noted that 'Storage Order' is not writeable.
If you wish to change the storage order of the design matrix you need to change 'Storage Order' in :math:`\mathrm{hform}` as described in :ref:`Other Parameters for lm_formula <g22ya-py2-py-other_params>` prior to calling ``lm_design_matrix``.
.. _g22yc-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
:math:`\mathrm{hform}` has not been initialized or is corrupt.
(`errno` :math:`12`)
:math:`\mathrm{hform}` is not a G22 handle as generated by :meth:`lm_formula`.
(`errno` :math:`13`)
A variable name used when creating :math:`\mathrm{hform}` is not present in :math:`\mathrm{hddesc}`.
Variable name: :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`21`)
:math:`\mathrm{hddesc}` has not been initialized or is corrupt.
(`errno` :math:`22`)
:math:`\mathrm{hddesc}` is not a G22 handle as generated by :meth:`lm_describe_data`.
(`errno` :math:`31`)
On entry, column :math:`j` of the data matrix, :math:`D`, is not consistent with information supplied in :math:`\mathrm{hddesc}`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`41`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{lddat} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{lddat}\geq n`.
(`errno` :math:`42`)
On entry, :math:`m_d = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{lddat} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{lddat}\geq m_d`.
(`errno` :math:`51`)
On entry, :math:`m_d = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{sddat} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{sddat}\geq m_d`.
(`errno` :math:`52`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{sddat} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{sddat}\geq n`.
(`errno` :math:`61`)
On entry, :math:`\mathrm{hxdesc}` is not a null Handle or a recognised G22 handle.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`14`)
The model contains categorical variables, but no intercept or main effects terms have been requested.
Please check the design matrix returned matches the model you require.
(`errno` :math:`32`)
Column :math:`j` of the data matrix, :math:`D`, required rounding more than expected when being treated as a categorical variable, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.
.. _g22yc-py2-py-notes:
**Notes**
``lm_design_matrix`` generates a design matrix from a data matrix and a model description.
Design matrices encapsulate the observed values of the independent variables and the required model in a form that can be used by many of the model fitting functions available in the NAG Library, for example those in submodule :mod:`~naginterfaces.library.correg`.
**Notation**
Let :math:`D` denote a data matrix with :math:`n` observations on :math:`m_d` independent variables, denoted by :math:`V_j`, for :math:`\textit{j} = 1,2,\ldots,m_d`.
If :math:`V_j` is a categorical variable, let :math:`L_j` denote the number of levels associated with it.
If :math:`V_j` is a binary, ordinal or continuous variable, let :math:`L_j = 1`.
Let :math:`V_{{ji}}` denote the :math:`i`\ th value of :math:`V_j`.
Let :math:`\mathcal{M}` denote a model made up of one or more terms, denoted by :math:`T_i`.
Each term consists of either a main effect or an interaction and hence can be described using one or more variable names :math:`V_j` and the interaction operator ':math:`.`'.
The operator ':math:`+`' is used to denote the addition of a term to the model.
Therefore, :math:`\mathcal{M} = T_1+T_2+T_3 = V_1+V_2+{V_1.V_2}` denotes a model with three terms, the first two terms being the main effects for variables :math:`V_1` and :math:`V_2` and the last term the interaction between them.
For simplicity we reorder the terms of the model by the number of variables in them, so main effects come first, then two-way interactions, then three-way interactions etc.
By default it is assumed that the model :math:`\mathcal{M}` contains a mean effect (or intercept term).
If the mean effect is excluded, this is denoted by ':math:`-1`', so :math:`\mathcal{M} = T_1` is a model with one term and a mean effect and :math:`\mathcal{M} = T_1-1` is the same model with the mean effect dropped.
``lm_design_matrix`` generates an :math:`n\times m_x` design matrix, :math:`X`, from :math:`D` and :math:`\mathcal{M}`.
**Dummy Variables**
When constructing a design matrix, we cannot work directly with categorical variables.
Categorical variables must first be recoded into dummy variables.
A categorical variable :math:`V_j` requires :math:`L_j` dummy variables.
Let :math:`\mathcal{D}^{{j}}` denote an :math:`n\times L_j` matrix of dummy variables for :math:`V_j` defined as
.. math::
\mathcal{D}_{{li}}^j = \left\{\begin{array}{l} 1 \text{; if }V_{{ji}} = l, \\ 0 \text{; otherwise} \end{array}\right.
where :math:`\mathcal{D}_l^j` is the :math:`l`\ th column of :math:`\mathcal{D}^j` and :math:`\mathcal{D}_{{li}}^j` is the :math:`i`\ th element of :math:`\mathcal{D}_l^j`.
For a binary, ordinal or continuous variable, :math:`\mathcal{D}_{{1i}}^j = V_{{ji}}`.
**Full Design Matrix**
Given a model, :math:`\mathcal{M}`, and the matrices of dummy variables, constructing the full design matrix :math:`X_F` is straightforward.
Each term is processed in order as follows:
(1) If term :math:`i` is a main effect, that is :math:`T_i = V_j` for some :math:`j`, :math:`\mathcal{D}^j` is copied into :math:`X_F`.
(#) If term :math:`i` is a two-way interaction, that is :math:`T_i = {V_j.V_k}`, for some :math:`j\neq k`, then
(i) Loop over :math:`l_j = 1,2,\ldots,L_j`.
(#) Loop over :math:`l_k = 1,2,\ldots,L_k`.
(#) Add a column to :math:`X_F` corresponding to the element-wise product of :math:`\mathcal{D}_{l_j}^j` and :math:`\mathcal{D}_{{l_k}}^k`.
(#) Higher interaction terms are handled in a similar manner as the two-way interactions by adding columns constructed from multiplying all combinations of the columns of the corresponding :math:`\mathcal{D}`\ s that correspond to the variables involved. In all cases, the variables towards the right hand side of a term are iterated over the quickest.
**Contrasts**
Using the full design matrix :math:`X_F` in an analysis can result in an overparameterized model.
This is due to :math:`X_F` often not being of full rank as the sum of all the dummy variables for a particular variable is a vector of ones.
This source of overparameterization can be alleviated by using a design matrix :math:`X` where (some) dummy variables are replaced by contrasts.
For a categorical variable :math:`V_j` the contrasts are a set of :math:`L_j-1` functionally independent linear combinations of the dummy variables.
Whilst the choice of contrasts used in term :math:`T_i` will affect the individual model coefficients (parameters), it has no effect on the overall contribution of :math:`T_i`.
For a given variable :math:`V_j`, the contrasts can be represented by an :math:`L_j\times \left(L_j-1\right)` matrix, :math:`C_j`.
Each row of :math:`C_j` corresponds to a particular level of :math:`V_j` and the columns hold the values to use in the design matrix.
Six types of contrast are available in ``lm_design_matrix``; two types of treatment contrasts, two types of sum contrasts, Helmert contrasts and polynomial contrasts.
Unless specified otherwise, the contrasts used by ``lm_design_matrix`` are treatment contrasts relative to the first level.
See the description of the option 'Contrast' in :meth:`lm_formula` for ways of changing the contrasts used.
`Treatment Contrasts`
Treatment contrasts are taken relative to either the first or last level of the variable.
For example, if :math:`L_j = 4`,
.. math::
C_j = \begin{pmatrix}0&0&0\\1&0&0\\0&1&0\\0&0&1\end{pmatrix}
would be the contrast matrix for :math:`V_j` using treatment contrasts relative to the first level.
The contrast matrix obtained when using treatment contrasts relative to the last level is similar, but the row of zeros appears at the bottom and all other rows are shifted up one.
Strictly speaking, the term `contrast` implies that each column in the contrast matrix sums to zero.
That is not the case for treatment contrasts; however, they are included as this coding is commonly used in practice.
`Sum Contrasts`
Sum contrasts are similar to treatment contrasts and again can be taken relative to the first or last level of the variable.
Unlike treatment contrasts, sum contrasts effectively constrain the coefficients related to the variable to sum to zero.
For example, if :math:`L_j = 4`,
.. math::
C_j = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\\-1&-1&-1\end{pmatrix}
would be the contrast matrix for :math:`V_j` using sum contrasts relative to the last level.
The contrast matrix obtained when using sum contrasts relative to the first level is similar, but the row of :math:`-1`\ s appears at the top and all other rows are shifted down one.
`Helmert Contrasts`
With Helmert contrasts level :math:`l` of the variable is compared with the average effect of all previous levels.
For example, if :math:`L_j = 4`,
.. math::
C_j = \begin{pmatrix}-1&-1&-1\\1&-1&-1\\0&2&-1\\0&0&3\end{pmatrix}
would be the contrast matrix for :math:`V_j` using Helmert contrasts.
`Polynomial Contrasts`
With polynomial contrasts, the columns of :math:`C_j` correspond to the linear, quadratic, cubic, quartic, etc. terms of a hypothetical underlying numeric variable that takes equally spaced values at each level.
For example, if :math:`L_j = 4`,
.. math::
C_j = \begin{pmatrix}-0.67&0.50&-0.22\\-0.22&-0.50&0.67\\0.22&-0.50&-0.67\\0.67&0.50&0.22\end{pmatrix}
would be the contrast matrix for :math:`V_j` using polynomial contrasts.
`When Contrasts Can Be Used`
Depending on the specifics of the model, :math:`\mathcal{M}`, it may not always be possible to replace the :math:`L_j` dummy variables with :math:`L_j-1` contrasts for all variables in all terms and retain the same model.
A simple example of this is a data matrix, :math:`D`, with four observations and two variables which have two and three levels respectively.
This data matrix might look something like:
.. math::
D = \begin{pmatrix}1&1\\2&3\\1&2\\2&2\end{pmatrix}
For the sake of argument, assume that our model contains the main effect for each variable, but does not contain a mean effect (or intercept term).
So using the notation established earlier, :math:`\mathcal{M} = V_1+V_2-1`.
The full design matrix, :math:`X_F`, for this data matrix and model would be
.. math::
X_F = \begin{pmatrix}1&0&&1&0&0\\0&1&&0&0&1\\1&0&&0&1&0\\0&1&&0&1&0\end{pmatrix}
However, :math:`X_F` is not of full rank (and hence :math:`\mathcal{M}` is overparameterized) because the sum of the first two columns is a vector of ones as is the sum of the last three columns.
In order to alleviate this we might try constructing :math:`X_C` where the dummy variables have been replaced by contrasts.
Assuming treatment contrasts, relative to the first level, we would have
.. math::
X_C = \begin{pmatrix}0&&0&0\\1&&0&1\\0&&1&0\\1&&1&0\end{pmatrix}
However, using :math:`X_C` makes an implicit assumption that the expected value of the dependent variable (the quantity being modelled) is zero when :math:`V_1 = 1` and :math:`V_2 = 1`.
This assumption was not made when we used :math:`X_F` and hence the two design matrices are not equivalent.
One solution would be to use dummy variables for :math:`V_1` and contrasts for :math:`V_2`, which would result in a design matrix, :math:`X`, of
.. math::
X = \begin{pmatrix}1&0&&0&0\\0&1&&0&1\\1&0&&1&0\\0&1&&1&0\end{pmatrix}
Using :math:`X` would give an equivalent model to using :math:`X_F`.
The algorithm used by ``lm_design_matrix`` to decide which variables, in which terms, can be coded as contrasts and which need to be coded as dummy variables is described below.
Suppose :math:`V_j` is any variable that appears in term :math:`T_i`, and let :math:`T_{{i{}\left(j\right)}}` denote the term obtained by dropping :math:`V_j` from :math:`T_i`.
For example, if :math:`T_3 = {V_1.V_2.V_3}`, :math:`T_{{3{}\left(2\right)}} = {V_1.V_3}`.
In this context, the empty term is taken to be the mean effect (or intercept term).
We say that :math:`T_{{i\left(j\right)}}` appears in :math:`\mathcal{M}` if there exists a term :math:`T_k`, :math:`k < i`, that contains all of the variables appearing in :math:`T_{{i\left(j\right)}}`.
In most cases :math:`T_k = T_{{i\left(j\right)}}`, but this is not required.
Note, as stated earlier, the terms in :math:`\mathcal{M}` are ordered by the number of variables in them.
A variable, :math:`V_j`, in term :math:`T_i` is coded by contrasts if :math:`T_{{i\left(j\right)}}` appears in :math:`\mathcal{M}` and by dummy variables otherwise.
It is, therefore, possible for variable :math:`V_j` to be coded by contrasts in some terms and dummy variables in others within the same :math:`X`.
The above rule assumes the presence of a mean effect.
If no such effect is present in the model, the main effect of the first categorical variable is coded by dummy variables to compensate.
If no main effects appear in the model, the warning :math:`\mathrm{errno}` = 14 is returned.
A longer description and informal proof that the resulting :math:`X` is a suitable design matrix for the model of interest can be found in Chapter 2 of Chambers and Hastie (1992).
**Mean Effect**
The mean effect (or intercept term) is included in a design matrix by adding a column of ones as the first column of :math:`X`.
However, many model fitting functions in the NAG Library handle the mean effect as a special case and do not require it to be explicitly added to the design matrix.
Therefore, by default, ``lm_design_matrix`` does not explicitly add the mean effect to the design matrix.
This behaviour can be changed via the option 'Explicit Mean' in :meth:`lm_formula`.
.. _g22yc-py2-py-references:
**References**
Chambers, J M and Hastie, T J, 1992, `Statistical Models in S`, Wadsworth and Brooks/Cole Computer Science Series
See Also
--------
:meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main`
"""
raise NotImplementedError
[docs]def lm_submodel(what, hform, hxdesc, lisx, lplab, lvinfo, lenlab=210):
r"""
``lm_submodel`` produces labels for the columns of a design matrix, model parameters and a vector of column inclusion flags suitable for use with functions in submodule :mod:`~naginterfaces.library.correg`, thus allowing submodels to be fitted using the same design matrix.
.. _g22yd-py2-py-doc:
For full information please refer to the NAG Library document for g22yd
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22ydf.html
.. _g22yd-py2-py-parameters:
**Parameters**
**what** : str
Controls what labels are to be produced:
:math:`\mathrm{what} = \text{‘S'}`
Labels for a submodel are required. The submodel must be supplied in :math:`\mathrm{hform}`.
:math:`\mathrm{what} = \text{‘X'}`
Labels for the design matrix :math:`X`.
If :math:`\mathrm{hxdesc}` was returned by :meth:`correg.lmm_init <naginterfaces.library.correg.lmm_init>` in :math:`\textit{hlmm}` then :math:`X` is the design matrix associated with the fixed parameters.
:math:`\mathrm{what} = \text{‘Z'}`
Labels for the design matrix :math:`Z`.
If :math:`\mathrm{hxdesc}` was returned by :meth:`correg.lmm_init <naginterfaces.library.correg.lmm_init>` in :math:`\textit{hlmm}` then :math:`Z` is the part of the design matrix associated with the random parameters.
:math:`\mathrm{what} = \text{‘V'}`
Labels for the variance components.
**hform** : Handle
A G22 handle to the internal data structure containing a description of the required submodel :math:`\mathcal{M}_S`, as returned in :math:`\textit{hform}` by :meth:`lm_formula`. If :math:`\mathrm{what} \neq \text{‘S'}`, :math:`\mathrm{hform}` is not referenced and need not be set.
**hxdesc** : Handle
A G22 handle to the internal data structure containing a description of the design matrix, :math:`X`.
**lisx** : int
Length of :math:`\mathrm{isx}`.
**lplab** : int
The length of :math:`\mathrm{plab}`.
As :math:`p\leq m_x+1`, if labels are required, using :math:`\mathrm{lplab} = m_x+1` will always be sufficient.
**lvinfo** : int
The length of :math:`\mathrm{vinfo}`.
Let :math:`n_T` denote the number of terms in :math:`\mathcal{M}_S`, :math:`n_{{Tt}}` denote the number of variables in the :math:`t`\ th term and :math:`m_{{xt}}` denote the number of columns of :math:`X` corresponding to the :math:`t`\ th term.
The required size of :math:`\mathrm{vinfo}`, denoted :math:`a`, is given by:
.. math::
a = \sum_{t = 1}^{n_T}{m_{{xt}}\left(1+3n_{{Tt}}\right)}\text{.}
If the model includes a mean effect, :math:`a` should be incremented by one.
The values :math:`n_T`, :math:`n_{{Tt}}` and :math:`m_{{xt}}` are not trivial to calculate as they require the formula describing the model to be fully expanded and the contrast / dummy variable encoding to be known.
Therefore, if :math:`\mathrm{lisx}`, :math:`\mathrm{lplab}` or :math:`\mathrm{lvinfo}` are too small and :math:`\mathrm{lvinfo}\geq 3`, :math:`\mathrm{errno}` = 102 is returned and the required sizes for these arrays are returned in :math:`\mathrm{vinfo}[0]`, :math:`\mathrm{vinfo}[1]` and :math:`\mathrm{vinfo}[2]` respectively.
**lenlab** : int, optional
Length of the strings allocated in :math:`\mathrm{plab}`. At most :math:`\mathrm{lenlab}` characters will be written into each element of :math:`\mathrm{plab}`.
**Returns**
**intcpt** : str
If :math:`\mathrm{intcpt} = \text{‘M'}`, in order to fit the model :math:`\mathcal{M}_S` to :math:`D` using :math:`X`, any analysis function should include an implicit mean effect (intercept term).
:math:`\mathrm{intcpt} = \text{‘Z'}`, if :math:`\mathcal{M}_S` does not include a mean effect or the mean effect has been explicitly included in the design matrix.
**ip** : int
:math:`p`, the number of parameters in the (sub)model, including the intercept if one is present. If :math:`\mathrm{what} = \text{‘S'}`, then the submodel is the one specified in :math:`\mathrm{hform}` otherwise the model is the one used when defining the design matrix described in :math:`\mathrm{hxdesc}`.
If :math:`\mathrm{lisx} \neq 0`, then :math:`p = \sum_{{i = 1}}^{m_x}\mathrm{isx}[i-1]` if :math:`\mathrm{intcpt} = \text{‘Z'}`, otherwise :math:`p = \sum_{{i = 1}}^{m_x}\mathrm{isx}[i-1]+1`.
**isx** : None or int, ndarray, shape :math:`\left(\mathrm{lisx}\right)`
If :math:`\mathrm{lisx} \neq 0`, an array indicating which columns of the design matrix from the model specified in :math:`\mathrm{hform}` are to be used.
:math:`\mathrm{isx}[j-1] = 0`
The :math:`j`\ th column of the design matrix, :math:`X`, should not be included in the analysis.
:math:`\mathrm{isx}[j-1] = 1`
The :math:`j`\ th column of the design matrix, :math:`X`, should be included in the analysis.
If :math:`\mathrm{lisx} = 0`, :math:`\mathrm{isx}` is not referenced.
**plab** : None or str, ndarray, shape :math:`\left(\min\left(\mathrm{ip},\mathrm{lplab}\right)\right)`
If :math:`\mathrm{lplab} \neq 0`, the names associated with the :math:`p` parameters in the model.
If :math:`\mathrm{intcpt} = \text{‘Z'}`, the labels in :math:`\mathrm{plab}` are also the labels for the columns of design matrix used in the analysis.
If :math:`\mathrm{intcpt} = \text{‘M'}`, elements :math:`\mathrm{plab}[1]` to :math:`\mathrm{plab}[p-1]` are the corresponding column labels.
If a mean effect is present in :math:`\mathcal{M}_S`, the corresponding label is always in :math:`\mathrm{plab}[0]`.
If :math:`\mathrm{lplab} = 0`, :math:`\mathrm{plab}` is not referenced.
**vinfo** : None or int, ndarray, shape :math:`\left(\mathrm{lvinfo}\right)`
If :math:`\mathrm{lvinfo} \neq 0`, information encoding a description of the parameters in the model.
The encoding information can be extracted as follows:
(i) Set :math:`k = 1`.
(#) Iterate :math:`j` from :math:`1` to :math:`p`.
(1) Set :math:`b = \mathrm{vinfo}[k-1]`.
(#) Increment :math:`k`.
(#) Iterate :math:`i` from :math:`1` to :math:`b`.
(a) Set :math:`v_i = \mathrm{vinfo}[k-1]`.
(#) Set :math:`l_i = \mathrm{vinfo}[k]`.
(#) Set :math:`c_i = \mathrm{vinfo}[k+1]`.
(#) Increment :math:`k` by :math:`3`.
(#) The :math:`j`\ th model parameter corresponds to the interaction between the :math:`b` variables held in columns :math:`v_1,v_2,\ldots,v_b` of :math:`D`. Therefore, :math:`b = 1` indicates a main effect, :math:`b = 2` a two-way interaction, etc.
If :math:`b = 0`, the :math:`j`\ th model parameter corresponds to the mean effect.
If :math:`l_i = 0`, the corresponding variable :math:`v_i` is binary, ordinal or continuous.
Otherwise, :math:`l_i` is the level for the corresponding variable for model parameter :math:`j`.
:math:`c_i` is a numeric flag indicating the contrast used in the case of a categorical variable, with :math:`c_i = 0` indicating that dummy variables were used for variable :math:`v_i` in this term.
The six types of contrast (treatment contrasts with respect to the first and last levels, sum contrasts with respect to the first and last levels, Helmert contrasts and polynomial contrasts), as described in :meth:`lm_design_matrix`, are identified by the integers one to six respectively.
If :math:`\mathrm{lvinfo} = 0`, :math:`\mathrm{vinfo}` is not referenced.
.. _g22yd-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
On entry, :math:`\mathrm{what} = \langle\mathit{\boldsymbol{value}}\rangle` was an illegal value.
(`errno` :math:`12`)
Supplied value of :math:`\mathrm{what}` is not valid for the G22 handle supplied in :math:`\mathrm{hxdesc}`.
(`errno` :math:`21`)
:math:`\mathrm{hform}` has not been initialized or is corrupt.
(`errno` :math:`22`)
:math:`\mathrm{hform}` is not a G22 handle as generated by :meth:`lm_formula`.
(`errno` :math:`23`)
A variable name used when creating :math:`\mathrm{hform}` is not present in :math:`\mathrm{hxdesc}`.
Variable name: :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`24`)
The model and the design matrix are not consistent. The design matrix was constructed in the presence of a mean effect and the model does not include a mean effect.
(`errno` :math:`25`)
The model and the design matrix are not consistent. The model includes a term not present in the design matrix.
Term: :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`26`)
The model and the design matrix are not consistent.
Term: :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
This is likely due to the design matrix being constructed in the presence of either a mean effect or main effect that is not present in the model.
(`errno` :math:`31`)
:math:`\mathrm{hxdesc}` has not been initialized or is corrupt.
(`errno` :math:`32`)
:math:`\mathrm{hxdesc}` is not a G22 handle as generated by :meth:`lm_design_matrix`.
(`errno` :math:`61`)
On entry, :math:`\mathrm{lisx} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m_x = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{lisx} = 0` or :math:`\mathrm{lisx}\geq m_x`.
(`errno` :math:`81`)
On entry, :math:`\mathrm{lplab} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`p = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{lplab} = 0` or :math:`\mathrm{lplab}\geq p`.
(`errno` :math:`91`)
On entry, :math:`\mathrm{plab}` is too short to hold the parameter labels. Long labels will be truncated.
The longest parameter label is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`101`)
On entry, :math:`\mathrm{lvinfo}` is too small.
:math:`\mathrm{lvinfo} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{lvinfo} = 0` or :math:`\mathrm{lvinfo}\geq \langle\mathit{\boldsymbol{value}}\rangle`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`27`)
The model and the design matrix are not consistent. The model specifies different contrasts to those used when the design matrix was constructed. The contrasts specified in :math:`\mathrm{hform}` will be ignored.
(`errno` :math:`28`)
The model may not be as expected.
This is due to the model not containing the categorical variable adjusted to account for no mean effect when the design matrix was constructed.
(`errno` :math:`33`)
:math:`\mathrm{hxdesc}` has not passed through the model fitting function.
(`errno` :math:`102`)
On entry, one or more of :math:`\mathrm{lisx}`, :math:`\mathrm{lplab}` or :math:`\mathrm{lvinfo}` are nonzero, but too small.
Minimum values are zero, or :math:`\langle\mathit{\boldsymbol{value}}\rangle`, :math:`\langle\mathit{\boldsymbol{value}}\rangle` and :math:`\langle\mathit{\boldsymbol{value}}\rangle` respectively.
The minimum values are returned in the first three elements of :math:`\mathrm{vinfo}`.
.. _g22yd-py2-py-notes:
**Notes**
``lm_submodel`` is a utility function for use with :meth:`lm_formula`, :meth:`lm_describe_data` and :meth:`lm_design_matrix`.
It can be used to construct labels for the columns of an :math:`n\times m_x` design matrix, :math:`X`, created by :meth:`lm_design_matrix` and to return additional input vectors and flags required by a number of NAG Library model fitting functions.
Many of the analysis functions that require a design matrix to be supplied allow submodels to be defined through the use of a vector of ones or zeros indicating whether a column of :math:`X` should be included or excluded from the analyses (see for example :math:`\textit{isx}` in :meth:`correg.linregm_fit <naginterfaces.library.correg.linregm_fit>` or :meth:`correg.glm_normal <naginterfaces.library.correg.glm_normal>`).
This allows nested models to be fitted without having to reconstruct the design matrix for each analysis.
Let :math:`\mathcal{M}` denote a model constructed by :meth:`lm_formula`, :math:`D` a data matrix as described by :meth:`lm_describe_data` and :math:`X` be the corresponding design matrix constructed by :meth:`lm_design_matrix` from :math:`\mathcal{M}` and :math:`D`.
A different model, :math:`\mathcal{M}_S`, is a submodel of :math:`\mathcal{M}` if each term in :math:`\mathcal{M}_S`, including the mean effect (intercept term), is also present in :math:`\mathcal{M}`.
If :math:`\mathcal{M}_S` is a submodel of :math:`\mathcal{M}`, you can fit :math:`\mathcal{M}_S` to :math:`D` using a design matrix whose columns are a subset of the columns of :math:`X`.
See Also
--------
:meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main`
:meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main`
"""
raise NotImplementedError
[docs]def handle_free(handle):
r"""
``handle_free`` destroys a G22 handle and deallocates all the memory used.
.. _g22za-py2-py-doc:
For full information please refer to the NAG Library document for g22za
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22zaf.html
.. _g22za-py2-py-parameters:
**Parameters**
**handle** : Handle, modified in place
`On entry`: the G22 handle to be destroyed.
`On exit`: the handle is destroyed and set to a null Handle.
.. _g22za-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`12`)
:math:`\mathrm{handle}` has been corrupted.
(`errno` :math:`13`)
:math:`\mathrm{handle}` is a handle to an unknown data structure.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`11`)
:math:`\mathrm{handle}` has not been initialized.
.. _g22za-py2-py-notes:
**Notes**
Each G22 handle should be deallocated to avoid memory leaks.
Therefore, ``handle_free`` should be called on all such handles which are no longer needed.
Please note that passing an uninitialized handle might cause unpredictable behaviour, including a crash of your program.
"""
raise NotImplementedError
[docs]def optset(handle, optstr):
r"""
``optset`` is a general option setting function for functions in submodule ``blgm``.
It can set a single option or reset all of them to their default.
.. _g22zm-py2-py-doc:
For full information please refer to the NAG Library document for g22zm
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22zmf.html
.. _g22zm-py2-py-parameters:
**Parameters**
**handle** : Handle
The G22 handle which **must** have been initialized by one of submodule ``blgm``'s initialization functions.
**optstr** : str
A string identifying the option, its value and, where required, the instance identifier.
Defaults
Resets all options to their default values.
:math:`\textit{option} = \textit{optval}`
Sets (all instances of) :math:`\textit{option}` to :math:`\textit{optval}`.
:math:`\textit{option}:\textit{instance identifier} = \textit{optval}`
Sets a single instance of :math:`\textit{option}` to :math:`\textit{optval}`.
:math:`\textit{option} = \mathbf{default}`
Resets (all instances of) :math:`\textit{option}` to the default value.
:math:`\textit{option}:\textit{instance identifier} = \mathbf{default}`
Resets a single instance of :math:`\textit{option}` to its default value.
:math:`\mathrm{optstr}` is case insensitive and :math:`\textit{option}`, `instance identifier` and :math:`\textit{optval}` may consist of one or more tokens separated by white space.
See the documentation of the individual functions in `the G22 Introduction <https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22intro.html>`__ for details of valid values for :math:`\textit{option}`, `instance identifier` and :math:`\textit{optval}`.
.. _g22zm-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
:math:`\mathrm{handle}` has not been initialized or is corrupt.
(`errno` :math:`12`)
:math:`\mathrm{handle}` is not a G22 handle.
(`errno` :math:`21`)
On entry, :math:`\textit{option}` was not recognized.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`22`)
On entry, the expected delimiter ':math:`=`' was not found.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`23`)
On entry, :math:`\textit{option}` is read only.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`24`)
On entry, could not convert :math:`\textit{optval}` to an integer.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`25`)
On entry, could not convert :math:`\textit{optval}` to a real.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`26`)
On entry, :math:`\textit{optval}` is not a valid value for :math:`\textit{option}`.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`121`)
Invalid `instance identifier` for :math:`\textit{option}`.
On entry, :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`122`)
Numeric `instance identifier` is out of range.
On entry, :math:`\textit{instance identifier} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\langle\mathit{\boldsymbol{value}}\rangle\leq \textit{instance identifier}` and :math:`\textit{instance identifier}\leq \langle\mathit{\boldsymbol{value}}\rangle`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`123`)
On entry, :math:`\textit{option}` cannot have an associated `instance identifier`. The supplied `instance identifier` was ignored.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
.. _g22zm-py2-py-notes:
**Notes**
``optset`` can only be called on G22 handles.
Its purpose is to reset all options to their default values or set a single option to a user-supplied value.
Options and their values are, in general, presented as a character string of the form ':math:`\textit{option} = \textit{optval}`'; alphabetic characters can be supplied in either upper or lower case. :math:`\textit{optval}` will normally be either an integer, real or character value as defined in the description of the specific option.
In addition, all options can take an :math:`\textit{optval}` of DEFAULT, which resets the option to its default value.
In cases where an option may have multiple instances an `instance identifier` can be specified.
This is presented using the form ':math:`\textit{option}:\textit{instance identifier} = \textit{optval}`'.
In such cases, if the instance identifier is omitted, the values of all instances are changed.
Information relating to available option names, their corresponding valid values, whether the use of an instance identifier may be appropriate and what form it can take is given in the individual function documents.
See Also
--------
:meth:`naginterfaces.library.examples.blgm.lm_formula_ex.main`
"""
raise NotImplementedError
[docs]def optget(handle, optstr):
r"""
``optget`` is a general option getting function for submodule ``blgm``.
It is used to query the value of options.
.. _g22zn-py2-py-doc:
For full information please refer to the NAG Library document for g22zn
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22znf.html
.. _g22zn-py2-py-parameters:
**Parameters**
**handle** : Handle
The G22 handle which **must** have been initialized by one of submodule ``blgm``'s initialization functions.
**optstr** : str
A string identifying the option and, where required, the instance identifier.
**identify**
Returns a string description of the G22 handle supplied in :math:`\mathrm{handle}`. See `Further Comments <https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g22/g22znf.html#fcomments>`__ for more details.
:math:`\textit{option}`
Returns the value of :math:`\textit{option}`. If there are multiple instances of :math:`\textit{option}`, the value of the first is returned. If not all instances of :math:`\textit{option}` have the same value, :math:`\mathrm{errno}` = 124 is returned.
:math:`\textit{option}:\textit{instance identifier}`
Returns the value of a single instance of :math:`\textit{option}`.
:math:`\mathrm{optstr}` is case insensitive and :math:`\textit{option}` and `instance identifier` may consist of one or more tokens separated by white space.
See the documentation of the individual submodule ``blgm`` functions for details of valid values for :math:`\textit{option}` and `instance identifier`.
**Returns**
**optvalue** : dict
The option-value ``dict``, with the following keys:
``'value'`` : float, int or str
The value of the requested option.
``'annotation'`` : None or str
Possible additional information about the option value.
.. _g22zn-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
:math:`\mathrm{handle}` has not been initialized or is corrupt.
(`errno` :math:`12`)
:math:`\mathrm{handle}` is not a G22 handle.
(`errno` :math:`21`)
On entry, :math:`\textit{option}` was not recognized: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`22`)
On entry, :math:`\textit{option}` is not readable: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`121`)
Invalid `instance identifier` for :math:`\textit{option}`.
On entry, :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`122`)
Numeric `instance identifier` is out of range.
On entry, :math:`\textit{instance identifier} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\langle\mathit{\boldsymbol{value}}\rangle\leq \textit{instance identifier}` and :math:`\textit{instance identifier}\leq \langle\mathit{\boldsymbol{value}}\rangle`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`123`)
On entry, :math:`\textit{option}` cannot have an associated `instance identifier`. The supplied `instance identifier` was ignored.
:math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`124`)
:math:`\textit{option}` has multiple instances. Information from the first instance has been returned.
.. _g22zn-py2-py-notes:
**Notes**
``optget`` can only be called on G22 handles.
It can be used to query the current values of options.
The option of interest is presented as a character string of the form ':math:`\textit{option}`'.
In cases where an option may have multiple instances in a particular G22 handle an `instance identifier` can be specified.
This is presented using the form ':math:`\textit{option}:\textit{instance identifier}`'.
In such cases, if the instance identifier is omitted, the value of the first instance is returned.
If the value of the option is not the same for all instances and the instance identifier is omitted, a warning is raised.
Information relating to available option names, their corresponding valid values, whether the use of an instance identifier may be appropriate and what form it can take is given in the individual function documents.
"""
raise NotImplementedError
```
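
The **Dummy Variables** and **Full Design Matrix** notes in the `lm_design_matrix` docstring can be illustrated with the small worked example given there (four observations, two categorical variables with two and three levels, model `V_1 + V_2 - 1`). The following is a minimal pure-Python sketch, not the NAG implementation; the helper names `dummy_columns`, `interaction_columns` and `as_rows` are hypothetical and exist only for this illustration.

```python
from itertools import product
from math import prod


def dummy_columns(values, n_levels):
    """Return the n_levels dummy columns for a variable coded 1..n_levels."""
    return [[1 if v == level else 0 for v in values]
            for level in range(1, n_levels + 1)]


def interaction_columns(*column_sets):
    """Element-wise products over every combination of columns.

    itertools.product varies its rightmost argument quickest, which matches
    the ordering described for interaction terms.
    """
    return [[prod(entries) for entries in zip(*combo)]
            for combo in product(*column_sets)]


def as_rows(columns):
    """Turn a list of columns into a list of rows (one row per observation)."""
    return [list(row) for row in zip(*columns)]


# Data matrix D from the docstring: V_1 has two levels, V_2 has three.
D = [[1, 1], [2, 3], [1, 2], [2, 2]]
d1 = dummy_columns([row[0] for row in D], 2)
d2 = dummy_columns([row[1] for row in D], 3)

# Full design matrix: all dummy columns for both main effects.
X_F = as_rows(d1 + d2)

# Treatment contrasts relative to the first level amount to dropping the
# first dummy column of each variable.
X_C = as_rows(d1[1:] + d2[1:])

# Dummy variables for V_1 and contrasts for V_2.
X = as_rows(d1 + d2[1:])

# Columns that a two-way interaction V_1.V_2 would add to the full matrix.
interaction = interaction_columns(d1, d2)
```

Printing `X_F`, `X_C` and `X` should reproduce the three matrices shown in the `lm_design_matrix` notes, and `interaction` holds the six columns of the `V_1.V_2` term in the order described there.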
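
The treatment, sum and Helmert contrast matrices described under **Contrasts** in the `lm_design_matrix` docstring follow simple patterns. The sketch below builds them in plain Python for any number of levels and shows how one variable would be coded with a chosen matrix. The function names and the `base` argument are assumptions made for this illustration; they are not part of `naginterfaces`, where the coding is selected through the 'Contrast' option instead. Polynomial contrasts are omitted because they need an orthogonalization step.

```python
def _identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def treatment_contrasts(n_levels, base="first"):
    """Identity block plus a row of zeros for the reference level."""
    eye = _identity(n_levels - 1)
    zero_row = [[0] * (n_levels - 1)]
    return zero_row + eye if base == "first" else eye + zero_row


def sum_contrasts(n_levels, base="first"):
    """Identity block plus a row of -1s for the reference level."""
    eye = _identity(n_levels - 1)
    minus_row = [[-1] * (n_levels - 1)]
    return minus_row + eye if base == "first" else eye + minus_row


def helmert_contrasts(n_levels):
    """Column c (0-based) compares level c + 2 with the levels before it."""
    rows = []
    for r in range(n_levels):
        row = []
        for c in range(n_levels - 1):
            if r <= c:
                row.append(-1)
            elif r == c + 1:
                row.append(c + 1)
            else:
                row.append(0)
        rows.append(row)
    return rows


def code_variable(values, contrast):
    """Replace each level (coded 1..L) by the matching row of the contrast matrix."""
    return [list(contrast[v - 1]) for v in values]


# V_2 from the worked example, coded by treatment contrasts (first level).
v2_coded = code_variable([1, 3, 2, 2], treatment_contrasts(3))
```

With four levels, `treatment_contrasts(4)`, `sum_contrasts(4, base="last")` and `helmert_contrasts(4)` should match the matrices printed in the docstring.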
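
The rule under **When Contrasts Can Be Used** in the `lm_design_matrix` docstring (a variable in a term is coded by contrasts if the term obtained by dropping it already appears in the model) can be sketched as follows. This is only a reading of the rule as stated in the docstring, under the assumption that every variable is categorical; `coding_plan` and its arguments are hypothetical names, and the library's own implementation may differ in detail. The case with no main effects at all (warning `errno` 14) is not handled.

```python
def coding_plan(terms, has_mean=True):
    """Decide, per (term, variable), between 'contrast' and 'dummy' coding.

    terms    : list of tuples of variable names, e.g. [("V1",), ("V1", "V2")]
    has_mean : whether the model contains a mean effect (intercept term)
    """
    # Terms are ordered by the number of variables they contain
    # (sorted is stable, so ties keep their original order).
    ordered = sorted(terms, key=len)
    plan = {}
    # Without a mean effect, the first main effect is coded by dummy
    # variables to compensate; later main effects behave as usual.
    compensated = has_mean
    for i, term in enumerate(ordered):
        for var in term:
            reduced = set(term) - {var}
            if reduced:
                # Look for an earlier term containing all remaining variables.
                appears = any(reduced <= set(prev) for prev in ordered[:i])
            else:
                # The empty term stands for the mean effect.
                appears = compensated
                compensated = True
            plan[(term, var)] = "contrast" if appears else "dummy"
    return plan


# Worked example from the docstring: V_1 + V_2 - 1.
no_mean = coding_plan([("V1",), ("V2",)], has_mean=False)

# A model with an interaction but no main effect for V_2.
partial = coding_plan([("V1",), ("V1", "V2")])
```

For the worked example this gives dummy variables for `V1` and contrasts for `V2`, which is the combination used for the matrix `X` in the docstring.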
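
The extraction steps listed for the `vinfo` return value of `lm_submodel` translate directly into a short loop. The sketch below uses 0-based indexing and a made-up `vinfo` array; the helper name `decode_vinfo`, the `CONTRAST_NAMES` table and the sample values are assumptions for illustration, not output from the library. The code stored for a non-categorical variable's contrast flag is not specified in the docstring, so the sample simply uses 0 there.

```python
# Contrast flags as described in the docstring: 0 for dummy variables,
# then one to six in the order the contrast types are listed.
CONTRAST_NAMES = {
    0: "dummy variables",
    1: "treatment (first level)",
    2: "treatment (last level)",
    3: "sum (first level)",
    4: "sum (last level)",
    5: "Helmert",
    6: "polynomial",
}


def decode_vinfo(vinfo, n_params):
    """Return, per model parameter, a list of (column, level, contrast) triples.

    An empty list means the parameter is the mean effect (b = 0).
    """
    params = []
    k = 0
    for _ in range(n_params):
        b = vinfo[k]          # number of variables in this parameter's term
        k += 1
        factors = []
        for _ in range(b):
            v, level, contrast = vinfo[k], vinfo[k + 1], vinfo[k + 2]
            factors.append((v, level, contrast))
            k += 3
        params.append(factors)
    return params


# Hypothetical encoding of four parameters: a mean effect, a main effect for
# level 2 of the variable in column 1, a continuous variable in column 3, and
# an interaction between columns 1 and 2.
sample_vinfo = [0,
                1, 1, 2, 1,
                1, 3, 0, 0,
                2, 1, 2, 1, 2, 3, 1]
decoded = decode_vinfo(sample_vinfo, 4)
```

Each parameter consumes `1 + 3b` entries, which is consistent with the size formula given for `lvinfo`.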