naginterfaces.library.blgm.lm_formula

naginterfaces.library.blgm.lm_formula(hform, formula)

lm_formula parses a text string containing a formula specifying a linear model and outputs a G22 handle to an internal data structure. This G22 handle can then be passed to various functions in submodule blgm. In particular, the G22 handle can be passed to lm_design_matrix() to produce a design matrix or lm_submodel() to produce a vector of column inclusion flags suitable for use with functions in submodule correg.

Note: this function uses optional algorithmic parameters, see also: optset(), optget().

For full information please refer to the NAG Library document for g22ya

https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g22/g22yaf.html

Parameters
hform : Handle, modified in place

On entry: must be set to a null Handle. Alternatively, an existing G22 handle may be supplied, in which case this function will destroy the supplied G22 handle as if handle_free() had been called.

On exit: holds a G22 handle to the internal data structure containing a description of the model as specified in formula. You must not change the G22 handle other than through functions in submodule blgm.

formula : str

A string containing the formula specifying the model. See Notes for details on the allowed model syntax.

Other Parameters
‘Contrast’ : str

Default = ‘FIRST’

This argument controls the default contrasts used for the categorical independent variables appearing in the model. Six types of contrasts and dummy variables are available:

‘FIRST’

Treatment contrasts relative to the first level of the variable will be used.

‘LAST’

Treatment contrasts relative to the last level of the variable will be used.

‘SUM FIRST’

Sum contrasts relative to the first level of the variable will be used.

‘SUM LAST’

Sum contrasts relative to the last level of the variable will be used.

‘HELMERT’

Helmert contrasts will be used.

‘POLYNOMIAL’

Polynomial contrasts will be used.

‘DUMMY’

Dummy variables will be used rather than a contrast.

See lm_design_matrix() for more information on contrasts, their effect on the design matrix and how they are constructed.

This argument may have an instance identifier associated with it (see optset() and optget()). The instance identifier must be the name of one of the variables appearing in the model supplied in formula when the G22 handle was created. For example, CONTRAST : VAR1 = HELMERT would set Helmert contrasts for the variable named VAR1.

If no instance identifier is specified, the default contrast for all categorical variables in the model is changed, otherwise only the default contrast for the named variable is changed.

In some situations it might be necessary for a variable to use a different contrast, depending on where it appears in the model formula. To allow contrasts to be specified on a term-by-term basis, the ‘@’ operator can be used in the model formula. The syntax for this operator is <variable name>@<contrast identifier>, where <contrast identifier> is one of: F, L, SF, SL, H, P or D, corresponding to treatment contrasts relative to the first and last levels, sum contrasts relative to the first and last levels, Helmert contrasts, polynomial contrasts or dummy variables respectively.

If the contrast has not been explicitly specified via the ‘@’ operator, the value obtained from the option ‘Contrast’ is used.

For example, setting formula to VAR1 + VAR1@H.VAR2@P + VAR2@H.VAR3 specifies that the variable named VAR1 should use the default contrasts in the first term and Helmert contrasts in the second term. The variable named VAR2 should use polynomial contrasts in the second term and Helmert contrasts in the third term. The variable named VAR3 should use the default contrasts in the third term.
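The per-term contrast rule above can be illustrated with a small pure-Python sketch. It is not part of naginterfaces: the helper term_contrasts and its default argument are hypothetical, and it only handles the verbose form of a formula (variable names joined by ‘+’ and ‘.’, with no parentheses or other operators). Each variable in each term is paired with the contrast named by its ‘@’ suffix, or with the caller-supplied default when no suffix is present.

```python
# Hypothetical helper for illustration only; not a naginterfaces function.
CONTRASTS = {
    "F": "FIRST", "L": "LAST", "SF": "SUM FIRST", "SL": "SUM LAST",
    "H": "HELMERT", "P": "POLYNOMIAL", "D": "DUMMY",
}

def term_contrasts(formula, default):
    """Pair each variable of each term with the contrast it would use.

    `default` stands in for whatever the 'Contrast' option currently holds.
    Only verbose formulas (names joined by '+' and '.') are supported.
    """
    terms = []
    for term in formula.split("+"):
        parts = []
        for item in term.strip().split("."):
            name, _, ident = item.strip().partition("@")
            ident = ident.strip().upper()
            if ident and ident not in CONTRASTS:
                raise ValueError(f"invalid contrast identifier: {ident!r}")
            parts.append((name.strip(), CONTRASTS[ident] if ident else default))
        terms.append(parts)
    return terms

for parts in term_contrasts("VAR1 + VAR1@H.VAR2@P + VAR2@H.VAR3", "SUM FIRST"):
    print(parts)
```

Passing the default explicitly keeps the sketch independent of the option's actual setting.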

‘Explicit Mean’ : str

Default = ‘NO’

If ‘Explicit Mean’ = ‘YES’, any mean effect included in the model will be explicitly added to the design matrix, X, as a column of 1s.

If ‘Explicit Mean’ = ‘NO’, it is assumed that the function to which X will be passed treats the mean effect as a special case; see mean in correg.linregm_fit for example.

‘Formula’ : str

This argument returns a verbose version of the model formula specified in formula, expanded and simplified to contain only variable names, the operators ‘+’ and ‘.’, and any contrast identifiers present.

‘Storage Order’ : str

Default = ‘OBSVAR’

This option controls how the design matrix, X, should be stored in its output array and only has an effect if the design matrix is being constructed using lm_design_matrix().

If ‘Storage Order’ = ‘OBSVAR’, $X_{ij}$, the value for the $j$th variable of the $i$th observation of the design matrix, is stored in x[i-1, j-1].

If ‘Storage Order’ = ‘VAROBS’, $X_{ij}$, the value for the $j$th variable of the $i$th observation of the design matrix, is stored in x[j-1, i-1].

Here x is the output argument of the same name in lm_design_matrix().
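The two layouts differ only by a transpose. The following sketch uses plain nested lists rather than the array returned by lm_design_matrix(), and it assumes the two option values are named OBSVAR (one row per observation) and VAROBS (one row per variable). The helpers store and element are hypothetical and exist only to show the index mapping.

```python
# Illustrative only: nested lists stand in for the 2-D output array.
def store(design, order):
    """Lay out `design` (a list of observations, each a list of variables)."""
    if order == "OBSVAR":
        return [list(row) for row in design]        # rows are observations
    if order == "VAROBS":
        return [list(col) for col in zip(*design)]  # rows are variables
    raise ValueError(f"unknown storage order: {order!r}")

def element(x, i, j, order):
    """Value for the j-th variable of the i-th observation (1-based i, j)."""
    return x[i - 1][j - 1] if order == "OBSVAR" else x[j - 1][i - 1]

design = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 observations, 2 columns
x_obsvar = store(design, "OBSVAR")
x_varobs = store(design, "VAROBS")
print(element(x_obsvar, 3, 1, "OBSVAR"), element(x_varobs, 3, 1, "VAROBS"))
```

Either layout returns the same value for a given observation and variable; only the position in the stored array changes.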

‘Subject’ : str

This argument gives the subject terms associated with the formula in a linear mixed effects model.

The supplied value must consist of a single term, representing either a single independent variable or a single interaction term between two or more independent variables. No variable in the subject term may also appear in the model formula.

Raises
NagValueError
(errno )

On entry, hform is not a null Handle or a recognised G22 handle.

(errno )

The formula contained a mismatched parenthesis.

The position in the formula string of the error is ⟨value⟩.

(errno )

An operator was missing.

The position in the formula string of the error is ⟨value⟩.

(errno )

Invalid use of an operator.

The position in the formula string of the error is ⟨value⟩.

(errno )

Invalid specification for the power operator.

The position in the formula string of the error is ⟨value⟩.

(errno )

Invalid specification for the colon operator.

The position in the formula string of the error is ⟨value⟩.

(errno )

Invalid specification for the mean.

The position in the formula string of the error is ⟨value⟩.

(errno )

Invalid variable name.

The position in the formula string of the error is ⟨value⟩.

(errno )

Missing variable name.

The position in the formula string of the error is ⟨value⟩.

(errno )

After processing, the model contains no terms.

(errno )

An invalid contrast specifier has been supplied.

The position in the formula string of the error is ⟨value⟩.

(errno )

On entry, an invalid optional parameter was supplied in formula.

(errno )

On entry, an optional parameter was supplied in formula, but the expected delimiter ‘=’ was not found.

(errno )

On entry, an optional parameter was supplied in formula, but the supplied value was invalid.

Warns
NagAlgorithmicWarning
(errno )

A term contained a repeated variable with a different contrast specifier.

Notes

Background

Let $D$ denote a data matrix with $n$ observations on $m_d$ independent variables, denoted $V_1, V_2, \ldots, V_{m_d}$. Let $y$ denote a vector of $n$ observations on a dependent variable.

A linear model, $\mathcal{M}$, as the term is used in this function, expresses a relationship between the independent variables, $V_j$, and the dependent variable. This relationship can be expressed as a series of additive terms $T_1 + T_2 + \cdots$, with each term, $T_i$, representing either a single independent variable $V_j$, called the main effect of $V_j$, or the interaction between two or more independent variables. An interaction term, denoted here using the ‘.’ operator, allows the effect of an independent variable on the dependent variable to depend on the value of one or more other independent variables. As an example, the three-way interaction between $V_1$, $V_2$ and $V_3$ is denoted $V_1.V_2.V_3$ and describes a situation where the effect of one of these three variables is influenced by the value of the other two.

This function takes a description of $\mathcal{M}$, supplied as a text string containing a formula, and outputs a G22 handle to an internal data structure. This G22 handle can then be passed to lm_design_matrix() to produce a design matrix for use in analysis functions from other modules, for example the regression functions of submodule correg.

A more detailed description of what is meant by a G22 handle can be found in the G22 Introduction.

Syntax

In its most verbose form the model, $\mathcal{M}$, can be described by one or more variable names, $V_j$, and the two operators ‘+’ and ‘.’. To allow a wide variety of models to be specified compactly, this syntax is extended to six operators (‘+’, ‘.’, ‘*’, ‘-’, ‘:’, ‘^’) and parentheses.

A formula describing the model is supplied to lm_formula via a character string which must obey the following rules:

  1. Variables can be denoted by arbitrary names, as long as

    1. The names used are a subset of those supplied to lm_describe_data() when describing the data matrix.

    2. The names do not contain any of the characters in .

  2. The ‘.’ operator denotes an interaction between two or more variables or terms, with V1.V2.V3 denoting the three-way interaction between V1, V2 and V3.

  3. A term in the model can contain one or more variable names, separated using the ‘.’ operator, i.e., a term can be either a main effect or an interaction term between two or more variables.

    1. If a variable appears in an interaction term more than once, all appearances after the first are ignored; therefore, V1.V2.V1 is the same as V1.V2.

    2. The ordering of the variables in an interaction term is ignored when comparing terms; therefore, V1.V2 is the same as V2.V1. This ordering may have an effect when the resulting G22 handle is passed to another function, for example lm_design_matrix().

    3. Applying the ‘.’ operator to two terms appends one to the other; for example, if T1 = V1.V2 and T2 = V3.V4, then T1.T2 = V1.V2.V3.V4.

  4. The ‘+’ operator allows additional terms to be included in the model; therefore, T1 + T2 is a model that includes terms T1 and T2.

    1. If a term is added to the model more than once, all appearances after the first are ignored; therefore, T1 + T2 + T1 is the same as T1 + T2.

    2. The ordering of the terms is ignored whilst parsing the formula; therefore, T1 + T2 is the same as T2 + T1. This ordering may have an effect when the resulting G22 handle is passed to another function, for example lm_design_matrix().

    3. Internally, the terms are reordered so that all main effects come first, followed by two-way interactions, then three-way interactions, etc. The ordering within each of these categories is preserved.

  5. The ‘*’ operator can be used as a shorthand notation denoting the main effects and all interactions between the variables involved. Therefore, T1 * T2 is equivalent to T1 + T2 + T1.T2 and T1 * T2 * T3 is equivalent to T1 + T2 + T3 + T1.T2 + T1.T3 + T2.T3 + T1.T2.T3.

  6. The ‘-’ operator removes a term from the model; therefore, T1 * T2 * T3 - T1.T2.T3 is equivalent to T1 + T2 + T3 + T1.T2 + T1.T3 + T2.T3, as the three-way interaction, T1.T2.T3, usually present due to T1 * T2 * T3, has been removed.

  7. The ‘:’ operator is a shorthand way of specifying a series of variables, with V1:Vj being equivalent to V1 + V2 + ... + Vj.

    1. This operator can only be used if the variable names end in a numeric; therefore, VAR1:VAR3 would be valid, but VAR1:VARX would not.

    2. The root part of both variable names (i.e., the part before the trailing numeric, so VAR in the valid example above) must be the same.

    3. The trailing numeric parts of the two variable names must be in ascending order.

  8. The ‘^’ operator is a shorthand notation for a series of ‘*’ operators. (T1 + T2 + T3)^2 is equivalent to (T1 + T2 + T3) * (T1 + T2 + T3), which in turn is equivalent to T1 + T2 + T3 + T1.T2 + T1.T3 + T2.T3.

    1. This notation is present primarily for use with the ‘:’ operator in examples of the form (V1:V10)^3, which specifies a model containing the main effects for variables V1 to V10 as well as all two- and three-way interactions.

    2. Using the ‘^’ operator on a single term has no effect; therefore, T1^2 is the same as T1.
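The rules above can be checked with a small term algebra. This is a hedged pure-Python sketch, not the parser used by lm_formula: it assumes the operators are written ‘+’, ‘.’, ‘*’, ‘-’, ‘:’ and ‘^’, represents a term as a tuple of variable names and a model as a list of terms, and every helper name (term, add, dot, star, minus, power, colon, show) is hypothetical.

```python
import re

def term(*names):
    """Interaction term: repeated variables after the first are dropped."""
    return tuple(dict.fromkeys(names))

def add(model, other):
    """'+': append terms not already present (variable order is ignored)."""
    out = list(model)
    for t in other:
        if all(frozenset(t) != frozenset(u) for u in out):
            out.append(t)
    return out

def dot(left, right):
    """'.': append every term of `right` to every term of `left`."""
    out = []
    for a in left:
        for b in right:
            out = add(out, [term(*a, *b)])
    return out

def star(left, right):
    """'*': left + right + left.right."""
    return add(add(left, right), dot(left, right))

def minus(model, other):
    """'-': remove the listed terms where they are present."""
    drop = {frozenset(t) for t in other}
    return [t for t in model if frozenset(t) not in drop]

def power(model, k):
    """'^': k copies of `model` joined by '*'."""
    out = list(model)
    for _ in range(k - 1):
        out = star(out, model)
    return out

def colon(first, last):
    """':': NAME<i>:NAME<j> expands to NAME<i> + ... + NAME<j>."""
    a = re.fullmatch(r"(.*?)(\d+)", first)
    b = re.fullmatch(r"(.*?)(\d+)", last)
    if not a or not b:
        raise ValueError("both names must end in a number")
    if a.group(1) != b.group(1):
        raise ValueError("both names must share the same root")
    lo, hi = int(a.group(2)), int(b.group(2))
    if lo > hi:
        raise ValueError("trailing numbers must be ascending")
    return [term(f"{a.group(1)}{i}") for i in range(lo, hi + 1)]

def show(model):
    """Verbose form: main effects first, then two-way interactions, etc."""
    return " + ".join(".".join(t) for t in sorted(model, key=len))

t1, t2, t3 = [term("T1")], [term("T2")], [term("T3")]
full = star(star(t1, t2), t3)                       # T1 * T2 * T3
print(show(full))
print(show(minus(full, dot(dot(t1, t2), t3))))      # ... - T1.T2.T3
print(show(power(add(add(t1, t2), t3), 2)))         # (T1 + T2 + T3)^2
print(len(power(colon("V1", "V10"), 3)))            # number of terms in (V1:V10)^3
```

The stable sort in show mirrors the internal reordering described in the list: terms are grouped by interaction order, and the original order is kept within each group.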

Precedence

Each operator has an associated default precedence, but this can be overridden through the use of parentheses. The default precedence is:

  1. The ‘:’ operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, V1:V3 * V4 is equivalent to (V1 + V2 + V3) * V4.

  2. The ‘^’ operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, T1.(T2 + T3)^2 is equivalent to T1.((T2 + T3)^2), which is equivalent to T1.(T2 + T3 + T2.T3).

  3. The ‘.’ operator, so T1 * T2.T3 is equivalent to T1 * (T2.T3).

  4. The ‘*’ operator.

    1. When using parentheses with the ‘.’ or ‘*’ operators the usual rules of multiplication apply; therefore, T1.(T2 + T3) is equivalent to T1.T2 + T1.T3 and (T1 + T2) * T3 is equivalent to T1 * T3 + T2 * T3.

    2. Syntax of the following form is invalid: , where indicates an operator, unless one or more of those operators are and/or . Therefore, is invalid, whilst is valid.

  5. The ‘+’ and ‘-’ operators have equal precedence.

    1. If the terms associated with a ‘-’ operator do not occur in the current expression they are ignored; therefore, T1 + (T2 - T1) is equivalent to T1 + T2. The (T2 - T1) part of the expression is calculated first and results in T2, as the term T1 does not exist in this particular sub-expression and so cannot be removed.
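The effect of parentheses on term removal can be illustrated with two list helpers. This is a sketch only: add and minus are hypothetical stand-ins for the addition and removal operators, and evaluating the unparenthesised form from left to right is an assumption based on the two operators having equal precedence.

```python
# Illustrative helpers; terms are plain strings and a model is a list of terms.
def add(model, terms):
    """Addition: append terms that are not already in the model."""
    return model + [t for t in terms if t not in model]

def minus(model, terms):
    """Removal: drop the listed terms, silently ignoring any that are absent."""
    return [t for t in model if t not in terms]

inner = minus(["T2"], ["T1"])              # (T2 - T1): T1 is absent, nothing is removed
nested = add(["T1"], inner)                # T1 + (T2 - T1)
flat = minus(add(["T1"], ["T2"]), ["T1"])  # T1 + T2 - T1, taken left to right (assumed)
print(nested, flat)
```

The parenthesised form keeps T1 because the removal is applied to a sub-expression that never contained it.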

Mean Effect / Intercept Term

A mean effect (or intercept term) can be explicitly added to a formula by specifying 1 and can be explicitly excluded from the formula by specifying -1. For example, 1 + V1 + V2 indicates a model with the main effects of two variables and a mean effect, whereas V1 + V2 - 1 denotes the same model, but without the mean effect. The mean indicator can appear anywhere in the formula string as long as it is not contained within parentheses.

If the mean effect is not explicitly mentioned in the model formula, the model is assumed to include a mean effect.

Optional Parameters

lm_formula accepts a number of optional parameters described in Other Parameters. Usually these parameters are set via a call to optset(); however, when specifying a subject term in a mixed effects linear regression model it is often more convenient to supply the information along with the rest of the formula. Therefore, writeable optional parameters can be set via the formula argument. The ‘/’ delimiter must be used between the main formula and the optional parameter. For example, supplying a formula of the form V1 + V2 / SUBJECT = V3.V4 would specify a model formula of V1 + V2 and set the optional parameter ‘Subject’ to V3.V4.