G22 (Blgm)

Linear Model Specification

The routines in this chapter provide a mechanism for specifying a linear model using a text-based modelling language. They are intended to be used in conjunction with the model fitting routines from other chapters, for example Chapter G02.

Chapter G22 makes heavy use of data structures for passing information between routines. For portability reasons these structures are not referred to directly, but through C pointers. Throughout the documentation these pointers are referred to as G22 handles.

Once a G22 handle is no longer required, the associated memory should be released by calling g22zaf. It is always safe to release the memory associated with a G22 handle, because no G22 handle ever references another.

Let $D$ denote a data matrix with $n$ observations on ${m}_{d}$ independent variables, denoted ${V}_{1},{V}_{2},\dots ,{V}_{{m}_{d}}$. Let $y$ denote a vector of $n$ observations on a dependent variable.

A linear model, $\mathcal{M}$, as the term is used in this chapter, expresses a relationship between the independent variables, ${V}_{j}$, and the dependent variable. This relationship can be expressed as a series of additive terms ${T}_{1}+{T}_{2}+\cdots $, with each term, ${T}_{t}$, representing either a single independent variable ${V}_{j}$, called the main effect of ${V}_{j}$, or the interaction between two or more independent variables. An interaction term, denoted here using the $.$ operator, allows the effect of an independent variable on the dependent variable to depend on the value of one or more other independent variables.
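As a worked illustration, consider the model ${V}_{1}+{V}_{2}+{V}_{1}.{V}_{2}$ where both independent variables are continuous. The intercept ${\beta }_{0}$, the coefficients ${\beta }_{1},{\beta }_{2},{\beta }_{3}$, the error term ${\epsilon }_{i}$ and the notation ${v}_{ij}$ for the $i$th observation on ${V}_{j}$ are assumptions introduced here for the example and are not notation defined in this chapter. Under those assumptions the model could be written as

$$y_i = \beta_0 + \beta_1 v_{i1} + \beta_2 v_{i2} + \beta_3 v_{i1} v_{i2} + \epsilon_i , \qquad i = 1, 2, \dots, n .$$

In this form, a unit change in ${V}_{1}$ changes the expected value of $y$ by ${\beta }_{1}+{\beta }_{3}{v}_{i2}$, so the effect of ${V}_{1}$ depends on the value of ${V}_{2}$, which is what the interaction term expresses.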

All G22 handles have optional arguments associated with them. Some of these optional arguments are specific to a particular type of G22 handle and are described in the documentation of the routine that creates the G22 handle. Other optional arguments are common across all G22 handles. These are described in the documentation for g22zmf and g22znf, which can be used to set and get an optional argument, respectively.

Prior to specifying a linear model, the data matrix, $D$, must be described. This is done using g22ybf. The linear model, $\mathcal{M}$, can then be specified as a text string containing a formula. This allows the model to be specified via variable names and avoids the need to explicitly handle interaction terms. The linear model is specified using g22yaf, and the documentation for that routine describes the syntax of the formula.
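To make the idea of a formula string concrete, the following Python sketch splits an additive formula into its terms. It is only an illustration under stated assumptions: `parse_terms` is a hypothetical helper, it recognises only the `+` separator and the `.` interaction operator mentioned above, and it does not reproduce the syntax accepted by g22yaf, which is defined in that routine's documentation.

```python
def parse_terms(formula):
    """Split an additive formula into terms.

    Each term is returned as a tuple of variable names: a tuple of
    length one is a main effect, a longer tuple is an interaction.
    Hypothetical helper for illustration; not the g22yaf parser.
    """
    terms = []
    for raw in formula.split("+"):
        # Variables within one term are joined by the '.' operator.
        names = tuple(name.strip() for name in raw.split("."))
        if not all(names):
            raise ValueError(f"empty variable name in term {raw!r}")
        if names not in terms:  # ignore repeated terms
            terms.append(names)
    return terms


terms = parse_terms("V1 + V2 + V1.V2")
for term in terms:
    kind = "main effect" if len(term) == 1 else "interaction"
    print(kind, term)
```

Representing each term as a tuple of names keeps main effects and interactions in one structure, so later steps need no special case for interactions.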

In many of the routines in the NAG Library, for example the regression routines in Chapter G02, a linear model is specified directly via the design matrix, $X$. The design matrix is defined by the data matrix, $D$, and the linear model, $\mathcal{M}$, and its construction usually requires the use of dummy variables. g22ycf constructs the design matrix from the formula supplied to g22yaf.
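The role of dummy variables can be seen in a small Python sketch that builds a design matrix by hand for a model with one categorical variable `F`, one continuous variable `X` and their interaction. Everything here is an assumption made for illustration: the helper names, the column labels, the inclusion of an intercept column and the choice to drop the first level of `F` as a reference level. It is one common convention, not a description of how g22ycf parameterises the model.

```python
def indicator_columns(values):
    """Dummy-code a categorical variable, dropping the first (reference) level."""
    levels = sorted(set(values))
    return {
        f"F_{level}": [1.0 if v == level else 0.0 for v in values]
        for level in levels[1:]
    }


def design_matrix(factor, x):
    """Build the columns for the model F + X + F.X, with an intercept."""
    n = len(factor)
    columns = {"Intercept": [1.0] * n}
    dummies = indicator_columns(factor)
    columns.update(dummies)                # main effect of F
    columns["X"] = list(x)                 # main effect of X
    for label, col in dummies.items():     # interaction F.X
        columns[f"{label}.X"] = [d * xi for d, xi in zip(col, x)]
    labels = list(columns)
    rows = [[columns[label][i] for label in labels] for i in range(n)]
    return labels, rows


labels, X = design_matrix(["a", "b", "c", "b"], [1.0, 2.0, 3.0, 4.0])
print(labels)
for row in X:
    print(row)
```

With three levels of `F`, the main effect contributes two dummy columns and the interaction contributes two more, each being a dummy column multiplied elementwise by `X`. This is the bookkeeping that specifying the model by formula avoids.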

The utility routine g22ydf can be used to obtain labels for the parameters of the model as well as a variety of information required by some of the routines in Chapter G02.

| Functionality | Routine |
| --- | --- |
| Linear model, construct design matrix | g22ycf |
| Linear model, data description | g22ybf |
| Linear model, nested model | g22ydf |
| Linear model, specification from formula string | g22yaf |
| Service routines, destroy a G22 handle | g22zaf |
| Service routines, general option getting routine | g22znf |
| Service routines, general option setting routine | g22zmf |
