G22 (Blgm)

Linear Model Specification

The functions in this chapter provide a mechanism for specifying a linear model using a text-based modelling language. They are intended to be used in conjunction with the model fitting functions from other chapters, for example Chapter G02.

Chapter G22 makes heavy use of data structures for passing information between functions. For portability reasons, these structures are not accessed directly but through void pointers. Throughout the documentation these pointers are referred to as G22 handles.

Once a G22 handle is no longer required, the associated memory should be released by calling g22zac. It is always safe to release the memory associated with a G22 handle, as no G22 handle ever references another.

Let $D$ denote a data matrix with $n$ observations on $m_d$ independent variables, denoted $V_1, V_2, \dots, V_{m_d}$. Let $y$ denote a vector of $n$ observations on a dependent variable.

A linear model, $\mathcal{M}$, as the term is used in this chapter, expresses a relationship between the independent variables, $V_j$, and the dependent variable. This relationship can be expressed as a sum of terms $T_1 + T_2 + \cdots$, where each term, $T_t$, represents either a single independent variable $V_j$, called the main effect of $V_j$, or the interaction between two or more independent variables. An interaction term, denoted here using the $.$ operator (for example, $V_1.V_2$), allows the effect of an independent variable on the dependent variable to depend on the values of one or more other independent variables.
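As an illustration, assume that $V_1$ and $V_2$ are both continuous and that the model also contains an intercept (mean) term. Writing $v_{i1}$ and $v_{i2}$ for the $i$th observations on $V_1$ and $V_2$, and $\beta_0, \dots, \beta_3$ for the model parameters (notation introduced only for this example), the model $V_1 + V_2 + V_1.V_2$ corresponds to

$$
y_i = \beta_0 + \beta_1 v_{i1} + \beta_2 v_{i2} + \beta_3 v_{i1} v_{i2} + \varepsilon_i, \qquad i = 1, 2, \dots, n,
$$

so a unit change in $V_1$ changes the expected value of $y_i$ by $\beta_1 + \beta_3 v_{i2}$, which depends on the value of $V_2$. Dropping the interaction term $V_1.V_2$ is equivalent to setting $\beta_3 = 0$, which makes the effect of $V_1$ the same for every value of $V_2$.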


All G22 handles have optional arguments associated with them. Some of these optional arguments are specific to a particular type of G22 handle and are described in the documentation of the function that creates the G22 handle. Other optional arguments are common across all G22 handles. These are described in the documentation for g22zmc and g22znc.

Before a linear model can be specified, the data matrix, $D$, must be described; this is done using g22ybc. The linear model, $\mathcal{M}$, can then be specified as a text string containing a formula. This allows the model to be specified via variable names and avoids the need to construct interaction terms explicitly. The linear model is specified using g22yac, and the documentation for that function describes the syntax of the formula.

In many of the functions in the NAG Library, for example the regression functions in Chapter G02, a linear model is specified directly via the design matrix, $X$. The design matrix is defined by the data matrix, $D$, and the linear model, $\mathcal{M}$, and its construction usually requires the use of dummy variables. g22ycc constructs the design matrix from the formula supplied to g22yac.

The utility function g22ydc can be used to obtain labels for the parameters of the model as well as a variety of information required by some of the functions in Chapter G02.

Functionality Index

Linear model,
    construct design matrix | g22ycc
    data description | g22ybc
    nested model | g22ydc
    specification from formula string | g22yac

Service functions,
    destroy a G22 handle | g22zac
    general option getting function | g22znc
    general option setting function | g22zmc
