NAG CL Interface g04eac (dummyvars)

1 Purpose

g04eac computes orthogonal polynomial or dummy variables for a factor or classification variable.

2 Specification

#include <nag.h>
 void g04eac (Nag_DummyType type, Integer n, Integer levels, const Integer factor[], double x[], Integer tdx, const double v[], double num_reps[], NagError *fail)
The function may be called by the names: g04eac, nag_anova_dummyvars or nag_dummy_vars.

3 Description

In the analysis of an experimental design using a general linear model, the factors or classification variables that specify the design have to be coded as dummy variables. g04eac computes dummy variables that can then be used to fit the general linear model using g02dac.
If the factor of length $n$ has $k$ levels, then the simplest representation is to define $k$ dummy variables, ${X}_{\mathit{j}}$, such that ${X}_{\mathit{j}}=1$ if the factor is at level $\mathit{j}$ and 0 otherwise, for $\mathit{j}=1,2,\dots ,k$. However, a mean is usually included in the model, and since these $k$ dummy variables sum to 1 for every observation, their sum is aliased with the mean. To avoid the extra redundant parameter, $k-1$ dummy variables can be defined as the contrasts between one level of the factor (the reference level) and the remaining levels. If the reference level is the first level, then the dummy variables can be defined as ${X}_{\mathit{j}}=1$ if the factor is at level $\mathit{j}$ and 0 otherwise, for $\mathit{j}=2,3,\dots ,k$. Alternatively, the last level can be used as the reference level.
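The three 0/1 codings just described (a complete set, first level as reference, last level as reference) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the NAG implementation: the function `make_dummies` and the enum `coding_t` are hypothetical names, and the output uses the same row-major layout as x, with element $(i,j)$ at `x[(i-1)*tdx + j-1]`.

```c
#include <stddef.h>

/* Hypothetical coding selector loosely mirroring Nag_AllLevels,
   Nag_FirstLevel and Nag_LastLevel; it is not the NAG enum. */
typedef enum { CODE_ALL_LEVELS, CODE_FIRST_REF, CODE_LAST_REF } coding_t;

/* Fill the n-by-kstar row-major matrix x (row stride tdx) with 0/1 dummy
   variables for a factor whose values are 1..k.
   CODE_ALL_LEVELS: column j-1 is 1 when the factor is at level j (kstar = k).
   CODE_FIRST_REF:  level 1 is the reference; column j-2 is 1 for level j >= 2.
   CODE_LAST_REF:   level k is the reference; column j-1 is 1 for level j < k.
   Returns kstar, the number of columns written. */
int make_dummies(coding_t code, int n, int k, const int factor[],
                 double x[], int tdx)
{
    int kstar = (code == CODE_ALL_LEVELS) ? k : k - 1;
    for (int i = 0; i < n; i++) {
        double *row = x + (size_t)i * (size_t)tdx;
        for (int c = 0; c < kstar; c++)
            row[c] = 0.0;
        int lev = factor[i];          /* 1-based level */
        int col;
        switch (code) {
        case CODE_ALL_LEVELS: col = lev - 1; break;
        case CODE_FIRST_REF:  col = lev - 2; break; /* -1 for level 1 */
        default:              col = lev - 1; break; /* kstar for level k */
        }
        if (col >= 0 && col < kstar)  /* reference level stays all zero */
            row[col] = 1.0;
    }
    return kstar;
}
```

An observation at the reference level gets a row of zeros, which is why only $k-1$ columns are needed once a mean is in the model.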
A second way of defining the $k-1$ dummy variables is to use a Helmert matrix, in which levels $2,3,\dots ,k$ are compared with the average effect of the previous levels. For example, if $k=4$ and each level has the same number of replicates, then the contrasts would be:
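$$\begin{array}{c|rrr}\text{Level} & X_1 & X_2 & X_3\\ \hline 1 & -1 & -1 & -1\\ 2 & 1 & -1 & -1\\ 3 & 0 & 2 & -1\\ 4 & 0 & 0 & 3\end{array}$$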
Thus dummy variable $\mathit{j}$, for $\mathit{j}=1,2,\dots ,k-1$, is given by
• ${X}_{j}=-1$ if the factor is at a level less than $j+1$
• ${X}_{j}={\sum }_{i=1}^{j}{r}_{i}/{r}_{j+1}$ if the factor is at level $j+1$
• ${X}_{j}=0$ if the factor is at a level greater than $j+1$
where ${r}_{j}$ is the number of replicates of level $j$.
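A minimal C sketch of the Helmert coding defined above follows. `helmert_dummies` is a hypothetical helper, not part of the NAG API, and it assumes every level is represented (each $r_j>0$). With four equally replicated levels it reproduces the $k=4$ example matrix.

```c
#include <stddef.h>

/* Helmert coding: for dummy variable j (1-based, j = 1..k-1) an
   observation at level l gets
     -1                       if l <  j+1,
     (r_1+...+r_j)/r_{j+1}    if l == j+1,
      0                       if l >  j+1,
   where r[0..k-1] holds the replication counts r_1..r_k (all > 0).
   x is n-by-(k-1), row-major with row stride tdx. */
void helmert_dummies(int n, int k, const int factor[], const double r[],
                     double x[], int tdx)
{
    for (int j = 1; j <= k - 1; j++) {
        double cum = 0.0;                 /* r_1 + ... + r_j */
        for (int i = 0; i < j; i++)
            cum += r[i];
        double code = cum / r[j];         /* r[j] is r_{j+1} */
        for (int i = 0; i < n; i++) {
            int lev = factor[i];
            double val;
            if (lev < j + 1)
                val = -1.0;
            else if (lev == j + 1)
                val = code;
            else
                val = 0.0;
            x[(size_t)i * (size_t)tdx + (size_t)(j - 1)] = val;
        }
    }
}
```

Because level $j+1$ receives ${\sum }_{i=1}^{j}{r}_{i}/{r}_{j+1}$ on each of its ${r}_{j+1}$ observations while the ${\sum }_{i=1}^{j}{r}_{i}$ earlier observations each receive $-1$, every Helmert column sums to zero over the observations, so it is orthogonal to the mean even with unequal replication.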
If the factor can be considered as a set of values from an underlying continuous variable, then the factor can be represented by a set of $k-1$ orthogonal polynomials representing the linear, quadratic, etc. effects of the underlying variable. The orthogonal polynomials are computed using Forsythe's algorithm (see Forsythe (1957) and Cooper (1968)). The values of the underlying continuous variable represented by the factor levels have to be supplied to the function.
The orthogonal polynomials are standardized so that the sum of squares for each dummy variable is one. For the other methods, integer codes ($0$ and $\pm 1$) are retained, except that in the Helmert representation the code of level $j+1$ in dummy variable $j$, ${\sum }_{i=1}^{j}{r}_{i}/{r}_{j+1}$, will in general be a fraction.

4 References

Cooper B E (1968) Algorithm AS 10. The use of orthogonal polynomials. Appl. Statist. 17 283–287
Forsythe G E (1957) Generation and use of orthogonal polynomials for data fitting with a digital computer. J. Soc. Indust. Appl. Math. 5 74–88

5 Arguments

1: $\mathbf{type}$ – Nag_DummyType Input
On entry: the type of dummy variable to be computed.
${\mathbf{type}}=\mathrm{Nag\_Poly}$
An orthogonal Polynomial representation is computed.
${\mathbf{type}}=\mathrm{Nag\_Helmert}$
A Helmert matrix representation is computed.
${\mathbf{type}}=\mathrm{Nag\_FirstLevel}$
The contrasts relative to the First level are computed.
${\mathbf{type}}=\mathrm{Nag\_LastLevel}$
The contrasts relative to the Last level are computed.
${\mathbf{type}}=\mathrm{Nag\_AllLevels}$
A complete set of dummy variables is computed.
Constraint: ${\mathbf{type}}=\mathrm{Nag\_Poly}$, $\mathrm{Nag\_Helmert}$, $\mathrm{Nag\_FirstLevel}$, $\mathrm{Nag\_LastLevel}$ or $\mathrm{Nag\_AllLevels}$.
2: $\mathbf{n}$ – Integer Input
On entry: the number of observations for which the dummy variables are to be computed, $n$.
Constraint: ${\mathbf{n}}\ge {\mathbf{levels}}$.
3: $\mathbf{levels}$ – Integer Input
On entry: the number of levels of the factor, $k$.
Constraint: ${\mathbf{levels}}\ge 2$.
4: $\mathbf{factor}\left[{\mathbf{n}}\right]$ – const Integer Input
On entry: the $n$ values of the factor.
Constraint: $1\le {\mathbf{factor}}\left[\mathit{i}-1\right]\le {\mathbf{levels}}$, for $\mathit{i}=1,2,\dots ,n$.
5: $\mathbf{x}\left[{\mathbf{n}}\times {\mathbf{tdx}}\right]$ – double Output
Note: the $\left(i,j\right)$th element of the matrix $X$ is stored in ${\mathbf{x}}\left[\left(i-1\right)\times {\mathbf{tdx}}+j-1\right]$.
On exit: the $n$ by ${k}^{*}$ matrix of dummy variables, where ${k}^{*}=k-1$ if ${\mathbf{type}}=\mathrm{Nag\_Poly}$, $\mathrm{Nag\_Helmert}$, $\mathrm{Nag\_FirstLevel}$ or $\mathrm{Nag\_LastLevel}$, and ${k}^{*}=k$ if ${\mathbf{type}}=\mathrm{Nag\_AllLevels}$.
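The storage convention in the note above can be captured by a small index helper. `x_index` is a hypothetical name used only for illustration; it shows that x is row-major with row stride tdx, so tdx may exceed the ${k}^{*}$ columns actually written.

```c
#include <stddef.h>

/* Offset of element (i, j) of X, with 1-based i and j, in the array x:
   rows are contiguous and consecutive rows start tdx elements apart. */
static inline size_t x_index(int i, int j, int tdx)
{
    return (size_t)(i - 1) * (size_t)tdx + (size_t)(j - 1);
}
```

For instance, with tdx = 5, element $(3,4)$ is at offset $2\times 5+3=13$.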
6: $\mathbf{tdx}$ – Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraints:
• if ${\mathbf{type}}=\mathrm{Nag\_Poly}$, $\mathrm{Nag\_Helmert}$, $\mathrm{Nag\_FirstLevel}$ or $\mathrm{Nag\_LastLevel}$, ${\mathbf{tdx}}\ge {\mathbf{levels}}-1$;
• if ${\mathbf{type}}=\mathrm{Nag\_AllLevels}$, ${\mathbf{tdx}}\ge {\mathbf{levels}}$.
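A caller-side check of the scalar constraints on n, levels and tdx might look like the following sketch. `dims_ok` is hypothetical and simply restates the constraints listed in this section; the function itself reports violations through fail.

```c
/* Returns 1 if levels >= 2, n >= levels and tdx >= k*, else 0.
   all_levels is nonzero for a complete set of dummies (Nag_AllLevels),
   in which case k* = levels; otherwise k* = levels - 1. */
int dims_ok(int n, int levels, int tdx, int all_levels)
{
    int kstar = all_levels ? levels : levels - 1;
    return levels >= 2 && n >= levels && tdx >= kstar;
}
```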
7: $\mathbf{v}\left[\mathit{dim}\right]$ – const double Input
Note: the dimension, dim, of the array v must be at least
• ${\mathbf{levels}}$ when ${\mathbf{type}}=\mathrm{Nag\_Poly}$;
• $1$ otherwise.
On entry: if ${\mathbf{type}}=\mathrm{Nag\_Poly}$, the $k$ distinct values of the underlying variable for which the orthogonal polynomial is to be computed. If ${\mathbf{type}}\ne \mathrm{Nag\_Poly}$, v is not referenced.
Constraint: if ${\mathbf{type}}=\mathrm{Nag\_Poly}$, the $k$ values of v must be distinct.
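When ${\mathbf{type}}=\mathrm{Nag\_Poly}$, a caller could verify the distinctness constraint before the call with a sketch such as this one. `all_distinct` is a hypothetical helper using exact floating-point comparison; values that are distinct but very close can still pass it and then lead to the NE_G04EA_ORTHO_POLY error.

```c
/* Returns 1 if the k values in v are pairwise distinct (exact
   comparison), 0 otherwise. O(k^2), adequate for typical factor sizes. */
int all_distinct(int k, const double v[])
{
    for (int a = 0; a < k; a++)
        for (int b = a + 1; b < k; b++)
            if (v[a] == v[b])
                return 0;
    return 1;
}
```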
8: $\mathbf{num\_reps}\left[{\mathbf{levels}}\right]$ – double Output
On exit: ${\mathbf{num\_reps}}\left[\mathit{i}-1\right]$ contains the number of replications of level $\mathit{i}$ of the factor, ${r}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
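The values returned in num_reps are simple counts. The sketch below, under the hypothetical name `count_reps`, tallies them and performs checks analogous to those behind NE_INT_ARRAY_CONS (a factor value out of range) and NE_G04EA_LEVELS (a level with no observations); it is illustrative, not the library's code.

```c
/* Store r_i, the number of observations at level i, in num_reps[i-1].
   Returns 0 on success; -1 if a factor value lies outside 1..levels or
   some level never occurs. */
int count_reps(int n, int levels, const int factor[], double num_reps[])
{
    for (int l = 0; l < levels; l++)
        num_reps[l] = 0.0;
    for (int i = 0; i < n; i++) {
        if (factor[i] < 1 || factor[i] > levels)
            return -1;                /* factor value out of range */
        num_reps[factor[i] - 1] += 1.0;
    }
    for (int l = 0; l < levels; l++)
        if (num_reps[l] == 0.0)
            return -1;                /* level not represented */
    return 0;
}
```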
9: $\mathbf{fail}$ – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{n}}=〈\mathit{value}〉$ while ${\mathbf{levels}}=〈\mathit{value}〉$. These arguments must satisfy ${\mathbf{n}}\ge {\mathbf{levels}}$.
On entry, ${\mathbf{tdx}}=〈\mathit{value}〉$ while ${\mathbf{levels}}=〈\mathit{value}〉$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{levels}}$.
On entry, ${\mathbf{tdx}}=〈\mathit{value}〉$ while ${\mathbf{levels}}-1=〈\mathit{value}〉$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{levels}}-1$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_ARRAY_CONS
The contents of array v are not valid.
Constraint: all values of v must be distinct.
NE_BAD_PARAM
On entry, argument type had an illegal value.
NE_G04EA_LEVELS
Not all levels are represented in array factor.
NE_G04EA_ORTHO_POLY
An orthogonal polynomial has all values zero. This is caused by some values of v being too close together. This can only occur if ${\mathbf{type}}=\mathrm{Nag\_Poly}$.
NE_INT_ARG_LT
On entry, levels must not be less than 2: ${\mathbf{levels}}=〈\mathit{value}〉$.
NE_INT_ARRAY_CONS
On entry, ${\mathbf{factor}}\left[0\right]=〈\mathit{value}〉$.
Constraint: $1\le {\mathbf{factor}}\left[0\right]\le {\mathbf{levels}}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

7 Accuracy

The computations are stable.

8 Parallelism and Performance

g04eac is not threaded in any implementation.

9 Further Comments

Other functions for fitting polynomials can be found in Chapter E02.

10 Example

Data are read from an experiment with four treatments and three observations per treatment, with the treatment coded as a factor. g04eac is used to compute the required dummy variables, and the model is then fitted by g02dac.

10.1 Program Text

Program Text (g04eace.c)

10.2 Program Data

Program Data (g04eace.d)

10.3 Program Results

Program Results (g04eace.r)