NAG Library Routine Document
g04caf (factorial)
1
Purpose
g04caf computes an analysis of variance table and treatment means for a complete factorial design.
2
Specification
Fortran Interface
Subroutine g04caf ( 
n, y, nfac, lfac, nblock, inter, irdf, mterm, table, itotal, tmean, maxt, e, imean, semean, bmean, r, iwk, ifail) 
Integer, Intent (In)  ::  n, nfac, lfac(nfac), nblock, inter, irdf, mterm, maxt  Integer, Intent (Inout)  ::  ifail  Integer, Intent (Out)  ::  itotal, imean(mterm), iwk(n+3*nfac)  Real (Kind=nag_wp), Intent (In)  ::  y(n)  Real (Kind=nag_wp), Intent (Out)  ::  table(mterm,5), tmean(maxt), e(maxt), semean(mterm), bmean(nblock+1), r(n) 

C Header Interface
#include <nagmk26.h>
void 
g04caf_ (const Integer *n, const double y[], const Integer *nfac, const Integer lfac[], const Integer *nblock, const Integer *inter, const Integer *irdf, const Integer *mterm, double table[], Integer *itotal, double tmean[], const Integer *maxt, double e[], Integer imean[], double semean[], double bmean[], double r[], Integer iwk[], Integer *ifail) 

3
Description
An experiment consists of a collection of units, or plots, to which a number of treatments are applied. In a factorial experiment the effects of several different sets of conditions are compared, e.g., three different temperatures,
${T}_{1}$,
${T}_{2}$ and
${T}_{3}$, and two different pressures,
${P}_{1}$ and
${P}_{2}$. The conditions are known as factors and the different values the conditions take are known as levels. In a factorial experiment the experimental treatments are the combinations of all the different levels of all factors, e.g.,
The effect of a factor averaged over all other factors is known as a main effect, and the effect of a combination of some of the factors averaged over all other factors is known as an interaction. This can be represented by a linear model. In the above example if the response was
${y}_{ijk}$ for the
$k$th replicate of the
$i$th level of
$T$ and the
$j$th level of
$P$ the linear model would be
where
$\mu $ is the overall mean,
${t}_{i}$ is the main effect of
$T$,
${p}_{j}$ is the main effect of
$P$,
${\gamma}_{ij}$ is the
$T\times P$ interaction and
${e}_{ijk}$ is the random error term. In order to find unique estimates constraints are placed on the parameters estimates. For the example here these are:
where
$\hat{\text{}}$ denotes the estimate.
If there is variation in the experimental conditions (e.g., in an experiment on the production of a material different batches of raw material may be used, or the experiment may be carried out on different days), then plots that are similar are grouped together into blocks. For a balanced complete factorial experiment all the treatment combinations occur the same number of times in each block.
g04caf computes the analysis of variance (ANOVA) table by sequentially computing the totals and means for an effect from the residuals computed when previous effects have been removed. The effect sum of squares is the sum of squared totals divided by the number of observations per total. The means are then subtracted from the residuals to compute a new set of residuals. At the same time the means for the original data are computed. When all effects are removed the residual sum of squares is computed from the residuals. Given the sums of squares an ANOVA table is then computed along with standard errors for the difference in treatment means.
The data for g04caf has to be in standard order given by the order of the factors. Let there be $k$ factors, ${f}_{1},{f}_{2},\dots ,{f}_{k}$ in that order with levels ${l}_{1},{l}_{2},\dots ,{l}_{k}$ respectively. Standard order requires the levels of factor ${f}_{1}$ are in order $1,2,\dots ,{l}_{1}$ and within each level of ${f}_{1}$ the levels of ${f}_{2}$ are in order $1,2,\dots ,{l}_{2}$ and so on.
For an experiment with blocks the data is for block $1$ then for block $2$, etc. Within each block the data must be arranged so that the levels of factor ${f}_{1}$ are in order $1,2,\dots ,{l}_{1}$ and within each level of ${f}_{1}$ the levels of ${f}_{2}$ are in order $1,2,\dots ,{l}_{2}$ and so on. Any within block replication of treatment combinations must occur within the levels of ${f}_{k}$.
The ANOVA table is given in the following order. For a complete factorial experiment the first row is for blocks, if present, then the main effects of the factors in their order, e.g.,
${f}_{1}$ followed by
${f}_{2}$, etc. These are then followed by all the two factor interactions then all the three factor interactions, etc., the last two rows being for the residual and total sums of squares. The interactions are arranged in lexical order for the given factor order. For example, for the three factor interactions for a five factor experiment the
$10$ interactions would be in the following order:
4
References
Cochran W G and Cox G M (1957) Experimental Designs Wiley
Davis O L (1978) The Design and Analysis of Industrial Experiments Longman
John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin
5
Arguments
 1: $\mathbf{n}$ – IntegerInput

On entry: the number of observations.
Constraints:
 ${\mathbf{n}}\ge 4$;
 if ${\mathbf{nblock}}>1$, n must be a multiple of nblock;
 n must be a multiple of the number of treatment combinations, that is a multiple of $\prod _{i=1}^{k}}{\mathbf{lfac}}\left(i\right)$.
 2: $\mathbf{y}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) arrayInput

On entry: the observations in standard order, see
Section 3.
 3: $\mathbf{nfac}$ – IntegerInput

On entry: $k$, the number of factors.
Constraint:
${\mathbf{nfac}}\ge 1$.
 4: $\mathbf{lfac}\left({\mathbf{nfac}}\right)$ – Integer arrayInput

On entry: ${\mathbf{lfac}}\left(\mathit{i}\right)$ must contain the number of levels for the $\mathit{i}$th factor, for $\mathit{i}=1,2,\dots ,k$.
Constraint:
${\mathbf{lfac}}\left(\mathit{i}\right)\ge 2$, for $\mathit{i}=1,2,\dots ,k$.
 5: $\mathbf{nblock}$ – IntegerInput

On entry: the number of blocks. If there are no blocks, set ${\mathbf{nblock}}=0$ or $1$.
Constraints:
 ${\mathbf{nblock}}\ge 0$;
 if ${\mathbf{nblock}}\ge 2$, ${\mathbf{n}}/{\mathbf{nblock}}$ must be a multiple of the number of treatment combinations, that is a multiple of $\prod _{i=1}^{k}}{\mathbf{lfac}}\left(i\right)$.
 6: $\mathbf{inter}$ – IntegerInput

On entry: the maximum number of factors in an interaction term. If no interaction terms are to be computed, set ${\mathbf{inter}}=0$ or $1$.
Constraint:
$0\le {\mathbf{inter}}\le {\mathbf{nfac}}$.
 7: $\mathbf{irdf}$ – IntegerInput

On entry: the adjustment to the residual and total degrees of freedom. The total degrees of freedom are set to
${\mathbf{n}}{\mathbf{irdf}}$ and the residual degrees of freedom adjusted accordingly. For examples of the use of
irdf see
Section 9.
Constraint:
${\mathbf{irdf}}\ge 0$.
 8: $\mathbf{mterm}$ – IntegerInput

On entry: the maximum number of terms in the analysis of variance table, see
Section 9.
Constraint:
${\mathbf{mterm}}$ must be large enough to contain the terms specified by
nfac,
inter and
nblock. If the routine exits with
${\mathbf{ifail}}\ge {\mathbf{2}}$, the required minimum value of
mterm is returned in
itotal.
 9: $\mathbf{table}\left({\mathbf{mterm}},5\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the first
itotal rows of
table contain the analysis of variance table. The first column contains the degrees of freedom, the second column contains the sum of squares, the third column (except for the row corresponding to the total sum of squares) contains the mean squares, i.e., the sums of squares divided by the degrees of freedom, and the fourth and fifth columns contain the
$F$ ratio and significance level, respectively (except for rows corresponding to the total sum of squares, and the residual sum of squares). All other cells of the table are set to zero.
The first row corresponds to the blocks and is set to zero if there are no blocks. The
itotalth row corresponds to the total sum of squares for
y and the
$\left({\mathbf{itotal}}1\right)$th row corresponds to the residual sum of squares. The central rows of the table correspond to the main effects followed by the interaction if specified by
inter. The main effects are in the order specified by
lfac and the interactions are in lexical order, see
Section 3.
 10: $\mathbf{itotal}$ – IntegerOutput

On exit: the row in
table corresponding to the total sum of squares. The number of treatment effects is
${\mathbf{itotal}}3$.
 11: $\mathbf{tmean}\left({\mathbf{maxt}}\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the treatment means. The position of the means for an effect is given by the index in
imean. For a given effect the means are in standard order, see
Section 3.
 12: $\mathbf{maxt}$ – IntegerInput

On entry: the maximum number of treatment means to be computed, see
Section 9. If the value of
maxt is too small for the required analysis then the minimum number is returned in
${\mathbf{imean}}\left(1\right)$.
Constraint:
${\mathbf{maxt}}$ must be large enough for the number of means specified by
lfac and
inter; if
${\mathbf{inter}}={\mathbf{nfac}}$ then
${\mathbf{maxt}}\ge {\displaystyle \prod _{i=1}^{k}}\left({\mathbf{lfac}}\left(i\right)+1\right)1$.
 13: $\mathbf{e}\left({\mathbf{maxt}}\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the estimated effects in the same order as for the means in
tmean.
 14: $\mathbf{imean}\left({\mathbf{mterm}}\right)$ – Integer arrayOutput

On exit: indicates the position of the effect means in
tmean. The effect means corresponding to the first treatment effect in the ANOVA table are stored in
${\mathbf{tmean}}\left(1\right)$ up to
${\mathbf{tmean}}\left({\mathbf{imean}}\left(1\right)\right)$. Other effect means corresponding to the
$i$th treatment effect,
$i=1,2,\dots ,{\mathbf{itotal}}3$, are stored in
${\mathbf{tmean}}\left({\mathbf{imean}}\left(i1\right)+1\right)$ up to
${\mathbf{tmean}}\left({\mathbf{imean}}\left(i\right)\right)$.
 15: $\mathbf{semean}\left({\mathbf{mterm}}\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the standard error of the difference between means corresponding to the $i$th treatment effect in the ANOVA table.
 16: $\mathbf{bmean}\left({\mathbf{nblock}}+1\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: ${\mathbf{bmean}}\left(1\right)$ contains the grand mean, if ${\mathbf{nblock}}>1$, ${\mathbf{bmean}}\left(2\right)$ up to ${\mathbf{bmean}}\left({\mathbf{nblock}}+1\right)$ contain the block means.
 17: $\mathbf{r}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the residuals.
 18: $\mathbf{iwk}\left({\mathbf{n}}+3\times {\mathbf{nfac}}\right)$ – Integer arrayWorkspace

 19: $\mathbf{ifail}$ – IntegerInput/Output

On entry:
ifail must be set to
$0$,
$1\text{or}1$. If you are unfamiliar with this argument you should refer to
Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$1\text{or}1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $\mathbf{1}\text{or}\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry, ${\mathbf{inter}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{inter}}\ge 0$.
On entry, ${\mathbf{inter}}=\u2329\mathit{\text{value}}\u232a$ and ${\mathbf{nfac}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{inter}}\le {\mathbf{nfac}}$.
On entry, ${\mathbf{irdf}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{irdf}}\ge 0$.
On entry, ${\mathbf{n}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{n}}\ge 4$.
On entry, ${\mathbf{nblock}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{nblock}}\ge 0$.
On entry, ${\mathbf{nfac}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{nfac}}\ge 1$.
 ${\mathbf{ifail}}=2$

On entry, $i=\u2329\mathit{\text{value}}\u232a$ and ${\mathbf{lfac}}\left(i\right)=\u2329\mathit{\text{value}}\u232a$
Constraint: ${\mathbf{lfac}}\left(i\right)\ge 2$.
On entry, ${\mathbf{maxt}}=\u2329\mathit{\text{value}}\u232a$
Constraint: ${\mathbf{maxt}}\ge \u2329\mathit{\text{value}}\u232a$.
On entry, ${\mathbf{mterm}}=\u2329\mathit{\text{value}}\u232a$
Constraint: ${\mathbf{mterm}}\ge \u2329\mathit{\text{value}}\u232a$.
On entry,
${\mathbf{n}}=\u2329\mathit{\text{value}}\u232a$ and
${\mathbf{nblock}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: when
${\mathbf{nblock}}>1$,
n must be a multiple of
nblock.
On entry, the number of plots per block is incompatible with the number of plot factors.
 ${\mathbf{ifail}}=3$

On entry, the values of
y are constant.
 ${\mathbf{ifail}}=4$

There are no degrees of freedom for the residual sum of squares. The standard errors and $F$statistics cannot be computed.
The residual sum of squares is zero. The standard errors and $F$statistics cannot be computed.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 3.9 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
See
Section 3.8 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
See
Section 3.7 in How to Use the NAG Library and its Documentation for further information.
7
Accuracy
The block and treatment sums of squares are computed from the block and treatment residual totals. The residuals are updated as each effect is computed and the residual sum of squares computed directly from the residuals. This avoids any loss of accuracy in subtracting sums of squares.
8
Parallelism and Performance
g04caf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementationspecific information.
The number of rows in the ANOVA table and the number of treatment means are given by the following formulae.
Let there be
$k$ factors with levels
${l}_{i}$ for
$i=1,2,\dots ,k$, and let
$t$ be the maximum number of terms in an interaction then the number of rows in the ANOVA tables is:
The number of treatment means is:
where
${S}_{i}$ is the set of all combinations of the
$k$ factors
$i$ at a time.
To estimate missing values the Healy and Westmacott procedure or its derivatives may be used, see
John and Quenouille (1977). This is an iterative procedure in which estimates of the missing values are adjusted by subtracting the corresponding values of the residuals. The new estimates are then used in the analysis of variance. This process is repeated until convergence. A suitable initial value may be the grand mean. When using this procedure
irdf should be set to the number of missing values plus one to obtain the correct degrees of freedom for the residual sum of squares.
For analysis of covariance the residuals are obtained from an analysis of variance of both the response variable and the covariates. The residuals from the response variable are then regressed on the residuals from the covariates using, say,
g02cbf or
g02daf. The coefficients obtained from the regression can be examined for significance and used to produce an adjusted dependent variable using the original response variable and covariate. An approximate adjusted analysis of variance table can then be produced by using the adjusted dependent variable. In this case
irdf should be set to one plus the number of fitted covariates.
For designs such as Latin squares one more of the blocking factors has to be removed in a preliminary analysis before the final analysis. This preliminary analysis can be performed using
g04bbf or a prior call to
g04caf if the data is reordered between calls. The residuals from the preliminary analysis are then input to
g04caf. In these cases
irdf should be set to the difference between
n and the residual degrees of freedom from preliminary analysis. Care should be taken when using this approach as there is no check on the orthogonality of the two analyses.
10
Example
The data, given by
John and Quenouille (1977), is for the yield of turnips for a factorial experiment with two factors, the amount of phosphate with
$6$ levels and the amount of liming with
$3$ levels. The design was replicated in
$3$ blocks. The data is input and the analysis of variance computed. The analysis of variance table and tables of means with their standard errors are printed.
10.1
Program Text
Program Text (g04cafe.f90)
10.2
Program Data
Program Data (g04cafe.d)
10.3
Program Results
Program Results (g04cafe.r)