

NAG Toolbox Chapter Introduction

G04 — Analysis of Variance

Scope of the Chapter

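This chapter is concerned with methods for analysing the results of designed experiments. The range of experiments covered includes block designs with equal-sized blocks (complete, balanced incomplete and partially balanced incomplete), row and column designs such as Latin squares, complete factorial designs, and two-strata nested designs.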
Further designs may be analysed by combining the analyses provided by multiple calls to functions or by using general linear model functions provided in Chapter G02.

Background to the Problems

Experimental Designs

An experimental design consists of a plan for allocating a set of controlled conditions, the treatments, to subsets of the experimental material, the plots or units. Two examples are:
(i) In an experiment to examine the effects of different diets on the growth of chickens, the chickens were kept in pens and a different diet was fed to the birds in each pen. In this example the pens are the units and the different diets are the treatments.
(ii) In an experiment to compare four materials for wear-loss, a sample from each of the materials is tested in a machine that simulates wear. The machine can take four samples at a time and a number of runs are made. In this experiment the treatments are the materials and the units are the samples from the materials.
In designing an experiment the following principles are important.
(a) Randomization: given the overall plan of the experiment, the final allocation of treatments to units is performed using a suitable random allocation. This avoids the possibility of a systematic bias in the allocation and gives a basis for the statistical analysis of the experiment.
(b) Replication: each treatment should be ‘observed’ more than once. So in example (ii) more than one sample from each material should be tested. Replication allows the variability of the treatment effect to be estimated.
(c) Blocking: in many situations the experimental material will not be homogeneous and there may be some form of systematic variation in the experimental material. In order to reduce the effect of systematic variation the material can be grouped into blocks so that units within a block are similar but there is variation between blocks. For example, in an animal experiment litters may be considered as blocks; in an industrial experiment it may be material from one production batch.
(d) Factorial designs: if more than one type of treatment is under consideration, for example the effect of changes in temperature and changes in pressure, a factorial design consists of looking at all combinations of temperature and pressure. The different types of treatment are known as factors and the different values of the factors that are considered in the experiment are known as levels. So if three temperatures and four different pressures were being considered, then factor 1 (temperature) would have three levels and factor 2 (pressure) would have four levels and the design would be a $3 \times 4$ factorial giving a total of 12 treatment combinations. This design has the advantage of being able to detect the interaction between factors, that is, the effect of the combination of factors.
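As an illustration of principles (a), (b) and (d), the following minimal sketch enumerates the treatment combinations of a $3 \times 4$ factorial and allocates two replicates of each to units in random order. The level values, the number of replicates and the seed are hypothetical, chosen only for the example.

```python
import itertools
import random

# Hypothetical factor levels, chosen only for illustration.
temperatures = [150, 175, 200]   # factor 1: three levels
pressures = [1, 2, 3, 4]         # factor 2: four levels

# All treatment combinations of the 3 x 4 factorial.
treatments = list(itertools.product(temperatures, pressures))

# Replication: two copies of every combination, one per experimental unit.
replicates = 2
allocation = treatments * replicates

# Randomization: shuffle so that treatments meet units in random order.
rng = random.Random(2024)  # fixed seed only so the sketch is reproducible
rng.shuffle(allocation)

for unit, (temp, pres) in enumerate(allocation, start=1):
    print(f"unit {unit:2d}: temperature={temp}, pressure={pres}")
```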
The following are examples of standard experimental designs; in the descriptions, it is assumed that there are $t$ treatments.
(a) Completely Randomised Design: there are no blocks and the treatments are allocated to units at random.
(b) Randomised Complete Block Design: the experimental units are grouped into $b$ blocks of $t$ units and each treatment occurs once in each block. The treatments are allocated to units within blocks at random.
(c) Latin Square Designs: the units can be represented as cells of a $t$ by $t$ square classified by rows and columns. The $t$ rows and $t$ columns represent sources of variation in the experimental material. The design allocates the treatments to the units so that each treatment occurs once in each row and each column.
(d) Balanced Incomplete Block Designs: the experimental units are grouped into $b$ blocks of $k < t$ units. The treatments are allocated so that each treatment is replicated the same number of times and each treatment occurs in the same block with any other treatment the same number of times. The treatments are allocated to units within blocks at random.
(e) Complete Factorial Experiments: if there are $t$ treatment combinations derived from the levels of all factors then either there are no blocks or the blocks are of size $t$ units.
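The defining property of a Latin square in (c) can be checked mechanically. The sketch below builds a cyclic $t \times t$ square, permutes its rows and columns at random, and verifies that each treatment still occurs once in every row and column. The cyclic construction and the helper names are assumptions made for illustration; they are not part of the NAG Toolbox.

```python
import random

def cyclic_latin_square(t):
    """Return a t x t Latin square with treatments labelled 0..t-1."""
    return [[(row + col) % t for col in range(t)] for row in range(t)]

def is_latin_square(square):
    """True if every label 0..t-1 occurs once in each row and each column."""
    t = len(square)
    labels = set(range(t))
    rows_ok = all(set(row) == labels for row in square)
    cols_ok = all({square[r][c] for r in range(t)} == labels for c in range(t))
    return rows_ok and cols_ok

def randomise(square, rng):
    """Permute rows and columns at random; the Latin property is preserved."""
    t = len(square)
    row_order = rng.sample(range(t), t)
    col_order = rng.sample(range(t), t)
    return [[square[r][c] for c in col_order] for r in row_order]

rng = random.Random(1)  # fixed seed only so the sketch is reproducible
design = randomise(cyclic_latin_square(4), rng)
for row in design:
    print(" ".join("ABCD"[k] for k in row))
```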
Other designs include: partially balanced incomplete block designs, split-plot designs, factorial designs with confounding, and fractional factorial designs. For further information on these designs, see Cochran and Cox (1957), Davis (1978) or John and Quenouille (1977).

Analysis of Variance

The analysis of a designed experiment usually consists of two stages. The first is the computation of the estimate of variance of the underlying random variation in the experiment along with tests for the overall effect of treatments. This results in an analysis of variance (ANOVA) table. The second stage is a more detailed examination of the effect of different treatments either by comparing the difference in treatment means with an appropriate standard error or by the use of orthogonal contrasts.
The analysis assumes a linear model such as
$$y_{ij} = \mu + \delta_i + \tau_l + e_{ij},$$
where $y_{ij}$ is the observed value for unit $j$ of block $i$, $\mu$ is the overall mean, $\delta_i$ is the effect of the $i$th block, $\tau_l$ is the effect of the $l$th treatment which has been applied to the unit, and $e_{ij}$ is the random error term associated with this unit. The expected value of $e_{ij}$ is zero and its variance is $\sigma^2$.
In the analysis of variance, the total variation, measured by the sum of squares of observations about the overall mean, is partitioned into the sum of squares due to blocks, the sum of squares due to treatments, and a residual or error sum of squares. This partition corresponds to the parameters $\delta$, $\tau$ and $\sigma$. In parallel to the partition of the sum of squares there is a partition of the degrees of freedom associated with the sums of squares. The total degrees of freedom is $n-1$, where $n$ is the number of observations. This is partitioned into $b-1$ degrees of freedom for blocks, $t-1$ degrees of freedom for treatments, and $n-t-b+1$ degrees of freedom for the residual sum of squares. From these the mean squares can be computed as the sums of squares divided by their degrees of freedom. The residual mean square is an estimate of $\sigma^2$. An $F$-test for an overall effect of the treatments can be calculated as the ratio of the treatment mean square to the residual mean square.
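The partition described above can be sketched for a randomised complete block design as follows. The data are hypothetical, and the code is a plain-Python illustration of the arithmetic rather than a call to any NAG function; it assumes one observation per block and treatment, so that $n = bt$.

```python
# Hypothetical data: rows are b = 3 blocks, columns are t = 4 treatments.
y = [
    [10.0, 12.0, 14.0, 13.0],
    [9.0, 11.0, 15.0, 12.0],
    [11.0, 14.0, 16.0, 15.0],
]
b, t = len(y), len(y[0])
n = b * t

grand_mean = sum(sum(row) for row in y) / n
block_means = [sum(row) / t for row in y]
treat_means = [sum(y[i][l] for i in range(b)) / b for l in range(t)]

# Partition of the total sum of squares about the overall mean.
ss_total = sum((y[i][l] - grand_mean) ** 2 for i in range(b) for l in range(t))
ss_blocks = t * sum((m - grand_mean) ** 2 for m in block_means)
ss_treat = b * sum((m - grand_mean) ** 2 for m in treat_means)
ss_resid = ss_total - ss_blocks - ss_treat

# Matching partition of the n - 1 total degrees of freedom.
df_blocks, df_treat = b - 1, t - 1
df_resid = n - t - b + 1

ms_treat = ss_treat / df_treat
ms_resid = ss_resid / df_resid   # estimate of sigma^2
f_ratio = ms_treat / ms_resid    # F statistic for an overall treatment effect

print(f"{'source':<10}{'df':>4}{'SS':>10}{'MS':>10}")
print(f"{'blocks':<10}{df_blocks:>4}{ss_blocks:>10.3f}{ss_blocks / df_blocks:>10.3f}")
print(f"{'treatments':<10}{df_treat:>4}{ss_treat:>10.3f}{ms_treat:>10.3f}")
print(f"{'residual':<10}{df_resid:>4}{ss_resid:>10.3f}{ms_resid:>10.3f}")
print(f"F = {f_ratio:.3f} on {df_treat} and {df_resid} degrees of freedom")
```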
For row and column designs the model is
$$y_{ij} = \mu + \rho_i + \gamma_j + \tau_l + e_{ij},$$
where $\rho_i$ is the effect of the $i$th row and $\gamma_j$ is the effect of the $j$th column. Usually the rows and columns are orthogonal. In the analysis of variance the total variation is partitioned into rows, columns, treatments and residual.
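For the particular case of a $t \times t$ Latin square with $n = t^2$ observations, the partition can be written out explicitly. The notation below is introduced here for illustration only: $\bar{y}$ is the overall mean, $\bar{y}_{i\cdot}$ and $\bar{y}_{\cdot j}$ are the row and column means, $\bar{y}_{(l)}$ is the mean of the units receiving treatment $l$, and $SS_{\mathrm{res}}$ is the residual sum of squares.

```latex
\sum_{i=1}^{t}\sum_{j=1}^{t} (y_{ij} - \bar{y})^2
  = t\sum_{i=1}^{t} (\bar{y}_{i\cdot} - \bar{y})^2
  + t\sum_{j=1}^{t} (\bar{y}_{\cdot j} - \bar{y})^2
  + t\sum_{l=1}^{t} (\bar{y}_{(l)} - \bar{y})^2
  + SS_{\mathrm{res}},
\qquad
t^2 - 1 = (t-1) + (t-1) + (t-1) + (t-1)(t-2).
```

The second identity is the corresponding partition of the degrees of freedom into rows, columns, treatments and residual.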
In the case of factorial experiments, the treatment sum of squares and degrees of freedom may be partitioned into main effects for the factors and interactions between factors. The main effect of a factor is the effect of the factor averaged over all other factors. The interaction between two factors is the additional effect of the combination of the two factors, over and above the additive effects of the two factors, averaged over all other factors. For a factorial experiment in blocks with two factors, $A$ and $B$, in which the $j$th unit of the $i$th block received level $l$ of factor $A$ and level $k$ of factor $B$, the model is
$$y_{ij} = \mu + \delta_i + (\alpha_l + \beta_k + \alpha\beta_{lk}) + e_{ij},$$
where $\alpha_l$ is the main effect of level $l$ of factor $A$, $\beta_k$ is the main effect of level $k$ of factor $B$, and $\alpha\beta_{lk}$ is the interaction between level $l$ of $A$ and level $k$ of $B$. Higher-order interactions can be defined in a similar way.
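To make the definitions of main effect and interaction concrete, the sketch below decomposes a table of treatment-combination means into $\mu$, $\alpha_l$, $\beta_k$ and $\alpha\beta_{lk}$. The cell means are hypothetical, and the decomposition assumes the usual sum-to-zero constraints on the effects, which the text above does not state explicitly.

```python
# Hypothetical cell means: rows are levels of factor A, columns are levels of factor B.
cell_means = [
    [20.0, 24.0, 28.0],
    [22.0, 30.0, 38.0],
]
a_levels, b_levels = len(cell_means), len(cell_means[0])

# Overall mean of the treatment combinations.
mu = sum(sum(row) for row in cell_means) / (a_levels * b_levels)

# Main effects: mean at one level of a factor, averaged over the other factor, minus mu.
alpha = [sum(row) / b_levels - mu for row in cell_means]
beta = [
    sum(cell_means[l][k] for l in range(a_levels)) / a_levels - mu
    for k in range(b_levels)
]

# Interaction: what remains of each cell mean after the additive effects are removed.
interaction = [
    [cell_means[l][k] - mu - alpha[l] - beta[k] for k in range(b_levels)]
    for l in range(a_levels)
]

print("mu =", mu)
print("alpha =", alpha)
print("beta =", beta)
print("interaction =", interaction)
```

If every entry of `interaction` were zero, the two factors would act additively.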
Once the significant treatment effects have been uncovered they can be further investigated by comparing the differences between the means with the appropriate standard error. Some of the assumptions of the analysis can be checked by examining the residuals.

Recommendations on Choice and Use of Available Functions

This chapter contains functions that can handle a wide range of experimental designs plus functions for further analysis and a function to compute dummy variables for use in a general linear model.
nag_anova_random (g04bb) computes the analysis of variance and treatment means with standard errors for any block design with equal sized blocks. The function will handle both complete block designs and balanced and partially balanced incomplete block designs.
nag_anova_rowcol (g04bc) computes the analysis of variance and treatment means with standard errors for row and column designs such as a Latin square.
nag_anova_factorial (g04ca) computes the analysis of variance and treatment means with standard errors for a complete factorial experiment.
Other designs can be analysed by combinations of calls to nag_anova_random (g04bb), nag_anova_rowcol (g04bc) and nag_anova_factorial (g04ca). The functions compute the residuals from the model specified by the design, so these can then be input as the response variable in a second call to one of the functions. For example a factorial experiment in a Latin square design can be analysed by first calling nag_anova_rowcol (g04bc) to remove the row and column effects and then calling nag_anova_factorial (g04ca) with the residuals from nag_anova_rowcol (g04bc) as the response variable to compute the ANOVA for the treatments. Another example would be to use both nag_correg_linregm_fit (g02da) and nag_anova_random (g04bb) to compute an analysis of covariance.
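The idea of passing residuals from one analysis into another can be sketched without the NAG functions. In the hypothetical $3 \times 3$ Latin square below, row and column effects are first removed, and the treatment sum of squares is then computed from the residuals. Because treatments are orthogonal to rows and columns in a Latin square, the result should agree with the treatment sum of squares computed from the raw data. The data and helper function are assumptions for illustration only.

```python
# Hypothetical 3 x 3 Latin square: treatment labels and responses.
treatment = [
    [0, 1, 2],
    [1, 2, 0],
    [2, 0, 1],
]
y = [
    [12.0, 15.0, 11.0],
    [14.0, 10.0, 13.0],
    [9.0, 16.0, 17.0],
]
t = len(y)

grand = sum(map(sum, y)) / t**2
row_mean = [sum(y[i]) / t for i in range(t)]
col_mean = [sum(y[i][j] for i in range(t)) / t for j in range(t)]

# Stage 1: residuals after removing the row and column effects.
resid = [
    [y[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
    for i in range(t)
]

def treatment_ss(values):
    """Between-treatment sum of squares for a t x t layout."""
    overall = sum(map(sum, values)) / t**2
    totals = [0.0] * t
    for i in range(t):
        for j in range(t):
            totals[treatment[i][j]] += values[i][j]
    return t * sum((total / t - overall) ** 2 for total in totals)

# Stage 2: analyse the treatments using the residuals as the response.
ss_from_residuals = treatment_ss(resid)
ss_direct = treatment_ss(y)
print(ss_from_residuals, ss_direct)
```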
It is also possible to analyse factorial experiments in which some effects have been confounded with blocks or some fractional factorial experiments. For examples see Morgan (1993).
For experiments with missing values, these values can be estimated by using the Healy and Westmacott procedure; see John and Quenouille (1977). This procedure involves starting with initial estimates for the missing values and then making adjustments based on the residuals from the analysis. The improved estimates are then used in further iterations of the process.
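The iterative idea can be sketched for the simplest case of a single missing value in a randomised complete block design. This is a simplified illustration in the spirit of the procedure described above, not the exact algorithm of Healy and Westmacott: at each step the residual at the missing cell is subtracted from the current estimate. The function name and data are hypothetical.

```python
def estimate_missing(y, miss_block, miss_treat, tol=1e-10, max_iter=200):
    """Iteratively estimate one missing cell in a block-by-treatment table.

    y[i][l] is the response for block i and treatment l; whatever is stored
    in the missing cell is ignored.
    """
    b, t = len(y), len(y[0])
    data = [row[:] for row in y]
    observed = [
        data[i][l] for i in range(b) for l in range(t)
        if (i, l) != (miss_block, miss_treat)
    ]
    estimate = sum(observed) / len(observed)  # initial estimate: mean of observed values
    for _ in range(max_iter):
        data[miss_block][miss_treat] = estimate
        grand = sum(map(sum, data)) / (b * t)
        block_mean = sum(data[miss_block]) / t
        treat_mean = sum(data[i][miss_treat] for i in range(b)) / b
        fitted = block_mean + treat_mean - grand   # fitted value from the additive model
        residual = estimate - fitted
        estimate -= residual                       # adjust the estimate by its residual
        if abs(residual) < tol:
            break
    return estimate

# Hypothetical data with the value for block 2, treatment 3 missing.
y_missing = [
    [10.0, 12.0, 14.0, 13.0],
    [9.0, 11.0, None, 12.0],
    [11.0, 14.0, 16.0, 15.0],
]
print(estimate_missing(y_missing, miss_block=1, miss_treat=2))
```

At convergence the residual at the missing cell is zero, so the estimated value does not contribute to the residual sum of squares.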
For designs that cannot be analysed by the above approach the function nag_anova_dummyvars (g04ea) can be used to compute dummy variables from the classification variables or factors that define the design. These dummy variables can then be used with the general linear model function nag_correg_linregm_fit (g02da).
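As a rough illustration of what dummy variables are, the sketch below turns two classification factors into 0/1 indicator columns and assembles a design matrix with an intercept. The function is hypothetical and shows only one simple coding (indicator columns with the first level dropped); it does not reproduce the interface or output layout of nag_anova_dummyvars (g04ea).

```python
def dummy_variables(factor, drop_first=False):
    """Return (levels, matrix), where matrix[i][j] is 1.0 if observation i
    is at levels[j] and 0.0 otherwise."""
    levels = sorted(set(factor))
    if drop_first:
        # Dropping one level avoids redundancy with an intercept column.
        levels = levels[1:]
    matrix = [[1.0 if value == level else 0.0 for level in levels] for value in factor]
    return levels, matrix

# Hypothetical classification variables for six units.
blocks = [1, 1, 1, 2, 2, 2]
treatments = ["A", "B", "C", "A", "B", "C"]

block_levels, block_dummies = dummy_variables(blocks, drop_first=True)
treat_levels, treat_dummies = dummy_variables(treatments, drop_first=True)

# Columns: intercept, block 2, treatment B, treatment C.
design_matrix = [
    [1.0] + b_row + t_row for b_row, t_row in zip(block_dummies, treat_dummies)
]
for row in design_matrix:
    print(row)
```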
As well as the functions considered above the function nag_anova_hier2 (g04ag) computes the analysis of variance for a two strata nested design.
In addition to the functions for computing the means and the basic analysis of variance two functions are available for further analysis.
nag_anova_contrasts (g04da) computes the sum of squares for a user-defined contrast between means. For example, suppose there are four treatments, of which the first is a control and the other three are different amounts of a chemical. Two contrasts could then be defined: the difference between no chemical and chemical, and the linear effect of the chemical. nag_anova_contrasts (g04da) could be used to compute the sums of squares for these contrasts, from which the appropriate $F$-tests could be computed.
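The four-treatment example can be worked through with the standard formula for a contrast sum of squares under equal replication, $(\sum_i c_i \bar{y}_i)^2 / (\sum_i c_i^2 / r)$. The sketch below is an illustration only: the means and replication are hypothetical, the three amounts of chemical are assumed to be equally spaced, and the function is not the NAG interface.

```python
def contrast_ss(means, coefficients, replication):
    """Sum of squares for a contrast between treatment means,
    assuming every treatment is replicated the same number of times."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefficients, means))
    return estimate**2 / (sum(c * c for c in coefficients) / replication)

# Hypothetical treatment means: control, then three increasing amounts of chemical.
means = [10.0, 12.0, 13.0, 17.0]
replication = 5

control_vs_chemical = [3, -1, -1, -1]   # no chemical versus average of the chemical treatments
linear_effect = [0, -1, 0, 1]           # linear trend across equally spaced amounts

ss_control = contrast_ss(means, control_vs_chemical, replication)
ss_linear = contrast_ss(means, linear_effect, replication)
print(ss_control, ss_linear)
```

Each contrast has one degree of freedom, so dividing its sum of squares by the residual mean square gives the corresponding $F$ statistic.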
nag_anova_confidence (g04db) computes simultaneous confidence intervals for the differences between means, with a choice of methods such as Tukey–Kramer, Bonferroni and Dunn–Sidak.
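Two of the named methods work by adjusting the significance level used for each individual comparison. The sketch below applies the standard textbook definitions of the Bonferroni and Dunn–Sidak adjustments to all pairwise differences between $t$ means. It is not a description of how nag_anova_confidence (g04db) is implemented, and the Tukey–Kramer method, which needs the studentized range distribution, is omitted.

```python
def pairwise_count(t):
    """Number of pairwise differences between t treatment means."""
    return t * (t - 1) // 2

def bonferroni_level(alpha, m):
    """Per-comparison level so that the overall level is at most alpha."""
    return alpha / m

def sidak_level(alpha, m):
    """Per-comparison level giving overall level alpha for m independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

t = 4           # hypothetical number of treatments
alpha = 0.05    # desired simultaneous significance level
m = pairwise_count(t)

print("comparisons:", m)
print("Bonferroni per-comparison level:", bonferroni_level(alpha, m))
print("Dunn-Sidak per-comparison level:", sidak_level(alpha, m))
```

The Dunn–Sidak level is never smaller than the Bonferroni level, so its intervals are slightly narrower.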

Functionality Index

Analysis of variance for, 
    complete factorial design nag_anova_factorial (g04ca)
    general block design or completely randomized design nag_anova_random (g04bb)
    row and column design nag_anova_rowcol (g04bc)
    two-way hierarchical classification, subgroups of unequal size nag_anova_hier2 (g04ag)
General linear model, 
    generate dummy variables and orthogonal polynomials nag_anova_dummyvars (g04ea)
Inferences on means, 
    simultaneous confidence intervals nag_anova_confidence (g04db)
    sum of squares for contrast between means nag_anova_contrasts (g04da)

References

Cochran W G and Cox G M (1957) Experimental Designs Wiley
Davis O L (1978) The Design and Analysis of Industrial Experiments Longman
John J A (1987) Cyclic Designs Chapman and Hall
John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin
Morgan G W (1993) Analysis of variance using the NAG Fortran Library: Examples from Cochran and Cox NAG Technical Report TR 3/93 NAG Ltd, Oxford
Searle S R (1971) Linear Models Wiley


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013