G04 Chapter Introduction: NAG Library, Mark 27

An experimental design consists of a plan for allocating a set of controlled conditions, the treatments, to subsets of the experimental material, the plots or units. Two examples are:

(i) In an experiment to examine the effects of different diets on the growth of chickens, the chickens were kept in pens and a different diet was fed to the birds in each pen. In this example the pens are the units and the different diets are the treatments.
(ii) In an experiment to compare four materials for wear-loss, a sample from each of the materials is tested in a machine that simulates wear. The machine can take four samples at a time and a number of runs are made. In this experiment the treatments are the materials and the units are the samples from the materials.

In designing an experiment the following principles are important.

(a) Randomization: given the overall plan of the experiment, the final allocation of treatments to units is made using a suitable random allocation. This avoids systematic bias in the allocation and provides a basis for the statistical analysis of the experiment.
(b) Replication: each treatment should be ‘observed’ more than once, so in example (ii) more than one sample of each material should be tested. Replication allows the random variability between units, and hence the precision of the estimated treatment effects, to be estimated.
(c) Blocking: in many situations the experimental material will not be homogeneous and there may be some form of systematic variation in it. To reduce the effect of this systematic variation, the material can be grouped into blocks so that units within a block are similar but there is variation between blocks. For example, in an animal experiment litters may be considered as blocks; in an industrial experiment a block may be the material from one production batch.
(d) Factorial designs: if more than one type of treatment is under consideration, for example changes in temperature and changes in pressure, a factorial design looks at all combinations of temperature and pressure. The different types of treatment are known as factors and the different values of the factors considered in the experiment are known as levels. So if three temperatures and four pressures were being considered, factor $1$ (temperature) would have $3$ levels, factor $2$ (pressure) would have $4$ levels, and the design would be a $3 \times 4$ factorial giving a total of $12$ treatment combinations. This design has the advantage of being able to detect interaction between factors, that is, an effect of a combination of factor levels over and above the separate effects of each factor.
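Principles (a) and (d) can be illustrated together with a short Python sketch: it enumerates the $12$ combinations of a $3 \times 4$ factorial and then allocates them to $12$ units completely at random. The level labels and the helper `randomise` are hypothetical, chosen only for illustration.

```python
import itertools
import random

# Hypothetical level labels for the 3 x 4 factorial described above.
temperatures = ["T1", "T2", "T3"]          # factor 1: 3 levels
pressures = ["P1", "P2", "P3", "P4"]       # factor 2: 4 levels

# A complete factorial design uses every combination of levels.
combinations = list(itertools.product(temperatures, pressures))

def randomise(treatments, seed=None):
    """Randomly allocate each treatment to one unit (completely randomised).

    Returns a list whose position u holds the treatment given to unit u.
    """
    rng = random.Random(seed)
    allocation = list(treatments)
    rng.shuffle(allocation)
    return allocation

allocation = randomise(combinations, seed=1)
for unit, (temp, pres) in enumerate(allocation):
    print(f"unit {unit:2d}: {temp}, {pres}")
```

Fixing the seed makes the allocation reproducible for a written record of the plan. Omit it to obtain a fresh randomisation.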

The following are examples of standard experimental designs; in the descriptions, it is assumed that there are $t$ treatments.

(a) Completely Randomised Design: there are no blocks and the treatments are allocated to units at random.
(b) Randomised Complete Block Design: the experimental units are grouped into $b$ blocks of $t$ units and each treatment occurs once in each block. The treatments are allocated to units within blocks at random.
(c) Latin Square Designs: the units can be represented as cells of a $t$ by $t$ square classified by rows and columns. The $t$ rows and $t$ columns represent sources of variation in the experimental material. The design allocates the treatments to the units so that each treatment occurs once in each row and once in each column.
(d) Balanced Incomplete Block Designs: the experimental units are grouped into $b$ blocks of $k < t$ units. The treatments are allocated so that each treatment is replicated the same number of times and each pair of treatments occurs together in the same number of blocks. The treatments are allocated to units within blocks at random.
(e) Complete Factorial Experiments: if there are $t$ treatment combinations derived from the levels of all the factors, then either there are no blocks or the blocks are of size $t$ units.
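The Latin square in design (c) can be constructed and randomised as in the following Python sketch. It starts from the cyclic square in which cell $(i, j)$ receives treatment $(i + j) \bmod t$, then randomly permutes the rows, columns and treatment labels. Each step preserves the "once in each row and column" property. This is a common practical randomisation but does not sample uniformly from all Latin squares of order $t$. The function names are hypothetical.

```python
import random

def latin_square(t, seed=None):
    """Return a randomised t x t Latin square with treatments labelled 0..t-1."""
    rng = random.Random(seed)
    rows = list(range(t))
    cols = list(range(t))
    labels = list(range(t))
    rng.shuffle(rows)      # permute rows of the cyclic square
    rng.shuffle(cols)      # permute columns
    rng.shuffle(labels)    # relabel treatments
    return [[labels[(rows[i] + cols[j]) % t] for j in range(t)] for i in range(t)]

def is_latin(square):
    """Check that every treatment occurs once in each row and each column."""
    t = len(square)
    target = set(range(t))
    rows_ok = all(set(row) == target for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == target for j in range(t))
    return rows_ok and cols_ok

square = latin_square(4, seed=7)
for row in square:
    print(" ".join("ABCD"[x] for x in row))
```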

Other designs include: partially balanced incomplete block designs, split-plot designs, factorial designs with confounding, and fractional factorial designs. For further information on these designs, see Cochran and Cox (1957), Davis (1978) or John and Quenouille (1977).

2.2 Analysis of Variance

The analysis of a designed experiment usually consists of two stages. The first is the computation of the estimate of variance of the underlying random variation in the experiment along with tests for the overall effect of treatments. This results in an analysis of variance (ANOVA) table. The second stage is a more detailed examination of the effect of different treatments either by comparing the difference in treatment means with an appropriate standard error or by the use of orthogonal contrasts.

The analysis assumes a linear model such as

$$y_{ij} = \mu + \delta_{i} + \tau_{l} + e_{ij},$$

where $y_{ij}$ is the observed value for unit $j$ of block $i$, $\mu$ is the overall mean, $\delta_{i}$ is the effect of the $i$th block, $\tau_{l}$ is the effect of the $l$th treatment, which has been applied to the unit, and $e_{ij}$ is the random error term associated with this unit. The expected value of $e_{ij}$ is zero and its variance is $\sigma^{2}$.

In the analysis of variance, the total variation, measured by the sum of squares of the observations about the overall mean, is partitioned into the sum of squares due to blocks, the sum of squares due to treatments, and a residual or error sum of squares. This partition corresponds to the parameters $\delta$, $\tau$ and $\sigma^{2}$. In parallel with the partition of the sum of squares there is a partition of the degrees of freedom associated with the sums of squares. The total degrees of freedom is $n - 1$, where $n$ is the number of observations. This is partitioned into $b - 1$ degrees of freedom for blocks, $t - 1$ degrees of freedom for treatments, and $n - t - b + 1$ degrees of freedom for the residual sum of squares. From these the mean squares can be computed as the sums of squares divided by their degrees of freedom. The residual mean square is an estimate of $\sigma^{2}$. An $F$-test for an overall effect of the treatments can be calculated as the ratio of the treatment mean square to the residual mean square.
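For the special case of a randomised complete block design (every treatment once in every block, so $n = bt$), the partition above can be sketched in plain Python as follows. This is an illustration of the arithmetic, not the interface of g04bbf. The function `rcbd_anova` and the data are hypothetical.

```python
def rcbd_anova(y):
    """ANOVA for a randomised complete block design.

    y[i][l] is the observation in block i that received treatment l.
    Returns {source: (sum of squares, degrees of freedom)} plus the F ratio.
    """
    b, t = len(y), len(y[0])
    n = b * t
    grand = sum(map(sum, y)) / n
    block_means = [sum(row) / t for row in y]
    treat_means = [sum(y[i][l] for i in range(b)) / b for l in range(t)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_blocks = t * sum((m - grand) ** 2 for m in block_means)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_resid = ss_total - ss_blocks - ss_treat

    df_resid = n - t - b + 1
    ms_treat = ss_treat / (t - 1)
    ms_resid = ss_resid / df_resid        # estimate of sigma^2
    return {
        "blocks": (ss_blocks, b - 1),
        "treatments": (ss_treat, t - 1),
        "residual": (ss_resid, df_resid),
        "total": (ss_total, n - 1),
        "F": ms_treat / ms_resid,
    }

# Hypothetical data: 3 blocks x 4 treatments.
y = [
    [10.1, 11.4, 12.0, 9.8],
    [10.9, 12.2, 12.8, 10.5],
    [9.7, 11.0, 11.9, 9.6],
]
table = rcbd_anova(y)
for source in ("blocks", "treatments", "residual", "total"):
    ss, df = table[source]
    print(f"{source:<11} SS = {ss:8.4f}  df = {df}")
print(f"F (treatments) = {table['F']:.3f}")
```

The $F$ ratio would then be referred to an $F$ distribution with $t - 1$ and $n - t - b + 1$ degrees of freedom.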

For row and column designs the model is

$$y_{ij} = \mu + \rho_{i} + \gamma_{j} + \tau_{l} + e_{ij},$$

where $y_{ij}$ is the observed value in row $i$ and column $j$, $\rho_{i}$ is the effect of the $i$th row and $\gamma_{j}$ is the effect of the $j$th column. Usually the rows and columns are orthogonal. In the analysis of variance the total variation is partitioned into rows, columns, treatments and residual.
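For a $t \times t$ Latin square the rows, columns and treatments are mutually orthogonal. The partition can therefore be sketched in Python by computing each sum of squares from its own set of means. The residual then has $(t - 1)(t - 2)$ degrees of freedom. This sketch is illustrative only and is not the interface of g04bcf. The function name and data are hypothetical.

```python
def latin_square_anova(y, design):
    """ANOVA for a t x t Latin square.

    y[i][j] is the response in row i, column j; design[i][j] is the label
    (0..t-1) of the treatment applied to that cell.
    Returns {source: (sum of squares, degrees of freedom)}.
    """
    t = len(y)
    n = t * t
    grand = sum(map(sum, y)) / n
    row_means = [sum(y[i]) / t for i in range(t)]
    col_means = [sum(y[i][j] for i in range(t)) / t for j in range(t)]
    treat_totals = [0.0] * t
    for i in range(t):
        for j in range(t):
            treat_totals[design[i][j]] += y[i][j]
    treat_means = [total / t for total in treat_totals]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_rows = t * sum((m - grand) ** 2 for m in row_means)
    ss_cols = t * sum((m - grand) ** 2 for m in col_means)
    ss_treat = t * sum((m - grand) ** 2 for m in treat_means)
    return {
        "rows": (ss_rows, t - 1),
        "columns": (ss_cols, t - 1),
        "treatments": (ss_treat, t - 1),
        # what remains after removing three orthogonal sets of effects
        "residual": (ss_total - ss_rows - ss_cols - ss_treat, (t - 1) * (t - 2)),
        "total": (ss_total, n - 1),
    }

# Hypothetical 4 x 4 example: cyclic design and made-up responses.
design = [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1], [3, 0, 1, 2]]
y = [
    [12.1, 13.4, 11.8, 14.0],
    [13.0, 12.5, 14.2, 11.6],
    [11.9, 14.3, 12.2, 13.1],
    [14.1, 11.7, 13.3, 12.6],
]
table = latin_square_anova(y, design)
```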

In the case of factorial experiments, the treatment sum of squares and degrees of freedom may be partitioned into main effects for the factors and interactions between factors. The main effect of a factor is the effect of the factor averaged over all other factors. The interaction between two factors is the additional effect of the combination of the two factors, over and above the additive effects of the two factors, averaged over all other factors. For a factorial experiment in blocks with two factors, $A$ and $B$, in which the $j$th unit of the $i$th block received level $l$ of factor $A$ and level $k$ of factor $B$, the model is

$$y_{ij} = \mu + \delta_{i} + (\alpha_{l} + \beta_{k} + (\alpha\beta)_{lk}) + e_{ij},$$

where $\alpha_{l}$ is the main effect of level $l$ of factor $A$, $\beta_{k}$ is the main effect of level $k$ of factor $B$, and $(\alpha\beta)_{lk}$ is the interaction between level $l$ of $A$ and level $k$ of $B$. Higher-order interactions can be defined in a similar way.
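For a balanced two-factor experiment in which every block contains each of the $p \times q$ combinations once, the treatment sum of squares splits into $A$, $B$ and $AB$ components as in this Python sketch. The interaction sum of squares is built from the deviations $\bar{y}_{lk} - \bar{y}_{l} - \bar{y}_{k} + \bar{y}$, which vanish when the effects are purely additive. The function `factorial_anova` and the data are hypothetical, and the sketch does not reproduce g04caf.

```python
def factorial_anova(y):
    """Two-factor factorial in complete blocks.

    y[i][l][k] is the response in block i for level l of A and level k of B.
    Returns {source: (sum of squares, degrees of freedom)}.
    """
    r, p, q = len(y), len(y[0]), len(y[0][0])
    n = r * p * q
    cells = [(i, l, k) for i in range(r) for l in range(p) for k in range(q)]
    grand = sum(y[i][l][k] for i, l, k in cells) / n
    blk = [sum(y[i][l][k] for l in range(p) for k in range(q)) / (p * q) for i in range(r)]
    a = [sum(y[i][l][k] for i in range(r) for k in range(q)) / (r * q) for l in range(p)]
    b = [sum(y[i][l][k] for i in range(r) for l in range(p)) / (r * p) for k in range(q)]
    ab = [[sum(y[i][l][k] for i in range(r)) / r for k in range(q)] for l in range(p)]

    ss = {
        "blocks": p * q * sum((m - grand) ** 2 for m in blk),
        "A": r * q * sum((m - grand) ** 2 for m in a),
        "B": r * p * sum((m - grand) ** 2 for m in b),
        # interaction: cell-mean deviations from the additive fit
        "AB": r * sum((ab[l][k] - a[l] - b[k] + grand) ** 2
                      for l in range(p) for k in range(q)),
    }
    ss_total = sum((y[i][l][k] - grand) ** 2 for i, l, k in cells)
    ss["residual"] = ss_total - sum(ss.values())
    df = {"blocks": r - 1, "A": p - 1, "B": q - 1, "AB": (p - 1) * (q - 1)}
    df["residual"] = (n - 1) - sum(df.values())
    return {source: (ss[source], df[source]) for source in ss}

# Hypothetical data: 2 blocks, A at 2 levels, B at 3 levels.
y = [
    [[20.1, 22.3, 25.0], [23.4, 26.8, 31.2]],
    [[19.5, 21.9, 24.1], [22.8, 26.0, 30.1]],
]
table = factorial_anova(y)
```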

Once significant treatment effects have been identified, they can be investigated further by comparing the differences between the means with the appropriate standard error. Some of the assumptions of the analysis, such as constant variance and normality of the errors, can be checked by examining the residuals.
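For a balanced design in which each treatment mean is based on $r$ replicates, the standard error of the difference between two means is $\sqrt{2 s^{2} / r}$, where $s^{2}$ is the residual mean square. The hypothetical helper below computes the difference, its standard error and their ratio.

```python
import math

def compare_means(mean_1, mean_2, residual_ms, replicates):
    """Difference between two treatment means and its standard error.

    Assumes both means are based on `replicates` observations and that
    residual_ms (the residual mean square) estimates sigma^2.
    """
    sed = math.sqrt(2.0 * residual_ms / replicates)
    diff = mean_1 - mean_2
    return diff, sed, diff / sed

diff, sed, ratio = compare_means(12.0, 10.0, residual_ms=0.5, replicates=4)
print(diff, sed, ratio)
```

For a single pre-planned comparison the ratio can be referred to a $t$ distribution with the residual degrees of freedom. Many simultaneous comparisons call for an adjusted method, such as those provided by g04dbf.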

2.3 Intraclass Correlation

Many experiments and investigations involve the assignment of a value (score) to a number of experimental units or objects of interest (subjects). The method used to score the subject will often be affected by measurement error which can, in turn, affect the analysis and interpretation of the data. Measurement error can be especially high when the score is based on the subjective opinion of one or more individuals (raters) and therefore it is important to be able to assess its magnitude. One way of doing this is to run a reliability study and calculate the intraclass correlation (ICC). The term intraclass correlation is a general one and can mean either a measure of interrater reliability, i.e., a measure of how similar the raters are, or intrarater reliability, i.e., a measure of how consistent each rater is.

There are numerous different versions of the ICC, six of which are available in this chapter. The different versions can lead to different conclusions when applied to the same data, so it is essential to choose the most appropriate one based on the design of the reliability study and on whether inter- or intrarater reliability is of interest. The six measures of the ICC are split into three different types of study, denoted ICC(1, 1), ICC(2, 1) and ICC(3, 1). Each class of study results in two forms of the ICC, depending on whether inter- or intrarater reliability is of interest. A full description of the different designs and corresponding ICCs is given in Section 3 of the routine document for g04gaf.
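As a rough illustration of how the three study types differ, the Python sketch below computes the widely used single-rating, interrater formulas associated with the ICC(1, 1), ICC(2, 1) and ICC(3, 1) notation from an $n$ subjects by $k$ raters table with one rating per cell. This is an assumption about which forms are of interest. The exact definitions used by g04gaf, including its intrarater forms based on replicated scores, are those in its routine document and may differ. The function name and ratings are illustrative.

```python
def icc_single_rating(x):
    """Single-rating interrater ICCs for one-way random, two-way random and
    two-way mixed study types.

    x[s][r] is the score given to subject s by rater r (one score per cell).
    """
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    subj_means = [sum(row) / k for row in x]
    rater_means = [sum(x[s][r] for s in range(n)) / n for r in range(k)]

    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)

    bms = ss_subj / (n - 1)                                      # between subjects
    wms = (ss_total - ss_subj) / (n * (k - 1))                   # within subjects
    jms = ss_rater / (k - 1)                                     # between raters
    ems = (ss_total - ss_subj - ss_rater) / ((n - 1) * (k - 1))  # residual

    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }

# Illustrative ratings: 6 subjects scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_single_rating(ratings)
for name, value in icc.items():
    print(f"{name}: {value:.3f}")
```

The spread of the three values on the same data shows why the choice of study type matters.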

3 Recommendations on Choice and Use of Available Routines

This chapter contains routines that can handle a wide range of experimental designs plus routines for further analysis and a routine to compute dummy variables for use in a general linear model.

g04bbf computes the analysis of variance and treatment means with standard errors for any block design with equal sized blocks. The routine will handle both complete block designs and balanced and partially balanced incomplete block designs.

g04bcf computes the analysis of variance and treatment means with standard errors for a row and column design such as a Latin square.

g04caf computes the analysis of variance and treatment means with standard errors for a complete factorial experiment.

Other designs can be analysed by combinations of calls to g04bbf, g04bcf and g04caf. These routines compute the residuals from the model specified by the design, so the residuals can then be input as the response variable in a second call to one of the routines. For example, a factorial experiment in a Latin square design can be analysed by first calling g04bcf to remove the row and column effects and then calling g04caf with the residuals from g04bcf as the response variable to compute the ANOVA for the treatments. Another example would be to use both g02daf and g04bbf to compute an analysis of covariance.
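The two-stage idea can be illustrated in plain Python (not with the NAG routines) on a complete block design. Stage 1 removes block effects and keeps the residuals. Stage 2 uses those residuals as the response. Because blocks and treatments are orthogonal in a complete block design, the treatment sum of squares is unchanged, and this simple check relies on that orthogonality. The residual degrees of freedom in the second stage must still allow for the block degrees of freedom used in the first. Both helper functions are hypothetical.

```python
def block_residuals(y):
    """Stage 1: fit block effects only and return y minus its block mean."""
    return [[v - sum(row) / len(row) for v in row] for row in y]

def treatment_ss(z):
    """Stage 2: treatment sum of squares with z as the response.

    z[i][l] is the response in block i for treatment l.
    """
    b, t = len(z), len(z[0])
    grand = sum(map(sum, z)) / (b * t)
    means = [sum(z[i][l] for i in range(b)) / b for l in range(t)]
    return b * sum((m - grand) ** 2 for m in means)

# Hypothetical data: 3 blocks x 4 treatments.
y = [
    [10.1, 11.4, 12.0, 9.8],
    [10.9, 12.2, 12.8, 10.5],
    [9.7, 11.0, 11.9, 9.6],
]
stage1 = block_residuals(y)
ss_two_stage = treatment_ss(stage1)
ss_direct = treatment_ss(y)
print(ss_two_stage, ss_direct)
```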

For experiments with missing values, these values can be estimated by using the Healy and Westmacott procedure; see John and Quenouille (1977). This procedure involves starting with initial estimates for the missing values and then making adjustments based on the residuals from the analysis. The improved estimates are then used in further iterations of the process.
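A minimal Python sketch of the iterative procedure for a randomised complete block layout is shown below. Each missing cell starts at the mean of the observed data. The additive block plus treatment model is then fitted to the completed table, and each missing value is replaced by its fitted value, that is, its current value minus its residual, until the estimates stop changing. For a single missing cell this converges to the classical closed-form estimate $(bB + tT - G)/((b-1)(t-1))$, where $B$, $T$ and $G$ are the observed block, treatment and grand totals. The function names and data are hypothetical.

```python
def fit_additive(y):
    """Fitted values from the additive block + treatment model (complete table)."""
    b, t = len(y), len(y[0])
    grand = sum(map(sum, y)) / (b * t)
    bm = [sum(row) / t for row in y]
    tm = [sum(y[i][l] for i in range(b)) / b for l in range(t)]
    return [[bm[i] + tm[l] - grand for l in range(t)] for i in range(b)]

def healy_westmacott(y, missing, tol=1e-10, max_iter=500):
    """Estimate missing cells of a randomised complete block layout.

    y[i][l] holds the observations (values at missing cells are ignored);
    missing is a list of (block, treatment) index pairs.
    """
    work = [row[:] for row in y]
    observed = [v for i, row in enumerate(work)
                for l, v in enumerate(row) if (i, l) not in missing]
    start = sum(observed) / len(observed)
    for i, l in missing:
        work[i][l] = start                  # initial estimate
    for _ in range(max_iter):
        fitted = fit_additive(work)
        change = 0.0
        for i, l in missing:
            new = fitted[i][l]              # current value minus its residual
            change = max(change, abs(new - work[i][l]))
            work[i][l] = new
        if change < tol:
            break
    return work

y = [
    [10.1, 11.4, 12.0, 9.8],
    [10.9, 12.2, float("nan"), 10.5],
    [9.7, 11.0, 11.9, 9.6],
]
filled = healy_westmacott(y, [(1, 2)])
print(round(filled[1][2], 4))
```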

For designs that cannot be analysed by the above approach, g04eaf can be used to compute dummy variables from the classification variables or factors that define the design. These dummy variables can then be used with the general linear model routine g02daf.
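The idea of turning a factor into dummy variables can be sketched with simple 0/1 indicator coding, taking the first level as a baseline so that the columns are not collinear with a mean term. This coding is an assumption made for illustration. The coding options actually offered by g04eaf are described in its routine document, and `dummy_variables` is a hypothetical helper.

```python
def dummy_variables(levels, drop_first=True):
    """Indicator (0/1) dummy variables for one factor.

    levels: the factor level recorded for each observation.
    With drop_first=True the lowest level is the baseline and gets no column.
    Returns (rows of 0/1 values, levels that have a column).
    """
    distinct = sorted(set(levels))
    used = distinct[1:] if drop_first else distinct
    rows = [[1 if obs == lev else 0 for lev in used] for obs in levels]
    return rows, used

block = [1, 1, 2, 2, 3, 3]          # hypothetical block factor for six units
X, names = dummy_variables(block)
for row in X:
    print(row)
```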

As well as the routines considered above, g04agf computes the analysis of variance for a two-strata nested design.
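For a two-strata nested (hierarchical) design with subgroups of unequal size, the total sum of squares splits into groups, subgroups within groups, and variation within subgroups. The Python sketch below weights each mean by its number of observations so that the three parts add up to the total. It is only an illustration of the arithmetic, not of g04agf's interface. The function name and data are hypothetical.

```python
def nested_anova(data):
    """Two-way hierarchical ANOVA with unequal subgroup sizes.

    data[g][s] is the list of observations in subgroup s of group g.
    Returns {source: (sum of squares, degrees of freedom)}.
    """
    all_obs = [v for group in data for sub in group for v in sub]
    n = len(all_obs)
    grand = sum(all_obs) / n
    n_groups = len(data)
    n_subs = sum(len(group) for group in data)

    ss_groups = ss_subs = ss_within = 0.0
    for group in data:
        g_obs = [v for sub in group for v in sub]
        g_mean = sum(g_obs) / len(g_obs)
        ss_groups += len(g_obs) * (g_mean - grand) ** 2
        for sub in group:
            s_mean = sum(sub) / len(sub)
            ss_subs += len(sub) * (s_mean - g_mean) ** 2
            ss_within += sum((v - s_mean) ** 2 for v in sub)
    return {
        "groups": (ss_groups, n_groups - 1),
        "subgroups within groups": (ss_subs, n_subs - n_groups),
        "within subgroups": (ss_within, n - n_subs),
    }

# Hypothetical data: 2 groups with 2 and 3 subgroups of unequal size.
data = [
    [[10.2, 9.8, 10.5], [11.0, 11.4]],
    [[8.9, 9.3], [9.9, 10.1, 10.4], [9.0]],
]
table = nested_anova(data)
```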

In addition to the routines for computing the means and the basic analysis of variance, two routines are available for further analysis.

g04daf computes the sum of squares for a user-defined contrast between means. For example, suppose there are four treatments, where the first is a control and the other three are different amounts of a chemical. Contrasts could then be defined for the difference between no chemical and chemical, and for the linear effect of the chemical. g04daf could be used to compute the sums of squares for these contrasts, from which the appropriate $F$-tests could be computed.
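The sum of squares for a contrast $\sum_l c_l \bar{y}_l$ (with $\sum_l c_l = 0$) is $\left(\sum_l c_l \bar{y}_l\right)^2 / \sum_l (c_l^2 / r_l)$, on one degree of freedom. The Python sketch below applies this to the control-versus-chemical and linear-effect contrasts of the example. It assumes equal replication and equally spaced chemical amounts. The means and the helper `contrast_ss` are hypothetical.

```python
def contrast_ss(means, coeffs, reps):
    """Sum of squares (1 df) for the contrast sum(c_l * mean_l), sum(c_l) == 0."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return estimate ** 2 / sum(c * c / r for c, r in zip(coeffs, reps))

# Hypothetical means: control, then low, medium and high amounts of chemical.
means = [10.0, 12.5, 13.4, 14.9]
reps = [4, 4, 4, 4]
control_vs_chemical = [3, -1, -1, -1]
linear_chemical = [0, -1, 0, 1]     # assumes equally spaced amounts
ss_control = contrast_ss(means, control_vs_chemical, reps)
ss_linear = contrast_ss(means, linear_chemical, reps)
print(ss_control, ss_linear)
```

Each $F$ statistic is the contrast sum of squares divided by the residual mean square, on $1$ and residual degrees of freedom. With equal replication, a full set of $t - 1$ mutually orthogonal contrasts splits the treatment sum of squares exactly.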

g04dbf computes simultaneous confidence intervals for the differences between means, with a choice of methods such as the Tukey–Kramer, Bonferroni and Dunn–Sidak methods.
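The Bonferroni and Dunn–Sidak methods both share an overall level $\alpha$ among the $m$ pairwise comparisons. Bonferroni uses $\alpha / m$ per comparison, while Dunn–Sidak uses the slightly larger $1 - (1 - \alpha)^{1/m}$. Each interval is then the difference in means plus or minus the two-sided $t$ critical value at that level, with the residual degrees of freedom, times the standard error of the difference. The Python sketch below computes only the per-comparison levels. Tukey–Kramer intervals need the studentized range distribution and are not shown. The helper name is hypothetical.

```python
def per_comparison_levels(alpha, n_means):
    """Per-comparison significance levels for all pairwise comparisons.

    Returns (number of comparisons m, Bonferroni level, Dunn-Sidak level).
    """
    m = n_means * (n_means - 1) // 2
    bonferroni = alpha / m
    sidak = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return m, bonferroni, sidak

m, a_bon, a_sid = per_comparison_levels(0.05, 4)
print(m, round(a_bon, 6), round(a_sid, 6))
```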

g04gaf calculates the intraclass correlation (ICC) from a reliability study and can return a measure of either the inter- or intrarater reliability.

4 Functionality Index

Analysis of variance for:

complete factorial design: g04caf
general block design or completely randomized design: g04bbf
row and column design: g04bcf
two-way hierarchical classification, subgroups of unequal size: g04agf