Integer, Intent (In)	::	nrep, nrow, ncol, nt, it(*), ldtabl, ldc, irdf
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	irep(nt)
Real (Kind=nag_wp), Intent (In)	::	y(nrepnrowncol), tol
Real (Kind=nag_wp), Intent (Inout)	::	tabl(ldtabl,5), c(ldc,nt)
Real (Kind=nag_wp), Intent (Out)	::	gmean, tmean(nt), rpmean(nrep), rmean(nrepnrow), cmean(nrepncol), r(nrepnrowncol), ef(nt), wk(3*nt)

C Header Interface

#include <nag.h>

void

g04bcf_ (const Integer *nrep, const Integer *nrow, const Integer *ncol, const double y[], const Integer *nt, const Integer it[], double *gmean, double tmean[], double tabl[], const Integer *ldtabl, double c[], const Integer *ldc, Integer irep[], double rpmean[], double rmean[], double cmean[], double r[], double ef[], const double *tol, const Integer *irdf, double wk[], Integer *ifail)

The routine may be called by the names g04bcf or nagf_anova_rowcol.

3 Description

In a row and column design the experimental material can be characterised by a two-way classification, nominally called rows and columns. Each experimental unit can be considered as being located in a particular row and column. It is assumed that all rows are of the same length and all columns are of the same length. Sets of equal numbers of rows/columns can be grouped together to form replicates, sometimes known as squares or rectangles, as appropriate.

If for a replicate, the number of rows, the number of columns and the number of treatments are equal and every treatment occurs once in each row and each column then the design is a Latin square. If this is not the case the treatments will be non-orthogonal to rows and columns. For example in the case of a lattice square each treatment occurs only once in each square.

For a row and column design, with

t

treatments in

r

rows and

c

columns and

b

replicates or squares with

n = b r c

observations the linear model is:

y_{i j k (l)} = μ + β_{i} + ρ_{j} + γ_{k} + τ_{l} + e_{i j k}

for

i = 1, 2 \dots, b

j = 1, 2, \dots, r

k = 1, 2 \dots, c

and

l = 1, 2, \dots, t

, where

β_{i}

is the effect of the

i

th replicate,

ρ_{j}

is the effect of the

j

th row,

γ_{k}

is the effect of the

k

th column and the

i j k (l)

notation indicates that the

l

th treatment is applied to the unit in row

j

, column

k

of replicate

i

To compute the analysis of variance for a row and column design the mean is computed and subtracted from the observations to give,

y_{i j k (l)}^{'} = y_{i j k (l)} - \hat{μ}

. Since the replicates, rows and columns are orthogonal the estimated effects, ignoring treatment effects,

{\hat{β}}_{i}

{\hat{ρ}}_{j}

{\hat{γ}}_{k}

, can be computed using the appropriate means of the

y_{i j k (l)}^{'}

, and the unadjusted sum of squares computed as the appropriate sum of squared totals for the

y_{i j k (l)}^{'}

divided by number of units per total. The observations adjusted for replicates, rows and columns can then be computed by subtracting the estimated effects from

y_{i j k (l)}^{'}

to give

y_{i j k (l)}^{''}

In the case of a Latin square design the treatments are orthogonal to replicates, rows and columns and so the treatment effects,

{\hat{τ}}_{l}

, can be estimated as the treatment means of the adjusted observations,

y_{i j k (l)}^{''}

. The treatment sum of squares is computed as the sum of squared treatment totals of the

y_{i j (l)}^{''}

divided by the number of times each treatment is replicated. Finally the residuals, and hence the residual sum of squares, are given by

r_{i j (l)} = y_{i j (l)}^{''} - {\hat{τ}}_{l}

For a design which is not orthogonal, for example a lattice square or an incomplete Latin square, the treatment effects adjusted for replicates, rows and columns need to be computed. The adjusted treatment effects are found as the solution to the equations:

A \hat{τ} = (R - N_{b} N_{b}^{T} / (r c) - N_{r} N_{r}^{T} / (b c) - N_{c} N_{c}^{T} / (b r)) \hat{τ} = q

where

q

is the vector of the treatment totals of the observations adjusted for replicates, rows and columns,

y_{i j k (l)}^{''}

R

is a diagonal matrix with

R_{l l}

equal to the number of times the

l

th treatment is replicated, and

N_{b}

is the

t \times b

incidence matrix, with

N_{l, i}

equal to the number of times treatment

l

occurs in replicate

i

, with

N_{r}

and

N_{c}

being similarly defined for rows and columns. The solution to the equations can be written as:

\hat{τ} = Ω q

where,

Ω

is a generalized inverse of

A

. The solution is found from the eigenvalue decomposition of

A

. The residuals are first calculated by subtracting the estimated adjusted treatment effects from the adjusted observations to give

r_{i j (l)}^{'} = y_{i j (l)}^{''} - {\hat{τ}}_{l}

. However, since only the unadjusted replicate, row and column effects have been removed and they are not orthogonal to treatments, the replicate, row and column means of the

r_{i j (l)}^{'}

have to be subtracted to give the correct residuals,

r_{i j (l)}

and residual sum of squares.

Given the sums of squares, the mean squares are computed as the sums of squares divided by the degrees of freedom. The degrees of freedom for the unadjusted replicates, rows and columns are

b - 1

r - 1

and

c - 1

respectively and for the Latin square designs the degrees of freedom for the treatments is

t - 1

. In the general case the degrees of freedom for treatments is the rank of the matrix

Ω

. The

F

-statistic given by the ratio of the treatment mean square to the residual mean square tests the hypothesis:

H_{0} : τ_{1} = τ_{2} = \dots = τ_{t} = 0 .

The standard errors for the difference in treatment effects, or treatment means, for Latin square designs, are given by:

s e ({\hat{τ}}_{j} - {\hat{τ}}_{j *}) = \sqrt{2 s^{2} / (b t)}

where

s^{2}

is the residual mean square. In the general case the variances of the treatment effects are given by:

Var (\hat{τ}) = Ω s^{2}

from which the appropriate standard errors of the difference between treatment effects or the difference between adjusted means can be calculated.

The analysis of a row-column design can be considered as consisting of different strata: the replicate stratum, the rows within replicate and the columns within replicate strata and the units stratum. In the Latin square design all the information on the treatment effects is given at the units stratum. In other designs there may be a loss of information due to the non-orthogonality of treatments and replicates, rows and columns and information on treatments may be available in higher strata. The efficiency of the estimation at the units stratum is given by the (canonical) efficiency factors, these are the nonzero eigenvalues of the matrix,

A

, divided by the number of replicates in the case of equal replication, or by the mean of the number of replicates in the unequally replicated case, see John (1987). If more than one eigenvalue is zero then the design is said to be disconnected and information on some treatment comparisons can only be obtained from higher strata.

4 References

Cochran W G and Cox G M (1957) Experimental Designs Wiley

Davis O L (1978) The Design and Analysis of Industrial Experiments Longman

John J A (1987) Cyclic Designs Chapman and Hall

John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin

Searle S R (1971) Linear Models Wiley

5 Arguments

1: $nrep$ – Integer Input

On entry:

b

, the number of replicates.

Constraint:

nrep \geq 1

2: $nrow$ – Integer Input

On entry:

r

, the number of rows per replicate.

Constraint:

nrow \geq 2

3: $ncol$ – Integer Input

On entry:

c

, the number of columns per replicate.

Constraint:

ncol \geq 2

4: $y (nrep \times nrow \times ncol)$ – Real (Kind=nag_wp) array Input

On entry: the

n = b r c

observations ordered by columns within rows within replicates. That is

y (r c (i - 1) + r (j - 1) + k)

contains the observation from the

k

th column of the

j

th row of the

i

th replicate, for

i = 1, 2, \dots, b

j = 1, 2, \dots, r

and

k = 1, 2, \dots, c

5: $nt$ – Integer Input

On entry: the number of treatments. If only replicates, rows and columns are required in the analysis then set

nt = 1

Constraint:

nt \geq 1

6: $it (*)$ – Integer array Input

Note: the dimension of the array it must be at least

nrep \times nrow \times ncol

nt > 1

, and at least

1

otherwise.

On entry: if

nt > 1

it (i)

indicates which of the nt treatments unit

i

received, for

i = 1, 2, \dots, n

nt = 1

, it is not referenced.

Constraint: if

nt \geq 2

1 \leq it (i) \leq nt

, for

i = 1, 2, \dots, n

7: $gmean$ – Real (Kind=nag_wp) Output

On exit: the grand mean,

\hat{μ}

8: $tmean (nt)$ – Real (Kind=nag_wp) array Output

On exit: if

nt \geq 2

tmean (l)

contains the (adjusted) mean for the

l

th treatment,

{\hat{μ}}^{*} + {\hat{τ}}_{l}

, for

l = 1, 2, \dots, t

, where

{\hat{μ}}^{*}

is the mean of the treatment adjusted observations

y_{i j k (l)} - {\hat{τ}}_{l}

. Otherwise tmean is not referenced.

9: $tabl (ldtabl, 5)$ – Real (Kind=nag_wp) array Output

On exit: the analysis of variance table. Column 1 contains the degrees of freedom, column 2 the sum of squares, and where appropriate, column 3 the mean squares, column 4 the

F

-statistic and column 5 the significance level of the

F

-statistic. Row 1 is for replicates, row 2 for rows, row 3 for columns, row 4 for treatments (if

nt > 1

), row 5 for residual and row 6 for total. Mean squares are computed for all but the total row,

F

-statistics and significance are computed for treatments, replicates, rows and columns. Any unfilled cells are set to zero.

10: $ldtabl$ – Integer Input

On entry: the first dimension of the array tabl as declared in the (sub)program from which g04bcf is called.

Constraint:

ldtabl \geq 6

11: $c (ldc, nt)$ – Real (Kind=nag_wp) array Output

On exit: the upper triangular part of c contains the variance-covariance matrix of the treatment effects, the strictly lower triangular part contains the standard errors of the difference between two treatment effects (means), i.e.,

c (i, j)

contains the covariance of treatment

i

and

j

j \geq i

and the standard error of the difference between treatment

i

and

j

j < i

, for

i = 1, 2, \dots, t

and

j = 1, 2, \dots, t

12: $ldc$ – Integer Input

On entry: the first dimension of the array c as declared in the (sub)program from which g04bcf is called.

Constraint:

ldc \geq nt

13: $irep (nt)$ – Integer array Output

On exit: if

nt > 1

, the treatment replications,

R_{l l}

, for

l = 1, 2, \dots, nt

. Otherwise irep is not referenced.

14: $rpmean (nrep)$ – Real (Kind=nag_wp) array Output

On exit: if

nrep > 1

rpmean (i)

contains the mean for the

i

th replicate,

\hat{μ} + {\hat{β}}_{i}

, for

i = 1, 2, \dots, b

. Otherwise rpmean is not referenced.

15: $rmean (nrep \times nrow)$ – Real (Kind=nag_wp) array Output

On exit:

rmean (j)

contains the mean for the

j

th row,

\hat{μ} + {\hat{ρ}}_{i}

, for

j = 1, 2, \dots, r

16: $cmean (nrep \times ncol)$ – Real (Kind=nag_wp) array Output

On exit:

cmean (k)

contains the mean for the

k

th column,

\hat{μ} + {\hat{γ}}_{k}

, for

k = 1, 2, \dots, c

17: $r (nrep \times nrow \times ncol)$ – Real (Kind=nag_wp) array Output

On exit: the residuals,

r_{i}

, for

i = 1, 2, \dots, n

18: $ef (nt)$ – Real (Kind=nag_wp) array Output

On exit: if

nt \geq 2

, the canonical efficiency factors. Otherwise ef is not referenced.

19: $tol$ – Real (Kind=nag_wp) Input

On entry: the tolerance value used to check for zero eigenvalues of the matrix

Ω

. If

tol = 0.0

a default value of

0.00001

is used.

Constraint:

tol \geq 0.0

20: $irdf$ – Integer Input

On entry: an adjustment to the degrees of freedom for the residual and total.

$irdf \geq 1$: The degrees of freedom for the total is set to $n - irdf$ and the residual degrees of freedom adjusted accordingly.
$irdf = 0$: The total degrees of freedom for the total is set to $n - 1$ , as usual.

Constraint:

irdf \geq 0

21: $wk (3 \times nt)$ – Real (Kind=nag_wp) array Workspace

22: $ifail$ – Integer Input/Output

On entry: ifail must be set to

0

−1

1

to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of

0

causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of

−1

means that an error message is printed while a value of

1

means that it is not.

If halting is not appropriate, the value

−1

1

is recommended. If message printing is undesirable, then the value

1

is recommended. Otherwise, the value

−1

is recommended since useful values can be provided in some output arguments even when

ifail \neq 0

on exit. When the value $- 1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit:

ifail = 0

unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry

ifail = 0

−1

, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

Note: in some cases g04bcf may return useful information.

$ifail = 1$: On entry, $irdf = ⟨ value ⟩$ .
Constraint: $irdf \geq 0$ .

On entry, $ldc = ⟨ value ⟩$ and $nt = ⟨ value ⟩$ .
Constraint: $ldc \geq nt$ .

On entry, $ldtabl = ⟨ value ⟩$ .
Constraint: $ldtabl \geq 6$ .

On entry, $ncol = ⟨ value ⟩$ .
Constraint: $ncol \geq 2$ .

On entry, $nrep = ⟨ value ⟩$ .
Constraint: $nrep \geq 1$ .

On entry, $nrow = ⟨ value ⟩$ .
Constraint: $nrow \geq 2$ .

On entry, $nt = ⟨ value ⟩$ .
Constraint: $nt \geq 1$ .

On entry, $tol = ⟨ value ⟩$ .
Constraint: $tol \geq 0.0$ .

$ifail = 2$: On entry, at least one treatment is not present. Treatment $⟨ value ⟩$ does not appear in it.

On entry, $i = ⟨ value ⟩$ , $it (i) = ⟨ value ⟩$ and $nt = ⟨ value ⟩$ .
Constraint: $1 \leq it (i) \leq nt$ .

$ifail = 3$: On entry, the values in y are constant.

$ifail = 4$: A computed standard error is zero. This is due to rounding errors and is an unlikely error exit.

The computation of the eigenvalues has failed to converge. This is an unlikely error exit.

$ifail = 5$: The treatments are totally confounded with blocks. The treatment sum of squares and degrees of freedom are zero. The analysis of variance table is not computed, except for block and total sums of squares and degrees of freedom.

$ifail = 6$: The residual degrees of freedom is zero. Columns $3, 4$ and $5$ of the analysis of variance table will not be computed and the matrix of standard errors and covariances, c, will not be scaled by $s$ or $s^{2}$ .

The residual mean square is zero. Columns $3, 4$ and $5$ of the analysis of variance table will not be computed and the matrix of standard errors and covariances, c, will not be scaled by $s$ or $s^{2}$ .

$ifail = 7$: The design is disconnected. The standard errors may not valid. The design may be nested.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

The algorithm used in g04bcf, described in Section 3, achieves greater accuracy than the traditional algorithms based on the subtraction of sums of squares.

8 Parallelism and Performance

g04bcf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g04bcf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

To estimate missing values the Healy and Westmacott procedure or its derivatives may be used, see John and Quenouille (1977). This is an iterative procedure in which estimates of the missing values are adjusted by subtracting the corresponding values of the residuals. The new estimates are then used in the analysis of variance. This process is repeated until convergence. A suitable initial value may be the grand mean. When using this procedure irdf should be set to the number of missing values plus one to obtain the correct degrees of freedom for the residual sum of squares.

For analysis of covariance the residuals are obtained from an analysis of variance of both the response variable and the covariates. The residuals from the response variable are then regressed on the residuals from the covariates using, say, g02cbf or g02daf. The results from those routines can be used to test for the significance of the covariates. To test the significance of the treatment effects after fitting the covariate, the residual sum of squares from the regression should be compared with the residual sum of squares obtained from the equivalent regression but using the residuals from fitting replicates, rows and columns only.

10 Example

The data for a

5 \times 5

Latin square is input and the ANOVA and treatment means computed and printed. Since the design is orthogonal only one standard error need be printed

g04bc: FL CL CPP AD

NAG FL Interfaceg04bcf (rowcol)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

NAG FL Interface
g04bcf (rowcol)