NAG Toolbox: nag_anova_factorial (g04ca)

Purpose

nag_anova_factorial (g04ca) computes an analysis of variance table and treatment means for a complete factorial design.

Syntax

[table, itotal, tmean, e, imean, semean, bmean, r, ifail] = g04ca(y, lfac, nblock, inter, irdf, mterm, maxt, 'n', n, 'nfac', nfac)
[table, itotal, tmean, e, imean, semean, bmean, r, ifail] = nag_anova_factorial(y, lfac, nblock, inter, irdf, mterm, maxt, 'n', n, 'nfac', nfac)

Description

An experiment consists of a collection of units, or plots, to which a number of treatments are applied. In a factorial experiment the effects of several different sets of conditions are compared, e.g., three different temperatures, $T_1$, $T_2$ and $T_3$, and two different pressures, $P_1$ and $P_2$. The conditions are known as factors and the different values the conditions take are known as levels. In a factorial experiment the experimental treatments are the combinations of all the different levels of all factors, e.g.,
$$
\begin{array}{lll}
T_1P_1, & T_2P_1, & T_3P_1 \\
T_1P_2, & T_2P_2, & T_3P_2
\end{array}
$$
The effect of a factor averaged over all other factors is known as a main effect, and the effect of a combination of some of the factors averaged over all other factors is known as an interaction. This can be represented by a linear model. In the above example if the response was $y_{ijk}$ for the $k$th replicate of the $i$th level of $T$ and the $j$th level of $P$ the linear model would be
$$
y_{ijk} = \mu + t_i + p_j + \gamma_{ij} + e_{ijk}
$$
where $\mu$ is the overall mean, $t_i$ is the main effect of $T$, $p_j$ is the main effect of $P$, $\gamma_{ij}$ is the $T \times P$ interaction and $e_{ijk}$ is the random error term. In order to find unique estimates, constraints are placed on the parameter estimates. For the example here these are:
$$
\sum_{i=1}^{3} \hat{t}_i = 0, \quad
\sum_{j=1}^{2} \hat{p}_j = 0, \quad
\sum_{i=1}^{3} \hat{\gamma}_{ij} = 0 \ \text{ for } j = 1,2, \quad \text{and} \quad
\sum_{j=1}^{2} \hat{\gamma}_{ij} = 0 \ \text{ for } i = 1,2,3,
$$
where $\hat{\ }$ denotes the estimate.
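As a rough illustration of these constraints, the following Python sketch (independent of the NAG Toolbox) computes estimates for a $3 \times 2$ layout from a table of cell means and checks that they sum to zero as stated. The cell means and all variable names are invented for illustration.

```python
# Hypothetical cell means for a 3 x 2 layout (rows: T1..T3, columns: P1, P2).
cell = [[10.0, 14.0],
        [12.0, 18.0],
        [17.0, 19.0]]

n_t, n_p = len(cell), len(cell[0])

# Overall mean.
mu = sum(sum(row) for row in cell) / (n_t * n_p)

# Main effects: marginal means minus the overall mean.
t_hat = [sum(row) / n_p - mu for row in cell]
p_hat = [sum(cell[i][j] for i in range(n_t)) / n_t - mu for j in range(n_p)]

# Interaction: what is left of each cell mean after mu and the main effects.
g_hat = [[cell[i][j] - mu - t_hat[i] - p_hat[j] for j in range(n_p)]
         for i in range(n_t)]

print("mu    =", mu)
print("t_hat =", t_hat)
print("p_hat =", p_hat)
print("g_hat =", g_hat)
```

With balanced data the sum-to-zero constraints hold automatically for estimates formed this way, which is why the main effects and each row and column of the interaction table sum to zero.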
If there is variation in the experimental conditions (e.g., in an experiment on the production of a material different batches of raw material may be used, or the experiment may be carried out on different days), then plots that are similar are grouped together into blocks. For a balanced complete factorial experiment all the treatment combinations occur the same number of times in each block.
nag_anova_factorial (g04ca) computes the analysis of variance (ANOVA) table by sequentially computing the totals and means for an effect from the residuals computed when previous effects have been removed. The effect sum of squares is the sum of squared totals divided by the number of observations per total. The means are then subtracted from the residuals to compute a new set of residuals. At the same time the means for the original data are computed. When all effects are removed the residual sum of squares is computed from the residuals. Given the sums of squares an ANOVA table is then computed along with standard errors for the difference in treatment means.
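The sequential scheme described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (no blocks, a made-up $2 \times 2$ design with two replicates, hypothetical function and variable names), not the routine's actual implementation; it returns the means of the residuals for each effect and does not track the means of the original data.

```python
def sweep(resid, groups):
    """Remove one effect from the residuals.

    groups[u] labels the level (or level combination) of unit u.
    Returns the effect sum of squares, the effect means and the new residuals.
    """
    totals, counts = {}, {}
    for r, g in zip(resid, groups):
        totals[g] = totals.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    # Sum of squared totals, each divided by the number of observations per total.
    ss = sum(totals[g] ** 2 / counts[g] for g in totals)
    means = {g: totals[g] / counts[g] for g in totals}
    # Subtract the means to obtain the new set of residuals.
    new_resid = [r - means[g] for r, g in zip(resid, groups)]
    return ss, means, new_resid


# Hypothetical 2 x 2 design with two replicates, in standard order.
y = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 10.0, 12.0]
a = [1, 1, 1, 1, 2, 2, 2, 2]          # levels of the first factor
b = [1, 1, 2, 2, 1, 1, 2, 2]          # levels of the second factor

grand_mean = sum(y) / len(y)
resid = [v - grand_mean for v in y]   # residuals after removing the grand mean
total_ss = sum(r * r for r in resid)

ss_a, means_a, resid = sweep(resid, a)
ss_b, means_b, resid = sweep(resid, b)
ss_ab, means_ab, resid = sweep(resid, list(zip(a, b)))
resid_ss = sum(r * r for r in resid)  # computed directly from the final residuals

print(ss_a, ss_b, ss_ab, resid_ss, total_ss)
```

For this balanced design the three effect sums of squares and the residual sum of squares add up to the total sum of squares, although no sums of squares are ever subtracted from one another.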
The data for nag_anova_factorial (g04ca) has to be in standard order given by the order of the factors. Let there be $k$ factors, $f_1, f_2, \ldots, f_k$ in that order with levels $l_1, l_2, \ldots, l_k$ respectively. Standard order requires the levels of factor $f_1$ are in order $1, 2, \ldots, l_1$ and within each level of $f_1$ the levels of $f_2$ are in order $1, 2, \ldots, l_2$ and so on.
For an experiment with blocks the data is for block $1$ then for block $2$, etc. Within each block the data must be arranged so that the levels of factor $f_1$ are in order $1, 2, \ldots, l_1$ and within each level of $f_1$ the levels of $f_2$ are in order $1, 2, \ldots, l_2$ and so on. Any within block replication of treatment combinations must occur within the levels of $f_k$.
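To make the required ordering concrete, the following Python sketch lists which block, factor levels and replicate each position of the data vector corresponds to. The function name and its arguments are hypothetical; only the ordering rule is taken from the text above.

```python
from itertools import product


def standard_order(lfac, nblock=1, nrep=1):
    """List (block, level of f1, ..., level of fk, replicate) for each observation.

    The rightmost label varies fastest, so replicates sit within the levels
    of the last factor and blocks form the outermost grouping.
    """
    ranges = [range(1, nblock + 1)]
    ranges += [range(1, levels + 1) for levels in lfac]
    ranges.append(range(1, nrep + 1))
    return list(product(*ranges))


# Two blocks, factor f1 with 3 levels, factor f2 with 2 levels, no replication.
order = standard_order([3, 2], nblock=2)
for position, labels in enumerate(order, start=1):
    print(position, labels)
```

A listing like this can be used to check, or to sort, a data set before it is passed as y.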
The ANOVA table is given in the following order. For a complete factorial experiment the first row is for blocks, if present, then the main effects of the factors in their order, e.g., $f_1$ followed by $f_2$, etc. These are then followed by all the two factor interactions then all the three factor interactions, etc., the last two rows being for the residual and total sums of squares. The interactions are arranged in lexical order for the given factor order. For example, for the three factor interactions for a five factor experiment the $10$ interactions would be in the following order:
$$
\begin{array}{l}
f_1f_2f_3 \\
f_1f_2f_4 \\
f_1f_2f_5 \\
f_1f_3f_4 \\
f_1f_3f_5 \\
f_1f_4f_5 \\
f_2f_3f_4 \\
f_2f_3f_5 \\
f_2f_4f_5 \\
f_3f_4f_5
\end{array}
$$
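The same ordering can be generated programmatically. The Python sketch below relies on `itertools.combinations`, which emits combinations in lexicographic order when its input is sorted. The helper `anova_rows` and its row labels are hypothetical, and it assumes that a row for blocks is always listed first, as in the description of the output table.

```python
from itertools import combinations


def anova_rows(nfac, inter):
    """Row labels of the ANOVA table for nfac factors and interactions up to order inter."""
    rows = ["blocks"]
    # inter = 0 and inter = 1 both mean main effects only.
    for order in range(1, max(inter, 1) + 1):
        for combo in combinations(range(1, nfac + 1), order):
            rows.append("".join(f"f{i}" for i in combo))
    rows += ["residual", "total"]
    return rows


# The ten three-factor interactions of a five-factor experiment.
three_factor = ["".join(f"f{i}" for i in c) for c in combinations(range(1, 6), 3)]
print(three_factor)

# Row order for two factors with their two-factor interaction.
print(anova_rows(2, 2))
```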

References

Cochran W G and Cox G M (1957) Experimental Designs Wiley
Davis O L (1978) The Design and Analysis of Industrial Experiments Longman
John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin

Parameters

Compulsory Input Parameters

1:     y(n) – double array
n, the dimension of the array, must satisfy the constraints
  • $\mathrm{n} \geq 4$;
  • if $\mathrm{nblock} > 1$, n must be a multiple of nblock;
  • n must be a multiple of the number of treatment combinations, that is a multiple of $\prod_{i=1}^{k} \mathrm{lfac}(i)$.
The observations in standard order, see Section [Description].
2:     lfac(nfac) – int64/int32/nag_int array
nfac, the dimension of the array, must satisfy the constraint $\mathrm{nfac} \geq 1$.
lfac(i) must contain the number of levels for the $i$th factor, for $i = 1, 2, \ldots, k$.
Constraint: $\mathrm{lfac}(i) \geq 2$, for $i = 1, 2, \ldots, k$.
3:     nblock – int64/int32/nag_int scalar
The number of blocks. If there are no blocks, set $\mathrm{nblock} = 0$ or $1$.
Constraints:
  • $\mathrm{nblock} \geq 0$;
  • if $\mathrm{nblock} \geq 2$, $\mathrm{n} / \mathrm{nblock}$ must be a multiple of the number of treatment combinations, that is a multiple of $\prod_{i=1}^{k} \mathrm{lfac}(i)$.
4:     inter – int64/int32/nag_int scalar
The maximum number of factors in an interaction term. If no interaction terms are to be computed, set $\mathrm{inter} = 0$ or $1$.
Constraint: $0 \leq \mathrm{inter} \leq \mathrm{nfac}$.
5:     irdf – int64/int32/nag_int scalar
The adjustment to the residual and total degrees of freedom. The total degrees of freedom are set to $\mathrm{n} - \mathrm{irdf}$ and the residual degrees of freedom adjusted accordingly. For examples of the use of irdf see Section [Further Comments].
Constraint: $\mathrm{irdf} \geq 0$.
6:     mterm – int64/int32/nag_int scalar
The maximum number of terms in the analysis of variance table, see Section [Further Comments].
Constraint: mterm must be large enough to contain the terms specified by nfac, inter and nblock. If the function exits with $\mathrm{ifail} \geq 2$, the required minimum value of mterm is returned in itotal.
7:     maxt – int64/int32/nag_int scalar
The maximum number of treatment means to be computed, see Section [Further Comments]. If the value of maxt is too small for the required analysis then the minimum number is returned in imean(1).
Constraint: maxt must be large enough for the number of means specified by lfac and inter; if $\mathrm{inter} = \mathrm{nfac}$ then $\mathrm{maxt} \geq \prod_{i=1}^{k} (\mathrm{lfac}(i) + 1) - 1$.

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: The dimension of the array y.
The number of observations.
Constraints:
  • $\mathrm{n} \geq 4$;
  • if $\mathrm{nblock} > 1$, n must be a multiple of nblock;
  • n must be a multiple of the number of treatment combinations, that is a multiple of $\prod_{i=1}^{k} \mathrm{lfac}(i)$.
2:     nfac – int64/int32/nag_int scalar
Default: The dimension of the array lfac.
$k$, the number of factors.
Constraint: $\mathrm{nfac} \geq 1$.

Input Parameters Omitted from the MATLAB Interface

iwk

Output Parameters

1:     table(mterm,5) – double array
The first itotal rows of table contain the analysis of variance table. The first column contains the degrees of freedom, the second column contains the sum of squares, the third column (except for the row corresponding to the total sum of squares) contains the mean squares, i.e., the sums of squares divided by the degrees of freedom, and the fourth and fifth columns contain the $F$ ratio and significance level, respectively (except for rows corresponding to the total sum of squares, and the residual sum of squares). All other cells of the table are set to zero.
The first row corresponds to the blocks and is set to zero if there are no blocks. The itotalth row corresponds to the total sum of squares for y and the $(\mathrm{itotal} - 1)$th row corresponds to the residual sum of squares. The central rows of the table correspond to the main effects followed by the interactions, if specified by inter. The main effects are in the order specified by lfac and the interactions are in lexical order, see Section [Description].
2:     itotal – int64/int32/nag_int scalar
The row in table corresponding to the total sum of squares. The number of treatment effects is $\mathrm{itotal} - 3$.
3:     tmean(maxt) – double array
The treatment means. The position of the means for an effect is given by the index in imean. For a given effect the means are in standard order, see Section [Description].
4:     e(maxt) – double array
The estimated effects in the same order as for the means in tmean.
5:     imean(mterm) – int64/int32/nag_int array
Indicates the position of the effect means in tmean. The effect means corresponding to the first treatment effect in the ANOVA table are stored in tmean(1) up to tmean(imean(1)). Other effect means corresponding to the $i$th treatment effect, $i = 2, 3, \ldots, \mathrm{itotal} - 3$, are stored in tmean(imean($i-1$) + 1) up to tmean(imean($i$)).
6:     semean(mterm) – double array
The standard error of the difference between means corresponding to the $i$th treatment effect in the ANOVA table.
7:     bmean(nblock + 1) – double array
bmean(1) contains the grand mean; if $\mathrm{nblock} > 1$, bmean(2) up to bmean(nblock + 1) contain the block means.
8:     r(n) – double array
The residuals.
9:     ifail – int64/int32/nag_int scalar
$\mathrm{ifail} = 0$ unless the function detects an error (see [Error Indicators and Warnings]).
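The indexing convention for imean is easiest to see on actual numbers. The Python sketch below uses the tmean, imean and itotal values printed in the example output on this page and splits tmean into one list per treatment effect. The helper name `effect_means` is hypothetical, and the only adjustment needed is the shift from the 1-based positions stored in imean to Python's 0-based slices.

```python
# Values copied from the example output on this page.
tmean = [254.7778, 339.0000, 333.3333, 367.7778, 330.7778, 360.6667,
         334.2778, 353.7778, 305.1111,
         235.3333, 332.6667, 196.3333, 342.6667, 341.6667, 332.6667,
         309.3333, 370.3333, 320.3333, 395.0000, 370.3333, 338.0000,
         373.3333, 326.6667, 292.3333, 350.0000, 381.0000, 351.0000]
imean = [6, 9, 27, 0, 0, 0]
itotal = 6


def effect_means(tmean, imean, itotal):
    """Split tmean into one list per treatment effect.

    imean(i) is the 1-based position of the last mean of effect i, so the
    means of effect i occupy positions imean(i-1)+1 .. imean(i).
    """
    out, start = [], 0
    for end in imean[:itotal - 3]:   # there are itotal - 3 treatment effects
        out.append(tmean[start:end])
        start = end
    return out


f1_means, f2_means, f1f2_means = effect_means(tmean, imean, itotal)
print(len(f1_means), len(f2_means), len(f1f2_means))
```

Here the three lists hold the means for the first factor (6 levels), the second factor (3 levels) and their interaction (18 combinations).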

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, $\mathrm{n} < 4$,
or $\mathrm{nfac} < 1$,
or $\mathrm{nblock} < 0$,
or $\mathrm{inter} < 0$,
or $\mathrm{inter} > \mathrm{nfac}$,
or $\mathrm{irdf} < 0$.
  ifail = 2
On entry, $\mathrm{lfac}(i) \leq 1$, for some $i = 1, 2, \ldots, \mathrm{nfac}$,
or the value of maxt is too small,
or the value of mterm is too small,
or $\mathrm{nblock} > 1$ and n is not a multiple of nblock,
or the number of plots per block is not a multiple of the number of treatment combinations.
  ifail = 3
On entry, the values of y are constant.
  ifail = 4
There are no degrees of freedom for the residual or the residual sum of squares is zero. In either case the standard errors and $F$-statistics cannot be computed.

Accuracy

The block and treatment sums of squares are computed from the block and treatment residual totals. The residuals are updated as each effect is computed and the residual sum of squares computed directly from the residuals. This avoids any loss of accuracy in subtracting sums of squares.

Further Comments

The number of rows in the ANOVA table and the number of treatment means are given by the following formulae.
Let there be $k$ factors with levels $l_i$ for $i = 1, 2, \ldots, k$, and let $t$ be the maximum number of terms in an interaction; then the number of rows in the ANOVA table is:
$$
\sum_{i=1}^{t} \binom{k}{i} + 3.
$$
The number of treatment means is:
$$
\sum_{i=1}^{t} \prod_{j \in S_i} l_j,
$$
where $S_i$ is the set of all combinations of the $k$ factors $i$ at a time, so that each combination contributes the product of the numbers of levels of its factors.
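These two counts are what mterm and maxt must accommodate, so it can be convenient to compute them before the call. The following Python sketch evaluates both formulae; the function name is hypothetical, and taking $t = \max(\mathrm{inter}, 1)$ is an assumption based on the statement that inter = 0 and inter = 1 both mean that no interaction terms are computed.

```python
from itertools import combinations
from math import comb, prod


def anova_sizes(lfac, inter):
    """Return (number of ANOVA table rows, number of treatment means)."""
    k = len(lfac)
    t = max(inter, 1)  # assumed: main effects are always included
    # Rows: one per effect, plus blocks, residual and total.
    rows = sum(comb(k, i) for i in range(1, t + 1)) + 3
    # Means: for every combination of i factors, the product of their levels.
    means = sum(prod(combo)
                for i in range(1, t + 1)
                for combo in combinations(lfac, i))
    return rows, means


print(anova_sizes([6, 3], 2))
```

For the design used in the example on this page (two factors with 6 and 3 levels, inter = 2) this gives 6 rows and 27 means, which agrees with the values of mterm and maxt passed there.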
To estimate missing values the Healy and Westmacott procedure or its derivatives may be used, see John and Quenouille (1977). This is an iterative procedure in which estimates of the missing values are adjusted by subtracting the corresponding values of the residuals. The new estimates are then used in the analysis of variance. This process is repeated until convergence. A suitable initial value may be the grand mean. When using this procedure irdf should be set to the number of missing values plus one to obtain the correct degrees of freedom for the residual sum of squares.
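The iteration can be sketched as follows in Python. This is a minimal illustration, assuming a two-way additive layout with one observation per cell; the function `additive_residuals` merely stands in for the residuals that an analysis of variance call would return, and all names and data are hypothetical.

```python
def additive_residuals(y):
    """Residuals from fitting grand mean + row effect + column effect."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row = [sum(r) / b for r in y]
    col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    return [[y[i][j] - row[i] - col[j] + grand for j in range(b)]
            for i in range(a)]


def estimate_missing(y, missing, tol=1e-10, max_iter=500):
    """Fill the cells listed in `missing` by repeated residual subtraction."""
    y = [list(r) for r in y]
    observed = [y[i][j] for i in range(len(y)) for j in range(len(y[0]))
                if (i, j) not in missing]
    start = sum(observed) / len(observed)   # initial value: mean of observed data
    for i, j in missing:
        y[i][j] = start
    for _ in range(max_iter):
        res = additive_residuals(y)
        change = 0.0
        for i, j in missing:
            y[i][j] -= res[i][j]            # adjust estimate by its residual
            change = max(change, abs(res[i][j]))
        if change < tol:                    # residuals at missing cells are ~0
            break
    return y


# Hypothetical, exactly additive data; the cell marked None is missing.
table = [[10.0, 11.0, 13.0],
         [12.0, 13.0, None],
         [15.0, 16.0, 18.0]]
filled = estimate_missing(table, {(1, 2)})
print(filled[1][2])
```

Because the made-up data are exactly additive, the estimate converges to the value that makes the residual of the missing cell zero, which is 15 here. With one missing value, irdf would be set to 2 in the final analysis, following the rule stated above.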
For analysis of covariance the residuals are obtained from an analysis of variance of both the response variable and the covariates. The residuals from the response variable are then regressed on the residuals from the covariates using, say, nag_correg_linregs_noconst (g02cb) or nag_correg_linregm_fit (g02da). The coefficients obtained from the regression can be examined for significance and used to produce an adjusted dependent variable using the original response variable and covariate. An approximate adjusted analysis of variance table can then be produced by using the adjusted dependent variable. In this case irdf should be set to one plus the number of fitted covariates.
For designs such as Latin squares one or more of the blocking factors has to be removed in a preliminary analysis before the final analysis. This preliminary analysis can be performed using nag_anova_random (g04bb) or a prior call to nag_anova_factorial (g04ca) if the data is reordered between calls. The residuals from the preliminary analysis are then input to nag_anova_factorial (g04ca). In these cases irdf should be set to the difference between n and the residual degrees of freedom from the preliminary analysis. Care should be taken when using this approach as there is no check on the orthogonality of the two analyses.

Example

    function nag_anova_factorial_example
    y = [274;
         361;
         253;
         325;
         317;
         339;
         326;
         402;
         336;
         379;
         345;
         361;
         352;
         334;
         318;
         339;
         393;
         358;
         350;
         340;
         203;
         397;
         356;
         298;
         382;
         376;
         355;
         418;
         387;
         379;
         432;
         339;
         293;
         322;
         417;
         342;
         82;
         297;
         133;
         306;
         352;
         361;
         220;
         333;
         270;
         388;
         379;
         274;
         336;
         307;
         266;
         389;
         333;
         353];
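    lfac = [int64(6);3];   % two factors, with 6 and 3 levels
    nblock = int64(3);     % three blocks of 18 plots each (54 observations)
    inter = int64(2);      % include the two-factor interaction
    irdf = int64(0);       % no adjustment to the degrees of freedom
    mterm = int64(6);      % rows: blocks, 2 main effects, 1 interaction, residual, total
    maxt = int64(27);      % 6 + 3 + 6*3 treatment means
    [table, itotal, tmean, e, imean, semean, bmean, r, ifail] = ...
        nag_anova_factorial(y, lfac, nblock, inter, irdf, mterm, maxt)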
    
     
    
    table =
    
       1.0e+05 *
    
        0.0000    0.3012    0.1506    0.0001    0.0000
        0.0001    0.7301    0.1460    0.0001    0.0000
        0.0000    0.2160    0.1080    0.0001    0.0000
        0.0001    0.3119    0.0312    0.0000    0.0000
        0.0003    0.6663    0.0196         0         0
        0.0005    2.2254         0         0         0
    
    
    itotal =
    
                        6
    
    
    tmean =
    
      254.7778
      339.0000
      333.3333
      367.7778
      330.7778
      360.6667
      334.2778
      353.7778
      305.1111
      235.3333
      332.6667
      196.3333
      342.6667
      341.6667
      332.6667
      309.3333
      370.3333
      320.3333
      395.0000
      370.3333
      338.0000
      373.3333
      326.6667
      292.3333
      350.0000
      381.0000
      351.0000
    
    
    e =
    
      -76.2778
        7.9444
        2.2778
       36.7222
       -0.2778
       29.6111
        3.2222
       22.7222
      -25.9444
      -22.6667
       55.1667
      -32.5000
        0.4444
      -20.0556
       19.6111
      -27.2222
       14.2778
       12.9444
       24.0000
      -20.1667
       -3.8333
       39.3333
      -26.8333
      -12.5000
      -13.8889
       -2.3889
       16.2778
    
    
    imean =
    
                        6
                        9
                       27
                        0
                        0
                        0
    
    
    semean =
    
       20.8681
       14.7560
       36.1446
             0
             0
             0
    
    
    bmean =
    
      331.0556
      339.5556
      354.7778
      298.8333
    
    
    r =
    
       30.1667
       19.8333
       48.1667
      -26.1667
      -33.1667
       -2.1667
        8.1667
       23.1667
        7.1667
      -24.5000
      -33.8333
       14.5000
      -29.8333
       -1.1667
       17.1667
      -19.5000
        3.5000
       -1.5000
       90.9444
      -16.3889
      -17.0556
       30.6111
       -9.3889
      -58.3889
       48.9444
      -18.0556
       10.9444
       -0.7222
       -7.0556
       17.2778
       34.9444
      -11.3889
      -23.0556
      -51.7222
       12.2778
      -32.7222
     -121.1111
       -3.4444
      -31.1111
       -4.4444
       42.5556
       60.5556
      -57.1111
       -5.1111
      -18.1111
       25.2222
       40.8889
      -31.7778
       -5.1111
       12.5556
        5.8889
       71.2222
      -15.7778
       34.2222
    
    
    ifail =
    
                        0
    
    
    


    © The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013