

NAG Toolbox: nag_correg_linregm_rssq (g02ea)

Purpose

nag_correg_linregm_rssq (g02ea) calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.

Syntax

[nmod, modl, rss, nterms, mrank, ifail] = g02ea(mean, x, vname, isx, y, 'n', n, 'm', m, 'wt', wt)
[nmod, modl, rss, nterms, mrank, ifail] = nag_correg_linregm_rssq(mean, x, vname, isx, y, 'n', n, 'm', m, 'wt', wt)

Description

For a set of $k$ possible independent variables there are $2^k$ linear regression models with from zero to $k$ independent variables in each model. For example, if $k = 3$ and the variables are $A$, $B$ and $C$ then the possible models are:
(i) null model
(ii) $A$
(iii) $B$
(iv) $C$
(v) $A$ and $B$
(vi) $A$ and $C$
(vii) $B$ and $C$
(viii) $A$, $B$ and $C$.
nag_correg_linregm_rssq (g02ea) calculates the residual sums of squares from each of the $2^k$ possible models. The method used involves a $QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations and the residual sums of squares computed for each model; see Clark (1981) and Smith and Bremner (1989).
The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sums of squares. So the first model will always have the largest residual sum of squares and the $2^k$th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables.
nag_correg_linregm_rssq (g02ea) allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables.
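
To make the enumeration and the ordering concrete, the following is a minimal brute-force sketch in plain MATLAB. It is not the method used by the function: each subset is refitted from scratch with the backslash operator rather than by updating a $QR$ decomposition with Givens rotations. The data are hypothetical and a mean term is assumed.

```matlab
% Brute-force illustration only: refits every subset with the backslash
% operator instead of updating a QR decomposition by Givens rotations.
x = [1 2; 2 1; 3 4; 4 3; 5 6];      % hypothetical data: 5 observations, 2 free variables
y = [1.1; 1.9; 3.2; 3.8; 5.1];
[n, k] = size(x);
nmod = 2^k;                          % number of possible models
rss = zeros(nmod, 1);
nterms = zeros(nmod, 1);
for s = 0:nmod-1
    in = logical(bitget(s, 1:k));    % bit j set => variable j is in the model
    A = [ones(n, 1), x(:, in)];      % mean term plus the selected columns
    r = y - A * (A \ y);             % least squares residuals
    rss(s+1) = r' * r;
    nterms(s+1) = nnz(in);
end
% Order by increasing number of terms, then by decreasing RSS.
[~, order] = sortrows([nterms, -rss]);
rss = rss(order);
nterms = nterms(order);
```

After sorting, the null model comes first with the largest residual sum of squares and the full model comes last with the smallest, which is the ordering described above. The cost of this approach grows quickly with $k$, so it is only suitable for checking small cases.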

References

Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
Smith D M and Bremner J M (1989) All possible subset regressions using the $QR$ decomposition Comput. Statist. Data Anal. 7 217–236
Weisberg S (1985) Applied Linear Regression Wiley

Parameters

Compulsory Input Parameters

1:     mean – string (length ≥ 1)
Indicates if a mean term is to be included.
mean = 'M'
A mean term, intercept, will be included in the model.
mean = 'Z'
The model will pass through the origin, zero-point.
Constraint: mean = 'M' or 'Z'.
2:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the ith observation for the jth independent variable, for i = 1,2,…,n and j = 1,2,…,m.
3:     vname(m) – cell array of strings
m, the dimension of the array, must satisfy the constraint m ≥ 2.
vname(j) must contain the name of the variable in column j of x, for j = 1,2,…,m.
4:     isx(m) – int64/int32/nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ 2.
Indicates which independent variables are to be considered in the model.
isx(j) ≥ 2
The variable contained in the jth column of x is included in all regression models, i.e., is a forced variable.
isx(j) = 1
The variable contained in the jth column of x is included in the set from which the regression models are chosen, i.e., is a free variable.
isx(j) = 0
The variable contained in the jth column of x is not included in the models.
Constraints:
  • isx(j) ≥ 0, for j = 1,2,…,m;
  • at least one value of isx = 1.
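
As an illustration of how isx partitions the columns of x, the sketch below splits a hypothetical set of four variables into forced, free and excluded groups and counts the models that would result. The variable names and isx values are invented for the example; the count $2^k$ follows the definition of $k$ as the number of free variables.

```matlab
% Hypothetical variable names and roles, for illustration.
vname = {'A'; 'B'; 'C'; 'D'};
isx = int64([2; 1; 1; 0]);           % A forced, B and C free, D excluded
forced = vname(isx >= 2);            % appear in every model
free = vname(isx == 1);              % subsets of these define the models
k = numel(free);                     % number of free variables
nmod_expected = 2^k;                 % number of models that would be fitted
% Mirror the documented constraints: no negative entries, at least one free variable.
assert(all(isx >= 0) && k >= 1, 'isx violates the documented constraints');
```

Here every model contains A, and the four models differ in whether they also contain B, C, both or neither.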
5:     y(n) – double array
n, the dimension of the array, must satisfy the constraints
  • n ≥ 2;
  • n ≥ the number of independent variables to be considered (forced plus free plus mean if included), as specified by mean and isx.
y(i) must contain the ith observation on the dependent variable, $y_i$, for i = 1,2,…,n.

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: The dimension of the array y and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraints:
  • n ≥ 2;
  • n ≥ the number of independent variables to be considered (forced plus free plus mean if included), as specified by mean and isx.
2:     m – int64/int32/nag_int scalar
Default: The dimension of the arrays vname, isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables contained in x.
Constraint: m ≥ 2.
3:     wt(:) – double array
Note: the dimension of the array wt must be at least n if weight = 'W'.
If weight = 'W', wt must contain the weights to be used in the weighted regression.
If wt(i) = 0.0, the ith observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weight = 'U', wt is not referenced and the effective number of observations is n.
Constraint: if weight = 'W', wt(i) ≥ 0.0, for i = 1,2,…,n.
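
The effect of zero weights on the effective number of observations can be checked before calling the function. The sketch below uses hypothetical weights and isx values; the names n_eff, p, has_mean and ok are introduced here for illustration and are not part of the interface. It compares the effective number of observations with the number of variables to be considered, which is the condition reported as ifail = 5.

```matlab
wt = [1; 0; 2; 1; 0.5; 0];           % hypothetical weights for 6 observations
isx = int64([2; 1; 1; 0]);           % hypothetical: 1 forced, 2 free, 1 excluded
has_mean = true;                     % corresponds to mean = 'M'
assert(all(wt >= 0), 'weights must be non-negative');
n_eff = nnz(wt);                     % observations with nonzero weight
p = nnz(isx > 0) + has_mean;         % forced + free (+ mean if included)
ok = p < n_eff;                      % ifail = 5 is documented when p >= n_eff
```

With these values two observations are dropped, leaving four effective observations for four terms, so ok is false and a call with these inputs would be expected to fail. The weights themselves would be supplied through the optional 'wt' argument shown in the Syntax section.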

Input Parameters Omitted from the MATLAB Interface

weight, ldx, ldmodl, wk

Output Parameters

1:     nmod – int64/int32/nag_int scalar
The total number of models for which residual sums of squares have been calculated.
2:     modl(ldmodl,m) – cell array of strings
ldmodl = max(2^k, m).
The first nterms(i) elements of the ith row of modl contain the names of the independent variables, as given in vname, that are included in the ith model.
3:     rss(ldmodl) – double array
ldmodl = max(2^k, m).
rss(i) contains the residual sum of squares for the ith model, for i = 1,2,…,nmod.
4:     nterms(ldmodl) – int64/int32/nag_int array
ldmodl = max(2^k, m).
nterms(i) contains the number of independent variables in the ith model, not including the mean if one is fitted, for i = 1,2,…,nmod.
5:     mrank(ldmodl) – int64/int32/nag_int array
ldmodl = max(2^k, m).
mrank(i) contains the rank of the residual sum of squares for the ith model.
6:     ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).
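
Because only the first nterms(i) entries of each row of modl are meaningful, it can help to turn each row into a readable label. The sketch below does this for a cut-down copy of the first six rows of the output in the Example section, with modl truncated to two columns for brevity. The variables labels and best_one are introduced here for illustration only.

```matlab
% First six rows of the output shown in the Example section (modl cut to 2 columns).
modl = {'', ''; 'TKN', ''; 'TVS', ''; 'BOD', ''; 'COD', ''; 'TS ', ''};
rss = [5.0634; 5.0219; 2.5044; 2.0338; 1.5563; 1.5370];
nterms = int64([0; 1; 1; 1; 1; 1]);
labels = cell(numel(rss), 1);
for i = 1:numel(rss)
    names = modl(i, 1:double(nterms(i)));    % only the first nterms(i) entries are used
    if isempty(names)
        labels{i} = '(null model)';
    else
        labels{i} = strjoin(strtrim(names), ' + ');
    end
    fprintf('%-14s rss = %.4f\n', labels{i}, rss(i));
end
% Smallest RSS among the one-variable models.
one = find(nterms == 1);
[~, j] = min(rss(one));
best_one = labels{one(j)};
```

Within a group of models with the same number of terms the rows are already sorted by decreasing residual sum of squares, so the last row of the group is the one selected here.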

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, n < 2,
or m < 2,
or ldx < n,
or ldmodl < m,
or mean ≠ 'M' or 'Z',
or weight ≠ 'U' or 'W'.
  ifail = 2
On entry, weight = 'W' and a value of wt < 0.0.
  ifail = 3
On entry, a value of isx < 0,
or there are no free variables, i.e., no element of isx = 1.
  ifail = 4
On entry, ldmodl < the number of possible models = $2^k$, where $k$ is the number of free independent variables from isx.
  ifail = 5
On entry, the number of independent variables to be considered (forced plus free plus mean if included) is greater than or equal to the effective number of observations.
  ifail = 6
The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model in order to give full rank.

Accuracy

For a discussion of the improved accuracy obtained by using a method based on the $QR$ decomposition see Smith and Bremner (1989).

Further Comments

nag_correg_linregm_rssq_stat (g02ec) may be used to compute $R^2$ and $C_p$-values from the results of nag_correg_linregm_rssq (g02ea).
If a mean has been included in the model and no variables are forced in then rss(1) contains the total sum of squares and in many situations a reasonable estimate of the variance of the errors is given by rss(nmod) / (n − 1 − nterms(nmod)).
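
As a worked instance of the variance estimate, the sketch below plugs in values copied from the output in the Example section, where a mean is included and no variables are forced. The $R^2$ line uses the usual definition, one minus the ratio of the residual to the total sum of squares; it is assumed rather than confirmed here that this matches what nag_correg_linregm_rssq_stat (g02ec) reports.

```matlab
% Values copied from the Example section (mean included, no forced variables).
n = 20;                  % number of observations
rss_null = 5.0634;       % rss(1): total sum of squares
rss_full = 0.9652;       % rss(nmod): all five free variables
nterms_full = 5;         % nterms(nmod)
sigma2_hat = rss_full / (n - 1 - nterms_full);   % error variance estimate
r2_full = 1 - rss_full / rss_null;               % R^2 of the full model (usual definition)
fprintf('variance estimate = %.4f, R^2 = %.4f\n', sigma2_hat, r2_full);
```

The divisor is 20 − 1 − 5 = 14, giving a variance estimate of about 0.069 and an $R^2$ of about 0.81 for the full model.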

Example

    function nag_correg_linregm_rssq_example
    mean_p = 'M';
    x = [0, 1125, 232, 7160, 85.9, 8905;
         7, 920, 268, 8804, 86.5, 7388;
         15, 835, 271, 8108, 85.2, 5348;
         22, 1000, 237, 6370, 83.8, 8056;
         29, 1150, 192, 6441, 82.1, 6960;
         37, 990, 202, 5154, 79.2, 5690;
         44, 840, 184, 5896, 81.2, 6932;
         58, 650, 200, 5336, 80.6, 5400;
         65, 640, 180, 5041, 78.4, 3177;
         72, 583, 165, 5012, 79.3, 4461;
         80, 570, 151, 4825, 78.7, 3901;
         86, 570, 171, 4391, 78, 5002;
         93, 510, 243, 4320, 72.3, 4665;
         100, 555, 147, 3709, 74.9, 4642;
         107, 460, 286, 3969, 74.4, 4840;
         122, 275, 198, 3558, 72.5, 4479;
         129, 510, 196, 4361, 57.7, 4200;
         151, 165, 210, 3301, 71.8, 3410;
         171, 244, 327, 2964, 72.5, 3360;
         220, 79, 334, 2777, 71.9, 2599];
    vname = {'DAY'; 'BOD'; 'TKN'; 'TS '; 'TVS'; 'COD'};
    isx = [int64(0);1;1;1;1;1];
    y = [1.5563;
         0.8976;
         0.7482;
         0.716;
         0.301;
         0.3617;
         0.1139;
         0.1139;
         -0.2218;
         -0.1549;
         0;
         0;
         -0.0969;
         -0.2218;
         -0.3979;
         -0.1549;
         -0.2218;
         -0.3979;
         -0.5229;
         -0.0458];
    [nmod, model, rss, nterms, mrank, ifail] = nag_correg_linregm_rssq(mean_p, x, vname, isx, y)
    
     
    
    nmod =
    
                       32
    
    
    model = 
    
        ''       ''       ''       ''       ''       ''
        'TKN'    ''       ''       ''       ''       ''
        'TVS'    ''       ''       ''       ''       ''
        'BOD'    ''       ''       ''       ''       ''
        'COD'    ''       ''       ''       ''       ''
        'TS '    ''       ''       ''       ''       ''
        'TKN'    'TVS'    ''       ''       ''       ''
        'BOD'    'TVS'    ''       ''       ''       ''
        'BOD'    'TKN'    ''       ''       ''       ''
        'BOD'    'COD'    ''       ''       ''       ''
        'TKN'    'TS '    ''       ''       ''       ''
        'TS '    'TVS'    ''       ''       ''       ''
        'BOD'    'TS '    ''       ''       ''       ''
        'TKN'    'COD'    ''       ''       ''       ''
        'TVS'    'COD'    ''       ''       ''       ''
        'TS '    'COD'    ''       ''       ''       ''
        'BOD'    'TKN'    'TVS'    ''       ''       ''
        'TKN'    'TS '    'TVS'    ''       ''       ''
        'BOD'    'TS '    'TVS'    ''       ''       ''
        'BOD'    'TVS'    'COD'    ''       ''       ''
        'BOD'    'TKN'    'COD'    ''       ''       ''
        'BOD'    'TKN'    'TS '    ''       ''       ''
        'TKN'    'TVS'    'COD'    ''       ''       ''
        'BOD'    'TS '    'COD'    ''       ''       ''
        'TS '    'TVS'    'COD'    ''       ''       ''
        'TKN'    'TS '    'COD'    ''       ''       ''
        'BOD'    'TKN'    'TS '    'TVS'    ''       ''
        'BOD'    'TKN'    'TVS'    'COD'    ''       ''
        'BOD'    'TS '    'TVS'    'COD'    ''       ''
        'BOD'    'TKN'    'TS '    'COD'    ''       ''
        'TKN'    'TS '    'TVS'    'COD'    ''       ''
        'BOD'    'TKN'    'TS '    'TVS'    'COD'    ''
    
    
    rss =
    
        5.0634
        5.0219
        2.5044
        2.0338
        1.5563
        1.5370
        2.4381
        1.7462
        1.5921
        1.4963
        1.4707
        1.4590
        1.4397
        1.4388
        1.3287
        1.0850
        1.4257
        1.3900
        1.3894
        1.3204
        1.2764
        1.2582
        1.2179
        1.0644
        1.0634
        0.9871
        1.2199
        1.1565
        1.0388
        0.9871
        0.9653
        0.9652
    
    
    nterms =
    
                        0
                        1
                        1
                        1
                        1
                        1
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        3
                        3
                        3
                        3
                        3
                        3
                        3
                        3
                        3
                        3
                        4
                        4
                        4
                        4
                        4
                        5
    
    
    mrank =
    
                       32
                       31
                       30
                       28
                       25
                       24
                       29
                       27
                       26
                       23
                       22
                       21
                       20
                       19
                       15
                        8
                       18
                       17
                       16
                       14
                       13
                       12
                       10
                        7
                        6
                        4
                       11
                        9
                        5
                        3
                        2
                        1
    
    
    ifail =
    
                        0
    
    
    


    © The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013