NAG Toolbox: nag_correg_linregm_coeffs_noconst (g02ch)

Purpose

nag_correg_linregm_coeffs_noconst (g02ch) performs a multiple linear regression with no constant on a set of variables whose sums of squares and cross-products about zero and correlation-like coefficients are given.

Syntax

[result, coef, rznv, cz, ifail] = g02ch(n, sspz, rz, 'k1', k1)
[result, coef, rznv, cz, ifail] = nag_correg_linregm_coeffs_noconst(n, sspz, rz, 'k1', k1)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 23: k dropped from interface.

Description

nag_correg_linregm_coeffs_noconst (g02ch) fits a curve of the form
$$y = b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
to the data points
$$(x_{11}, x_{21}, \ldots, x_{k1}, y_1),\ (x_{12}, x_{22}, \ldots, x_{k2}, y_2),\ \ldots,\ (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n)$$
such that
$$y_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i, \quad i = 1, 2, \ldots, n.$$
The function calculates the regression coefficients, $b_1, b_2, \ldots, b_k$, (and various other statistical quantities) by minimizing
$$\sum_{i=1}^{n} e_i^2.$$
The actual data values $(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ are not provided as input to the function. Instead, input to the function consists of:
(i) The number of cases, $n$, on which the regression is based.
(ii) The total number of variables, dependent and independent, in the regression, $(k+1)$.
(iii) The number of independent variables in the regression, $k$.
(iv) The $(k+1)$ by $(k+1)$ matrix $[\tilde{S}_{ij}]$ of sums of squares and cross-products about zero of all the variables in the regression; the terms involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
(v) The $(k+1)$ by $(k+1)$ matrix $[\tilde{R}_{ij}]$ of correlation-like coefficients for all the variables in the regression; the correlations involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
The quantities calculated are:
(a) The inverse of the $k$ by $k$ partition of the matrix of correlation-like coefficients, $[\tilde{R}_{ij}]$, involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite (see Section [Further Comments]).
(b) The modified matrix, $C = [c_{ij}]$, where
$$c_{ij} = \frac{\tilde{R}_{ij}\, \tilde{r}^{ij}}{\tilde{S}_{ij}}, \quad i,j = 1, 2, \ldots, k,$$
where $\tilde{r}^{ij}$ is the $(i,j)$th element of the inverse matrix of $[\tilde{R}_{ij}]$ as described in (a) above. Each element of $C$ is thus the corresponding element of the matrix of correlation-like coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products about zero.
(c) The regression coefficients:
$$b_i = \sum_{j=1}^{k} c_{ij} \tilde{S}_{j(k+1)}, \quad i = 1, 2, \ldots, k,$$
where $\tilde{S}_{j(k+1)}$ is the sum of cross-products about zero for the independent variable $x_j$ and the dependent variable $y$.
(d) The sum of squares attributable to the regression, $SSR$, the sum of squares of deviations about the regression, $SSD$, and the total sum of squares, $SST$:
  • $SST = \tilde{S}_{(k+1)(k+1)}$, the sum of squares about zero for the dependent variable, $y$;
  • $SSR = \sum_{j=1}^{k} b_j \tilde{S}_{j(k+1)}$;  $SSD = SST - SSR$.
(e) The degrees of freedom attributable to the regression, $DFR$, the degrees of freedom of deviations about the regression, $DFD$, and the total degrees of freedom, $DFT$:
$$DFR = k; \quad DFD = n - k; \quad DFT = n.$$
(f) The mean square attributable to the regression, $MSR$, and the mean square of deviations about the regression, $MSD$:
$$MSR = SSR / DFR; \quad MSD = SSD / DFD.$$
(g) The $F$ value for the analysis of variance:
$$F = MSR / MSD.$$
(h) The standard error estimate:
$$s = \sqrt{MSD}.$$
(i) The coefficient of multiple correlation, $R$, the coefficient of multiple determination, $R^2$, and the coefficient of multiple determination corrected for the degrees of freedom, $\bar{R}^2$:
$$R = \sqrt{1 - \frac{SSD}{SST}}; \quad R^2 = 1 - \frac{SSD}{SST}; \quad \bar{R}^2 = 1 - \frac{SSD \times DFT}{SST \times DFD}.$$
(j) The standard error of the regression coefficients:
$$se(b_i) = \sqrt{MSD \times c_{ii}}, \quad i = 1, 2, \ldots, k.$$
(k) The $t$ values for the regression coefficients:
$$t(b_i) = \frac{b_i}{se(b_i)}, \quad i = 1, 2, \ldots, k.$$
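
The quantities (a) to (k) can be recomputed in a few lines of base MATLAB, which may help when checking results by hand. The following is a minimal sketch and not the routine's own implementation: it uses the general-purpose inv in place of the accurate positive definite inversion this function relies on, and all variable names other than sspz, rz, rznv, cz, coef and result are illustrative. The data are those of the Example section.

```matlab
% Sketch only: recompute quantities (a)-(k) from n, sspz and rz in base MATLAB.
% inv() stands in for the accurate positive definite inversion used by g02ch.
n = 5;
sspz = [245, 99, 82; 99, 271, 52; 82, 52, 54];
rz = [1, 0.3842, 0.7129; 0.3842, 1, 0.4299; 0.7129, 0.4299, 1];

k = size(sspz, 1) - 1;               % number of independent variables
S = sspz(1:k, 1:k);                  % partition for the independent variables
R = rz(1:k, 1:k);
sxy = sspz(1:k, k + 1);              % cross-products of each x_j with y

rznv = inv(R);                       % (a) inverse of the k-by-k partition
cz = R .* rznv ./ S;                 % (b) element-wise, not a matrix product
b = cz * sxy;                        % (c) regression coefficients

SST = sspz(k + 1, k + 1);            % (d) sums of squares
SSR = b' * sxy;
SSD = SST - SSR;
DFR = k; DFD = n - k; DFT = n;       % (e) degrees of freedom
MSR = SSR / DFR; MSD = SSD / DFD;    % (f) mean squares
F = MSR / MSD;                       % (g) F value
s = sqrt(MSD);                       % (h) standard error estimate
R2 = 1 - SSD / SST;                  % (i) multiple determination
Rmult = sqrt(R2);
R2adj = 1 - (SSD * DFT) / (SST * DFD);
se = sqrt(MSD * diag(cz));           % (j) standard errors of the coefficients
t = b ./ se;                         % (k) t values

result = [SSR; DFR; MSR; F; SSD; DFD; MSD; SST; DFT; s; Rmult; R2; R2adj];
coef = [b, se, t];
disp(result');
disp(coef);
```

With these inputs the sketch reproduces the values listed in the Example section to about three decimal places.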

References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

Parameters

Compulsory Input Parameters

1:     n – int64 / int32 / nag_int scalar
$n$, the number of cases used in calculating the sums of squares and cross-products and correlation-like coefficients.
2:     sspz(ldsspz,k1) – double array
ldsspz, the first dimension of the array, must satisfy the constraint ldsspz ≥ k1.
sspz(i,j) must be set to $\tilde{S}_{ij}$, the sum of cross-products about zero for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
3:     rz(ldrz,k1) – double array
ldrz, the first dimension of the array, must satisfy the constraint ldrz ≥ k1.
rz(i,j) must be set to $\tilde{R}_{ij}$, the correlation-like coefficient for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; coefficients involving the dependent variable appear in row $k+1$ and column $k+1$.
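
When the raw data are available, both input matrices can be formed directly from them. The sketch below assumes complete data (no missing values) and takes the correlation-like coefficient to be sspz(i,j) / sqrt(sspz(i,i) × sspz(j,j)). That definition is an assumption, since it is not stated on this page, but it is consistent with the Example section (for instance 99 / sqrt(245 × 271) ≈ 0.3842). The data matrix Z is hypothetical.

```matlab
% Sketch only: build sspz and rz from a hypothetical n-by-(k+1) data matrix Z
% whose last column is the dependent variable y. Assumes no missing values.
Z = [1, 2, 3;
     2, 1, 4;
     3, 5, 8;
     4, 3, 7;
     5, 6, 10];

sspz = Z' * Z;                 % sums of squares and cross-products about zero
d = sqrt(diag(sspz));          % square roots of the diagonal sums of squares
rz = sspz ./ (d * d');         % assumed definition of the correlation-like coefficients

n = int64(size(Z, 1));         % number of cases
disp(sspz);
disp(rz);
```

Because the dependent variable is the last column of Z, its terms fall in the last row and column of both matrices, as the parameter descriptions require.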

Optional Input Parameters

1:     k1 – int64 / int32 / nag_int scalar
Default: The first dimension of the arrays sspz, rz and the second dimension of the arrays sspz, rz. (An error is raised if these dimensions are not equal.)
The total number of variables, independent and dependent $(k+1)$, in the regression.
Constraint: 2 ≤ k1 ≤ n.

Input Parameters Omitted from the MATLAB Interface

k ldsspz ldrz ldcoef ldrznv ldcz wkz ldwkz

Output Parameters

1:     result(13) – double array
The following information:
result(1)   $SSR$, the sum of squares attributable to the regression;
result(2)   $DFR$, the degrees of freedom attributable to the regression;
result(3)   $MSR$, the mean square attributable to the regression;
result(4)   $F$, the $F$ value for the analysis of variance;
result(5)   $SSD$, the sum of squares of deviations about the regression;
result(6)   $DFD$, the degrees of freedom of deviations about the regression;
result(7)   $MSD$, the mean square of deviations about the regression;
result(8)   $SST$, the total sum of squares;
result(9)   $DFT$, the total degrees of freedom;
result(10)  $s$, the standard error estimate;
result(11)  $R$, the coefficient of multiple correlation;
result(12)  $R^2$, the coefficient of multiple determination;
result(13)  $\bar{R}^2$, the coefficient of multiple determination corrected for the degrees of freedom.
2:     coef(ldcoef,3) – double array
ldcoef ≥ k.
For $i = 1, 2, \ldots, k$, the following information:
coef(i,1)
$b_i$, the regression coefficient for the $i$th variable.
coef(i,2)
$se(b_i)$, the standard error of the regression coefficient for the $i$th variable.
coef(i,3)
$t(b_i)$, the $t$ value of the regression coefficient for the $i$th variable.
3:     rznv(ldrznv,k) – double array
k = k1 − 1.
ldrznv ≥ k.
The inverse of the matrix of correlation-like coefficients for the independent variables; that is, the inverse of the matrix consisting of the first $k$ rows and columns of rz.
4:     cz(ldcz,k) – double array
k = k1 − 1.
ldcz ≥ k.
The modified inverse matrix, $C$, where
$$\mathrm{cz}(i,j) = \frac{\mathrm{rz}(i,j) \times \mathrm{rznv}(i,j)}{\mathrm{sspz}(i,j)}, \quad i,j = 1, 2, \ldots, k.$$
5:     ifail – int64 / int32 / nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, k1 < 2.
  ifail = 2
On entry, k1 ≠ (k + 1).
  ifail = 3
On entry, n < k1.
  ifail = 4
On entry, ldsspz < k1,
or ldrz < k1,
or ldcoef < k,
or ldrznv < k,
or ldcz < k,
or ldwkz < k.
  ifail = 5
This indicates that the $k$ by $k$ partition of the matrix held in rz, which is to be inverted, is not positive definite.
  ifail = 6
This indicates that the refinement following the actual inversion fails, indicating that the $k$ by $k$ partition of the matrix held in rz, which is to be inverted, is ill-conditioned. The use of nag_correg_linregm_fit (g02da), which employs a different numerical technique, may avoid the difficulty.
  ifail = 7
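
Calling code will usually want to turn ifail into a readable message. The following sketch shows one way to do so in base MATLAB. The message strings paraphrase the list above, and the variable ifail is a hypothetical value standing in for the fifth output of g02ch, so the snippet runs without the toolbox.

```matlab
% Sketch only: map an ifail value to a readable message.
% ifail is a hypothetical stand-in for the fifth output of g02ch.
ifail = 5;

messages = { ...
    'k1 < 2', ...
    'k1 is not equal to k + 1', ...
    'n < k1', ...
    'an array dimension is too small', ...
    'the k-by-k partition of rz is not positive definite', ...
    'the k-by-k partition of rz is ill-conditioned (refinement failed)'};

if ifail == 0
    msg = 'no error detected';
elseif ifail >= 1 && ifail <= numel(messages)
    msg = messages{ifail};
else
    % Values with no description in the list above end up here.
    msg = sprintf('ifail value %d has no description in the list above', ifail);
end
fprintf('g02ch: ifail = %d (%s)\n', ifail, msg);
```

For ifail = 6 the text above suggests trying nag_correg_linregm_fit (g02da) instead, so a caller might branch on that value separately.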

Accuracy

The accuracy of any regression function is almost entirely dependent on the accuracy of the matrix inversion method used. In nag_correg_linregm_coeffs_noconst (g02ch), it is the matrix of correlation-like coefficients rather than that of the sums of squares and cross-products about zero that is inverted; this means that all terms in the matrix for inversion are of a similar order, and reduces the scope for computational error. For details on absolute accuracy, the relevant section of the document describing the inversion function used, nag_linsys_real_posdef_solve_ref (f04ab), should be consulted. nag_correg_linregm_fit (g02da) uses a different method, based on nag_linsys_real_gen_lsqsol (f04am), and that function may well prove more reliable numerically. It does not handle missing values, nor does it provide the same output as this function.
If, in calculating $F$ or any of the $t(b_i)$ (see Section [Description]), the numbers involved are such that the result would be outside the range of numbers which can be stored by the machine, then the answer is set to the largest quantity which can be stored as a double variable, by means of a call to nag_machine_real_largest (x02al).
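
The scaling argument above can be illustrated by comparing condition numbers. This is a sketch on hypothetical numbers, not a statement about the inversion function's actual error bounds: two variables whose root sums of squares differ by a factor of 1000 give a badly scaled sums of squares and cross-products matrix, while the corresponding correlation-like matrix keeps all its entries between −1 and 1.

```matlab
% Sketch only: hypothetical two-variable case with very different scales.
R = [1, 0.3842; 0.3842, 1];    % correlation-like coefficients
d = [1; 1000];                 % hypothetical root sums of squares
S = R .* (d * d');             % implied sums of squares and cross-products

% The 2-norm condition number indicates sensitivity to rounding on inversion.
fprintf('cond(S) = %.3g\n', cond(S));
fprintf('cond(R) = %.3g\n', cond(R));
```

Here cond(R) is far smaller than cond(S), which is the sense in which inverting the correlation-like matrix reduces the scope for computational error.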

Further Comments

The time taken by nag_correg_linregm_coeffs_noconst (g02ch) depends on $k$.
This function assumes that the matrix of correlation-like coefficients for the independent variables in the regression is positive definite; it fails if this is not the case.
This correlation matrix will in fact be positive definite whenever the correlation-like matrix and the sums of squares and cross-products (about zero) matrix have been formed either without regard to missing values, or by eliminating completely any cases involving missing values for any variable. If, however, these matrices are formed by eliminating cases with missing values from only those calculations involving the variables for which the values are missing, no such statement can be made, and the correlation-like matrix may or may not be positive definite. You should be aware of the possible dangers of using correlation matrices formed in this way (see the G02 Chapter Introduction), but if you nevertheless wish to carry out regressions using such matrices, this function is capable of handling the inversion of such matrices, provided they are positive definite.
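
Positive definiteness can be checked before calling the function. The sketch below uses the two-output form of the base MATLAB function chol, whose second output is zero when the factorization succeeds. This is an independent check and not the test g02ch itself performs. The second matrix is hypothetical: every entry is a valid coefficient, yet the matrix as a whole is not positive definite.

```matlab
% Sketch only: test the k-by-k partition of rz for positive definiteness.
rz = [1, 0.3842, 0.7129; 0.3842, 1, 0.4299; 0.7129, 0.4299, 1];
k = size(rz, 1) - 1;

[~, p] = chol(rz(1:k, 1:k));   % p == 0 means the factorization succeeded
isPosDef = (p == 0);

% Hypothetical matrix with valid pairwise coefficients that is not
% positive definite as a whole.
bad = [1, 0.9, -0.9; 0.9, 1, 0.9; -0.9, 0.9, 1];
[~, q] = chol(bad);
badIsPosDef = (q == 0);

fprintf('Example partition positive definite: %d\n', isPosDef);
fprintf('Hypothetical matrix positive definite: %d\n', badIsPosDef);
```

A matrix that fails this check would be expected to lead to ifail = 5.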
If a matrix is positive definite, its subsequent re-organisation by either of nag_correg_linregm_service_select (g02ce) or nag_correg_linregm_service_reorder (g02cf) will not affect this property and the new matrix can safely be used in this function. Thus correlation matrices produced by any of nag_correg_coeffs_zero (g02bd), nag_correg_coeffs_zero_miss_case (g02be), nag_correg_coeffs_zero_subset (g02bk) or nag_correg_coeffs_zero_subset_miss_case (g02bl), even if subsequently modified by either nag_correg_linregm_service_select (g02ce) or nag_correg_linregm_service_reorder (g02cf), can be handled by this function.
It should be noted that the function requires the dependent variable to be the last of the $k+1$ variables whose statistics are provided as input to the function. If this variable is not correctly positioned in the original data, the means, standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients can be manipulated by using nag_correg_linregm_service_select (g02ce) or nag_correg_linregm_service_reorder (g02cf) to reorder the variables as necessary.
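
For the two matrices alone, the reordering amounts to applying the same permutation to rows and columns. The sketch below does this with plain indexing in base MATLAB. It stands in for the service functions named above and does not reproduce their interfaces. The starting matrices are hypothetical: they hold the Example data with the dependent variable stored first.

```matlab
% Sketch only: move the dependent variable to the last position by permuting
% rows and columns together. Hypothetical case: y is stored as variable 1.
sspz0 = [54, 82, 52; 82, 245, 99; 52, 99, 271];
rz0 = [1, 0.7129, 0.4299; 0.7129, 1, 0.3842; 0.4299, 0.3842, 1];

order = [2, 3, 1];             % new position i holds old variable order(i)
sspz = sspz0(order, order);
rz = rz0(order, order);
disp(sspz);
disp(rz);
```

After the permutation, sspz and rz match the matrices used in the Example section.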

Example

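function nag_correg_linregm_coeffs_noconst_example
% Number of cases on which the input matrices are based.
n = int64(5);
% Sums of squares and cross-products about zero; the dependent variable is last.
sspz = [245, 99, 82;
     99, 271, 52;
     82, 52, 54];
% Correlation-like coefficients, in the same variable order as sspz.
rz = [1, 0.3842, 0.7129;
     0.3842, 1, 0.4299;
     0.7129, 0.4299, 1];
% k1 is omitted, so it defaults to the common dimension of sspz and rz (3).
[result, coeff, rzinv, cz, ifail] = nag_correg_linregm_coeffs_noconst(n, sspz, rz)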
 

result =

   28.9857
    2.0000
   14.4929
    1.7382
   25.0143
    3.0000
    8.3381
   54.0000
    5.0000
    2.8876
    0.7326
    0.5368
    0.2280


coeff =

    0.3017    0.1998    1.5098
    0.0817    0.1900    0.4299


rzinv =

    1.1732   -0.4507
   -0.4507    1.1732


cz =

    0.0048   -0.0017
   -0.0017    0.0043


ifail =

                    0


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013