NAG Toolbox: nag_correg_pls_wold (g02lb)

 Contents

    1  Purpose
    2  Syntax
    3  Description
    4  References
    5  Parameters
    6  Error Indicators and Warnings
    7  Accuracy
    8  Further Comments
    9  Example

Purpose

nag_correg_pls_wold (g02lb) fits an orthogonal scores partial least squares (PLS) regression by using Wold's iterative method.

Syntax

[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = g02lb(x, isx, y, iscale, xstd, ystd, maxfac, 'n', n, 'mx', mx, 'ip', ip, 'my', my, 'maxit', maxit, 'tau', tau)
[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = nag_correg_pls_wold(x, isx, y, iscale, xstd, ystd, maxfac, 'n', n, 'mx', mx, 'ip', ip, 'my', my, 'maxit', maxit, 'tau', tau)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: tau and maxit were made optional.

Description

Let $X_1$ be the mean-centred $n$ by $m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let $Y_1$ be the mean-centred $n$ by $r$ data matrix $Y$ of $n$ observations on $r$ response variables.
PLS methods extract $k$ factors from the data. The first factor predicts both $X_1$ and $Y_1$ by regressing on $t_1$, a column vector of $n$ scores:
$$\hat{X}_1 = t_1 p_1^{\mathrm{T}}, \qquad \hat{Y}_1 = t_1 c_1^{\mathrm{T}}, \qquad \text{with } t_1^{\mathrm{T}} t_1 = 1,$$
where the column vectors of $m$ $x$-loadings $p_1$ and $r$ $y$-loadings $c_1$ are calculated in the least squares sense:
$$p_1^{\mathrm{T}} = t_1^{\mathrm{T}} X_1, \qquad c_1^{\mathrm{T}} = t_1^{\mathrm{T}} Y_1.$$
The $x$-score vector $t_1 = X_1 w_1$ is the linear combination of predictor data $X_1$ that has maximum covariance with the $y$-scores $u_1 = Y_1 c_1$, where the $x$-weights vector $w_1$ is the normalised first left singular vector of $X_1^{\mathrm{T}} Y_1$.
The method extracts subsequent PLS factors by repeating the above process with the residual matrices:
$$X_i = X_{i-1} - \hat{X}_{i-1}, \qquad Y_i = Y_{i-1} - \hat{Y}_{i-1}, \qquad i = 2, 3, \ldots, k,$$
and with orthogonal scores:
$$t_i^{\mathrm{T}} t_j = 0, \qquad j = 1, 2, \ldots, i-1.$$
Optionally, in addition to being mean-centred, the data matrices $X_1$ and $Y_1$ may be scaled by the standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy.
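
As an illustration of the factor extraction described above, the following MATLAB sketch computes the first PLS factor directly via the singular value decomposition form of the x-weights; nag_correg_pls_wold (g02lb) obtains the same quantities by Wold's iterative method. The sketch assumes workspace data matrices X and Y, and all names in it are illustrative rather than part of the g02lb interface.

X1 = bsxfun(@minus, X, mean(X));   % mean-centre the predictor data
Y1 = bsxfun(@minus, Y, mean(Y));   % mean-centre the response data

[U, ~, ~] = svd(X1' * Y1, 'econ');
w1 = U(:, 1);                      % x-weights: normalised first left singular vector

t1 = X1 * w1;                      % x-scores
t1 = t1 / norm(t1);                % rescale so that t1'*t1 = 1

p1 = (t1' * X1)';                  % x-loadings, least squares on the scores
c1 = (t1' * Y1)';                  % y-loadings
u1 = Y1 * c1;                      % y-scores

X2 = X1 - t1 * p1';                % residual matrices used to extract the next factor
Y2 = Y1 - t1 * c1';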

References

Wold H (1966) Estimation of principal components and related models by iterative least squares In: Multivariate Analysis (ed P R Krishnaiah) 391–420 Academic Press NY

Parameters

Compulsory Input Parameters

1:     x(ldx,mx) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the ith observation on the jth predictor variable, for i=1,2,…,n and j=1,2,…,mx.
2:     isx(mx) – int64/int32/nag_int array
Indicates which predictor variables are to be included in the model.
isx(j)=1
The jth predictor variable (with variates in the jth column of X) is included in the model.
isx(j)=0
Otherwise.
Constraint: the sum of elements in isx must equal ip (see the sketch following this parameter list).
3:     y(ldy,my) – double array
ldy, the first dimension of the array, must satisfy the constraint ldy ≥ n.
y(i,j) must contain the ith observation for the jth response variable, for i=1,2,…,n and j=1,2,…,my.
4:     iscale – int64/int32/nag_int scalar
Indicates how predictor variables are scaled.
iscale=1
Data are scaled by the standard deviation of variables.
iscale=2
Data are scaled by user-supplied scalings.
iscale=-1
No scaling.
Constraint: iscale=-1, 1 or 2.
5:     xstd(ip) – double array
If iscale=2, xstd(j) must contain the user-supplied scaling for the jth predictor variable in the model, for j=1,2,…,ip. Otherwise xstd need not be set.
6:     ystd(my) – double array
If iscale=2, ystd(j) must contain the user-supplied scaling for the jth response variable in the model, for j=1,2,…,my. Otherwise ystd need not be set.
7:     maxfac – int64/int32/nag_int scalar
k, the number of latent variables to calculate.
Constraint: 1 ≤ maxfac ≤ ip.
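
As a brief illustration of the relationship between isx and ip referred to above, the following sketch includes only the first three of the mx predictor columns in the model; the selection is hypothetical and mx is assumed to be already defined.

isx      = zeros(mx, 1, 'int64');  % exclude all predictors initially
isx(1:3) = int64(1);               % include the first three predictor columns
ip       = int64(sum(isx));        % number of predictors in the model; must equal sum(isx)
xstd     = zeros(ip, 1);           % user-supplied scalings, only referenced when iscale = 2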

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: the first dimension of the arrays x, y. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraint: n>1.
2:     mx – int64/int32/nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of predictor variables.
Constraint: mx>1.
3:     ip – int64/int32/nag_int scalar
Default: the dimension of the array xstd.
m, the number of predictor variables in the model.
Constraint: 1 < ip ≤ mx.
4:     my – int64/int32/nag_int scalar
Default: the dimension of the array ystd and the second dimension of the array y. (An error is raised if these dimensions are not equal.)
r, the number of response variables.
Constraint: my ≥ 1.
5:     maxit – int64/int32/nag_int scalar
Default: maxit=200
If my=1, maxit is not referenced; otherwise the maximum number of iterations used to calculate the x-weights.
Constraint: if my>1, maxit>1.
6:     tau – double scalar
Default: tau=1.0e−4
If my=1, tau is not referenced; otherwise the iterative procedure used to calculate the x-weights will halt if the Euclidean distance between two subsequent estimates is less than or equal to tau.
Constraint: if my>1, tau>0.0.
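
When there is more than one response variable, the convergence controls maxit and tau can be overridden by passing them as name-value arguments. The following is a sketch with illustrative values only, assuming x, isx, y, iscale, xstd, ystd and maxfac have been set up as for an ordinary call.

maxit = int64(500);   % illustrative iteration limit (default 200)
tau   = 1.0e-6;       % illustrative convergence tolerance (default 1.0e-4)
[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = ...
    g02lb(x, isx, y, iscale, xstd, ystd, maxfac, 'maxit', maxit, 'tau', tau);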

Output Parameters

1:     xbar(ip) – double array
Mean values of predictor variables in the model.
2:     ybar(my) – double array
The mean value of each response variable.
3:     xstd(ip) – double array
If iscale=1, standard deviations of predictor variables in the model. Otherwise xstd is not changed.
4:     ystd(my) – double array
If iscale=1, the standard deviation of each response variable. Otherwise ystd is not changed.
5:     xres(ldxres,ip) – double array
The predictor variables' residual matrix Xk.
6:     yres(ldyres,my) – double array
The residuals for each response variable, Yk.
7:     w(ldw,maxfac) – double array
The jth column of W contains the x-weights wj, for j=1,2,…,maxfac.
8:     p(ldp,maxfac) – double array
The jth column of P contains the x-loadings pj, for j=1,2,…,maxfac.
9:     t(ldt,maxfac) – double array
The jth column of T contains the x-scores tj, for j=1,2,…,maxfac.
10:   c(ldc,maxfac) – double array
The jth column of C contains the y-loadings cj, for j=1,2,…,maxfac.
11:   u(ldu,maxfac) – double array
The jth column of U contains the y-scores uj, for j=1,2,…,maxfac.
12:   xcv(maxfac) – double array
xcv(j) contains the cumulative percentage of variance in the predictor variables explained by the first j factors, for j=1,2,…,maxfac.
13:   ycv(ldycv,my) – double array
ycv(i,j) is the cumulative percentage of variance of the jth response variable explained by the first i factors, for i=1,2,…,maxfac and j=1,2,…,my.
14:   ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).
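
The cumulative explained-variance outputs xcv and ycv can be used, for example, to choose the smallest number of factors that accounts for a target share of a response variable's variance. The following sketch assumes the routine has already been called and uses a hypothetical 95% threshold.

target = 95.0;                                   % hypothetical threshold (per cent)
nfac   = find(ycv(:, 1) >= target, 1, 'first');  % smallest number of factors reaching it
if isempty(nfac)
    nfac = double(maxfac);                       % fall back to all computed factors
end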

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
Constraint: iscale=-1 or 1.
Constraint: mx>1.
Constraint: my ≥ 1.
Constraint: n>1.
On entry, element _ of isx is invalid.
   ifail=2
Constraint: 1 < ip ≤ mx.
Constraint: 1 ≤ maxfac ≤ ip.
Constraint: if my>1, maxit>1.
Constraint: if my>1, tau>0.0.
Constraint: ldc ≥ my.
Constraint: ldp ≥ ip.
Constraint: ldt ≥ n.
Constraint: ldu ≥ n.
Constraint: ldw ≥ ip.
Constraint: ldxres ≥ n.
Constraint: ldx ≥ n.
Constraint: ldycv ≥ maxfac.
Constraint: ldyres ≥ n.
Constraint: ldy ≥ n.
   ifail=3
On entry, ip is not equal to the sum of isx elements.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

In general, the iterative method used in the calculations is less accurate (but faster) than the singular value decomposition approach adopted by nag_correg_pls_svd (g02la).

Further Comments

nag_correg_pls_wold (g02lb) allocates internally (n+r) elements of double storage.

Example

This example reads in data from an experiment to measure the biological activity in a chemical compound, and a PLS model is estimated.
function g02lb_example


fprintf('g02lb example results\n\n');

n = 15;             % number of observations (this example also has 15 predictor variables)
x = zeros(n,n);     % predictor data: 15 observations by mx = 15 variables
x(:,1:8) = ...
       [-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  1.9607, -1.6324;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  1.9607, -1.6324;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -2.6931, -2.5271, -1.2871, 2.8369,  1.4092, -3.1398,  0.0744, -1.7333;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.7548,  3.6521;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  2.4064,  1.7438;
        -2.6931, -2.5271, -1.2871, 0.0744, -1.7333,  0.0902,  0.0744, -1.7333;
         2.2261, -5.3648,  0.3049, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -4.1921, -1.0285, -0.9801, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -4.9217,  1.2977,  0.4473, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  2.2261, -5.3648;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.9217,  1.2977;
        -2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.1921, -1.0285];
x(:,9:n) = ...
       [ 0.5746,  1.9607, -1.6324, 0.5740,  2.8369,  1.4092, -3.1398;
         0.5746,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  1.9607, -1.6324, 0.5746,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.8524,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902, -1.2201,  0.8829,  2.2253;
         1.1057,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
         0.3049,  2.2261, -5.3648, 0.3049,  2.8369,  1.4092, -3.1398;
         0.4473,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
        -0.9801,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398];
y    = [ 0;       0.28;    0.2;    0.51;    0.11;    2.73;    0.18;    1.53;
        -0.1;    -0.52;    0.4;    0.3;    -1;       1.57;    0.59];

isx = ones(n, 1, 'int64');   % include all mx = 15 predictor variables (here mx equals n)
iscale = int64(1);           % scale data by standard deviations
xstd = zeros(n, 1);          % not referenced when iscale = 1
ystd = [0];                  % not referenced when iscale = 1
maxfac = int64(4);           % number of PLS factors to extract

% Fit a PLS model
[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = ...
g02lb( ...
       x, isx, y, iscale, xstd, ystd, maxfac);

% Display results
disp('x-loadings, P');
disp(p);
disp('x-scores,   T');
disp(t);
disp('y-loadings, C');
disp(c);
disp('y-scores,   U');
disp(u);

fprintf('Explained variance\n');
fprintf(' Model effects   Dependent variable(s)\n');
fprintf('%12.6f    %12.6f\n',[xcv(1:maxfac) ycv(1:maxfac,1)]');


g02lb example results

x-loadings, P
   -0.6708   -1.0047    0.6505    0.6169
    0.4943    0.1355   -0.9010   -0.2388
   -0.4167   -1.9983   -0.5538    0.8474
    0.3930    1.2441   -0.6967   -0.4336
    0.3267    0.5838   -1.4088   -0.6323
    0.0145    0.9607    1.6594    0.5361
   -2.4471    0.3532   -1.1321   -1.3554
    3.5198    0.6005    0.2191    0.0380
    1.0973    2.0635   -0.4074   -0.3522
   -2.4466    2.5640   -0.4806    0.3819
    2.2732   -1.3110   -0.7686   -1.8959
   -1.7987    2.4088   -0.9475   -0.4727
    0.3629    0.2241   -2.6332    2.3739
    0.3629    0.2241   -2.6332    2.3739
   -0.3629   -0.2241    2.6332   -2.3739

x-scores,   T
   -0.1896    0.3898   -0.2502   -0.2479
    0.0201   -0.0013   -0.1726   -0.2042
   -0.1889    0.3141   -0.1727   -0.1350
    0.0210   -0.0773   -0.0950   -0.0912
   -0.0090   -0.2649   -0.4195   -0.1327
    0.5479    0.2843    0.1914    0.2727
   -0.0937   -0.0579    0.6799   -0.6129
    0.2500    0.2033   -0.1046   -0.1014
   -0.1005   -0.2992    0.2131    0.1223
   -0.1810   -0.4427    0.0559    0.2114
    0.0497   -0.0762   -0.1526   -0.0771
    0.0173   -0.2517   -0.2104    0.1044
   -0.6002    0.3596    0.1876    0.4812
    0.3796    0.1338    0.1410    0.1999
    0.0773   -0.2139    0.1085    0.2106

y-loadings, C
    3.5425    1.0475    0.2548    0.1866

y-scores,   U
   -1.7670    0.1812   -0.0600   -0.0320
   -0.6724   -0.2735   -0.0662   -0.0402
   -0.9852    0.4097    0.0158    0.0198
    0.2267   -0.0107    0.0180    0.0177
   -1.3370   -0.3619   -0.0173    0.0073
    8.9056    0.6000    0.0701    0.0422
   -1.0634    0.0332    0.0235   -0.0151
    4.2143    0.3184    0.0232    0.0219
   -2.1580   -0.2652    0.0153    0.0011
   -3.7999   -0.4520    0.0082    0.0034
   -0.2033   -0.2446   -0.0392   -0.0214
   -0.5942   -0.2398    0.0089    0.0165
   -5.6764    0.5487    0.0375    0.0185
    4.3707   -0.1161   -0.0639   -0.0535
    0.5395   -0.1274    0.0261    0.0139

Explained variance
 Model effects   Dependent variable(s)
   16.902124       89.638060
   29.674338       97.476270
   44.332404       97.939839
   56.172041       98.188474


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015