NAG Toolbox: nag_correg_pls_svd (g02la)

Purpose

nag_correg_pls_svd (g02la) fits an orthogonal scores partial least squares (PLS) regression by using singular value decomposition.

Syntax

[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = g02la(x, isx, y, iscale, xstd, ystd, maxfac, 'n', n, 'mx', mx, 'ip', ip, 'my', my)
[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = nag_correg_pls_svd(x, isx, y, iscale, xstd, ystd, maxfac, 'n', n, 'mx', mx, 'ip', ip, 'my', my)

Description

Let ${X}_{1}$ be the mean-centred $n$ by $m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let ${Y}_{1}$ be the mean-centred $n$ by $r$ data matrix $Y$ of $n$ observations on $r$ response variables.
PLS methods extract $k$ factors from the data. The first factor predicts both ${X}_{1}$ and ${Y}_{1}$ by regressing on ${t}_{1}$, a column vector of $n$ scores:
 $\hat{X}_{1} = t_{1} p_{1}^{\mathrm{T}}, \quad \hat{Y}_{1} = t_{1} c_{1}^{\mathrm{T}}, \quad \text{with } t_{1}^{\mathrm{T}} t_{1} = 1,$
where the column vectors of $m$ $x$-loadings ${p}_{1}$ and $r$ $y$-loadings ${c}_{1}$ are calculated in the least squares sense:
 $p_{1}^{\mathrm{T}} = t_{1}^{\mathrm{T}} X_{1}, \quad c_{1}^{\mathrm{T}} = t_{1}^{\mathrm{T}} Y_{1}.$
The $x$-score vector ${t}_{1}={X}_{1}{w}_{1}$ is the linear combination of predictor data ${X}_{1}$ that has maximum covariance with the $y$-scores ${u}_{1}={Y}_{1}{c}_{1}$, where the $x$-weights vector ${w}_{1}$ is the normalised first left singular vector of ${X}_{1}^{\mathrm{T}}{Y}_{1}$.
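As an illustration of this first step, the following plain-MATLAB sketch (no NAG calls) computes one factor on small hypothetical data; the data and variable names are assumptions for illustration only. The text asks for both ${t}_{1}={X}_{1}{w}_{1}$ and ${t}_{1}^{\mathrm{T}}{t}_{1}=1$ while ${w}_{1}$ is a unit singular vector, so the sketch forms ${X}_{1}{w}_{1}$ and then rescales it to unit length; how nag_correg_pls_svd (g02la) itself reconciles the two normalisations is not stated here.

```matlab
% First PLS factor on hypothetical data (plain MATLAB, no NAG calls)
n = 10;                                   % observations
X = sin((1:n)' * (1:4));                  % n-by-4 predictors (hypothetical)
Y = [X * [1; -2; 0.5; 1], cos((1:n)')];   % n-by-2 responses (hypothetical)

X1 = X - mean(X, 1);                      % mean-centred X_1
Y1 = Y - mean(Y, 1);                      % mean-centred Y_1

[Usv, ~, ~] = svd(X1' * Y1);              % SVD of X_1' * Y_1
w1 = Usv(:, 1);                           % x-weights: first left singular vector

t1 = X1 * w1;                             % x-scores t_1 = X_1 w_1 ...
t1 = t1 / norm(t1);                       % ... rescaled so that t_1' t_1 = 1 (assumption)

p1 = X1' * t1;                            % x-loadings: p_1' = t_1' X_1
c1 = Y1' * t1;                            % y-loadings: c_1' = t_1' Y_1
u1 = Y1 * c1;                             % y-scores:   u_1 = Y_1 c_1

X1hat = t1 * p1';                         % rank-one prediction of X_1
Y1hat = t1 * c1';                         % rank-one prediction of Y_1
```

Because the loadings are least-squares coefficients, the residuals ${X}_{1}-{\hat{X}}_{1}$ and ${Y}_{1}-{\hat{Y}}_{1}$ are orthogonal to ${t}_{1}$.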
The method extracts subsequent PLS factors by repeating the above process with the residual matrices:
 $X_{i} = X_{i-1} - \hat{X}_{i-1}, \quad Y_{i} = Y_{i-1} - \hat{Y}_{i-1}, \quad i = 2,3,\dots,k,$
and with orthogonal scores:
 $t_{i}^{\mathrm{T}} t_{j} = 0, \quad j = 1,2,\dots,i-1.$
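The full extraction with deflation can be sketched the same way. This is a minimal plain-MATLAB sketch, not the NAG implementation; it assumes hypothetical data, unit-length scores, ${\hat{X}}_{i}={t}_{i}{p}_{i}^{\mathrm{T}}$ and ${\hat{Y}}_{i}={t}_{i}{c}_{i}^{\mathrm{T}}$ by analogy with the first factor, and ${u}_{i}={Y}_{i}{c}_{i}$ by analogy with ${u}_{1}$. Orthogonality of the scores comes from the deflation step: ${X}_{i+1}^{\mathrm{T}}{t}_{i}={X}_{i}^{\mathrm{T}}{t}_{i}-{p}_{i}{t}_{i}^{\mathrm{T}}{t}_{i}=0$, and this persists through later deflations, so each later score ${t}_{j}={X}_{j}{w}_{j}$ is orthogonal to ${t}_{i}$.

```matlab
% k-factor orthogonal-scores PLS by SVD with deflation (sketch, no NAG calls)
n = 10;  k = 3;                           % observations and factors (hypothetical)
X = sin((1:n)' * (1:4));
Y = [X * [1; -2; 0.5; 1], cos((1:n)')];
X1 = X - mean(X, 1);  Y1 = Y - mean(Y, 1);
m = size(X1, 2);  r = size(Y1, 2);

W = zeros(m, k);  P = zeros(m, k);  C = zeros(r, k);
T = zeros(n, k);  U = zeros(n, k);
Xi = X1;  Yi = Y1;                        % current residual matrices X_i, Y_i
for i = 1:k
    [Usv, ~, ~] = svd(Xi' * Yi);
    W(:, i) = Usv(:, 1);                  % x-weights w_i
    t = Xi * W(:, i);
    T(:, i) = t / norm(t);                % unit-length x-scores t_i (assumed)
    P(:, i) = Xi' * T(:, i);              % x-loadings p_i
    C(:, i) = Yi' * T(:, i);              % y-loadings c_i
    U(:, i) = Yi * C(:, i);               % y-scores u_i = Y_i c_i
    Xi = Xi - T(:, i) * P(:, i)';         % X_{i+1} = X_i - t_i p_i'
    Yi = Yi - T(:, i) * C(:, i)';         % Y_{i+1} = Y_i - t_i c_i'
end
```

After the loop, ${X}_{1}=T{P}^{\mathrm{T}}$ plus the final residual, and likewise for ${Y}_{1}$.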
Optionally, in addition to being mean-centred, the data matrices ${X}_{1}$ and ${Y}_{1}$ may be scaled by the standard deviations of the variables. If the data are supplied already mean-centred, the results are unaffected to within numerical accuracy.
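A minimal sketch of centring and standard-deviation scaling (the analogue of iscale = 1) on a small hypothetical matrix. It assumes standard deviations use the $n-1$ divisor (MATLAB's default for std). It also shows why pre-centred data give the same result: centring an already-centred matrix leaves it unchanged.

```matlab
% Centring and standard-deviation scaling (analogue of iscale = 1); data hypothetical
X = [1 2 10; 2 4 11; 3 5 13; 4 4 12; 5 7 15; 6 8 14];
xbar = mean(X, 1);                 % column means
xstd = std(X, 0, 1);               % column standard deviations, divisor n-1
Xs   = (X - xbar) ./ xstd;         % centred and scaled X_1

% Centring data that are already mean-centred changes nothing
Xc  = X - xbar;
Xcc = Xc - mean(Xc, 1);
```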

References

None.

Parameters

Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{mx}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th predictor variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{mx}}$.
2:     $\mathrm{isx}\left({\mathbf{mx}}\right)$ – int64, int32 or nag_int array
Indicates which predictor variables are to be included in the model.
${\mathbf{isx}}\left(j\right)=1$
The $j$th predictor variable (with variates in the $j$th column of $X$) is included in the model.
${\mathbf{isx}}\left(j\right)=0$
The $j$th predictor variable is excluded from the model.
Constraint: the sum of elements in isx must equal ip.
3:     $\mathrm{y}\left(\mathit{ldy},{\mathbf{my}}\right)$ – double array
ldy, the first dimension of the array, must satisfy the constraint $\mathit{ldy}\ge {\mathbf{n}}$.
${\mathbf{y}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th response variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
4:     $\mathrm{iscale}$ – int64, int32 or nag_int scalar
Indicates how the predictor and response variables are scaled.
${\mathbf{iscale}}=1$
Data are scaled by the standard deviations of the variables.
${\mathbf{iscale}}=2$
Data are scaled by user-supplied scalings.
${\mathbf{iscale}}=-1$
No scaling.
Constraint: ${\mathbf{iscale}}=-1$, $1$ or $2$.
5:     $\mathrm{xstd}\left({\mathbf{ip}}\right)$ – double array
If ${\mathbf{iscale}}=2$, ${\mathbf{xstd}}\left(\mathit{j}\right)$ must contain the user-supplied scaling for the $\mathit{j}$th predictor variable in the model, for $\mathit{j}=1,2,\dots ,{\mathbf{ip}}$. Otherwise xstd need not be set.
6:     $\mathrm{ystd}\left({\mathbf{my}}\right)$ – double array
If ${\mathbf{iscale}}=2$, ${\mathbf{ystd}}\left(\mathit{j}\right)$ must contain the user-supplied scaling for the $\mathit{j}$th response variable in the model, for $\mathit{j}=1,2,\dots ,{\mathbf{my}}$. Otherwise ystd need not be set.
7:     $\mathrm{maxfac}$ – int64, int32 or nag_int scalar
$k$, the number of latent variables to calculate.
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.
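For illustration, a sketch of preparing isx and a matching xstd when only some predictors enter the model and user scalings are supplied (iscale = 2). The column choice and scaling values are hypothetical. Since ip defaults to the length of xstd and must equal the sum of isx, xstd needs exactly one entry per included predictor.

```matlab
% Selecting predictors with isx and supplying user scalings (iscale = 2); values hypothetical
mx  = 6;                            % number of columns in x
isx = zeros(mx, 1, 'int64');
isx([1 3 4]) = 1;                   % include predictors 1, 3 and 4
ip  = sum(isx);                     % number of predictors in the model (here 3)
iscale = int64(2);
xstd = [0.5; 2.0; 1.5];             % one scaling per included predictor, length ip
ystd = 1.0;                         % one scaling per response (my = 1 here)
maxfac = int64(2);                  % must satisfy 1 <= maxfac <= ip
```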

Optional Input Parameters

1:     $\mathrm{n}$ – int64, int32 or nag_int scalar
Default: the first dimension of the arrays x, y. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
2:     $\mathrm{mx}$ – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of predictor variables.
Constraint: ${\mathbf{mx}}>1$.
3:     $\mathrm{ip}$ – int64, int32 or nag_int scalar
Default: the dimension of the array xstd.
$m$, the number of predictor variables in the model.
Constraint: $1<{\mathbf{ip}}\le {\mathbf{mx}}$.
4:     $\mathrm{my}$ – int64, int32 or nag_int scalar
Default: the dimension of the array ystd and the second dimension of the array y. (An error is raised if these dimensions are not equal.)
$r$, the number of response variables.
Constraint: ${\mathbf{my}}\ge 1$.

Output Parameters

1:     $\mathrm{xbar}\left({\mathbf{ip}}\right)$ – double array
Mean values of predictor variables in the model.
2:     $\mathrm{ybar}\left({\mathbf{my}}\right)$ – double array
The mean value of each response variable.
3:     $\mathrm{xstd}\left({\mathbf{ip}}\right)$ – double array
If ${\mathbf{iscale}}=1$, standard deviations of predictor variables in the model. Otherwise xstd is not changed.
4:     $\mathrm{ystd}\left({\mathbf{my}}\right)$ – double array
If ${\mathbf{iscale}}=1$, the standard deviation of each response variable. Otherwise ystd is not changed.
5:     $\mathrm{xres}\left(\mathit{ldxres},{\mathbf{ip}}\right)$ – double array
The predictor variables' residual matrix ${X}_{k}$.
6:     $\mathrm{yres}\left(\mathit{ldyres},{\mathbf{my}}\right)$ – double array
The residuals for each response variable, ${Y}_{k}$.
7:     $\mathrm{w}\left(\mathit{ldw},{\mathbf{maxfac}}\right)$ – double array
The $\mathit{j}$th column of $W$ contains the $x$-weights ${w}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
8:     $\mathrm{p}\left(\mathit{ldp},{\mathbf{maxfac}}\right)$ – double array
The $\mathit{j}$th column of $P$ contains the $x$-loadings ${p}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
9:     $\mathrm{t}\left(\mathit{ldt},{\mathbf{maxfac}}\right)$ – double array
The $\mathit{j}$th column of $T$ contains the $x$-scores ${t}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
10:   $\mathrm{c}\left(\mathit{ldc},{\mathbf{maxfac}}\right)$ – double array
The $\mathit{j}$th column of $C$ contains the $y$-loadings ${c}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
11:   $\mathrm{u}\left(\mathit{ldu},{\mathbf{maxfac}}\right)$ – double array
The $\mathit{j}$th column of $U$ contains the $y$-scores ${u}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
12:   $\mathrm{xcv}\left({\mathbf{maxfac}}\right)$ – double array
${\mathbf{xcv}}\left(\mathit{j}\right)$ contains the cumulative percentage of variance in the predictor variables explained by the first $\mathit{j}$ factors, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
13:   $\mathrm{ycv}\left(\mathit{ldycv},{\mathbf{my}}\right)$ – double array
${\mathbf{ycv}}\left(\mathit{i},\mathit{j}\right)$ is the cumulative percentage of variance of the $\mathit{j}$th response variable explained by the first $\mathit{i}$ factors, for $\mathit{i}=1,2,\dots ,{\mathbf{maxfac}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
14:   $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
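The cumulative percentages in xcv and ycv can be related to the loadings. Assuming unit-length, mutually orthogonal scores, factor $j$ explains $\|{p}_{j}\|^{2}$ of the total sum of squares of ${X}_{1}$, and the square of the $l$th element of ${c}_{j}$ of the sum of squares of column $l$ of ${Y}_{1}$. As a check against the example (iscale = 1, so the single standardised response column has sum of squares $n-1=14$): $100\times 3.5425^{2}/14\approx 89.64$, which agrees with ycv(1,1). Treat the formulas in this hypothetical-data sketch as an inferred reading of the outputs, not as the documented definition.

```matlab
% Cumulative explained variance from loadings (hypothetical data, no NAG calls)
n = 10;  k = 3;
X = sin((1:n)' * (1:4));
Y = [X * [1; -2; 0.5; 1], cos((1:n)')];
X1 = (X - mean(X, 1)) ./ std(X, 0, 1);    % standardised X_1 (iscale = 1 analogue)
Y1 = (Y - mean(Y, 1)) ./ std(Y, 0, 1);    % standardised Y_1
Xi = X1;  Yi = Y1;
P = zeros(size(X1, 2), k);  C = zeros(size(Y1, 2), k);
for i = 1:k                               % orthogonal-scores PLS factors (see Description)
    [Usv, ~, ~] = svd(Xi' * Yi);
    t = Xi * Usv(:, 1);
    t = t / norm(t);                      % unit-length score
    P(:, i) = Xi' * t;
    C(:, i) = Yi' * t;
    Xi = Xi - t * P(:, i)';
    Yi = Yi - t * C(:, i)';
end
xcv = 100 * cumsum(sum(P.^2, 1))' / sum(X1(:).^2);   % k-by-1, like xcv
ycv = 100 * cumsum((C').^2, 1) ./ sum(Y1.^2, 1);     % k-by-r, like ycv
```

Equivalently, the last entries equal $100\,(1-\text{residual sum of squares}/\text{total sum of squares})$.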

Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
Constraint: ${\mathbf{iscale}}=-1$, $1$ or $2$.
Constraint: ${\mathbf{mx}}>1$.
Constraint: ${\mathbf{my}}\ge 1$.
Constraint: ${\mathbf{n}}>1$.
On entry, an element of isx is invalid (each element must be $0$ or $1$).
${\mathbf{ifail}}=2$
Constraint: $1<{\mathbf{ip}}\le {\mathbf{mx}}$.
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.
Constraint: $\mathit{ldc}\ge {\mathbf{my}}$.
Constraint: $\mathit{ldp}\ge {\mathbf{ip}}$.
Constraint: $\mathit{ldt}\ge {\mathbf{n}}$.
Constraint: $\mathit{ldu}\ge {\mathbf{n}}$.
Constraint: $\mathit{ldw}\ge {\mathbf{ip}}$.
Constraint: $\mathit{ldxres}\ge {\mathbf{n}}$.
Constraint: $\mathit{ldx}\ge {\mathbf{n}}$.
Constraint: $\mathit{ldycv}\ge {\mathbf{maxfac}}$.
Constraint: $\mathit{ldyres}\ge {\mathbf{n}}$.
Constraint: $\mathit{ldy}\ge {\mathbf{n}}$.
${\mathbf{ifail}}=3$
On entry, ip is not equal to the sum of isx elements.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
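Most of the ifail = 1, 2 and 3 conditions on the MATLAB-level arguments can be checked before the call. The following sketch mirrors the documented constraints using the example's dimensions; the variables ok1, ok2, ok3 and ok are illustrative and not part of the NAG interface, and the leading-dimension constraints (ldc, ldp and so on) are omitted because those arguments do not appear in the MATLAB syntax.

```matlab
% Illustrative pre-call checks mirroring the documented constraints (example dimensions)
x = zeros(15, 15);  y = zeros(15, 1);
isx = ones(15, 1, 'int64');  xstd = zeros(15, 1);  ystd = 0;
iscale = int64(1);  maxfac = int64(4);

[n, mx] = size(x);
my = size(y, 2);
ip = numel(xstd);                                          % default for optional ip

ok1 = n > 1 && mx > 1 && my >= 1 && any(iscale == [-1 1 2]) ...
      && all(isx == 0 | isx == 1);                         % ifail = 1 conditions
ok2 = ip > 1 && ip <= mx && maxfac >= 1 && maxfac <= ip;   % ifail = 2 (no ld* checks)
ok3 = sum(isx) == ip;                                      % ifail = 3
ok  = ok1 && ok2 && ok3 && size(y, 1) == n && numel(isx) == mx;
```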

Accuracy

The computed singular value decomposition is nearly the exact singular value decomposition for a nearby matrix $\left(A+E\right)$, where
 $\|E\|_{2} = O(\epsilon)\,\|A\|_{2},$
and $\epsilon$ is the machine precision.
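The size of such a perturbation can be observed with MATLAB's own svd, which is not necessarily the SVD routine used inside nag_correg_pls_svd (g02la). Up to the orthogonality error of the computed factors, the residual $A-US{V}^{\mathrm{T}}$ plays the role of $E$; the sketch reports it in units of $\epsilon\|A\|_{2}$ for a deterministic, hypothetical test matrix, where a modest value is expected.

```matlab
% Reconstruction residual of MATLAB's svd in units of eps*||A||_2 (hypothetical matrix)
A = sin((1:50)' * (1:30) / 7);
[U, S, V] = svd(A);
relres = norm(A - U * S * V') / (eps * norm(A));   % ||A - U S V'||_2 / (eps ||A||_2)
orthU  = norm(U' * U - eye(size(U, 2))) / eps;     % departure of U from orthogonality
fprintf('||A - U*S*V''||_2 / (eps*||A||_2) = %.2f\n', relres);
```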

Further Comments

nag_correg_pls_svd (g02la) allocates internally $2mr+A+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(3\left(A+B\right),5A\right)+r$ elements of double storage, where $A=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m,r\right)$ and $B=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(m,r\right)$.
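As a worked instance of this formula, the example has $m=15$ predictors in the model and $r=1$ response, so $A=1$, $B=15$ and the workspace is $2\cdot 15\cdot 1+1+\mathrm{max}(48,5)+1=80$ doubles (640 bytes at 8 bytes per double):

```matlab
% Internal double workspace for the example: m = ip = 15 predictors, r = my = 1 response
m = 15;  r = 1;
A = min(m, r);  B = max(m, r);
nwork = 2*m*r + A + max(3*(A + B), 5*A) + r;    % 30 + 1 + 48 + 1
fprintf('%d doubles (%d bytes)\n', nwork, 8*nwork);
```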

Example

This example fits a four-factor PLS model to data from an experiment measuring the biological activity of a chemical compound.
```matlab
function g02la_example

fprintf('g02la example results\n\n');

n = 15;           % number of observations
x = zeros(n,n);   % here the number of predictor variables (mx) is also 15
x(:,1:8) = ...
[-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  1.9607, -1.6324;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  1.9607, -1.6324;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-2.6931, -2.5271, -1.2871, 2.8369,  1.4092, -3.1398,  0.0744, -1.7333;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.7548,  3.6521;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  2.4064,  1.7438;
-2.6931, -2.5271, -1.2871, 0.0744, -1.7333,  0.0902,  0.0744, -1.7333;
2.2261, -5.3648,  0.3049, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-4.1921, -1.0285, -0.9801, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-4.9217,  1.2977,  0.4473, 3.0777,  0.3891, -0.0701,  0.0744, -1.7333;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701,  2.2261, -5.3648;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.9217,  1.2977;
-2.6931, -2.5271, -1.2871, 3.0777,  0.3891, -0.0701, -4.1921, -1.0285];
x(:,9:n) = ...
[ 0.5746,  1.9607, -1.6324, 0.5740,  2.8369,  1.4092, -3.1398;
0.5746,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  1.9607, -1.6324, 0.5746,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.8524,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902, -1.2201,  0.8829,  2.2253;
1.1057,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.0902,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
0.3049,  2.2261, -5.3648, 0.3049,  2.8369,  1.4092, -3.1398;
0.4473,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398;
-0.9801,  0.0744, -1.7333, 0.0902,  2.8369,  1.4092, -3.1398];
y    = [ 0;       0.28;    0.2;    0.51;    0.11;    2.73;    0.18;    1.53;
-0.1;    -0.52;    0.4;    0.3;    -1;       1.57;    0.59];

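isx = ones(n, 1, 'int64');   % include all 15 predictor variables in the model
iscale = int64(1);           % scale by standard deviations
xstd = zeros(n, 1);          % need not be set when iscale = 1
ystd = [0];                  % need not be set when iscale = 1
maxfac = int64(4);           % number of factors to extract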

% Fit a PLS model
[xbar, ybar, xstd, ystd, xres, yres, w, p, t, c, u, xcv, ycv, ifail] = ...
g02la( ...
x, isx, y, iscale, xstd, ystd, maxfac);

% Display results
disp('x-loadings,   P');
disp(p);
disp('x-scores,   T');
disp(t);
disp('y-loadings,   C');
disp(c);
disp('y-scores,   U');
disp(u);

fprintf('Explained variance\n');
fprintf(' Model effects   Dependent variable(s)\n');
fprintf('%12.6f    %12.6f\n',[xcv(1:maxfac) ycv(1:maxfac,1)]');

```
```
g02la example results

x-loadings,   P
-0.6708   -1.0047    0.6505    0.6169
0.4943    0.1355   -0.9010   -0.2388
-0.4167   -1.9983   -0.5538    0.8474
0.3930    1.2441   -0.6967   -0.4336
0.3267    0.5838   -1.4088   -0.6323
0.0145    0.9607    1.6594    0.5361
-2.4471    0.3532   -1.1321   -1.3554
3.5198    0.6005    0.2191    0.0380
1.0973    2.0635   -0.4074   -0.3522
-2.4466    2.5640   -0.4806    0.3819
2.2732   -1.3110   -0.7686   -1.8959
-1.7987    2.4088   -0.9475   -0.4727
0.3629    0.2241   -2.6332    2.3739
0.3629    0.2241   -2.6332    2.3739
-0.3629   -0.2241    2.6332   -2.3739

x-scores,   T
-0.1896    0.3898   -0.2502   -0.2479
0.0201   -0.0013   -0.1726   -0.2042
-0.1889    0.3141   -0.1727   -0.1350
0.0210   -0.0773   -0.0950   -0.0912
-0.0090   -0.2649   -0.4195   -0.1327
0.5479    0.2843    0.1914    0.2727
-0.0937   -0.0579    0.6799   -0.6129
0.2500    0.2033   -0.1046   -0.1014
-0.1005   -0.2992    0.2131    0.1223
-0.1810   -0.4427    0.0559    0.2114
0.0497   -0.0762   -0.1526   -0.0771
0.0173   -0.2517   -0.2104    0.1044
-0.6002    0.3596    0.1876    0.4812
0.3796    0.1338    0.1410    0.1999
0.0773   -0.2139    0.1085    0.2106

y-loadings,   C
3.5425    1.0475    0.2548    0.1866

y-scores,   U
-1.7670    0.1812   -0.0600   -0.0320
-0.6724   -0.2735   -0.0662   -0.0402
-0.9852    0.4097    0.0158    0.0198
0.2267   -0.0107    0.0180    0.0177
-1.3370   -0.3619   -0.0173    0.0073
8.9056    0.6000    0.0701    0.0422
-1.0634    0.0332    0.0235   -0.0151
4.2143    0.3184    0.0232    0.0219
-2.1580   -0.2652    0.0153    0.0011
-3.7999   -0.4520    0.0082    0.0034
-0.2033   -0.2446   -0.0392   -0.0214
-0.5942   -0.2398    0.0089    0.0165
-5.6764    0.5487    0.0375    0.0185
4.3707   -0.1161   -0.0639   -0.0535
0.5395   -0.1274    0.0261    0.0139

Explained variance
Model effects   Dependent variable(s)
16.902124       89.638060
29.674338       97.476270
44.332404       97.939839
56.172041       98.188474
```