# NAG Toolbox: nag_correg_pls_fit (g02lc)

## Purpose

nag_correg_pls_fit (g02lc) calculates parameter estimates for a given number of factors given the output from an orthogonal scores PLS regression (nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb)).

## Syntax

[b, ob, vip, ifail] = g02lc(nfact, p, c, w, rcond, orig, xbar, ybar, iscale, xstd, ystd, vipopt, ycv, 'ip', ip, 'my', my, 'maxfac', maxfac)
[b, ob, vip, ifail] = nag_correg_pls_fit(nfact, p, c, w, rcond, orig, xbar, ybar, iscale, xstd, ystd, vipopt, ycv, 'ip', ip, 'my', my, 'maxfac', maxfac)

## Description

The parameter estimates $B$ for an $l$-factor orthogonal scores PLS model with $m$ predictor variables and $r$ response variables are given by
 $B=W{\left({P}^{\mathrm{T}}W\right)}^{-1}{C}^{\mathrm{T}}, \quad B\in {\mathbb{R}}^{m\times r},$
where $W$ is the $m$ by $k$ ($\ge l$) matrix of $x$-weights; $P$ is the $m$ by $k$ matrix of $x$-loadings; and $C$ is the $r$ by $k$ matrix of $y$-loadings for a fitted PLS model.
The parameter estimates $B$ are for centred, and possibly scaled, predictor data ${X}_{1}$ and response data ${Y}_{1}$. Parameter estimates may also be given for the predictor data $X$ and response data $Y$.
Optionally, nag_correg_pls_fit (g02lc) will calculate variable influence on projection (VIP) statistics, see Wold (1994).
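
The formula above can be sketched in plain MATLAB without the NAG Toolbox. This is an illustration under stated assumptions, not the routine's implementation: it assumes that only the first `nfact` columns of `p`, `c` and `w` enter the product, and it applies the `rcond` truncation described for that parameter through an explicit SVD. The data are the first two columns of `p`, `c` and `w` from the Example section; the name `b_sketch` is hypothetical.

```matlab
% First two columns of p, w and c from the Example section
% (15 predictors, 1 response).
p = [-0.6708, -1.0047;  0.4943,  0.1355; -0.4167, -1.9983;
      0.3930,  1.2441;  0.3267,  0.5838;  0.0145,  0.9607;
     -2.4471,  0.3532;  3.5198,  0.6005;  1.0973,  2.0635;
     -2.4466,  2.5640;  2.2732, -1.3110; -1.7987,  2.4088;
      0.3629,  0.2241;  0.3629,  0.2241; -0.3629, -0.2241];
w = [-1.5764e-1, -1.5935e-1;  8.5680e-2, -1.5240e-4; -1.6931e-1, -3.7431e-1;
      1.2153e-1,  2.0589e-1;  7.1133e-2,  5.5884e-2;  6.5188e-2,  2.4170e-1;
     -4.2481e-1, -1.8798e-3;  6.5370e-1,  1.6725e-1;  2.8504e-1,  3.6549e-1;
     -2.9341e-1,  5.0464e-1;  2.9829e-1, -3.6979e-1; -2.0313e-1,  4.1952e-1;
      5.6905e-2, -2.3197e-2;  5.6905e-2, -2.3197e-2; -5.6905e-2,  2.3197e-2];
c = [3.5425, 1.0475];

nfact = 2;
rcond = -1;
if rcond < 0
    rcond = 0.005;               % default stated for negative rcond
end

% Assumption: only the first nfact factors are used.
A = p(:, 1:nfact)' * w(:, 1:nfact);          % P'W, nfact by nfact

% Truncated-SVD inverse: singular values below rcond*max are dropped.
[U, S, V] = svd(A);
s = diag(S);
keep = s >= rcond * max(s);
Ainv = V(:, keep) * diag(1 ./ s(keep)) * U(:, keep)';

% B = W (P'W)^{-1} C', here 15 by 1.
b_sketch = w(:, 1:nfact) * Ainv * c(:, 1:nfact)';
disp(b_sketch);
```

With this data, `b_sketch(1)` and `b_sketch(8)` come out close to the values -0.1383 and 0.4713 printed under "Parameter estimates" in the Example section, which suggests the assumption about the leading `nfact` columns is reasonable.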

## References

Wold S (1994) PLS for multivariate linear modelling. QSAR: Chemometric Methods in Molecular Design. Methods and Principles in Medicinal Chemistry (ed van de Waterbeemd H). Verlag-Chemie

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{nfact}$ – int64/int32/nag_int scalar
$l$, the number of factors to include in the calculation of parameter estimates.
Constraint: $1\le {\mathbf{nfact}}\le {\mathbf{maxfac}}$.
2:     $\mathrm{p}\left(\mathit{ldp},{\mathbf{maxfac}}\right)$ – double array
ldp, the first dimension of the array, must satisfy the constraint $\mathit{ldp}\ge {\mathbf{ip}}$.
$x$-loadings as returned from nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb).
3:     $\mathrm{c}\left(\mathit{ldc},{\mathbf{maxfac}}\right)$ – double array
ldc, the first dimension of the array, must satisfy the constraint $\mathit{ldc}\ge {\mathbf{my}}$.
$y$-loadings as returned from nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb).
4:     $\mathrm{w}\left(\mathit{ldw},{\mathbf{maxfac}}\right)$ – double array
ldw, the first dimension of the array, must satisfy the constraint $\mathit{ldw}\ge {\mathbf{ip}}$.
$x$-weights as returned from nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb).
5:     $\mathrm{rcond}$ – double scalar
Singular values of ${P}^{\mathrm{T}}W$ less than rcond times the maximum singular value are treated as zero when calculating parameter estimates. If rcond is negative, a value of $0.005$ is used.
6:     $\mathrm{orig}$ – int64/int32/nag_int scalar
Indicates how parameter estimates are calculated.
${\mathbf{orig}}=-1$
Parameter estimates for the centred, and possibly scaled, data.
${\mathbf{orig}}=1$
Parameter estimates for the original data.
Constraint: ${\mathbf{orig}}=-1$ or $1$.
7:     $\mathrm{xbar}\left({\mathbf{ip}}\right)$ – double array
If ${\mathbf{orig}}=1$, mean values of predictor variables in the model; otherwise xbar is not referenced.
8:     $\mathrm{ybar}\left({\mathbf{my}}\right)$ – double array
If ${\mathbf{orig}}=1$, mean value of each response variable in the model; otherwise ybar is not referenced.
9:     $\mathrm{iscale}$ – int64/int32/nag_int scalar
If ${\mathbf{orig}}=1$, iscale must take the value supplied to either nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb); otherwise iscale is not referenced.
Constraint: if ${\mathbf{orig}}=1$, ${\mathbf{iscale}}=-1$, $1$ or $2$.
10:   $\mathrm{xstd}\left({\mathbf{ip}}\right)$ – double array
If ${\mathbf{orig}}=1$ and ${\mathbf{iscale}}\ne -1$, the scalings of predictor variables in the model as returned from either nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb); otherwise xstd is not referenced.
11:   $\mathrm{ystd}\left({\mathbf{my}}\right)$ – double array
If ${\mathbf{orig}}=1$ and ${\mathbf{iscale}}\ne -1$, the scalings of response variables as returned from either nag_correg_pls_svd (g02la) or nag_correg_pls_wold (g02lb); otherwise ystd is not referenced.
12:   $\mathrm{vipopt}$ – int64/int32/nag_int scalar
A flag that determines variable influence on projections (VIP) options.
${\mathbf{vipopt}}=0$
VIP are not calculated.
${\mathbf{vipopt}}=1$
VIP are calculated for predictor variables using the mean explained variance in responses.
${\mathbf{vipopt}}={\mathbf{my}}$
VIP are calculated for predictor variables for each response variable in the model.
Note that setting ${\mathbf{vipopt}}={\mathbf{my}}$ when ${\mathbf{my}}=1$ gives the same result as setting ${\mathbf{vipopt}}=1$ directly.
Constraint: ${\mathbf{vipopt}}=0$, $1$ or ${\mathbf{my}}$.
13:   $\mathrm{ycv}\left(\mathit{ldycv},{\mathbf{my}}\right)$ – double array
ldycv, the first dimension of the array, must satisfy the constraint: if ${\mathbf{vipopt}}\ne 0$, $\mathit{ldycv}\ge {\mathbf{nfact}}$.
If ${\mathbf{vipopt}}\ne 0$, ${\mathbf{ycv}}\left(\mathit{i},\mathit{j}\right)$ is the cumulative percentage of variance of the $\mathit{j}$th response variable explained by the first $\mathit{i}$ factors, for $\mathit{i}=1,2,\dots ,{\mathbf{nfact}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$; otherwise ycv is not referenced.
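
How `ycv` and the $x$-weights combine into VIP statistics can be sketched in plain MATLAB. The expression used here is one common definition of VIP: the squared, column-normalised weights are averaged with the variance explained by each factor as weighting, then scaled by the number of predictors. That the routine computes VIP in exactly this way is an assumption, and `vip_sketch` and the other variable names are hypothetical. The data are the first two columns of `w` and the first two rows of `ycv` from the Example section, with ${\mathbf{vipopt}}=1$.

```matlab
% First two columns of w and first two rows of ycv from the Example section.
w = [-1.5764e-1, -1.5935e-1;  8.5680e-2, -1.5240e-4; -1.6931e-1, -3.7431e-1;
      1.2153e-1,  2.0589e-1;  7.1133e-2,  5.5884e-2;  6.5188e-2,  2.4170e-1;
     -4.2481e-1, -1.8798e-3;  6.5370e-1,  1.6725e-1;  2.8504e-1,  3.6549e-1;
     -2.9341e-1,  5.0464e-1;  2.9829e-1, -3.6979e-1; -2.0313e-1,  4.1952e-1;
      5.6905e-2, -2.3197e-2;  5.6905e-2, -2.3197e-2; -5.6905e-2,  2.3197e-2];
ycv = [89.638060; 97.476270];
nfact = 2;
m = size(w, 1);                          % number of predictors (ip)

% vipopt = 1: average the cumulative percentages over the responses.
ycum = mean(ycv(1:nfact, :), 2);

% Variance explained by each factor on its own.
gain = diff([0; ycum]);

% Squared weights, each column normalised to sum to one.
w2 = w(:, 1:nfact).^2;
w2 = w2 * diag(1 ./ sum(w2, 1));

% Assumed VIP definition (see the lead-in).
vip_sketch = sqrt(m * (w2 * gain) / ycum(nfact));
disp(vip_sketch);
```

For predictors 1, 7 and 8 this gives values close to 0.6111, 1.5777 and 2.4348, the figures printed under "VIP statistics" in the Example section. For ${\mathbf{vipopt}}={\mathbf{my}}$ the same calculation would be applied to each column of `ycv` separately rather than to the row means.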

### Optional Input Parameters

1:     $\mathrm{ip}$ – int64/int32/nag_int scalar
Default: the dimension of the arrays xbar, xstd and the first dimension of the arrays p, w. (An error is raised if these dimensions are not equal.)
$m$, the number of predictor variables in the fitted model.
Constraint: ${\mathbf{ip}}>1$.
2:     $\mathrm{my}$ – int64/int32/nag_int scalar
Default: the dimension of the arrays ybar, ystd and the first dimension of the array c and the second dimension of the array ycv. (An error is raised if these dimensions are not equal.)
$r$, the number of response variables.
Constraint: ${\mathbf{my}}\ge 1$.
3:     $\mathrm{maxfac}$ – int64/int32/nag_int scalar
Default: the second dimension of the arrays p, c, w. (An error is raised if these dimensions are not equal.)
$k$, the number of factors available in the PLS model.
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.

### Output Parameters

1:     $\mathrm{b}\left(\mathit{ldb},{\mathbf{my}}\right)$ – double array
${\mathbf{b}}\left(\mathit{i},\mathit{j}\right)$ contains the parameter estimate for the $\mathit{i}$th predictor variable in the model for the $\mathit{j}$th response variable, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
2:     $\mathrm{ob}\left(\mathit{ldob},{\mathbf{my}}\right)$ – double array
If ${\mathbf{orig}}=1$, ${\mathbf{ob}}\left(1,\mathit{j}\right)$ contains the intercept value for the $\mathit{j}$th response variable, and ${\mathbf{ob}}\left(\mathit{i}+1,\mathit{j}\right)$ contains the parameter estimate on the original scale for the $\mathit{i}$th predictor variable in the model, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$. Otherwise ob is not referenced.
3:     $\mathrm{vip}\left(\mathit{ldvip},{\mathbf{vipopt}}\right)$ – double array
If ${\mathbf{vipopt}}=1$, ${\mathbf{vip}}\left(\mathit{i},1\right)$ contains the VIP statistic for the $\mathit{i}$th predictor variable in the model for all response variables, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$.
If ${\mathbf{vipopt}}={\mathbf{my}}$, ${\mathbf{vip}}\left(\mathit{i},\mathit{j}\right)$ contains the VIP statistic for the $\mathit{i}$th predictor variable in the model for the $\mathit{j}$th response variable, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
Otherwise vip is not referenced.
4:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
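
The relationship between `b` and `ob` when ${\mathbf{orig}}=1$ can be sketched in plain MATLAB. The sketch assumes the usual back-transformation for centred and scaled data: each estimate is multiplied by the response scaling and divided by the predictor scaling, and the intercept is the response mean minus the predictor means times those rescaled estimates. Whether the routine proceeds exactly like this is an assumption, and the names `slopes`, `intercept` and `ob_sketch` are hypothetical. The inputs are copied from the Example section, where `b` is only printed to four decimals.

```matlab
% Values copied from the Example section (nfact = 2, one response).
b = [-0.1383;  0.0572; -0.1906;  0.1238;  0.0591;
      0.0936; -0.2842;  0.4713;  0.2661; -0.0914;
      0.1226; -0.0488;  0.0332;  0.0332; -0.0332];
xbar = [-2.6137; -2.3614; -1.0449;  2.8614;  0.3156;
        -0.2641; -0.3146; -1.1221;  0.2401;  0.4694;
        -1.9619;  0.1691;  2.5664;  1.3741; -2.7821];
xstd = [1.4956;  1.3233;  0.5829;  0.7735;  0.6247;
        0.7966;  2.4113;  2.0421;  0.4678;  0.8197;
        0.9420;  0.1735;  1.0475;  0.1359;  1.3853];
ybar = 0.452;
ystd = 0.9062;

% Assumed back-transformation: undo the scaling of predictors and responses.
slopes = diag(1 ./ xstd) * b * diag(ystd);      % ip by my

% Intercept: response mean minus the contribution of the predictor means.
intercept = ybar(:)' - xbar(:)' * slopes;       % 1 by my

% Same layout as ob: intercept in row 1, estimates in rows 2 to ip+1.
ob_sketch = [intercept; slopes];
disp(ob_sketch);
```

Because the inputs are rounded, the result agrees with the `ob` values printed in the Example section only to about three decimal places, for instance roughly -0.437 for the intercept and -0.084 for the first predictor.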

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
Constraint: if ${\mathbf{orig}}=1$, ${\mathbf{iscale}}=-1$, $1$ or $2$.
Constraint: ${\mathbf{ip}}>1$.
Constraint: ${\mathbf{my}}\ge 1$.
Constraint: ${\mathbf{orig}}=-1$ or $1$.
Constraint: ${\mathbf{vipopt}}=0$, $1$ or ${\mathbf{my}}$.
${\mathbf{ifail}}=2$
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.
Constraint: $1\le {\mathbf{nfact}}\le {\mathbf{maxfac}}$.
Constraint: if ${\mathbf{orig}}=1$, $\mathit{ldob}\ge {\mathbf{ip}}+1$.
Constraint: if ${\mathbf{vipopt}}\ne 0$, $\mathit{ldvip}\ge {\mathbf{ip}}$.
Constraint: if ${\mathbf{vipopt}}\ne 0$, $\mathit{ldycv}\ge {\mathbf{nfact}}$.
Constraint: $\mathit{ldb}\ge {\mathbf{ip}}$.
Constraint: $\mathit{ldc}\ge {\mathbf{my}}$.
Constraint: $\mathit{ldp}\ge {\mathbf{ip}}$.
Constraint: $\mathit{ldw}\ge {\mathbf{ip}}$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
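
The constraints reported under ${\mathbf{ifail}}=1$ and ${\mathbf{ifail}}=2$ can be checked before the call. The following is a minimal sketch in plain MATLAB, not the routine's own validation. The names `ok1` and `ok2` and the error identifier are hypothetical, the sizes and options are those of the Example section, and the permitted `iscale` values are taken from the `iscale` parameter description.

```matlab
% Sizes and options as in the Example section.
ip = 15; my = 1; maxfac = 4; nfact = 2;
orig = 1; iscale = 1; vipopt = 1;

% Conditions grouped under ifail = 1 (option flags and problem sizes).
ok1 = ip > 1 && my >= 1 ...
      && any(orig == [-1, 1]) ...
      && any(vipopt == [0, 1, my]) ...
      && (orig ~= 1 || any(iscale == [-1, 1, 2]));

% Conditions grouped under ifail = 2 that do not involve leading dimensions.
ok2 = maxfac >= 1 && maxfac <= ip && nfact >= 1 && nfact <= maxfac;

if ~(ok1 && ok2)
    error('g02lc_sketch:badInput', ...
          'Input constraints for g02lc are not satisfied.');
end
```

The leading-dimension constraints are left out because the optional parameters `ip`, `my` and `maxfac` default to the array dimensions.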

## Accuracy

The calculations are based on the singular value decomposition of ${P}^{\mathrm{T}}W$.

## Further Comments

nag_correg_pls_fit (g02lc) allocates internally $l\left(l+r+4\right)+\mathrm{max}\left(2l,r\right)$ elements of double storage.
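
As a worked instance of the storage expression, the snippet below evaluates it for the sizes used in the Example section ($l=2$ factors, $r=1$ response). The byte count assumes 8-byte doubles, and the variable names are illustrative.

```matlab
l = 2;   % nfact in the Example section
r = 1;   % my in the Example section

% l(l + r + 4) + max(2l, r) elements of double storage.
n_doubles = l*(l + r + 4) + max(2*l, r);
n_bytes   = 8 * n_doubles;               % assuming 8-byte doubles

fprintf('%d doubles (%d bytes)\n', n_doubles, n_bytes);
% prints: 18 doubles (144 bytes)
```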

## Example

This example reads in details of a PLS model, and a set of parameter estimates are calculated along with their VIP statistics.
```
function g02lc_example

fprintf('g02lc example results\n\n');

nfact = int64(2);
p = [-0.6708, -1.0047,  0.6505,  0.6169;
0.4943,  0.1355, -0.9010, -0.2388;
-0.4167, -1.9983, -0.5538,  0.8474;
0.3930,  1.2441, -0.6967, -0.4336;
0.3267,  0.5838, -1.4088, -0.6323;
0.0145,  0.9607,  1.6594,  0.5361;
-2.4471,  0.3532, -1.1321, -1.3554;
3.5198,  0.6005,  0.2191,  0.0380;
1.0973,  2.0635, -0.4074, -0.3522;
-2.4466,  2.5640, -0.4806,  0.3819;
2.2732, -1.3110, -0.7686, -1.8959;
-1.7987,  2.4088, -0.9475, -0.4727;
0.3629,  0.2241, -2.6332,  2.3739;
0.3629,  0.2241, -2.6332,  2.3739;
-0.3629, -0.2241,  2.6332, -2.3739];
c = [ 3.5425,  1.0475,  0.2548,  0.1866];
w = [-1.5764e-1  -1.5935e-1   1.7774e-1   5.4029e-2;
8.5680e-2  -1.5240e-4  -1.2179e-1   1.0989e-1;
-1.6931e-1  -3.7431e-1   9.4348e-2   3.1878e-1;
1.2153e-1   2.0589e-1  -1.8144e-1  -4.4610e-2;
7.1133e-2   5.5884e-2  -2.6916e-1   5.4912e-2;
6.5188e-2   2.4170e-1   2.3365e-1  -1.8849e-1;
-4.2481e-1  -1.8798e-3  -3.2413e-1  -1.1600e-1;
6.5370e-1   1.6725e-1   2.1908e-1   2.5461e-1;
2.8504e-1   3.6549e-1  -1.9244e-1  -1.5430e-1;
-2.9341e-1   5.0464e-1  -1.0952e-2   1.3881e-1;
2.9829e-1  -3.6979e-1  -4.9942e-1  -4.9355e-1;
-2.0313e-1   4.1952e-1  -2.5684e-1  -7.5647e-2;
5.6905e-2  -2.3197e-2  -3.0503e-1   3.9673e-1;
5.6905e-2  -2.3197e-2  -3.0503e-1   3.9673e-1;
-5.6905e-2   2.3197e-2   3.0503e-1  -3.9673e-1];
vipopt = int64(1);
ycv  = [89.638060; 97.476270; 97.939839; 98.188474];

% Means and scalings
orig = int64(1);
xbar = [-2.6137; -2.3614; -1.0449;  2.8614;  0.3156;
-0.2641; -0.3146; -1.1221;  0.2401;  0.4694;
-1.9619;  0.1691;  2.5664;  1.3741; -2.7821];
ybar = [0.452];
iscale = int64(1);
xstd = [1.4956;   1.3233;  0.5829;  0.7735;  0.6247;
0.7966;   2.4113;  2.0421;  0.4678;  0.8197;
0.9420;   0.1735;  1.0475;  0.1359;  1.3853];
ystd = [0.9062];

% Calculate parameter estimates (rcond < 0 selects the default tolerance of 0.005)
rcond = -1;
[b, ob, vip, ifail] = ...
  g02lc( ...
    nfact, p, c, w, rcond, orig, xbar, ybar, ...
    iscale, xstd, ystd, vipopt, ycv);

% Display results
disp('Parameter estimates');
disp(b);
disp('Intercept values');
disp(ob);
disp('VIP statistics');
disp(vip);

```
```
g02lc example results

Parameter estimates
-0.1383
0.0572
-0.1906
0.1238
0.0591
0.0936
-0.2842
0.4713
0.2661
-0.0914
0.1226
-0.0488
0.0332
0.0332
-0.0332

Intercept values
-0.4374
-0.0838
0.0392
-0.2964
0.1451
0.0857
0.1065
-0.1068
0.2091
0.5155
-0.1011
0.1180
-0.2548
0.0287
0.2214
-0.0217

VIP statistics
0.6111
0.3182
0.7513
0.5048
0.2712
0.3593
1.5777
2.4348
1.1322
1.2226
1.1799
0.8840
0.2129
0.2129
0.2129

```