# NAG Library Routine Document

## 1Purpose

g02laf fits an orthogonal scores partial least squares (PLS) regression by using singular value decomposition.

## 2Specification

Fortran Interface
 Subroutine g02laf ( n, mx, x, ldx, isx, ip, my, y, ldy, xbar, ybar, xstd, ystd, xres, yres, w, ldw, p, ldp, t, ldt, c, ldc, u, ldu, xcv, ycv,
 Integer, Intent (In) :: n, mx, ldx, isx(mx), ip, my, ldy, iscale, maxfac, ldxres, ldyres, ldw, ldp, ldt, ldc, ldu, ldycv Integer, Intent (Inout) :: ifail Real (Kind=nag_wp), Intent (In) :: x(ldx,mx), y(ldy,my) Real (Kind=nag_wp), Intent (Inout) :: xstd(ip), ystd(my), xres(ldxres,ip), yres(ldyres,my), w(ldw,maxfac), p(ldp,maxfac), t(ldt,maxfac), c(ldc,maxfac), u(ldu,maxfac), ycv(ldycv,my) Real (Kind=nag_wp), Intent (Out) :: xbar(ip), ybar(my), xcv(maxfac)
#include <nagmk26.h>
 void g02laf_ (const Integer *n, const Integer *mx, const double x[], const Integer *ldx, const Integer isx[], const Integer *ip, const Integer *my, const double y[], const Integer *ldy, double xbar[], double ybar[], const Integer *iscale, double xstd[], double ystd[], const Integer *maxfac, double xres[], const Integer *ldxres, double yres[], const Integer *ldyres, double w[], const Integer *ldw, double p[], const Integer *ldp, double t[], const Integer *ldt, double c[], const Integer *ldc, double u[], const Integer *ldu, double xcv[], double ycv[], const Integer *ldycv, Integer *ifail)

## 3Description

Let ${X}_{1}$ be the mean-centred $n$ by $m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let ${Y}_{1}$ be the mean-centred $n$ by $r$ data matrix $Y$ of $n$ observations on $r$ response variables.
The first of the $k$ factors PLS methods extract from the data predicts both ${X}_{1}$ and ${Y}_{1}$ by regressing on ${t}_{1}$ a column vector of $n$ scores:
 $X^1 = t1 p1T Y^1 = t1 c1T , with ​ t1T t1 = 1 ,$
where the column vectors of $m$ $x$-loadings ${p}_{1}$ and $r$ $y$-loadings ${c}_{1}$ are calculated in the least squares sense:
 $p1T = t1T X1 c1T = t1T Y1 .$
The $x$-score vector ${t}_{1}={X}_{1}{w}_{1}$ is the linear combination of predictor data ${X}_{1}$ that has maximum covariance with the $y$-scores ${u}_{1}={Y}_{1}{c}_{1}$, where the $x$-weights vector ${w}_{1}$ is the normalised first left singular vector of ${X}_{1}^{\mathrm{T}}{Y}_{1}$.
The method extracts subsequent PLS factors by repeating the above process with the residual matrices:
 $Xi = Xi-1 - X^ i-1 Yi = Yi-1 - Y^ i-1 , i=2,3,…,k ,$
and with orthogonal scores:
 $tiT tj = 0 , j=1,2,…,i-1 .$
Optionally, in addition to being mean-centred, the data matrices ${X}_{1}$ and ${Y}_{1}$ may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy.

None.

## 5Arguments

1:     $\mathbf{n}$ – IntegerInput
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
2:     $\mathbf{mx}$ – IntegerInput
On entry: the number of predictor variables.
Constraint: ${\mathbf{mx}}>1$.
3:     $\mathbf{x}\left({\mathbf{ldx}},{\mathbf{mx}}\right)$ – Real (Kind=nag_wp) arrayInput
On entry: ${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th predictor variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{mx}}$.
4:     $\mathbf{ldx}$ – IntegerInput
On entry: the first dimension of the array x as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{n}}$.
5:     $\mathbf{isx}\left({\mathbf{mx}}\right)$ – Integer arrayInput
On entry: indicates which predictor variables are to be included in the model.
${\mathbf{isx}}\left(j\right)=1$
The $j$th predictor variable (with variates in the $j$th column of $X$) is included in the model.
${\mathbf{isx}}\left(j\right)=0$
Otherwise.
Constraint: the sum of elements in isx must equal ip.
6:     $\mathbf{ip}$ – IntegerInput
On entry: $m$, the number of predictor variables in the model.
Constraint: $1<{\mathbf{ip}}\le {\mathbf{mx}}$.
7:     $\mathbf{my}$ – IntegerInput
On entry: $r$, the number of response variables.
Constraint: ${\mathbf{my}}\ge 1$.
8:     $\mathbf{y}\left({\mathbf{ldy}},{\mathbf{my}}\right)$ – Real (Kind=nag_wp) arrayInput
On entry: ${\mathbf{y}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th response variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
9:     $\mathbf{ldy}$ – IntegerInput
On entry: the first dimension of the array y as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldy}}\ge {\mathbf{n}}$.
10:   $\mathbf{xbar}\left({\mathbf{ip}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: mean values of predictor variables in the model.
11:   $\mathbf{ybar}\left({\mathbf{my}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the mean value of each response variable.
12:   $\mathbf{iscale}$ – IntegerInput
On entry: indicates how predictor variables are scaled.
${\mathbf{iscale}}=1$
Data are scaled by the standard deviation of variables.
${\mathbf{iscale}}=2$
Data are scaled by user-supplied scalings.
${\mathbf{iscale}}=-1$
No scaling.
Constraint: ${\mathbf{iscale}}=-1$, $1$ or $2$.
13:   $\mathbf{xstd}\left({\mathbf{ip}}\right)$ – Real (Kind=nag_wp) arrayInput/Output
On entry: if ${\mathbf{iscale}}=2$, ${\mathbf{xstd}}\left(\mathit{j}\right)$ must contain the user-supplied scaling for the $\mathit{j}$th predictor variable in the model, for $\mathit{j}=1,2,\dots ,{\mathbf{ip}}$. Otherwise xstd need not be set.
On exit: if ${\mathbf{iscale}}=1$, standard deviations of predictor variables in the model. Otherwise xstd is not changed.
14:   $\mathbf{ystd}\left({\mathbf{my}}\right)$ – Real (Kind=nag_wp) arrayInput/Output
On entry: if ${\mathbf{iscale}}=2$, ${\mathbf{ystd}}\left(\mathit{j}\right)$ must contain the user-supplied scaling for the $\mathit{j}$th response variable in the model, for $\mathit{j}=1,2,\dots ,{\mathbf{my}}$. Otherwise ystd need not be set.
On exit: if ${\mathbf{iscale}}=1$, the standard deviation of each response variable. Otherwise ystd is not changed.
15:   $\mathbf{maxfac}$ – IntegerInput
On entry: $k$, the number of latent variables to calculate.
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.
16:   $\mathbf{xres}\left({\mathbf{ldxres}},{\mathbf{ip}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the predictor variables' residual matrix ${X}_{k}$.
17:   $\mathbf{ldxres}$ – IntegerInput
On entry: the first dimension of the array xres as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldxres}}\ge {\mathbf{n}}$.
18:   $\mathbf{yres}\left({\mathbf{ldyres}},{\mathbf{my}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the residuals for each response variable, ${Y}_{k}$.
19:   $\mathbf{ldyres}$ – IntegerInput
On entry: the first dimension of the array yres as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldyres}}\ge {\mathbf{n}}$.
20:   $\mathbf{w}\left({\mathbf{ldw}},{\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the $\mathit{j}$th column of $W$ contains the $x$-weights ${w}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
21:   $\mathbf{ldw}$ – IntegerInput
On entry: the first dimension of the array w as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldw}}\ge {\mathbf{ip}}$.
22:   $\mathbf{p}\left({\mathbf{ldp}},{\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the $\mathit{j}$th column of $P$ contains the $x$-loadings ${p}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
23:   $\mathbf{ldp}$ – IntegerInput
On entry: the first dimension of the array p as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldp}}\ge {\mathbf{ip}}$.
24:   $\mathbf{t}\left({\mathbf{ldt}},{\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the $\mathit{j}$th column of $T$ contains the $x$-scores ${t}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
25:   $\mathbf{ldt}$ – IntegerInput
On entry: the first dimension of the array t as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldt}}\ge {\mathbf{n}}$.
26:   $\mathbf{c}\left({\mathbf{ldc}},{\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the $\mathit{j}$th column of $C$ contains the $y$-loadings ${c}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
27:   $\mathbf{ldc}$ – IntegerInput
On entry: the first dimension of the array c as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldc}}\ge {\mathbf{my}}$.
28:   $\mathbf{u}\left({\mathbf{ldu}},{\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the $\mathit{j}$th column of $U$ contains the $y$-scores ${u}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
29:   $\mathbf{ldu}$ – IntegerInput
On entry: the first dimension of the array u as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldu}}\ge {\mathbf{n}}$.
30:   $\mathbf{xcv}\left({\mathbf{maxfac}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: ${\mathbf{xcv}}\left(\mathit{j}\right)$ contains the cumulative percentage of variance in the predictor variables explained by the first $\mathit{j}$ factors, for $\mathit{j}=1,2,\dots ,{\mathbf{maxfac}}$.
31:   $\mathbf{ycv}\left({\mathbf{ldycv}},{\mathbf{my}}\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: ${\mathbf{ycv}}\left(\mathit{i},\mathit{j}\right)$ is the cumulative percentage of variance of the $\mathit{j}$th response variable explained by the first $\mathit{i}$ factors, for $\mathit{i}=1,2,\dots ,{\mathbf{maxfac}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{my}}$.
32:   $\mathbf{ldycv}$ – IntegerInput
On entry: the first dimension of the array ycv as declared in the (sub)program from which g02laf is called.
Constraint: ${\mathbf{ldycv}}\ge {\mathbf{maxfac}}$.
33:   $\mathbf{ifail}$ – IntegerInput/Output
On entry: ifail must be set to $0$, . If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value  is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is $0$. When the value  is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{iscale}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{iscale}}=-1$ or $1$.
On entry, ${\mathbf{isx}}\left(〈\mathit{\text{value}}〉\right)$ is invalid.
Constraint: ${\mathbf{isx}}\left(j\right)=0$ or $1$, for all $j$.
On entry, ${\mathbf{mx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{mx}}>1$.
On entry, ${\mathbf{my}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{my}}\ge 1$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>1$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$ and ${\mathbf{mx}}=〈\mathit{\text{value}}〉$.
Constraint: $1<{\mathbf{ip}}\le {\mathbf{mx}}$.
On entry, ${\mathbf{ldc}}=〈\mathit{\text{value}}〉$ and ${\mathbf{my}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldc}}\ge {\mathbf{my}}$.
On entry, ${\mathbf{ldp}}=〈\mathit{\text{value}}〉$ and ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldp}}\ge {\mathbf{ip}}$.
On entry, ${\mathbf{ldt}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldt}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{ldu}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldu}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{ldw}}=〈\mathit{\text{value}}〉$ and ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldw}}\ge {\mathbf{ip}}$.
On entry, ${\mathbf{ldx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{ldxres}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldxres}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{ldy}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldy}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{ldycv}}=〈\mathit{\text{value}}〉$ and ${\mathbf{maxfac}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldycv}}\ge {\mathbf{maxfac}}$.
On entry, ${\mathbf{ldyres}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldyres}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{maxfac}}=〈\mathit{\text{value}}〉$ and ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: $1\le {\mathbf{maxfac}}\le {\mathbf{ip}}$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$ and $\mathrm{sum}\left({\mathbf{isx}}\right)=〈\mathit{\text{value}}〉$.
Constraint: the sum of elements in isx must equal ip.
${\mathbf{ifail}}=-99$
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.

## 7Accuracy

The computed singular value decomposition is nearly the exact singular value decomposition for a nearby matrix $\left(A+E\right)$, where
 $E2 = Oε A2 ,$
and $\epsilon$ is the machine precision.

## 8Parallelism and Performance

g02laf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

g02laf allocates internally $2mr+A+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(3\left(A+B\right),5A\right)+r$ elements of real storage, where $A=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m,r\right)$ and $B=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(m,r\right)$.

## 10Example

This example reads in data from an experiment to measure the biological activity in a chemical compound, and a PLS model is estimated.

### 10.1Program Text

Program Text (g02lafe.f90)

### 10.2Program Data

Program Data (g02lafe.d)

### 10.3Program Results

Program Results (g02lafe.r)