G02LBF (PDF version)
G02 Chapter Contents
G02 Chapter Introduction
NAG Library Manual

NAG Library Routine Document

G02LBF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

1  Purpose

G02LBF fits an orthogonal scores partial least squares (PLS) regression by using Wold's iterative method.

2  Specification

SUBROUTINE G02LBF (N, MX, X, LDX, ISX, IP, MY, Y, LDY, XBAR, YBAR, ISCALE, &
                   XSTD, YSTD, MAXFAC, MAXIT, TAU, XRES, LDXRES, YRES,     &
                   LDYRES, W, LDW, P, LDP, T, LDT, C, LDC, U, LDU, XCV,    &
                   YCV, LDYCV, IFAIL)
INTEGER             N, MX, LDX, ISX(MX), IP, MY, LDY, ISCALE, MAXFAC,      &
                    MAXIT, LDXRES, LDYRES, LDW, LDP, LDT, LDC, LDU,        &
                    LDYCV, IFAIL
REAL (KIND=nag_wp)  X(LDX,MX), Y(LDY,MY), XBAR(IP), YBAR(MY), XSTD(IP),    &
                    YSTD(MY), TAU, XRES(LDXRES,IP), YRES(LDYRES,MY),       &
                    W(LDW,MAXFAC), P(LDP,MAXFAC), T(LDT,MAXFAC),           &
                    C(LDC,MAXFAC), U(LDU,MAXFAC), XCV(MAXFAC),             &
                    YCV(LDYCV,MY)

3  Description

Let $X_1$ be the mean-centred $n$ by $m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let $Y_1$ be the mean-centred $n$ by $r$ data matrix $Y$ of $n$ observations on $r$ response variables.
The first of the $k$ factors that PLS methods extract from the data predicts both $X_1$ and $Y_1$ by regressing on a column vector $t_1$ of $n$ scores:
$$\hat{X}_1 = t_1 p_1^T, \qquad \hat{Y}_1 = t_1 c_1^T, \qquad \text{with } t_1^T t_1 = 1,$$
where the column vectors of $m$ x-loadings $p_1$ and $r$ y-loadings $c_1$ are calculated in the least squares sense:
$$p_1^T = t_1^T X_1, \qquad c_1^T = t_1^T Y_1.$$
The x-score vector $t_1 = X_1 w_1$ is the linear combination of predictor data $X_1$ that has maximum covariance with the y-scores $u_1 = Y_1 c_1$, where the x-weights vector $w_1$ is the normalised first left singular vector of $X_1^T Y_1$.
The method extracts subsequent PLS factors by repeating the above process with the residual matrices:
$$X_i = X_{i-1} - \hat{X}_{i-1}, \qquad Y_i = Y_{i-1} - \hat{Y}_{i-1}, \qquad i = 2, 3, \ldots, k,$$
and with orthogonal scores:
$$t_i^T t_j = 0, \qquad j = 1, 2, \ldots, i-1.$$
Optionally, in addition to being mean-centred, the data matrices $X_1$ and $Y_1$ may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy.
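
The factor extraction described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the G02LBF implementation: it assumes a single response variable (r = 1), so that the x-weights are simply the normalised vector X_i^T y_i and no iteration is needed, and it assumes the scores are rescaled to unit length to satisfy t^T t = 1. The function names (`pls1_factors`, `matvec`, `tmatvec`, `normalise`) and the small data set are hypothetical. The cumulative percentage of predictor variance is taken here to be the running sum of the squared loading norms divided by the squared Frobenius norm of X_1, which is also an assumption.

```python
import math

def matvec(A, v):
    """Return A v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def tmatvec(A, v):
    """Return A^T v for a matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalise(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pls1_factors(X1, y1, k):
    """Extract k PLS factors from centred X1 (n by m) and centred y1 (length n)."""
    X = [row[:] for row in X1]
    y = list(y1)
    total = sum(x * x for row in X for x in row)   # squared Frobenius norm of X1
    T, P, C, xcv = [], [], [], []
    explained = 0.0
    for _ in range(k):
        w = normalise(tmatvec(X, y))               # x-weights (closed form when r = 1)
        t = normalise(matvec(X, w))                # x-scores, rescaled so t^T t = 1
        p = tmatvec(X, t)                          # x-loadings: p^T = t^T X_i
        c = sum(a * b for a, b in zip(t, y))       # y-loading:  c = t^T y_i
        # Deflate: X_{i+1} = X_i - t p^T and y_{i+1} = y_i - t c
        X = [[X[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(t))]
        y = [y[i] - t[i] * c for i in range(len(t))]
        explained += sum(v * v for v in p)
        T.append(t); P.append(p); C.append(c)
        xcv.append(100.0 * explained / total)
    return T, P, C, xcv, X, y

# Hypothetical centred data: 4 observations, 2 predictors, 1 response.
X1 = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y1 = [-1.0, -1.0, 0.5, 1.5]
T, P, C, xcv, Xres, yres = pls1_factors(X1, y1, 2)
dot01 = sum(a * b for a, b in zip(T[0], T[1]))     # should be close to zero
```

Because each deflation projects the data onto the orthogonal complement of the current score vector, successive scores come out orthogonal without any explicit orthogonalisation step.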

4  References

Wold H (1966) Estimation of principal components and related models by iterative least-squares. In: Multivariate Analysis (ed P R Krishnaiah), 391–420. Academic Press, NY

5  Parameters

1:     N – INTEGER     Input
On entry: n, the number of observations.
Constraint: N>1.
2:     MX – INTEGER     Input
On entry: the number of predictor variables.
Constraint: MX>1.
3:     X(LDX,MX) – REAL (KIND=nag_wp) array     Input
On entry: X(i,j) must contain the ith observation on the jth predictor variable, for i=1,2,…,N and j=1,2,…,MX.
4:     LDX – INTEGER     Input
On entry: the first dimension of the array X as declared in the (sub)program from which G02LBF is called.
Constraint: LDX≥N.
5:     ISX(MX) – INTEGER array     Input
On entry: indicates which predictor variables are to be included in the model.
ISX(j)=1
The jth predictor variable (with variates in the jth column of X) is included in the model.
ISX(j)=0
Otherwise.
Constraint: the sum of elements in ISX must equal IP.
6:     IP – INTEGER     Input
On entry: m, the number of predictor variables in the model.
Constraint: 1<IP≤MX.
7:     MY – INTEGER     Input
On entry: r, the number of response variables.
Constraint: MY≥1.
8:     Y(LDY,MY) – REAL (KIND=nag_wp) array     Input
On entry: Y(i,j) must contain the ith observation for the jth response variable, for i=1,2,…,N and j=1,2,…,MY.
9:     LDY – INTEGER     Input
On entry: the first dimension of the array Y as declared in the (sub)program from which G02LBF is called.
Constraint: LDY≥N.
10:   XBAR(IP) – REAL (KIND=nag_wp) array     Output
On exit: mean values of predictor variables in the model.
11:   YBAR(MY) – REAL (KIND=nag_wp) array     Output
On exit: the mean value of each response variable.
12:   ISCALE – INTEGER     Input
On entry: indicates how predictor variables are scaled.
ISCALE=1
Data are scaled by the standard deviation of variables.
ISCALE=2
Data are scaled by user-supplied scalings.
ISCALE=-1
No scaling.
Constraint: ISCALE=-1, 1 or 2.
13:   XSTD(IP) – REAL (KIND=nag_wp) array     Input/Output
On entry: if ISCALE=2, XSTD(j) must contain the user-supplied scaling for the jth predictor variable in the model, for j=1,2,…,IP. Otherwise XSTD need not be set.
On exit: if ISCALE=1, standard deviations of predictor variables in the model. Otherwise XSTD is not changed.
14:   YSTD(MY) – REAL (KIND=nag_wp) array     Input/Output
On entry: if ISCALE=2, YSTD(j) must contain the user-supplied scaling for the jth response variable in the model, for j=1,2,…,MY. Otherwise YSTD need not be set.
On exit: if ISCALE=1, the standard deviation of each response variable. Otherwise YSTD is not changed.
15:   MAXFAC – INTEGER     Input
On entry: k, the number of latent variables to calculate.
Constraint: 1≤MAXFAC≤IP.
16:   MAXIT – INTEGER     Input
On entry: if MY=1, MAXIT is not referenced; otherwise the maximum number of iterations used to calculate the x-weights.
Suggested value: MAXIT=200.
Constraint: if MY>1, MAXIT>1.
17:   TAU – REAL (KIND=nag_wp)     Input
On entry: if MY=1, TAU is not referenced; otherwise the iterative procedure used to calculate the x-weights will halt if the Euclidean distance between two subsequent estimates is less than or equal to TAU.
Suggested value: TAU=1.0E-4.
Constraint: if MY>1, TAU>0.0.
18:   XRES(LDXRES,IP) – REAL (KIND=nag_wp) array     Output
On exit: the predictor variables' residual matrix X_k.
19:   LDXRES – INTEGER     Input
On entry: the first dimension of the array XRES as declared in the (sub)program from which G02LBF is called.
Constraint: LDXRES≥N.
20:   YRES(LDYRES,MY) – REAL (KIND=nag_wp) array     Output
On exit: the residuals for each response variable, Y_k.
21:   LDYRES – INTEGER     Input
On entry: the first dimension of the array YRES as declared in the (sub)program from which G02LBF is called.
Constraint: LDYRES≥N.
22:   W(LDW,MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: the jth column of W contains the x-weights w_j, for j=1,2,…,MAXFAC.
23:   LDW – INTEGER     Input
On entry: the first dimension of the array W as declared in the (sub)program from which G02LBF is called.
Constraint: LDW≥IP.
24:   P(LDP,MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: the jth column of P contains the x-loadings p_j, for j=1,2,…,MAXFAC.
25:   LDP – INTEGER     Input
On entry: the first dimension of the array P as declared in the (sub)program from which G02LBF is called.
Constraint: LDP≥IP.
26:   T(LDT,MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: the jth column of T contains the x-scores t_j, for j=1,2,…,MAXFAC.
27:   LDT – INTEGER     Input
On entry: the first dimension of the array T as declared in the (sub)program from which G02LBF is called.
Constraint: LDT≥N.
28:   C(LDC,MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: the jth column of C contains the y-loadings c_j, for j=1,2,…,MAXFAC.
29:   LDC – INTEGER     Input
On entry: the first dimension of the array C as declared in the (sub)program from which G02LBF is called.
Constraint: LDC≥MY.
30:   U(LDU,MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: the jth column of U contains the y-scores u_j, for j=1,2,…,MAXFAC.
31:   LDU – INTEGER     Input
On entry: the first dimension of the array U as declared in the (sub)program from which G02LBF is called.
Constraint: LDU≥N.
32:   XCV(MAXFAC) – REAL (KIND=nag_wp) array     Output
On exit: XCV(j) contains the cumulative percentage of variance in the predictor variables explained by the first j factors, for j=1,2,…,MAXFAC.
33:   YCV(LDYCV,MY) – REAL (KIND=nag_wp) array     Output
On exit: YCV(i,j) is the cumulative percentage of variance of the jth response variable explained by the first i factors, for i=1,2,…,MAXFAC and j=1,2,…,MY.
34:   LDYCV – INTEGER     Input
On entry: the first dimension of the array YCV as declared in the (sub)program from which G02LBF is called.
Constraint: LDYCV≥MAXFAC.
35:   IFAIL – INTEGER     Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).
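
The way ISX, IP, ISCALE and XSTD interact when the predictor data are prepared can be illustrated with a short Python sketch. It is not the routine's code: the function name `prepare_predictors` is hypothetical, the standard deviation is assumed to be the sample standard deviation (divisor n-1), which this document does not state, and for ISCALE=-1 the sketch returns unit scalings purely for convenience, whereas the routine leaves XSTD unchanged.

```python
import statistics

def prepare_predictors(X, isx, ip, iscale, xstd=None):
    """Select the columns flagged in isx, then centre and optionally scale them.

    Returns (Z, xbar, scale): the prepared n by ip matrix, the column means,
    and the scalings that were applied.
    """
    if any(flag not in (0, 1) for flag in isx):
        raise ValueError("every element of ISX must be 0 or 1")
    if sum(isx) != ip:
        raise ValueError("the sum of elements in ISX must equal IP")

    # Keep only the columns with ISX(j) = 1.
    cols = [[row[j] for row in X] for j, flag in enumerate(isx) if flag == 1]
    xbar = [statistics.fmean(col) for col in cols]

    if iscale == 1:
        # Assumed: sample standard deviation with divisor n-1.
        scale = [statistics.stdev(col) for col in cols]
    elif iscale == 2:
        scale = list(xstd)            # user-supplied scalings
    elif iscale == -1:
        scale = [1.0] * ip            # no scaling
    else:
        raise ValueError("ISCALE must be -1, 1 or 2")

    n = len(X)
    Z = [[(cols[j][i] - xbar[j]) / scale[j] for j in range(ip)] for i in range(n)]
    return Z, xbar, scale

# Hypothetical data: 3 observations on 3 predictors, the second one excluded.
X = [[1.0, 5.0, 10.0], [2.0, 7.0, 20.0], [3.0, 9.0, 30.0]]
Z, xbar, scale = prepare_predictors(X, isx=[1, 0, 1], ip=2, iscale=1)
```

The same pattern would apply to the response data with YBAR and YSTD, without the column selection step.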

6  Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL=1
On entry, N<2,
or MX<2,
or an element of ISX≠0 or 1,
or MY<1,
or ISCALE≠-1, 1 or 2.
IFAIL=2
On entry, LDX<N,
or IP<2 or IP>MX,
or LDY<N,
or MAXFAC<1 or MAXFAC>IP,
or MY>1 and MAXIT≤1,
or MY>1 and TAU≤0.0,
or LDXRES<N,
or LDYRES<N,
or LDW<IP,
or LDP<IP,
or LDC<MY,
or LDT<N,
or LDU<N,
or LDYCV<MAXFAC.
IFAIL=3
IP does not equal the sum of elements in ISX.
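
When wrapping the routine, it can be convenient to test the arguments before the call. The following Python sketch mirrors the conditions listed above and returns the corresponding IFAIL value. The function name `check_arguments` and the sample values are hypothetical, and the order in which the groups are tested (1, then 2, then 3) is an assumption.

```python
def check_arguments(n, mx, isx, ip, my, iscale, maxfac, maxit, tau,
                    ldx, ldy, ldxres, ldyres, ldw, ldp, ldt, ldc, ldu, ldycv):
    """Return 0 if the arguments pass the documented checks, otherwise 1, 2 or 3."""
    # IFAIL = 1: basic problem sizes and flags
    if (n < 2 or mx < 2 or any(flag not in (0, 1) for flag in isx)
            or my < 1 or iscale not in (-1, 1, 2)):
        return 1
    # IFAIL = 2: model size, iteration controls and leading dimensions
    if (ldx < n or ip < 2 or ip > mx or ldy < n
            or maxfac < 1 or maxfac > ip
            or (my > 1 and maxit <= 1)
            or (my > 1 and tau <= 0.0)
            or ldxres < n or ldyres < n
            or ldw < ip or ldp < ip or ldc < my
            or ldt < n or ldu < n or ldycv < maxfac):
        return 2
    # IFAIL = 3: ISX and IP must agree
    if sum(isx) != ip:
        return 3
    return 0

# Hypothetical, mutually consistent argument values.
valid = dict(n=10, mx=4, isx=[1, 1, 0, 1], ip=3, my=2, iscale=1,
             maxfac=2, maxit=200, tau=1.0e-4,
             ldx=10, ldy=10, ldxres=10, ldyres=10,
             ldw=3, ldp=3, ldt=10, ldc=2, ldu=10, ldycv=2)
status = check_arguments(**valid)
```

Note that MAXIT and TAU are only tested when MY>1, in line with their constraints.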

7  Accuracy

In general, the iterative method used in the calculations is less accurate (but faster) than the singular value decomposition approach adopted by G02LAF.
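
To give a feel for how an iterative calculation of the x-weights interacts with MAXIT and TAU, the following Python sketch applies a power iteration to S S^T, where S = X^T Y, and stops once two successive estimates are within TAU of each other in Euclidean distance. This is an assumed illustration of the general technique, not the routine's actual algorithm. The function names `x_weights` and `normalise`, the starting vector, and the data are hypothetical.

```python
import math

def normalise(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def x_weights(X, Y, maxit=200, tau=1.0e-4):
    """Estimate the first left singular vector of X^T Y.

    Returns (w, iterations). With a single response no iteration is needed,
    so the iteration count is 0 and maxit and tau are not used.
    """
    n, m, r = len(X), len(X[0]), len(Y[0])
    # S = X^T Y, an m by r matrix
    S = [[sum(X[i][a] * Y[i][b] for i in range(n)) for b in range(r)]
         for a in range(m)]
    w = normalise([row[0] for row in S])   # start from the first column of S
    if r == 1:
        return w, 0
    for it in range(1, maxit + 1):
        c = [sum(S[a][b] * w[a] for a in range(m)) for b in range(r)]   # S^T w
        w_new = normalise([sum(S[a][b] * c[b] for b in range(r))
                           for a in range(m)])                          # S S^T w
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(w_new, w)))
        w = w_new
        if dist <= tau:                    # stopping rule controlled by TAU
            return w, it
    return w, maxit                        # iteration limit MAXIT reached

# Hypothetical centred data: 4 observations, 2 predictors, 2 responses.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
Y = [[2.0, 1.0], [1.0, 1.0], [-2.0, -1.0], [-1.0, -1.0]]
w, iterations = x_weights(X, Y, maxit=200, tau=1.0e-10)
```

A smaller TAU yields weights closer to those from a full singular value decomposition at the cost of more iterations, which is the trade-off this section describes.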

8  Further Comments

G02LBF allocates internally (n+r) elements of real storage.

9  Example

This example reads in data from an experiment measuring the biological activity of a chemical compound and estimates a PLS model.

9.1  Program Text

Program Text (g02lbfe.f90)

9.2  Program Data

Program Data (g02lbfe.d)

9.3  Program Results

Program Results (g02lbfe.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2012