NAG C Library Function Document

nag_pls_orth_scores_wold (g02lbc)

1 Purpose

nag_pls_orth_scores_wold (g02lbc) fits an orthogonal scores partial least squares (PLS) regression by using Wold's iterative method.

2 Specification

#include <nag.h>
#include <nagg02.h>
void nag_pls_orth_scores_wold (Nag_OrderType order, Integer n, Integer mx,
     const double x[], Integer pdx, const Integer isx[], Integer ip,
     Integer my, const double y[], Integer pdy, double xbar[],
     double ybar[], Nag_ScalePredictor iscale, double xstd[],
     double ystd[], Integer maxfac, Integer maxit, double tau,
     double xres[], Integer pdxres, double yres[], Integer pdyres,
     double w[], Integer pdw, double p[], Integer pdp, double t[],
     Integer pdt, double c[], Integer pdc, double u[], Integer pdu,
     double xcv[], double ycv[], Integer pdycv, NagError *fail)

3 Description

Let $X_1$ be the mean-centred n by m data matrix X of n observations on m predictor variables. Let $Y_1$ be the mean-centred n by r data matrix Y of n observations on r response variables.
The first of the k factors that PLS methods extract from the data predicts both $X_1$ and $Y_1$ by regressing on a column vector $t_1$ of n scores:
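$$\hat{X}_1 = t_1 p_1^{\mathrm{T}}, \qquad \hat{Y}_1 = t_1 c_1^{\mathrm{T}}, \qquad \text{with } t_1^{\mathrm{T}} t_1 = 1,$$
where the column vectors of m x-loadings $p_1$ and r y-loadings $c_1$ are calculated in the least squares sense:
$$p_1^{\mathrm{T}} = t_1^{\mathrm{T}} X_1, \qquad c_1^{\mathrm{T}} = t_1^{\mathrm{T}} Y_1.$$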
The x-score vector $t_1 = X_1 w_1$ is the linear combination of the predictor data $X_1$ that has maximum covariance with the y-scores $u_1 = Y_1 c_1$, where the x-weights vector $w_1$ is the normalised first left singular vector of $X_1^{\mathrm{T}} Y_1$.
The method extracts subsequent PLS factors by repeating the above process with the residual matrices:
$$X_i = X_{i-1} - \hat{X}_{i-1}, \qquad Y_i = Y_{i-1} - \hat{Y}_{i-1}, \qquad i = 2, 3, \ldots, k,$$
and with orthogonal scores:
$$t_i^{\mathrm{T}} t_j = 0, \qquad j = 1, 2, \ldots, i-1.$$
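The orthogonality of the scores follows from the deflation step. If $t_i^{\mathrm{T}} t_i = 1$ and $p_i^{\mathrm{T}} = t_i^{\mathrm{T}} X_i$, then $t_i^{\mathrm{T}} X_{i+1} = t_i^{\mathrm{T}} X_i - (t_i^{\mathrm{T}} t_i) p_i^{\mathrm{T}} = 0$, so any later score $t_{i+1} = X_{i+1} w_{i+1}$ is orthogonal to $t_i$, and by induction to all earlier scores. A minimal C sketch of one deflation step follows; the name pls_deflate is hypothetical and dense column-major storage with leading dimension n is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* One deflation step, a <- a - t q^T, for a column-major n-by-m matrix a
   (element (i, j), 0-based, at a[j * n + i]) and a unit score vector t
   (t^T t = 1). The loadings q^T = t^T a are computed from the current
   matrix and returned in q (length m). The same routine serves for X_i
   (q = p_i) and for Y_i (q = c_i). */
void pls_deflate(size_t n, size_t m, double *a, const double *t, double *q)
{
    assert(n > 0 && m > 0);
    for (size_t j = 0; j < m; ++j) {
        double *col = a + j * n;
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += t[i] * col[i];      /* q_j = t^T a(:, j) */
        q[j] = s;
        for (size_t i = 0; i < n; ++i)
            col[i] -= t[i] * s;      /* a(:, j) -= t q_j */
    }
}
```

After the call, every column of the deflated matrix is orthogonal to t, which is exactly the property the argument above relies on.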
Optionally, in addition to being mean-centred, the data matrices $X_1$ and $Y_1$ may be scaled by the standard deviations of the variables. If the data are supplied already mean-centred, the results are unaffected to within numerical accuracy.

4 References

Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis (ed P R Krishnaiah), 391–420. Academic Press, New York.

5 Arguments

1:     order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.3.1.3 in How to Use the NAG Library and its Documentation for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
2:     n – Integer Input
On entry: n, the number of observations.
Constraint: n>1.
3:     mx – Integer Input
On entry: the number of predictor variables.
Constraint: mx>1.
4:     x[dim] – const double Input
Note: the dimension, dim, of the array x must be at least
  • max(1, pdx×mx) when order=Nag_ColMajor;
  • max(1, n×pdx) when order=Nag_RowMajor.
Where $X_{i,j}$ appears in this document, it refers to the array element
  • x[(j-1)×pdx+i-1] when order=Nag_ColMajor;
  • x[(i-1)×pdx+j-1] when order=Nag_RowMajor.
On entry: $X_{i,j}$ must contain the ith observation on the jth predictor variable, for i=1,2,…,n and j=1,2,…,mx.
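The dimension and index formulas above can be written as two small C helpers. This is a sketch: the enum storage_order and its values are hypothetical stand-ins for Nag_OrderType, and the helpers take the 1-based i and j used in the text and return a 0-based C offset or length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's Nag_OrderType values. */
enum storage_order { ROW_MAJOR, COL_MAJOR };

/* 0-based offset of element (i, j), with 1-based i and j as in the text:
   (j-1)*pd + i-1 for column-major, (i-1)*pd + j-1 for row-major. */
size_t elem_offset(enum storage_order order, size_t pd, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1);
    return order == COL_MAJOR ? (j - 1) * pd + (i - 1)
                              : (i - 1) * pd + (j - 1);
}

/* Minimum array length for a rows-by-cols matrix stored with stride pd:
   max(1, pd*cols) for column-major, max(1, rows*pd) for row-major. */
size_t min_length(enum storage_order order, size_t rows, size_t cols,
                  size_t pd)
{
    size_t len = order == COL_MAJOR ? pd * cols : rows * pd;
    return len > 1 ? len : 1;
}
```

For example, with order=Nag_ColMajor and pdx=5, $X_{2,3}$ is x[11]; with order=Nag_RowMajor and pdx=4, it is x[6].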
5:     pdx – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
  • if order=Nag_ColMajor, pdx ≥ n;
  • if order=Nag_RowMajor, pdx ≥ mx.
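A stride larger than the minimum lets the data sit inside a bigger, padded array. The sketch below checks the pdx constraints and copies a compact column-major matrix into a padded one. The names stride_ok, pad_colmajor and storage_order are hypothetical, and zeroing the padding rows is only a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's Nag_OrderType values. */
enum storage_order { ROW_MAJOR, COL_MAJOR };

/* 1 if pd is a valid stride for an n-by-mx matrix (pd >= n for
   column-major, pd >= mx for row-major), 0 otherwise. */
int stride_ok(enum storage_order order, size_t n, size_t mx, size_t pd)
{
    return order == COL_MAJOR ? pd >= n : pd >= mx;
}

/* Copy a compact column-major n-by-m matrix src (stride n) into dst with
   column stride pd >= n; rows n..pd-1 of each column are padding and are
   set to zero here. dst must hold at least pd*m elements. */
void pad_colmajor(size_t n, size_t m, const double *src, double *dst,
                  size_t pd)
{
    assert(pd >= n);
    for (size_t j = 0; j < m; ++j)
        for (size_t i = 0; i < pd; ++i)
            dst[j * pd + i] = i < n ? src[j * n + i] : 0.0;
}
```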
6:     isx[mx] – const Integer Input
On entry: indicates which predictor variables are to be included in the model.
isx[j-1]=1
The jth predictor variable (with variates in the jth column of X) is included in the model.
isx[j-1]=0
The jth predictor variable is excluded from the model.
Constraint: the sum of elements in isx must equal ip.
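A hypothetical helper that validates isx and copies the selected columns of a column-major X into a compact array is sketched below; its return value is the count to pass as ip. That the library works on the selected columns in their original order is an assumption (consistent with xbar, xstd and xres having ip columns), and int is used in place of the library's Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the columns j of a column-major n-by-mx matrix x with isx[j] == 1
   into xsel (n-by-ip, column-major, stride n), keeping their order.
   Returns ip, the number of selected columns, or -1 (leaving xsel
   untouched) if any isx[j] is not 0 or 1. */
int select_predictors(size_t n, size_t mx, const double *x, const int *isx,
                      double *xsel)
{
    assert(n > 0);
    for (size_t j = 0; j < mx; ++j)
        if (isx[j] != 0 && isx[j] != 1)
            return -1;

    int ip = 0;
    for (size_t j = 0; j < mx; ++j) {
        if (isx[j] == 1) {
            for (size_t i = 0; i < n; ++i)
                xsel[(size_t)ip * n + i] = x[j * n + i];
            ++ip;
        }
    }
    return ip;
}
```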
7:     ip – Integer Input
On entry: m, the number of predictor variables in the model.
Constraint: 1 < ip ≤ mx.
8:     my – Integer Input
On entry: r, the number of response variables.
Constraint: my ≥ 1.
9:     y[dim] – const double Input
Note: the dimension, dim, of the array y must be at least
  • max(1, pdy×my) when order=Nag_ColMajor;
  • max(1, n×pdy) when order=Nag_RowMajor.
Where $Y_{i,j}$ appears in this document, it refers to the array element
  • y[(j-1)×pdy+i-1] when order=Nag_ColMajor;
  • y[(i-1)×pdy+j-1] when order=Nag_RowMajor.
On entry: $Y_{i,j}$ must contain the ith observation for the jth response variable, for i=1,2,…,n and j=1,2,…,my.
10:   pdy – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array y.
Constraints:
  • if order=Nag_ColMajor, pdy ≥ n;
  • if order=Nag_RowMajor, pdy ≥ my.
11:   xbar[ip] – double Output
On exit: the mean value of each predictor variable in the model.
12:   ybar[my] – double Output
On exit: the mean value of each response variable.
13:   iscale – Nag_ScalePredictor Input
On entry: indicates how predictor variables are scaled.
iscale=Nag_PredStdScale
Data are scaled by the standard deviation of the variables.
iscale=Nag_PredUserScale
Data are scaled by user-supplied scalings.
iscale=Nag_PredNoScale
No scaling.
Constraint: iscale=Nag_PredNoScale, Nag_PredStdScale or Nag_PredUserScale.
14:   xstd[ip] – double Input/Output
On entry: if iscale=Nag_PredUserScale, xstd[j-1] must contain the user-supplied scaling for the jth predictor variable in the model, for j=1,2,…,ip. Otherwise xstd need not be set.
On exit: if iscale=Nag_PredStdScale, the standard deviation of each predictor variable in the model. Otherwise xstd is not changed.
15:   ystd[my] – double Input/Output
On entry: if iscale=Nag_PredUserScale, ystd[j-1] must contain the user-supplied scaling for the jth response variable in the model, for j=1,2,…,my. Otherwise ystd need not be set.
On exit: if iscale=Nag_PredStdScale, the standard deviation of each response variable. Otherwise ystd is not changed.
16:   maxfac – Integer Input
On entry: k, the number of latent variables to calculate.
Constraint: 1 ≤ maxfac ≤ ip.
17:   maxit – Integer Input
On entry: if my=1, maxit is not referenced; otherwise the maximum number of iterations used to calculate the x-weights.
Suggested value: maxit=200.
Constraint: if my>1, maxit>1.
18:   tau – double Input
On entry: if my=1, tau is not referenced; otherwise the iterative procedure used to calculate the x-weights halts when the Euclidean distance between two successive estimates is less than or equal to tau.
Suggested value: tau=1.0e−4.
Constraint: if my>1, tau>0.0.
19:   xres[dim] – double Output
Note: the dimension, dim, of the array xres must be at least
  • max(1, pdxres×ip) when order=Nag_ColMajor;
  • max(1, n×pdxres) when order=Nag_RowMajor.
The (i,j)th element of the matrix is stored in
  • xres[(j-1)×pdxres+i-1] when order=Nag_ColMajor;
  • xres[(i-1)×pdxres+j-1] when order=Nag_RowMajor.
On exit: the predictor variables' residual matrix $X_k$.
20:   pdxres – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array xres.
Constraints:
  • if order=Nag_ColMajor, pdxres ≥ n;
  • if order=Nag_RowMajor, pdxres ≥ ip.
21:   yres[dim] – double Output
Note: the dimension, dim, of the array yres must be at least
  • max(1, pdyres×my) when order=Nag_ColMajor;
  • max(1, n×pdyres) when order=Nag_RowMajor.
The (i,j)th element of the matrix is stored in
  • yres[(j-1)×pdyres+i-1] when order=Nag_ColMajor;
  • yres[(i-1)×pdyres+j-1] when order=Nag_RowMajor.
On exit: the residuals for each response variable, $Y_k$.
22:   pdyres – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array yres.
Constraints:
  • if order=Nag_ColMajor, pdyres ≥ n;
  • if order=Nag_RowMajor, pdyres ≥ my.
23:   w[dim] – double Output
Note: the dimension, dim, of the array w must be at least
  • max(1, pdw×maxfac) when order=Nag_ColMajor;
  • max(1, ip×pdw) when order=Nag_RowMajor.
The (i,j)th element of the matrix W is stored in
  • w[(j-1)×pdw+i-1] when order=Nag_ColMajor;
  • w[(i-1)×pdw+j-1] when order=Nag_RowMajor.
On exit: the jth column of W contains the x-weights $w_j$, for j=1,2,…,maxfac.
24:   pdw – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array w.
Constraints:
  • if order=Nag_ColMajor, pdw ≥ ip;
  • if order=Nag_RowMajor, pdw ≥ maxfac.
25:   p[dim] – double Output
Note: the dimension, dim, of the array p must be at least
  • max(1, pdp×maxfac) when order=Nag_ColMajor;
  • max(1, ip×pdp) when order=Nag_RowMajor.
The (i,j)th element of the matrix P is stored in
  • p[(j-1)×pdp+i-1] when order=Nag_ColMajor;
  • p[(i-1)×pdp+j-1] when order=Nag_RowMajor.
On exit: the jth column of P contains the x-loadings $p_j$, for j=1,2,…,maxfac.
26:   pdp – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array p.
Constraints:
  • if order=Nag_ColMajor, pdp ≥ ip;
  • if order=Nag_RowMajor, pdp ≥ maxfac.
27:   t[dim] – double Output
Note: the dimension, dim, of the array t must be at least
  • max(1, pdt×maxfac) when order=Nag_ColMajor;
  • max(1, n×pdt) when order=Nag_RowMajor.
The (i,j)th element of the matrix T is stored in
  • t[(j-1)×pdt+i-1] when order=Nag_ColMajor;
  • t[(i-1)×pdt+j-1] when order=Nag_RowMajor.
On exit: the jth column of T contains the x-scores $t_j$, for j=1,2,…,maxfac.
28:   pdt – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array t.
Constraints:
  • if order=Nag_ColMajor, pdt ≥ n;
  • if order=Nag_RowMajor, pdt ≥ maxfac.
29:   c[dim] – double Output
Note: the dimension, dim, of the array c must be at least
  • max(1, pdc×maxfac) when order=Nag_ColMajor;
  • max(1, my×pdc) when order=Nag_RowMajor.
The (i,j)th element of the matrix C is stored in
  • c[(j-1)×pdc+i-1] when order=Nag_ColMajor;
  • c[(i-1)×pdc+j-1] when order=Nag_RowMajor.
On exit: the jth column of C contains the y-loadings $c_j$, for j=1,2,…,maxfac.
30:   pdc – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array c.
Constraints:
  • if order=Nag_ColMajor, pdc ≥ my;
  • if order=Nag_RowMajor, pdc ≥ maxfac.
31:   u[dim] – double Output
Note: the dimension, dim, of the array u must be at least
  • max(1, pdu×maxfac) when order=Nag_ColMajor;
  • max(1, n×pdu) when order=Nag_RowMajor.
The (i,j)th element of the matrix U is stored in
  • u[(j-1)×pdu+i-1] when order=Nag_ColMajor;
  • u[(i-1)×pdu+j-1] when order=Nag_RowMajor.
On exit: the jth column of U contains the y-scores $u_j$, for j=1,2,…,maxfac.
32:   pdu – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array u.
Constraints:
  • if order=Nag_ColMajor, pdu ≥ n;
  • if order=Nag_RowMajor, pdu ≥ maxfac.
33:   xcv[maxfac] – double Output
On exit: xcv[j-1] contains the cumulative percentage of variance in the predictor variables explained by the first j factors, for j=1,2,…,maxfac.
34:   ycv[dim] – double Output
Note: the dimension, dim, of the array ycv must be at least
  • max(1, pdycv×my) when order=Nag_ColMajor;
  • max(1, maxfac×pdycv) when order=Nag_RowMajor.
Where $YCV_{i,j}$ appears in this document, it refers to the array element
  • ycv[(j-1)×pdycv+i-1] when order=Nag_ColMajor;
  • ycv[(i-1)×pdycv+j-1] when order=Nag_RowMajor.
On exit: $YCV_{i,j}$ is the cumulative percentage of variance of the jth response variable explained by the first i factors, for i=1,2,…,maxfac and j=1,2,…,my.
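One common way to compute such cumulative percentages, assumed here rather than taken from the library, is 100 × (1 − SS(residual after j factors) / SS(centred data)), where SS is the sum of squares. For centred data this ratio of sums of squares equals the ratio of variances. It would be applied to the whole predictor matrix for xcv and to each response column separately for ycv; the name pct_explained is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of the sum of squares of data (len values: a whole centred
   matrix, or one centred column) accounted for, given the residual left
   after j factors: 100 * (1 - SS(resid) / SS(data)). */
double pct_explained(size_t len, const double *data, const double *resid)
{
    double ss_data = 0.0, ss_resid = 0.0;
    assert(len > 0);
    for (size_t i = 0; i < len; ++i) {
        ss_data += data[i] * data[i];
        ss_resid += resid[i] * resid[i];
    }
    assert(ss_data > 0.0);
    return 100.0 * (1.0 - ss_resid / ss_data);
}
```

For data (1, −1, 2, −2) and residual (0.5, −0.5, 1, −1), the sums of squares are 10 and 2.5, giving 75%.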
35:   pdycv – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array ycv.
Constraints:
  • if order=Nag_ColMajor, pdycv ≥ maxfac;
  • if order=Nag_RowMajor, pdycv ≥ my.
36:   fail – NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_INT
On entry, mx=value.
Constraint: mx>1.
On entry, my=value.
Constraint: my ≥ 1.
On entry, n=value.
Constraint: n>1.
On entry, pdc=value.
Constraint: pdc>0.
On entry, pdp=value.
Constraint: pdp>0.
On entry, pdt=value.
Constraint: pdt>0.
On entry, pdu=value.
Constraint: pdu>0.
On entry, pdw=value.
Constraint: pdw>0.
On entry, pdx=value.
Constraint: pdx>0.
On entry, pdxres=value.
Constraint: pdxres>0.
On entry, pdy=value.
Constraint: pdy>0.
On entry, pdycv=value.
Constraint: pdycv>0.
On entry, pdyres=value.
Constraint: pdyres>0.
NE_INT_2
On entry, ip=value and mx=value.
Constraint: 1 < ip ≤ mx.
On entry, maxfac=value and ip=value.
Constraint: 1 ≤ maxfac ≤ ip.
On entry, my=value and maxit=value.
Constraint: if my>1, maxit>1.
On entry, pdc=value and maxfac=value.
Constraint: pdc ≥ maxfac.
On entry, pdc=value and my=value.
Constraint: pdc ≥ my.
On entry, pdp=value and ip=value.
Constraint: pdp ≥ ip.
On entry, pdp=value and maxfac=value.
Constraint: pdp ≥ maxfac.
On entry, pdt=value and maxfac=value.
Constraint: pdt ≥ maxfac.
On entry, pdt=value and n=value.
Constraint: pdt ≥ n.
On entry, pdu=value and maxfac=value.
Constraint: pdu ≥ maxfac.
On entry, pdu=value and n=value.
Constraint: pdu ≥ n.
On entry, pdw=value and ip=value.
Constraint: pdw ≥ ip.
On entry, pdw=value and maxfac=value.
Constraint: pdw ≥ maxfac.
On entry, pdx=value and mx=value.
Constraint: pdx ≥ mx.
On entry, pdx=value and n=value.
Constraint: pdx ≥ n.
On entry, pdxres=value and ip=value.
Constraint: pdxres ≥ ip.
On entry, pdxres=value and n=value.
Constraint: pdxres ≥ n.
On entry, pdy=value and my=value.
Constraint: pdy ≥ my.
On entry, pdy=value and n=value.
Constraint: pdy ≥ n.
On entry, pdycv=value and maxfac=value.
Constraint: pdycv ≥ maxfac.
On entry, pdycv=value and my=value.
Constraint: pdycv ≥ my.
On entry, pdyres=value and my=value.
Constraint: pdyres ≥ my.
On entry, pdyres=value and n=value.
Constraint: pdyres ≥ n.
NE_INT_ARG_CONS
On entry, ip=value and sum(isx)=value.
Constraint: the sum of elements in isx must equal ip.
NE_INT_ARRAY_VAL_1_OR_2
On entry, isx[value] is invalid.
Constraint: isx[j-1]=0 or 1, for all j.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_REAL
On entry, tau=value.
Constraint: if my>1, tau>0.0.

7 Accuracy

In general, the iterative method used in the calculations is less accurate (but faster) than the singular value decomposition approach adopted by nag_pls_orth_scores_svd (g02lac).

8 Parallelism and Performance

nag_pls_orth_scores_wold (g02lbc) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

nag_pls_orth_scores_wold (g02lbc) allocates internally (n+r) elements of double storage.

10 Example

This example reads in data from an experiment to measure the biological activity of a chemical compound and estimates a PLS model.

10.1 Program Text

Program Text (g02lbce.c)

10.2 Program Data

Program Data (g02lbce.d)

10.3 Program Results

Program Results (g02lbce.r)