naginterfaces.library.correg.pls_wold

naginterfaces.library.correg.pls_wold(x, isx, y, iscale, xstd, ystd, maxfac, maxit=200, tau=0.0001, io_manager=None)

pls_wold fits an orthogonal scores partial least squares (PLS) regression by using Wold’s iterative method.

For full information please refer to the NAG Library document for g02lb:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g02/g02lbf.html

Parameters

x (float, array-like, shape $(n, mx)$)

$x[i-1, j-1]$ must contain the $i$th observation on the $j$th predictor variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, mx$.

isx (int, array-like, shape $(mx)$)

Indicates which predictor variables are to be included in the model.

$isx[j-1] = 1$

The $j$th predictor variable (with variates in the $j$th column of $X$) is included in the model.

$isx[j-1] = 0$

Otherwise.
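The relationship between $isx$ and the dimension $ip$ used in the shapes below can be illustrated with a small NumPy sketch. The data and the name `x_model` are made up for illustration; the only facts taken from this page are that $isx$ holds 0/1 flags and that its sum must equal $ip$.

```python
import numpy as np

# Hypothetical data: n = 4 observations on mx = 3 predictors.
x = np.arange(12.0).reshape(4, 3)

# Include the first and third predictors, exclude the second.
isx = np.array([1, 0, 1])

# ip is the number of predictors in the model, i.e. the sum of isx.
ip = int(isx.sum())

# Columns of x that would enter the fit, shape (n, ip).
x_model = x[:, isx == 1]
```

Note that the full array `x` is still what is passed to the routine; the column selection here only shows which variables the flags refer to.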

y (float, array-like, shape $(n, my)$)

$y[i-1, j-1]$ must contain the $i$th observation for the $j$th response variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, my$.

iscale (int)

Indicates how predictor variables are scaled.

$iscale = 1$

Data are scaled by the standard deviation of variables.

$iscale = 2$

Data are scaled by user-supplied scalings.

$iscale = -1$

No scaling.

xstd (float, array-like, shape $(ip)$)

If $iscale = 2$, $xstd[j-1]$ must contain the user-supplied scaling for the $j$th predictor variable in the model, for $j = 1, 2, \dots, ip$. Otherwise $xstd$ need not be set.

ystd (float, array-like, shape $(my)$)

If $iscale = 2$, $ystd[j-1]$ must contain the user-supplied scaling for the $j$th response variable in the model, for $j = 1, 2, \dots, my$. Otherwise $ystd$ need not be set.
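The three scaling modes can be pictured with the following NumPy sketch. It is not the library's code: the helper name `centre_and_scale` is hypothetical, and the use of the sample standard deviation (`ddof=1`) for $iscale = 1$ is an assumption that this page does not confirm.

```python
import numpy as np

def centre_and_scale(a, iscale, user_std=None):
    """Illustrative preprocessing only (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=0)
    if iscale == 1:
        # Assumption: sample standard deviation with divisor n - 1.
        std = a.std(axis=0, ddof=1)
    elif iscale == 2:
        # User-supplied scalings, one per column.
        std = np.asarray(user_std, dtype=float)
    elif iscale == -1:
        # No scaling: the data are only mean-centred.
        std = np.ones(a.shape[1])
    else:
        raise ValueError("iscale must be -1, 1 or 2")
    return (a - mean) / std, mean, std
```

The same helper would apply to the selected predictor columns (with scalings in the role of $xstd$) and to the responses (with scalings in the role of $ystd$).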

maxfac (int)

$k$, the number of latent variables to calculate.

maxit (int, optional)

If $my = 1$, $maxit$ is not referenced; otherwise the maximum number of iterations used to calculate the $x$-weights.

tau (float, optional)

If $my = 1$, $tau$ is not referenced; otherwise the iterative procedure used to calculate the $x$-weights will halt if the Euclidean distance between two subsequent estimates is less than or equal to $tau$.
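How $maxit$ and $tau$ interact can be sketched with a generic NIPALS-style iteration for the $x$-weights. This is an assumption about the general shape of such an iteration, not a transcription of the routine; the function name `x_weights` and the choice of starting vector are hypothetical. The inputs are taken to be already mean-centred.

```python
import numpy as np

def x_weights(x, y, maxit=200, tau=1e-4):
    """Iterate to a unit-length x-weights vector (illustrative sketch)."""
    # Hypothetical starting estimate built from the first response column.
    w = x.T @ y[:, 0]
    w /= np.linalg.norm(w)
    for n_iter in range(1, maxit + 1):
        t = x @ w            # x-scores for the current weights
        c = y.T @ t          # y-loadings
        u = y @ c            # y-scores
        w_new = x.T @ u      # updated, unnormalised x-weights
        w_new /= np.linalg.norm(w_new)
        # Stop once two successive estimates are within tau of each other.
        if np.linalg.norm(w_new - w) <= tau:
            return w_new, n_iter
        w = w_new
    # Iteration limit reached without meeting the tolerance.
    return w, maxit
```

Each pass multiplies the estimate by $X^{T} Y Y^{T} X$, so the iteration behaves like a power method; a smaller $tau$ costs more iterations, and $maxit$ caps the work when successive estimates converge slowly.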

io_manager (FileObjManager, optional)

Manager for I/O in this routine.

Returns

xbar (float, ndarray, shape $(ip)$): Mean values of predictor variables in the model.
ybar (float, ndarray, shape $(my)$): The mean value of each response variable.
xstd (float, ndarray, shape $(ip)$): If $iscale = 1$, standard deviations of predictor variables in the model. Otherwise $xstd$ is not changed.
ystd (float, ndarray, shape $(my)$): If $iscale = 1$, the standard deviation of each response variable. Otherwise $ystd$ is not changed.
xres (float, ndarray, shape $(n, ip)$): The predictor variables’ residual matrix $X_{k}$.
yres (float, ndarray, shape $(n, my)$): The residuals for each response variable, $Y_{k}$.
w (float, ndarray, shape $(ip, maxfac)$): The $j$th column of $W$ contains the $x$-weights $w_{j}$, for $j = 1, 2, \dots, maxfac$.
p (float, ndarray, shape $(ip, maxfac)$): The $j$th column of $P$ contains the $x$-loadings $p_{j}$, for $j = 1, 2, \dots, maxfac$.
t (float, ndarray, shape $(n, maxfac)$): The $j$th column of $T$ contains the $x$-scores $t_{j}$, for $j = 1, 2, \dots, maxfac$.
c (float, ndarray, shape $(my, maxfac)$): The $j$th column of $C$ contains the $y$-loadings $c_{j}$, for $j = 1, 2, \dots, maxfac$.
u (float, ndarray, shape $(n, maxfac)$): The $j$th column of $U$ contains the $y$-scores $u_{j}$, for $j = 1, 2, \dots, maxfac$.
xcv (float, ndarray, shape $(maxfac)$): $xcv[j-1]$ contains the cumulative percentage of variance in the predictor variables explained by the first $j$ factors, for $j = 1, 2, \dots, maxfac$.
ycv (float, ndarray, shape $(maxfac, my)$): $ycv[i-1, j-1]$ is the cumulative percentage of variance of the $j$th response variable explained by the first $i$ factors, for $i = 1, 2, \dots, maxfac$ and $j = 1, 2, \dots, my$.
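A call might be set up along the following lines. The data are made up, the wrapper name `fit` is hypothetical, and two points are assumptions rather than statements from this page: that placeholder arrays of the documented shapes are still passed for $xstd$ and $ystd$ when $iscale \neq 2$, and that the result unpacks in the order listed under Returns. The import sits inside the wrapper so that the data preparation runs even where the library is not installed.

```python
import numpy as np

# Made-up data: n = 6 observations, mx = 4 predictors, my = 2 responses.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
y = rng.normal(size=(6, 2))

isx = [1, 1, 0, 1]            # leave the third predictor out of the model
ip = sum(isx)                 # number of predictors in the model
iscale = 1                    # scale by standard deviations
xstd = np.zeros(ip)           # placeholder, only read when iscale = 2
ystd = np.zeros(y.shape[1])   # placeholder, only read when iscale = 2
maxfac = 2                    # number of latent variables, 1 <= maxfac <= ip

def fit(x, isx, y, iscale, xstd, ystd, maxfac):
    """Hypothetical thin wrapper around the documented call."""
    # Imported lazily: requires naginterfaces and a valid licence.
    from naginterfaces.library import correg
    return correg.pls_wold(x, isx, y, iscale, xstd, ystd, maxfac)

# Example use (assumes the result unpacks in the documented order):
# res = fit(x, isx, y, iscale, xstd, ystd, maxfac)
# xbar, ybar, xstd_out, ystd_out, xres, yres, w, p, t, c, u, xcv, ycv = res
```

With these inputs the documented shapes would give `w` and `p` of shape (3, 2), `t` and `u` of shape (6, 2), and `ycv` of shape (2, 2).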

Raises

NagValueError

(errno $1$)

On entry, $n = ⟨value⟩$.

Constraint: $n > 1$.

(errno $1$)

On entry, $mx = ⟨value⟩$.

Constraint: $mx > 1$.

(errno $1$)

On entry, $isx[⟨value⟩]$ is invalid.

Constraint: $isx[j-1] = 0$ or $1$, for all $j$.

(errno $1$)

On entry, $my = ⟨value⟩$.

Constraint: $my \geq 1$.

(errno $1$)

On entry, $iscale = ⟨value⟩$.

Constraint: $iscale = -1$, $1$ or $2$.

(errno $2$)

On entry, $ip = ⟨value⟩$ and $mx = ⟨value⟩$.

Constraint: $1 < ip \leq mx$.

(errno $2$)

On entry, $maxfac = ⟨value⟩$ and $ip = ⟨value⟩$.

Constraint: $1 \leq maxfac \leq ip$.

(errno $2$)

On entry, $my = ⟨value⟩$ and $maxit = ⟨value⟩$.

Constraint: if $my > 1$, $maxit > 1$.

(errno $2$)

On entry, $tau = ⟨value⟩$.

Constraint: if $my > 1$, $tau > 0.0$.

(errno $3$)

On entry, $ip = ⟨value⟩$ and $sum(isx) = ⟨value⟩$.

Constraint: the sum of elements in $isx$ must equal $ip$.
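The constraints listed above can be checked before the call with a small helper such as the one below. The function `check_pls_wold_args` is hypothetical and not part of naginterfaces; it raises a plain `ValueError` rather than `NagValueError`, accepts $iscale = 2$ as described under Parameters, and takes $ip$ to be the sum of $isx$.

```python
import numpy as np

def check_pls_wold_args(x, isx, y, iscale, maxfac, maxit=200, tau=1e-4):
    """Hypothetical pre-check mirroring the documented constraints."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    isx = np.asarray(isx)
    n, mx = x.shape
    my = y.shape[1]
    if n <= 1:
        raise ValueError("n > 1 is required")
    if mx <= 1:
        raise ValueError("mx > 1 is required")
    if isx.shape != (mx,) or not np.all((isx == 0) | (isx == 1)):
        raise ValueError("isx must hold mx values, each 0 or 1")
    if y.shape[0] != n or my < 1:
        raise ValueError("y must have n rows and my >= 1 columns")
    if iscale not in (-1, 1, 2):
        raise ValueError("iscale must be -1, 1 or 2")
    ip = int(isx.sum())       # number of predictors in the model
    if not 1 < ip <= mx:
        raise ValueError("1 < ip <= mx is required")
    if not 1 <= maxfac <= ip:
        raise ValueError("1 <= maxfac <= ip is required")
    # maxit and tau are only referenced when there are several responses.
    if my > 1 and maxit <= 1:
        raise ValueError("maxit > 1 is required when my > 1")
    if my > 1 and tau <= 0.0:
        raise ValueError("tau > 0.0 is required when my > 1")
    return ip
```

Returning `ip` is a convenience for sizing `xstd`; the routine itself remains the authority on what it accepts.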

Notes

Let $X_{1}$ be the mean-centred $n \times m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let $Y_{1}$ be the mean-centred $n \times r$ data matrix $Y$ of $n$ observations on $r$ response variables.

The first of the $k$ factors PLS methods extract from the data predicts both $X_{1}$ and $Y_{1}$ by regressing on a $t_{1}$ column vector of $n$ scores:

$$\hat{X}_{1} = t_{1} p_{1}^{T}, \qquad \hat{Y}_{1} = t_{1} c_{1}^{T}, \qquad \text{with } t_{1}^{T} t_{1} = 1,$$

where the column vectors of $m$ $x$-loadings $p_{1}$ and $r$ $y$-loadings $c_{1}$ are calculated in the least squares sense:

$$p_{1}^{T} = t_{1}^{T} X_{1}, \qquad c_{1}^{T} = t_{1}^{T} Y_{1}.$$

The $x$-score vector $t_{1} = X_{1} w_{1}$ is the linear combination of predictor data $X_{1}$ that has maximum covariance with the $y$-scores $u_{1} = Y_{1} c_{1}$, where the $x$-weights vector $w_{1}$ is the normalised first left singular vector of $X_{1}^{T} Y_{1}$.
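The first factor can be reproduced on made-up data with NumPy's SVD. This is a sketch of the formulas above rather than of the routine itself. One assumption is made to reconcile the two normalisations: $w_{1}$ is taken as the unit-length singular vector, and $t_{1}$ is then rescaled so that $t_{1}^{T} t_{1} = 1$.

```python
import numpy as np

# Made-up data: n = 8 observations, m = 3 predictors, r = 2 responses.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
Y = rng.normal(size=(8, 2))

# Mean-centre to obtain X_1 and Y_1.
X1 = X - X.mean(axis=0)
Y1 = Y - Y.mean(axis=0)

# x-weights: first left singular vector of X_1^T Y_1 (unit length).
w1 = np.linalg.svd(X1.T @ Y1)[0][:, 0]

# x-scores, rescaled so that t_1^T t_1 = 1 (assumption, see lead-in).
t1 = X1 @ w1
t1 /= np.linalg.norm(t1)

# Least squares loadings: p_1^T = t_1^T X_1 and c_1^T = t_1^T Y_1.
p1 = X1.T @ t1
c1 = Y1.T @ t1

# Rank-one predictions of X_1 and Y_1 from the scores.
X1_hat = np.outer(t1, p1)
Y1_hat = np.outer(t1, c1)
```

Because the loadings are least squares coefficients on $t_{1}$, the residuals `X1 - X1_hat` and `Y1 - Y1_hat` are orthogonal to `t1`.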

The method extracts subsequent PLS factors by repeating the above process with the residual matrices:

$$X_{i} = X_{i-1} - \hat{X}_{i-1}, \qquad Y_{i} = Y_{i-1} - \hat{Y}_{i-1}, \qquad i = 2, 3, \dots, k,$$

and with orthogonal scores:

$$t_{i}^{T} t_{j} = 0, \qquad j = 1, 2, \dots, i-1.$$
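The deflation step and the resulting orthogonality of the scores can be checked numerically with a short loop. The function `pls_scores` is a hypothetical illustration that obtains each weight vector from an SVD instead of Wold's iteration, and it rescales each score vector to unit length; both choices are assumptions made for brevity.

```python
import numpy as np

def pls_scores(X, Y, k):
    """Extract k unit-length x-score vectors by repeated deflation (sketch)."""
    Xi = X - X.mean(axis=0)   # X_1
    Yi = Y - Y.mean(axis=0)   # Y_1
    T = np.empty((X.shape[0], k))
    for i in range(k):
        # x-weights for the current residual matrices.
        w = np.linalg.svd(Xi.T @ Yi)[0][:, 0]
        t = Xi @ w
        t /= np.linalg.norm(t)
        # Deflate: subtract the rank-one predictions t p^T and t c^T.
        Xi = Xi - np.outer(t, Xi.T @ t)
        Yi = Yi - np.outer(t, Yi.T @ t)
        T[:, i] = t
    return T, Xi, Yi

# Made-up data: n = 10, m = 4, r = 2, with k = 3 factors.
rng = np.random.default_rng(2)
T, X_res, Y_res = pls_scores(rng.normal(size=(10, 4)),
                             rng.normal(size=(10, 2)), 3)
```

After each deflation the residual predictor matrix is orthogonal to the score just extracted, so every later score, being a combination of residual columns, is orthogonal to it as well; `T.T @ T` is therefore the identity up to rounding.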

Optionally, in addition to being mean-centred, the data matrices $X_{1}$ and $Y_{1}$ may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy.

References: Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis (ed. P. R. Krishnaiah), 391–420. Academic Press, New York.
