naginterfaces.library.mv.canon_corr

naginterfaces.library.mv.canon_corr(z, isz, nx, ny, mcv, tol, wt=None)

canon_corr performs canonical correlation analysis on input data matrices.

For full information, please refer to the NAG Library document for g03ad:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g03/g03adf.html

Parameters

z : float, array-like, shape $(n, m)$

$z[i-1, j-1]$ must contain the $i$th observation for the $j$th variable, for $j = 1, 2, \dots, m$, for $i = 1, 2, \dots, n$.

Both $x$ and $y$ variables are to be included in $z$; the indicator array $isz$ assigns each variable in $z$ to the $x$ or $y$ set as appropriate.

isz : int, array-like, shape $(m)$

$isz[j-1]$ indicates whether or not the $j$th variable is included in the analysis and to which set of variables it belongs.

$isz[j-1] > 0$

The variable contained in the $j$th column of $z$ is included as an $x$ variable in the analysis.

$isz[j-1] < 0$

The variable contained in the $j$th column of $z$ is included as a $y$ variable in the analysis.

$isz[j-1] = 0$

The variable contained in the $j$th column of $z$ is not included in the analysis.
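As a minimal sketch of the sign convention above, the counts passed as nx and ny can be derived from isz rather than typed by hand. The five-column layout here is hypothetical and only NumPy is used; nothing is passed to canon_corr.

```python
import numpy as np

# Hypothetical layout: z has 5 columns. Columns 1-2 are x variables,
# column 3 is excluded, and columns 4-5 are y variables.
isz = np.array([1, 1, 0, -1, -1])

# nx and ny have to be consistent with the signs in isz.
nx = int(np.count_nonzero(isz > 0))
ny = int(np.count_nonzero(isz < 0))

# mcv is an upper limit on the number of canonical variates and
# must be at least min(nx, ny).
mcv = min(nx, ny)
```

Deriving nx and ny this way avoids the inconsistency reported as errno 3.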

nx : int

The number of $x$ variables in the analysis, $n_{x}$.

ny : int

The number of $y$ variables in the analysis, $n_{y}$.

mcv : int

An upper limit to the number of canonical variates.

tol : float

The value of $tol$ is used to decide if the variables are of full rank and, if not, what is the rank of the variables. The smaller the value of $tol$ the stricter the criterion for selecting the singular value decomposition. If a non-negative value of $tol$ less than machine precision is entered, the square root of machine precision is used instead.

wt : None or float, array-like, shape $(n)$, optional

The elements of $wt$ must contain the weights to be used in the analysis. The effective number of observations is the sum of the weights. If $wt[i-1] = 0.0$ then the $i$th observation is not included in the analysis.

If weights are not provided then $wt$ must be set to None and the effective number of observations is $n$.

Returns

e : float, ndarray, shape $(\min(nx, ny), 6)$

The statistics of the canonical variate analysis.

$e[i-1, 0]$

The canonical correlations, $δ_{i}$, for $i = 1, 2, \dots, l$.

$e[i-1, 1]$

The eigenvalues of $Σ$, $λ_{i}^{2}$, for $i = 1, 2, \dots, l$.

$e[i-1, 2]$

The proportion of variation explained by the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 3]$

The $χ^{2}$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 4]$

The degrees of freedom for the $χ^{2}$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 5]$

The significance level for the $χ^{2}$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

ncv : int

The number of canonical correlations, $l$. This will be the minimum of the rank of $X$ and the rank of $Y$.

cvx : float, ndarray, shape $(nx, mcv)$

The canonical variate loadings for the $x$ variables. $cvx[i-1, j-1]$ contains the loading coefficient for the $i$th $x$ variable on the $j$th canonical variate.

cvy : float, ndarray, shape $(ny, mcv)$

The canonical variate loadings for the $y$ variables. $cvy[i-1, j-1]$ contains the loading coefficient for the $i$th $y$ variable on the $j$th canonical variate.

Raises

NagValueError

(errno $1$)

On entry, $mcv = ⟨value⟩$ and $\min(nx, ny) = ⟨value⟩$.

Constraint: $mcv \geq \min(nx, ny)$.

(errno $1$)

On entry, $tol = ⟨value⟩$.

Constraint: $tol \geq 0.0$.

(errno $1$)

On entry, $weight = ⟨value⟩$.

Constraint: $weight =$ 'U' or 'W'.

(errno $1$)

On entry, $lde = ⟨value⟩$ and $\min(nx, ny) = ⟨value⟩$.

Constraint: $lde \geq \min(nx, ny)$.

(errno $1$)

On entry, $n = ⟨value⟩$ and $nx + ny = ⟨value⟩$.

Constraint: $n > nx + ny$.

(errno $1$)

On entry, $m = ⟨value⟩$ and $nx + ny = ⟨value⟩$.

Constraint: $m \geq nx + ny$.

(errno $1$)

On entry, $ny = ⟨value⟩$.

Constraint: $ny \geq 1$.

(errno $1$)

On entry, $nx = ⟨value⟩$.

Constraint: $nx \geq 1$.

(errno $2$)

On entry, $i = ⟨value⟩$ and $wt[i-1] < 0.0$.

Constraint: $wt[i-1] \geq 0.0$.

(errno $3$)

On entry, $ny = ⟨value⟩$, expected $value = ⟨value⟩$.

Constraint: $ny$ must be consistent with $isz$.

(errno $3$)

On entry, $nx = ⟨value⟩$, expected $value = ⟨value⟩$.

Constraint: $nx$ must be consistent with $isz$.

(errno $4$)

On entry, the effective number of observations is less than $nx + ny + 1$.

(errno $5$)

The singular value decomposition has failed to converge.

Warns

NagAlgorithmicWarning

(errno $6$ ): A canonical correlation is equal to $1.0$ .
(errno $7$ ): On entry, the rank of the $Y$ matrix is $0$ .
(errno $7$ ): On entry, the rank of the $X$ matrix is $0$ .
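The entry constraints listed under Raises can be checked before the call. The helper below is a hypothetical sketch, not part of naginterfaces: the name check_canon_corr_args and its message strings are invented here, and it covers only the constraints that apply to the Python arguments (the weight and lde conditions belong to the underlying routine and have no Python counterpart).

```python
import numpy as np

def check_canon_corr_args(z, isz, nx, ny, mcv, tol, wt=None):
    """Return the violated entry constraints as a list of strings.

    Hypothetical pre-check; an empty list means none were found.
    """
    z = np.asarray(z, dtype=float)
    isz = np.asarray(isz)
    n, m = z.shape
    problems = []

    # errno 1 conditions
    if nx < 1:
        problems.append("nx >= 1")
    if ny < 1:
        problems.append("ny >= 1")
    if mcv < min(nx, ny):
        problems.append("mcv >= min(nx, ny)")
    if tol < 0.0:
        problems.append("tol >= 0.0")
    if not n > nx + ny:
        problems.append("n > nx + ny")
    if m < nx + ny:
        problems.append("m >= nx + ny")

    # errno 2: weights must be non-negative
    n_eff = float(n)
    if wt is not None:
        wt = np.asarray(wt, dtype=float)
        if np.any(wt < 0.0):
            problems.append("wt[i] >= 0.0")
        n_eff = float(wt.sum())

    # errno 3: nx and ny must match the signs in isz
    if nx != np.count_nonzero(isz > 0):
        problems.append("nx consistent with isz")
    if ny != np.count_nonzero(isz < 0):
        problems.append("ny consistent with isz")

    # errno 4: effective number of observations
    if n_eff < nx + ny + 1:
        problems.append("effective observations >= nx + ny + 1")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.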

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Let there be two sets of variables, $x$ and $y$ . For a sample of $n$ observations on $n_{x}$ variables in a data matrix $X$ and $n_{y}$ variables in a data matrix $Y$ , canonical correlation analysis seeks to find a small number of linear combinations of each set of variables in order to explain or summarise the relationships between them. The variables thus formed are known as canonical variates.

Let the variance-covariance matrix of the two datasets be

\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}

and let

Σ = S_{y y}^{- 1} S_{y x} S_{x x}^{- 1} S_{x y}

then the canonical correlations can be calculated from the eigenvalues of the matrix $Σ$ . However, canon_corr calculates the canonical correlations by means of a singular value decomposition (SVD) of a matrix $V$ . If the rank of the data matrix $X$ is $k_{x}$ and the rank of the data matrix $Y$ is $k_{y}$ , and both $X$ and $Y$ have had variable (column) means subtracted then the $k_{x} \times k_{y}$ matrix $V$ is given by:

V = Q_{x}^{T} Q_{y},

where $Q_{x}$ consists of the first $k_{x}$ columns of the orthogonal matrix $Q$, taken either from the $QR$ decomposition of $X$ if $X$ is of full column rank, i.e., $k_{x} = n_{x}$:

X = Q_{x} R_{x}

or from the SVD of $X$ if $k_{x} < n_{x}$ :

X = Q_{x} D_{x} P_{x}^{T} .

Similarly, $Q_{y}$ consists of the first $k_{y}$ columns of the orthogonal matrix $Q$, taken either from the $QR$ decomposition of $Y$ if $Y$ is of full column rank, i.e., $k_{y} = n_{y}$:

Y = Q_{y} R_{y}

or from the SVD of $Y$ if $k_{y} < n_{y}$ :

Y = Q_{y} D_{y} P_{y}^{T} .

Let the SVD of $V$ be:

V = U_{x} Δ U_{y}^{T}

then the nonzero elements of the diagonal matrix $Δ$, $δ_{i}$, for $i = 1, 2, \dots, l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k_{x}, k_{y})$.
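The construction above can be sketched in NumPy for the simplest case, where both data matrices have full column rank and no weights are used. This is an illustration of the QR-then-SVD idea only, not the library's implementation; the function name canonical_correlations and the random test data are assumptions made for the example.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations for full-column-rank X and Y (unweighted)."""
    # Subtract the variable (column) means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Reduced QR: each Q has orthonormal columns spanning the column space.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # V = Qx^T Qy; its singular values are the canonical correlations,
    # returned in descending order.
    V = Qx.T @ Qy
    return np.linalg.svd(V, compute_uv=False)

# Hypothetical data: 50 observations, 3 x variables, 2 y variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
Y = X[:, :2] + 0.5 * rng.standard_normal((50, 2))
delta = canonical_correlations(X, Y)
```

Squaring delta should reproduce the eigenvalues of $Σ$ computed directly from the covariance blocks, which is a convenient cross-check on small problems.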

The eigenvalues, $λ_{i}^{2}$ , of the matrix $Σ$ are given by:

λ_{i}^{2} = δ_{i}^{2} .

The value of $π_{i} = λ_{i}^{2} / \sum λ_{i}^{2}$ gives the proportion of variation explained by the $i$th canonical variate. The values of $π_{i}$ give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.

To test for a significant dimensionality greater than $i$ the $χ^{2}$ statistic:

\left(n - \frac{1}{2} (k_{x} + k_{y} + 3)\right) \sum_{j = i + 1}^{l} \log (1 - δ_{j}^{2})

can be used. This is asymptotically distributed as a $χ^{2}$-distribution with $(k_{x} - i) (k_{y} - i)$ degrees of freedom. If the test for $i = k_{\min}$ is not significant, then the remaining tests for $i > k_{\min}$ should be ignored.
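A rough NumPy sketch of these quantities follows. Two assumptions are made and should be checked against the NAG document: the tests are indexed from $i = 0$ to $l - 1$, and the statistic is given a leading minus sign (as in Bartlett's approximation) so that it is non-negative, whereas the expression printed above has no such sign. The function name dimensionality_tests is hypothetical.

```python
import numpy as np

def dimensionality_tests(delta, n, kx, ky):
    """Proportions, chi-square statistics and degrees of freedom.

    delta: canonical correlations in descending order.
    Entry i of stat and df refers to the test for dimensionality > i.
    """
    delta = np.asarray(delta, dtype=float)
    l = delta.size
    lam2 = delta ** 2                      # eigenvalues of Sigma
    prop = lam2 / lam2.sum()               # pi_i
    logs = np.log1p(-lam2)                 # log(1 - delta_j^2)
    # tail[i] = sum of logs[i:], i.e. j = i+1, ..., l in 1-based terms.
    tail = np.cumsum(logs[::-1])[::-1]
    scale = n - 0.5 * (kx + ky + 3)
    stat = -scale * tail                   # ASSUMED leading minus sign
    i = np.arange(l)
    df = (kx - i) * (ky - i)
    return prop, stat, df

# Hypothetical inputs: two canonical correlations from 20 observations.
prop, stat, df = dimensionality_tests([0.9, 0.5], n=20, kx=2, ky=2)
```

The significance level would then come from the upper tail of a $χ^{2}$ distribution with df degrees of freedom; that step is left out here to keep the sketch dependent on NumPy only.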

The loadings for the canonical variates are calculated from the matrices $U_{x}$ and $U_{y}$ respectively. These matrices are scaled so that the canonical variates have unit variance.
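One way to see the unit-variance scaling is the following NumPy sketch for full-column-rank, unweighted data. It assumes the variance uses the divisor $n - 1$, which is a choice made for this illustration rather than a statement about the routine; the variable names and random data are likewise hypothetical, and the signs of the loadings are arbitrary.

```python
import numpy as np

# Hypothetical data: 40 observations, 3 x variables, 2 y variables.
rng = np.random.default_rng(1)
n = 40
X = rng.standard_normal((n, 3))
Y = X[:, :2] + rng.standard_normal((n, 2))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# QR of each centred matrix, then the SVD of V = Qx^T Qy.
Qx, Rx = np.linalg.qr(Xc)
Qy, Ry = np.linalg.qr(Yc)
Ux, delta, UyT = np.linalg.svd(Qx.T @ Qy, full_matrices=False)

# Solving R a = U gives Xc @ a = Q U, which has orthonormal columns.
# Multiplying by sqrt(n - 1) makes each variate have unit variance
# (ASSUMED divisor n - 1).
cvx = np.linalg.solve(Rx, Ux) * np.sqrt(n - 1)
cvy = np.linalg.solve(Ry, UyT.T) * np.sqrt(n - 1)

# Canonical variates (scores) for each set of variables.
scores_x = Xc @ cvx
scores_y = Yc @ cvy
```

With this scaling, the correlation between the $j$th columns of scores_x and scores_y equals the $j$th canonical correlation.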

References

Hastings, N A J and Peacock, J B, 1975, Statistical Distributions, Butterworth

Kendall, M G and Stuart, A, 1976, The Advanced Theory of Statistics (Volume 3), (3rd Edition), Griffin

Morrison, D F, 1967, Multivariate Statistical Methods, McGraw–Hill
