# NAG Library Routine Document

## 1Purpose

f08raf (dorcsd) computes the CS decomposition of a real $m$ by $m$ orthogonal matrix $X$, partitioned into a $2$ by $2$ array of submatrices.

## 2Specification

Fortran Interface
 Subroutine f08raf ( m, p, q, x11, x12, x21, x22, u1, ldu1, u2, ldu2, v1t, v2t, work, info)
 Integer, Intent (In) :: m, p, q, ldx11, ldx12, ldx21, ldx22, ldu1, ldu2, ldv1t, ldv2t, lwork Integer, Intent (Out) :: iwork(m-min(p,m-p,q,m-q)), info Real (Kind=nag_wp), Intent (Inout) :: x11(ldx11,*), x12(ldx12,*), x21(ldx21,*), x22(ldx22,*), u1(ldu1,*), u2(ldu2,*), v1t(ldv1t,*), v2t(ldv2t,*) Real (Kind=nag_wp), Intent (Out) :: theta(min(p,m-p,q,m-q)), work(max(1,lwork)) Character (1), Intent (In) :: jobu1, jobu2, jobv1t, jobv2t, trans, signs
#include nagmk26.h
 void f08raf_ (const char *jobu1, const char *jobu2, const char *jobv1t, const char *jobv2t, const char *trans, const char *signs, const Integer *m, const Integer *p, const Integer *q, double x11[], const Integer *ldx11, double x12[], const Integer *ldx12, double x21[], const Integer *ldx21, double x22[], const Integer *ldx22, double theta[], double u1[], const Integer *ldu1, double u2[], const Integer *ldu2, double v1t[], const Integer *ldv1t, double v2t[], const Integer *ldv2t, double work[], const Integer *lwork, Integer iwork[], Integer *info, const Charlen length_jobu1, const Charlen length_jobu2, const Charlen length_jobv1t, const Charlen length_jobv2t, const Charlen length_trans, const Charlen length_signs)
The routine may be called by its LAPACK name dorcsd.

## 3Description

The $m$ by $m$ orthogonal matrix $X$ is partitioned as
 $X= X11 X12 X21 X22$
where ${X}_{11}$ is a $p$ by $q$ submatrix and the dimensions of the other submatrices ${X}_{12}$, ${X}_{21}$ and ${X}_{22}$ are such that $X$ remains $m$ by $m$.
The CS decomposition of $X$ is $X=U{\Sigma }_{p}{V}^{\mathrm{T}}$ where $U$, $V$ and ${\Sigma }_{p}$ are $m$ by $m$ matrices, such that
 $U= U1 0 0 U2$
is an orthogonal matrix containing the $p$ by $p$ orthogonal matrix ${U}_{1}$ and the $\left(m-p\right)$ by $\left(m-p\right)$ orthogonal matrix ${U}_{2}$;
 $V= V1 0 0 V2$
is an orthogonal matrix containing the $q$ by $q$ orthogonal matrix ${V}_{1}$ and the $\left(m-q\right)$ by $\left(m-q\right)$ orthogonal matrix ${V}_{2}$; and
 $Σp= I11 0 0 0 C 0 0 -S 0 0 0 -I12 0 0 I22 0 0 S C 0 0 I21 0 0$
contains the $r$ by $r$ non-negative diagonal submatrices $C$ and $S$ satisfying ${C}^{2}+{S}^{2}=I$, where $r=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,m-p,q,m-q\right)$ and the top left partition is $p$ by $q$.
The identity matrix ${I}_{11}$ is of order $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,q\right)-r$ and vanishes if $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,q\right)=r$.
The identity matrix ${I}_{12}$ is of order $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,m-q\right)-r$ and vanishes if $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,m-q\right)=r$.
The identity matrix ${I}_{21}$ is of order $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m-p,q\right)-r$ and vanishes if $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m-p,q\right)=r$.
The identity matrix ${I}_{22}$ is of order $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m-p,m-q\right)-r$ and vanishes if $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(m-p,m-q\right)=r$.
In each of the four cases $r=p,q,m-p,m-q$ at least two of the identity matrices vanish.
The indicated zeros represent augmentations by additional rows or columns (but not both) to the square diagonal matrices formed by ${I}_{ij}$ and $C$ or $S$.
${\Sigma }_{p}$ does not need to be stored in full; it is sufficient to return only the values ${\theta }_{i}$ for $i=1,2,\dots ,r$ where ${C}_{ii}=\mathrm{cos}\left({\theta }_{i}\right)$ and ${S}_{ii}=\mathrm{sin}\left({\theta }_{i}\right)$.
The algorithm used to perform the complete $CS$ decomposition is described fully in Sutton (2009) including discussions of the stability and accuracy of the algorithm.

## 4References

Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J J, Du Croz J J, Greenbaum A, Hammarling S, McKenney A and Sorensen D (1999) LAPACK Users' Guide (3rd Edition) SIAM, Philadelphia http://www.netlib.org/lapack/lug
Golub G H and Van Loan C F (2012) Matrix Computations (4th Edition) Johns Hopkins University Press, Baltimore
Sutton B D (2009) Computing the complete $CS$ decomposition Numerical Algorithms (Volume 50) 1017–1398 Springer US 33–65 http://dx.doi.org/10.1007/s11075-008-9215-6

## 5Arguments

1:     $\mathbf{jobu1}$ – Character(1)Input
On entry:
• if ${\mathbf{jobu1}}=\text{'Y'}$, ${U}_{1}$ is computed;
• otherwise, ${U}_{1}$ is not computed.
2:     $\mathbf{jobu2}$ – Character(1)Input
On entry:
• if ${\mathbf{jobu2}}=\text{'Y'}$, ${U}_{2}$ is computed;
• otherwise, ${U}_{2}$ is not computed.
3:     $\mathbf{jobv1t}$ – Character(1)Input
On entry:
• if ${\mathbf{jobv1t}}=\text{'Y'}$, ${V}_{1}^{\mathrm{T}}$ is computed;
• otherwise, ${V}_{1}^{\mathrm{T}}$ is not computed.
4:     $\mathbf{jobv2t}$ – Character(1)Input
On entry:
• if ${\mathbf{jobv2t}}=\text{'Y'}$, ${V}_{2}^{\mathrm{T}}$ is computed;
• otherwise, ${V}_{2}^{\mathrm{T}}$ is not computed.
5:     $\mathbf{trans}$ – Character(1)Input
On entry:
• if ${\mathbf{trans}}=\text{'T'}$, $X$, ${U}_{1}$, ${U}_{2}$, ${V}_{1}^{\mathrm{T}}$ and ${V}_{2}^{\mathrm{T}}$ are stored in row-major order;
• otherwise, $X$, ${U}_{1}$, ${U}_{2}$, ${V}_{1}^{\mathrm{T}}$ and ${V}_{2}^{\mathrm{T}}$ are stored in column-major order.
6:     $\mathbf{signs}$ – Character(1)Input
On entry:
• if ${\mathbf{signs}}=\text{'O'}$, the lower-left block is made nonpositive (the other convention);
• otherwise, the upper-right block is made nonpositive (the default convention).
7:     $\mathbf{m}$ – IntegerInput
On entry: $m$, the number of rows and columns in the orthogonal matrix $X$.
Constraint: ${\mathbf{m}}\ge 0$.
8:     $\mathbf{p}$ – IntegerInput
On entry: $p$, the number of rows in ${X}_{11}$ and ${X}_{12}$.
Constraint: $0\le {\mathbf{p}}\le {\mathbf{m}}$.
9:     $\mathbf{q}$ – IntegerInput
On entry: $q$, the number of columns in ${X}_{11}$ and ${X}_{21}$.
Constraint: $0\le {\mathbf{q}}\le {\mathbf{m}}$.
10:   $\mathbf{x11}\left({\mathbf{ldx11}},*\right)$ – Real (Kind=nag_wp) arrayInput/Output
Note: the second dimension of the array x11 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$ if ${\mathbf{trans}}=\text{'T'}$, and at least ${\mathbf{q}}$ otherwise.
On entry: the upper left partition of the orthogonal matrix $X$ whose CSD is desired.
On exit: contains details of the orthogonal matrix used in a simultaneous bidiagonalization process.
11:   $\mathbf{ldx11}$ – IntegerInput
On entry: the first dimension of the array x11 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraints:
• if ${\mathbf{trans}}=\text{'T'}$, ${\mathbf{ldx11}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{q}}\right)$;
• otherwise ${\mathbf{ldx11}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$.
12:   $\mathbf{x12}\left({\mathbf{ldx12}},*\right)$ – Real (Kind=nag_wp) arrayInput/Output
Note: the second dimension of the array x12 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$ if ${\mathbf{trans}}=\text{'T'}$, and at least ${\mathbf{m}}-{\mathbf{q}}$ otherwise.
On entry: the upper right partition of the orthogonal matrix $X$ whose CSD is desired.
On exit: contains details of the orthogonal matrix used in a simultaneous bidiagonalization process.
13:   $\mathbf{ldx12}$ – IntegerInput
On entry: the first dimension of the array x12 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraints:
• if ${\mathbf{trans}}=\text{'T'}$, ${\mathbf{ldx12}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{q}}\right)$;
• otherwise ${\mathbf{ldx12}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$.
14:   $\mathbf{x21}\left({\mathbf{ldx21}},*\right)$ – Real (Kind=nag_wp) arrayInput/Output
Note: the second dimension of the array x21 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$ if ${\mathbf{trans}}=\text{'T'}$, and at least ${\mathbf{q}}$ otherwise.
On entry: the lower left partition of the orthogonal matrix $X$ whose CSD is desired.
On exit: contains details of the orthogonal matrix used in a simultaneous bidiagonalization process.
15:   $\mathbf{ldx21}$ – IntegerInput
On entry: the first dimension of the array x21 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraints:
• if ${\mathbf{trans}}=\text{'T'}$, ${\mathbf{ldx21}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{q}}\right)$;
• otherwise ${\mathbf{ldx21}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$.
16:   $\mathbf{x22}\left({\mathbf{ldx22}},*\right)$ – Real (Kind=nag_wp) arrayInput/Output
Note: the second dimension of the array x22 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$ if ${\mathbf{trans}}=\text{'T'}$, and at least ${\mathbf{m}}-{\mathbf{q}}$ otherwise.
On entry: the lower right partition of the orthogonal matrix $X$ CSD is desired.
On exit: contains details of the orthogonal matrix used in a simultaneous bidiagonalization process.
17:   $\mathbf{ldx22}$ – IntegerInput
On entry: the first dimension of the array x22 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraints:
• if ${\mathbf{trans}}=\text{'T'}$, ${\mathbf{ldx22}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{q}}\right)$;
• otherwise ${\mathbf{ldx22}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$.
18:   $\mathbf{theta}\left(\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{p}},{\mathbf{m}}-{\mathbf{p}},{\mathbf{q}},{\mathbf{m}}-{\mathbf{q}}\right)\right)$ – Real (Kind=nag_wp) arrayOutput
On exit: the values ${\theta }_{i}$ for $i=1,2,\dots ,r$ where $r=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,m-p,q,m-q\right)$. The diagonal submatrices $C$ and $S$ of ${\Sigma }_{p}$ are constructed from these values as
• $C=\mathrm{diag}\left(\mathrm{cos}\left({\mathbf{theta}}\left(1\right)\right),\dots ,\mathrm{cos}\left({\mathbf{theta}}\left(r\right)\right)\right)$ and
• $S=\mathrm{diag}\left(\mathrm{sin}\left({\mathbf{theta}}\left(1\right)\right),\dots ,\mathrm{sin}\left({\mathbf{theta}}\left(r\right)\right)\right)$.
19:   $\mathbf{u1}\left({\mathbf{ldu1}},*\right)$ – Real (Kind=nag_wp) arrayOutput
Note: the second dimension of the array u1 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$ if ${\mathbf{jobu1}}=\text{'Y'}$, and at least $1$ otherwise.
On exit: if ${\mathbf{jobu1}}=\text{'Y'}$, u1 contains the $p$ by $p$ orthogonal matrix ${U}_{1}$.
20:   $\mathbf{ldu1}$ – IntegerInput
On entry: the first dimension of the array u1 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraint: if ${\mathbf{jobu1}}=\text{'Y'}$, ${\mathbf{ldu1}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)$.
21:   $\mathbf{u2}\left({\mathbf{ldu2}},*\right)$ – Real (Kind=nag_wp) arrayOutput
Note: the second dimension of the array u2 must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$ if ${\mathbf{jobu2}}=\text{'Y'}$, and at least $1$ otherwise.
On exit: if ${\mathbf{jobu2}}=\text{'Y'}$, u2 contains the $m-p$ by $m-p$ orthogonal matrix ${U}_{2}$.
22:   $\mathbf{ldu2}$ – IntegerInput
On entry: the first dimension of the array u2 as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraint: if ${\mathbf{jobu2}}=\text{'Y'}$, ${\mathbf{ldu2}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)$.
23:   $\mathbf{v1t}\left({\mathbf{ldv1t}},*\right)$ – Real (Kind=nag_wp) arrayOutput
Note: the second dimension of the array v1t must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{q}}\right)$ if ${\mathbf{jobv1t}}=\text{'Y'}$, and at least $1$ otherwise.
On exit: if ${\mathbf{jobv1t}}=\text{'Y'}$, v1t contains the $q$ by $q$ orthogonal matrix ${{V}_{1}}^{\mathrm{T}}$.
24:   $\mathbf{ldv1t}$ – IntegerInput
On entry: the first dimension of the array v1t as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraint: if ${\mathbf{jobv1t}}=\text{'Y'}$, ${\mathbf{ldv1t}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{q}}\right)$.
25:   $\mathbf{v2t}\left({\mathbf{ldv2t}},*\right)$ – Real (Kind=nag_wp) arrayOutput
Note: the second dimension of the array v2t must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{q}}\right)$ if ${\mathbf{jobv2t}}=\text{'Y'}$, and at least $1$ otherwise.
On exit: if ${\mathbf{jobv2t}}=\text{'Y'}$, v2t contains the $m-q$ by $m-q$ orthogonal matrix ${{V}_{2}}^{\mathrm{T}}$.
26:   $\mathbf{ldv2t}$ – IntegerInput
On entry: the first dimension of the array v2t as declared in the (sub)program from which f08raf (dorcsd) is called.
Constraint: if ${\mathbf{jobv2t}}=\text{'Y'}$, ${\mathbf{ldv2t}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{q}}\right)$.
27:   $\mathbf{work}\left(\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{lwork}}\right)\right)$ – Real (Kind=nag_wp) arrayWorkspace
On exit: if ${\mathbf{info}}={\mathbf{0}}$, ${\mathbf{work}}\left(1\right)$ returns the optimal lwork.
If ${\mathbf{info}}>{\mathbf{0}}$ on exit, ${\mathbf{work}}\left(2:\mathit{r}\right)$ contains the values $\mathrm{PHI}\left(1\right),\dots \mathrm{PHI}\left(\mathit{r}-1\right)$ that, together with ${\mathrm{xref}}^{arg}\left(1\right),\dots {\mathrm{xref}}^{arg}\left(\mathit{r}\right)$, define the matrix in intermediate bidiagonal-block form remaining after nonconvergence. info specifies the number of nonzero PHI's.
28:   $\mathbf{lwork}$ – IntegerInput
On entry: the dimension of the array work as declared in the (sub)program from which f08raf (dorcsd) is called.
If ${\mathbf{lwork}}=-1$, a workspace query is assumed; the routine only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued.
The minimum workspace required is $5×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,r-1\right)+4×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,r\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,8r\right)+\phantom{\rule{0ex}{0ex}}\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,p\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,m-p\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,q\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,m-q\right)+1$ where $r=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(p,m-p,q,m-q\right)$. The optimal workspace depends on internal block sizes and the relative dimensions of the problem.
Constraint: ${\mathbf{lwork}}=-1$ or
${\mathbf{lwork}}\ge 5×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,\mathit{r}-1\right)+4×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,\mathit{r}\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,8\mathit{r}\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{p}}\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{p}}\right)+\phantom{\rule{0ex}{0ex}}\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{q}}\right)+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}-{\mathbf{q}}\right)+1$.
29:   $\mathbf{iwork}\left({\mathbf{m}}-\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{p}},{\mathbf{m}}-{\mathbf{p}},{\mathbf{q}},{\mathbf{m}}-{\mathbf{q}}\right)\right)$ – Integer arrayWorkspace
30:   $\mathbf{info}$ – IntegerOutput
On exit: ${\mathbf{info}}=0$ unless the routine detects an error (see Section 6).

## 6Error Indicators and Warnings

${\mathbf{info}}<0$
If ${\mathbf{info}}=-i$, argument $i$ had an illegal value. An explanatory message is output, and execution of the program is terminated.
${\mathbf{info}}>0$
The Jacobi-type procedure failed to converge during an internal reduction to bidiagonal-block form. The process requires convergence to $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{p}},{\mathbf{m}}-{\mathbf{p}},{\mathbf{q}},{\mathbf{m}}-{\mathbf{q}}\right)$ values, the value of info gives the number of converged values.

## 7Accuracy

The computed $CS$ decomposition is nearly the exact $CS$ decomposition for the nearby matrix $\left(X+E\right)$, where
 $E2 = Oε ,$
and $\epsilon$ is the machine precision.

## 8Parallelism and Performance

f08raf (dorcsd) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
f08raf (dorcsd) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

The total number of floating-point operations required to perform the full $CS$ decomposition is approximately $2{m}^{3}$.
The complex analogue of this routine is f08rnf (zuncsd).

## 10Example

This example finds the full CS decomposition of
 $X = -0.7576 0.3697 0.3838 0.2126 -0.3112 -0.4077 -0.1552 -0.1129 0.2676 0.8517 -0.0488 0.7240 -0.6730 -0.1301 0.0602 -0.2287 0.0088 0.2235 -0.9235 0.2120 0.4530 0.5612 0.5806 0.1162 0.3595$
partitioned so that the top left block is $3$ by $2$.
The decomposition is performed both on submatrices of the orthogonal matrix $X$ and on separated partition matrices. Code is also provided to perform a recombining check if required.

### 10.1Program Text

Program Text (f08rafe.f90)

### 10.2Program Data

Program Data (f08rafe.d)

### 10.3Program Results

Program Results (f08rafe.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017