using a Discrete Cosine Transform (DCT). The function can be used as a building block in Randomised Numerical Linear Algebra (RNLA) algorithms, such as

Q R

decompositions, Singular Value Decompositions (SVDs), and (approximate) least squares solvers.

2 Specification

#include <nag.h>

void	f10dac (Nag_TransType trans, Integer m, Integer n, double a[], Integer pda, Integer k, Integer state[], NagError *fail)

The function may be called by the names: f10dac or nag_rnla_randproj_dct_real.

3 Description

A random projection is written as either

Y = A Ω

Y = Ω A,

where

A

is a real

m \times n

matrix,

Ω

is an

n \times k

matrix in the first case, and a

k \times m

matrix in the second case. These cases are referred to as random projection by post-multiplication and random projection by pre-multiplication, respectively.

When a random projection by post-multiplication uses the DCT, it is written as

Ω = D F C,

where

D

is a diagonal matrix whose values are uniformly distributed on the set

{−1, + 1}

F

is a DCT, and

C

is a matrix that selects a subset of columns, uniformly at random.

When a random projection by pre-multiplication uses the DCT, it is written as

Ω = R F D .

The operators

D

and

F

are as above and

R

is a matrix that selects a subset of rows, again uniformly at random.

None of these matrix operators require a full matrix-matrix product to be computed. The computational complexity of applying this type of random projection is

O (m n \log (k))

. More details of the DCT-based random projection can be found in Avron et al. (2010).

The DCT-based random projection is closely related to the Subsampled Random Fourier Transform (SRFT) presented in Section 4.6 of Halko et al. (2011).

4 References

Avron H, Maymounkov P and Toledo S (2010) Blendenpik: Supercharging LAPACK's least-squares solver SIAM J. Sci. Comput. 32(3) 1217–1236

Halko N (2012) Randomized methods for computing low-rank approximations of matrices PhD thesis

Halko N, Martinsson P G and Tropp J A (2011) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions SIAM Rev. 53(2) 217–288 https://epubs.siam.org/doi/abs/10.1137/090771806

5 Arguments

1: $trans$ – Nag_TransType Input

On entry: specifies whether the operation pre-multiplies

A

or post-multiplies

A

$trans = Nag_NoTrans$: Random projection is done by post-multiplication, $Y = A Ω$ .
$trans = Nag_Trans$: Random projection is done by pre-multiplication, $Y = Ω A$ .

Constraint:

trans = Nag_NoTrans

Nag_Trans

2: $m$ – Integer Input

On entry:

m

, the number of rows of the matrix

A

Constraint:

m > 0

3: $n$ – Integer Input

On entry:

n

, the number of columns of the matrix

A

Constraint:

n > 0

4: $a [\dim]$ – double Input/Output

Note: the dimension, dim, of the array a must be at least

pda \times n

The

(i, j)

th element of the matrix

A

is stored in

a [(j - 1) \times pda + i - 1]

On entry: the

m \times n

matrix

A

On exit: if

trans = Nag_NoTrans

, the first

k

columns of

A

are overwritten with the

m \times k

random projection of

A

trans = Nag_Trans

, the first

k

rows of

A

are overwritten with the

k \times n

random projection of

A

5: $pda$ – Integer Input

On entry: the stride separating matrix row elements in the array a.

Constraint:

pda \geq m

6: $k$ – Integer Input

On entry:

k

, number of columns in the random projection,

Y = A Ω

Constraints:

if $trans = Nag_NoTrans$ , $0 < k \leq n$ ;
if $trans = Nag_Trans$ , $0 < k \leq m$ .

7: $state [\dim]$ – Integer Communication Array

Note: the dimension,

\dim

, of this array is dictated by the requirements of associated functions that must have been previously called. This array MUST be the same array passed as argument state in the previous call to nag_rand_init_repeatable (g05kfc) or nag_rand_init_nonrepeatable (g05kgc).

On entry: contains information on the selected base generator and its current state.

On exit: contains updated information on the state of the generator.

8: $fail$ – NagError * Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM: On entry, argument $⟨ value ⟩$ had an illegal value.
NE_INT: On entry, $k = ⟨ value ⟩$ and $trans = ⟨ value ⟩$ .
Constraint: if $trans = Nag_NoTrans$ , $0 < k \leq n$ , if $trans = Nag_Trans$ , $0 < k \leq m$ .

On entry, $m = ⟨ value ⟩$ .
Constraint: $m > 0$ .

On entry, $n = ⟨ value ⟩$ .
Constraint: $n > 0$ .

On entry, $pda = ⟨ value ⟩$ .
Constraint: $pda \geq m$ .

On entry, state vector has been corrupted or not initialized.
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.

7 Accuracy

The accuracy of algorithms that use f10dac depend on the extent to which the random projection,

Y

, captures the range (i.e., column space) of the matrix

A

. More formally, the following probabilistic error bound holds,

‖ (I - P_{Y}) A ‖ \leq \sqrt{1 + 7 n / k} \cdot σ_{k + 1}

with failure probability at most

O (k^{−1})

, and where

σ_{k + 1}

is the

(k + 1)

th singular value and the norm on the left-hand side of the equation is the spectral norm.

The matrix

(I - P_{Y})

is sometimes referred to as the orthogonal projector onto the complementary subspace,

{range (P_{Y})}^{⊥}

Informally,

Y

captures the range of

A

well if

A = P_{Y} A + (I - P_{Y}) A

is dominated by the first term,

P_{Y} A

See Section 8–11 of Halko et al. (2011) for more details.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

f10dac is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

f10dac uses the same DCT as c06rfc. The time taken by c06rfc is fastest if the only prime factors of the projected dimension are

2

3

5

. In order to make the performance of f10dac less sensitive to the size of the projected dimension, the input, a, is padded with zeros up to the nearest power of

2

or the nearest

1000

, whichever is smaller.

10 Example

This example applies a random projection using the DCT to the

6 \times 6

matrix

A = (\begin{array}{r} 2.27 & 0.28 & - 0.48 & 1.07 & - 2.35 & 0.62 \\ - 1.54 & - 1.67 & - 3.09 & 1.22 & 2.93 & - 7.39 \\ 1.15 & 0.94 & 0.99 & 0.79 & - 1.45 & 1.03 \\ - 1.94 & - 0.78 & - 0.21 & 0.63 & 2.30 & - 2.57 \\ - 1.94 & - 0.78 & - 0.21 & 0.63 & 2.30 & - 2.57 \\ - 1.94 & - 0.78 & - 0.21 & 0.63 & 2.30 & - 2.57 \end{array}) .

It then computes a

6 \times 5

orthornormal matrix

Q

and a probabilistic error bound,

E

that quantifies how well the range of

Q

approximates the range of

A

. The matrix

Q

is calculated through a

Q R

factorization of the projection

Y = A Ω

. The error bound is based on the following result from Section 4.3 of Halko et al. (2011). Given a sequence

{ω^{(i)} : i = 1, 2, \dots, r}

of standard Gaussian vectors, where

r

is some positive integer, then the measure,

E

, of the error in how well the projection captures the range of

A

is given by

‖ (I - P_{Y}) A ‖ = ‖ (I - Q Q^{*}) A ‖ \leq 10 \sqrt{\frac{2}{π}} \max_{i = 1, \dots, r} ‖ (I - Q Q^{*}) A ω^{(i)} ‖ = E, with probability ​ 1 - 10^{−r}

where

‖ \cdot ‖

is the spectral norm when applied to a matrix and the Euclidean norm when applied to a vector.

The exact approximation error

‖ (I - Q Q^{*}) A ‖

is also computed for comparison.

This example is constructed so that the true rank of

A

4

: the last

3

rows are identical so the matrix only has

4

linearly independent rows. This means that when

k \geq 4

the accuracy of

Q

will be close to machine precision.

f10da: FL CL CPP AD PY MB

NAG CL Interfacef10dac (randproj_​dct_​real)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

NAG CL Interface
f10dac (randproj_dct_real)