g03caf computes the maximum likelihood estimates of the parameters of a factor analysis model. Either the data matrix or a correlation/covariance matrix may be input. Factor loadings, communalities and residual correlations are returned.

2 Specification

Fortran Interface

Subroutine g03caf ( matrix, weight, n, m, x, ldx, nvar, isx, nfac, wt, e, stat, com, psi, res, fl, ldfl, iop, iwk, wk, lwk, ifail)

Integer, Intent (In)	::	n, m, ldx, nvar, isx(m), nfac, ldfl, iop(5), lwk
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	iwk(4*nvar+2)
Real (Kind=nag_wp), Intent (In)	::	x(ldx,m), wt(*)
Real (Kind=nag_wp), Intent (Inout)	::	fl(ldfl,nfac)
Real (Kind=nag_wp), Intent (Out)	::	e(nvar), stat(4), com(nvar), psi(nvar), res(nvar*(nvar-1)/2), wk(lwk)
Character (1), Intent (In)	::	matrix, weight

C Header Interface

#include <nag.h>

void g03caf_ (const char *matrix, const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const Integer *nvar, const Integer isx[], const Integer *nfac, const double wt[], double e[], double stat[], double com[], double psi[], double res[], double fl[], const Integer *ldfl, const Integer iop[], Integer iwk[], double wk[], const Integer *lwk, Integer *ifail, const Charlen length_matrix, const Charlen length_weight)

The routine may be called by the names g03caf or nagf_mv_factor.

3 Description

Let $p$ variables, $x_{1}, x_{2}, \dots, x_{p}$, with variance-covariance matrix $Σ$ be observed. The aim of factor analysis is to account for the covariances in these $p$ variables in terms of a smaller number, $k$, of hypothetical variables, or factors, $f_{1}, f_{2}, \dots, f_{k}$. These are assumed to be independent and to have unit variance. The relationship between the observed variables and the factors is given by the model:

x_{i} = \sum_{j = 1}^{k} λ_{i j} f_{j} + e_{i}, i = 1, 2, \dots, p

where $λ_{ij}$, for $i = 1, 2, \dots, p$ and $j = 1, 2, \dots, k$, are the factor loadings and $e_{i}$, for $i = 1, 2, \dots, p$, are independent random variables with variances $ψ_{i}$. The $ψ_{i}$ represent the unique component of the variation of each observed variable. The proportion of variation for each variable accounted for by the factors is known as the communality. For this routine it is assumed that both the $k$ factors and the $e_{i}$'s follow independent Normal distributions.

The model for the variance-covariance matrix, $Σ$, can be written as:

Σ = Λ Λ^{T} + Ψ \qquad (1)

where $Λ$ is the matrix of the factor loadings, $λ_{ij}$, and $Ψ$ is a diagonal matrix of unique variances, $ψ_{i}$, for $i = 1, 2, \dots, p$.
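Relation (1) and the communality can be illustrated with a short C sketch. The helpers `implied_cov` and `communality` are hypothetical (they are not NAG routines) and assume 0-based indices with row-major storage, `lambda[i*k + j]` holding $λ_{ij}$. The communality is computed here as the proportion $h_{i} / (h_{i} + ψ_{i})$ of the model variance of $x_{i}$, where $h_{i} = \sum_{j} λ_{ij}^{2}$; on the correlation scale, where the model variances are one, this is simply $h_{i}$.

```c
#include <assert.h>

/* Hypothetical helper (not a NAG routine): Sigma = Lambda Lambda^T + Psi.
 * lambda is p x k, row-major (lambda[i*k + j] = lambda_ij, 0-based);
 * psi holds the p unique variances; sigma receives the p x p result. */
void implied_cov(int p, int k, const double *lambda, const double *psi,
                 double *sigma)
{
    assert(p >= 1 && k >= 1);
    for (int i = 0; i < p; i++) {
        for (int l = 0; l < p; l++) {
            double s = 0.0;
            for (int j = 0; j < k; j++)
                s += lambda[i * k + j] * lambda[l * k + j];
            /* Psi is diagonal, so it only contributes when i == l. */
            sigma[i * p + l] = s + (i == l ? psi[i] : 0.0);
        }
    }
}

/* Hypothetical helper: communality of each variable as the proportion of
 * its model variance explained by the factors, h_i / (h_i + psi_i), where
 * h_i = sum_j lambda_ij^2. */
void communality(int p, int k, const double *lambda, const double *psi,
                 double *com)
{
    for (int i = 0; i < p; i++) {
        double h = 0.0;
        for (int j = 0; j < k; j++)
            h += lambda[i * k + j] * lambda[i * k + j];
        com[i] = h / (h + psi[i]);
    }
}
```

With one factor, loadings 0.9, 0.8, 0.7 and unique variances 0.19, 0.36, 0.51, the implied $Σ$ has a unit diagonal (so it is a correlation matrix) and the communalities are 0.81, 0.64 and 0.49.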

The estimation of the parameters of the model, $Λ$ and $Ψ$, by maximum likelihood is described by Lawley and Maxwell (1971). The log-likelihood is:

- \frac{1}{2} (n - 1) \log (| Σ |) - \frac{1}{2} (n - 1) \operatorname{trace} (S Σ^{- 1}) + \text{constant},

where $n$ is the number of observations and $S$ is the sample variance-covariance matrix; if weights are used, $S$ is the weighted sample variance-covariance matrix and $n$ is the effective number of observations, that is, the sum of the weights. The constant is independent of the parameters of the model. A two-stage maximization is employed. It makes use of the function $F (Ψ)$, which is, up to a constant, $-2 / (n - 1)$ times the log-likelihood maximized over $Λ$. This is then minimized with respect to $Ψ$ to give the estimates, $\hat{Ψ}$, of $Ψ$. The function $F (Ψ)$ can be written as:

F (Ψ) = \sum_{j = k + 1}^{p} (θ_{j} - \log θ_{j}) - (p - k)

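where the values $θ_{j}$, for $j = 1, 2, \dots, p$, are the eigenvalues, arranged in descending order ($θ_{1} \geq θ_{2} \geq \dots \geq θ_{p}$), of the matrix:

S^{*} = Ψ^{- 1 / 2} S Ψ^{- 1 / 2} .

The sum in $F (Ψ)$ therefore runs over the $p - k$ smallest eigenvalues.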

The estimates $\hat{Λ}$, of $Λ$, are then given by scaling the eigenvectors of $S^{*}$ that correspond to its $k$ largest eigenvalues. With these eigenvectors as the columns of $V$,

\hat{Λ} = Ψ^{1 / 2} V {(Θ - I)}^{1 / 2} ,

where $Θ$ is the diagonal matrix with elements $θ_{i}$, for $i = 1, 2, \dots, k$, and $I$ is the identity matrix.

The minimization of $F (Ψ)$ is performed using e04lbf, which uses a modified Newton algorithm. The computation of the Hessian matrix is described by Clark (1970). However, instead of using the eigenvalue decomposition of the matrix $S^{*}$ as described above, the singular value decomposition of the matrix $R Ψ^{- 1 / 2}$ is used, where $R$ is obtained either from the $Q R$ decomposition of the (scaled) mean centred data matrix or from the Cholesky decomposition of the correlation/covariance matrix. Since the matrix being analysed is proportional to $R^{T} R$, the squared singular values of $R Ψ^{- 1 / 2}$ are proportional to the eigenvalues $θ_{j}$ of $S^{*}$, and its right singular vectors are the corresponding eigenvectors. The routine e04lbf ensures that the values of $ψ_{i}$ are greater than a given small positive quantity, $δ$ (set by eps in iop), so that the communality is always less than $1$. This avoids the so-called Heywood cases.

In addition to the values of $Λ$, $Ψ$ and the communalities, g03caf returns the residual correlations, i.e., the off-diagonal elements of

C - (Λ Λ^{T} + Ψ)

where $C$ is the sample correlation matrix. g03caf also returns the test statistic:

χ^{2} = [n - 1 - (2 p + 5) / 6 - 2 k / 3] F (\hat{Ψ})

which can be used to test the goodness-of-fit of the model (1); see Lawley and Maxwell (1971) and Morrison (1967).
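The multiplier in $χ^{2}$ is simple arithmetic. The sketch below evaluates it, together with the degrees of freedom $\frac{1}{2} [(p - k)^{2} - (p + k)]$ given by Lawley and Maxwell (1971) for this test. The function names are hypothetical, and that stat (3) is computed with the same formula is an assumption here.

```c
#include <assert.h>

/* chi^2 = [n - 1 - (2p + 5)/6 - 2k/3] F(psi-hat); n may be an effective
 * (non-integer) number of observations when weights are used. */
double chi2_stat(double n, int p, int k, double f_hat)
{
    assert(p >= 2 && k >= 1 && k <= p);
    return (n - 1.0 - (2.0 * p + 5.0) / 6.0 - 2.0 * k / 3.0) * f_hat;
}

/* Degrees of freedom ((p - k)^2 - (p + k)) / 2 for the goodness-of-fit test. */
double chi2_df(int p, int k)
{
    assert(p >= 2 && k >= 1 && k <= p);
    return 0.5 * ((double)(p - k) * (p - k) - (p + k));
}
```

With $p = 9$ and $k = 3$ there are 12 degrees of freedom; with $p = 3$ and $k = 1$ there are none, so the test carries no information for that model.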

4 References

Clark M R B (1970) A rapidly convergent method for maximum likelihood factor analysis British J. Math. Statist. Psych.

Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25

Lawley D N and Maxwell A E (1971) Factor Analysis as a Statistical Method (2nd Edition) Butterworths

Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

5 Arguments

1: $matrix$ – Character(1) Input

On entry: selects the type of matrix on which factor analysis is to be performed.

$matrix ='D'$: The data matrix will be input in x and factor analysis will be computed for the correlation matrix.
$matrix ='S'$: The data matrix will be input in x and factor analysis will be computed for the covariance matrix, i.e., the results are scaled as described in Section 9.
$matrix ='C'$: The correlation/variance-covariance matrix will be input in x and factor analysis computed for this matrix.

See Section 9.

Constraint: $matrix ='D'$, $'S'$ or $'C'$.

2: $weight$ – Character(1) Input

On entry: if $matrix ='D'$ or $'S'$, weight indicates if weights are to be used.

$weight ='U'$: No weights are used.
$weight ='W'$: Weights are used and must be supplied in wt.

Note: if $matrix ='C'$, weight is not referenced.

Constraint: if $matrix ='D'$ or $'S'$, $weight ='U'$ or $'W'$.

3: $n$ – Integer Input

On entry: if $matrix ='D'$ or $'S'$, the number of observations in the data array x; if $matrix ='C'$, the (effective) number of observations used in computing the (possibly weighted) correlation/variance-covariance matrix input in x.

Constraint: $n > nvar$.

4: $m$ – Integer Input

On entry: the number of variables in the data/correlation/variance-covariance matrix.

Constraint: $m \geq nvar$.

5: $x (ldx, m)$ – Real (Kind=nag_wp) array Input

On entry: the input matrix.

If $matrix ='D'$ or $'S'$, x must contain the data matrix, i.e., $x (i, j)$ must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

If $matrix ='C'$, x must contain the correlation or variance-covariance matrix. Only the upper triangular part is required.

6: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g03caf is called.

Constraints:

if $matrix ='D'$ or $'S'$ , $ldx \geq n$ ;
if $matrix ='C'$ , $ldx \geq m$ .

7: $nvar$ – Integer Input

On entry: $p$, the number of variables in the factor analysis.

Constraint: $nvar \geq 2$.

8: $isx (m)$ – Integer array Input

On entry: $isx (j)$ indicates whether or not the $j$th variable is included in the factor analysis. If $isx (j) \geq 1$, the variable represented by the $j$th column of x is included in the analysis; otherwise it is excluded, for $j = 1, 2, \dots, m$.

Constraint: $isx (j) > 0$ for nvar values of $j$.
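A caller can check the isx constraint before calling g03caf. The hypothetical helper below (not a NAG routine) collects the 0-based indices of the selected columns of x; its return value must equal nvar, otherwise the routine exits with $ifail = 3$.

```c
#include <assert.h>

/* Store in cols the 0-based indices j with isx[j] > 0 and return how many
 * there are. isx uses 0-based C indexing: isx[j] corresponds to isx(j + 1). */
int select_vars(int m, const int *isx, int *cols)
{
    assert(m >= 1);
    int count = 0;
    for (int j = 0; j < m; j++)
        if (isx[j] > 0)
            cols[count++] = j;
    return count;
}
```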

9: $nfac$ – Integer Input

On entry: $k$, the number of factors.

Constraint: $1 \leq nfac \leq nvar$.

10: $wt (*)$ – Real (Kind=nag_wp) array Input

Note: the dimension of the array wt must be at least $n$ if $weight ='W'$ and $matrix ='D'$ or $'S'$, and at least $1$ otherwise.

On entry: if $weight ='W'$ and $matrix ='D'$ or $'S'$, wt must contain the weights to be used in the factor analysis. The effective number of observations in the analysis will then be the sum of weights. If $wt (i) = 0.0$, the $i$th observation is not included in the analysis.

If $weight ='U'$ or $matrix ='C'$, wt is not referenced and the effective number of observations is $n$.

Constraints: if $weight ='W'$,

$wt (i) \geq 0.0$, for $i = 1, 2, \dots, n$;
the sum of weights $> nvar$.

11: $e (nvar)$ – Real (Kind=nag_wp) array Output

On exit: the eigenvalues $θ_{i}$, for $i = 1, 2, \dots, p$.

12: $stat (4)$ – Real (Kind=nag_wp) array Output

On exit: the test statistics.

$stat (1)$: Contains the value $F (\hat{Ψ})$ .
$stat (2)$: Contains the test statistic, $χ^{2}$ .
$stat (3)$: Contains the degrees of freedom associated with the test statistic.
$stat (4)$: Contains the significance level.

13: $com (nvar)$ – Real (Kind=nag_wp) array Output

On exit: the communalities.

14: $psi (nvar)$ – Real (Kind=nag_wp) array Output

On exit: the estimates of $ψ_{i}$, for $i = 1, 2, \dots, p$.

15: $res (nvar \times (nvar - 1) / 2)$ – Real (Kind=nag_wp) array Output

On exit: the residual correlations. The residual correlation for the $i$th and $j$th variables is stored in $res ((j - 1) (j - 2) / 2 + i)$, for $i < j$.
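The packed storage of res follows the column-by-column order of the strict upper triangle. The hypothetical helpers below (not NAG routines) compute the 1-based position given above and fill a 0-based C array with the residuals $c_{ij} - \sum_{m} λ_{im} λ_{jm}$; $Ψ$ is diagonal, so it does not enter. The correlation matrix `c` (only its upper triangle is read) and the $p \times k$ loadings are assumed to be row-major.

```c
#include <assert.h>

/* 1-based position in res of the pair (i, j), 1 <= i < j. */
int res_index(int i, int j)
{
    assert(1 <= i && i < j);
    return (j - 1) * (j - 2) / 2 + i;
}

/* Off-diagonal residuals of C - (Lambda Lambda^T + Psi), packed like res. */
void residuals(int p, int k, const double *c, const double *lambda,
               double *res)
{
    for (int j = 2; j <= p; j++)
        for (int i = 1; i < j; i++) {
            double fit = 0.0;
            for (int m = 0; m < k; m++)
                fit += lambda[(i - 1) * k + m] * lambda[(j - 1) * k + m];
            res[res_index(i, j) - 1] = c[(i - 1) * p + (j - 1)] - fit;
        }
}
```

For $p = 4$ the pairs (1,2), (1,3), (2,3), (1,4), (2,4), (3,4) occupy positions 1 to 6 in that order.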

16: $fl (ldfl, nfac)$ – Real (Kind=nag_wp) array Output

On exit: the factor loadings. $fl (i, j)$ contains $λ_{ij}$, for $i = 1, 2, \dots, p$ and $j = 1, 2, \dots, k$.

17: $ldfl$ – Integer Input

On entry: the first dimension of the array fl as declared in the (sub)program from which g03caf is called.

Constraint: $ldfl \geq nvar$.

18: $iop (5)$ – Integer array Input

On entry: options for the optimization. There are four options to be set:

$iprint$	controls iteration monitoring;
	if $iprint \leq 0$, no information is printed; if $iprint > 0$, information is printed every iprint iterations. The information printed consists of the value of $F (Ψ)$ at that iteration, the number of evaluations of $F (Ψ)$, the current estimates of the communalities and an indication of whether or not they are at the boundary.
$maxfun$	the maximum number of function evaluations.
$acc$	the required accuracy for the estimates of $ψ_{i}$ .
$eps$	a lower bound for the values of $ψ$ , see Section 3.

Let $ε$ be the machine precision. If $iop (1) = 0$, the following default values are used:

$iprint = −1$
$maxfun = 100 p$
$acc = 10 \sqrt{ε}$
$eps = ε$

If $iop (1) \neq 0$, then

$iprint = iop (2)$
$maxfun = iop (3)$
$acc = 10^{- l}$ where $l = iop (4)$
$eps = 10^{- l}$ where $l = iop (5)$

Constraint: if $iop (1) \neq 0$, $iop (i)$, for $i = 3, 4, 5$, must be such that $maxfun \geq 1$, $ε \leq acc < 1$ and $ε \leq eps < 1$.

19: $iwk (4 \times nvar + 2)$ – Integer array Workspace

20: $wk (lwk)$ – Real (Kind=nag_wp) array Workspace

21: $lwk$ – Integer Input

On entry: the dimension of the array wk as declared in the (sub)program from which g03caf is called.

Constraints:

if $matrix ='D'$ or $'S'$ , $lwk \geq \max ((5 \times nvar \times nvar + 33 \times nvar - 4) / 2, n \times nvar + 7 \times nvar + nvar \times (nvar - 1) / 2)$ ;
if $matrix ='C'$ , $lwk \geq (5 \times nvar \times nvar + 33 \times nvar - 4) / 2$ .
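The workspace bounds above can be evaluated before allocating wk. The helper below is a hypothetical convenience, not part of the library; it uses `long` to reduce the risk of overflow for large $n \times nvar$. Since $5 \times nvar^{2} + 33 \times nvar = nvar (5 \times nvar + 33)$ is always even, the division by 2 is exact.

```c
#include <assert.h>

/* Minimum lwk for g03caf as stated above (matrix is 'D', 'S' or 'C'). */
long min_lwk(char matrix, long n, long nvar)
{
    assert(matrix == 'D' || matrix == 'S' || matrix == 'C');
    long a = (5 * nvar * nvar + 33 * nvar - 4) / 2;
    if (matrix == 'C')
        return a;
    long b = n * nvar + 7 * nvar + nvar * (nvar - 1) / 2;
    return a > b ? a : b;
}
```

For example, with $nvar = 9$ and a correlation matrix input, lwk must be at least 349; for a 211 by 9 data matrix it must be at least 1998.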

22: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $−1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $−1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $−1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $−1$ is recommended since useful values can be provided in some output arguments even when $ifail \neq 0$ on exit. When the value $- 1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $−1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

Note: in some cases g03caf may return useful information.

$ifail = 1$: On entry, $iop (1) \neq 0$ and $iop (3) = ⟨ value ⟩$.
Constraint: $maxfun \geq 1$.

On entry, $iop (1) \neq 0$ and $iop (4) = ⟨ value ⟩$.
Constraint: machine precision $\leq acc < 1$.

On entry, $iop (1) \neq 0$ and $iop (5) = ⟨ value ⟩$.
Constraint: machine precision $\leq eps < 1$.

On entry, $ldfl = ⟨ value ⟩$ and $nvar = ⟨ value ⟩$ .
Constraint: $ldfl \geq nvar$ .

On entry, $ldx = ⟨ value ⟩$ and $m = ⟨ value ⟩$ .
Constraint: $ldx \geq m$ .

On entry, $ldx = ⟨ value ⟩$ and $n = ⟨ value ⟩$ .
Constraint: $ldx \geq n$ .

On entry, $lwk = ⟨ value ⟩$ .
Constraint: $lwk \geq ⟨ value ⟩$ .

On entry, $m = ⟨ value ⟩$ and $nvar = ⟨ value ⟩$ .
Constraint: $m \geq nvar$ .

On entry, $matrix = ⟨ value ⟩$ .
Constraint: $matrix ='D'$ , $'S'$ or $'C'$ .

On entry, $n = ⟨ value ⟩$ and $nvar = ⟨ value ⟩$ .
Constraint: $n > nvar$ .

On entry, $nfac = ⟨ value ⟩$ .
Constraint: $nfac \geq 1$ .

On entry, $nfac = ⟨ value ⟩$ and $nvar = ⟨ value ⟩$ .
Constraint: $nfac \leq nvar$ .

On entry, $nvar = ⟨ value ⟩$ .
Constraint: $nvar > 1$ .

On entry, $weight = ⟨ value ⟩$ .
Constraint: when $matrix ='D'$ or $'S'$ , $weight ='U'$ or $'W'$ .

$ifail = 2$: On entry, $i = ⟨ value ⟩$ and $wt (i) < 0.0$ .
Constraint: $wt (i) \geq 0.0$ .

$ifail = 3$: On entry, $nvar = ⟨ value ⟩$ and $⟨ value ⟩$ values of $isx > 0$.
Constraint: exactly nvar elements of $isx > 0$ .

The effective number of observations $\leq 1$ .

The number of variables $\geq$ number of included observations.

$ifail = 4$: On entry, the data matrix is not of full column rank or the input correlation/covariance matrix is not positive definite.

Two eigenvalues of $S^{*}$ are equal. This error exit is rare (see Lawley and Maxwell (1971)), and may be due to the data/correlation matrix being almost singular.

$ifail = 5$: The singular value decomposition has failed to converge. This is an unlikely error exit.

$ifail = 6$: The estimation procedure has failed to converge in $⟨ value ⟩$ iterations. Change iop to either increase the maximum number of function evaluations, $maxfun$, or increase the value of $acc$.

$ifail = 7$: The convergence is not certain but a lower point could not be found. All results are computed.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

The accuracy achieved is discussed in the documentation for e04lbf, with the value of its argument xtol given by $acc$ as described in the argument iop.

8 Parallelism and Performance

g03caf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g03caf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The factor loadings may be orthogonally rotated by using g03baf, and factor score coefficients can be computed using g03ccf. The maximum likelihood estimators are invariant to a change in scale. This means that the results obtained will be the same (up to a scaling factor) whether the correlation matrix or the variance-covariance matrix is used. As the correlation matrix ensures that all values of $ψ_{i}$ are between $0$ and $1$, it will lead to a more efficient optimization. When the data matrix is input, the results are always computed for the correlation matrix and then scaled if the results for the covariance matrix are required. When you input the covariance/correlation matrix, the input matrix itself is used, and you are advised to input the correlation matrix rather than the covariance matrix.
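The scale invariance can be made concrete. If $D$ is the diagonal matrix of standard deviations, so that the covariance matrix is $D C D$, then the covariance-scale estimates are $D \hat{Λ}$ and $D^{2} \hat{Ψ}$, while the communalities, as proportions, are unchanged. The hypothetical helper below applies this rescaling to correlation-scale results; it illustrates the relationship and is not claimed to be the routine's internal code.

```c
#include <assert.h>

/* Rescale correlation-scale estimates to the covariance scale:
 * lambda_ij -> sd_i * lambda_ij and psi_i -> sd_i^2 * psi_i.
 * lambda is p x k, row-major; sd holds the p standard deviations. */
void scale_to_cov(int p, int k, const double *sd, double *lambda, double *psi)
{
    for (int i = 0; i < p; i++) {
        assert(sd[i] > 0.0);
        for (int j = 0; j < k; j++)
            lambda[i * k + j] *= sd[i];
        psi[i] *= sd[i] * sd[i];
    }
}
```

After rescaling, each model variance $\sum_{j} λ_{ij}^{2} + ψ_{i}$ equals the squared standard deviation of the corresponding variable whenever it equalled one on the correlation scale.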

10 Example

This example is taken from Lawley and Maxwell (1971). The correlation matrix for nine variables is input and the parameters of a factor analysis model with three factors are estimated and printed.


NAG FL Interface g03caf (factor)
