
# NAG Library Function Document: nag_mv_factor (g03cac)

## 1  Purpose

nag_mv_factor (g03cac) computes the maximum likelihood estimates of the parameters of a factor analysis model. Either the data matrix or a correlation/covariance matrix may be input. Factor loadings, communalities and residual correlations are returned.

## 2  Specification

 #include <nag.h>
 #include <nagg03.h>
 void nag_mv_factor (Nag_FacMat matrix, Integer n, Integer m, const double x[], Integer tdx, Integer nvar, const Integer isx[], Integer nfac, const double wt[], double e[], double stat[], double com[], double psi[], double res[], double fl[], Integer tdfl, Nag_E04_Opt *options, double eps, NagError *fail)

## 3  Description

Let $p$ variables, ${x}_{1},{x}_{2},\dots ,{x}_{p}$, with variance-covariance matrix $\Sigma$ be observed. The aim of factor analysis is to account for the covariances in these $p$ variables in terms of a smaller number, $k$, of hypothetical variables, or factors, ${f}_{1},{f}_{2},\dots ,{f}_{k}$. These are assumed to be independent and to have unit variance. The relationship between the observed variables and the factors is given by the model:
 ${x}_{i}=\sum _{j=1}^{k}{\lambda }_{ij}{f}_{j}+{e}_{i}$
${\lambda }_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,p$ and $\mathit{j}=1,2,\dots ,k$, are the factor loadings and ${e}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,p$, are independent random variables with variances ${\psi }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,p$. The ${\psi }_{i}$ represent the unique component of the variation of each observed variable. The proportion of variation for each variable accounted for by the factors is known as the communality. For this function it is assumed that both the $k$ factors and the ${e}_{i}$'s follow independent Normal distributions.
The model for the variance-covariance matrix, $\Sigma$, can be written as:
 $\Sigma =\Lambda {\Lambda }^{\mathrm{T}}+\Psi$ (1)
where $\Lambda$ is the matrix of the factor loadings, ${\lambda }_{ij}$, and $\Psi$ is a diagonal matrix of unique variances, ${\psi }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,p$.
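
As a minimal illustration of model (1), the sketch below forms the implied matrix $\Lambda {\Lambda }^{\mathrm{T}}+\Psi$ from a set of loadings and unique variances. The name `implied_cov` and the row-major array layout are assumptions made for this example; they are not part of the nag_mv_factor (g03cac) interface.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG interface.
   lambda: p x k loadings, row-major; psi: p unique variances;
   sigma:  p x p output, row-major, receives Lambda*Lambda^T + Psi. */
void implied_cov(size_t p, size_t k, const double *lambda,
                 const double *psi, double *sigma)
{
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = 0; j < p; ++j) {
            double s = 0.0;
            for (size_t f = 0; f < k; ++f)
                s += lambda[i * k + f] * lambda[j * k + f];
            if (i == j)
                s += psi[i]; /* Psi is diagonal, so it only affects variances */
            sigma[i * p + j] = s;
        }
    }
}
```

The off-diagonal entries depend only on the loadings, which is why the factors are said to account for the covariances between the observed variables.
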
The estimation of the parameters of the model, $\Lambda$ and $\Psi$, by maximum likelihood is described by Lawley and Maxwell (1971). The log likelihood is:
 $-\frac{1}{2}\left(n-1\right)\log \left(|\Sigma |\right)-\frac{1}{2}\left(n-1\right)\mathrm{trace}\left(S{\Sigma }^{-1}\right)+\text{constant},$
where $n$ is the number of observations, $S$ is the sample variance-covariance matrix or, if weights are used, $S$ is the weighted sample variance-covariance matrix and $n$ is the effective number of observations, that is, the sum of the weights. The constant is independent of the parameters of the model. A two-stage maximization is employed. It makes use of the function $F\left(\Psi \right)$, which is, up to a constant, $-2/\left(n-1\right)$ times the log likelihood maximized over $\Lambda$. This is then minimized with respect to $\Psi$ to give the estimates, $\stackrel{^}{\Psi }$, of $\Psi$. The function $F\left(\Psi \right)$ can be written as:
 $F\left(\Psi \right)=\sum _{j=k+1}^{p}\left({\theta }_{j}-\log {\theta }_{j}\right)-\left(p-k\right),$
where values ${\theta }_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,p$ are the eigenvalues of the matrix:
 ${S}^{*}={\Psi }^{-1/2}S{\Psi }^{-1/2}.$
The estimates $\stackrel{^}{\Lambda }$, of $\Lambda$, are then given by scaling the eigenvectors of ${S}^{*}$, which are denoted by $V$:
 $\stackrel{^}{\Lambda }={\Psi }^{1/2}V{\left(\Theta -I\right)}^{1/2},$
where $\Theta$ is the diagonal matrix with elements ${\theta }_{i}$, and $I$ is the identity matrix.
The minimization of $F\left(\Psi \right)$ is performed using nag_opt_bounds_2nd_deriv (e04lbc) which uses a modified Newton algorithm. The computation of the Hessian matrix is described by Clark (1970). However, instead of using the eigenvalue decomposition of the matrix ${S}^{*}$ as described above, the singular value decomposition of the matrix $R{\Psi }^{-1/2}$ is used, where $R$ is obtained either from the $QR$ decomposition of the (scaled) mean-centred data matrix or from the Cholesky decomposition of the correlation/covariance matrix. The function nag_opt_bounds_2nd_deriv (e04lbc) ensures that the values of ${\psi }_{i}$ are greater than a given small positive quantity, $\delta$, so that the communality is always less than one. This avoids the so-called Heywood cases.
In addition to the values of $\Lambda$, $\Psi$ and the communalities, nag_mv_factor (g03cac) returns the residual correlations, i.e., the off-diagonal elements of $C-\left(\Lambda {\Lambda }^{\mathrm{T}}+\Psi \right)$ where $C$ is the sample correlation matrix. nag_mv_factor (g03cac) also returns the test statistic:
 ${\chi }^{2}=\left[n-1-\left(2p+5\right)/6-2k/3\right]F\left(\stackrel{^}{\Psi }\right)$
which can be used to test the goodness-of-fit of the model (1), see Lawley and Maxwell (1971) and Morrison (1967).
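
The test statistic is a simple rescaling of the minimized function value, as the sketch below shows. The names `gof_chi2` and `gof_df` are hypothetical. The degrees-of-freedom expression $\left[{\left(p-k\right)}^{2}-\left(p+k\right)\right]/2$ is the usual textbook value for this test and is an assumption here, since this document does not state it.

```c
/* Hypothetical helpers, not NAG functions. */

/* Statistic [n - 1 - (2p + 5)/6 - 2k/3] * F, where f_min is the minimized
   value of F and n is the (effective) number of observations. */
double gof_chi2(double n, int p, int k, double f_min)
{
    return (n - 1.0 - (2.0 * p + 5.0) / 6.0 - 2.0 * k / 3.0) * f_min;
}

/* Assumed degrees of freedom: ((p - k)^2 - (p + k)) / 2.
   The numerator is always even, so integer division is exact. */
int gof_df(int p, int k)
{
    return ((p - k) * (p - k) - (p + k)) / 2;
}
```

For nine variables and three factors, `gof_df(9, 3)` gives 12.
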

## 4  References

Clark M R B (1970) A rapidly convergent method for maximum likelihood factor analysis British J. Math. Statist. Psych.
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Lawley D N and Maxwell A E (1971) Factor Analysis as a Statistical Method (2nd Edition) Butterworths
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

## 5  Arguments

1:     matrix (Nag_FacMat, Input)
On entry: selects the type of matrix on which factor analysis is to be performed.
${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$ (Data input)
The data matrix will be input in x and factor analysis will be computed for the correlation matrix.
${\mathbf{matrix}}=\mathrm{Nag_DataCovar}$
The data matrix will be input in x and factor analysis will be computed for the covariance matrix, i.e., the results are scaled as described in Section 8.
${\mathbf{matrix}}=\mathrm{Nag_MatCorr_Covar}$
The correlation/variance-covariance matrix will be input in x and factor analysis computed for this matrix.
Constraint: ${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$, $\mathrm{Nag_DataCovar}$ or $\mathrm{Nag_MatCorr_Covar}$.
2:     n (Integer, Input)
On entry: if ${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$ or $\mathrm{Nag_DataCovar}$ the number of observations in the data array x.
If ${\mathbf{matrix}}=\mathrm{Nag_MatCorr_Covar}$ the (effective) number of observations used in computing the (possibly weighted) correlation/variance-covariance matrix input in x.
Constraint: ${\mathbf{n}}>{\mathbf{nvar}}$.
3:     m (Integer, Input)
On entry: the number of variables in the data/correlation/variance-covariance matrix.
Constraint: ${\mathbf{m}}\ge {\mathbf{nvar}}$.
4:     x[$\mathit{dim1}×{\mathbf{tdx}}$] (const double, Input)
On entry: the input matrix.
${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$ or $\mathrm{Nag_DataCovar}$
x must contain the data matrix, i.e., ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
${\mathbf{matrix}}=\mathrm{Nag_MatCorr_Covar}$
x must contain the correlation or variance-covariance matrix. Only the upper triangular part is required.
5:     tdx (Integer, Input)
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
6:     nvar (Integer, Input)
On entry: the number of variables in the factor analysis, $p$.
Constraint: ${\mathbf{nvar}}\ge 2$.
7:     isx[m] (const Integer, Input)
On entry: ${\mathbf{isx}}\left[j-1\right]$ indicates whether or not the $j$th variable is to be included in the factor analysis.
If ${\mathbf{isx}}\left[\mathit{j}-1\right]\ge 1$, then the variable represented by the $\mathit{j}$th column of x is included in the analysis; otherwise it is excluded, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
Constraint: ${\mathbf{isx}}\left[j-1\right]>0$ for nvar values of $j$.
8:     nfac (Integer, Input)
On entry: the number of factors, $k$.
Constraint: $1\le {\mathbf{nfac}}\le {\mathbf{nvar}}$.
9:     wt[n] (const double, Input)
On entry: if ${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$ or $\mathrm{Nag_DataCovar}$ then the elements of wt must contain the weights to be used in the factor analysis. The effective number of observations is the sum of the weights. If ${\mathbf{wt}}\left[i-1\right]=0.0$ then the $i$th observation is not included in the analysis.
If ${\mathbf{matrix}}=\mathrm{Nag_MatCorr_Covar}$ or wt is set to the null pointer NULL, i.e., (double *)0, then wt is not referenced and the effective number of observations is $n$.
Constraint: if wt is referenced, then ${\mathbf{wt}}\left[i-1\right]\ge 0$ for $i=1,2,\dots ,n$, and the sum of the weights $>{\mathbf{nvar}}$.
10:   e[nvar] (double, Output)
On exit: the eigenvalues ${\theta }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,p$.
11:   stat[$4$] (double, Output)
On exit: the test statistics.
${\mathbf{stat}}\left[0\right]$ contains the value $F\left(\stackrel{^}{\Psi }\right)$.
${\mathbf{stat}}\left[1\right]$ contains the test statistic, ${\chi }^{2}$.
${\mathbf{stat}}\left[2\right]$ contains the degrees of freedom associated with the test statistic.
${\mathbf{stat}}\left[3\right]$ contains the significance level.
12:   com[nvar] (double, Output)
On exit: the communalities.
13:   psi[nvar] (double, Output)
On exit: the estimates of ${\psi }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,p$.
14:   res[${\mathbf{nvar}}×\left({\mathbf{nvar}}-1\right)/2$] (double, Output)
On exit: the residual correlations. The residual correlation for the $i$th and $j$th variables is stored in ${\mathbf{res}}\left[\left(j-1\right)\left(j-2\right)/2+i-1\right]$, $i<j$.
15:   fl[${\mathbf{nvar}}×{\mathbf{tdfl}}$] (double, Output)
On exit: the factor loadings. ${\mathbf{fl}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdfl}}+\mathit{j}-1\right]$ contains ${\lambda }_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,p$ and $\mathit{j}=1,2,\dots ,k$.
16:   tdfl (Integer, Input)
On entry: the stride separating matrix column elements in the array fl.
Constraint: ${\mathbf{tdfl}}\ge {\mathbf{nfac}}$.
17:   options (Nag_E04_Opt *, Input/Output)
On entry/exit: a pointer to a structure of type Nag_E04_Opt whose members are optional arguments for nag_opt_bounds_2nd_deriv (e04lbc). These structure members offer the means of adjusting some of the argument values of the algorithm.
If the optional arguments are not required the NAG defined null pointer, E04_DEFAULT, can be used in the function call. See the document for nag_opt_bounds_2nd_deriv (e04lbc) for further details.
18:   eps (double, Input)
On entry: a lower bound for the value of ${\psi }_{i}$.
Constraint: machine precision $\le {\mathbf{eps}}<1.0$.
19:   fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).
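
The storage conventions described for x, res and fl can be summarized by the small accessors below. This is only a sketch: the function names are hypothetical, the indices $i$ and $j$ are 1-based as in the text, and `res_index` assumes $i<j$.

```c
#include <stddef.h>

/* Hypothetical accessors, not NAG functions; i and j are 1-based. */

/* Observation i of variable j in the data matrix x with stride tdx. */
double x_at(const double *x, size_t tdx, size_t i, size_t j)
{
    return x[(i - 1) * tdx + (j - 1)];
}

/* Position in res of the residual correlation for variables i and j,
   assuming i < j (strict upper triangle packed column by column). */
size_t res_index(size_t i, size_t j)
{
    return (j - 1) * (j - 2) / 2 + (i - 1);
}

/* Loading of variable i on factor j in fl with stride tdfl. */
double fl_at(const double *fl, size_t tdfl, size_t i, size_t j)
{
    return fl[(i - 1) * tdfl + (j - 1)];
}
```

With four variables, the pairs (1,2), (1,3), (2,3), (1,4), (2,4), (3,4) map to positions 0 to 5, which fills the $4×3/2=6$ elements of res.
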

## 6  Error Indicators and Warnings

NE_2_INT_ARG_GT
On entry, ${\mathbf{nfac}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{nfac}}\le {\mathbf{nvar}}$.
NE_2_INT_ARG_LE
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{n}}>{\mathbf{nvar}}$.
NE_2_INT_ARG_LT
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{m}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{tdfl}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nfac}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdfl}}\ge {\mathbf{nfac}}$.
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_2_REAL_ARG_LT
On entry, ${\mathbf{step_max}}=〈\mathit{\text{value}}〉$ while ${\mathbf{optim_tol}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{step_max}}\ge {\mathbf{optim_tol}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument matrix had an illegal value.
On entry, argument ${\mathbf{print_level}}$ had an illegal value.
NE_INT_ARG_LT
On entry, ${\mathbf{nfac}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nfac}}\ge 1$.
On entry, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nvar}}\ge 2$.
NE_INTERNAL_ERROR
Additional error messages are output if the optimization fails to converge or if the options are set incorrectly. Details of these can be found in the nag_opt_bounds_2nd_deriv (e04lbc) document.
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_INVALID_INT_RANGE_1
Value $〈\mathit{\text{value}}〉$ given to ${\mathbf{max_iter}}$ is not valid. Correct range is ${\mathbf{max_iter}}\ge 0$.
NE_INVALID_REAL_RANGE_EF
Value $〈\mathit{\text{value}}〉$ given to eps is not valid. Correct range is machine precision $\le {\mathbf{optim_tol}}<1.0$.
NE_INVALID_REAL_RANGE_FF
Value $〈\mathit{\text{value}}〉$ given to ${\mathbf{linesearch_tol}}$ is not valid. Correct range is $0.0\le {\mathbf{linesearch_tol}}<1.0$.
NE_MAT_RANK
On entry, ${\mathbf{matrix}}=\mathrm{Nag_DataCorr}$ or ${\mathbf{matrix}}=\mathrm{Nag_DataCovar}$ and the data matrix is not of full column rank, or ${\mathbf{matrix}}=\mathrm{Nag_MatCorr_Covar}$ and the input correlation/variance-covariance matrix is not positive definite. This exit may also be caused by two of the eigenvalues of ${S}^{*}$ being equal; this is rare (see Lawley and Maxwell (1971)) and may be due to the data/correlation matrix being almost singular.
NE_NEG_WEIGHT_ELEMENT
On entry, ${\mathbf{wt}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
Constraint: when referenced, all elements of wt must be non-negative.
NE_NOT_APPEND_FILE
Cannot open file $〈\mathit{string}〉$ for appending.
NE_NOT_CLOSE_FILE
Cannot close file $〈\mathit{string}〉$.
NE_OBSERV_LT_VAR
With weighted data, the effective number of observations given by the sum of weights $=〈\mathit{\text{value}}〉$, while the number of variables included in the analysis, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$.
Constraint: effective number of observations $>{\mathbf{nvar}}+1$.
NE_OPT_NOT_INIT
Options structure not initialized.
NE_SVD_NOT_CONV
A singular value decomposition has failed to converge. This is a very unlikely error exit.
NE_VAR_INCL_INDICATED
The number of variables in the analysis, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$, while the number of variables included in the analysis via array ${\mathbf{isx}}=〈\mathit{\text{value}}〉$.
Constraint: these two numbers must be the same.
NW_COND_MIN
The conditions for a minimum have not all been satisfied but a lower point could not be found. Note that in this case all the results are computed. See nag_opt_bounds_2nd_deriv (e04lbc) for further details.
NW_TOO_MANY_ITER
The maximum number of iterations, $〈\mathit{\text{value}}〉$, has been performed.
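
Several of the exits listed in this section correspond to simple constraints on the input arguments, which a caller can check before invoking the function. The sketch below is one way to do so; the name `check_factor_inputs`, its return codes and the use of `int` in place of the NAG `Integer` type are all assumptions made for illustration. The weight-sum check follows the constraint given for wt in Section 5.

```c
#include <stddef.h>

/* Hypothetical pre-call check, not a NAG function.
   Returns 0 if the documented constraints hold, otherwise a positive code. */
int check_factor_inputs(int n, int m, int nvar, int nfac,
                        const int *isx, const double *wt)
{
    if (nvar < 2) return 1;                 /* nvar >= 2 */
    if (nfac < 1 || nfac > nvar) return 2;  /* 1 <= nfac <= nvar */
    if (m < nvar) return 3;                 /* m >= nvar */
    if (n <= nvar) return 4;                /* n > nvar */

    int selected = 0;
    for (int j = 0; j < m; ++j)
        if (isx[j] > 0) ++selected;
    if (selected != nvar) return 5;         /* isx must select nvar variables */

    if (wt != NULL) {                       /* NULL means unweighted */
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            if (wt[i] < 0.0) return 6;      /* weights must be non-negative */
            sum += wt[i];
        }
        if (sum <= (double)nvar) return 7;  /* sum of weights > nvar */
    }
    return 0;
}
```

A zero return does not guarantee a successful call, because rank deficiency and convergence problems can only be detected during the computation.
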

## 7  Accuracy

The accuracy achieved is discussed in nag_opt_bounds_2nd_deriv (e04lbc).

## 8  Further Comments

The factor loadings may be orthogonally rotated by using nag_mv_orthomax (g03bac), and factor score coefficients can be computed using nag_mv_fac_score (g03ccc). The maximum likelihood estimators are invariant to a change in scale, so the results obtained will be the same (up to a scaling factor) whether the correlation matrix or the variance-covariance matrix is used. Because using the correlation matrix ensures that all values of ${\psi }_{i}$ lie between 0 and 1, it leads to a more efficient optimization. When the data matrix is input, the results are always computed for the correlation matrix and then scaled if the results for the covariance matrix are required. When you input the covariance/correlation matrix, the input matrix itself is used, so you are advised to input the correlation matrix rather than the covariance matrix.
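
The scale invariance mentioned above can be made concrete with a short sketch. It assumes that the scaling factor for variable $i$ is its sample standard deviation, so that loadings are multiplied by the standard deviation and unique variances by the variance. This is the standard relationship between the two parameterizations, but the exact scaling applied inside nag_mv_factor (g03cac) is not spelled out here, and the name `scale_to_covariance` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not a NAG function. Converts correlation-scale
   estimates to covariance scale in place, assuming variable i is scaled
   by its standard deviation sd[i]. fl is p x k with row stride tdfl. */
void scale_to_covariance(size_t p, size_t k, const double *sd,
                         double *fl, size_t tdfl, double *psi)
{
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = 0; j < k; ++j)
            fl[i * tdfl + j] *= sd[i];  /* loadings scale with sd */
        psi[i] *= sd[i] * sd[i];        /* unique variances scale with variance */
    }
}
```

Under this assumption the communalities, being proportions of variance, are unchanged by the rescaling.
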

## 9  Example

The example is taken from Lawley and Maxwell (1971). The correlation matrix for nine variables is input and the parameters of a factor analysis model with three factors are estimated and printed.

### 9.1  Program Text

Program Text (g03cace.c)

### 9.2  Program Data

Program Data (g03cace.d)

### 9.3  Program Results

Program Results (g03cace.r)