hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_mv_factor (g03ca)

Purpose

nag_mv_factor (g03ca) computes the maximum likelihood estimates of the parameters of a factor analysis model. Either the data matrix or a correlation/covariance matrix may be input. Factor loadings, communalities and residual correlations are returned.

Syntax

[e, stat, com, psi, res, fl, ifail] = g03ca(matrix, n, x, nvar, isx, nfac, iop, 'm', m, 'wt', wt)
[e, stat, com, psi, res, fl, ifail] = nag_mv_factor(matrix, n, x, nvar, isx, nfac, iop, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 24: drop weight, wt optional
.

Description

Let pp variables, x1,x2,,xpx1,x2,,xp, with variance-covariance matrix ΣΣ be observed. The aim of factor analysis is to account for the covariances in these pp variables in terms of a smaller number, kk, of hypothetical variables, or factors, f1,f2,,fkf1,f2,,fk. These are assumed to be independent and to have unit variance. The relationship between the observed variables and the factors is given by the model:
k
xi = λijfj + ei,  i = 1,2,,p
j = 1
xi=j=1kλijfj+ei,  i=1,2,,p
where λijλij, for i = 1,2,,pi=1,2,,p and j = 1,2,,kj=1,2,,k, are the factor loadings and eiei, for i = 1,2,,pi=1,2,,p, are independent random variables with variances ψiψi, for i = 1,2,,pi=1,2,,p. The ψiψi represent the unique component of the variation of each observed variable. The proportion of variation for each variable accounted for by the factors is known as the communality. For this function it is assumed that both the kk factors and the eiei's follow independent Normal distributions.
The model for the variance-covariance matrix, ΣΣ, can be written as:
Σ = ΛΛT + Ψ
Σ=ΛΛT+Ψ
(1)
where ΛΛ is the matrix of the factor loadings, λijλij, and ΨΨ is a diagonal matrix of unique variances, ψiψi, for i = 1,2,,pi=1,2,,p.
The estimation of the parameters of the model, ΛΛ and ΨΨ, by maximum likelihood is described by Lawley and Maxwell (1971). The log-likelihood is:
(1/2)(n1)log(|Σ|)(1/2)(n1)trace(S,Σ1) + constant,
-12(n-1)log(|Σ|)-12(n-1)trace(S,Σ-1)+constant,
where nn is the number of observations, SS is the sample variance-covariance matrix or, if weights are used, SS is the weighted sample variance-covariance matrix and nn is the effective number of observations, that is, the sum of the weights. The constant is independent of the parameters of the model. A two stage maximization is employed. It makes use of the function F(Ψ)F(Ψ), which is, up to a constant, 2 / (n1)-2/(n-1) times the log-likelihood maximized over ΛΛ. This is then minimized with respect to ΨΨ to give the estimates, Ψ̂Ψ^, of ΨΨ. The function F(Ψ)F(Ψ) can be written as:
p
F(Ψ) = (θjlogθj)(pk)
j = k + 1
F(Ψ)=j=k+1p(θj-logθj)-(p-k)
where values θjθj, for j = 1,2,,pj=1,2,,p are the eigenvalues of the matrix:
S* = Ψ1 / 2SΨ1 / 2.
S*=Ψ-1/2SΨ-1/2.
The estimates Λ̂Λ^, of ΛΛ, are then given by scaling the eigenvectors of S*S*, which are denoted by VV:
Λ̂ = Ψ1 / 2V(ΘI)1 / 2.
Λ^=Ψ1/2V(Θ-I)1/2.
where ΘΘ is the diagonal matrix with elements θiθi, and II is the identity matrix.
The minimization of F(Ψ)F(Ψ) is performed using nag_opt_bounds_mod_deriv2_comp (e04lb) which uses a modified Newton algorithm. The computation of the Hessian matrix is described by Clark (1970). However, instead of using the eigenvalue decomposition of the matrix S*S* as described above, the singular value decomposition of the matrix RΨ1 / 2RΨ-1/2 is used, where RR is obtained either from the QRQR decomposition of the (scaled) mean centred data matrix or from the Cholesky decomposition of the correlation/covariance matrix. The function nag_opt_bounds_mod_deriv2_comp (e04lb) ensures that the values of ψiψi are greater than a given small positive quantity, δδ, so that the communality is always less than one. This avoids the so called Heywood cases.
In addition to the values of ΛΛ, ΨΨ and the communalities, nag_mv_factor (g03ca) returns the residual correlations, i.e., the off-diagonal elements of C(ΛΛT + Ψ)C-(ΛΛT+Ψ) where CC is the sample correlation matrix. nag_mv_factor (g03ca) also returns the test statistic:
χ2 = [n1(2p + 5) / 62k / 3]F(Ψ̂)
χ2=[n-1-(2p+5)/6-2k/3]F(Ψ^)
which can be used to test the goodness-of-fit of the model (1), see Lawley and Maxwell (1971) and Morrison (1967).

References

Clark M R B (1970) A rapidly convergent method for maximum likelihood factor analysis British J. Math. Statist. Psych.
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Lawley D N and Maxwell A E (1971) Factor Analysis as a Statistical Method (2nd Edition) Butterworths
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

Parameters

Compulsory Input Parameters

1:     matrix – string (length ≥ 1)
Selects the type of matrix on which factor analysis is to be performed.
matrix = 'D'matrix='D'
The data matrix will be input in x and factor analysis will be computed for the correlation matrix.
matrix = 'S'matrix='S'
The data matrix will be input in x and factor analysis will be computed for the covariance matrix, i.e., the results are scaled as described in Section [Further Comments].
matrix = 'C'matrix='C'
The correlation/variance-covariance matrix will be input in x and factor analysis computed for this matrix.
Constraint: matrix = 'D'matrix='D', 'S''S' or 'C''C'.
2:     n – int64int32nag_int scalar
If matrix = 'D'matrix='D' or 'S''S' the number of observations in the data array x.
If matrix = 'C'matrix='C' the (effective) number of observations used in computing the (possibly weighted) correlation/variance-covariance matrix input in x.
Constraint: n > nvarn>nvar.
3:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint
  • if matrix = 'D'matrix='D' or 'S''S', ldxnldxn;
  • if matrix = 'C'matrix='C', ldxmldxm.
The input matrix.
If matrix = 'D'matrix='D' or 'S''S', x must contain the data matrix, i.e., x(i,j)xij must contain the iith observation for the jjth variable, for i = 1,2,,ni=1,2,,n and j = 1,2,,mj=1,2,,m.
If matrix = 'C'matrix='C', x must contain the correlation or variance-covariance matrix. Only the upper triangular part is required.
4:     nvar – int64int32nag_int scalar
pp, the number of variables in the factor analysis.
Constraint: nvar2nvar2.
5:     isx(m) – int64int32nag_int array
m, the dimension of the array, must satisfy the constraint mnvarmnvar.
isx(j)isxj indicates whether or not the jjth variable is included in the factor analysis. If isx(j)1isxj1, the variable represented by the jjth column of x is included in the analysis; otherwise it is excluded, for j = 1,2,,mj=1,2,,m.
Constraint: isx(j) > 0isxj>0 for nvar values of jj.
6:     nfac – int64int32nag_int scalar
kk, the number of factors.
Constraint: 1nfacnvar1nfacnvar.
7:     iop(55) – int64int32nag_int array
Options for the optimization. There are four options to be set:
iprintiprint controls iteration monitoring;
  if iprint0iprint0, then there is no printing of information else if iprint > 0iprint>0, then information is printed at every iprint iterations. The information printed consists of the value of F(Ψ)F(Ψ) at that iteration, the number of evaluations of F(Ψ)F(Ψ), the current estimates of the communalities and an indication of whether or not they are at the boundary.
maxfunmaxfun the maximum number of function evaluations.
accacc the required accuracy for the estimates of ψiψi.
epseps a lower bound for the values of ψψ, see Section [Description].
Let ε = machine precisionε=machine precision then if iop(1) = 0iop1=0, then the following default values are used:
  • iprint = 1iprint=-1
  • maxfun = 100pmaxfun=100p
  • acc = 10sqrt(ε)acc=10ε
  • eps = εeps=ε
If iop(1)0iop10, then
  • iprint = iop(2)iprint=iop2
  • maxfun = iop(3)maxfun=iop3
  • acc = 10lacc=10-l where l = iop(4)l=iop4
  • eps = 10leps=10-l where l = iop(5)l=iop5
Constraint: if iop(1)0iop10, iop(i)iopi must be such that maxfun1maxfun1, εacc < 1εacc<1 and εeps < 1εeps<1, for i = 3,4,5i=3,4,5.

Optional Input Parameters

1:     m – int64int32nag_int scalar
Default: The dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables in the data/correlation/variance-covariance matrix.
Constraint: mnvarmnvar.
2:     wt( : :) – double array
Note: the dimension of the array wt must be at least nn if weight = 'W'weight='W' and matrix = 'D'matrix='D' or 'S''S', and at least 11 otherwise.
If weight = 'W'weight='W' and matrix = 'D'matrix='D' or 'S''S', wt must contain the weights to be used in the factor analysis. The effective number of observations in the analysis will then be the sum of weights. If wt(i) = 0.0wti=0.0, the iith observation is not included in the analysis.
If weight = 'U'weight='U' or matrix = 'C'matrix='C', wt is not referenced and the effective number of observations is nn.
Constraint: if weight = 'W'weight='W', the sum of weights > nvarthe sum of weights >nvar, wt(i)0.0wti0.0, for i = 1,2,,ni=1,2,,n.

Input Parameters Omitted from the MATLAB Interface

weight ldx ldfl iwk wk lwk

Output Parameters

1:     e(nvar) – double array
The eigenvalues θiθi, for i = 1,2,,pi=1,2,,p.
2:     stat(44) – double array
The test statistics.
stat(1)stat1
Contains the value F(Ψ̂)F(Ψ^).
stat(2)stat2
Contains the test statistic, χ2χ2.
stat(3)stat3
Contains the degrees of freedom associated with the test statistic.
stat(4)stat4
Contains the significance level.
3:     com(nvar) – double array
The communalities.
4:     psi(nvar) – double array
The estimates of ψiψi, for i = 1,2,,pi=1,2,,p.
5:     res(nvar × (nvar1) / 2nvar×(nvar-1)/2) – double array
The residual correlations. The residual correlation for the iith and jjth variables is stored in res((j1)(j2) / 2 + i)res((j-1)(j-2)/2+i), i < ji<j.
6:     fl(ldfl,nfac) – double array
ldflnvarldflnvar.
The factor loadings. fl(i,j)flij contains λijλij, for i = 1,2,,pi=1,2,,p and j = 1,2,,kj=1,2,,k.
7:     ifail – int64int32nag_int scalar
ifail = 0ifail=0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Note: nag_mv_factor (g03ca) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 1ifail=1
On entry,ldfl < nvarldfl<nvar,
ornvar < 2nvar<2,
ornnvarnnvar,
ornfac < 1nfac<1,
ornvar < nfacnvar<nfac,
orm < nvarm<nvar,
ormatrix = 'D'matrix='D' or 'S''S' and ldx < nldx<n,
ormatrix = 'C'matrix='C' and ldx < mldx<m,
ormatrix'D'matrix'D', 'S''S' or 'C''C',
ormatrix = 'D'matrix='D' or 'S''S' and weight'U'weight'U' or 'W''W',
oriop(1)0iop10 and iop(3)iop3 is such that maxfun < 1maxfun<1,
oriop(1)0iop10 and iop(4)iop4 is such that acc1.0acc1.0,
oriop(1)0iop10 and iop(4)iop4 is such that acc < machine precisionacc<machine precision,
oriop(1)0iop10 and iop(5)iop5 is such that eps1.0eps1.0,
oriop(1)0iop10 and iop(5)iop5 is such that eps < machine precisioneps<machine precision,
ormatrix = 'C'matrix='C' and lwk < (5 × nvar × nvar + 33 × nvar4) / 2lwk<(5×nvar×nvar+33×nvar-4)/2,
ormatrix = 'D'matrix='D' or 'S''S' and
lwk < max ((5 × nvar × nvar + 33 × nvar4) / 2,n × nvar + 7 × nvar + nvar × (nvar1) / 2)lwk<max((5×nvar×nvar+33×nvar-4)/2,n×nvar+7×nvar+ nvar×(nvar-1)/2).
  ifail = 2ifail=2
On entry,weight = 'W'weight='W' and a value of wt < 0.0wt<0.0.
  ifail = 3ifail=3
On entry, there are not exactly nvar elements of isx > 0isx>0, or the effective number of observations nvarnvar.
  ifail = 4ifail=4
On entry, matrix = 'D'matrix='D' or 'S''S' and the data matrix is not of full column rank, or matrix = 'C'matrix='C' and the input correlation/variance-covariance matrix is not positive definite.
This exit may also be caused by two of the eigenvalues of S*S* being equal; this is rare (see Lawley and Maxwell (1971)), and may be due to the data/correlation matrix being almost singular.
  ifail = 5ifail=5
A singular value decomposition has failed to converge. This is a very unlikely error exit.
  ifail = 6ifail=6
The estimation procedure has failed to converge in the given number of iterations. Change iop to either increase number of iterations maxfunmaxfun or increase the value of accacc.
W ifail = 7ifail=7
The convergence is not certain but a lower point could not be found. See nag_opt_bounds_mod_deriv2_comp (e04lb) for further details. In this case all results are computed.

Accuracy

The accuracy achieved is discussed in nag_opt_bounds_mod_deriv2_comp (e04lb) with the value of the parameter xtol given by accacc as described in parameter iop.

Further Comments

The factor loadings may be orthogonally rotated by using nag_mv_rot_orthomax (g03ba) and factor score coefficients can be computed using nag_mv_factor_score (g03cc). The maximum likelihood estimators are invariant to a change in scale. This means that the results obtained will be the same (up to a scaling factor) if either the correlation matrix or the variance-covariance matrix is used. As the correlation matrix ensures that all values of ψiψi are between 00 and 11 it will lead to a more efficient optimization. In the situation when the data matrix is input the results are always computed for the correlation matrix and then scaled if the results for the covariance matrix are required. When you input the covariance/correlation matrix the input matrix itself is used and you are advised to input the correlation matrix rather than the covariance matrix.

Example

function nag_mv_factor_example
matrix = 'C';
n = int64(211);
x = [1, 0.523, 0.395, 0.471, 0.346, 0.426, 0.576, 0.434, 0.639;
     0.523, 1, 0.479, 0.506, 0.418, 0.462, 0.547, 0.283, 0.645;
     0.395, 0.479, 1, 0.355, 0.27, 0.254, 0.452, 0.219, 0.504;
     0.471, 0.506, 0.355, 1, 0.691, 0.791, 0.443, 0.285, 0.505;
     0.346, 0.418, 0.27, 0.691, 1, 0.679, 0.383, 0.149, 0.409;
     0.426, 0.462, 0.254, 0.791, 0.679, 1, 0.372, 0.314, 0.472;
     0.576, 0.547, 0.452, 0.443, 0.383, 0.372, 1, 0.385, 0.68;
     0.434, 0.283, 0.219, 0.285, 0.149, 0.314, 0.385, 1, 0.47;
     0.639, 0.645, 0.504, 0.505, 0.409, 0.472, 0.68, 0.47, 1];
nvar = int64(9);
isx = [int64(1);1;1;1;1;1;1;1;1];
nfac = int64(3);
iop = [int64(1);-1;500;2;5];
[e, stat, com, psi, res, fl, ifail] = ...
    nag_mv_factor(matrix, n, x, nvar, isx, nfac, iop)
 

e =

   15.9681
    4.3577
    1.8474
    1.1560
    1.1190
    1.0271
    0.9257
    0.8951
    0.8771


stat =

    0.0350
    7.1494
   12.0000
    0.8476


com =

    0.5495
    0.5729
    0.3835
    0.7877
    0.6195
    0.8231
    0.6005
    0.5384
    0.7691


psi =

    0.4505
    0.4271
    0.6165
    0.2123
    0.3805
    0.1769
    0.3995
    0.4616
    0.2309


res =

    0.0004
   -0.0128
    0.0220
    0.0114
   -0.0053
    0.0231
   -0.0100
   -0.0194
   -0.0162
    0.0033
   -0.0046
    0.0113
   -0.0122
   -0.0009
   -0.0008
    0.0153
   -0.0216
   -0.0108
    0.0023
    0.0294
   -0.0123
   -0.0011
   -0.0105
    0.0134
    0.0054
   -0.0057
   -0.0009
    0.0032
   -0.0059
    0.0097
   -0.0049
   -0.0114
    0.0020
    0.0074
    0.0033
   -0.0012


fl =

    0.6642   -0.3209    0.0735
    0.6888   -0.2471   -0.1933
    0.4926   -0.3022   -0.2224
    0.8372    0.2924   -0.0354
    0.7050    0.3148   -0.1528
    0.8187    0.3767    0.1045
    0.6615   -0.3960   -0.0777
    0.4579   -0.2955    0.4913
    0.7657   -0.4274   -0.0117


ifail =

                    0


function g03ca_example
matrix = 'C';
n = int64(211);
x = [1, 0.523, 0.395, 0.471, 0.346, 0.426, 0.576, 0.434, 0.639;
     0.523, 1, 0.479, 0.506, 0.418, 0.462, 0.547, 0.283, 0.645;
     0.395, 0.479, 1, 0.355, 0.27, 0.254, 0.452, 0.219, 0.504;
     0.471, 0.506, 0.355, 1, 0.691, 0.791, 0.443, 0.285, 0.505;
     0.346, 0.418, 0.27, 0.691, 1, 0.679, 0.383, 0.149, 0.409;
     0.426, 0.462, 0.254, 0.791, 0.679, 1, 0.372, 0.314, 0.472;
     0.576, 0.547, 0.452, 0.443, 0.383, 0.372, 1, 0.385, 0.68;
     0.434, 0.283, 0.219, 0.285, 0.149, 0.314, 0.385, 1, 0.47;
     0.639, 0.645, 0.504, 0.505, 0.409, 0.472, 0.68, 0.47, 1];
nvar = int64(9);
isx = [int64(1);1;1;1;1;1;1;1;1];
nfac = int64(3);
iop = [int64(1);-1;500;2;5];
[e, stat, com, psi, res, fl, ifail] = ...
    g03ca(matrix, n, x, nvar, isx, nfac, iop)
 

e =

   15.9681
    4.3577
    1.8474
    1.1560
    1.1190
    1.0271
    0.9257
    0.8951
    0.8771


stat =

    0.0350
    7.1494
   12.0000
    0.8476


com =

    0.5495
    0.5729
    0.3835
    0.7877
    0.6195
    0.8231
    0.6005
    0.5384
    0.7691


psi =

    0.4505
    0.4271
    0.6165
    0.2123
    0.3805
    0.1769
    0.3995
    0.4616
    0.2309


res =

    0.0004
   -0.0128
    0.0220
    0.0114
   -0.0053
    0.0231
   -0.0100
   -0.0194
   -0.0162
    0.0033
   -0.0046
    0.0113
   -0.0122
   -0.0009
   -0.0008
    0.0153
   -0.0216
   -0.0108
    0.0023
    0.0294
   -0.0123
   -0.0011
   -0.0105
    0.0134
    0.0054
   -0.0057
   -0.0009
    0.0032
   -0.0059
    0.0097
   -0.0049
   -0.0114
    0.0020
    0.0074
    0.0033
   -0.0012


fl =

    0.6642   -0.3209    0.0735
    0.6888   -0.2471   -0.1933
    0.4926   -0.3022   -0.2224
    0.8372    0.2924   -0.0354
    0.7050    0.3148   -0.1528
    0.8187    0.3767    0.1045
    0.6615   -0.3960   -0.0777
    0.4579   -0.2955    0.4913
    0.7657   -0.4274   -0.0117


ifail =

                    0



PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013