NAG Library Routine Document
G03FCF
1 Purpose
G03FCF performs nonmetric (ordinal) multidimensional scaling.
2 Specification
SUBROUTINE G03FCF ( 
TYP, N, NDIM, D, X, LDX, STRESS, DFIT, ITER, IOPT, WK, IWK, IFAIL) 
INTEGER 
N, NDIM, LDX, ITER, IOPT, IWK(N*(N1)/2+N*NDIM+5), IFAIL 
REAL (KIND=nag_wp) 
D(N*(N1)/2), X(LDX,NDIM), STRESS, DFIT(2*N*(N1)), WK(15*N*NDIM) 
CHARACTER(1) 
TYP 

3 Description
For a set of
$n$ objects, a distance or dissimilarity matrix
$D$ can be calculated such that
${d}_{ij}$ is a measure of how ‘far apart’ the objects
$i$ and
$j$ are. If
$p$ variables
${x}_{k}$ have been recorded for each observation this measure may be based on Euclidean distance,
${d}_{ij}={\displaystyle \sum _{k=1}^{p}}{\left({x}_{ki}{x}_{kj}\right)}^{2}$, or some other calculation such as the number of variables for which
${x}_{kj}\ne {x}_{ki}$. Alternatively, the distances may be the result of a subjective assessment. For a given distance matrix, multidimensional scaling produces a configuration of
$n$ points in a chosen number of dimensions,
$m$, such that the distance between the points in some way best matches the distance matrix. For some distance measures, such as Euclidean distance, the size of distance is meaningful, for other measures of distance all that can be said is that one distance is greater or smaller than another. For the former metric scaling can be used, see
G03FAF, for the latter, a nonmetric scaling is more appropriate.
For nonmetric multidimensional scaling, the criterion used to measure the closeness of the fitted distance matrix to the observed distance matrix is known as
STRESS.
STRESS is given by,
where
${\hat{{d}_{ij}}}^{2}$ is the Euclidean squared distance between points
$i$ and
$j$ and
$\stackrel{~}{{d}_{ij}}$ is the fitted distance obtained when
$\hat{{d}_{ij}}$ is monotonically regressed on
${d}_{ij}$, that is
$\stackrel{~}{{d}_{ij}}$ is monotonic relative to
${d}_{ij}$ and is obtained from
$\hat{{d}_{ij}}$ with the smallest number of changes. So
STRESS is a measure of by how much the set of points preserve the order of the distances in the original distance matrix. Nonmetric multidimensional scaling seeks to find the set of points that minimize the
STRESS.
An alternate measure is squared
STRESS,
$\mathit{sstress}$,
in which the distances in
STRESS are replaced by squared distances.
In order to perform a nonmetric scaling, an initial configuration of points is required. This can be obtained from principal coordinate analysis, see
G03FAF. Given an initial configuration, G03FCF uses the optimization routine
E04DGF/E04DGA to find the configuration of points that minimizes
STRESS or
$\mathit{sstress}$. The routine
E04DGF/E04DGA uses a conjugate gradient algorithm. G03FCF will find an optimum that may only be a local optimum, to be more sure of finding a global optimum several different initial configurations should be used; these can be obtained by randomly perturbing the original initial configuration using routines from
Chapter G05.
4 References
Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
5 Parameters
 1: TYP – CHARACTER(1)Input
On entry: indicates whether
STRESS or
$\mathit{sstress}$ is to be used as the criterion.
 ${\mathbf{TYP}}=\text{'T'}$
 STRESS is used.
 ${\mathbf{TYP}}=\text{'S'}$
 $\mathit{sstress}$ is used.
Constraint:
${\mathbf{TYP}}=\text{'S'}$ or $\text{'T'}$.
 2: N – INTEGERInput
On entry: $n$, the number of objects in the distance matrix.
Constraint:
${\mathbf{N}}>{\mathbf{NDIM}}$.
 3: NDIM – INTEGERInput
On entry: $m$, the number of dimensions used to represent the data.
Constraint:
${\mathbf{NDIM}}\ge 1$.
 4: D(${\mathbf{N}}\times \left({\mathbf{N}}1\right)/2$) – REAL (KIND=nag_wp) arrayInput
On entry: the lower triangle of the distance matrix
$D$ stored packed by rows. That is
${\mathbf{D}}\left(\left(\mathit{i}1\right)\times \left(\mathit{i}2\right)/2+\mathit{j}\right)$ must contain
${d}_{\mathit{i}\mathit{j}}$, for
$\mathit{i}=2,3,\dots ,n$ and
$\mathit{j}=1,2,\dots ,\mathit{i}1$. If
${d}_{ij}$ is missing then set
${d}_{ij}<0$; for further comments on missing values see
Section 8.
 5: X(LDX,NDIM) – REAL (KIND=nag_wp) arrayInput/Output
On entry: the
$\mathit{i}$th row must contain an initial estimate of the coordinates for the
$\mathit{i}$th point, for
$\mathit{i}=1,2,\dots ,n$. One method of computing these is to use
G03FAF.
On exit: the
$\mathit{i}$th row contains $m$ coordinates for the $\mathit{i}$th point, for $\mathit{i}=1,2,\dots ,n$.
 6: LDX – INTEGERInput
On entry: the first dimension of the array
X as declared in the (sub)program from which G03FCF is called.
Constraint:
${\mathbf{LDX}}\ge {\mathbf{N}}$.
 7: STRESS – REAL (KIND=nag_wp)Output
On exit: the value of
STRESS or
$\mathit{sstress}$ at the final iteration.
 8: DFIT($2\times {\mathbf{N}}\times \left({\mathbf{N}}1\right)$) – REAL (KIND=nag_wp) arrayOutput
On exit: auxiliary outputs.
If
${\mathbf{TYP}}=\text{'T'}$, the first
$n\left(n1\right)/2$ elements contain the distances,
$\hat{{d}_{ij}}$, for the points returned in
X, the second set of
$n\left(n1\right)/2$ contains the distances
$\hat{{d}_{ij}}$ ordered by the input distances,
${d}_{ij}$, the third set of
$n\left(n1\right)/2$ elements contains the monotonic distances,
$\stackrel{~}{{d}_{ij}}$, ordered by the input distances,
${d}_{ij}$ and the final set of
$n\left(n1\right)/2$ elements contains fitted monotonic distances,
$\stackrel{~}{{d}_{ij}}$, for the points in
X. The
$\stackrel{~}{{d}_{ij}}$ corresponding to distances which are input as missing are set to zero.
If ${\mathbf{TYP}}=\text{'S'}$, the results are as above except that the squared distances are returned.
Each distance matrix is stored in lower triangular packed form in the same way as the input matrix $D$.
 9: ITER – INTEGERInput
On entry: the maximum number of iterations in the optimization process.
 ${\mathbf{ITER}}=0$
 A default value of $50$ is used.
 ${\mathbf{ITER}}<0$
 A default value of $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(50,5nm\right)$ (the default for E04DGF/E04DGA) is used.
 10: IOPT – INTEGERInput
On entry: selects the options, other than the number of iterations, that control the optimization.
 ${\mathbf{IOPT}}=0$
 Default values are selected as described in Section 8. In particular if an accuracy requirement of $\epsilon =0.00001$ is selected, see Section 7.
 ${\mathbf{IOPT}}>0$
 The default values are used except that the accuracy is given by ${10}^{i}$ where $i={\mathbf{IOPT}}$.
 ${\mathbf{IOPT}}<0$
 The option setting mechanism of E04DGF/E04DGA can be used to set all options except Iteration Limit; this option is only recommended if you are an experienced user of NAG optimization routines. For further details see E04DGF/E04DGA.
 11: WK($15\times {\mathbf{N}}\times {\mathbf{NDIM}}$) – REAL (KIND=nag_wp) arrayWorkspace
 12: IWK(${\mathbf{N}}\times \left({\mathbf{N}}1\right)/2+{\mathbf{N}}\times {\mathbf{NDIM}}+5$) – INTEGER arrayWorkspace
 13: IFAIL – INTEGERInput/Output

On entry:
IFAIL must be set to
$0$,
$1\text{ or}1$. If you are unfamiliar with this parameter you should refer to
Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$1\text{ or}1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is
$0$.
When the value $\mathbf{1}\text{ or}\mathbf{1}$ is used it is essential to test the value of IFAIL on exit.
On exit:
${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6 Error Indicators and Warnings
If on entry
${\mathbf{IFAIL}}={\mathbf{0}}$ or
${{\mathbf{1}}}$, explanatory error messages are output on the current error message unit (as defined by
X04AAF).
Errors or warnings detected by the routine:
 ${\mathbf{IFAIL}}=1$
On entry,  ${\mathbf{NDIM}}<1$, 
or  ${\mathbf{N}}\le {\mathbf{NDIM}}$, 
or  ${\mathbf{TYP}}\ne \text{'T'}$ or $\text{'S'}$, 
or  ${\mathbf{LDX}}<{\mathbf{N}}$. 
 ${\mathbf{IFAIL}}=2$
On entry,  all elements of ${\mathbf{D}}\le 0.0$. 
 ${\mathbf{IFAIL}}=3$
The optimization has failed to converge in
ITER function iterations. Try either increasing the number of iterations using
ITER or increasing the value of
$\epsilon $, given by
IOPT, used to determine convergence. Alternatively try a different starting configuration.
 ${\mathbf{IFAIL}}=4$
The conditions for an acceptable solution have not been met but a lower point could not be found. Try using a larger value of
$\epsilon $, given by
IOPT.
 ${\mathbf{IFAIL}}=5$
The optimization cannot begin from the initial configuration. Try a different set of points.
 ${\mathbf{IFAIL}}=6$
The optimization has failed. This error is only likely if
${\mathbf{IOPT}}<0$. It corresponds to
${\mathbf{IFAIL}}={\mathbf{4}}$,
${\mathbf{7}}$ and
${\mathbf{9}}$ in
E04DGF/E04DGA.
7 Accuracy
After a successful optimization the relative accuracy of
STRESS should be approximately
$\epsilon $, as specified by
IOPT.
The optimization routine
E04DGF/E04DGA used by G03FCF has a number of options to control the process. The options for the maximum number of iterations (
Iteration Limit) and accuracy (
Optimality Tolerance) can be controlled by
ITER and
IOPT respectively. The printing option (
Print Level) is set to
$1$ to give no printing. The other option set is to stop the checking of derivatives (
${\mathbf{Verify}}=\mathrm{NO}$) for efficiency. All other options are left at their default values. If however
${\mathbf{IOPT}}<0$ is used, only the maximum number of iterations is set. All other options can be controlled by the option setting mechanism of
E04DGF/E04DGA with the defaults as given by that routine.
Missing values in the input distance matrix can be specified by a negative value and providing there are not more than about two thirds of the values missing the algorithm may still work. However the routine
G03FAF does not allow for missing values so an alternative method of obtaining an initial set of coordinates is required. It may be possible to estimate the missing values with some form of average and then use
G03FAF to give an initial set of coordinates.
9 Example
The data, given by
Krzanowski (1990), are dissimilarities between water vole populations in Europe. Initial estimates are provided by the first two principal coordinates computed.
9.1 Program Text
Program Text (g03fcfe.f90)
9.2 Program Data
Program Data (g03fcfe.d)
9.3 Program Results
Program Results (g03fcfe.r)