naginterfaces.library.mv.multidimscal_ordinal

naginterfaces.library.mv.multidimscal_ordinal(d, x, typ='T', itera=0, iopt=0, io_manager=None)

multidimscal_ordinal performs non-metric (ordinal) multidimensional scaling.

For full information please refer to the NAG Library document for g03fc

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g03/g03fcf.html

Parameters

d : float, array-like, shape $(n \times (n - 1) / 2)$

The lower triangle of the distance matrix $D$, stored packed by rows. That is, $d[(i - 1) \times (i - 2) / 2 + j - 1]$ must contain $d_{ij}$, for $j = 1, 2, \dots, i - 1$, for $i = 2, 3, \dots, n$. If $d_{ij}$ is missing then set $d_{ij} < 0$; for further comments on missing values see Further Comments.
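The packed-by-rows layout can be easy to get wrong, so here is a minimal sketch of building such an array from a full symmetric distance matrix with NumPy. The helper name `pack_lower_by_rows` and the sample matrix are illustrative assumptions, not part of naginterfaces.

```python
import numpy as np

def pack_lower_by_rows(dist):
    """Return the strict lower triangle of a square matrix, packed by rows.

    Hypothetical helper: element d_ij (1-based, j < i) ends up at
    position (i - 1) * (i - 2) / 2 + j - 1 of the result.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # tril_indices walks the lower triangle row by row: (1,0), (2,0), (2,1), ...
    rows, cols = np.tril_indices(n, k=-1)
    return dist[rows, cols]

# Illustrative 4 x 4 symmetric distance matrix (made-up values).
dist = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 4.0, 5.0],
    [2.0, 4.0, 0.0, 6.0],
    [3.0, 5.0, 6.0, 0.0],
])
d = pack_lower_by_rows(dist)
# d is [1. 2. 4. 3. 5. 6.]
```

A missing distance would be marked by storing a negative value at the corresponding position of `d`.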

x : float, array-like, shape $(n, ndim)$

The $i$th row must contain an initial estimate of the coordinates for the $i$th point, for $i = 1, 2, \dots, n$. One method of computing these is to use multidimscal_metric().

typ : str, length 1, optional

Indicates whether $STRESS$ or $SSTRESS$ is to be used as the criterion.

$typ = 'T'$

$STRESS$ is used.

$typ = 'S'$

$SSTRESS$ is used.

itera : int, optional

The maximum number of iterations in the optimization process.

$itera = 0$

A default value of $50$ is used.

$itera < 0$

A default value of $\max(50, 5nm)$ (the default for opt.uncon_conjgrd_comp) is used.

iopt : int, optional

Selects the options, other than the number of iterations, that control the optimization.

$iopt = 0$

The tolerance $\epsilon$ is set to $0.00001$ (see Accuracy). All other values are set as described in Further Comments.

$iopt > 0$

The tolerance $\epsilon$ is set to $10^{-i}$, where $i = iopt$. All other values are set as described in Further Comments.

$iopt < 0$

No values are changed; the default values of opt.uncon_conjgrd_comp are therefore used.

io_manager : FileObjManager, optional

Manager for I/O in this routine.

Returns

x : float, ndarray, shape $(n, ndim)$

The $i$th row contains $m$ coordinates for the $i$th point, for $i = 1, 2, \dots, n$.

stress : float

The value of $STRESS$ or $SSTRESS$ at the final iteration.

dfit : float, ndarray, shape $(2 \times n \times (n - 1))$

Auxiliary outputs.

If $typ = 'T'$, the first $n(n - 1)/2$ elements contain the distances, $\hat{d}_{ij}$, for the points returned in $x$; the second set of $n(n - 1)/2$ elements contains the distances $\hat{d}_{ij}$ ordered by the input distances, $d_{ij}$; the third set of $n(n - 1)/2$ elements contains the monotonic distances, $\tilde{d}_{ij}$, ordered by the input distances, $d_{ij}$; and the final set of $n(n - 1)/2$ elements contains the fitted monotonic distances, $\tilde{d}_{ij}$, for the points in $x$.

The $\tilde{d}_{ij}$ corresponding to distances which are input as missing are set to zero.

If $typ = 'S'$, the results are as above except that the squared distances are returned.

Each distance matrix is stored in lower triangular packed form in the same way as the input matrix $D$ .
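As a rough illustration of this layout, the sketch below splits a `dfit`-shaped array into its four consecutive blocks and expands a packed block into a full symmetric matrix. The helper names `split_dfit` and `unpack_lower` are hypothetical, and the array used here is placeholder data rather than real output from the routine.

```python
import numpy as np

def split_dfit(dfit, n):
    """Split a length 2*n*(n-1) array into four blocks of n*(n-1)/2 elements."""
    block = n * (n - 1) // 2
    dfit = np.asarray(dfit, dtype=float)
    if dfit.size != 4 * block:
        raise ValueError("dfit must have 2 * n * (n - 1) elements")
    # Row k of the reshaped array is the k-th consecutive block.
    dist_x, dist_sorted, mono_sorted, mono_x = dfit.reshape(4, block)
    return dist_x, dist_sorted, mono_sorted, mono_x

def unpack_lower(packed, n):
    """Expand a packed-by-rows strict lower triangle into a symmetric matrix."""
    full = np.zeros((n, n))
    rows, cols = np.tril_indices(n, k=-1)
    full[rows, cols] = packed
    full[cols, rows] = packed
    return full

# Placeholder values standing in for a real dfit with n = 3.
n = 3
dfit_example = np.arange(12.0)
dist_x, dist_sorted, mono_sorted, mono_x = split_dfit(dfit_example, n)
dist_matrix = unpack_lower(dist_x, n)
```

Only the first and last blocks are indexed by point pairs in the same way as $D$, so `unpack_lower` is applied here to the first block only.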

Raises

NagValueError

(errno $1$ )

On entry, $typ = \langle value \rangle$.

Constraint: $typ = 'S'$ or $'T'$.

(errno $1$ )

On entry, $n = \langle value \rangle$ and $ndim = \langle value \rangle$.

Constraint: $n > ndim$ .

(errno $1$ )

On entry, $ndim = \langle value \rangle$.

Constraint: $ndim \geq 1$ .

(errno $2$ )

On entry, all the elements of $d \leq 0.0$ .

(errno $3$ )

The optimization has failed to converge in $itera$ function iterations.

(errno $5$ )

The optimization cannot begin from the initial configuration.

(errno $6$ )

The optimization has failed.

Warns

NagAlgorithmicWarning

(errno $4$ ): The conditions for an acceptable solution have not been met but a lower point could not be found.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For a set of $n$ objects, a distance or dissimilarity matrix $D$ can be calculated such that $d_{ij}$ is a measure of how ‘far apart’ the objects $i$ and $j$ are. If $p$ variables $x_{k}$ have been recorded for each observation, this measure may be based on Euclidean distance, $d_{ij}^{2} = \sum_{k = 1}^{p} (x_{ki} - x_{kj})^{2}$, or some other calculation such as the number of variables for which $x_{kj} \neq x_{ki}$. Alternatively, the distances may be the result of a subjective assessment. For a given distance matrix, multidimensional scaling produces a configuration of $n$ points in a chosen number of dimensions, $m$, such that the distances between the points in some way best match the distance matrix. For some distance measures, such as Euclidean distance, the size of the distance is meaningful; for other measures, all that can be said is that one distance is greater or smaller than another. For the former, metric scaling can be used (see multidimscal_metric()); for the latter, non-metric scaling is more appropriate.

For non-metric multidimensional scaling, the criterion used to measure the closeness of the fitted distance matrix to the observed distance matrix is known as $STRESS$. $STRESS$ is given by

$$\sqrt{\frac{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} (\hat{d}_{ij} - \tilde{d}_{ij})^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} \hat{d}_{ij}^{2}}}$$

where $\hat{d}_{ij}^{2}$ is the Euclidean squared distance between points $i$ and $j$ and $\tilde{d}_{ij}$ is the fitted distance obtained when $\hat{d}_{ij}$ is monotonically regressed on $d_{ij}$; that is, $\tilde{d}_{ij}$ is monotonic relative to $d_{ij}$ and is obtained from $\hat{d}_{ij}$ with the smallest number of changes. $STRESS$ is therefore a measure of how well the set of points preserves the order of the distances in the original distance matrix. Non-metric multidimensional scaling seeks the set of points that minimizes $STRESS$.
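To make the definition concrete, the following sketch evaluates $STRESS$ for a given configuration using NumPy only. It is an illustration under stated assumptions, not the library's implementation: the monotone fit is a plain least-squares isotonic regression (pool adjacent violators), ties in the input distances are broken by storage order, and missing (negative) distances are not handled. The function names `pava` and `stress` are hypothetical.

```python
import numpy as np

def pava(y):
    """Least-squares non-decreasing fit to y (pool adjacent violators)."""
    blocks = []  # each entry is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the block means are out of order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def stress(x, d):
    """STRESS of configuration x against packed input distances d."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    rows, cols = np.tril_indices(x.shape[0], k=-1)
    # Euclidean distances between points, in the same packed order as d.
    dhat = np.linalg.norm(x[rows] - x[cols], axis=1)
    # Monotone regression of the configuration distances on the order of d.
    order = np.argsort(d, kind="stable")
    dtilde = np.empty_like(dhat)
    dtilde[order] = pava(dhat[order])
    return np.sqrt(np.sum((dhat - dtilde) ** 2) / np.sum(dhat ** 2))

# Three points on a line; packed distances are 1 (2-1), 3 (3-1), 2 (3-2).
points = np.array([[0.0], [1.0], [3.0]])
perfect = stress(points, [1.0, 3.0, 2.0])    # same ordering: STRESS is 0
reversed_ = stress(points, [3.0, 1.0, 2.0])  # conflicting ordering
```

When the ordering of the input distances matches the ordering of the configuration distances, the monotone fit reproduces them exactly and the value is zero; any disagreement in ordering makes it positive.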

An alternative measure is $SSTRESS$,

$$\sqrt{\frac{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} (\hat{d}_{ij}^{2} - \tilde{d}_{ij}^{2})^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} \hat{d}_{ij}^{4}}}$$

in which the distances in $STRESS$ are replaced by squared distances.

In order to perform a non-metric scaling, an initial configuration of points is required. This can be obtained from principal coordinate analysis; see multidimscal_metric(). Given an initial configuration, multidimscal_ordinal uses the optimization function opt.uncon_conjgrd_comp, which implements a conjugate gradient algorithm, to find the configuration of points that minimizes $STRESS$ or $SSTRESS$. The optimum found by multidimscal_ordinal may only be a local optimum. To be more confident of finding a global optimum, several different initial configurations should be used; these can be obtained by randomly perturbing the original initial configuration using functions from submodule rand.
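The multiple-start strategy described above could be organised along the following lines. This is a sketch only: it uses NumPy's random generator in place of the rand submodule, the helper `best_of_restarts` and its `scale` and `n_starts` parameters are hypothetical, and `fit` stands for any user-supplied callable that takes `(d, x_start)` and returns the pair `(x, stress)`, for example a thin wrapper around multidimscal_ordinal.

```python
import numpy as np

def best_of_restarts(fit, d, x0, n_starts=5, scale=0.1, seed=0):
    """Run fit from x0 and from randomly perturbed copies; keep the lowest stress.

    fit is assumed to be a callable (d, x_start) -> (x, stress).
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    best_x, best_stress = None, np.inf
    for k in range(n_starts):
        # The first run uses the original configuration unchanged.
        if k == 0:
            start = x0
        else:
            start = x0 + scale * rng.standard_normal(x0.shape)
        x, stress = fit(d, start)
        if stress < best_stress:
            best_x, best_stress = x, stress
    return best_x, best_stress

# Stand-in for the real fit, used only to show the calling pattern:
# it returns the start unchanged with a made-up "stress" value.
def dummy_fit(d, x_start):
    return x_start, float(np.sum(x_start ** 2))

x_init = np.ones((4, 2))
x_best, stress_best = best_of_restarts(dummy_fit, None, x_init)
```

A wrapper around the real routine would also need to decide how to treat runs that raise NagValueError or issue NagAlgorithmicWarning, for instance by skipping them.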

References

Chatfield, C and Collins, A J, 1980, Introduction to Multivariate Analysis, Chapman and Hall

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press
