
# NAG Library Function Document: nag_mv_gaussian_mixture (g03gac)

## 1  Purpose

nag_mv_gaussian_mixture (g03gac) fits a mixture of Normals (Gaussians) model for a given (co)variance structure.

## 2  Specification

    #include <nag.h>
    #include <nagg03.h>
    void nag_mv_gaussian_mixture (Integer n, Integer m, const double x[],
        Integer pdx, const Integer isx[], Integer nvar, Integer ng,
        Nag_Boolean popt, double prob[], Integer tdprob, Integer *niter,
        Integer riter, double w[], double g[], Nag_VarCovar sopt, double s[],
        double f[], double tol, double *loglik, NagError *fail)

## 3  Description

A Normal (Gaussian) mixture model is a weighted sum of $k$ group Normal densities given by,
 $p\left(x\mid w,\mu ,\Sigma \right)=\sum _{j=1}^{k}{w}_{j}\,g\left(x\mid {\mu }_{j},{\Sigma }_{j}\right),\quad x\in {\mathbb{R}}^{p}$
where:
• $x$ is a $p$-dimensional object of interest;
• ${w}_{j}$ is the mixture weight for the $j$th group and $\sum _{\mathit{j}=1}^{k}{w}_{j}=1$;
• ${\mu }_{j}$ is a $p$-dimensional vector of means for the $j$th group;
• ${\Sigma }_{j}$ is the covariance structure for the $j$th group;
• $g\left(·\right)$ is the $p$-variate Normal density:
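 $g\left(x\mid {\mu }_{j},{\Sigma }_{j}\right)=\frac{1}{{\left(2\pi \right)}^{p/2}{\left|{\Sigma }_{j}\right|}^{1/2}}\exp\left[-\frac{1}{2}\left(x-{\mu }_{j}\right){\Sigma }_{j}^{-1}{\left(x-{\mu }_{j}\right)}^{\mathrm{T}}\right].$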
Optionally, the (co)variance structure may be pooled (common to all groups) or calculated for each group, and may be full or diagonal.

## 4  References

Hartigan J A (1975) Clustering Algorithms Wiley

## 5  Arguments

1:     n (Integer, Input)
On entry: $n$, the number of objects. There must be more objects than parameters in the model.
Constraints:
• if ${\mathbf{sopt}}=\mathrm{Nag_GroupCovar}$, ${\mathbf{n}}>{\mathbf{ng}}×\left({\mathbf{nvar}}×{\mathbf{nvar}}+{\mathbf{nvar}}\right)$;
• if ${\mathbf{sopt}}=\mathrm{Nag_PooledCovar}$, ${\mathbf{n}}>{\mathbf{nvar}}×\left({\mathbf{ng}}+{\mathbf{nvar}}\right)$;
• if ${\mathbf{sopt}}=\mathrm{Nag_GroupVar}$, ${\mathbf{n}}>2×{\mathbf{ng}}×{\mathbf{nvar}}$;
• if ${\mathbf{sopt}}=\mathrm{Nag_PooledVar}$, ${\mathbf{n}}>{\mathbf{nvar}}×\left({\mathbf{ng}}+1\right)$;
• if ${\mathbf{sopt}}=\mathrm{Nag_OverallVar}$, ${\mathbf{n}}>{\mathbf{nvar}}×{\mathbf{ng}}+1$.
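The constraints on n can be checked before calling the function. The sketch below restates them in code; the enumeration `CovStructure` is a hypothetical stand-in for `Nag_VarCovar` so that the example compiles without the NAG headers, and the function names are likewise illustrative.

```c
/* Hypothetical stand-in for the Nag_VarCovar options, so that the sketch
 * compiles without the NAG headers. */
typedef enum {
    GROUP_COVAR, POOLED_COVAR, GROUP_VAR, POOLED_VAR, OVERALL_VAR
} CovStructure;

/* Returns the bound that n must strictly exceed for the given structure,
 * or -1 for an unrecognised option. */
long n_lower_bound(CovStructure sopt, long nvar, long ng)
{
    switch (sopt) {
    case GROUP_COVAR:  return ng * (nvar * nvar + nvar);
    case POOLED_COVAR: return nvar * (ng + nvar);
    case GROUP_VAR:    return 2 * ng * nvar;
    case POOLED_VAR:   return nvar * (ng + 1);
    case OVERALL_VAR:  return nvar * ng + 1;
    }
    return -1;
}

/* Returns 1 if n satisfies the constraint, 0 otherwise. */
int enough_objects(CovStructure sopt, long n, long nvar, long ng)
{
    long bound = n_lower_bound(sopt, nvar, ng);
    return bound >= 0 && n > bound;
}
```

For example, with ${\mathbf{nvar}}=2$ and ${\mathbf{ng}}=2$ a pooled covariance model requires ${\mathbf{n}}>8$, while groupwise covariance matrices require ${\mathbf{n}}>12$.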
2:     m (Integer, Input)
On entry: the total number of variables in array x.
Constraint: ${\mathbf{m}}\ge 1$.
3:     x[${\mathbf{n}}×{\mathbf{pdx}}$] (const double, Input)
On entry: ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{pdx}}+\mathit{j}-1\right]$ must contain the value of the $\mathit{j}$th variable for the $\mathit{i}$th object, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
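4:     pdx (Integer, Input)
On entry: the stride between consecutive rows of the matrix stored in x, that is, the number of array elements separating the values of the same variable for successive objects.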
Constraint: ${\mathbf{pdx}}\ge {\mathbf{m}}$.
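The storage scheme for x can be illustrated with a short sketch. The helpers `x_at` and `pack_rows` are hypothetical names introduced for this example; they simply apply the index formula $\left(i-1\right)×{\mathbf{pdx}}+j-1$ given above.

```c
#include <stddef.h>

/* Hypothetical accessor: value of variable j for object i (both 1-based,
 * as in the text) in an array x whose rows are pdx elements apart. */
double x_at(const double *x, size_t pdx, size_t i, size_t j)
{
    return x[(i - 1) * pdx + (j - 1)];
}

/* Copy an n-by-m matrix stored contiguously (stride m) into storage with
 * stride pdx >= m; any padding elements in x are left untouched. */
void pack_rows(const double *src, size_t n, size_t m, double *x, size_t pdx)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            x[i * pdx + j] = src[i * m + j];
}
```

Choosing ${\mathbf{pdx}}={\mathbf{m}}$ gives contiguous storage; a larger value leaves unused padding at the end of each row.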
5:     isx[m] (const Integer, Input)
On entry: if ${\mathbf{nvar}}={\mathbf{m}}$ all available variables are included in the model and isx is not referenced; otherwise the $j$th variable will be included in the analysis if ${\mathbf{isx}}\left[\mathit{j}-1\right]=1$ and excluded if ${\mathbf{isx}}\left[\mathit{j}-1\right]=0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
Constraint: if ${\mathbf{nvar}}\ne {\mathbf{m}}$, ${\mathbf{isx}}\left[\mathit{j}-1\right]=1$ for nvar values of $\mathit{j}$ and ${\mathbf{isx}}\left[\mathit{j}-1\right]=0$ for the remaining ${\mathbf{m}}-{\mathbf{nvar}}$ values of $\mathit{j}$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
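The constraint on isx can be verified in advance with a check along the following lines. This is only a sketch: `isx_is_valid` is a hypothetical name, and `long` is used as an assumed stand-in for the NAG `Integer` type.

```c
/* Hypothetical check mirroring the stated constraint on isx: when
 * nvar != m, isx must hold exactly nvar ones and m - nvar zeros.
 * long is an assumed stand-in for the NAG Integer type.
 * Returns 1 if valid, 0 otherwise. */
int isx_is_valid(const long *isx, long m, long nvar)
{
    long ones = 0;
    if (nvar == m)
        return 1;        /* isx is not referenced in this case */
    for (long j = 0; j < m; ++j) {
        if (isx[j] == 1)
            ++ones;
        else if (isx[j] != 0)
            return 0;    /* only the values 0 and 1 are allowed */
    }
    return ones == nvar;
}
```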
6:     nvar (Integer, Input)
On entry: $p$, the number of variables included in the calculations.
Constraint: $1\le {\mathbf{nvar}}\le {\mathbf{m}}$.
7:     ng (Integer, Input)
On entry: $k$, the number of groups in the mixture model.
Constraint: ${\mathbf{ng}}\ge 1$.
8:     popt (Nag_Boolean, Input)
On entry: if ${\mathbf{popt}}=\mathrm{Nag_TRUE}$, the initial membership probabilities in prob are set internally; otherwise these probabilities must be supplied.
9:     prob[${\mathbf{n}}×{\mathbf{tdprob}}$] (double, Input/Output)
On entry: if ${\mathbf{popt}}\ne \mathrm{Nag_TRUE}$, ${\mathbf{prob}}\left[\left(i-1\right)×{\mathbf{tdprob}}+j-1\right]$ is the probability that the $i$th object belongs to the $j$th group. (These probabilities are normalised internally.)
On exit: ${\mathbf{prob}}\left[\left(i-1\right)×{\mathbf{tdprob}}+j-1\right]$ is the probability of membership of the $i$th object to the $j$th group for the fitted model.
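10:   tdprob (Integer, Input)
On entry: the stride between consecutive rows of the matrix stored in prob, that is, the number of array elements separating the probabilities for the same group for successive objects.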
Constraint: ${\mathbf{tdprob}}\ge {\mathbf{ng}}$.
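When ${\mathbf{popt}}\ne \mathrm{Nag_TRUE}$ the caller has to fill prob. One simple choice, sketched below under the assumption that an initial hard allocation of objects to groups is available, is to give each object probability 1 for its allocated group and 0 elsewhere. The helper `prob_from_labels` and the 1-based `label` array are hypothetical and introduced only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: fill prob from 1-based group labels, giving each
 * object probability 1 for its labelled group and 0 for the others.
 * Rows of prob are tdprob elements apart (tdprob >= ng).
 * Returns 0 on success, -1 if a label lies outside 1..ng. */
int prob_from_labels(const int *label, size_t n, size_t ng,
                     double *prob, size_t tdprob)
{
    for (size_t i = 0; i < n; ++i) {
        if (label[i] < 1 || (size_t)label[i] > ng)
            return -1;
        for (size_t j = 0; j < ng; ++j)
            prob[i * tdprob + j] = 0.0;
        prob[i * tdprob + (size_t)(label[i] - 1)] = 1.0;
    }
    return 0;
}
```

An allocation that leaves some group without any objects is best avoided, since an empty group stops the iteration (NE_CLUSTER_EMPTY).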
11:   niter (Integer *, Input/Output)
On entry: the maximum number of iterations.
Suggested value: $15$
On exit: the number of completed iterations.
Constraint: ${\mathbf{niter}}\ge 1$.
12:   riter (Integer, Input)
On entry: if ${\mathbf{riter}}>0$, membership probabilities are rounded to $0.0$ or $1.0$ after the completion of every riter iterations.
Suggested value: $5$
13:   w[ng] (double, Output)
On exit: ${w}_{j}$, the mixing probability for the $j$th group.
14:   g[${\mathbf{nvar}}×{\mathbf{ng}}$] (double, Output)
On exit: ${\mathbf{g}}\left[\left(i-1\right)×{\mathbf{ng}}+j-1\right]$ gives the estimated mean of the $i$th variable in the $j$th group.
15:   sopt (Nag_VarCovar, Input)
On entry: determines the (co)variance structure:
${\mathbf{sopt}}=\mathrm{Nag_GroupCovar}$
Groupwise covariance matrices.
${\mathbf{sopt}}=\mathrm{Nag_PooledCovar}$
Pooled covariance matrix.
${\mathbf{sopt}}=\mathrm{Nag_GroupVar}$
Groupwise variances.
${\mathbf{sopt}}=\mathrm{Nag_PooledVar}$
Pooled variances.
${\mathbf{sopt}}=\mathrm{Nag_OverallVar}$
Overall variance.
Constraint: ${\mathbf{sopt}}=\mathrm{Nag_GroupCovar}$, $\mathrm{Nag_PooledCovar}$, $\mathrm{Nag_GroupVar}$, $\mathrm{Nag_PooledVar}$ or $\mathrm{Nag_OverallVar}$.
16:   s[$\mathit{dim}$] (double, Output)
Note: the dimension, dim, of the array s must be at least $\mathit{a}×\mathit{b}×\mathit{c}$, where $a$, $b$ and $c$ depend on sopt as given below.
Where ${\mathbf{S}}\left(i,j,k\right)$ appears in this document, it refers to the array element ${\mathbf{s}}\left[\left(k-1\right)×\mathit{a}×\mathit{b}+\left(j-1\right)×\mathit{a}+i-1\right]$.
On exit: if ${\mathbf{sopt}}=\mathrm{Nag_GroupCovar}$, ${\mathbf{S}}\left(i,j,k\right)$ gives the $\left(i,j\right)$th element of the covariance matrix of the $k$th group, with $a=b={\mathbf{nvar}}$ and $c={\mathbf{ng}}$.
If ${\mathbf{sopt}}=\mathrm{Nag_PooledCovar}$, ${\mathbf{S}}\left(i,j,1\right)$ gives the $\left(i,j\right)$th element of the pooled covariance matrix, with $a=b={\mathbf{nvar}}$ and $c=1$.
If ${\mathbf{sopt}}=\mathrm{Nag_GroupVar}$, ${\mathbf{S}}\left(j,k,1\right)$ gives the $j$th variance in the $k$th group, with $a={\mathbf{nvar}}$, $b={\mathbf{ng}}$ and $c=1$.
If ${\mathbf{sopt}}=\mathrm{Nag_PooledVar}$, ${\mathbf{S}}\left(j,1,1\right)$ gives the $j$th pooled variance, with $a={\mathbf{nvar}}$ and $b=c=1$.
If ${\mathbf{sopt}}=\mathrm{Nag_OverallVar}$, ${\mathbf{S}}\left(1,1,1\right)$ gives the overall variance, with $a=b=c=1$.
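The dimensions of s and the mapping from ${\mathbf{S}}\left(i,j,k\right)$ to an array offset can be summarised in code. In the sketch below, `SOption` is a hypothetical stand-in for `Nag_VarCovar`, and `SDims`, `s_dims`, `s_length` and `s_index` are illustrative names rather than library functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the Nag_VarCovar options. */
typedef enum {
    S_GROUP_COVAR, S_POOLED_COVAR, S_GROUP_VAR, S_POOLED_VAR, S_OVERALL_VAR
} SOption;

typedef struct { size_t a, b, c; } SDims;

/* Dimensions a, b and c of s for each option, as listed in the text. */
SDims s_dims(SOption sopt, size_t nvar, size_t ng)
{
    SDims d = {1, 1, 1};  /* overall variance: a = b = c = 1 */
    switch (sopt) {
    case S_GROUP_COVAR:  d.a = nvar; d.b = nvar; d.c = ng; break;
    case S_POOLED_COVAR: d.a = nvar; d.b = nvar; break;
    case S_GROUP_VAR:    d.a = nvar; d.b = ng; break;
    case S_POOLED_VAR:   d.a = nvar; break;
    case S_OVERALL_VAR:  break;
    }
    return d;
}

/* Minimum length of s: a * b * c. */
size_t s_length(SDims d)
{
    return d.a * d.b * d.c;
}

/* Offset of S(i, j, k) (1-based indices) within s. */
size_t s_index(SDims d, size_t i, size_t j, size_t k)
{
    return (k - 1) * d.a * d.b + (j - 1) * d.a + (i - 1);
}
```

For groupwise covariance matrices with ${\mathbf{nvar}}=3$ and ${\mathbf{ng}}=2$, s needs at least 18 elements and ${\mathbf{S}}\left(2,3,2\right)$ is stored at offset 16.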
17:   f[${\mathbf{n}}×{\mathbf{ng}}$] (double, Output)
On exit: ${\mathbf{f}}\left[\left(i-1\right)×{\mathbf{ng}}+j-1\right]$ gives the $p$-variate Normal (Gaussian) density of the $i$th object in the $j$th group.
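In a mixture model the group densities and mixing probabilities are related to the membership probabilities by the textbook formula ${w}_{j}{f}_{ij}/\sum _{l}{w}_{l}{f}_{il}$. The sketch below applies that relation to arrays laid out like w, f and prob; it is an illustration of the standard formula, and it is an assumption, not something stated in this document, that the values returned in prob satisfy it exactly. The name `posterior_membership` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: textbook posterior membership probabilities
 * prob[i][j] = w[j] * f[i][j] / sum over l of w[l] * f[i][l].
 * Rows of f are ng elements apart; rows of prob are tdprob >= ng apart.
 * Returns 0 on success, -1 if some object has zero total density. */
int posterior_membership(const double *w, const double *f, size_t n,
                         size_t ng, double *prob, size_t tdprob)
{
    for (size_t i = 0; i < n; ++i) {
        double total = 0.0;
        for (size_t j = 0; j < ng; ++j)
            total += w[j] * f[i * ng + j];
        if (total <= 0.0)
            return -1;
        for (size_t j = 0; j < ng; ++j)
            prob[i * tdprob + j] = w[j] * f[i * ng + j] / total;
    }
    return 0;
}
```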
18:   tol (double, Input)
On entry: iterations cease the first time an improvement in log-likelihood is less than tol. If ${\mathbf{tol}}\le 0$ a value of ${10}^{-3}$ is used.
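The stopping rule described for tol can be written as a small predicate. This is a sketch of the rule as worded above, not the library's internal code; `effective_tol` and `should_stop` are hypothetical names.

```c
/* Tolerance actually applied: tol itself if positive, otherwise 1e-3. */
double effective_tol(double tol)
{
    return tol > 0.0 ? tol : 1.0e-3;
}

/* Returns 1 when the improvement in log-likelihood between two
 * successive iterations is smaller than the effective tolerance. */
int should_stop(double prev_loglik, double new_loglik, double tol)
{
    return (new_loglik - prev_loglik) < effective_tol(tol);
}
```

Iteration also ends when the limit supplied in niter is reached, so both conditions would be tested in a loop built around this predicate.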
19:   loglik (double *, Output)
On exit: the log-likelihood for the fitted mixture model.
20:   fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_ARRAY_SIZE
On entry, ${\mathbf{pdx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdprob}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{tdprob}}\ge {\mathbf{ng}}$.
On entry, argument $⟨\mathit{\text{value}}⟩$ had an illegal value.
NE_CLUSTER_EMPTY
An iteration cannot continue due to an empty group; try a different initial allocation.
NE_INT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ng}}\ge 1$.
On entry, ${\mathbf{niter}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{niter}}\ge 1$.
NE_INT_2
On entry, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: $1\le {\mathbf{nvar}}\le {\mathbf{m}}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_MAT_NOT_POS_DEF
A covariance matrix is not positive definite; try a different initial allocation.
NE_OBSERVATIONS
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$ and $p=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}>p$, where $p$ here denotes the number of parameters in the model; that is, too few objects have been supplied for the model.
NE_PROBABILITY
On entry, row $⟨\mathit{\text{value}}⟩$ of supplied prob does not sum to $1$.
NE_VAR_INCL_INDICATED
On entry, ${\mathbf{nvar}}\ne {\mathbf{m}}$ and isx is invalid.

## 7  Accuracy

Not applicable.

## 8  Parallelism and Performance

Not applicable.

## 9  Further Comments

None.

## 10  Example

This example fits a Gaussian mixture model with pooled covariance structure to the New Haven schools test data; see Table 5.1 (p. 118) in Hartigan (1975).

### 10.1  Program Text

Program Text (g03gace.c)

### 10.2  Program Data

Program Data (g03gace.d)

### 10.3  Program Results

Program Results (g03gace.r)