
# NAG Library Routine Document: H05ABF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

## 1  Purpose

Given a set of $m$ features and a scoring mechanism for any subset of those features, H05ABF selects the best $n$ subsets of size $p$ using a direct communication branch and bound algorithm.

## 2  Specification

 SUBROUTINE H05ABF ( MINCR, M, IP, NBEST, LA, BSCORE, BZ, F, MINCNT, GAMMA, ACC, IUSER, RUSER, IFAIL)
 INTEGER MINCR, M, IP, NBEST, LA, BZ(M-IP,NBEST), MINCNT, IUSER(*), IFAIL
 REAL (KIND=nag_wp) BSCORE(NBEST), GAMMA, ACC(2), RUSER(*)
 EXTERNAL F

## 3  Description

Given $\Omega =\left\{{x}_{i}:i\in ℤ,1\le i\le m\right\}$, a set of $m$ unique features and a scoring mechanism $f\left(S\right)$ defined for all $S\subseteq \Omega$ then H05ABF is designed to find ${S}_{o1}\subseteq \Omega ,\left|{S}_{o1}\right|=p$, an optimal subset of size $p$. Here $\left|{S}_{o1}\right|$ denotes the cardinality of ${S}_{o1}$, the number of elements in the set.
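The definition of the optimal subset depends on the properties of the scoring mechanism. If
 $f\left({S}_{i}\right)\le f\left({S}_{j}\right),\text{ for all }{S}_{j}\subseteq \Omega \text{ and }{S}_{i}\subseteq {S}_{j}$ (1)
then the optimal subset is defined as one of the solutions to
 $\underset{S\subseteq \Omega }{\mathrm{maximize}}\;f\left(S\right)\text{ subject to }\left|S\right|=p$
else if
 $f\left({S}_{i}\right)\ge f\left({S}_{j}\right),\text{ for all }{S}_{j}\subseteq \Omega \text{ and }{S}_{i}\subseteq {S}_{j}$ (2)
then the optimal subset is defined as one of the solutions to
 $\underset{S\subseteq \Omega }{\mathrm{minimize}}\;f\left(S\right)\text{ subject to }\left|S\right|=p.$
If neither of these properties holds then H05ABF cannot be used.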
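For a small feature set, properties (1) and (2) can be checked by brute force before choosing a value for MINCR. The following Python sketch is illustrative only and is not part of the NAG Library; the helper names `subsets` and `monotonicity` and the sum-of-weights score are assumptions made for the example.

```python
from itertools import combinations


def subsets(features):
    """Yield every subset of `features` as a frozenset."""
    items = list(features)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def monotonicity(f, features):
    """Classify f by testing every nested pair S_i <= S_j.

    Returns 'increasing' if (1) holds, 'decreasing' if (2) holds,
    'both' if f is constant on nested pairs, and None otherwise.
    """
    increasing = decreasing = True
    all_subsets = list(subsets(features))
    for s_j in all_subsets:
        for s_i in all_subsets:
            if s_i <= s_j:  # S_i is a subset of S_j
                if f(s_i) > f(s_j):
                    increasing = False
                if f(s_i) < f(s_j):
                    decreasing = False
    if increasing and decreasing:
        return "both"
    if increasing:
        return "increasing"
    if decreasing:
        return "decreasing"
    return None


# Hypothetical score: a sum of non-negative weights cannot shrink
# when a feature is added, so property (1) holds.
weights = {1: 0.5, 2: 2.0, 3: 1.25, 4: 0.0}


def total_weight(s):
    return sum(weights[x] for x in s)


kind = monotonicity(total_weight, weights)
```

Here `kind` is `"increasing"`, which corresponds to property (1). The check visits every pair of subsets, so it is only practical for very small $m$.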
As well as returning the optimal subset, ${S}_{o1}$, H05ABF can return the best $n$ solutions of size $p$. If ${S}_{o\mathit{i}}$ denotes the $\mathit{i}$th best subset, for $\mathit{i}=1,2,\dots ,n-1$, then the $\left(i+1\right)$th best subset is defined as the solution to either
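 $\underset{S\subseteq \Omega -\left\{{S}_{oj}:j\in ℤ,1\le j\le i\right\}}{\mathrm{maximize}}\;f\left(S\right)\text{ subject to }\left|S\right|=p$
or
 $\underset{S\subseteq \Omega -\left\{{S}_{oj}:j\in ℤ,1\le j\le i\right\}}{\mathrm{minimize}}\;f\left(S\right)\text{ subject to }\left|S\right|=p$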
depending on the properties of $f$.
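As a reference point, the definition of the best $n$ subsets can be written as an exhaustive search that scores every subset of size $p$. This Python sketch is only a statement of the definition, not the method used by H05ABF; the function name `best_subsets_exhaustive` and the example weights are hypothetical.

```python
from itertools import combinations
import heapq


def best_subsets_exhaustive(f, m, p, n, maximize=True):
    """Score every size-p subset of features 1..m and return the n best.

    The result is a list of (score, subset) pairs, best first.
    """
    scored = [(f(frozenset(c)), c) for c in combinations(range(1, m + 1), p)]
    pick = heapq.nlargest if maximize else heapq.nsmallest
    return pick(n, scored, key=lambda pair: pair[0])


# Hypothetical increasing score: the sum of non-negative feature weights.
weights = {1: 0.5, 2: 2.0, 3: 1.25, 4: 0.0}
top3 = best_subsets_exhaustive(lambda s: sum(weights[x] for x in s), 4, 2, 3)
```

With these weights `top3` holds the subsets `(2, 3)`, `(1, 2)` and `(2, 4)`, with scores 3.25, 2.5 and 2.0.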
The solutions are found using a branch and bound method, where each node of the tree is a subset of $\Omega$. Assuming that (1) holds, a particular node, defined by subset ${S}_{i}$, can be trimmed from the tree if $f\left({S}_{i}\right)<\stackrel{^}{f}\left({S}_{on}\right)$, where $\stackrel{^}{f}\left({S}_{on}\right)$ is the $n$th highest score observed so far for a subset of size $p$, i.e., the current best guess of the score for the $n$th best subset. In addition, because of (1), all nodes defined by any subset ${S}_{j}$ where ${S}_{j}\subseteq {S}_{i}$ can also be dropped, thus avoiding the need to enumerate the whole tree. Similar shortcuts can be taken if (2) holds. A full description of this branch and bound algorithm can be found in Ridout (1988).
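The trimming rule can be illustrated with a minimal Python sketch. It assumes property (1) holds and that every score is calculated rather than estimated. It is a simplified illustration, not the H05ABF implementation or the full algorithm of Ridout (1988); the names `branch_and_bound` and `visit` and the example score are hypothetical.

```python
import heapq


def branch_and_bound(f, m, p, n):
    """Return the n highest-scoring size-p subsets of {1..m} under (1).

    Also returns the number of times f was evaluated.
    """
    best = []  # min-heap of (score, subset); best[0] is the current n-th best
    evaluations = 0

    def visit(subset, first_droppable):
        nonlocal evaluations
        score = f(subset)
        evaluations += 1
        # Trim: under (1) no subset of this node can score above `score`.
        if len(best) == n and score < best[0][0]:
            return
        if len(subset) == p:
            entry = (score, tuple(sorted(subset)))
            if len(best) < n:
                heapq.heappush(best, entry)
            else:
                heapq.heappushpop(best, entry)
            return
        # Drop features in increasing order so each subset is reached once,
        # leaving enough higher-numbered features for the remaining drops.
        last = m - (len(subset) - p - 1)
        for x in range(first_droppable, last + 1):
            visit(subset - {x}, x + 1)

    visit(frozenset(range(1, m + 1)), 1)
    return sorted(best, reverse=True), evaluations


# Hypothetical increasing score: the sum of non-negative feature weights.
weights = {1: 0.5, 2: 2.0, 3: 1.25, 4: 0.0}
found, n_evals = branch_and_bound(lambda s: sum(weights[x] for x in s), 4, 2, 3)
```

In this run the node for subset $\left\{1,4\right\}$ is trimmed because its score, 0.5, is below the third best score seen at that point. In a larger tree the same test removes whole branches, since every subset below a trimmed node scores no higher than the node itself.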
Rather than calculate the score at a given node of the tree H05ABF utilizes the fast branch and bound algorithm of Somol et al. (2004), and attempts to estimate the score where possible. For each feature, ${x}_{i}$, two values are stored, a count ${c}_{i}$ and ${\stackrel{^}{\mu }}_{i}$, an estimate of the contribution of that feature. An initial value of zero is used for both ${c}_{i}$ and ${\stackrel{^}{\mu }}_{i}$. At any stage of the algorithm where both $f\left(S\right)$ and $f\left(S-\left\{{x}_{i}\right\}\right)$ have been calculated (as opposed to estimated), the estimated contribution of the feature ${x}_{i}$ is updated to
 $\frac{{c}_{i}{\stackrel{^}{\mu }}_{i}+\left[f\left(S\right)-f\left(S-\left\{{x}_{i}\right\}\right)\right]}{{c}_{i}+1}$
and ${c}_{i}$ is incremented by $1$. Therefore, at each stage, ${\stackrel{^}{\mu }}_{i}$ is the mean contribution of ${x}_{i}$ observed so far and ${c}_{i}$ is the number of observations used to calculate that mean.
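The update above is an ordinary running mean. A small Python sketch, with the hypothetical class name `ContributionEstimate` and made-up scores, shows the bookkeeping for a single feature ${x}_{i}$:

```python
class ContributionEstimate:
    """Running mean of f(S) - f(S - {x_i}) for one feature x_i."""

    def __init__(self):
        self.count = 0    # c_i, number of observed contributions
        self.mean = 0.0   # mu_hat_i, mean contribution so far

    def update(self, score_with, score_without):
        """Fold one calculated pair f(S), f(S - {x_i}) into the mean."""
        delta = score_with - score_without
        self.mean = (self.count * self.mean + delta) / (self.count + 1)
        self.count += 1


est = ContributionEstimate()
est.update(10.0, 7.0)   # observed contribution 3
est.update(8.0, 7.0)    # observed contribution 1
est.update(6.0, 1.0)    # observed contribution 5
```

After the three updates `est.count` is 3 and `est.mean` is 3.0, the average of the observed contributions 3, 1 and 5.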
As long as ${c}_{i}\ge k$, for the user-supplied constant $k$, then rather than calculating $f\left(S-\left\{{x}_{i}\right\}\right)$ this routine estimates it using $\stackrel{^}{f}\left(S-\left\{{x}_{i}\right\}\right)=f\left(S\right)-\gamma {\stackrel{^}{\mu }}_{i}$ or $\stackrel{^}{f}\left(S\right)-\gamma {\stackrel{^}{\mu }}_{i}$ if $f\left(S\right)$ has been estimated, where $\gamma$ is a user-supplied scaling factor. An estimated score is never used to trim a node or returned as the optimal score.
Setting $k=0$ in this routine will cause the algorithm to always calculate the scores, returning to the branch and bound algorithm of Ridout (1988). In most cases it is preferable to use the fast branch and bound algorithm, by setting $k>0$, unless the score function is iterative in nature, i.e., $f\left(S\right)$ must have been calculated before $f\left(S-\left\{{x}_{i}\right\}\right)$ can be calculated.
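The choice between estimating and calculating a score can be sketched as follows. This is an illustration of the rule described above, not NAG code; the function `score_without` and its arguments are hypothetical. The explicit `k > 0` test reflects the statement that $k=0$ always calculates the score.

```python
def score_without(feature, parent_score, count, mean, k, gamma, calculate):
    """Return (score, is_estimated) for the subset S - {x_i}.

    parent_score is f(S), or its estimate if f(S) was itself estimated.
    calculate is a zero-argument callable that evaluates f(S - {x_i}).
    """
    if k > 0 and count[feature] >= k:
        # Enough observations: estimate from the mean contribution.
        return parent_score - gamma * mean[feature], True
    # Too few observations, or estimation disabled with k = 0.
    return calculate(), False


count = {1: 3, 2: 1}      # c_i for two features
mean = {1: 2.0, 2: 0.5}   # mu_hat_i for the same features

estimated = score_without(1, 10.0, count, mean, k=2, gamma=1.5,
                          calculate=lambda: 9.0)
calculated = score_without(2, 10.0, count, mean, k=2, gamma=1.5,
                           calculate=lambda: 9.25)
```

Feature 1 has been observed three times, so its score is estimated as $10-1.5\times 2=7$. Feature 2 has been observed only once, so the score is calculated. An estimated score would still have to be replaced by a calculated one before being used to trim a node or returned.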
H05ABF is a direct communication version of H05AAF.

## 4  References

Narendra P M and Fukunaga K (1977) A branch and bound algorithm for feature subset selection IEEE Transactions on Computers 9 917–922
Ridout M S (1988) Algorithm AS 233: An improved branch and bound algorithm for feature subset selection Journal of the Royal Statistical Society, Series C (Applied Statistics) (Volume 37) 1 139–147
Somol P, Pudil P and Kittler J (2004) Fast branch and bound algorithms for optimal feature selection IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 26) 7 900–912

## 5  Parameters

1:     MINCR – INTEGER, Input
On entry: flag indicating whether the scoring function $f$ is increasing or decreasing.
${\mathbf{MINCR}}=1$
$f\left({S}_{i}\right)\le f\left({S}_{j}\right)$, i.e., the subsets with the largest score will be selected.
${\mathbf{MINCR}}=0$
$f\left({S}_{i}\right)\ge f\left({S}_{j}\right)$, i.e., the subsets with the smallest score will be selected.
Both inequalities are understood to hold for all ${S}_{j}\subseteq \Omega$ and ${S}_{i}\subseteq {S}_{j}$.
Constraint: ${\mathbf{MINCR}}=0$ or $1$.
2:     M – INTEGER, Input
On entry: $m$, the number of features in the full feature set.
Constraint: ${\mathbf{M}}\ge 2$.
3:     IP – INTEGER, Input
On entry: $p$, the number of features in the subset of interest.
Constraint: $1\le {\mathbf{IP}}\le {\mathbf{M}}$.
4:     NBEST – INTEGER, Input
On entry: $n$, the maximum number of best subsets required. The actual number of subsets returned is given by LA on final exit. If on final exit ${\mathbf{LA}}\ne {\mathbf{NBEST}}$ then ${\mathbf{IFAIL}}={\mathbf{42}}$ is returned.
Constraint: ${\mathbf{NBEST}}\ge 1$.
5:     LA – INTEGER, Output
On exit: the number of best subsets returned.
6:     BSCORE(NBEST) – REAL (KIND=nag_wp) array, Output
On exit: holds the scores for the LA best subsets returned in BZ.
7:     BZ(${\mathbf{M}}-{\mathbf{IP}}$,NBEST) – INTEGER array, Output
On exit: the $j$th best subset is constructed by dropping the features specified in ${\mathbf{BZ}}\left(\mathit{i},\mathit{j}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{M}}-{\mathbf{IP}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{LA}}$, from the set of all features, $\Omega$. The score for the $j$th best subset is given in ${\mathbf{BSCORE}}\left(j\right)$.
8:     F – SUBROUTINE, supplied by the user. External Procedure
F must evaluate the scoring function $f$.
The specification of F is:
 SUBROUTINE F ( M, DROP, LZ, Z, LA, A, SCORE, IUSER, RUSER, INFO)
 INTEGER M, DROP, LZ, Z(LZ), LA, A(LA), IUSER(*), INFO
 REAL (KIND=nag_wp) SCORE(max(LA,1)), RUSER(*)
1:     M – INTEGER, Input
On entry: $m=\left|\Omega \right|$, the number of features in the full feature set.
2:     DROP – INTEGER, Input
On entry: flag indicating whether the intermediate subsets should be constructed by dropping features from the full set (${\mathbf{DROP}}=1$) or adding features to the empty set (${\mathbf{DROP}}=0$). See SCORE for additional details.
3:     LZ – INTEGER, Input
On entry: the number of features stored in Z.
4:     Z(LZ) – INTEGER array, Input
On entry: ${\mathbf{Z}}\left(\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{LZ}}$, contains the list of features which, along with those specified in A, define the subsets whose score is required. See SCORE for additional details.
5:     LA – INTEGER, Input
On entry: if ${\mathbf{LA}}>0$, the number of subsets for which a score must be returned.
If ${\mathbf{LA}}=0$, the score for a single subset should be returned. See SCORE for additional details.
6:     A(LA) – INTEGER array, Input
On entry: ${\mathbf{A}}\left(\mathit{j}\right)$, for $\mathit{j}=1,2,\dots ,{\mathbf{LA}}$, contains the list of features which, along with those specified in Z, define the subsets whose score is required. See SCORE for additional details.
7:     SCORE($\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{LA}},1\right)$) – REAL (KIND=nag_wp) array, Output
On exit: the value $f\left({S}_{\mathit{j}}\right)$, for $\mathit{j}=1,2,\dots ,{\mathbf{LA}}$, the score associated with the $j$th subset. ${S}_{j}$ is constructed as follows:
${\mathbf{DROP}}=1$
${S}_{j}$ is constructed by dropping the features specified in the first LZ elements of Z and the single feature given in ${\mathbf{A}}\left(j\right)$ from the full set of features, $\Omega$. The subset will therefore contain ${\mathbf{M}}-{\mathbf{LZ}}-1$ features.
${\mathbf{DROP}}=0$
${S}_{j}$ is constructed by adding the features specified in the first LZ elements of Z and the single feature specified in ${\mathbf{A}}\left(j\right)$ to the empty set, $\varnothing$. The subset will therefore contain ${\mathbf{LZ}}+1$ features.
In both cases the individual features are referenced by the integers $1$ to M with $1$ indicating the first feature, $2$ the second, etc., for some arbitrary ordering of the features, chosen by you prior to calling H05ABF. For example, $1$ might refer to the first variable in a particular set of data, $2$ the second, etc.
If ${\mathbf{LA}}=0$, the score for a single subset should be returned. This subset is constructed by adding or removing only those features specified in the first LZ elements of Z. If ${\mathbf{LZ}}=0$, this subset will either be $\Omega$ or $\varnothing$.
8:     IUSER($*$) – INTEGER array, User Workspace
9:     RUSER($*$) – REAL (KIND=nag_wp) array, User Workspace
F is called with the parameters IUSER and RUSER as supplied to H05ABF. You are free to use the arrays IUSER and RUSER to supply information to F as an alternative to using COMMON global variables.
10:   INFO – INTEGER, Input/Output
On entry: ${\mathbf{INFO}}=0$.
On exit: set INFO to a nonzero value if you wish H05ABF to terminate with ${\mathbf{IFAIL}}={\mathbf{82}}$.
F must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which H05ABF is called. Parameters denoted as Input must not be changed by this procedure.
9:     MINCNT – INTEGER, Input
On entry: $k$, the minimum number of times the effect of each feature, ${x}_{i}$, must have been observed before $f\left(S-\left\{{x}_{i}\right\}\right)$ is estimated from $f\left(S\right)$ as opposed to being calculated directly.
If $k=0$ then $f\left(S-\left\{{x}_{i}\right\}\right)$ is never estimated. If ${\mathbf{MINCNT}}<0$ then $k$ is set to $1$.
10:   GAMMA – REAL (KIND=nag_wp), Input
On entry: $\gamma$, the scaling factor used when estimating scores. If ${\mathbf{GAMMA}}<0$ then $\gamma$ is set to $1$.
11:   ACC($2$) – REAL (KIND=nag_wp) array, Input
On entry: a measure of the accuracy of the scoring function, $f$.
Letting ${a}_{i}={\epsilon }_{1}\left|f\left({S}_{i}\right)\right|+{\epsilon }_{2}$, any values in the range $f\left({S}_{i}\right)\pm {a}_{i}$ are treated as being numerically equivalent when confirming whether the scoring function is strictly increasing or decreasing (as described in MINCR), or when assessing whether a node defined by subset ${S}_{i}$ can be trimmed.
If $0\le {\mathbf{ACC}}\left(1\right)\le 1$ then ${\epsilon }_{1}={\mathbf{ACC}}\left(1\right)$, otherwise ${\epsilon }_{1}=0$.
If ${\mathbf{ACC}}\left(2\right)\ge 0$ then ${\epsilon }_{2}={\mathbf{ACC}}\left(2\right)$, otherwise ${\epsilon }_{2}=0$.
In most situations setting both ${\epsilon }_{1}$ and ${\epsilon }_{2}$ to zero should be sufficient. Using a nonzero value, when one is not required, can significantly increase the number of subsets that need to be evaluated.
12:   IUSER($*$) – INTEGER array, User Workspace
13:   RUSER($*$) – REAL (KIND=nag_wp) array, User Workspace
IUSER and RUSER are not used by H05ABF, but are passed directly to F and may be used to pass information to this routine as an alternative to using COMMON global variables.
14:   IFAIL – INTEGER, Input/Output
On entry: IFAIL must be set to $0$, $-1$ or $1$. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is $0$. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of IFAIL on exit.
On exit: ${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
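The way DROP, Z and A jointly define the subsets that F must score (see the description of SCORE under parameter 8) can be sketched in Python. This is a language-neutral illustration of the rule, not a NAG interface; the function name `subsets_to_score` is hypothetical, and the lengths of `z` and `a` play the roles of LZ and LA.

```python
def subsets_to_score(m, drop, z, a):
    """List the feature subsets a scoring callback is asked to evaluate.

    m    : number of features in the full set, labelled 1..m
    drop : 1 to remove features from the full set, 0 to add to the empty set
    z    : features common to every requested subset
    a    : one extra feature per requested subset (empty means LA = 0)
    """
    base = set(z)
    # LA = 0: a single subset defined by Z alone; otherwise one per entry of A.
    extras = [set()] if len(a) == 0 else [{x} for x in a]
    full = set(range(1, m + 1))
    if drop == 1:
        return [sorted(full - base - extra) for extra in extras]
    return [sorted(base | extra) for extra in extras]


dropped = subsets_to_score(5, 1, [2], [4, 5])   # each has M - LZ - 1 = 3 features
added = subsets_to_score(5, 0, [2], [4, 5])     # each has LZ + 1 = 2 features
whole = subsets_to_score(5, 1, [], [])          # LA = 0 and LZ = 0: the full set
```

Here `dropped` is `[[1, 3, 5], [1, 3, 4]]`, `added` is `[[2, 4], [2, 5]]` and `whole` is `[[1, 2, 3, 4, 5]]`. A scoring routine would return one score per listed subset, in the same order.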

## 6  Error Indicators and Warnings

If on entry ${\mathbf{IFAIL}}={\mathbf{0}}$ or $-{\mathbf{1}}$, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
${\mathbf{IFAIL}}=11$
On entry, ${\mathbf{MINCR}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{MINCR}}=0$ or $1$.
${\mathbf{IFAIL}}=21$
On entry, ${\mathbf{M}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{M}}\ge 2$.
${\mathbf{IFAIL}}=31$
On entry, ${\mathbf{IP}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{M}}=⟨\mathit{\text{value}}⟩$.
Constraint: $1\le {\mathbf{IP}}\le {\mathbf{M}}$.
${\mathbf{IFAIL}}=41$
On entry, ${\mathbf{NBEST}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{NBEST}}\ge 1$.
${\mathbf{IFAIL}}=42$
On entry, ${\mathbf{NBEST}}=⟨\mathit{\text{value}}⟩$.
But only $⟨\mathit{\text{value}}⟩$ best subsets could be calculated.
${\mathbf{IFAIL}}=81$
On exit from F, ${\mathbf{SCORE}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$, which is inconsistent with the score for the parent node. Score for the parent node is $⟨\mathit{\text{value}}⟩$.
${\mathbf{IFAIL}}=82$
A nonzero value for INFO has been returned: ${\mathbf{INFO}}=⟨\mathit{\text{value}}⟩$.
${\mathbf{IFAIL}}=-999$
Dynamic memory allocation failed.

## 7  Accuracy

The subsets returned by H05ABF are guaranteed to be optimal up to the accuracy of the calculated scores.

## 8  Further Comments

The maximum number of unique subsets of size $p$ from a set of $m$ features is $N=\frac{m!}{\left(m-p\right)!p!}$. The efficiency of the branch and bound algorithm implemented in H05ABF comes from evaluating subsets at internal nodes of the tree, that is subsets with more than $p$ features, and where possible trimming branches of the tree based on the scores at these internal nodes as described in Narendra and Fukunaga (1977). Because of this it is possible, in some circumstances, for more than $N$ subsets to be evaluated. This will tend to happen when most of the features have a similar effect on the subset score.
If multiple optimal subsets exist with the same score, and NBEST is too small to return them all, then the choice of which of these optimal subsets is returned is arbitrary.
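The count $N$ is a binomial coefficient and grows quickly with $m$. A short Python sketch evaluates it both from the factorial formula and with the standard library; the function name `max_unique_subsets` and the values $m=10$, $p=5$ are hypothetical.

```python
from math import comb, factorial


def max_unique_subsets(m, p):
    """Number of distinct subsets of size p drawn from m features."""
    return comb(m, p)


m, p = 10, 5
n_subsets = max_unique_subsets(m, p)
# The same value from N = m! / ((m - p)! p!)
n_from_formula = factorial(m) // (factorial(m - p) * factorial(p))
```

Both expressions give 252 for $m=10$ and $p=5$.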

## 9  Example

This example finds the three linear regression models, with five variables, that have the smallest residual sums of squares when fitted to a supplied dataset. The data used in this example was simulated.

### 9.1  Program Text

Program Text (h05abfe.f90)

### 9.2  Program Data

Program Data (h05abfe.d)

### 9.3  Program Results

Program Results (h05abfe.r)