
NAG Toolbox: nag_best_subset_given_size_revcomm (h05aa)

Purpose

Given a set of m features and a scoring mechanism for any subset of those features, nag_best_subset_given_size_revcomm (h05aa) selects the best n subsets of size p using a reverse communication branch and bound algorithm.

Syntax

[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = h05aa(irevcm, mincr, m, ip, drop, lz, z, la, a, bscore, bz, mincnt, gamma, acc, icomm, rcomm, 'nbest', nbest, 'licomm', licomm, 'lrcomm', lrcomm)
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = nag_best_subset_given_size_revcomm(irevcm, mincr, m, ip, drop, lz, z, la, a, bscore, bz, mincnt, gamma, acc, icomm, rcomm, 'nbest', nbest, 'licomm', licomm, 'lrcomm', lrcomm)

Description

Given Ω = {xi : i ∈ ℤ, 1 ≤ i ≤ m}, a set of m unique features, and a scoring mechanism f(S) defined for all S ⊆ Ω, nag_best_subset_given_size_revcomm (h05aa) is designed to find So1 ⊆ Ω, |So1| = p, an optimal subset of size p. Here |So1| denotes the cardinality of So1, the number of elements in the set.
The definition of the optimal subset depends on the properties of the scoring mechanism. If
f(Si) ≤ f(Sj), for all Sj ⊆ Ω and Si ⊆ Sj,   (1)
then the optimal subset is defined as one of the solutions to
maximize f(S) over all S ⊆ Ω  subject to  |S| = p,
else if
f(Si) ≥ f(Sj), for all Sj ⊆ Ω and Si ⊆ Sj,   (2)
then the optimal subset is defined as one of the solutions to
minimize f(S) over all S ⊆ Ω  subject to  |S| = p.
If neither of these properties holds then nag_best_subset_given_size_revcomm (h05aa) cannot be used.
As well as returning the optimal subset, So1, nag_best_subset_given_size_revcomm (h05aa) can return the best n solutions of size p. If Soi denotes the ith best subset, for i = 1, 2, …, n-1, then the (i+1)th best subset is defined as the solution to either
maximize f(S) over all S ⊆ Ω - {Soj : j ∈ ℤ, 1 ≤ j ≤ i}  subject to  |S| = p
or
minimize f(S) over all S ⊆ Ω - {Soj : j ∈ ℤ, 1 ≤ j ≤ i}  subject to  |S| = p
depending on the properties of f.
The solutions are found using a branch and bound method, where each node of the tree is a subset of Ω. Assuming that (1) holds, a particular node, defined by subset Si, can be trimmed from the tree if f(Si) < f̂(Son), where f̂(Son) is the nth highest score observed so far for a subset of size p, i.e., the current best guess of the score for the nth best subset. In addition, because of (1), all nodes defined by any subset Sj with Sj ⊆ Si can also be dropped, thus avoiding the need to enumerate the whole tree. Similar short cuts can be taken if (2) holds. A full description of this branch and bound algorithm can be found in Ridout (1988).
Rather than calculate the score at a given node of the tree, nag_best_subset_given_size_revcomm (h05aa) utilizes the fast branch and bound algorithm of Somol et al. (2004) and attempts to estimate the score where possible. For each feature, xi, two values are stored: a count ci and an estimate of the contribution of that feature, μ̂i. An initial value of zero is used for both ci and μ̂i. At any stage of the algorithm where both f(S) and f(S - {xi}) have been calculated (as opposed to estimated), the estimated contribution of the feature xi is updated to
(ciμ̂i + [f(S) - f(S - {xi})]) / (ci + 1)
and ci is incremented by 1. Therefore, at each stage, μ̂i is the mean contribution of xi observed so far and ci is the number of observations used to calculate that mean.
As long as ci ≥ k, for the user-supplied constant k, then rather than calculating f(S - {xi}) this function estimates it using f̂(S - {xi}) = f(S) - γμ̂i, or f̂(S) - γμ̂i if f(S) has itself been estimated, where γ is a user-supplied scaling factor. An estimated score is never used to trim a node or returned as the optimal score.
Setting k = 0 in this function will cause the algorithm always to calculate the scores, returning to the branch and bound algorithm of Ridout (1988). In most cases it is preferable to use the fast branch and bound algorithm, by setting k > 0, unless the score function is iterative in nature, i.e., f(S) must have been calculated before f(S - {xi}) can be calculated.
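The bookkeeping described above can be illustrated by the following sketch. It is not part of the routine's interface; the feature index, score values and constants used are invented purely for illustration.

% Illustrative MATLAB sketch of the fast branch and bound bookkeeping.
% The scores fS and fS_minus_xi stand in for values returned by a
% user-supplied scoring function.
m     = 4;                % number of features (illustrative)
c     = zeros(m, 1);      % ci, the observation counts
muhat = zeros(m, 1);      % muhat(i), the estimated contributions
gamma = 1;                % scaling factor, gamma
k     = 1;                % minimum count, k (mincnt)

% Suppose f(S) and f(S - {xi}) have both been calculated for feature i
i           = 3;
fS          = 10.2;       % assumed calculated score of S
fS_minus_xi = 9.7;        % assumed calculated score of S - {xi}
muhat(i) = (c(i)*muhat(i) + (fS - fS_minus_xi)) / (c(i) + 1);
c(i)     = c(i) + 1;

% Once c(i) >= k, f(S - {xi}) may be estimated rather than calculated
if c(i) >= k
  fS_minus_xi_est = fS - gamma*muhat(i);
end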

References

Narendra P M and Fukunaga K (1977) A branch and bound algorithm for feature subset selection IEEE Transactions on Computers 9 917–922
Ridout M S (1988) Algorithm AS 233: An improved branch and bound algorithm for feature subset selection Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1) 139–147
Somol P, Pudil P and Kittler J (2004) Fast branch and bound algorithms for optimal feature selection IEEE Transactions on Pattern Analysis and Machine Intelligence 26(7) 900–912

Parameters

Note: this function uses reverse communication. Its use involves an initial entry, intermediate exits and re-entries, and a final exit, as indicated by the parameter irevcm. Between intermediate exits and re-entries, all parameters other than bscore must remain unchanged.
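The calling pattern implied by this note is sketched below. This fragment is only a sketch: it assumes that all of the input arguments, the communication arrays and the data used by the scoring function (x and y) have already been initialized exactly as in the program given in the Example section, and calc_subset_score is the scoring helper defined there.

% Sketch of the reverse communication loop: only bscore is changed
% between an intermediate exit and the following re-entry.
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
    h05aa(irevcm, mincr, m, ip, drop, lz, z, la, a, ...
          bscore, bz, mincnt, gamma, acc, icomm, rcomm);
while irevcm ~= 0
  % Score the subsets defined by drop, z(1:lz) and a(1:la)
  bscore = calc_subset_score(m, drop, lz, z, la, a, x, y, bscore);
  [irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
      h05aa(irevcm, mincr, m, ip, drop, lz, z, la, a, ...
            bscore, bz, mincnt, gamma, acc, icomm, rcomm);
end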

Compulsory Input Parameters

1:     irevcm – int64/int32/nag_int scalar
On initial entry: must be set to 0.
On intermediate re-entry: irevcm must remain unchanged.
Constraint: irevcm = 0 or 1.
2:     mincr – int64/int32/nag_int scalar
Flag indicating whether the scoring function f is increasing or decreasing.
mincr = 1
f(Si) ≤ f(Sj), i.e., the subsets with the largest score will be selected.
mincr = 0
f(Si) ≥ f(Sj), i.e., the subsets with the smallest score will be selected.
For all Sj ⊆ Ω and Si ⊆ Sj.
Constraint: mincr = 0 or 1.
3:     m – int64/int32/nag_int scalar
m, the number of features in the full feature set.
Constraint: m ≥ 2.
4:     ip – int64/int32/nag_int scalar
p, the number of features in the subset of interest.
Constraint: 1 ≤ ip ≤ m.
5:     drop – int64/int32/nag_int scalar
On initial entry: drop need not be set.
On intermediate re-entry: drop must remain unchanged.
6:     lz – int64/int32/nag_int scalar
On initial entry: lz need not be set.
On intermediate re-entry: lz must remain unchanged.
7:     z(m-ip) – int64/int32/nag_int array
On initial entry: z need not be set.
On intermediate re-entry: z must remain unchanged.
8:     la – int64/int32/nag_int scalar
On initial entry: la need not be set.
On intermediate re-entry: la must remain unchanged.
9:     a(max(nbest,m)) – int64/int32/nag_int array
On initial entry: a need not be set.
On intermediate re-entry: a must remain unchanged.
10:   bscore(max(nbest,m)) – double array
On initial entry: bscore need not be set.
On intermediate re-entry: bscore(j) must hold the score for the jth subset as described in irevcm.
11:   bz(m-ip,nbest) – int64/int32/nag_int array
On initial entry: bz need not be set.
On intermediate re-entry: bz must remain unchanged.
12:   mincnt – int64/int32/nag_int scalar
k, the minimum number of times the effect of each feature, xi, must have been observed before f(S - {xi}) is estimated from f(S) as opposed to being calculated directly.
If k = 0 then f(S - {xi}) is never estimated. If mincnt < 0 then k is set to 1.
13:   gamma – double scalar
γ, the scaling factor used when estimating scores. If gamma < 0 then γ is set to 1.
14:   acc(2) – double array
A measure of the accuracy of the scoring function, f.
Letting ai = ε1 |f(Si)| + ε2, when confirming whether the scoring function is strictly increasing or decreasing (as described in mincr), or when assessing whether a node defined by subset Si can be trimmed, any values in the range f(Si) ± ai are treated as being numerically equivalent.
If 0 ≤ acc(1) ≤ 1 then ε1 = acc(1), otherwise ε1 = 0.
If acc(2) ≥ 0 then ε2 = acc(2), otherwise ε2 = 0.
In most situations setting both ε1 and ε2 to zero should be sufficient. Using a nonzero value when one is not required can significantly increase the number of subsets that need to be evaluated.
15:   icomm(licomm) – int64/int32/nag_int array
licomm, the dimension of the array, must satisfy the constraint
  • if mincnt = 0, licomm ≥ 2 × max(nbest,m) + m(m+2) + (m+1) × max(m-ip,1) + 27;
  • otherwise licomm ≥ 2 × max(nbest,m) + m(m+3) + (2m+1) × max(m-ip,1) + 25.
On initial entry: icomm need not be set.
On intermediate re-entry: icomm must remain unchanged.
16:   rcomm(lrcomm) – double array
lrcomm, the dimension of the array, must satisfy the constraint
  • if mincnt = 0, lrcomm ≥ 9 + nbest + m × max(m-ip,1);
  • otherwise lrcomm ≥ 8 + m + nbest + m × max(m-ip,1).
On initial entry: rcomm need not be set.
On intermediate re-entry: rcomm must remain unchanged.

Optional Input Parameters

1:     nbest – int64/int32/nag_int scalar
Default: 1
n, the maximum number of best subsets required. The actual number of subsets returned is given by la on final exit. If on final exit la ≠ nbest then ifail = 53 is returned.
Constraint: nbest ≥ 1.
2:     licomm – int64/int32/nag_int scalar
Default: the dimension of the array icomm.
The length of the array icomm. If licomm is too small and licomm ≥ 2 then ifail = 172 is returned and the minimum values for licomm and lrcomm are given by icomm(1) and icomm(2) respectively.
Constraints:
  • if mincnt = 0, licomm ≥ 2 × max(nbest,m) + m(m+2) + (m+1) × max(m-ip,1) + 27;
  • otherwise licomm ≥ 2 × max(nbest,m) + m(m+3) + (2m+1) × max(m-ip,1) + 25.
3:     lrcomm – int64/int32/nag_int scalar
Default: the dimension of the array rcomm.
The length of the array rcomm. If lrcomm is too small and licomm ≥ 2 then ifail = 172 is returned and the minimum values for licomm and lrcomm are given by icomm(1) and icomm(2) respectively.
Constraints:
  • if mincnt = 0, lrcomm ≥ 9 + nbest + m × max(m-ip,1);
  • otherwise lrcomm ≥ 8 + m + nbest + m × max(m-ip,1).

Input Parameters Omitted from the MATLAB Interface

None.

Output Parameters

1:     irevcm – int64/int32/nag_int scalar
On intermediate exit: irevcm = 1 and before re-entry the scores associated with la subsets must be calculated and returned in bscore.
The la subsets are constructed as follows (see also the sketch following this parameter list):
drop = 1
The jth subset is constructed by dropping the features specified in the first lz elements of z and the single feature given in a(j) from the full set of features, Ω. The subset will therefore contain m - lz - 1 features.
drop = 0
The jth subset is constructed by adding the features specified in the first lz elements of z and the single feature specified in a(j) to the empty set, ∅. The subset will therefore contain lz + 1 features.
In both cases the individual features are referenced by the integers 1 to m, with 1 indicating the first feature, 2 the second, etc., for some arbitrary ordering of the features. The same ordering must be used in all calls to nag_best_subset_given_size_revcomm (h05aa).
If la = 0, the score for a single subset should be returned. This subset is constructed by adding or removing only those features specified in the first lz elements of z.
If lz = 0, this subset will either be Ω or ∅.
The score associated with the jth subset must be returned in bscore(j).
On final exit: irevcm = 0, and the algorithm has terminated.
2:     drop – int64/int32/nag_int scalar
On intermediate exit: flag indicating whether the intermediate subsets should be constructed by dropping features from the full set (drop = 1) or adding features to the empty set (drop = 0). See irevcm for details.
On final exit: is undefined.
3:     lz – int64/int32/nag_int scalar
On intermediate exit: the number of features stored in z.
On final exit: is undefined.
4:     z(m-ip) – int64/int32/nag_int array
On intermediate exit: z(i), for i = 1, 2, …, lz, contains the list of features which, along with those specified in a, define the subsets whose score is required. See irevcm for additional details.
On final exit: is undefined.
5:     la – int64/int32/nag_int scalar
On intermediate exit: if la > 0, the number of subsets for which a score must be returned.
If la = 0, the score for a single subset should be returned. See irevcm for additional details.
On final exit: the number of best subsets returned.
6:     a(max(nbest,m)) – int64/int32/nag_int array
On intermediate exit: a(j), for j = 1, 2, …, la, contains the list of features which, along with those specified in z, define the subsets whose score is required. See irevcm for additional details.
On final exit: is undefined.
7:     bscore(max(nbest,m)) – double array
On intermediate exit: is undefined.
On final exit: holds the scores for the la best subsets returned in bz.
8:     bz(m-ip,nbest) – int64/int32/nag_int array
On intermediate exit: is used for storage between calls to nag_best_subset_given_size_revcomm (h05aa).
On final exit: the jth best subset is constructed by dropping the features specified in bz(i,j), for i = 1, 2, …, m-ip and j = 1, 2, …, la, from the set of all features, Ω. The score for the jth best subset is given in bscore(j).
9:     icomm(licomm) – int64/int32/nag_int array
On intermediate exit: is used for storage between calls to nag_best_subset_given_size_revcomm (h05aa).
On final exit: is not defined, with the exception of the first two elements: icomm(1) and icomm(2) contain the minimum required values for licomm and lrcomm respectively.
10:   rcomm(lrcomm) – double array
On intermediate exit: is used for storage between calls to nag_best_subset_given_size_revcomm (h05aa).
On final exit: is not defined.
11:   ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
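The construction rules described under irevcm can be summarised by the following sketch. The values of m, drop, lz, z, la and a below are invented for illustration; at a real intermediate exit they are the values returned by the routine.

% Illustrative sketch: build the feature subsets whose scores are
% requested at an intermediate exit, following the rules under irevcm.
m    = 6;
drop = int64(0);                   % add features to the empty set
lz   = int64(1);
z    = int64([2; 0]);              % only the first lz elements are used
la   = int64(2);
a    = int64([4; 5; 0; 0; 0; 0]);

for j = 1:max(1, la)
  if drop == 1
    isx = ones(m, 1);              % start from the full set, Omega
  else
    isx = zeros(m, 1);             % start from the empty set
  end
  isx(z(1:lz)) = 1 - double(drop); % drop (or add) the features in z
  if la > 0
    isx(a(j)) = 1 - double(drop);  % drop (or add) the single feature a(j)
  end
  subset = find(isx == 1);         % features present in the jth subset
  fprintf('subset %d: %s\n', j, num2str(subset'));
  % bscore(j) would be set to the score of this subset before re-entry
end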

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 11
Constraint: irevcm = 0 or 1.
  ifail = 21
Constraint: mincr = 0 or 1.
  ifail = 22
mincr has changed between calls.
  ifail = 31
Constraint: m ≥ 2.
  ifail = 32
m has changed between calls.
  ifail = 41
Constraint: 1 ≤ ip ≤ m.
  ifail = 42
ip has changed between calls.
  ifail = 51
Constraint: nbest ≥ 1.
  ifail = 52
nbest has changed between calls.
W ifail = 53
On entry, nbest = __, but only __ best subsets could be calculated.
  ifail = 61
drop has changed between calls.
  ifail = 71
lz has changed between calls.
  ifail = 91
la has changed between calls.
  ifail = 111
bscore(_) = _, which is inconsistent with the score for the parent node.
  ifail = 131
mincnt has changed between calls.
  ifail = 141
gamma has changed between calls.
  ifail = 151
acc(1) has changed between calls.
  ifail = 152
acc(2) has changed between calls.
  ifail = 161
icomm has been corrupted between calls.
  ifail = 171
licomm is too small; icomm is too small to return the required array sizes.
W ifail = 172
licomm or lrcomm is too small; the minimum required values for licomm and lrcomm are returned in icomm(1) and icomm(2) respectively.
  ifail = 181
rcomm has been corrupted between calls.
  ifail = -999
Dynamic memory allocation failed.

Accuracy

The subsets returned by nag_best_subset_given_size_revcomm (h05aa) are guaranteed to be optimal up to the accuracy of your calculated scores.

Further Comments

The maximum number of unique subsets of size p from a set of m features is N = m!/((m-p)! p!). The efficiency of the branch and bound algorithm implemented in nag_best_subset_given_size_revcomm (h05aa) comes from evaluating subsets at internal nodes of the tree, that is, subsets with more than p features, and, where possible, trimming branches of the tree based on the scores at these internal nodes, as described in Narendra and Fukunaga (1977). Because of this it is possible, in some circumstances, for more than N subsets to be evaluated. This will tend to happen when most of the features have a similar effect on the subset score.
If multiple optimal subsets exist with the same score, and nbest is too small to return them all, then the choice of which of these optimal subsets is returned is arbitrary.
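For instance, with the values used in the Example section below (m = 14 and p = 5), an exhaustive search would have to score all N = 2002 subsets of size 5, whereas the example program reports that only 45 subsets are evaluated in total. The value of N can be checked directly in MATLAB:

% Number of distinct subsets of size p = 5 drawn from m = 14 features
N = nchoosek(14, 5)     % returns 2002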

Example

function nag_best_subset_given_size_revcomm_example
m      = int64(14);
ip     = int64(5);
nbest  = int64(3);
mincr  = int64(0);
mincnt = int64(-1);
gamma  = -1;
acc    = [-1, -1];
mip    = m - ip;
z      = zeros(mip, 1, 'int64');
a      = zeros(max(nbest, m), 1, 'int64');
bz     = zeros(mip, nbest, 'int64');
bscore = zeros(max(nbest, m), 1);
irevcm = int64(0);
drop   = int64(0);
lz     = int64(0);
la     = int64(0);

% Data required by the scoring function
n = 40;
x = [
-1.59, 0.19, 0.40, 0.43,-0.40, 0.79, 0.06, 0.33, 1.60, 0.58,-1.12, 1.23, 1.07,-0.07;
-0.25, 0.61,-0.36, 1.16, 0.61,-2.05,-0.02,-0.04, 0.80,-0.73,-0.63,-0.75,-0.73, 1.43;
-2.28, 0.46,-0.65, 0.33, 0.16,-0.21,-1.61,-0.54, 0.48, 0.37,-0.95,-2.14, 0.48, 2.02;
-0.52, 1.05, 0.64, 0.02,-1.12, 0.23, 0.06,-1.26, 1.40,-0.98, 2.47, 0.49,-0.02,-0.05;
-0.84, 1.86, 0.10, 0.73,-1.41, 0.98, 0.20,-0.89, 1.84, 2.56, 0.60,-0.12, 0.71, 0.23;
 1.12,-0.51,-0.58, 0.09,-1.14, 2.11,-0.11,-0.34,-1.04,-0.43,-0.01,-0.38, 1.80, 0.05;
 0.06, 0.85,-2.09, 0.22,-1.35,-0.36, 1.20, 0.41, 0.80,-0.28, 0.18, 0.27, 0.92, 0.63;
-0.48,-1.02, 0.08,-0.06, 0.13,-1.18, 2.30, 0.03, 0.45, 0.62,-1.97, 0.97, 0.93,-0.18;
 0.08,-0.31, 0.43,-0.38, 0.01, 1.30, 0.66, 0.65,-0.59, 0.76, 0.04, 0.17,-0.76,-0.90;
 0.66, 1.14, 0.40, 2.37, 1.10, 0.17,-0.38, 1.15,-1.00,-0.13,-0.69,-0.62,-0.18, 0.00;
-1.08,-0.21,-1.13,-0.79,-0.76,-1.58, 0.38,-0.03, 1.26,-0.51,-0.75, 0.86, 0.29, 0.68;
-0.74,-1.59,-0.58,-1.09, 1.18,-1.70,-1.02, 0.36, 1.05, 1.30,-0.98,-1.36,-1.28,-1.32;
 0.40,-1.58,-1.30,-0.10,-1.34, 0.65,-0.56, 0.39,-0.73,-0.32, 2.19,-0.49, 0.69, 0.18;
 0.75,-3.09,-0.61,-1.89, 0.15, 0.77,-0.49,-0.63, 1.20,-0.04, 1.02, 0.31, 0.81,-0.45;
-0.65, 1.57,-1.50,-1.45, 0.21, 0.06, 0.24, 2.24,-1.34, 0.30, 1.39,-0.38,-0.71, 0.48;
 1.36, 1.40,-1.40,-0.90, 0.36,-0.21,-0.97, 0.36,-0.26, 0.08, 0.06,-1.49, 0.43,-1.61;
-1.01, 1.50,-0.61,-0.25,-1.01,-0.43, 1.90,-1.33,-0.96,-0.02, 0.51,-1.38,-0.78, 1.82;
 1.34, 1.02, 3.50, 0.10, 0.50, 0.04, 0.61,-0.57,-2.69,-0.64,-0.34,-0.21,-1.97,-0.19;
 0.29, 0.67,-0.38,-0.63,-0.24, 1.21,-0.09, 0.90,-2.20, 1.72, 0.29, 0.66, 0.19,-0.57;
 0.67,-0.56,-0.41, 1.22,-0.30, 0.77, 0.82, 0.36, 1.18, 1.87,-1.48, 0.52, 1.35, 0.13;
-0.40,-1.10,-0.83, 0.71, 1.99,-0.24, 1.30,-0.34,-0.70, 0.28, 0.16, 0.27, 0.37,-1.79;
-0.78, 0.60,-0.45,-0.26,-0.23, 0.89, 0.87, 1.01, 1.20, 0.28, 0.79, 2.76, 0.35, 1.31;
-1.29, 0.62,-0.59, 1.52, 0.62, 0.21, 1.31, 1.09,-0.36,-0.34,-0.03,-0.59,-1.70,-0.03;
 0.40,-1.45,-0.98, 2.10,-1.09,-0.53,-0.38,-1.36, 0.13, 0.70,-1.51, 0.08,-0.62,-0.64;
 0.43,-0.86, 0.70,-1.07,-0.76, 0.72,-0.14,-1.58, 0.00, 0.58,-0.21, 1.30, 2.02, 1.52;
-0.48, 0.01, 1.30, 0.58,-0.54, 1.09, 0.91, 2.90, 1.32,-1.20,-0.59,-0.51, 0.20,-1.74;
-1.32,-1.41,-0.58,-1.29, 1.61,-0.35,-0.72,-1.92,-1.09, 0.56,-0.87,-0.71, 1.25, 0.10;
 1.43, 0.69, 1.34,-0.32, 2.84,-1.43,-0.47,-0.01, 0.83,-0.72,-0.78, 0.50,-1.22, 0.54;
 0.82, 0.46, 0.15,-0.57, 0.93, 1.33,-0.23,-1.07, 0.76, 0.25,-1.96, 0.39, 0.24,-0.26;
-0.91, 0.23,-0.19, 1.58,-0.27, 0.33,-0.60,-1.39,-0.30,-0.81,-0.95, 0.88,-0.09,-0.35;
 0.65,-1.14, 1.18,-1.06,-0.68,-0.22, 0.21, 0.94, 1.08, 0.81,-0.33, 0.42,-0.90, 0.49;
-0.36,-0.50,-0.02,-0.04, 0.77, 0.62,-1.35,-0.64, 1.20, 1.22, 0.18,-1.39,-0.81,-0.99;
-1.82, 1.06, 0.28, 0.14, 0.62,-0.80,-1.08,-2.15, 1.37, 1.57,-1.48,-0.79, 0.28,-0.20;
 1.54, 0.50, 0.13,-0.68, 0.26,-1.13, 0.62,-0.43, 0.39, 1.14, 0.15, 1.03, 0.46, 0.40;
-1.61,-0.61, 0.93,-0.37, 0.44,-1.45, 0.58,-1.77, 0.72,-2.05,-0.03,-1.24,-1.40,-0.06;
-0.48, 0.67, 0.04, 0.27,-0.84,-0.06,-3.67, 0.09, 1.66,-0.30, 1.67, 1.08, 0.00, 0.43;
-1.65,-1.16,-1.17, 1.12, 0.11,-0.15, 0.48,-1.72, 1.08,-0.94, 0.49,-0.56, 0.95, 1.09;
-0.85,-0.02, 1.18,-1.16, 0.49, 1.56,-0.60, 0.32, 0.72,-1.20, 2.52, 1.78, 0.16,-0.01;
-0.60,-0.73,-1.23, 1.50, 0.40,-0.20,-0.65, 0.68, 1.09, 0.40,-1.50,-2.10, 0.21,-0.18;
-0.66,-0.01,-0.01, 0.85,-2.04, 1.17,-0.56, 1.72,-0.18, 1.14,-0.96,-0.92,-0.28, 1.58];
y = [-2.44;
     -2.97;
      7.42;
      3.00;
      8.83;
      0.03;
      2.57;
      8.31;
      4.55;
    -23.10;
      3.38;
     -0.13;
      5.47;
     13.97;
     20.94;
    -12.87;
     28.26;
      6.89;
      5.37;
     -1.50;
    -23.00;
     14.09;
    -11.05;
    -32.04;
     23.36;
     -5.58;
      2.48;
     -5.30;
     -7.77;
    -34.25;
     26.78;
    -11.85;
     -8.62;
     12.35;
     -1.54;
    -16.59;
     -8.69;
      7.82;
    -18.56;
     17.21];
% Query required length of communication arrays
icomm = zeros(2, 1, 'int64');
rcomm = zeros(0, 0);
warning('off', 'NAG:warning');
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
    nag_best_subset_given_size_revcomm(irevcm, mincr, m, ip, drop, lz, z, la, a, ...
          bscore, bz, mincnt, gamma, acc, icomm, rcomm);
warning('on', 'NAG:warning');

% Ignore the warning message - required size of communication arrays now
% stored in icomm
rcomm = zeros(icomm(2), 1);
icomm = zeros(icomm(1), 1, 'int64');

cnt = 0;
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
    nag_best_subset_given_size_revcomm(irevcm, mincr, m, ip, drop, lz, z, la, a, ...
          bscore, bz, mincnt, gamma, acc, icomm, rcomm);
% Call the reverse communication best subset routine in a loop, terminating
% when irevcm = 0 on exit
while not(irevcm == 0)
  % Calculate and return the score for the required models and keep track
  % of the number of subsets evaluated
  cnt = cnt + max(1,la);
  [bscore] = calc_subset_score(m, drop, lz, z, la, a, x, y, bscore);

  [irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
      nag_best_subset_given_size_revcomm(irevcm, mincr, m, ip, drop, lz, z, la, a, ...
            bscore, bz, mincnt, gamma, acc, icomm, rcomm);
end

fprintf('\n    Score        Feature Subset\n');
fprintf('    -----        --------------\n');
% Display the best subsets and corresponding scores.
% nag_best_subset_given_size_revcomm returns a list of features excluded
% from the best subsets, so this is inverted to give the set of features
% included in each subset
ibz = 1:m;
for i = 1:la
  mask = ones(1, m, 'int64');
  for j = 1:mip
    mask(bz(j, i)) = 0;
  end
  fprintf('%12.5e %5d %5d %5d %5d %5d\n', bscore(i), ibz(logical(mask)));
end

fprintf('\n%d subsets evaluated in total\n', cnt);


function [bscore] = calc_subset_score(m, drop, lz, z, la, a, x, y, bscore)
  % Set up the initial feature set.
  % If drop = 0, this is the Null set (i.e. no features).
  % If drop = 1 then this is the full set (i.e. all features)
  if drop == 0
    isx = zeros(m, 1, 'int64');
  else
    isx = ones(m, 1, 'int64');
  end

  % Add (if drop = 0) or remove (if drop = 1) all the features specified in z
  inv_drop = not(drop);
  for i = 1:lz
    isx(z(i)) = inv_drop;
  end

  for i=1:max(la, 1)
    if (la > 0)
      if (i > 1)
        % Reset the feature altered at the last iteration
        isx(a(i-1)) = drop;
      end

      %  Add or drop the i'th feature in a
      isx(a(i)) = inv_drop;
    end

    ip = int64(sum(isx));

    % Fit the regression model
    rss = nag_correg_linregm_fit('z', x, isx, ip, y);

    % Return the score (the residual sums of squares)
    bscore(i) = rss;
  end
 

    Score        Feature Subset
    -----        --------------
 1.04753e+03     4     7     8    10    14
 1.05987e+03     4     5     7     8    14
 1.07016e+03     4     5     7    10    14

45 subsets evaluated in total


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013