Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox Chapter IntroductionM01 — Sorting and Searching

## Scope of the Chapter

This chapter is concerned with sorting and searching numeric or character data. It handles only the simplest types of data structure and it is concerned only with internal sorting and searching – that is, sorting and searching a set of data which can all be stored within the program.
If you have large files of data or complicated data structures to be sorted or searched you should use a comprehensive sorting or searching program or package.

## Background to the Problems

### Sorting

The usefulness of sorting is obvious (perhaps a little too obvious, since sorting can be expensive and is sometimes done when not strictly necessary). Sorting may traditionally be associated with data processing and non-numerical programming, but it has many uses within the realm of numerical analysis. For example, within the NAG Toolbox, sorting is used to arrange eigenvalues in ascending order of absolute value, in the manipulation of sparse matrices, and in the ranking of observations for nonparametric statistics.
The general problem may be defined as follows. We are given N$N$ items of data
 R1,R2, … ,RN. $R1,R2,…,RN.$
Each item Ri${R}_{i}$ contains a key Ki${K}_{i}$ which can be ordered relative to any other key according to some specified criterion (for example, ascending numeric value). The problem is to determine a permutation
 p(1),p(2), … ,p(N) $p(1),p(2),…,p(N)$
which puts the keys in order:
 Kp(1) ≤ Kp(2) ≤ … ≤ Kp(N). $Kp(1)≤Kp(2)≤…≤Kp(N).$
Sometimes we may wish actually to rearrange the items so that their keys are in order; for other purposes we may simply require a table of indices so that the items can be referred to in sorted order; or yet again we may require a table of ranks, that is, the positions of each item in the sorted order.
For example, given the single-character items, to be sorted into alphabetic order
• E B A D C
the indices of the items in sorted order are
• 3$3$ 2$2$ 5$5$ 4$4$ 1$1$
and the ranks of the items are
• 5$5$ 2$2$ 1$1$ 4$4$ 3$3$.
Indices may be converted to ranks, and vice versa, by simply computing the inverse permutation.
The items may consist solely of the key (each item may simply be a number). On the other hand, the items may contain additional information (for example, each item may be an eigenvalue of a matrix and its associated eigenvector, the eigenvalue being the key). In the latter case there may be many distinct items with equal keys, and it may be important to preserve the original order among them (if this is achieved, the sorting is called ‘stable’).
There are a number of ingenious algorithms for sorting. For a fascinating discussion of them, and of the whole subject, see Knuth (1973).

### Searching

Searching is a process of retrieving data stored in a computer's memory.
The general problem may be defined as follows:
• We are given n$n$ items of data that have been sorted and a sought-after item x$x$. Each item contains a key. The problem is to find which item has x$x$ as its key.
• We may be interested in different information gained from the search. We may wish to know if item x$x$ was or was not found or the position of the item that was found.
There are a number of different search algorithms. For more on the subject, see Knuth (1973) and Wirth (2004).

## Recommendations on Choice and Use of Available Functions

The following categories of functions are provided:
• – functions which rearrange the data into sorted order (M01C);
• – functions which determine the ranks of the data, leaving the data unchanged (M01D);
• – functions which rearrange the data according to pre-determined ranks (M01E);
• – functions which search the data (M01N);
• – service functions (M01Z).
In the first two and the fourth categories functions are provided for double and integer numeric data, and for character data. In the third category there are functions for rearranging double, complex, integer and character data. Utilities for the manipulation of sparse matrices can be found in Chapter F11.
If the task is simply to rearrange a one-dimensional array of data into sorted order, then an M01C function should be used, since this requires no extra workspace and is faster than any other method. There are no M01C functions for more complicated data structures, because the cost of rearranging the data is likely to outstrip the cost of comparisons. Instead, a combination of M01D and M01E functions, or some other approach, must be used as described below.
For many applications it is in fact preferable to separate the task of determining the sorted order (ranking) from the task of rearranging data into a pre-determined order; the latter task may not need to be performed at all. Frequently it may be sufficient to refer to the data in sorted order via an index vector, without rearranging it. Frequently also one set of data (e.g., a column of a matrix) is used for determining a set of ranks, which are then applied to other data (e.g., the remaining columns of the matrix).
To determine the ranks of a set of data, use an M01D function. Routines are provided for ranking one-dimensional arrays, and for ranking rows or columns of two-dimensional arrays. For ranking an arbitrary data structure, use nag_sort_arbitrary_rank (m01dz), which is, however, much less efficient than the other M01D functions.
To create an index vector so that data can be referred to in sorted order, first call an M01D function to determine the ranks, and then call nag_sort_permute_invert (m01za) to convert the vector of ranks into an index vector.
To rearrange data according to pre-determined ranks: use an M01E function if the data is stored in a one-dimensional array; or if the data is stored in a more complicated structure
• either use an index vector to generate a new copy of the data in the desired order
• or rearrange the data without using extra storage by first calling nag_sort_permute_decompose (m01zc) and then using the simple code-framework given in the document for nag_sort_permute_decompose (m01zc) (assuming that the elements of data all occupy equal storage).
To search for an item in a one-dimensional sorted array of data, use an M01N function. These functions return the index of the first item with the key equal to the sought-after item or if it is not found, the index of the last item containing the biggest value less than the sought-after item.
Examples of these operations can be found in the function documents of the relevant functions.

## Functionality Index

 Ranking,
 arbitrary data nag_sort_arbitrary_rank (m01dz)
 columns of a matrix,
 integer numbers nag_sort_intmat_rank_columns (m01dk)
 real numbers nag_sort_realmat_rank_columns (m01dj)
 rows of a matrix,
 integer numbers nag_sort_intmat_rank_rows (m01df)
 real numbers nag_sort_realmat_rank_rows (m01de)
 vector,
 character data nag_sort_charvec_rank (m01dc)
 integer numbers nag_sort_intvec_rank (m01db)
 real numbers nag_sort_realvec_rank (m01da)
 Rearranging (according to pre-determined ranks):
 vector,
 character data nag_sort_charvec_rank_rearrange (m01ec)
 complex numbers nag_sort_cmplxvec_rank_rearrange (m01ed)
 integer numbers nag_sort_intvec_rank_rearrange (m01eb)
 real numbers nag_sort_realvec_rank_rearrange (m01ea)
 Searching (i.e., exact match or the nearest lower value):
 binary search,
 vector,
 integer numbers nag_sort_intvec_search (m01nb)
 null terminated strings nag_sort_charvec_search (m01nc)
 real numbers nag_sort_realvec_search (m01na)
 Service functions,
 check validity of a permutation nag_sort_permute_check (m01zb)
 decompose a permutation into cycles nag_sort_permute_decompose (m01zc)
 invert a permutation (ranks to indices or vice versa) nag_sort_permute_invert (m01za)
 Sorting (i.e., rearranging into sorted order):
 quick sort,
 vector,
 character data nag_sort_charvec_sort (m01cc)
 integer numbers nag_sort_intvec_sort (m01cb)
 real numbers nag_sort_realvec_sort (m01ca)

## References

Knuth D E (1973) The Art of Computer Programming (Volume 3) (2nd Edition) Addison–Wesley
Wirth N (2004) Algorithms and Data Structures 35–36 Prentice Hall

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013