Essential Introduction

All users, whether familiar or unfamiliar with this Library, who are thinking of using a routine from it are asked to follow these instructions:

(a) read the whole of this Essential Introduction;

(b) select an appropriate chapter or routine by using the online Keyword and GAMS Search;

(c) read the relevant Chapter Introduction;

(d) choose a routine, and read the routine document. If the routine does not after all meet your needs, return to step (c);

(e) read the Users' Note for your implementation;

(f) consult local documentation, which should be provided by your local support staff, about access to the Library on your computing system;

(g) obtain a copy of the example program (see Section 4.5) for the particular routine of interest and experiment with it.

You should now be in a position to include a call to the routine in a program, and to attempt to compile and run it. You may of course need to refer back to the relevant documentation in the case of difficulties, for advice on assessment of results, and so on.

As you become familiar with the Library, some of steps (a) to (g) can be omitted, but it is useful to:

- have read this **Essential Introduction**;
- be familiar with the **Chapter Introduction**;
- read the **routine document**;
- be aware of the **Users' Note** for your implementation.

The NAG Library is a comprehensive collection of **routines** for the solution of numerical and statistical problems.

The Library is divided into **chapters**, each devoted to a branch of numerical analysis or statistics. Each chapter has a three-character name and a title, e.g., D01 (Quadrature).

Exceptionally, Chapters H and S have one-character names. The chapters and their names are based on the ACM modified SHARE classification index (see ACM (1960–1976)).

All documented routines in the Library have six-character names, beginning with the characters of the chapter name, e.g., F07ADF, which belongs to Chapter F07.

Note that the second and third characters are **digits**, not letters; e.g., 0 is the digit zero, not the letter O. The last letter of each routine name almost always appears as ‘F’ in the documentation. Chapters D03 and E04 have some routines whose last letter is ‘A’ rather than ‘F’. An ‘A’ version is always paired with an ‘F’ routine, the ‘A’ version being safe to use in a multithreaded environment, but otherwise having identical functionality to the ‘F’ version.

Chapter F06 (Linear Algebra Support Routines) contains all the Basic Linear Algebra Subprograms, BLAS (Dongarra et al. (1988) and Dongarra et al. (1990)), with NAG-style names as well as the actual BLAS names, e.g., F06PAF (DGEMV). The name in brackets is the equivalent double precision BLAS name. Chapter F16 contains some of the routines specified in the BLAS Technical Forum Standard (The BLAS Technical Forum Standard (2001) and Blackford et al. (2002)) and also some additional routines for integer valued vectors that are not in the standard. Some of the routines in Chapter F16 have both NAG-style names and BLAS names. Chapter F07 (Linear Equations (LAPACK)) and Chapter F08 (Least Squares and Eigenvalue Problems (LAPACK)) contain routines derived from the LAPACK project (Anderson et al. (1999)); also, Chapter F01 (Matrix Operations, Including Inversion) contains storage conversion routines derived from the LAPACK project. Like the BLAS, these routines have NAG-style names as well as LAPACK names, e.g., F07ADF (DGETRF). Details regarding these alternative names can be found in the relevant Chapter Introductions.

In order to take full advantage of machine-specific versions of BLAS and LAPACK routines provided by some computer hardware vendors, you are encouraged to use the BLAS and LAPACK names (e.g., DGEMV and DGETRF) rather than the corresponding NAG-style names (e.g., F06PAF (DGEMV) and F07ADF (DGETRF)) wherever possible in your programs.
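As an illustration of this advice, the following minimal sketch calls DGEMV by its plain BLAS name. It assumes that the program is linked against a BLAS library (a vendor library or the one supplied with your implementation); the program and variable names are purely illustrative.

```fortran
program blas_name_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! DGEMV works in double precision
  integer, parameter :: m = 2, n = 2
  real(dp) :: a(m,n), x(n), y(m)
  external :: dgemv                        ! plain BLAS name, resolved at link time

  ! A = [1 2; 3 4], stored column by column, and x = (1, 1)
  a = reshape([1.0_dp, 3.0_dp, 2.0_dp, 4.0_dp], [m, n])
  x = 1.0_dp
  y = 0.0_dp

  ! y := 1.0*A*x + 0.0*y
  call dgemv('N', m, n, 1.0_dp, a, m, x, 1, 0.0_dp, y, 1)

  print *, y                               ! mathematically A*x = (3, 7)
end program blas_name_sketch
```

Because the call uses the name DGEMV rather than F06PAF, the linker is free to resolve it to a machine-specific version where one is available.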

Each documented routine has, in addition to its short six-character name, a long name beginning with the root **nagf_** and consisting of an underscore separated list of words. The long-name naming scheme has been chosen so that the long names group like routines together and group routines within a suite together.

The long name for each routine in a chapter is listed in the respective Chapter Contents page. The second word in the long name is fixed for each chapter, e.g., routines in Chapter D01 (Quadrature) all have long names that begin **nagf_quad_**.

Each chapter has a unique second word in its set of long names with the exception of Chapters F07 and F08 which share the same second word (**lapack**).

Note that the long names of BLAS and LAPACK routines, such as nagf_blas_dgemm, will not take advantage of machine-specific versions of BLAS and LAPACK. As mentioned in Section 2.1 you are recommended to use the plain BLAS or LAPACK name (in this case DGEMM) for performance reasons.

Routines that are marked for withdrawal have long names that have the third word **withdraw**. At subsequent marks of the library, any routine that becomes marked for withdrawal will have the third word **withdraw** inserted into its long name; the original long name will no longer be available for the given routine at that stage.

For those chapters that have both A and F versions of a routine, the long name for the F version is the same as that of the A version, but with an additional last word (**old** – signifying that the F version predates the A version).

It should be noted that the long names are implemented by the use of aliasing in the NAG Library interface block modules, and so long names are only accessible when calling the NAG Library from a Fortran program that USEs nag_library.mod.

Please refer to Section 3.2.1 for advice on supplying alternative routine names and, possibly, simplified routine interfaces.

The NAG Library Manual is the principal documentation for the NAG Library.
It has the same chapter structure as the Library: each chapter of routines in the Library has a corresponding chapter (of the same name) in the Manual. The chapters occur in alphanumeric order. General introductory documents appear at the beginning of the Manual.

Each chapter consists of the following documents:

- **Chapter Contents**, e.g., Chapter D01;
- **Chapter Introduction**, e.g., the D01 Chapter Introduction;
- **Routine Documents**, one for each documented routine in the chapter.

A routine document has the same name as the routine which it describes. Within each chapter, routine documents occur in alphanumeric order of short names. For those chapters that have both ‘A’ and ‘F’ versions of a routine, the routine descriptions are combined into one routine document.

Documentation is provided in the following formats:

- **HTML**, a fully linked version of the manual using HTML, SVG and MathML (recommended for browsing) and providing links to the PDF version of each document (recommended for printing);
- **PDF**, a full PDF manual browsed using the PDF bookmarks, or via HTML index files;
- **Single file PDF**, the manual as a single PDF file;
- **Windows HTML help**, Windows HTML help version as a single file.

Advice on viewing and navigating the formats available can be found in the document Online Documentation.

The most up-to-date version of the documentation is accessible via the NAG web site (see Section 5).

The Library is available on many different computer systems. For each distinct system, an **implementation** of the Library is prepared by NAG, e.g., the Linux 64 (Intel® 64 / AMD64), GNU gfortran implementation. The implementation is distributed to sites as a tested compiled library.

An implementation is usually specific to a range of machines/operating systems (e.g., x86–64 architectures); it may also be specific to a particular Fortran compiler, or to a compiler option (such as an option governing the calling convention).

Essentially the same facilities are provided in all implementations of the Library, but, because of differences in arithmetic behaviour and in the compilation system, routines cannot be expected to give identical results on different systems, especially for sensitive numerical problems.

The documentation supports all implementations of the Library, with the help of a few simple conventions, and a small amount of implementation-dependent information, which is published in a separate **Users' Note** for each implementation (see Section 4.4).

Periodically a new **Mark** of the NAG Library is released: new routines are added, corrections and/or improvements are made to existing routines, and occasionally routines are withdrawn if they have been superseded by improved routines.

You must know **which implementation**, **which precision** and **which mark and revision** of the Library you are using or intend to use. To find out which implementation, precision and mark of the Library is available at your site, you can run a program which calls the NAG Library routine A00AAF.

The program could be:

    USE nag_library, ONLY: a00aaf
    CALL a00aaf
    END

Alternatively, the example program for A00AAF can be run using the nag_example scripts supplied with your implementation (see the Users' Note for details).

An example of the output is:

    *** Start of NAG Library implementation details ***
    Implementation title: Linux, 64-bit, NAG Fortran (32-bit integers)
    Precision: FORTRAN double precision
    Product Code: FLL6A25D9L
    Mark: 25.0 (self-contained)
    *** End of NAG Library implementation details ***

All routines in the Library conform to the ISO Fortran 95 Standard (ISO (1997)).

A NAG Library routine **cannot** be guaranteed to return meaningful results irrespective of the data supplied to it. Care and thought **must** be exercised in:

(a) formulating the problem;

(b) programming the use of Library routines;

(c) assessing the significance of the results.

The Foreword to the Manual provides some further discussion of points (a) and (c); the remainder of Section 3 is concerned with (b) and (c).

The Library and its documentation are designed with the assumption that you will write a calling program in Fortran (although it may be called from other languages – see Section 3.11).

When programming a call to a routine, read the routine document carefully, especially the description of the **Parameters**. This states clearly which parameters must have values assigned to them on entry to the routine, and which return useful values on exit. See Section 4.3 for further guidance.

The most common types of programming error in using the Library are:

- incorrect parameters in a call to a Library routine;
- calling the Library from a single precision program.

The USE of the nag_library MODULE will help detect or prevent some of these errors. For example, when using this, incorrect parameter types will be caught at compile time and using KIND=nag_wp in the type of real and complex variables will maintain consistency with the Library.

Therefore, if a call to a Library routine results in an unexpected error message from the system (or possibly from within the Library), **check** the following:

- **Have some actual array arguments been passed as different dummy arguments (i.e., an array appears more than once in the argument list with different INTENTs)?**
- **Have all array parameters been dimensioned correctly?**

Avoid the use of NAG-type names for your own program units or COMMON blocks: in general, do not use names which contain a three-character NAG chapter name embedded in them; they may clash with the names of an auxiliary routine or COMMON block used by the NAG Library.

If the Library is called from a Fortran program then it is possible to use alternative names for user-callable routines. This can be done via the ‘USE nag_library’ statement at the start of the (sub)program in which the Library routine is called. For example, suppose you wish to use the name ‘BesselJ0’ instead of the Library name S17AEF. In this case the line

    USE nag_library, ONLY: s17aef

would be replaced by

    USE nag_library, ONLY: BesselJ0 => s17aef

The (sub)program would then use the name ‘BesselJ0’ in place of S17AEF and call it with the identical interface.
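A complete program using the renamed routine might look like the following minimal sketch. It assumes that the NAG Library and nag_library.mod are available to the compiler; the program name and the argument value are illustrative only.

```fortran
program rename_sketch
  ! Rename on USE: the local name BesselJ0 refers to the Library routine s17aef
  use nag_library, only: nag_wp, BesselJ0 => s17aef
  implicit none
  real(kind=nag_wp) :: x, y
  integer :: ifail

  x = 0.5_nag_wp
  ifail = 0                 ! hard fail: stop with a message if an error is detected
  y = BesselJ0(x, ifail)    ! same argument list as s17aef(x, ifail)
  print *, 'J0(', x, ') =', y
end program rename_sketch
```

Only the name changes; the argument list and the behaviour of IFAIL are those documented for S17AEF.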

If Library routines are called from other environments then many such environments offer ways of ‘aliasing’ a routine name by a preferred alternative name.

For many of the Library routines with more complex interfaces it is likely that only a subset of the functionality is required and that some parameter values will always remain unchanged or will not be referenced. In such cases it may be preferable to write your own wrapper to the Library routine with a much simpler interface and with a preferred alternative name. For example, if you wish to integrate a system of stiff ordinary differential equations without root finding or intermediate output, you could create the simple interface wrapper to the more complicated D02EJF interface.

    SUBROUTINE BDFsolve(xend,y)
      USE nag_library, ONLY: nag_wp, d02ejf, d02ejw, d02ejx, d02ejy
      REAL(kind=nag_wp) :: xend, y(:)
      REAL(kind=nag_wp) :: tol, xstart
      INTEGER :: ifail, iw, n
      CHARACTER :: relabs
      REAL(kind=nag_wp), ALLOCATABLE :: w(:)
      ! fcn is the user-supplied derivative routine; it is assumed here
      ! to be an external subroutine with the interface D02EJF requires
      EXTERNAL :: fcn
      n = SIZE(y)
      tol = 1.0e-3_nag_wp
      relabs = 'M'
      iw = (12+n)*n + 50
      ALLOCATE(w(iw))
      ifail = 0
      xstart = 0.0_nag_wp
      CALL d02ejf(xstart,xend,n,y,fcn,d02ejy,tol,relabs,d02ejx, &
                  d02ejw,w,iw,ifail)
      RETURN
    END SUBROUTINE BDFsolve

The above example of a user-defined wrapper would be compiled and linked with a main program that would include the simple call:

CALL BDFsolve(xend,y)

The environment for the NAG Library is defined by the nag_library MODULE. Certain routines require you to USE this to access named constants (e.g., nag_wp). It is recommended that you also USE the MODULE to enable checking of INTERFACEs in the Library.

The exact location of nag_library.mod is installation dependent; please see the Users' Note for your implementation.

Routines in the Library that require a user-supplied function may be classified as either direct communication or reverse communication.

Direct communication routines require a user-supplied subroutine to be provided as an actual argument to the NAG Library routine. You must write this subroutine using a very rigid interface as specified in the relevant routine document. For the majority of applications this is the simplest and most convenient usage. Sometimes however this approach can be restrictive:

(i) when the required format of the subroutine does not allow useful information to be passed conveniently to and from your calling program;

(ii) when the direct communication routine is being called from another computer language which does not fully support procedure arguments in a way that is compatible with the Library.

These restrictions can be removed by using a reverse communication routine. Instead of obtaining the solution in one call, reverse communication routines perform one step of the solution process before returning to the calling program with an appropriate flag (IREVCM) set. The value of IREVCM determines whether the process has finished or whether fresh information is required. In the latter case the required information must be calculated before re-entering the reverse communication routine. Thus you have the responsibility for providing an iterative loop. Although reverse communication routines will typically be more complicated to use than direct communication equivalents they do provide greater flexibility for the evaluation of the function.
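The calling pattern can be sketched with a toy example. The routine `toy_bisect` below is hypothetical and is not a Library routine; it only mimics the reverse communication style, using a flag named `irevcm` by analogy with IREVCM. The caller owns the loop and evaluates the function itself between calls.

```fortran
module revcomm_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical reverse communication bisection on [a,b], assuming f(a) < 0 < f(b).
  ! Set irevcm = 0 before the first call. On return:
  !   irevcm = 1 : evaluate f at x, store the value in fx, and call again
  !   irevcm = 0 : finished; x holds the approximate root
  subroutine toy_bisect(a, b, x, fx, tol, irevcm)
    real(dp), intent(inout) :: a, b, x
    real(dp), intent(in)    :: fx, tol
    integer,  intent(inout) :: irevcm
    if (irevcm == 1) then
      ! use the function value supplied by the caller to shrink the bracket
      if (fx < 0.0_dp) then
        a = x
      else
        b = x
      end if
    end if
    x = 0.5_dp*(a + b)
    if (b - a <= tol) then
      irevcm = 0
    else
      irevcm = 1
    end if
  end subroutine toy_bisect

  ! Caller side: the iterative loop and every function evaluation live here
  function sqrt2_by_bisection() result(x)
    real(dp) :: x
    real(dp) :: a, b, fx
    integer  :: irevcm
    a = 1.0_dp
    b = 2.0_dp
    x = a
    fx = 0.0_dp
    irevcm = 0
    do
      call toy_bisect(a, b, x, fx, 1.0e-10_dp, irevcm)
      if (irevcm == 0) exit
      fx = x*x - 2.0_dp      ! f(x) = x**2 - 2, computed however the caller likes
    end do
  end function sqrt2_by_bisection
end module revcomm_sketch
```

Because the function value is computed in the caller, any data it needs is directly in scope, and no procedure argument has to be passed. Real reverse communication routines in the Library have their own argument lists and flag values, which are given in the routine documents.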

The error, failure or warning conditions considered here are those that can be detected by explicit coding in a Library routine. Such conditions must be anticipated by the author of the routine. They should not be confused with run-time errors detected by the compilation system, e.g., detection of overflow or failure to assign an initial value to a variable.

In the rest of this document we use the word ‘error’ to cover all types of error, failure or warning conditions detected by the routine. They fall roughly into three classes.

(i) On entry to the routine the value of a parameter is out of range. This means that it is not useful, or perhaps even meaningful, to begin computation.

(ii) During computation the routine decides that it cannot yield the desired results, and indicates a failure condition. For example, a matrix inversion routine will indicate a failure condition if it considers that the matrix is singular and so cannot be inverted.

(iii) Although the routine completes the computation and returns results, it cannot guarantee that the results are completely reliable; it therefore returns a warning. For example, an optimization routine may return a warning if it cannot guarantee that it has found a local minimum.

**All three classes of errors are handled in the same way by the Library.**

Each error which can be detected by a Library routine is associated with a number. Some numbers, such as those associated with a failure in dynamic memory allocation (see Section 3.6) or with a failure to detect a valid licence (see Section 3.7), are the same for all Library routines and may not be listed in individual routine documents. Recently added routines have standardized on using the same number for unexpected error exits (see Section 3.8). All other numbers, with explanations of the errors, are listed in Section 6 (Error Indicators and Warnings) in the routine document. Unless the document specifically states to the contrary, you should not assume that the routine necessarily tests for the occurrence of the errors in their order of error number, i.e., the detection of an error does not imply that other errors have or have not been detected.

Most of the NAG Library routines which can be called directly by you have a parameter called IFAIL. This parameter is concerned with the NAG Library error trapping mechanism (and, for some routines, with controlling the output of error messages and advisory messages).

IFAIL has **two** purposes:

(i) to allow you to specify what action the Library routine should take if an error is detected;

(ii) to inform you of the outcome of the call of the routine.

For purpose (i), you **must** assign a value to IFAIL before the call to the Library routine. Since IFAIL is reset by the routine for purpose (ii), the parameter must be the name of a variable, **not** a literal or constant.

The value assigned to IFAIL before entry should be either $0$ (**hard fail** option), or $1$ or $-1$ (**soft fail** option). If after completing its computation the routine has not detected an error, IFAIL is reset to $0$ to indicate a **successful call.** Control returns to the calling program in the normal way. If the routine does detect an error, its action depends on whether the hard or soft fail option was chosen. If IFAIL is set to any value other than $-1$, $0$ or $1$ before calling the library routine, a default of $\mathrm{IFAIL}=1$ is assumed.

If you set IFAIL to $0$ before calling the Library routine, execution of the program will terminate if the routine detects an error. Before the program is stopped, this error message is output:

    ** ABNORMAL EXIT from NAG Library routine XXXXXX: IFAIL = n
    ** NAG hard failure - execution terminated

where XXXXXX is the routine name, and n is the number associated with the detected error. An explanation of error number n is given in Section 6 of the routine document XXXXXX.

In addition, most routines output explanatory error messages immediately before the standard termination message shown above.

The hard fail option should be selected if you are in any doubt about continuing the execution of the program after an unsuccessful call to a NAG Library routine. For environments where it might be inappropriate to halt program execution when an error is detected it is recommended that the hard fail option is **not** used.

To select the soft fail option, you must set IFAIL to $1$ or $-1$ before calling the Library routine. Note that $\mathrm{IFAIL}=1$ is assumed when IFAIL is set to an invalid value before calling the Library routine.

If the routine detects an error, IFAIL is reset to the associated error number; further computation within the routine is suspended and control returns to the calling program.

If you set IFAIL to $1$, then no error message is output (**silent exit**). If the output of error messages is undesirable, then silent exit is recommended.

If you set IFAIL to $-1$ (**noisy exit**), then before control is returned to the calling program, the following error message is output:

    ** ABNORMAL EXIT from NAG Library routine XXXXXX: IFAIL = n
    ** NAG soft failure - control returned

In addition, most routines output explanatory error messages immediately before the above standard message.

The soft fail option puts the onus on you to handle any errors detected by the Library routine. With the proviso that you are able to implement it **properly**, it is clearly more flexible than the hard fail option since it allows computation to continue in the case of errors. In particular there are at least two cases where its flexibility is useful:

(i) where additional information about the error or the progress of computation is returned via some of the other parameters;

(ii) in some routines, ‘partial’ success can be achieved, e.g., a probable solution found but not all conditions fully satisfied, so the routine returns a warning. On the basis of the advice in Section 6 and elsewhere in the routine document, you may decide that this partially successful call is adequate for certain purposes.
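The soft fail pattern can be sketched as follows, using S17AEF purely as a convenient example routine. The sketch assumes that the NAG Library and nag_library.mod are available; the program name and the handling in the `else` branch are illustrative, not prescribed.

```fortran
program soft_fail_sketch
  use nag_library, only: nag_wp, s17aef
  implicit none
  real(kind=nag_wp) :: x, y
  integer :: ifail

  x = 0.5_nag_wp
  ifail = 1                      ! soft fail, silent exit (use -1 for noisy exit)
  y = s17aef(x, ifail)           ! IFAIL must be a variable: it is reset on exit

  if (ifail == 0) then
    print *, 'J0(x) =', y
  else
    ! Section 6 of the routine document explains each nonzero value
    print *, 's17aef returned IFAIL =', ifail
  end if
end program soft_fail_sketch
```

IFAIL must be assigned again before every call, because each call resets it.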

The notation $\langle\mathit{value}\rangle$ appearing in the documented error message is a place holder that will be populated by the value of a variable, argument name or some other piece of information when that error message is displayed.

The error handling mechanism described above was introduced into the NAG Library at Mark 12. It supersedes the earlier mechanism which for most routines allowed IFAIL to be set by you to $0$ or $1$ only. The new mechanism is compatible with the old except that the details of the messages output on hard failure have changed. The new mechanism also allows you to set IFAIL to $-1$ (soft failure, noisy exit).

A few routines (introduced mainly at Marks 7 and 8) use IFAIL in a different way to control the output of error messages, and also of advisory messages (see Chapter X04). In those routines IFAIL is regarded as a decimal integer whose least significant digits are denoted $ba$ with the following significance:

- $a=0$: hard failure; $a=1$: soft failure
- $b=0$: silent exit; $b=1$: noisy exit

Details are given in the documents of the relevant routines; for those routines this alternative use of IFAIL remains valid.
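The two-digit convention can be unpacked with integer arithmetic, as in the following minimal sketch. The module and subroutine names are hypothetical, and the code assumes a non-negative entry value whose two least significant digits are each 0 or 1.

```fortran
module legacy_ifail_sketch
  implicit none
contains
  ! Split a legacy-style IFAIL entry value into its two least significant
  ! decimal digits: a (units digit) and b (tens digit).
  subroutine decode_legacy_ifail(ifail, hard_fail, noisy_exit)
    integer, intent(in)  :: ifail
    logical, intent(out) :: hard_fail, noisy_exit
    integer :: a, b
    a = mod(ifail, 10)
    b = mod(ifail/10, 10)
    hard_fail  = (a == 0)        ! a = 0: hard failure, a = 1: soft failure
    noisy_exit = (b == 1)        ! b = 0: silent exit,  b = 1: noisy exit
  end subroutine decode_legacy_ifail
end module legacy_ifail_sketch
```

For example, an entry value of 11 decodes as soft failure with noisy exit, and 10 as hard failure with noisy exit.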

Most NAG Library routines perform no output to an external file, except possibly to output an error message. All error messages are written to a logical **error message** unit. This unit number (which is set by default to 6 in most implementations) can be changed by calling the Library routine X04AAF.

Some NAG Library routines may optionally output their final results, or intermediate results to monitor the course of computation. In general, output other than error messages is written to a logical **advisory message** unit. This unit number (which is also set by default to 6 in most implementations) can be changed by calling the Library routine X04ABF. Although it is logically distinct from the error message unit, in practice the two unit numbers may be the same. A few routines in Chapter E04 allow this unit number to be specified directly as an option.

All output from the Library is appropriately formatted.

There are only a few Library routines which perform input from an external file. These examples occur in Chapters E04, E05 and H. The unit number of the external file is a parameter to the routine, and all input is formatted.

You must ensure that the relevant Fortran unit numbers are associated with the desired external files, either by an OPEN statement in your calling program, or by operating system commands.
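A minimal sketch of redirecting both message streams to files is given below. The unit numbers and file names are hypothetical, and the sketch assumes that a first argument of 1 to X04AAF and X04ABF means "set the unit number"; please check the X04AAF and X04ABF routine documents before relying on this.

```fortran
program redirect_messages_sketch
  use nag_library, only: x04aaf, x04abf
  implicit none
  integer, parameter :: iset = 1     ! assumed: 1 = set the unit number
  integer :: nerr, nadv

  nerr = 20                          ! hypothetical unit for error messages
  nadv = 21                          ! hypothetical unit for advisory messages
  open (unit=nerr, file='nag_errors.txt', status='replace', action='write')
  open (unit=nadv, file='nag_advisory.txt', status='replace', action='write')

  call x04aaf(iset, nerr)            ! error message unit
  call x04abf(iset, nadv)            ! advisory message unit

  ! ... calls to Library routines go here ...

  close (nerr)
  close (nadv)
end program redirect_messages_sketch
```

The OPEN statements associate the units with external files before any Library routine writes to them.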

In addition to those Library routines which are documented and are intended to be called by you directly, the Library also contains many auxiliary routines.

In general, you need not be concerned with them at all, although you may be made aware of their existence if, for example, you examine a memory map of an executable program which calls NAG routines. The only exception is that when calling some NAG Library routines you may be required or allowed to supply the name of an auxiliary routine from the NAG Library as an external procedure parameter. The routine documents give the necessary details. In such cases, you only need to supply the name of the routine; you **never** need to know details of its parameter list.

NAG auxiliary routines have names which are similar to the name of the documented routine(s) to which they are related, but with last letter ‘Z’, ‘Y’, and so on, e.g.,

- G13AFZ is an auxiliary routine called by G13AFF.

A few chapters contain auxiliary routines whose names are obtained by adding 50 to the second and third characters of the chapter name. For instance, Chapter E04 has an auxiliary routine with the name E54NFU which is normally used as the actual argument for the QPHESS parameter of E04NFA; the corresponding name to be used with E04NFF is E04NFU.

Some NAG Library routines perform dynamic memory allocation to simplify their interfaces.
Where possible, the amount of memory allocated by a routine will be given in the routine document (usually as a function of routine parameters).
All memory allocated by NAG routines is deallocated before exit.

In the case where a routine detects a failure to dynamically allocate sufficient memory, the routine will set an error condition, by setting $\mathrm{IFAIL}=-999$, and exit with an appropriate error message.

If your implementation is licence managed then your local site will have details on how the licence management is implemented; please contact your site installer for details. To determine whether a valid licence is available on your machine run the example program for A00ACF.

Should a valid licence not be found when calling licence managed routines from the Library then the routine will set an error condition, by setting $\mathrm{IFAIL}=-399$, and exit with an appropriate error message. On Unix based systems, the appropriate environment variables should then be checked (e.g., NAG_KUSARI_FILE) to make sure they point to the licence file containing a valid licence, and the licence file should be checked for any obvious errors (e.g., the licence refers to a different implementation). If everything appears to be correct then please contact NAG (see Section 5 for details).

Internal calls to Library routines are checked for error exits even when these exits are not to be expected. Should an unexpected error exit occur the routine will set an error condition by setting IFAIL and exit with an appropriate error message. Historically, the number returned in IFAIL was particular to that routine and differing numbers could be used for this purpose. However, recently added routines have standardized by setting $\mathrm{IFAIL}=-99$ for unexpected error detection.
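When the soft fail option is in use, the Library-wide values mentioned above can be separated from routine-specific ones before consulting the routine document. The following is a minimal sketch; the module and function names are hypothetical, and $-99$ applies only to routines that follow the newer convention.

```fortran
module generic_ifail_sketch
  implicit none
contains
  ! Map the Library-wide IFAIL exit values to a short description.
  function describe_generic_ifail(ifail) result(msg)
    integer, intent(in) :: ifail
    character(len=40)   :: msg
    select case (ifail)
    case (0)
      msg = 'successful call'
    case (-999)
      msg = 'dynamic memory allocation failed'
    case (-399)
      msg = 'no valid licence found'
    case (-99)
      msg = 'unexpected internal error'
    case default
      msg = 'routine-specific: see Section 6'
    end select
  end function describe_generic_ifail
end module generic_ifail_sketch
```

A calling program could print `trim(describe_generic_ifail(ifail))` after any call made with IFAIL set to $1$ or $-1$ on entry.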

Implementations of the Library facilitate the use of threads wherever possible; that is, you can call routines from the Library from within a multithreaded application. See the Thread Safety document for more detailed guidance on using the Library in a multithreaded context. You may also need to refer to the Users' Note for details of whether your implementation of the Library has been compiled in a manner that facilitates the use of threads.

Note that in some implementations, the Library is linked with one or more vendor libraries to provide, for example, efficient BLAS routines. NAG cannot guarantee that any such vendor library is thread safe.

The time taken to execute a routine from the NAG Library has traditionally depended, to a large degree, on the serial performance capabilities of the processor being used. In an effort to go beyond the performance limitations of a single core processor, multithreaded implementations of the NAG Library are available. These implementations divide the computational workload of some routines between multiple cores and execute these tasks in parallel. Traditionally, parallel systems consisted of a small number of processors each with a single core. Improvements in the performance capabilities of these processors had until recently happened in line with increases in clock frequencies. However, this increase reached a limit, which meant that processor designers had to find another way in which to improve performance; this led to the development of **multicore** processors, which are now ubiquitous. Instead of consisting of a single compute core, multicore processors consist of two or more cores, each of which typically comprises at least a Central Processing Unit and a small cache. Thus making effective use of parallelism, wherever possible, has become imperative in order to maximize the performance potential of modern hardware resources and of the multithreaded implementations.

The effectiveness of parallelism can be measured by how much faster a parallel program is compared to an equivalent serial program. This is called the parallel **speedup**. If a serial program has been parallelized then the speedup of the parallel implementation of the program is defined by dividing the time taken by the original serial program on a given problem by the time taken by the parallel program using $n$ cores to compute the same problem. Ideal speedup is obtained when this value is $n$ (i.e., when the parallel program takes $\frac{1}{n}$th the time of the original serial program). If speedup of the parallel program is close to ideal for increasing values of $n$ then we say the program has good **scalability**.

The scalability of a parallel program may be less than the ideal value because of two factors:

(a) the overheads introduced as part of the parallel implementation, and

(b) inherently serial parts of the program.

Overheads include communication and synchronisation as well as any extra setup required to allow parallelism. Such overheads can depend on efficiency of implementation and use of Application Programming Interfaces (APIs), and can vary depending on underlying hardware. The impact on performance of inherently serial fractions of a program is explained theoretically (i.e., assuming an idealised system in which overheads are zero) by **Amdahl's law**. Amdahl's law places an upper bound on the speedup of a parallel program with a given inherently serial fraction. If $r$ is the parallelizable fraction of a program and $s=1-r$ is the inherently serial fraction then the speedup using $n$ sub-tasks, ${S}_{n}$, satisfies the following:

$${S}_{n}\le \frac{1}{s+\frac{r}{n}}$$

Thus, for example, a program with a serial fraction of one quarter ($s=\frac{1}{4}$) can never achieve a speedup greater than 4, since the bound $\frac{1}{s+\frac{r}{n}}$ increases towards $\frac{1}{s}=4$ as $n\to \infty $.
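The bound is easy to tabulate for a given serial fraction. The following minimal sketch evaluates the right-hand side of Amdahl's law; the module and function names are hypothetical.

```fortran
module amdahl_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Upper bound on the speedup from Amdahl's law: 1/(s + r/n), with r = 1 - s
  pure function amdahl_bound(s, n) result(bound)
    real(dp), intent(in) :: s      ! inherently serial fraction, 0 < s <= 1
    integer,  intent(in) :: n      ! number of sub-tasks, n >= 1
    real(dp) :: bound
    bound = 1.0_dp/(s + (1.0_dp - s)/real(n, dp))
  end function amdahl_bound
end module amdahl_sketch
```

With $s=\frac{1}{4}$, `amdahl_bound(0.25_dp, 4)` gives $\frac{16}{7}\approx 2.29$ and `amdahl_bound(0.25_dp, 16)` gives $\frac{64}{19}\approx 3.37$, so most of the attainable speedup is already reached with a modest number of cores.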

Parallelism may be utilised on two classes of systems: shared memory and distributed memory machines, which require different programming techniques. Distributed memory machines are composed of processors located in multiple components which each have their own memory space and are connected by a network. Communication and synchronisation between these components is explicit. Shared memory machines have multiple processors (or a single multicore processor) which can all access the same memory space, and this shared memory is used for communication and synchronisation. The NAG Library makes use of shared memory parallelism using the OpenMP API as described in Section 3.10.2.

Parallel programs which use OpenMP create (or "fork") a number of **threads** from a single process when required at run-time. (Programs which make use of shared memory parallelism are also called **multithreaded** programs.) Once the parallel work has been completed the threads return control to the parent process and become inactive (or "join") until the next region of parallel work. The threads share the same memory address space, i.e., that of the parent process, and this shared memory is used for communication and synchronisation. OpenMP provides some mechanisms for access control so that, as well as allowing all threads to access shared variables, it is possible for each thread to have private copies of other variables that only it can access. For shared variables, thread safety is an issue. A program is deemed to be "thread safe" if it can be executed using two or more threads without compromising results. Thread safe programs should return equally valid results no matter how many threads are used in the parallel regions. However, that is not to say that identical results can be guaranteed, or should be expected. Identical results are often impossible in a parallel program since using different numbers of threads may cause floating-point arithmetic to be evaluated in a different (but equally valid) order, thus changing the accumulation of rounding errors. For a more in-depth discussion of reproducibility of results see Section 3.12.
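The fork and join model, and the reason results may differ slightly between thread counts, can be sketched with a simple reduction. This example is illustrative and is not taken from the Library; it assumes a compiler with OpenMP enabled (without it the directives are treated as comments and the loop runs serially).

```fortran
module omp_sum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Sum 1/i for i = 1..n using an OpenMP reduction.
  function harmonic_sum(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    integer  :: i
    total = 0.0_dp
    ! Threads fork here. The loop index i is private to each thread, and
    ! each thread accumulates its own private partial sum in total.
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + 1.0_dp/real(i, dp)
    end do
    !$omp end parallel do
    ! Threads join here; the partial sums have been combined into total.
  end function harmonic_sum
end module omp_sum_sketch
```

The order in which the partial sums are formed and combined depends on the number of threads, so the final digits of the result may vary from run to run while remaining equally valid.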

The multithreaded implementations differ from the serial implementations of the NAG Library in that they make use of multithreading through the OpenMP API (version 3.0), a portable specification for shared memory programming that is available in many different compilers on a wide range of hardware platforms (see OpenMP).

Note that not all routines are parallelized; you should check Section 8 of the routine documents to find details about parallelism and performance of routines of interest.

There are two situations in which a call to a routine in the NAG Library makes use of multithreading:

1. The routine being called is a NAG-specific routine that has been threaded using OpenMP, or that internally calls another NAG-specific routine that is threaded.

2. The routine being called calls through to the vendor library (e.g., Intel MKL, AMD ACML, IBM ESSL, Oracle Sunperf, etc.). This happens if the routine is not specific to the NAG Library, and the vendor library offers superior parallel performance and equivalent numerical properties. For example, most BLAS and LAPACK routines fall into this category. The vendor library recommended for use with your implementation of the NAG Library (whether the NAG Library is threaded or not) may be threaded. Please consult the documentation for the vendor library for further information.

A complete list of all the routines in the NAG Library, and their threaded status, is given in the Multithreaded Routines document.

It is useful to understand how OpenMP is used within the library in order to avoid the potential pitfalls which lead to making inefficient use of the library.

A call to a threaded NAG-specific routine may, depending on input and at one or more points during execution, use OpenMP to create a team of threads for a parallel region of work. The team of threads will fork at the start of the parallel region before joining at the end of the parallel region. Both the fork and the join will happen internally within the routine call (although there are situations in which the teams of threads may be made available to orphaned directives in your code via user-supplied subprograms, see Section 8 of the routine documents for further information). Furthermore, OpenMP constructs within NAG routines bind to teams of threads created within the NAG code (i.e., there are no orphaned directives). For threaded NAG-specific routines all thread management is performed by the OpenMP run-time and NAG does not provide any extra threading controls or options. Thus all OpenMP environment variables and function settings apply equally to calls to these NAG routines and to your own parallel regions. In particular, you should take care when calling these NAG routines from within your own parallel regions, since if nested parallelism is enabled (it is disabled by default) the NAG routine will fork-and-join a team of threads for each calling thread, which may lead to contention on system resources and very poor performance. Poor performance due to contention can also occur if the number of threads requested exceeds that which the hardware is capable of supporting, or if some hardware resources are busy executing other processes (which may belong to other users in a shared system). For these reasons you should be aware of the maximum number of threads supported in hardware and the workload of your machine, and use this information in selecting a number of threads which minimizes contention on resources. Please read the Users' Note for advice about setting the number of threads to use, or contact NAG (see Section 5) for advice.
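Since the OpenMP run-time settings apply equally to your own code and to threaded Library routines, the number of threads can be chosen with the standard OpenMP library routines before a Library routine is called. The following is a minimal sketch under stated assumptions: the program name and variables are illustrative, no NAG routine is actually called, and capping the thread count at the number of processors reported by the run-time is only one possible policy (it takes no account of other work running on the machine).

```fortran
program thread_choice_sketch
  use omp_lib
  implicit none
  integer :: nprocs, nthreads

  ! Number of processors visible to the OpenMP run-time.
  nprocs = omp_get_num_procs()

  ! Do not request more threads than there are processors.
  nthreads = min(omp_get_max_threads(), nprocs)
  call omp_set_num_threads(nthreads)

  ! omp_get_nested belongs to the OpenMP 3.0 API (later versions of
  ! the specification deprecate it). Nested parallelism is normally off.
  print *, 'processors       :', nprocs
  print *, 'threads requested:', nthreads
  print *, 'nested enabled   :', omp_get_nested()

  ! ... a call to a threaded Library routine would be placed here ...
end program thread_choice_sketch
```

The same effect can usually be obtained without changing the program by setting the `OMP_NUM_THREADS` environment variable before it is run.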

If you are calling multithreaded NAG routines from within another threading mechanism you need to be aware of whether or not this threading mechanism is compatible with the OpenMP compiler runtime used to build the multithreaded implementation of the NAG Library on your platform(s) of choice. The Users' Note document for each of the implementations in question will include some guidance on this, and you should contact NAG for further advice if required.

Parallelism is used in many places throughout the NAG Library since, although many routines have not been the focus of parallel development by NAG, they may benefit by calling routines that have, and/or by calling parallel vendor routines (e.g., BLAS, LAPACK). Thus, the performance improvement due to multithreading, if any, will vary depending upon which routine is called, problem sizes and other parameters, system design and operating system configuration. If you frequently call a routine with similar data sizes and other parameters, it may be worthwhile to experiment with different numbers of threads to determine the choice that gives optimal performance. Please contact NAG for further advice if required.

As a general guide, many key routines in the following areas are known to benefit from shared memory parallelism:

- Dense and Sparse Linear Algebra
- FFTs
- Random Number Generators
- Quadrature
- Partial Differential Equations
- Interpolation
- Curve and Surface Fitting
- Correlation and Regression Analysis
- Multivariate Methods
- Time Series Analysis
- Financial Option Pricing
- Global Optimization
- Wavelets

In general the NAG Library can be called from other computer languages (such as C and Visual Basic) provided that appropriate mappings exist between their data types.

NAG has produced the C Header Files, a set of header files indicating the match between C and Fortran data types for various compilers, together with documentation and examples. The documentation, examples and C Header Files are available from the NAG Web sites (see Section 5).
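To illustrate what a mapping between C and Fortran data types looks like, the sketch below uses the standard `ISO_C_BINDING` module of Fortran 2003. This is a different technique from the C Header Files described above, and the routine `dot_c` is hypothetical (it is not part of the NAG Library); it is shown only to make the type correspondence concrete.

```fortran
module c_interop_sketch
  use, intrinsic :: iso_c_binding, only: c_int, c_double
  implicit none
contains
  ! Hypothetical wrapper, callable from C with the prototype
  !   double dot_c(int n, const double *x, const double *y);
  ! integer(c_int) corresponds to C int and real(c_double) to C double.
  function dot_c(n, x, y) bind(c, name='dot_c') result(s)
    integer(c_int), value :: n          ! passed by value, as in C
    real(c_double), intent(in) :: x(n), y(n)
    real(c_double) :: s
    s = dot_product(x, y)
  end function dot_c
end module c_interop_sketch
```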

The Dynamic Link Library (DLL) implementation can be called in a straightforward manner from a number of languages and environments, e.g., Visual Basic, Visual Basic for Applications (Excel), Fortran, C and C++. Guidance on this is provided in the Users' Note for the NAG Library DLLs. Further details can be found on the NAG Web sites.

The results obtained when calling a NAG Library routine depend not only on the algorithm used to solve the problem, but also on the compiler used to build the library, compiler run-time libraries, and also the arithmetic properties of the machine on which the code is run.

Historically, different kinds of computer hardware tended to have different kinds of arithmetic. Some machines would store floating-point numbers using a base 16 significand and exponent system, others would use base 2, and some even used base 8 or 10. Such differences caused major headaches for software library providers because code that worked well on one arithmetic system might not behave in exactly the same way on another. This meant that great care had to be taken to make the library code **portable**.

In addition, it was not unheard of for machine arithmetic to have flaws or errors where basic operations such as multiplication or division could sometimes give incorrect results, especially on numbers that were in some way ‘extreme’, such as being very large or small.

After the first of the IEEE standards for floating-point arithmetic (ANSI/IEEE (1985)) was introduced in the 1980s, the situation improved greatly. Nowadays most significant hardware, and certainly most hardware that NAG libraries run on, will use IEEE-style base 2 arithmetic. This makes production of portable code easier, but there are still problems, partly due to the latitude allowed by the IEEE standards. For example, hardware which uses extra-precise 80-bit internal registers for arithmetic, as originally introduced in the Intel 8087 coprocessor in the 1980s, behaves slightly differently from hardware that uses 64-bit registers, particularly if a compiler generates optimized code which holds arithmetic subexpressions in the extra-precise registers.

Since for performance reasons computer arithmetic is generally finite precision (as is certainly the case for IEEE standard arithmetic) most of the numerical methods implemented by NAG Library routines can only return an approximation to the true solution, simply due to accumulation of rounding errors.

It should therefore be clear that running a program which calls a NAG Library routine with the same data on two different machines can give different results, due to compiler, hardware and run-time library considerations. Usually these differences are small – it may be that a result computed on one machine differs only in the last few significant bits from the same result computed on another machine – for example, when solving a well-conditioned set of linear equations on two different machines. Occasionally small differences may be magnified, for example if a conditional test depends on an imprecise result. A routine that searches for a minimum of an optimization problem may converge to a different local minimum, but in general, so long as the routine's documentation doesn't claim that the **same** local minimum will always be obtained, this should be acceptable. Even if an algorithm converges to the same local minimum, arithmetic differences may mean that a different number of iterations is taken to get there.

Modern hardware and optimizing compilers have introduced further scope for arithmetic quirks. An example is in the use of **Streaming SIMD Extensions (SSE)** instructions. These low-level machine instructions allow hardware to operate on more than one number in parallel, if your compiler is smart enough to generate and use them correctly, or if you hand-code your own assembly language routines.

SSE instructions enable low-level parallelism of floating-point arithmetic operations. For example, a 128-bit SSE register can hold two 64-bit double precision (or four 32-bit single precision) numbers at the same time, and operate on them all simultaneously. This can lead to big time savings when working on large amounts of data.

But this may come at a price. Efficient use of SSE instructions can sometimes depend on exactly how the memory used to store data is aligned. Some SSE instructions for moving data to and from memory need memory to be aligned on a 16-byte boundary. If it happens that the memory (for example, a pointer to an array of numbers) that a NAG routine uses is **not** aligned nicely, then it may not be possible to use those SSE instructions.
An optimizing compiler might well generate two instruction streams, one for when it detects that memory is aligned, and one for when it is not.

An example should serve to make things clearer. Suppose we wish to compute the inner product of two vectors, X and Y, each of length N. The inner product (or dot product) of two vectors is computed by multiplying together corresponding elements of the two vectors, and summing the individual products to get the result. A routine compiled by a good optimizing compiler would load numbers two or four at a time, multiply them together two or four at a time, and accumulate the results into the final result.

But if the memory is not nicely aligned – and it may well not be – the compiler needs to generate a different code path to deal with the situation. Here the result will take longer to get because the products must be computed and accumulated one at a time. At run-time, the code checks whether it can take the fast path or not, and works appropriately.

The problem is that by altering the order of the accumulations, we are quite possibly changing the final result, simply due to rounding differences when working with finite precision computer arithmetic. Instead of getting the inner product

$$s={x}_{1}\times {y}_{1}+{x}_{2}\times {y}_{2}+{x}_{3}\times {y}_{3}+\cdots +{x}_{n}\times {y}_{n}$$

we may get

$$s=\left({x}_{1}\times {y}_{1}+{x}_{3}\times {y}_{3}\right)+\left({x}_{2}\times {y}_{2}+{x}_{4}\times {y}_{4}\right)+\cdots \text{.}$$

It is likely that the result will be just as accurate either way – neither result will be precise due to finite arithmetic – but they may differ by a tiny amount. And if that tiny difference leads to a different decision being made by the code that called the inner product routine, the difference may be magnified.
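The effect of accumulation order can be tried out with a minimal sketch such as the following. It is only an illustration: it does not use SSE instructions explicitly, but imitates a two-wide accumulation by keeping separate running sums for the odd and even elements. The program and variable names are illustrative, and single precision is used merely to make any difference easier to see.

```fortran
program dot_order_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  integer, parameter :: n = 1000          ! even, so the pairs divide evenly
  real(real32) :: x(n), y(n), s_seq, s_odd, s_even
  integer :: i

  call random_number(x)
  call random_number(y)

  ! Products accumulated one at a time, in index order.
  s_seq = 0.0_real32
  do i = 1, n
    s_seq = s_seq + x(i)*y(i)
  end do

  ! Two running sums (odd and even indices), combined at the end.
  s_odd = 0.0_real32
  s_even = 0.0_real32
  do i = 1, n - 1, 2
    s_odd = s_odd + x(i)*y(i)
    s_even = s_even + x(i+1)*y(i+1)
  end do

  print *, 'sequential :', s_seq
  print *, 'two sums   :', s_odd + s_even
  print *, 'difference :', s_seq - (s_odd + s_even)
end program dot_order_sketch
```

Any difference printed is typically confined to the last few bits of the result; whether it is nonzero depends on the data and on how the compiler optimizes the loops.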

Furthermore, it is possible that the same program running with bitwise identical data on the same machine may give different results when run twice in a row simply because, when the program is loaded, by chance some piece of memory may or may not be aligned on a particular boundary. Such non-deterministic results can be frustrating if the user of the program depends on always getting identical results for the same data.

On even newer hardware, **AVX** instructions use 256-bit registers, and can therefore operate on more numbers at a time. For AVX instructions, memory may need to be 32-byte aligned.

Some memory used by NAG Library routines is allocated inside the NAG Library. In order to minimize differences due to effects like that described above, we can try to make sure the memory is always aligned nicely – for example, by use of more controllable memory allocation routines where available – but that is not always possible since it partly depends on the support of the compiler.

Of course, no Library routine has control over memory that you allocate before passing it to the routine. If you do observe non-deterministic results which you suspect are due to memory considerations, and you are unable to accept this variation, then you are advised to make sure that any memory you allocate is aligned nicely. Unfortunately, precisely how you do this is dependent on your system, but you may be able to get advice through NAG's usual support channels (see Section 5).
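Although forcing a particular alignment is system dependent, checking the alignment of an array you have allocated is possible in a reasonably portable way. The following is a minimal sketch, assuming a Fortran 2003 compiler that provides `ISO_C_BINDING`; the program and variable names are illustrative. It reports the position of the first array element relative to 16-byte and 32-byte boundaries, and does nothing to change that position.

```fortran
program alignment_sketch
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  double precision, allocatable, target :: x(:)
  integer(c_intptr_t) :: addr

  allocate (x(1000))

  ! Reinterpret the C address of the first element as an integer.
  addr = transfer(c_loc(x(1)), addr)

  ! An offset of zero means the array starts on that boundary.
  print *, 'offset from 16-byte boundary:', modulo(addr, 16_c_intptr_t)
  print *, 'offset from 32-byte boundary:', modulo(addr, 32_c_intptr_t)
end program alignment_sketch
```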

Parallelism, coming from a multithreaded implementation of the NAG Library and/or a multithreaded vendor library, is another potential source of non-determinism in numerical results. Some routines may give different results when run on different numbers of cores, or even different results when a calculation is repeated on the same number of cores. Where reproducibility of results is vital, a purely serial NAG Library, without parallelism in either NAG routines or calls to parallel vendor library routines, will generally be available in an appropriate implementation, and may be the best choice. You are advised to contact NAG (see Section 5) for guidance.

Arithmetic operations on fixed-length floating-point numbers (e.g., 32-bit floats or 64-bit doubles) are not associative. This means that a computer may produce different results for $a+\left(b+c\right)$ and $\left(a+b\right)+c$. For example, an IEEE 754 32-bit floating-point number stores a mantissa of $23$ bits (giving $24$ significant bits once the implicit leading bit is counted). Therefore in this number format ${2}^{24}+1$ rounds to ${2}^{24}$, which means that, for instance, $\left({2}^{24}+1\right)-{2}^{24}=0$ while ${2}^{24}+\left(1-{2}^{24}\right)=1$.

Bitwise reproducibility (BWR) refers to the case in which a given computer program (e.g., a set of source codes) produces bit-for-bit exactly the same answer in different computing environments, such as

1. Different operating systems (e.g., answers produced on Windows vs answers produced on Linux).

2. Different CPU architectures (e.g., Intel vs AMD or Intel Sandy Bridge vs Intel Ivy Bridge etc.).

3. Different compiler versions.

4. Different numbers of threads.
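The loss of associativity in 32-bit arithmetic that underlies these difficulties can be observed directly. The following is a minimal sketch, assuming that operations on `real32` values are carried out in IEEE single precision (hardware that evaluates intermediate results in extra-precise registers may behave differently). The program and variable names are illustrative, and the `volatile` attribute is used only to discourage the compiler from evaluating the expressions at compile time. Under that assumption the first expression would be expected to evaluate to 0 and the second to 1.

```fortran
program assoc_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32), volatile :: a, b

  a = 2.0_real32**24
  b = 1.0_real32

  ! The parentheses fix the order of evaluation in each expression.
  print *, '(a + b) - a =', (a + b) - a
  print *, 'a + (b - a) =', a + (b - a)
end program assoc_sketch
```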

Users often desire BWR; however, it is extremely difficult to achieve. Typically you need to ensure that:

(a) Instructions are always executed in exactly the same order.

(b) No advanced CPU features are used which may not be available on other processors (e.g., SSE3, SSE4, AVX).

(c) A fixed number of threads is always used.

Condition (a) often amounts to compiling with no (or very limited) compiler optimization, since newer versions of compilers typically improve their code optimization algorithms: one version of a compiler may optimize a set of operations one way, while the next version may optimize it differently. Condition (b) typically means that only basic SSE instructions, which are supported across the widest range of processors, are allowed, and that the enhanced SIMD instructions present in newer processors are not exploited.

The result is that to achieve BWR across a wide range of computing environments one often has to sacrifice a lot of performance.

An implementation of the NAG Library that is not self-contained will make calls to an appropriate vendor library containing, in particular, high performance linear algebra routines. The NAG Library has no direct control over BWR with respect to results obtained from calls to the vendor library. However, for at least one such vendor library, conditional bitwise reproducibility (CBWR) has been introduced: if an environment variable is set and a set of conditions is adhered to in the code calling the vendor library, then BWR can be forced. Where CBWR is available for a vendor library used by an implementation of the NAG Library, details will be given in the Users' Note for that implementation.

It should be noted that many NAG routines do not adhere to the conditions set out by vendor library CBWR and so it may not be possible to ensure BWR for all NAG Library routines across different CPU architectures for implementations that are not self-contained.

The Manual is designed to serve the following functions for the NAG Library:

- to give background information about different areas of numerical and statistical computation;
- to advise on the choice of the most suitable NAG Library routine or routines to solve a particular problem;
- to give all the information needed to call a NAG Library routine correctly from a Fortran program, and to assess the results.

At the beginning of the Manual are some general introductory documents which provide some background and additional information.

The document entitled ‘Mark 25 NAG Fortran Library News’ provides details of new routines added, details of routines scheduled for withdrawal and details of routines withdrawn at this mark.

The document entitled ‘Advice on Replacement Calls for Withdrawn/Superseded Routines’ provides advice on how to modify your program.

The online documentation includes a Keyword and GAMS Search which provides you with a form to search the Library for keywords.

Having found a likely chapter or routine, you should read the corresponding **Chapter Introduction**, which gives background information about that area of numerical computation, and recommendations on the choice of a routine, including indexes, tables and decision trees.

When you have chosen a routine, you must consult the **routine document**. Each routine document is essentially self-contained (it may, however, contain references to related documents). It includes a description of the method, detailed specifications of each parameter, explanations of each error exit, remarks on accuracy, and (in most cases) an example program to illustrate the use of the routine.

All routine documents have the same structure consisting of ten numbered sections:

1. Purpose
2. Specification
3. Description
4. References
5. Parameters (see Section 4.3 below)
6. Error Indicators and Warnings
7. Accuracy
8. Parallelism and Performance
9. Further Comments
10. Example (see Section 4.5 below)

In some documents (notably Chapters E04, E05 and H) there are a further three sections:

11. Algorithmic Details
12. Optional Parameters
13. Description of Monitoring Information

Sections 11 and 13 above are optional; thus the section titled **Optional Parameters** may appear as Section 11, and may be the final section.

Section 5 of each routine document contains the specification of the parameters, in the order of their appearance in the parameter list.

Parameters are classified as follows.

Input: you must assign values to these parameters on or before entry to the routine, and these values are unchanged on exit from the routine.

Output: you need not assign values to these parameters before entry to the routine; the routine may assign values to them.

Input/Output: you must assign values to these parameters before entry to the routine, and the routine may then change these values.

Workspace: array parameters which are used as workspace by the routine. You must supply arrays of the correct type and dimension. In general, you need not be concerned with their contents.

Communication Array: parameters which are used to communicate data from one routine call to another.

External Procedure: a routine which must be supplied (e.g., to evaluate an integrand or to print intermediate output). Usually it must be supplied as part of your calling program, in which case its specification includes full details of its parameter list and specifications of its parameters (all enclosed in a box). Its parameters are classified in the same way as those of the Library routine, but because you must write the procedure rather than call it, the significance of the classification is different.

- Input: values may be supplied on entry, which your procedure **must not** change.
- Output: you may or must assign values to these parameters before exit from your procedure.
- Input/Output: values may be supplied on entry, and you may or must assign values to them before exit from your procedure.

Occasionally, as mentioned in Section 3.5, the procedure can be supplied from the NAG Library, and then you only need to know its name.

User Workspace: array parameters which are passed by the Library routine to an external procedure parameter. They are not used by the routine, but you may use them to pass information between your calling program and the external procedure.

Dummy: a simple variable which is not used by the routine. A variable or constant of the correct type must be supplied, but its value need not be set. (A dummy parameter is usually a parameter which was required by an earlier version of the routine and is retained in the parameter list for compatibility.)
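The roles of an External Procedure and of User Workspace can be illustrated with a minimal sketch. Everything in it is hypothetical: `trap` is not a NAG routine but a stand-in library-style routine (a composite trapezoidal rule), and the names `integrand`, `ruser` and `val` are chosen for illustration. The stand-in routine never reads or writes `ruser`; it only passes the array on to the procedure you supply.

```fortran
module quad_sketch
  implicit none
contains
  ! Hypothetical library-style routine (not a NAG routine).
  subroutine trap(f, a, b, n, ruser, val)
    interface
      function f(x, ruser) result(fx)
        double precision, intent(in) :: x
        double precision, intent(inout) :: ruser(*)
        double precision :: fx
      end function f
    end interface
    double precision, intent(in) :: a, b
    integer, intent(in) :: n
    double precision, intent(inout) :: ruser(*)   ! User Workspace
    double precision, intent(out) :: val
    double precision :: h
    integer :: i
    h = (b - a)/dble(n)
    val = 0.5d0*f(a, ruser)
    val = val + 0.5d0*f(b, ruser)
    do i = 1, n - 1
      val = val + f(a + dble(i)*h, ruser)
    end do
    val = val*h
  end subroutine trap
end module quad_sketch

module user_code
  implicit none
contains
  ! External procedure written by the caller. X is an Input parameter,
  ! so the procedure does not change it.
  function integrand(x, ruser) result(fx)
    double precision, intent(in) :: x
    double precision, intent(inout) :: ruser(*)
    double precision :: fx
    fx = ruser(1)*x**2      ! coefficient arrives via the user workspace
  end function integrand
end module user_code

program external_sketch
  use quad_sketch
  use user_code
  implicit none
  double precision :: ruser(1), res
  ruser(1) = 3.0d0
  call trap(integrand, 0.0d0, 1.0d0, 100, ruser, res)
  print *, 'approximate integral of 3*x**2 on [0,1]:', res
end program external_sketch
```

Passing the coefficient through `ruser` avoids the need for global data shared between the calling program and the external procedure.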

The word ‘Constraint:’ or ‘Constraints:’ in the specification of an Input parameter introduces a statement of the range of valid values for that parameter, e.g.,

- Constraint: $\mathrm{N}>0$.

If the routine is called with an invalid value for the parameter (e.g., $\mathrm{N}=0$), the routine will usually take an error exit, returning a nonzero value of IFAIL (see Section 3.3).

Constraints on parameters of type CHARACTER only list upper case alphabetic characters, e.g.,

- Constraint: $\mathrm{CHECK}=\text{'N'}$.

In practice, all routines with CHARACTER parameters will permit the use of lower case characters.

The phrase ‘Suggested value:’ introduces a suggestion for a reasonable initial setting for an Input parameter (e.g., accuracy or maximum number of iterations) in case you are unsure what value to use; you should be prepared to use a different setting if the suggested value turns out to be unsuitable for your problem.

Most array parameters have dimensions which depend on the size of the problem. In Fortran terminology they have ‘adjustable dimensions’: the dimensions occurring in their declarations are integer variables which are also parameters of the Library routine.

For example, a Library routine might have the specification:

    SUBROUTINE <name> (M, N, A, B, LDB)
    INTEGER M, N, A(N), B(LDB,N), LDB

For a **one-dimensional** array parameter, such as A in this example, the specification would begin

- A(N) – INTEGER array

You must ensure that the dimension of the array, as declared in your calling (sub)program, is at least as large as the value you supply for N. It may be larger, but the routine uses only the first N elements.

For a **two-dimensional** array parameter, such as B in the example, the specification might be

- B(LDB,N) – INTEGER array
- On entry: the $m$ by $n$ matrix $B$.

- LDB – INTEGER
- On entry: the first dimension of the array B as declared in the (sub)program from which <name> is called.
- Constraint: $\mathrm{LDB}\ge \mathrm{M}$.

You **must** supply the **first** dimension of the array B, as declared in your calling (sub)program, through the parameter LDB, even though the number of rows actually used by the routine is determined by the parameter M. You must ensure that the first dimension of the array is at least as large as the value you supply for M. The extra parameter LDB is needed to allow the routine to act on subarrays of a larger two-dimensional array, e.g., factorizing a diagonal submatrix of a larger matrix.

You must also ensure that the **second** dimension of the array, as declared in your calling (sub)program, is at least as large as the value you supply for N. It may be larger, but the routine uses only the first N columns.

A program to call the hypothetical routine used as an example in this section might include the statements:

    INTEGER AA(100), BB(100,50)
    LDB = 100
    .
    .
    .
    M = 80
    N = 20
    CALL <name>(M,N,AA,BB,LDB)

or

    INTEGER, ALLOCATABLE :: AA(:), BB(:,:)
    INTEGER :: M, N, LDB
    .
    .
    .
    READ(5,*) M, N
    LDB = M
    ALLOCATE (AA(N),BB(LDB,N))
    CALL <name>(M,N,AA,BB,LDB)

Many NAG routines contain array parameters declared with the ‘assumed size’ array dimension, and would be given as

INTEGER A(*), B(LDB,*)

However, the original declaration of an array in your calling program must always have dimensions greater than or equal to the minimum values documented. The advantage of using allocatable arrays is that they can be dynamically allocated to a correct size that is not known at compile time.
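The conventions for array parameters described in this section can be put together in a complete, minimal sketch. The routine `colsum` is hypothetical (it is not part of the NAG Library); it has the same parameter list as the example specification and uses assumed-size declarations. On exit, A(j) holds the sum of the first M rows of column j of B. The second call shows why LDB is needed: the routine acts on a subarray of BB, and LDB still describes the declared first dimension of BB.

```fortran
module ldb_sketch
  implicit none
contains
  ! Hypothetical routine, not part of the NAG Library.
  subroutine colsum(m, n, a, b, ldb)
    integer, intent(in) :: m, n, ldb
    integer, intent(out) :: a(*)
    integer, intent(in) :: b(ldb, *)
    integer :: j
    do j = 1, n
      a(j) = sum(b(1:m, j))
    end do
  end subroutine colsum
end module ldb_sketch

program ldb_demo
  use ldb_sketch
  implicit none
  integer :: aa(100), bb(100, 50), ldb, m, n, i, j

  ldb = 100                 ! first dimension of BB as declared above
  do j = 1, 50
    do i = 1, 100
      bb(i, j) = i + j
    end do
  end do

  m = 80
  n = 20

  ! Uses rows 1 to 80 and columns 1 to 20 of BB.
  call colsum(m, n, aa, bb, ldb)
  print *, 'first column sum          :', aa(1)   ! 3320

  ! Same routine applied to the subarray whose top-left element is
  ! BB(3,2): rows 3 to 82 and columns 2 to 21. LDB is still 100.
  call colsum(m, n, aa, bb(3, 2), ldb)
  print *, 'subarray first column sum :', aa(1)   ! 3560
end program ldb_demo
```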

Consult an expert or a textbook on Fortran if you have difficulty in calling NAG routines with array parameters.

In order to support all implementations of the Library, the Manual has adopted a convention of using ***bold italics*** to distinguish terms which have different interpretations in different implementations.

One bold italicised term is ***machine precision***, which denotes the relative precision to which real floating-point numbers are stored in the computer, e.g., in an implementation with approximately 16 decimal digits of precision, ***machine precision*** has a value of approximately ${10}^{-16}$.

The precise value of ***machine precision*** is given by the routine X02AJF. Other routines in Chapter X02 return the values of other implementation-dependent constants, such as the overflow threshold, or the largest representable integer. Refer to the X02 Chapter Introduction for more details.
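Where the Library is not to hand, comparable implementation-dependent constants can be inspected with standard Fortran intrinsic functions. The following is a minimal sketch and does not call any Chapter X02 routine; note in particular that the intrinsic `EPSILON` is a related quantity but is not guaranteed to equal the value returned by X02AJF, whose precise definition is given in its routine document.

```fortran
program precision_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  real(real64) :: eps

  eps = epsilon(1.0_real64)   ! spacing between 1 and the next larger number

  print *, 'epsilon                :', eps
  print *, 'decimal digits         :', precision(1.0_real64)
  print *, 'largest real number    :', huge(1.0_real64)
  print *, 'smallest normal number :', tiny(1.0_real64)
  print *, 'largest default integer:', huge(1)
end program precision_sketch
```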

The bold italicised term ***block size*** is used only in Chapters F07 and F08. It denotes the block size used by block algorithms in these chapters. You only need to be aware of its value when it affects the amount of workspace to be supplied – see the parameters WORK and LWORK of the relevant routine documents and the appropriate Chapter Introduction.

For each implementation of the Library, a separate **Users' Note** is published. This is a short document, revised at each mark. At most installations it is available in machine-readable form. It gives any necessary additional information which applies specifically to that implementation, in particular:

- the values returned by Chapter X02 routines;
- the default unit numbers for output (see Section 3.4);
- the meanings of the precision parameters nag_rp (reduced precision), nag_wp (basic precision) and nag_hp (additional precision).

The **example program** in Section 10 of most routine documents illustrates a simple call of the routine. The programs are designed so that they can be fairly easily modified, and so serve as the basis for a simple program to solve your problem.

For each implementation of the Library, NAG distributes the example programs in machine-readable form, with all necessary modifications already applied. Many sites make the programs accessible to you in this form. Generic forms of the programs, without implementation-specific modifications, may be obtained directly from the NAG web site. The Users' Note for your implementation will mention any special changes which need to be made to the example programs.

Note that the results obtained from running the example programs may not be identical in all implementations, and may not agree exactly with the results in the Manual.

For many routine documents, a plot of the example program results is also provided. In some cases the example program has been modified slightly to produce larger sets of results to give a more representative plot of the solution profile produced.

The NAG Technical Support Service is available for general enquiries from all users and also for technical queries from sites that subscribe to the support service.

The service is available during office hours, but contact is possible by email and telephone (answering machine) at all times. Please see the Users' Note or the NAG web site for contact details.

When contacting the NAG Technical Support Service, it helps us to deal with your query quickly if you can quote your NAG customer reference number and NAG product code.

The NAG web site is an information service providing items of interest to users and prospective users of NAG products and services. The information is regularly updated and reviewed, and includes implementation availability, descriptions of products, downloadable software and documentation, case studies, industry articles and technical reports. The NAG web site can be accessed via:

Various aspects of the design and development of the NAG Library, and NAG's technical policies and organization are given in Ford (1982), Ford et al. (1979), Ford and Pool (1984), and Hague et al. (1982).

ACM (1960–1976) Collected algorithms from ACM index by subject to algorithms

Al-Mohy A H and Higham N J (2011) Computing the action of the matrix exponential, with an application to exponential integrators *SIAM J. Sci. Comput.* **33(2)** 488–511

Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J J, Du Croz J J, Greenbaum A, Hammarling S, McKenney A and Sorensen D (1999) *LAPACK Users' Guide* (3rd Edition) SIAM, Philadelphia http://www.netlib.org/lapack/lug

ANSI (1966) USA standard Fortran *Publication X3.9* American National Standards Institute

ANSI (1978) American National Standard Fortran *Publication X3.9* American National Standards Institute

ANSI/IEEE (1985) IEEE standard for binary floating-point arithmetic *Std 754-1985* IEEE, New York

ANSI/IEEE POSIX (1995) *POSIX Standard Thread Library* ANSI/IEEE POSIX 1003.1c:1995

Basic Linear Algebra Subprograms Technical (BLAST) Forum (2001) *Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard* University of Tennessee, Knoxville, Tennessee http://www.netlib.org/blas/blast-forum/blas-report.pdf

Blackford L S, Demmel J, Dongarra J J, Duff I S, Hammarling S, Henry G, Heroux M, Kaufman L, Lumsdaine A, Petitet A, Pozo R, Remington K and Whaley R C (2002) An updated set of Basic Linear Algebra Subprograms (BLAS) *ACM Trans. Math. Software* **28** 135–151

Dongarra J J, Du Croz J J, Duff I S and Hammarling S (1990) A set of Level 3 basic linear algebra subprograms *ACM Trans. Math. Software* **16** 1–28

Dongarra J J, Du Croz J J, Hammarling S and Hanson R J (1988) An extended set of FORTRAN basic linear algebra subprograms *ACM Trans. Math. Software* **14** 1–32

Ford B (1982) Transportable numerical software *Lecture Notes in Computer Science* **142** 128–140 Springer–Verlag

Ford B, Bentley J, Du Croz J J and Hague S J (1979) The NAG Library ‘machine’ *Softw. Pract. Exper.* **9(1)** 65–72

Ford B and Pool J C T (1984) The evolving NAG Library service *Sources and Development of Mathematical Software* (ed W Cowell) 375–397 Prentice–Hall

Hague S J, Nugent S M and Ford B (1982) Computer-based documentation for the NAG Library *Lecture Notes in Computer Science* **142** 91–127 Springer–Verlag

ISO (1997) ISO Fortran 95 programming language (ISO/IEC 1539–1:1997)

ISO/IEC (1990) Information technology – programming language C *Current C Language Standard* ISO/IEC 9899:1990

Kernighan B W and Ritchie D M (1988) *The C Programming Language* (2nd Edition) Prentice–Hall

OpenMP *The OpenMP Specification for Parallel Programming* http://www.openmp.org

The BLAS Technical Forum Standard (2001) http://www.netlib.org/blas/blast-forum