We all know that computers are becoming increasingly parallel. The latest generation of x86 processors has 12 cores, and Intel recently announced an experimental 48-core processor. A lot of attention has focussed on the associated reductions in clock speeds, which make software run slower, but less attention has been paid to other issues such as changes in the amount of cache available and increased contention for memory bandwidth.
The fact is that access to memory, rather than CPU cycles, is fast becoming the scarce resource. Most mathematical algorithms in the NAG Library (and other similar software) are designed for ease of use: the user passes in a collection of data structures describing the problem to be solved, and the routine only exits when it has calculated the answer or encountered an error. However, we do have a number of routines which take a different approach, called reverse communication. Here, the user provides some initial data and, when the routine wants more, it stops. State is preserved between calls, but the routine never "sees" all the input data at once. An example is f11be, which solves a sparse system of equations Ax=b. Instead of being supplied with A and b explicitly, the routine stops whenever it needs information such as the value of Au, where u is an intermediate estimate of the solution.
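To make the idea concrete, here is a minimal sketch of a reverse-communication iterative solver in Python. This is not the f11be interface; it is a hypothetical conjugate-gradient loop, written as a generator, that yields a vector u whenever it needs the caller to supply the product Au:

```python
import numpy as np

def cg_reverse(b, tol=1e-10, maxit=100):
    """Hypothetical reverse-communication conjugate gradient.

    Yields a vector u whenever it needs the caller to compute A @ u;
    the caller sends the product back with .send(). The solver never
    sees the matrix A itself. Returns the solution x when finished.
    """
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()   # residual b - A@x (x starts at zero)
    p = r.copy()                 # search direction
    rs = r @ r
    for _ in range(maxit):
        q = yield p              # stop and ask the caller for A @ p
        alpha = rs / (p @ q)
        x += alpha * p
        r -= alpha * q
        rs_new = r @ r
        if rs_new ** 0.5 < tol:  # converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# The caller drives the iteration and is the only place A is touched:
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
solver = cg_reverse(b)
u = next(solver)                 # run to the first request
try:
    while True:
        u = solver.send(A @ u)   # answer the request, get the next one
except StopIteration as stop:
    x = stop.value               # solution delivered on exit
```

Because the solver only ever asks for matrix-vector products, the caller is free to replace `A @ u` with any code that computes the same product, which is exactly the freedom reverse communication provides.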
There are two main advantages to this approach. First, the user can carry out other computations with the data while iterating over it, reducing memory access. Secondly, the user can represent the data any way they want, rather than being forced to use the layout mandated by the routine being called. Of course there are downsides too - the user may unwittingly choose a representation for their data which is very inefficient, and their program will be more complicated. Nevertheless, this may be a much more efficient way of using libraries like ours in the future, and it's possible that many of our new routines will follow this approach in years to come.
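To illustrate the second advantage, here is a small hypothetical sketch in Python of a matrix-free representation: a caller could answer a routine's requests for Au by applying a 1-D Laplacian stencil directly, without ever storing the matrix A:

```python
import numpy as np

def laplacian_matvec(u):
    """Compute A @ u for the 1-D discrete Laplacian
    (A u)_i = 2 u_i - u_{i-1} - u_{i+1} (zero boundaries),
    without ever forming the mostly-zero matrix A.
    """
    v = 2.0 * u
    v[:-1] -= u[1:]   # subtract the right neighbour
    v[1:] -= u[:-1]   # subtract the left neighbour
    return v
```

For a problem of size n this uses O(n) memory instead of the O(n^2) a dense matrix would require - but it equally shows the risk: an unwitting caller who did build the dense matrix would pay that cost on every request.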