d04 – Numerical Differentiation

This chapter is concerned with calculating approximations to derivatives of a function $f$.

The problem of numerical differentiation does not receive very much attention nowadays. Although the Taylor series plays a key role in much of classical analysis, the poor reputation enjoyed by numerical differentiation has led numerical analysts to construct techniques for most problems which avoid the explicit use of numerical differentiation.

One may briefly and roughly define the term numerical differentiation as any process in which numerical values of derivatives ${f}^{\left(s\right)}\left({x}_{0}\right)$ are obtained from evaluations $f\left({x}_{i}\right)$ of the function $f\left(x\right)$ at several abscissae ${x}_{i}$ near ${x}_{0}$. This problem can be stable, well-conditioned, and accurate when complex-valued abscissae are used. This was first pointed out by Lyness and Moler (1967). An item of numerical software for this appears in Lyness and Ande (1971). However, in many applications the use of complex-valued abscissae is either prohibitive or prohibited. The main difficulty in using real abscissae is that amplification of round-off error can completely obliterate meaningful results. In the days when one relied on hand calculating machines and tabular data, the frustration caused by this effect led to the abandonment of serious use of the process.

There are several reasons for believing that, in the present-day computing environment, numerical differentiation might find a useful role. The first is that, by present standards, it is rather a small-scale calculation, so its cost may well be negligible compared with any overall saving in cost that might result from its use. Secondly, the assignment of a step length $h$ is now generally open. One does not have to rely on tabular data. Thirdly, although the amplification of round-off error is an integral part of the calculation, its effect can be measured reliably and automatically by the function at the time of the calculation.

Thus you do not have to gauge the round-off level (or noise level) of the function values or assess the effect of this on the accuracy of the results. A function does this automatically, returning with each result an error estimate which has already taken round-off error amplification into account.

We now illustrate, by means of a very simple example, the importance of the round-off error effect. A very simple approximation of ${f}^{\prime}\left(0\right)$, based on the identity

is

If there were no precision problem, this formula would be the only one needed in the theory of first-order numerical differentiation. We could simply take $h={10}^{-40}$ (or $h={10}^{-1000}$) to obtain an excellent approximation based on two function values. It is only when we consider in detail how a machine with finite precision comes to calculate $\left(f\left(h\right)-f\left(-h\right)\right)/2h$ that the necessity for a sophisticated theory becomes apparent.

$${f}^{\prime}\left(0\right)=\left(f\left(h\right)-f\left(-h\right)\right)/2h+\left({h}^{2}/3!\right){f}^{\prime \prime \prime}\left(\xi \right)\text{,}$$ | (1) |

$$\left(f\left(h\right)-f\left(-h\right)\right)/2h\text{.}$$ |

To simplify the subsequent description we shall assume that the quantities involved are neither so close to zero that the machine underflow characteristics need be considered nor so large that machine overflow occurs. The approximation mentioned above involves the function values $f\left(h\right)$ and $f\left(-h\right)$. In general no computer has available these numbers exactly. Instead it uses approximations $\hat{f}\left(h\right)$ and $\hat{f}\left(-h\right)$ whose relative accuracy is less than some tolerance ${\epsilon}_{\mathrm{f}}$. If the function $f\left(x\right)$ is a library function, for example $\mathrm{sin}x$, ${\epsilon}_{f}$ may coincide with the machine accuracy argument ${\epsilon}_{\mathrm{m}}$. More generally the function $f\left(x\right)$ is calculated in a user-supplied function and ${\epsilon}_{f}$ is larger than ${\epsilon}_{m}$ by a small factor $5$ or $6$ if the calculation is short or by some larger factor in an extended calculation. This factor is not usually known by you.

$\hat{f}\left(h\right)$ and $\hat{f}\left(-h\right)$ are related to $f\left(h\right)$ and $f\left(-h\right)$ by

Thus even if the rest of the calculation were carried out exactly, it is trivial to show that

The difference between the quantity actually calculated and the quantity which one attempts to calculate may be as large as ${\epsilon}_{f}f\left(\xi \right)/h$; for example using $h={10}^{-40}$ and ${\epsilon}_{m}={10}^{-7}$ this gives a ‘conditioning error’ of ${10}^{33}f\left(\xi \right)$.

$$\begin{array}{ll}\hat{f}\left(h\right)=f\left(h\right)\left(1+{\theta}_{1}{\epsilon}_{f}\right)\text{,}& \left|{\theta}_{1}\right|\le 1\\ & \\ \hat{f}\left(-h\right)=f\left(-h\right)\left(1+{\theta}_{2}{\epsilon}_{f}\right)\text{,}& \left|{\theta}_{2}\right|\le 1\text{.}\end{array}$$ |

$$\frac{\hat{f}\left(h\right)-\hat{f}\left(-h\right)}{2h}-\frac{f\left(h\right)-f\left(-h\right)}{2h}\simeq 2\varphi {\epsilon}_{f}\frac{f\left(\xi \right)}{2h}\text{, \hspace{1em}}\left|\varphi \right|\le 1\text{.}$$ |

In practice much more sophisticated formulae than (1) above are used, and for these and for the corresponding higher-derivative formulae the error analysis is different and more complicated in detail. But invariably the theory contains the same overall feature. In a finite length calculation, the error is composed of two main parts: a discretization error which increases as $h$ becomes large and is zero for $h=0$; and a ‘conditioning’ error due to amplification of round-off error in function evaluation, which increases as $h$ becomes small and is infinite for $h=0$.

The functions in this chapter have to take into account internally both these sources of error in the results. Thus they return pairs of results, ${\mathbf{der}}\left[j-1\right]$ and ${\mathbf{erest}}\left[j-1\right]$ where ${\mathbf{der}}\left[j-1\right]$ is an approximation to ${f}^{\left(j\right)}\left({x}_{0}\right)$ and ${\mathbf{erest}}\left[j-1\right]$ is an estimate of the error in the approximation ${\mathbf{der}}\left[j-1\right]$. If the function has not been misled, ${\mathbf{der}}\left[j-1\right]$ and ${\mathbf{erest}}\left[j-1\right]$ satisfy

In this respect, numerical differentiation functions are fundamentally different from other functions. You do not specify an error criterion. Instead the function provides the error estimate and this may be very large.

$$\left|{\mathbf{der}}\left[j-1\right]-{f}^{\left(j\right)}\left({x}_{0}\right)\right|\le {\mathbf{erest}}\left[j-1\right]\text{.}$$ |

We mention here a terminological distinction. A fully automatic function is one in which you do not need to provide any information other than that required to specify the problem. Such a function (at a cost in computing time) decides an appropriate internal argument such as the step length $h$ by itself. On the other hand a function which requires you to provide a step length $h$, but automatically chooses from several different formulae each based on the specified step length, is termed a semi-automatic function.

The situation described above must have seemed rather depressing when hand machines were in common use. For example in the simple illustration one does not know the values of the quantities ${f}^{\prime \prime \prime}\left(\xi \right)$ or ${\epsilon}_{f}$ involved in the error estimates, and the idea of altering the value of $h$ and starting again must have seemed appalling. However, by present-day standards, it is a relatively simple matter to write a program which carries out all the previously considered time-consuming calculations which may seem necessary. None of the functions in this chapter are particularly revolutionary in concept. They simply utilize the computer for the sort of task for which it was originally designed. It carries out simple tedious calculations which are necessary to estimate the accuracy of the results of other even simpler tedious calculations.

(a) | One immediate use to which a set of derivatives at a point is likely to be put is that of constructing a Taylor series representation:
The series formed from the Taylor series using elementary operations such as integration or differentiation has the same overall characteristic. A technique based on a Taylor series expansion may be quite accurate, even if the individual derivatives are not, so long as the less accurate derivatives are associated with known small coefficients.
The error introduced by using $n$ approximate derivatives ${\mathbf{der}}\left[j-1\right]$ is bounded by
In this sort of application the accuracy of the result depends on the size of the errors in the numerical differentiation. There are other applications where the effect of large errors ${\mathbf{erest}}\left[j-1\right]$ is merely to prolong a calculation, but not to impair the final accuracy. |
||||

(b) | A simple Taylor series approach such as described in (a) is used to find a starting value for a rapidly converging but extremely local iterative process. | ||||

(c) | The technique known as ‘subtracting out the singularity’ as a preliminary to numerical quadrature may be extended and may be carried out approximately. Thus suppose we are interested in evaluating
The integrand function on the right of (2) has no singularity, but its third derivative does. Thus using numerical quadrature for this integral is much cheaper than using numerical quadrature for the original integral (in the left-hand side of (2)).
However, (2) is an identity whatever values of $A$ and $B$ are assigned. Thus the same procedure can be used with $A$ and $B$ being approximations to ${\varphi}^{\prime}\left(0\right)$ and ${\varphi}^{\prime \prime}\left(0\right)$ provided by a numerical differentiation function. The integrand would now be more difficult to integrate than if $A$ and $B$ were correct but still much less difficult than the original integrand (on the left-hand side of (2)). But, assuming that the automatic quadrature function is reliable, the overall result would still be correct. The effect of using approximate derivatives rather than exact derivatives does not impair the accuracy of the result. It simply makes the result more expensive to obtain, but not nearly as expensive as if no derivatives were used at all. |
||||

(d) | The calculation of a definite integral may be based on the Euler–Maclaurin expansion
The examples given above are only intended to illustrate the sort of use to which approximate derivatives may be put. Very simple illustrations have been used for clarity, and in such simple cases there are usually more efficient approaches to the problem. The same ideas applied in a more complicated or restrictive setting may provide an efficient approach to a problem for which no simple standard approach exists. |

(a) | At the present, there is only one numerical differentiation algorithm available in this chapter accessible using direct communication in nag_numdiff_1d_real (d04aac), and using reverse communication in nag_numdiff_1d_real_eval (d04bac). These are semi-automatic functions for obtaining approximations to the first fourteen derivatives of a real valued function $f\left(x\right)$ at a specified point ${x}_{0}$.
For nag_numdiff_1d_real (d04aac), you must provide a
function
representing $f\left(x\right)$, the value of ${x}_{0}$, an upper limit $n\le 14$ on the order of the derivatives required and a step length $h$.
For nag_numdiff_1d_real_eval (d04bac), you must supply the value of ${x}_{0}$, $20$ other abscissae and the function values at those abscissae. Both functions return a set of approximations ${\mathbf{der}}\left[j-1\right]$ and corresponding error estimates ${\mathbf{erest}}\left[j-1\right]$ which hopefully satisfy
It is important that the error estimate ${\mathbf{erest}}\left[j-1\right]$ be taken into consideration before any use of ${\mathbf{der}}\left[j-1\right]$ is made. The actual size of ${\mathbf{erest}}\left[j-1\right]$ depends on the analytic structure of the function, on the computational precision used and on the step size $h$, and is difficult to predict. Thus you have to run the function to find out how accurate the results are. Usually you will find the higher-order derivatives are successively more inaccurate and that past a certain order, say $10$ or $11$, the size of ${\mathbf{erest}}\left[j-1\right]$ actually exceeds ${\mathbf{der}}\left[j-1\right]$. Clearly when this happens the approximation ${\mathbf{der}}\left[j-1\right]$ is unusable. |
||

(b) | There is available in the algorithm section of CACM (see Lyness and Ande (1971)) a semi-automatic Fortran function for numerical differentiation of an analytical function $f\left(z\right)$ at a possibly complex abscissa ${z}_{0}$. This is a stable problem. It can be used for any problem for which nag_numdiff_1d_real (d04aac) might be used and produces more accurate results, and derivatives of arbitrary high order. However, even if ${z}_{0}$ is real and $f\left(z\right)$ is real for $z$, this function requires a user-supplied function which evaluates $f\left(z\right)$ for complex values of $z$ and it makes use of complex arithmetic. | ||

(c) | Routines are available in Chapter e02 to differentiate functions which are polynomials (in Chebyshev series representation) or cubic splines (in B-spline representation). |

None.

Lyness J N and Ande G (1971) Algorithm 413, ENTCAF and ENTCRE: evaluation of normalised Taylor coefficients of an analytic function *Comm. ACM* **14(10)** 669–675

Lyness J N and Moler C B (1967) Numerical differentiation of analytic functions *SIAM J. Numer. Anal.* **4(2)** 202–210