PDF version (NAG web site
, 64bit version, 64bit version)
NAG Toolbox: nag_opt_estimate_deriv (e04xa)
Purpose
nag_opt_estimate_deriv (e04xa) computes an approximation to the gradient vector and/or the Hessian matrix for use in conjunction with, or following the use of an optimization function (such as
nag_opt_nlp1_rcomm (e04uf)).
Syntax
[
mode,
hforw,
objf,
objgrd,
hcntrl,
h,
iwarn,
user,
info,
ifail] = e04xa(
msglvl,
epsrf,
x,
mode,
objfun,
hforw, 'n',
n, 'user',
user)
[
mode,
hforw,
objf,
objgrd,
hcntrl,
h,
iwarn,
user,
info,
ifail] = nag_opt_estimate_deriv(
msglvl,
epsrf,
x,
mode,
objfun,
hforw, 'n',
n, 'user',
user)
Description
nag_opt_estimate_deriv (e04xa) is similar to routine FDCALC described in
Gill et al. (1983a). It should be noted that this function aims to compute sufficiently accurate estimates of the derivatives for use with an optimization algorithm. If you require more accurate estimates you should refer to
Chapter D04.
nag_opt_estimate_deriv (e04xa) computes finite difference approximations to the gradient vector and the Hessian matrix for a given function. The simplest approximation involves the forwarddifference formula, in which the derivative
${f}^{\prime}\left(x\right)$ of a univariate function
$f\left(x\right)$ is approximated by the quantity
for some interval
$h>0$, where the subscript 'F' denotes ‘forwarddifference’ (see
Gill et al. (1983b)).
To summarise the procedure used by
nag_opt_estimate_deriv (e04xa) (for the case when the objective function is available and you require estimates of gradient values and Hessian matrix diagonal values, i.e.,
${\mathbf{mode}}=0$) consider a univariate function
$f$ at the point
$x$. (In order to obtain the gradient of a multivariate function
$F\left(x\right)$, where
$x$ is an
$n$vector, the procedure is applied to each component of
$x$, keeping the other components fixed.) Roughly speaking, the method is based on the fact that the bound on the relative truncation error in the forwarddifference approximation tends to be an increasing function of
$h$, while the relative condition error bound is generally a decreasing function of
$h$, hence changes in
$h$ will tend to have opposite effects on these errors (see
Gill et al. (1983b)).
The ‘best’ interval
$h$ is given by
where
$\Phi $ is an estimate of
${f}^{\prime \prime}\left(x\right)$, and
${e}_{R}$ is an estimate of the relative error associated with computing the function (see Chapter 8 of
Gill et al. (1981)). Given an interval
$h$,
$\Phi $ is defined by the secondorder approximation
The decision as to whether a given value of
$\Phi $ is acceptable involves
$\hat{c}\left(\Phi \right)$, the following bound on the relative condition error in
$\Phi $:
(When
$\Phi $ is zero,
$\hat{c}\left(\Phi \right)$ is taken as an arbitrary large number.)
The procedure selects the interval
${h}_{\varphi}$ (to be used in computing
$\Phi $) from a sequence of trial intervals
$\left({h}_{k}\right)$. The initial trial interval is taken as
$10\stackrel{}{h}$, where
unless you specify the initial value to be used.
The value of
$\hat{c}\left(\Phi \right)$ for a trial value
${h}_{k}$ is defined as ‘acceptable’ if it lies in the interval
$\left[0.001,0.1\right]$. In this case
${h}_{\varphi}$ is taken as
${h}_{k}$, and the current value of
$\Phi $ is used to compute
${h}_{F}$ from
(1). If
$\hat{c}\left(\Phi \right)$ is unacceptable, the next trial interval is chosen so that the relative condition error bound will either decrease or increase, as required. If the bound on the relative condition error is too large, a larger interval is used as the next trial value in an attempt to reduce the condition error bound. On the other hand, if the relative condition error bound is too small,
${h}_{k}$ is reduced.
The procedure will fail to produce an acceptable value of $\hat{c}\left(\Phi \right)$ in two situations. Firstly, if ${f}^{\prime \prime}\left(x\right)$ is extremely small, then $\hat{c}\left(\Phi \right)$ may never become small, even for a very large value of the interval. Alternatively, $\hat{c}\left(\Phi \right)$ may never exceed $0.001$, even for a very small value of the interval. This usually implies that ${f}^{\prime \prime}\left(x\right)$ is extremely large, and occurs most often near a singularity.
As a check on the validity of the estimated first derivative, the procedure provides a comparison of the forwarddifference approximation computed with
${h}_{F}$ (as above) and the centraldifference approximation computed with
${h}_{\varphi}$. Using the centraldifference formula the first derivative can be approximated by
where
$h>0$. If the values
${h}_{F}$ and
${h}_{\varphi}$ do not display some agreement, neither can be considered reliable.
When both function and gradients are available and you require the Hessian matrix (i.e.,
${\mathbf{mode}}=1$)
nag_opt_estimate_deriv (e04xa) follows a similar procedure to the case above with the exception that the gradient function
$g\left(x\right)$ is substituted for the objective function and so the forwarddifference interval for the first derivative of
$g\left(x\right)$ with respect to variable
${x}_{j}$ is computed. The
$j$th column of the approximate Hessian matrix is then defined as in Chapter 2 of
Gill et al. (1981), by
where
${h}_{j}$ is the best forwarddifference interval associated with the
$j$th component of
$g$ and
${e}_{j}$ is the vector with unity in the
$j$th position and zeros elsewhere.
When only the objective function is available and you require the gradients and Hessian matrix (i.e.,
${\mathbf{mode}}=2$)
nag_opt_estimate_deriv (e04xa) again follows the same procedure as the case for
${\mathbf{mode}}=0$ except that this time the value of
$\hat{c}\left(\Phi \right)$ for a trial value
${h}_{k}$ is defined as acceptable if it lies in the interval
$\left[0.0001,0.01\right]$ and the initial trial interval is taken as
The approximate Hessian matrix
$G$ is then defined as in Chapter 2 of
Gill et al. (1981), by
References
Gill P E, Murray W, Saunders M A and Wright M H (1983a) Documentation for FDCALC and FDCORE Technical Report SOL 83–6 Stanford University
Gill P E, Murray W, Saunders M A and Wright M H (1983b) Computing forwarddifference intervals for numerical optimization SIAM J. Sci. Statist. Comput. 4 310–321
Gill P E, Murray W and Wright M H (1981) Practical Optimization Academic Press
Parameters
Compulsory Input Parameters
 1:
$\mathrm{msglvl}$ – int64int32nag_int scalar

Must indicate the amount of intermediate output desired (see
Printed output for a description of the printed output). All output is written on the current advisory message unit (see
nag_file_set_unit_advisory (x04ab)).
Value  Definition 
0  No printout 
1  A summary is printed out for each variable plus any warning messages. 
Other  Values other than $0$ and $1$ should normally be used only at the direction of NAG. 
 2:
$\mathrm{epsrf}$ – double scalar

Must define
${e}_{R}$, which is intended to be a measure of the accuracy with which the problem function
$F$ can be computed. The value of
${e}_{R}$ should reflect the relative precision of
$1+\leftF\left(x\right)\right$, i.e., acts as a relative precision when
$\leftF\right$ is large, and as an absolute precision when
$\leftF\right$ is small. For example, if
$F\left(x\right)$ is typically of order
$1000$ and the first six significant digits are known to be correct, an appropriate value for
${e}_{R}$ would be
$\text{1.0e\u22126}$.
A discussion of
epsrf is given in Chapter 8 of
Gill et al. (1981). If
epsrf is either too small or too large on entry a warning will be printed if
${\mathbf{msglvl}}=1$, the argument
iwarn set to the appropriate value on exit and
nag_opt_estimate_deriv (e04xa) will use a default value of
${e}_{M}^{0.9}$, where
${e}_{M}$ is the
machine precision.
If ${\mathbf{epsrf}}\le 0.0$ on entry then nag_opt_estimate_deriv (e04xa) will use the default value internally. The default value will be appropriate for most simple functions that are computed with full accuracy.
 3:
$\mathrm{x}\left({\mathbf{n}}\right)$ – double array

The point $x$ at which the derivatives are to be computed.
 4:
$\mathrm{mode}$ – int64int32nag_int scalar

Indicates which derivatives are required.
 ${\mathbf{mode}}=0$
 The gradient and Hessian diagonal values having supplied the objective function via objfun.
 ${\mathbf{mode}}=1$
 The Hessian matrix having supplied both the objective function and gradients via objfun.
 ${\mathbf{mode}}=2$
 The gradient values and Hessian matrix having supplied the objective function via objfun.
 5:
$\mathrm{objfun}$ – function handle or string containing name of mfile

If
${\mathbf{mode}}=0$ or
$2$,
objfun must calculate the objective function; otherwise if
${\mathbf{mode}}=1$,
objfun must calculate the objective function and the gradients.
[mode, objf, objgrd, user] = objfun(mode, n, x, nstate, user)
Input Parameters
 1:
$\mathrm{mode}$ – int64int32nag_int scalar

mode indicates which argument values within
objfun need to be set.
To
objfun,
mode is always set to the value that you set it to before the call to
nag_opt_estimate_deriv (e04xa).
 2:
$\mathrm{n}$ – int64int32nag_int scalar

The number $n$ of variables as input to nag_opt_estimate_deriv (e04xa).
 3:
$\mathrm{x}\left({\mathbf{n}}\right)$ – double array

The point $x$ at which the objective function (and gradients if ${\mathbf{mode}}=1$) is to be evaluated.
 4:
$\mathrm{nstate}$ – int64int32nag_int scalar

Will be set to
$1$ on the first call of
objfun by
nag_opt_estimate_deriv (e04xa), and is
$0$ for all subsequent calls. Thus, if you wish,
nstate may be tested within
objfun in order to perform certain calculations once only. For example you may read data.
 5:
$\mathrm{user}$ – Any MATLAB object
objfun is called from
nag_opt_estimate_deriv (e04xa) with the object supplied to
nag_opt_estimate_deriv (e04xa).
Output Parameters
 1:
$\mathrm{mode}$ – int64int32nag_int scalar

mode indicates which argument values within
objfun need to be set.
Its value must not be altered unless you wish to indicate a failure within
objfun, in which case it should be set to a negative value. If
mode is negative on exit from
objfun, the execution of
nag_opt_estimate_deriv (e04xa) is terminated with
ifail set to
mode.
 2:
$\mathrm{objf}$ – double scalar

Must be set to the value of the objective function.
 3:
$\mathrm{objgrd}\left({\mathbf{n}}\right)$ – double array

If
${\mathbf{mode}}=1$,
${\mathbf{objgrd}}\left(j\right)$ must contain the value of the first derivative with respect to
$x$.
If
${\mathbf{mode}}\ne 1$,
objgrd need not be set.
 4:
$\mathrm{user}$ – Any MATLAB object
 6:
$\mathrm{hforw}\left({\mathbf{n}}\right)$ – double array

The initial trial interval for computing the appropriate partial derivative to the
$j$th variable.
If
${\mathbf{hforw}}\left(j\right)\le 0.0$, then the initial trial interval is computed by
nag_opt_estimate_deriv (e04xa) (see
Description).
Optional Input Parameters
 1:
$\mathrm{n}$ – int64int32nag_int scalar

Default:
the dimension of the arrays
x,
hforw. (An error is raised if these dimensions are not equal.)
The number $n$ of independent variables.
Constraint:
${\mathbf{n}}\ge 1$.
 2:
$\mathrm{user}$ – Any MATLAB object
user is not used by
nag_opt_estimate_deriv (e04xa), but is passed to
objfun. Note that for large objects it may be more efficient to use a global variable which is accessible from the mfiles than to use
user.
Output Parameters
 1:
$\mathrm{mode}$ – int64int32nag_int scalar

Is changed
only if you set
mode negative in
objfun, i.e., you have requested termination of
nag_opt_estimate_deriv (e04xa).
 2:
$\mathrm{hforw}\left({\mathbf{n}}\right)$ – double array

${\mathbf{hforw}}\left(j\right)$ is the best interval found for computing a forwarddifference approximation to the appropriate partial derivative for the $j$th variable.
 3:
$\mathrm{objf}$ – double scalar

The value of the objective function evaluated at the input vector in
x.
 4:
$\mathrm{objgrd}\left({\mathbf{n}}\right)$ – double array

If
${\mathbf{mode}}=0$ or
$2$,
${\mathbf{objgrd}}\left(j\right)$ contains the best estimate of the first partial derivative for the
$j$th variable.
If
${\mathbf{mode}}=1$,
${\mathbf{objgrd}}\left(j\right)$ contains the first partial derivative for the
$j$th variable evaluated at the input vector in
x.
 5:
$\mathrm{hcntrl}\left({\mathbf{n}}\right)$ – double array

${\mathbf{hcntrl}}\left(j\right)$ is the best interval found for computing a centraldifference approximation to the appropriate partial derivative for the $j$th variable.
 6:
$\mathrm{h}\left(\mathit{ldh},:\right)$ – double array

The first dimension of the array
h will be
${\mathbf{n}}$.
The second dimension of the array
h will be
$1$ if
${\mathbf{mode}}=0$ and at least
${\mathbf{n}}$ if
${\mathbf{mode}}=1$ or
$2$.
If
${\mathbf{mode}}=0$, the estimated Hessian diagonal elements are contained in the first column of this array.
If ${\mathbf{mode}}=1$ or $2$, the estimated Hessian matrix is contained in the leading $n$ by $n$ part of this array.
 7:
$\mathrm{iwarn}$ – int64int32nag_int scalar

${\mathbf{iwarn}}=0$ on successful exit.
If the value of
epsrf on entry is too small or too large then
iwarn is set to
$1$ or
$2$ respectively on exit and the default value for
epsrf is used within
nag_opt_estimate_deriv (e04xa).
If
${\mathbf{msglvl}}>0$ then warnings will be printed if
epsrf is too small or too large.
 8:
$\mathrm{user}$ – Any MATLAB object
 9:
$\mathrm{info}\left({\mathbf{n}}\right)$ – int64int32nag_int array

${\mathbf{info}}\left(j\right)$ represents diagnostic information on variable
$j$. (See
Error Indicators and Warnings for more details.)
 10:
$\mathrm{ifail}$ – int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
On exit from
nag_opt_estimate_deriv (e04xa) both diagnostic arguments
info and
ifail should be tested.
ifail represents an overall diagnostic indicator, whereas the integer array
info represents diagnostic information on each variable.
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
 W ${\mathbf{ifail}}<0$

A negative value of
ifail indicates an exit from
nag_opt_estimate_deriv (e04xa) because you set
mode negative in
objfun. The value of
ifail will be the same as your setting of
mode.
 ${\mathbf{ifail}}=1$

On entry, one or more of the following conditions are satisfied: ${\mathbf{n}}<1$, $\mathit{ldh}<{\mathbf{n}}\text{ or}{\mathbf{mode}}$ is invalid.
 W ${\mathbf{ifail}}=2$

One or more variables have a nonzero
info value. This may not necessarily represent an unsuccessful exit – see diagnostic information on
info.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
Diagnostic information returned via
info is as follows:
 ${\mathbf{info}}=1$

The appropriate function appears to be constant.
${\mathbf{hforw}}\left(i\right)$ is set to the initial trial interval value (see
Description) corresponding to a wellscaled problem and
Error est. in the printed output is set to zero. This value occurs when the estimated relative condition error in the first derivative approximation is unacceptably large for every value of the finite difference interval. If this happens when the function is not constant the initial interval may be too small; in this case, it may be worthwhile to rerun
nag_opt_estimate_deriv (e04xa) with larger initial trial interval values supplied in
hforw (see
Description). This error may also occur if the function evaluation includes an inordinately large constant term or if
epsrf is too large.
 ${\mathbf{info}}=2$

The appropriate function appears to be linear or odd.
${\mathbf{hforw}}\left(i\right)$ is set to the smallest interval with acceptable bounds on the relative condition error in the forward and backwarddifference estimates. In this case, the estimated relative condition error in the second derivative approximation remained large for every trial interval, but the estimated error in the first derivative approximation was acceptable for at least one interval. If the function is not linear or odd the relative condition error in the second derivative may be decreasing very slowly, it may be worthwhile to rerun
nag_opt_estimate_deriv (e04xa) with larger initial trial interval values supplied in
hforw (see
Description).
 ${\mathbf{info}}=3$

The second derivative of the appropriate function appears to be so large that it cannot be reliably estimated (i.e., near a singularity). ${\mathbf{hforw}}\left(i\right)$ is set to the smallest trial interval.
This value occurs when the relative condition error estimate in the second derivative remained very small for every trial interval.
If the second derivative is not large the relative condition error in the second derivative may be increasing very slowly. It may be worthwhile to rerun
nag_opt_estimate_deriv (e04xa) with smaller initial trial interval values supplied in
hforw (see
Description). This error may also occur when the given value of
epsrf is not a good estimate of a bound on the absolute error in the appropriate function (i.e.,
epsrf is too small).
 ${\mathbf{info}}=4$

The algorithm terminated with an apparently acceptable estimate of the second derivative. However the forwarddifference estimates of the appropriate first derivatives (computed with the final estimate of the ‘optimal’ forwarddifference interval) and the central difference estimates (computed with the interval used to compute the final estimate of the second derivative) do not agree to half a decimal place. The usual reason that the forward and centraldifference estimates fail to agree is that the first derivative is small.
If the first derivative is not small, it may be helpful to execute the procedure at a different point.
Accuracy
If ${\mathbf{ifail}}={\mathbf{0}}$ on exit the algorithm terminated successfully, i.e., the forwarddifference estimates of the appropriate first derivatives (computed with the final estimate of the ‘optimal’ forwarddifference interval ${h}_{F}$) and the centraldifference estimates (computed with the interval ${h}_{\varphi}$ used to compute the final estimate of the second derivative) agree to at least half a decimal place.
In short word length implementations when computing the full Hessian matrix given function values only (i.e., ${\mathbf{mode}}=2$) the elements of the computed Hessian will have at best $1$ to $2$ figures of accuracy.
Further Comments
To evaluate an acceptable set of finite difference intervals for a wellscaled problem, the function will require around two function evaluations per variable; in a badly scaled problem however, as many as six function evaluations per variable may be needed.
If you request the full Hessian matrix supplying both function and gradients (i.e.,
${\mathbf{mode}}=1$) or function only (i.e.,
${\mathbf{mode}}=2$) then a further
n or
$3\times {\mathbf{n}}\times \left({\mathbf{n}}+1\right)/2$ function evaluations respectively are required.
Description of the Printed Output
The following is a description of the printed output from
nag_opt_estimate_deriv (e04xa) as controlled by the argument
msglvl.
Output when
${\mathbf{msglvl}}=1$ is as follows:
J 
number of variable for which the difference interval has been computed. 
$\mathtt{X}\left(j\right)$ 
$j$th variable of $x$ as set by you. 
F. dif. int. 
the best interval found for computing a forwarddifference approximation to the appropriate partial derivative with respect to the $j$th variable. 
C. dif. int. 
the best interval found for computing a centraldifference approximation to the appropriate partial derivative with respect to the $j$th variable. 
Error est. 
a bound on the estimated error in the final forwarddifference approximation. When ${\mathbf{info}}\left(j\right)=1$, Error est. is set to zero. 
Grad. est. 
best estimate of the first partial derivative with respect to the $j$th variable. 
Hess diag est. 
best estimate of the second partial derivative with respect to the $j$th variable. 
fun evals. 
the number of function evaluations used to compute the final difference intervals for the $j$th variable. 
$\mathtt{info}\left(j\right)$ 
the value of info for the $j$th variable. 
Example
This example computes the gradient vector and the Hessian matrix of the following function:
at the point
$\left(3,1,0,1\right)$.
Open in the MATLAB editor:
e04xa_example
function e04xa_example
fprintf('e04xa example results\n\n');
msglvl = int64(0);
epsrf = 1;
x = [3; 1; 0; 1];
mode = int64(0);
hforw = [1; 1; 1; 1];
[mode, hforw, objf, objgrd, hcntrl, h, iwarn, user, info, ifail] = ...
e04xa(...
msglvl, epsrf, x, mode, @objfun, hforw);
fprintf('At x = ');
fprintf(' %7.2f',x);
fprintf('\nGradient vector is:\n');
fprintf(' %7.2f',objgrd);
fprintf('\nEstimated Hessian matrix diagonal is:\n');
fprintf(' %7.2f',h);
fprintf('\n');
function [mode, objf, objgrd, user] = objfun(mode, n, x, nstate, user)
objgrd = zeros(n, 1);
a = x(1) + 10*x(2);
b = x(3)  x(4);
c = x(2)  2*x(3);
d = x(1)  x(4);
objf = a^2 + 5*b^2 + c^4 + 10*d^4;
if (mode == 1)
objgrd(1) = 2*a + 40*d^3;
objgrd(2) = 20*a + 4*c^3;
objgrd(3) = 10*b  8*c^3;
objgrd(4) =10*b  40*d^3;
end
e04xa example results
At x = 3.00 1.00 0.00 1.00
Gradient vector is:
306.00 144.00 2.00 310.00
Estimated Hessian matrix diagonal is:
482.00 212.00 57.99 490.00
PDF version (NAG web site
, 64bit version, 64bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015