Derivative-free Optimization (DFO)

Optimizing complex numerical models is one of the most common problems found in industry (finance, multi-physics simulation, engineering, etc.). Solving these optimization problems with a standard algorithm such as Gauss–Newton (for problems with a nonlinear least-squares structure) or CG (for an unstructured nonlinear objective) requires good estimates of the model's derivatives. These can be obtained by:

• explicitly written derivatives
• algorithmic differentiation (see NAG AD tools)
• finite differences (bumping), $$\frac{\partial \phi}{\partial x_i} \approx \frac{\phi(x+he_i) - \phi(x)}{h}$$
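As a small illustration of the bumping formula above, here is a sketch in plain NumPy (the helper name and test function are our own, not a NAG routine):

```python
import numpy as np

def fd_gradient(phi, x, h=1e-6):
    """Forward-difference gradient estimate: costs n+1 evaluations of phi."""
    n = len(x)
    f0 = phi(x)                      # 1 base evaluation
    g = np.empty(n)
    for i in range(n):               # n perturbed evaluations
        e = np.zeros(n)
        e[i] = h
        g[i] = (phi(x + e) - f0) / h
    return g

# Example on a smooth quadratic, whose exact gradient is 2*x
phi = lambda x: float(np.dot(x, x))
g = fd_gradient(phi, np.array([1.0, -2.0, 3.0]))
```

For a smooth, noise-free model this estimate is accurate to roughly O(h), but as the next section shows, the picture changes drastically once noise enters.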

If exact derivatives are easy to compute then using derivative-based methods is preferable. However, explicitly writing the derivatives or applying AD methods might be impossible if the model is a black box. The alternative, estimating derivatives via finite differences, can quickly become impractical or too computationally expensive as it presents several issues:

• Expensive: one gradient estimate requires at least $$n+1$$ model evaluations;
• Inaccurate: the size of the model perturbation $$h$$ greatly influences the quality of the derivative estimates and is not easy to choose;
• Sensitive to noise: if the model is subject to some randomness (e.g. Monte Carlo simulations) or is computed to low accuracy to save computing time, finite-difference estimates will be highly inaccurate;
• Poor utilization of model evaluations: each evaluation contributes to only one element of one gradient, and the information is discarded as soon as that gradient is no longer useful to the solver.
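A tiny experiment makes the noise issue concrete. The forward-difference error behaves like $$O(h) + O(\nu/h)$$, so a step that is too small amplifies the noise enormously. The sketch below (plain NumPy, with a deliberately worst-case noise model of our own construction) shows this:

```python
import numpy as np

class NoisyModel:
    """Smooth quadratic phi(x) = x.x plus worst-case +/-nu noise,
    alternating sign on successive calls (an illustrative construction)."""
    def __init__(self, nu):
        self.nu, self.calls = nu, 0
    def __call__(self, x):
        self.calls += 1
        sign = 1.0 if self.calls % 2 else -1.0
        return float(np.dot(x, x)) + sign * self.nu

def fd(phi, x, i, h):
    """One forward-difference estimate of d(phi)/dx_i."""
    e = np.zeros_like(x)
    e[i] = h
    return (phi(x + e) - phi(x)) / h

phi = NoisyModel(nu=1e-2)
x = np.array([1.0, -2.0])        # exact derivative w.r.t. x_0 is 2.0

bad = fd(phi, x, 0, h=1e-8)      # noise term 2*nu/h ~ 2e6 swamps the estimate
good = fd(phi, x, 0, h=0.1)      # h ~ sqrt(nu) balances the two error terms
```

With $$h=10^{-8}$$ the estimate is off by millions; with $$h\approx\sqrt{\nu}$$ it is merely crude. Either way, a derivative-based solver fed these estimates struggles.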

These issues can greatly slow down the convergence of the optimization solver or even prevent it altogether. Conversely, DFO solvers are designed to make good progress on the objective in precisely these situations: they are able to reach convergence with far fewer function evaluations and are naturally robust to noise in the model evaluations.

NAG introduces, at Mark 27, a complete update of its model-based DFO solvers for nonlinear least squares problems, nag_opt_handle_solve_dfls (e04ff) and nag_opt_handle_solve_dfls_rcomm (e04fg), and unstructured nonlinear problems, nag_opt_handle_solve_dfno (e04jd) and nag_opt_handle_solve_dfno_rcomm (e04je). They present a number of attractive features:

• Improved performance;
• Integrated into the NAG Optimization Modeling Suite, with simple interfaces for the solvers and related routines;
• Optional reverse communication interface (e04fg, e04je);
• Able to start making progress with as few as two objective evaluations;
• Improved resilience to noise.

Problem structure

One frequent problem in practice is tuning a model's parameters to fit real-world observations as well as possible. Let us consider a process that is observed at times $$t_i$$ and measured with results $$y_i$$, for $$i=1,2,\dots,m$$. Furthermore, the process is assumed to behave according to a numerical model $$\phi(t,x)$$, where $$x$$ are the parameters of the model. Since the measurements might be inaccurate and the process might not follow the model exactly, it is beneficial to find model parameters $$x$$ so that the error of the fit of the model to the measurements is minimized. This can be formulated as an optimization problem in which $$x$$ are the decision variables and the objective function is the sum of squared errors of the fit at the individual measurements:

$$\begin{array}{ll}\underset{x\in\mathbb{R}^{n}}{\text{minimize}} & \displaystyle\sum_{i=1}^{m}\left(\phi(t_i,x)-y_i\right)^2\\ \text{subject to} & l\le x\le u\end{array}$$
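For a model that happens to be linear in the parameters, the least-squares formulation above collapses to a linear-algebra problem, which allows a compact sketch (plain NumPy with made-up data; the general nonlinear case needs an iterative solver such as the NAG routines discussed here):

```python
import numpy as np

# Synthetic measurements y_i at times t_i, assumed (for illustration)
# to come from the linear model phi(t, x) = x_0 + x_1 * t plus error.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# When phi is linear in x, minimizing the sum of squared errors
# reduces to the linear least-squares problem  min ||A x - y||^2.
A = np.column_stack([np.ones_like(t), t])
x, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = A @ x - y
sse = float(residuals @ residuals)   # the objective value at the fitted x
```

Nonlinear models $$\phi(t,x)$$ admit no such closed form, which is exactly where Gauss–Newton-type and derivative-free least-squares solvers come in.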

When the optimization problem cannot be written as nonlinear least squares, a more general formulation has to be used:

$$\begin{array}{ll}\underset{x\in\mathbb{R}^{n}}{\text{minimize}} & f(x)\\ \text{subject to} & l\le x\le u\end{array}$$

The NAG solvers accommodate both of these formulations.

Noise robustness: illustration

Consider the following unconstrained problem, where $$\epsilon$$ is random uniform noise in the interval $$\left[-\nu,\nu\right]$$ and $$r_i$$ are the residuals of the Rosenbrock test function.

$$\underset{x\in\mathbb{R}^{n}}{\text{minimize}}\; f(x)=\sum_{i=1}^{m}\left(r_i(x)+\epsilon\right)^2$$

Let us solve this problem with a Gauss–Newton method combined with finite differences, nag_opt_lsq_uncon_mod_func_comp (e04fc), and with the corresponding derivative-free solver (e04ff). For various noise levels $$\nu$$, we present in the following table the number of model evaluations needed to find a solution with an 'achievable' precision with respect to the noise level: $f(x) < \max (10^{-8}, 10 \times \nu^2)$
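To give a feel for how a derivative-free method operates on this problem, here is a toy compass (coordinate) search applied to the noisy Rosenbrock sum of squares, in plain NumPy. This is a deliberately naive polling method, not the model-based algorithm inside e04ff, and the noise level, starting point and limits are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
nu = 1e-2                                  # illustrative noise level

def noisy_rosenbrock(x):
    """Rosenbrock residuals r_i plus uniform noise in [-nu, nu]."""
    r = np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])
    r += rng.uniform(-nu, nu, size=2)
    return float(r @ r)

def compass_search(f, x0, step=0.5, tol=1e-3, max_evals=5000):
    """Naive DFO: poll +/- each coordinate direction, accept the first
    improvement, otherwise halve the step length. No derivatives needed."""
    x = np.asarray(x0, dtype=float)
    fx, evals = f(x), 1
    directions = np.vstack([np.eye(x.size), -np.eye(x.size)])
    while step > tol and evals < max_evals:
        for d in directions:
            xt = x + step * d
            ft = f(xt)
            evals += 1
            if ft < fx:
                x, fx = xt, ft
                break
        else:                              # no poll direction improved
            step *= 0.5
    return x, fx, evals

x_best, f_best, evals = compass_search(noisy_rosenbrock, [-1.2, 1.0])
```

Even this crude method makes progress under noise because it only compares function values rather than differencing them; the model-based solvers in the table below are far more efficient still.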

The figures in Table 1 show the average (over 50 runs) number of objective evaluations needed to solve the problem. The numbers in brackets represent the number of failed runs out of 50.

| noise level $$\nu$$ | $$0$$ | $$10^{-10}$$ | $$10^{-8}$$ | $$10^{-6}$$ | $$10^{-4}$$ | $$10^{-2}$$ | $$10^{-1}$$ |
|---|---|---|---|---|---|---|---|
| e04fc | 90 (0) | 93 (0) | 240 (23) | $$\infty$$ (50) | $$\infty$$ (50) | $$\infty$$ (50) | $$\infty$$ (50) |
| e04ff | 22 (0) | 22 (0) | 22 (0) | 22 (0) | 22 (0) | 17 (0) | 15 (0) |

On this example, the new derivative-free solver is both cheaper in terms of model evaluations and far more robust with respect to noise.
