# Derivative-Free Optimization for Data Fitting

Calibrating the parameters of complex numerical models to fit real world observations is one of the most common problems found in the industry (finance, multi-physics simulations, engineering, etc.).

Let us consider a process that is observed at times $t_i$ and measured with results $y_i$, for $i=1,2,\dots,m$. Furthermore, the process is assumed to behave according to a numerical model $\phi(t,x)$ where $x$ are parameters of the model. Given the fact that the measurements might be inaccurate and the process might not exactly follow the model, it is beneficial to find model parameters $x$ so that the error of the fit of the model to the measurements is minimized. This can be formulated as an optimization problem in which $x$ is the decision variables and the objective function is the sum of squared errors of the fit at each individual measurement, thus:

NAG introduces, at Mark 26.1, a model-based derivative-free solver (e04ff) able to exploit the structure of calibration problems. It is a part of the *NAG Optimization Modelling Suite* which significantly simplifies the interface of the solver and related routines.

## Derivative-free Optimization for Least Squares Problems

To solve a nonlinear least squares problem (1) most standard optimization algorithms such as Gauss–Newton require derivatives of the model or their estimates.

They can be computed by:

- explicitly written derivatives
- algorithmic differentiation (see NAG AD tools)
- finite differences (bumping), $\frac{\partial \phi}{\partial x_i} \approx \frac{\phi(x+he_i) - \phi(x)}{h}$

If exact derivatives are easy to compute then using derivative-based methods is preferable.

However, explicitly writing the derivatives or applying AD methods might be impossible if the model is a black box.

The alternative, estimating derivatives via finite differences, can quickly become impractical or too computationally expensive as it presents several issues:

**Expensive**, one gradient evaluation requires at least $n+1$ model evaluations;**Inaccurate**, the size of the model perturbations $h$ influences greatly the quality of the derivative estimations and is not easy to choose;**Sensitive to noise**, if the model is subject to some randomness (e.g. Monte Carlo simulations) or is computed to low accuracy to save computing time then finite differences estimations will be highly inaccurate;**Poor utilization of model evaluations**, each evaluation is only used for one element of one gradient and the information is discarded as soon as that gradient is no longer useful to the solver.

These issues can greatly slow down the convergence of the optimization solver or even completely prevent it.

In these cases, using a derivative-free solver is the preferable option as it is:

- able to reach convergence with a lot fewer function evaluations;
- more robust with respect to noise in the model evaluations.

### Illustration on a noisy test case:

Consider the following unbounded problem where $\epsilon$ is some random uniform noise in the interval $\left[-\nu,\nu\right]$ and $r_i$ are the residuals of the Rosenbrock test function $(m = n = 2)$:

\begin{equation} \min_{x\in \mathbb{R}^n} \sum_{i=1}^m{(r_i(x) + \epsilon)^2}\\ \label{eq:rosen} \end{equation}

Let us solve this problem with a Gauss–Newton method combined with finite differences (e04fc) and the new derivative-free solver (e04ff).

We present in the following table the number of model evaluations needed to reach a point within $10^{-5}$ of the actual solution for various noise levels $\nu$ (non convergence is marked as $\infty$).

level of noise $\nu$ | 0.0e00 | 1.0e$-$10 | 1.0e$-$08 | 1.0e$-$06 | 1.0e$-$04 | 1.0e$-$02 | 1.0e$-$01 |
---|---|---|---|---|---|---|---|

e04fc | 89 | 92 | 221 | $\infty$ | $\infty$ | $\infty$ | $\infty$ |

e04ff | 29 | 29 | 29 | 29 | 29 | 31 | $\infty$ |

The number of objective evaluations required to reach a point within $10^{-5}$ of the solution

On this example, the new derivative solver is both cheaper in term of model evaluations and far more robust with respect to noise.

### References

Powell M J D (2009) The BOBYQA algorithm for bound constrained optimization without derivatives *Report DAMTP 2009/NA06* University of Cambridge http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf

Zhang H, CONN A R and Scheinberg k (2010) A Derivative-Free Algorithm for Least-Squares Minimization *SIAM J. Optim.* **20(6)** 3555–3576