*This post is a part of The NAG Optimization Corner series.*

This is the first blog post in a series within The NAG Optimization Corner dedicated to providing guidance on choosing the right optimization tool for a given problem and demonstrating the consequences of an inappropriate choice.

We at NAG recognize that choosing the right optimization tool can be quite hard and that the process can seem nebulous at times. A portion of our customer service requests deal with this very subject. Most of the time the queries start with the same question: “Why is the solver failing?”, and quite often a closer inspection reveals that either the wrong solver has been chosen or the model could be significantly improved. This post touches upon the former. We appreciate that building a good or even adequate model can be far from easy; that topic will be reviewed in an upcoming post in this series, so stay tuned!

In this post, we will show two examples that illustrate the importance of choosing the right solver and conclude with some important remarks.

Unlike baseball caps, one size does not fit all when it comes to solving optimization problems. In fact, the recommended practice is to always try the tightest possible fit first. Take, for example, the following case: solving a Linear Program (LP) using a general nonlinear programming (NLP) solver instead of the tight-fit LP solver. The NLP tool will be more expensive and less robust than the specialized, lean LP solver that understands and exploits the linearity of the problem.

In what follows we will show why it is essential to understand and exploit the **underlying structure** of an optimization problem and match it with a solver that can **take advantage** of it.

Let’s start by quickly introducing the various parts that make up an optimization problem. A problem is made up of **variables**, **an objective function** and **constraints**. Different types of variables, objectives and constraints define different problem classes. NAG’s Optimization E04 Chapter Introduction offers a comprehensive description of the various classes of optimization problems and the recommended solvers to use.

Calibration (data fitting, regression) problems are usually stated in terms of linear (LSQ) or nonlinear (NLN-LSQ) least squares, where the task is to minimize the sum of squared residuals. The residual functions, ${r}_{i}\left({x}\right)$, represent the error between the model prediction and the data, while the objective function is denoted as

$${f\left({x}\right)=\frac{1}{2}\sum _{i=1}^{m}{r}_{i}^{2}\left({x}\right)}.$$

For more information on least squares optimization, see NAG's documentation on optimization.
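To make the least squares formulation concrete, here is a minimal sketch of an NLN-LSQ calibration problem. It uses SciPy's `least_squares` as a stand-in for a specialized NLN-LSQ solver; the exponential model and the data below are purely illustrative, not from the benchmark in this post.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical noiseless data generated from y = a * exp(-b * t)
# with true parameters a = 2.5, b = 1.3.
t = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * t)

def residuals(x):
    # r_i(x) = model prediction minus observed data, one entry per data point
    return x[0] * np.exp(-x[1] * t) - y

# A specialized least squares solver works directly with the residual
# vector r(x), not just the scalar objective f(x) = 0.5 * ||r(x)||^2.
res = least_squares(residuals, x0=[1.0, 1.0])
print(res.x)  # recovers approximately [2.5, 1.3]
```

Because the data are noiseless, this is a zero-residual problem and the solver recovers the generating parameters to high accuracy.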

We performed the following experiment: we solved a batch of 39 unconstrained NLN-LSQ problems and compared a specialized NLN-LSQ solver against a general-purpose NLP solver. The benchmark results, reported in Figure 1, show that the NLN-LSQ solver is noticeably faster: in the median, the specialized NLN-LSQ solver offers a 1.8 times speedup over the general NLP solver, with 20% of the problems being solved at least 5 times faster. As the last handful of problems, 32–39, reveal, sometimes the NLP solver can outperform the specialized one due to factors such as the starting point favouring a particular method over others. This also highlights the importance of trialling more than one solver.

^{Figure 1. Benchmark on 39 CUTEst NLN-LSQ problems comparing the performance of the NLN-LSQ and NLP solvers. Bar height represents the speedup factor, measured as the ratio $\text{time}_{\text{NLP}}/\text{time}_{\text{NLN-LSQ}}$.}

The most important speed-up factor for the NLN-LSQ solver is in how the Hessian is approximated. In more detail, the problem is defined as

$$\min_{x\in\mathbb{R}^{n}}\quad f(x)=\frac{1}{2}\sum_{i=1}^{m} r_i(x)^2=\frac{1}{2}\,r(x)^{T}r(x) \qquad \text{(nonlinear least squares objective)}$$

with the following **gradient** (first derivatives) and **Hessian** (second derivatives)

$$\begin{aligned} \nabla f(x) &= J(x)^{T}\,r(x) \\ \nabla^{2} f(x) &= J(x)^{T}J(x)+\sum_{i=1}^{m} r_i(x)\,\nabla^{2}r_i(x), \end{aligned}$$

where $J\left(x\right)$ is the Jacobian of the residuals. Now, the Hessian turns out to be the matrix-matrix product of the transposed Jacobian with itself (also known as the Gauss–Newton matrix) plus a sum of $m$ residual Hessians. If the problem has a good-fit solution (small residuals ${r}_{i}\left(x\right)$), then close to a solution the second part involving the sum of residual Hessians, ${\sum}_{i=1}^{m}{r}_{i}\left(x\right)\,{\nabla}^{2}{r}_{i}\left(x\right)$, can be considered negligible and thus is omitted. This last fact is very important, since evaluating $m$ individual Hessians can be **extremely** expensive.
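The Gauss–Newton argument above can be checked numerically. The sketch below (reusing the same hypothetical exponential model as before) compares $J(x)^{T}J(x)$ against a finite-difference estimate of the full Hessian of $f$ at a zero-residual solution, where the sum of residual Hessians vanishes.

```python
import numpy as np

# Hypothetical noiseless data from y = a * exp(-b * t), a = 2.5, b = 1.3
t = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * t)

def residuals(x):
    return x[0] * np.exp(-x[1] * t) - y

def jacobian(x):
    # Analytic Jacobian of the residuals: one row per residual, one
    # column per variable.
    return np.column_stack([np.exp(-x[1] * t),
                            -x[0] * t * np.exp(-x[1] * t)])

def gradient(x):
    # Gradient of f(x) = 0.5 * ||r(x)||^2 is J(x)^T r(x)
    return jacobian(x).T @ residuals(x)

def hessian_fd(x, h=1e-5):
    # Full Hessian of f by central differences of the gradient
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (gradient(x + e) - gradient(x - e)) / (2.0 * h)
    return H

x_star = np.array([2.5, 1.3])   # zero-residual solution for these data
gn = jacobian(x_star).T @ jacobian(x_star)   # Gauss-Newton matrix J^T J

# At x_star the residuals are zero, so the residual-Hessian term drops
# out and J^T J matches the full Hessian (up to finite-difference error).
print(np.max(np.abs(gn - hessian_fd(x_star))))
```

Away from the solution, or for large-residual problems, the omitted term is no longer negligible, which is one reason the Gauss–Newton approximation works best for good-fit calibration problems.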

**Dedicated LSQ solvers are generally faster because they are aware of and exploit these important facts, and can thus successfully approximate second derivatives using only the first derivatives of the residuals.** Generic NLP solvers cannot make these key assumptions and so cannot take advantage of them.

General-purpose NLP solvers are capable of solving a large variety of problem classes, at the cost of not exploiting the underlying problem structure. They should only be used when specialized solvers are not available. In the next example, we illustrate this trade-off by solving a batch of 50 convex QP (Quadratic Programming) problems and comparing the cost incurred in using the general NLP solver nlp2_sparse_solve (e04vh) instead of the specialized convex quadratic one, qpconvex2_sparse_solve (e04nq), where both solvers are active-set methods. Computational results reported in Figure 2 show that the **QP solver e04nq is, in the median, 2.3 times faster than NLP solver e04vh** and at least 5 times faster in 28% of the problems.

^{Figure 2. Benchmark on 50 convex QP problems comparing the performance of the QP and NLP solvers. Bar height represents the speedup factor, measured as the ratio $\text{time}_{\text{NLP}}/\text{time}_{\text{QP}}$.}

The biggest speed-up factor is essentially that e04nq, the specialized solver, is tailored to the QP structure and exploits the fact that the Hessian (second derivatives) is constant for all $x$. The general solver e04vh does not have access to the same information, since it does not request second derivatives. In general, NLP solvers that can use second-order derivatives will have a high computational cost per iteration, requiring, e.g., costly factorizations of the Hessian or partial updates of an approximation to it, potentially at each iteration.
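The constant-Hessian property of a QP is easy to verify directly. For $f(x)=\frac{1}{2}x^{T}Hx+c^{T}x$, the gradient is $Hx+c$, so differentiating the gradient at any point returns the same matrix $H$. The sketch below uses a small, purely illustrative QP instance.

```python
import numpy as np

# Small illustrative convex QP objective: f(x) = 0.5 x^T H x + c^T x
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite
c = np.array([-1.0, -2.0])

def gradient(x):
    # Gradient of a QP is linear in x
    return H @ x + c

def hessian_fd(x, h=1e-6):
    # Finite-difference the gradient to recover the Hessian at x
    n = len(x)
    out = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        out[:, j] = (gradient(x + e) - gradient(x - e)) / (2.0 * h)
    return out

# The recovered Hessian equals H at any point, near or far from a solution:
print(np.allclose(hessian_fd(np.zeros(2)), H))      # True
print(np.allclose(hessian_fd(np.ones(2) * 7.0), H))  # True
```

A QP-aware solver can factorize $H$ once and reuse it throughout, whereas a general NLP solver that is not told the Hessian is constant must rebuild or update its curvature information as it iterates.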

In this post, we presented two examples that illustrate the trade-off of using generic optimization solvers to solve problems that have an underlying structure. The numerical results show that specialized solvers are preferable to generic ones.

**Choose the right optimization solver**. It works to your advantage to take the time to understand the underlying structure of the problem and choose the solver that fits its requirements most tightly. This is why the **NAG Library offers you a large choice of optimization solvers**.

- The specialized solver delivers better performance (time to solution) but also **robustness and precision**.
- Choosing the right solver is not always an easy task. **Here at NAG, we are always ready to help you**. We have a friendly support desk and many online resources at your disposal, such as the easy-to-explore decision-tree diagrams to help you choose the right optimization tool.
- If there are two or more solvers that fit your problem, then numerical experimenting helps in choosing the best one. We will address this topic in an upcoming post.

Our next blog post in this series will deal with dense and sparse problems, so stay tuned and see you soon!

To see all the previous blogs, please go here. You can also find various examples through our GitHub Local optimization page.