*This post is a part of The NAG Optimization Corner series.*

The NAG Optimization Corner series have demonstrated the importance of choosing the right solver to match the problem type, and have shown how not doing so may well lead to a significant waste of computational resource. Finding the most suitable solver for your problem can be challenging, but don’t worry! It’s not like finding a needle in a haystack as the decision trees in Section 5 of NAG Optimization E04 Chapter Introduction offer invaluable help. The decision tree will take you through a series of questions that will guide you to the correct solver, starting with

**Is the problem sparse/large-scale?**

In this blog, we will demonstrate two types of optimization problem to show the importance of this question, and provide guidance on how to answer it.

# What is sparsity?

We call a matrix sparse if its entries are mostly zero. The data representation for a sparse problem has the form of a sparse matrix which usually defines the linear constraint matrix or Jacobian matrix of the nonlinear constraints. A large-scale problem usually has a fairly large number of variables and constraints, yet only a small number of the constraints involve all variables, and most constraints depend on only small subsets of the variables. Thus, the problem is sparse. Figure 1 depicts the sparsity pattern of a sparse matrix of dimension $n=3948$, we can see it is fairly sparse as the $density=nnz\u2215{n}^{2}=0.756\%$ is below $1\%$. **Capturing and passing the sparse structure of a problem, if it exists, to the right solver is essential as the specialized solver takes advantage of the sparse structure of the matrix and avoids wasting computation and memory on the zeros**.

To highlight the importance of exploiting the sparsity of a problem, let’s see an example of Cholesky factorization which is widely used in numerical algorithms such as Interior Point Method (IPM) in convex optimization. All large-scale optimization solvers in the world exploit the sparsity pattern very carefully during matrix factorization to achieve both numerical stability and efficiency.

The Cholesky factorization of a real symmetric positive definite matrix $M$ is to find a factor $L$ such that $M=L{L}^{T}$, where $L$ is a lower triangular matrix. We have selected $9$ positive definite matrices in the Matrix Market [1] that arise from disciplines such as structural engineering and structural mechanics. As shown in Table 1, all of the matrices are fairly sparse with density of less than $1.5\%$.

Prob. ID | Matrix | $n$ | $nnz$ | Density(%) |

$1$ | bcsstk15 | 3948 | 117816 | 0.756 |

$2$ | bcsstk16 | 4884 | 290378 | 1.217 |

$3$ | s1rmt3m1 | 5489 | 217651 | 0.722 |

$4$ | s2rmt3m1 | 5489 | 217681 | 0.722 |

$5$ | s1rmq4m1 | 5489 | 262411 | 0.871 |

$6$ | s3rmq4m1 | 5489 | 262943 | 0.873 |

$7$ | s2rmq4m1 | 5489 | 263351 | 0.874 |

$8$ | bcsstk17 | 10974 | 428650 | 0.356 |

$9$ | bcsstk18 | 11948 | 149090 | 0.104 |

^{Table 1: Statistics on 9 sparse matrices used to test dense and sparse Cholesky factorization. ($n$ number of variables, $nnz$ number of nonzeros, density:=$nnz\u2215{n}^{2}$)}

As discussed before, storing only nonzero entries will reduce memory requirement and exploiting the sparsity pattern will reduce the computational cost greatly. To illustrate this point, we have tested both sparse and dense Cholesky routines on the selected dataset and the result is shown in Figure 2. As we can see, overall the sparse Cholesky is faster than the dense one for all test cases. The reason is not hard to imagine as** the dense algorithm is linked to the dimension of the matrix, whereas the sparse algorithm is mainly connected to the number of nonzeros**. A huge loss in performance from the dense algorithm appears for problem $8$ and $9$ due to the big jump in problem size, yet from its sparse counterpart, there is uniformly good performance as the density remains pretty low across all test cases.

^{Figure 2: Computational time comparison between sparse and dense Cholesky}

To further illustrate how sparsity affects the performance of an algorithm, we randomly generated symmetric positive definite matrices of size 1000, 3000 and 5000. For matrices of a certain size, the density varies from $10\%$ to $100\%$. Then both sparse and dense Cholesky were called to factorize the matrices and the computational time comparison can be found in Figure 3. Unsurprisingly, for matrices of the same size, the time for dense Cholesky doesn’t vary too much even as the density increases. However, sparse Cholesky uses more and more time as the matrix gets denser, and at some point, it will require more computational time than the dense algorithm. **For matrices with relatively high density (such as 20% and above), the tradeoff is no longer viable and the dense counterpart becomes a better choice**.

^{Figure 3: Comparison of factorization methods on matrices of size 1000, 3000, 5000: various density}

A general LP problem has a standard form:

$$\begin{array}{cc}\underset{x\in {\Re}^{n}}{\text{minimize}}\hfill & {c}^{T}x\hfill \\ & \\ \text{subjectto}\hfill & Ax=b,\hfill \\ & \\ \hfill & x\ge 0.\hfill \end{array}$$ | (1) |

Here we used two active-set methods lp_solve (e04mf) (dense solver) and qpconvex2_sparse_solve (e04nq) (sparse solver) from the NAG Library to solve the above model and the results are shown in Table 2. Both test cases have a density less than $1.0\%$ and e04nq spent less than 1 second for each case while the dense solver was significantly slower. Therefore, **passing a sparse problem to a dense solver will completely kill your performance**.

Recently, a NAG Library user contacted our Technical Support team with the following ${l}_{1}$ linear fitting problem

$$\begin{array}{cc}\underset{x\in {\Re}^{n}}{\text{minimize}}\hfill & \parallel Ax-b{\parallel}_{1}\hfill \end{array}$$ | (2) |

where matrix $A$ is tall and slim, e.g., of dimension $1000\times 21$ and full dense matrix. At first glance, it might appear to be a dense problem as there are no zeros in the problem data $A$ and $b$. The ${l}_{1}$-norm minimization can be reformulated to Linear Programming, so the user performed the transformation and used active-set methods lp_solve (e04mf) (dense solver). However, the solver was not as fast as he expected and the user sought our advice.

This type of misunderstanding and confusion is not unusual and care needs to be taken. In order to decide whether the problem is sparse or dense, we need to consider the data of the problem model that the solver solves, not the original data. For instance, problem (2) can be reformulated into LP problem

Taking a closer look at the inequality constraints

$$-y\le Ax-b\le y\phantom{\rule{1em}{0ex}}\iff \phantom{\rule{1em}{0ex}}\left(\begin{array}{cc}\hfill A\hfill & \hfill -I\hfill \\ \hfill -A\hfill & \hfill -I\hfill \end{array}\right)\left(\begin{array}{c}\hfill x\hfill \\ \hfill y\hfill \end{array}\right)\le \left(\begin{array}{c}\hfill b\hfill \\ \hfill -b\hfill \end{array}\right),$$

where $I$ is the identity matrix of dimension $m$, we can see the number of variables has been increased by $m$ and many zero entries have been introduced to the model. Thus, a sparse LP solver should be chosen for this particular model. We tested it with a matrix $A$ of dimension $1000\times 21$ using the active-set method solver lp_solve (e04mf) and sparse Interior Point Method solver handle_solve_lp_ipm (e04mt) from the NAG Library. As we can see in Table 3, the sparse LP solver again shows great performance compared to the dense solver.

Solver | e04mf | e04mt | e02ga |

Time | 49.7s | 0.18s | 0.007s |

^{Table 3: Three solvers for ${l}_{1}$ linear fitting problem}

Clearly, the sparse solver is the reason for the improved computation time. However, the dedicated solver glin_l1sol (e02ga) for problem (2) did an even better job than the sparse LP solver, since it exploits thoroughly the known problem structure. **Overall the dedicated solver delivered a** $7100\times $** speedup over the original attempt**. Therefore, it is always recommended to try a dedicated solver first for a particular problem. Dedicated solvers are slightly out of the scope of this blog, please see NAG Library documentation for the full content of NAG offerings in dedicated solvers.

Sparsity is a great feature and should be considered carefully.

- Choose a sparse solver if there is an underlying sparse structure in your problem. Otherwise, you will significantly harm your performance.
- It’s important to
**analyse the data structure**that the solver sees instead of original data if reformulation is involved. - Don’t forget to
**search for a dedicated solver first**!

To see all the previous blogs, please go here. You can also find various examples through our GitHub Local optimisation page. See you in the next blog.