Optimization Corner

This post is a part of The NAG Optimization Corner series.

The NAG Optimization Corner series have demonstrated the importance of choosing the right solver to match the problem type, and have shown how not doing so may well lead to a significant waste of computational resource. Finding the most suitable solver for your problem can be challenging, but don’t worry! It’s not like finding a needle in a haystack as the decision trees in Section 5 of NAG Optimization E04 Chapter Introduction offer invaluable help. The decision tree will take you through a series of questions that will guide you to the correct solver, starting with

Is the problem sparse/large-scale?

In this blog, we will demonstrate two types of optimization problem to show the importance of this question, and provide guidance on how to answer it.

What is sparsity?

 
We call a matrix sparse if its entries are mostly zero. The data representation for a sparse problem has the form of a sparse matrix which usually defines the linear constraint matrix or Jacobian matrix of the nonlinear constraints. A large-scale problem usually has a fairly large number of variables and constraints, yet only a small number of the constraints involve all variables, and most constraints depend on only small subsets of the variables. Thus, the problem is sparse. Figure 1 depicts the sparsity pattern of a sparse matrix of dimension n = 3948, we can see it is fairly sparse as the density = nnzn2 = 0.756% is below 1%. Capturing and passing the sparse structure of a problem, if it exists, to the right solver is essential as the specialized solver takes advantage of the sparse structure of the matrix and avoids wasting computation and memory on the zeros.

Impact of sparsity on Cholesky factorization

To highlight the importance of exploiting the sparsity of a problem, let’s see an example of Cholesky factorization which is widely used in numerical algorithms such as Interior Point Method (IPM) in convex optimization. All large-scale optimization solvers in the world exploit the sparsity pattern very carefully during matrix factorization to achieve both numerical stability and efficiency.

The Cholesky factorization of a real symmetric positive definite matrix M is to find a factor L such that M = LLT, where L is a lower triangular matrix. We have selected 9 positive definite matrices in the Matrix Market [1] that arise from disciplines such as structural engineering and structural mechanics. As shown in Table 1, all of the matrices are fairly sparse with density of less than 1.5%.

Prob. ID Matrix n nnz Density(%)
1 bcsstk15 3948 117816 0.756
2 bcsstk16 4884 290378 1.217
3 s1rmt3m1 5489 217651 0.722
4 s2rmt3m1 5489 217681 0.722
5 s1rmq4m1 5489 262411 0.871
6 s3rmq4m1 5489 262943 0.873
7 s2rmq4m1 5489 263351 0.874
8 bcsstk17 10974 428650 0.356
9 bcsstk18 11948 149090 0.104

Table 1: Statistics on 9 sparse matrices used to test dense and sparse Cholesky factorization. (n number of variables, nnz number of nonzeros, density:=nnzn2)

As discussed before, storing only nonzero entries will reduce memory requirement and exploiting the sparsity pattern will reduce the computational cost greatly. To illustrate this point, we have tested both sparse and dense Cholesky routines on the selected dataset and the result is shown in Figure 2. As we can see, overall the sparse Cholesky is faster than the dense one for all test cases. The reason is not hard to imagine as the dense algorithm is linked to the dimension of the matrix, whereas the sparse algorithm is mainly connected to the number of nonzeros. A huge loss in performance from the dense algorithm appears for problem 8 and 9 due to the big jump in problem size, yet from its sparse counterpart, there is uniformly good performance as the density remains pretty low across all test cases.

PIC

Figure 2: Computational time comparison between sparse and dense Cholesky

To further illustrate how sparsity affects the performance of an algorithm, we randomly generated symmetric positive definite matrices of size 1000, 3000 and 5000. For matrices of a certain size, the density varies from 10% to 100%. Then both sparse and dense Cholesky were called to factorize the matrices and the computational time comparison can be found in Figure 3. Unsurprisingly, for matrices of the same size, the time for dense Cholesky doesn’t vary too much even as the density increases. However, sparse Cholesky uses more and more time as the matrix gets denser, and at some point, it will require more computational time than the dense algorithm. For matrices with relatively high density (such as 20% and above), the tradeoff is no longer viable and the dense counterpart becomes a better choice.

PIC

Figure 3: Comparison of factorization methods on matrices of size 1000, 3000, 5000: various density

Example in Linear Programming (LP)

A general LP problem has a standard form:

minimizex ncTx subject to Ax = b, x 0. (1)

Here we used two active-set methods lp_solve (e04mf) (dense solver) and qpconvex2_sparse_solve (e04nq) (sparse solver) from the NAG Library to solve the above model and the results are shown in Table 2. Both test cases have a density less than 1.0% and e04nq spent less than 1 second for each case while the dense solver was significantly slower. Therefore, passing a sparse problem to a dense solver will completely kill your performance.

Problem n m nnz Density
          Iter Time Iter Time
25fv47 1571 821 11127 0.86% 8030 36.9s 6923 0.46s
bnl2 3489 2324 16124 0.2% 7568 229.1s 4560 0.58s

Table 2: Two LP problems for two LP solvers (same algorithm type)

Be prepared for the surprise from dedicated solvers

Recently, a NAG Library user contacted our Technical Support team with the following l1 linear fitting problem

minimizex nAx - b1 (2)

where matrix A is tall and slim, e.g., of dimension 1000 × 21 and full dense matrix. At first glance, it might appear to be a dense problem as there are no zeros in the problem data A and b. The l1-norm minimization can be reformulated to Linear Programming, so the user performed the transformation and used active-set methods lp_solve (e04mf) (dense solver). However, the solver was not as fast as he expected and the user sought our advice.

This type of misunderstanding and confusion is not unusual and care needs to be taken. In order to decide whether the problem is sparse or dense, we need to consider the data of the problem model that the solver solves, not the original data. For instance, problem (2) can be reformulated into LP problem

minimizex n,y m iyi subject to - y Ax - b y, y 0. (3)

Taking a closer look at the inequality constraints

-y Ax-b y A -I-A -I x y b -b ,

where I is the identity matrix of dimension m, we can see the number of variables has been increased by m and many zero entries have been introduced to the model. Thus, a sparse LP solver should be chosen for this particular model. We tested it with a matrix A of dimension 1000 × 21 using the active-set method solver lp_solve (e04mf) and sparse Interior Point Method solver handle_solve_lp_ipm (e04mt) from the NAG Library. As we can see in Table 3, the sparse LP solver again shows great performance compared to the dense solver.

Solver e04mf e04mt e02ga
Time 49.7s 0.18s 0.007s

Table 3: Three solvers for l1 linear fitting problem

Clearly, the sparse solver is the reason for the improved computation time. However, the dedicated solver glin_l1sol (e02ga) for problem (2) did an even better job than the sparse LP solver, since it exploits thoroughly the known problem structure. Overall the dedicated solver delivered a 7100× speedup over the original attempt. Therefore, it is always recommended to try a dedicated solver first for a particular problem. Dedicated solvers are slightly out of the scope of this blog, please see NAG Library documentation for the full content of NAG offerings in dedicated solvers.

Things to remember

Sparsity is a great feature and should be considered carefully.

  • Choose a sparse solver if there is an underlying sparse structure in your problem. Otherwise, you will significantly harm your performance.
  • It’s important to analyse the data structure that the solver sees instead of original data if reformulation is involved.
  • Don’t forget to search for a dedicated solver first!

To see all the previous blogs, please go here. You can also find various examples through our GitHub Local optimisation page. See you in the next blog.

References

[1]   Ronald F Boisvert, Roldan Pozo, Karin Remington, Richard F Barrett, and Jack J Dongarra. Matrix market: a web resource for test matrix collections. In Quality of Numerical Software, pages 125–137. Springer, 1997.

Author
Leave a Comment