### Blog Author: Dr Jennifer Pestana - Mathematics and Statistics Lecturer, University of Strathclyde

Linear systems involving Toeplitz matrices arise in many applications, including differential and integral equations and signal and image processing (see, e.g., this article and the books by Ng, and Chan and Jin). More recently, Toeplitz systems have appeared in discretisations of fractional diffusion problems. This is because fractional diffusion operators are non-local, and lead to dense matrices; if these dense matrices are Toeplitz, it's possible to develop fast solvers.

In a recent blog post for NAG, Mike Croucher showed that using specialised direct Toeplitz solvers, rather than a generic solver, can result in a massive speed-up. Here, we show that in addition to these tailored direct approaches, preconditioned iterative methods can be competitive for these problems. Perhaps surprisingly, this is true even when the Toeplitz matrix is dense.

## Toeplitz Linear Systems

We want to solve a linear system \(Ax=b\), where \(A \in \mathbb{R}^{n\times n}\) is a Toeplitz matrix, i.e., a matrix with constant diagonals. Since \(A\) has (at most) \(2n-1\) degrees of freedom, it's perhaps not surprising that we can solve this linear system with \(O(n^2)\) or fewer flops, using fast or superfast solvers. In fact, if \(A\) is symmetric positive definite, we can apply the NAG routine used in Mike's post.

But what if our Toeplitz matrix is nonsymmetric? We would still like a fast solver, but in this case codes for fast and superfast direct solvers aren't so easy to come by. As we shall see, an alternative is to instead apply preconditioned iterative methods. (Note that these methods can also be applied if \(A\) is symmetric.)

Let's first set up our nonsymmetric problem. This one happens to be a lower Hessenberg matrix from a fractional diffusion problem (Example 4.2 from this paper), but there's nothing particularly special about it.

from naginterfaces.library.lapacklin import dgesv # A general solver from naginterfaces.library.sparse import real_gen_basic_setup, real_gen_basic_solver, real_gen_basic_diag # Nonsymmetric iterative method codes from naginterfaces.library.sparse import real_symm_basic_setup, real_symm_basic_solver, real_symm_basic_diag # Symmetric iterative method codes from pytictoc import TicToc # For timing import numpy as np import scipy.linalg from scipy.fftpack import fft, ifft # For fast matrix-vector products timer = TicToc() # Construct a real, nonsymmetric Toeplitz matrix matrix_size = 10000 alpha=1.3; g=np.zeros(matrix_size+1) g[0]=1 for i in range(matrix_size): g[i+1]=(1-(alpha+1)/(i+1))*g[i] col = g[1:matrix_size+1] row = np.concatenate([g[1::-1]*np.ones(2),np.zeros(matrix_size-2)]) A = scipy.linalg.toeplitz(col,row) # Construct a right-hand side x = np.random.rand((matrix_size)) x = x/np.sqrt(matrix_size) b = np.matmul(A,x) b_nag_gen = np.reshape(b,[matrix_size,1]) # Reshape b for the generic NAG solver

Let's start by applying a general-purpose solver:

timer.tic() [asol,ipiv,x_direct] = dgesv(A, b_nag_gen) timer.toc() t_direct = timer.tocvalue() # This next line will enable us to compare with the other solution vectors we obtain x_direct = np.reshape(x_direct,matrix_size)

Elapsed time is 15.417057 seconds.

## Krylov Subspace Methods

We want to see if we can do better by exploiting the Toeplitz structure. Here, we'll use Krylov subspace methods. Since these are iterative methods, the time they take to run depends on both the cost per iteration and the number of iterations.

One of the main contributors to the cost per iteration is the computation of one or two matrix-vector products with \(A\) at each iteration. This is why Krylov methods are usually used for sparse matrices. However, if \(A\) is Toeplitz, then regardless of whether it is sparse or dense, matrix-vector products can be performed quickly via a circulant embedding and the fast Fourier transform.

Let's see this in action. To do so, we need to set up a couple of vectors that store the eigenvalues of the circulant embedding matrices. (We have one for \(A\) and one for \(A^T\) since both will be needed by the iterative solver.)

Avec = fft(np.concatenate([col,np.zeros(1),row[matrix_size:0:-1]])) ATvec = fft(np.concatenate([row,np.zeros(1),col[matrix_size:0:-1]]))

We'll first test the speed of the usual matrix-vector product with \(A\):

# Set up a random vector np.random.seed(2) y = np.random.rand(matrix_size) # Perform the usual matrix-vector product timer.tic() z_slow = A.dot(y) timer.toc() t_slow = timer.tocvalue()

Elapsed time is 0.054436 seconds.

And now the fast matrix-vector product:

timer.tic() z = ifft(np.multiply(Avec,fft(np.concatenate([y,np.zeros(matrix_size)])))) z_fast = np.asarray(z.real[0:matrix_size], order='C') timer.toc() t_fast = timer.tocvalue()

Elapsed time is 0.009971 seconds.

which is quite a bit faster:

t_slow/t_fast

5.1808748916460114

The difference in speed becomes more pronounced as the dimension \(n\) increases since a generic matrix-vector product is \(O(n^2)\) while the fast Toeplitz product is \(O(n\log n)\). To check that we're not compromising on accuracy, we can look at \(\|z_{\text{fast}} - z_{\text{slow}}\|_2\), from which we see that the fast Toeplitz matrix-vector product is pretty accurate.

np.linalg.norm(z_fast - z_slow)

3.1262523228454646e-14

## Preconditioning

The number of iterations of our Krylov method depends on \(A\), \(b\) and the initial guess of \(x\) in a highly nonlinear manner. However, for our problem (and many other Toeplitz problems) the number of iterations needed for an acceptably accurate solution is high.

To reduce the number of iterations needed it is typical to precondition, i.e., to change the linear system to an equivalent one with "better" properties. Left preconditioning replaces the original system by \(P^{-1}Ax = P^{-1}b\), for some invertible matrix \(P \in \mathbb{R}^{n\times n}\), but right preconditioning, and preconditioning that preserves symmetry, are possible.

There are many preconditioners for Toeplitz systems, a number of which are based on properties of a scalar-valued function known as the generating function or symbol. More information on suitable preconditioners can be found in the books by Ng, and Chan and Jin, mentioned at the start of this post.

We're using the Strang preconditioner here. To apply this preconditioner quickly, we'll make use of scipy's fast circulant solve (which is again based on the fast Fourier transform). This means we only need to store the first column, \(c\), of our preconditioner.

midpoint = np.uintc(np.floor(matrix_size/2)) c = np.concatenate([col[0:midpoint+1],row[matrix_size-midpoint-1:0:-1]])

Now we're ready to apply our Krylov subspace method. Here we'll use NAG's left-preconditioned restarted GMRES (GMRES(50)). However, since we will need fewer than 50 iterations, RGMRES is equivalent to standard left-preconditioned GMRES.

The termination criterion is the default one in the NAG routine f11bdf with a tolerance of \(10^{-8}\). The NAG iterative solvers use reverse communication; for more information see the documentation.

timer.tic() # Settings for RGMRES method = 'RGMRES' precon = 'P' m = 50 tol = 1e-8 maxitn = 10 anorm = 0 sigmax = 0 monit = -1 # Initialisation routine comm = real_gen_basic_setup(method, precon, matrix_size, m, tol, maxitn, anorm, sigmax, monit, norm=None, weight='N', iterm=1) irevcm = 0; u = np.zeros(matrix_size) v = b wgt = np.zeros(matrix_size) # GMRES solver while (irevcm != 4): irevcm = real_gen_basic_solver(irevcm,u,v,wgt,comm) if irevcm == -1: # v = A^T*u y = ifft(np.multiply(ATvec,fft(np.concatenate([u,np.zeros(matrix_size)])))) v = np.asarray(y.real[0:matrix_size], order='C') elif irevcm == 1: # v = A*y y = ifft(np.multiply(Avec,fft(np.concatenate([u,np.zeros(matrix_size)])))) v = np.asarray(y.real[0:matrix_size], order='C') elif irevcm == 2: v = np.asarray(scipy.linalg.solve_circulant(c,u), order='C') elif irevcm == 3: [itn,stplhs,stprhs,anorm,sigmax] = real_gen_basic_diag(comm) [itn,stplhs,stprhs,anorm,sigmax] = real_gen_basic_diag(comm) timer.toc() t_gmres = timer.tocvalue() x_gmres = u # store the approximate solution

Elapsed time is 0.074340 seconds.

This is quite a lot faster than the general solver:

t_direct/t_gmres

206.8624024704361

We can also see how many GMRES iterations were performed:

itn

10

and the relative residual norm at termination (\(\|b-Ax_{\text{gmres}}\|_2/\|b\|_2\))

np.linalg.norm(b-np.matmul(A,x_gmres))/np.linalg.norm(b)

2.8162012822505773e-14

which looks pretty good! Now let us compare our solution with the one obtained by the general purpose solver. We'll measure the relative error, \(\|x_{\text{direct}} - x_{\text{gmres}}\|_2/\|x_{\text{direct}}\|_2\):

np.linalg.norm(x_direct - x_gmres)/np.linalg.norm(x_direct)

1.1140269961814175e-11

so the two solutions are very close! This shows that, for Toeplitz systems, Krylov subspace methods are certainly worth considering.

## Symmetrisation of Toeplitz matrices

For Toeplitz problems we have another trick up our sleeve: we can transform our nonsymmetric Toeplitz matrix to a symmetric Hankel matrix by flipping the rows (or columns) of \(A\). Mathematically, we solve \(YAx = Yb\) (or \(AYz = b\), \(z = Yx\)), where \(Y\) is the reverse identity matrix.

The advantage of symmetrising is that we can use a Krylov subspace method with some nice properties. We'll use MINRES, which is quite similar to GMRES but typically has a lower cost per iteration (and some other nice features).

To set up, we'll first flip the right-hand side vector.

Yb = np.asarray(np.flipud(b),order='C')

The next thing to sort out is the preconditioner, which for MINRES should be symmetric positive definite (to preserve symmetry). We'll simply take \((C^TC)^{1/2}\), where \(C\) is the Strang preconditioner we used above. This is sometimes called the absolute value circulant preconditioner. Below is a cheap way of computing the first column of this circulant, but the details aren't important.

c_abs = ifft(np.abs(fft(c))) c_abs = c_abs.real

We're now ready to apply NAG's preconditioned MINRES with the absolute value circulant preconditioner. We're again using a tolerance of \(10^{-8}\) for termination, but note that the stopping criterion is different from the RGMRES one (for details, see the documentation for f11gdf).

timer.tic() # Settings for MINRES method = 'MINRES' precon = 'P' tol = 1e-8 maxitn = 500 anorm = 0 sigmax = 0 maxits = 7 monit = -1 # Initialisation routine [lwreq,comm] = real_symm_basic_setup(method, precon, matrix_size, tol, maxitn, anorm, sigmax, maxits, monit) irevcm = 0; u = np.zeros(matrix_size) v = Yb wgt = np.zeros(matrix_size) # MINRES solver while (irevcm != 4): irevcm = real_symm_basic_solver(irevcm,u,v,wgt,comm) if irevcm == 1: # v = A*u y = ifft(np.multiply(Avec,fft(np.concatenate([u,np.zeros(matrix_size)])))) v = np.asarray(np.flipud(y.real[0:matrix_size]),order='C'); elif irevcm == 2: v = np.asarray(scipy.linalg.solve_circulant(c_abs,u), order='C') elif irevcm == 3: [itn,stplhs,stprhs,anorm,sigmax,its,sigerr] = real_symm_basic_diag(comm) [itn,stplhs,stprhs,anorm,sigmax,its,sigerr] = real_symm_basic_diag(comm) timer.toc() t_minres = timer.tocvalue() x_minres = u # store the approximate solution

Elapsed time is 0.049735 seconds.

This is also much faster than the direct solver:

t_direct/t_minres

307.79616807696027

It's also faster than GMRES because MINRES has a low cost per iteration:

t_gmres/t_minres

1.4879270684335653

It's also faster than GMRES because MINRES has a low cost per iteration:

itn

13

However, MINRES does require more iterations than GMRES, which highlights that both the cost per iteration and the number of iterations determine the total time of the Krylov subspace method.

The relative residual norm at termination (\(\|b-Ax_{\text{minres}}\|_2/\|b\|_2\)) is also quite small, although it's not as small as the GMRES relative residual norm we obtained:

np.linalg.norm(b-np.matmul(A,x_minres))/np.linalg.norm(b)

9.670593217978096e-09

Looking now at the relative error, \(\|x_{\text{direct}} - x_{\text{minres}}\|_2/\|x_{\text{direct}}\|_2\), we see that this is also small, although again not as small as for GMRES. (Decreasing the tolerance would improve the accuracy of the MINRES solution, at a cost of more iterations.)

np.linalg.norm(x_direct - x_minres)/np.linalg.norm(x_direct)

1.6661088967643173e-06

## Conclusions

Direct solvers for Toeplitz matrices can be extremely fast. However, for certain problems, particularly those for which a fast direct method isn't readily available, iterative solvers can be a great alternative. In the example above, both GMRES and MINRES were orders of magnitude faster than the general-purpose solver.

Iterative solvers are often daunting for practitioners because they are not "black box" methods. Hopefully, the example above shows that this need not be a barrier; fast matrix-vector products and preconditioners may only require a couple of lines of code. Our experience is that the MINRES method proposed above is typically faster than GMRES when a good preconditioner is available, but other nonsymmetric solvers, such as CGS, BiCGStab or TFQMR may also be good options. The nice thing about the NAG Library is that switching between these iterative methods requires only a few small changes to the code.

It's important to note that for some problems the simple preconditioners above may not be so suitable. However, there are now many options available for different types of systems, with the books by Ng, and Chan and Jin a great place to start.

### NOTE FROM NAG: Toeplitz solver provision is a current area of development research at NAG. If you're looking for a solver today, see 'Real Toeplitz Solve' for the NAG Library for Python version, and the NAG Library equivalent see here.

` `