## Introduction

A real *n*-by-*n* matrix * A* will, in general, be defined by its

*n*

^{2}elements that would need to be stored for computations to be performed using

*. If*

**A***n*is very large, this can amount to a lot of storage.

Fortunately, many real world applications use sparse matrices that are comprised mainly of zero-valued elements; the number of defining elements that need to be stored is typically some fixed multiple of *n*
(30*n*, say). A special case is the banded matrix:

This banded matrix has elements defined on its main diagonal (the dashed line), on *k _{l}* sub-diagonals (blue region) and on

*k*super-diagonals (green region). The total number of non-zero elements is

_{u}*(k*minus a few corner points. What we would really like to do is solve problems using this matrix, like

_{l}+k_{u}+1)n**, but storing only the non-zero elements. Various storage formats emerged from various packages, but the most familiar to people solving such problems is that adopted by the linear algebra package LAPACK (and also used extensively in the NAG Libraries). This works by storing the non-zero elements of**

*AX = Y**in a rectangular, (*

**A***k*) by

_{l}+k_{u}+1*n*, matrix

*, such that the diagonals of*

**B***become rows of*

**A***while the columns of*

**B***remain as columns of*

**A***:*

**B**
While this is appealing in terms of being able to reconstruct readily in our mind's eye the original matrix, it is less well suited to solving problems (such as * AX = B*) as quickly as possible.
The problem is that most matrix computations rely, for their speed, on very fast kernel operations (written by the likes of Intel or AMD) working on smallish matrix blocks; this requires matrix blocks to be
stored as you would expect for a general matrix, while in

*above the block elements have become higgledy-piggledy. What is required is a storage format that keeps some of the block structure of the original matrix*

**B**

**A**## The Simplest Case - to ease you in

To begin to describe an alternative storage scheme that preserves block structure, it is probably best to begin with the simplest scenario ... a symmetric matrix with bandwidth *k* and dimension *n* =
(*k*+1)*m* (that is, *n* is an integer multiple of *k+1*)

Once you partition a lower band (including main diagonal) into triangles as above, the possibilities for improved storage forms become immediately clear. The * B* triangular blocks can either be slid up or
slid across to the right to complete a square (

*k+1*by

*k+1*) block with a neighbouring

*triangular block. To make it easier to convert from LAPACK storage format, I will go for sliding the*

**A***blocks up, giving:*

**B**The zero paranoiacs might want to remove the trailing zeros above by storing the last ** A_{3}** triangle in

*Rectangular Full Packed*(RFP) format [see here], but this may be a little over-the-top.

### What can you do with these blocks?

Those in touch with their linear algebra side will now wonder what can be done with the banded matrix stored in this form. In particular, they will whisper the words *Cholesky decomposition*. Don't worry, it is
just the first of two steps in solving * AX = Y*, a set of simultaneous equations. Without going into the nitty-gritty, I will just say that the lower triangular banded

*formed in this decomposition can be partitioned exactly as above, and it can work with each triangular block in turn, overwriting them as it goes. The second stage of solution,*

**L***backsubstitution*, then works with the triangular blocks of

*to return the solution*

**L**

**X****How do I put A into this strange storage format?**

**You couldn't be expected to do this yourself. On the other hand it would be easy to write a utility routine that converted from the LAPACK style format to the above triangular block style; and this can be done in-place
since columns are preserved. The mapping between elements of the blocked storage matrix (C, say) and the elements of A is also quite straightforward: **

**for***C*_{qj}= A_{ij}*j = 1,...,n*and*i = j,...,j+k*, where*q = mod(i-1,k+1) + 1*.**This makes life easy when A_{ij} is a function of i and j.**

**Â **

**The next simplest case -- to lessen your ease**

**In the simplest case everything fitted just nicely, but for more general n there are going to be some leftovers at the bottom-right of A. A picture tells the story ...**

**And the mapping above tells us where to put the left-overs ...**

**Â **

**Next?**

**This stuff is all very well when we have symmetric matrices, but what about non-symmetric matrices like that in Illustration 1? I'll leave that till the next blog (very soon). Lawrence**