fast-algorithms

On the 6th of October, NAG collaborator Professor Nick Higham of the University of Manchester and I presented a webinar entitled What's New With the Nearest Correlation Matrix? You can watch a recording of the webinar here. In this blog post, I will talk a little more about some of the content we covered, with a focus on the new NAG® Library symbolic adjoint for the nearest correlation matrix (NCM), introduced at Mark 27.3.

First, we need to talk a little about correlations. Given two sequences of data, \(X_1\) and \(X_2\) (or random variables in statistical language), the correlation between them, \(\text{corr}(X_1,X_2)\), is a measure of the degree to which they are linearly related. Given \(n\) such random variables, a correlation matrix is the \(n\times n\) matrix with \( (i,j) \) entry given by \(\text{corr}(X_i,X_j)\). Anywhere that multivariate data is being collected, correlation matrices are likely to be encountered.

A randomly generated correlation matrix
Figure 1: A correlation matrix with entries coloured according to their value. Note the 1s on the diagonal.

Correlation matrices have a number of useful mathematical properties, perhaps the most important of which is that they have non-negative eigenvalues (they are positive semi-definite). However, when data is collected from real-world measurements, it is possible to encounter matrices that are not positive semi-definite. For example, this can happen if data is collected asynchronously or inaccurately, or if there are missing observations.

Given an invalid correlation matrix, \(X\), in order to proceed with subsequent analysis, it may be necessary to fix the matrix, otherwise, we may encounter numerical issues. For example, in financial mathematics, using indefinite matrices in place of true correlation matrices can yield negative variances. We can fix our invalid correlation matrix by finding the nearest correlation matrix. Using the Frobenius norm, this is the matrix, \(G\), which minimizes \[\|X-G\|_F.\]

The NAG Library contains a number of state-of-the-art routines for solving various flavours of the nearest correlation matrix problem, developed in collaboration with Nick Higham and his research group. See Chapter G02 for further details.

One of the most striking things to come out of the webinar was the number of applications of the NCM problem. What started out as an important problem in finance, involving correlations between assets in portfolio management, has now, due to the rise of data science, found additional applications in insurance, seismology, meteorology, measuring sea levels and even tourism.

Irrespective of the application area, NCM problems are typically embedded within a longer workflow. Often, as part of such workflows sensitivities are required, that is, we would like to know how a small (or rather infinitesimal) perturbation to a given input can affect the outputs of the workflow. How, then, can we compute sensitivities for NCM problems?

Sensitivities are obtained by computing derivatives, and we can do this by using the NAG AD Library, where AD stands for automatic differentiation (it is also sometimes known as algorithmic differentiation, and we have previously blogged about it here). The idea is to use a tool to differentiate the code line by line. One such tool is dco/c++, which works seamlessly with the NAG AD Library.

In the language of AD, differentiating our code produces the so-called algorithmic adjoint, but if you are unfamiliar with AD, think of it simply as the derivative. Since Mark 27, we have had algorithmic adjoints available for g02aa and g02ab, which compute nearest correlation matrices by solving an associated optimization problem using a Newton method. NAG's AD capabilities have come about as a result of a long-standing collaboration with Professor Uwe Naumann and his team at RWTH Aachen University.

It's worth making a short digression at this point because there is an important question we haven't considered: are NCM problems even differentiable? Technically, the answer is, NO! So, where does that leave us?

When Qi and Sun[1] developed the original version of the Newton method for the NCM (since enhanced by Higham and Borsdorf[2]), they considered this problem in great depth. The issue is that hidden away in any NCM algorithm is the function \(f(x) = \text{max}(x,0)\), which is used to shift eigenvalues away from the negative real line, but which is nondifferentiable at the origin, as shown in Figure 2.

The function f(x) = max(x,0)
Figure 2: The function \(f(x) = \text{max}(x,0)\)

Also shown in Figure 2 is a series of the red dashed lines touching \(f(x)\) at the origin. These are known as subderivatives, and the complete set of all such lines form what is known as the subdifferential of \(f\) at 0. In our case, the subdifferential is all those lines through the origin with gradient in \([0,1]\). There is a lot of mathematical theory behind subdifferentials, generalizing the idea of derivatives for nondifferentiable functions. Qi and Sun were able to use this theory to show that, provided we are consistent in our choice of which subderivative we use at the origin, then the nondifferentiability in the NCM can be handled. Indeed, numerical experiments show that algorithmic adjoints of the NAG NCM solvers give the same results as other techniques, such as finite-difference approximations (also known as bumping or bump and reval).

The issue of nondifferentiability is an important one in AD. In particular, even mathematically smooth functions could potentially be implemented in a manner that results in nondifferentiable code (for example, by branching to different types of approximation depending on the input values). There's lots more we could discuss here, but that's for another post, as now I'd like to talk about how AD performs.

To compute the algorithmic adjoint, the AD tool needs to perform a data flow reversal on the computer code, which requires both a forward and a reverse pass through the code. During the forward pass, a lot of data must be stored, which is then used during the reverse pass. For example, for a \(250\times 250\) NCM problem, the algorithmic adjoint of g02aa requires about 20GB of memory (whereas the NCM itself requires less than 3GB of memory).

Memory footprint of adjoint modes for g02aa
Figure 3: Memory footprint of adjoint modes for g02aa

As problem sizes increase, the memory requirements of algorithmic adjoints can cause issues. However, for certain algorithms, another approach is available: the symbolic adjoint. To compute the symbolic adjoint, we go back to the mathematics behind the algorithm and differentiate that, before reimplementing the differentiated algorithm.

Runtime of adjoint modes for g02aa
Figure 4: Runtime of adjoint modes for g02aa

In the NAG AD Library, a number of routines now have symbolic adjoints available, and for Mark 27.3, we have developed a symbolic adjoint of g02aa. Figures 3 and 4 show how it performs compared with the algorithmic adjoint. For a \(250\times 250\) NCM problem it is 70x faster and uses 2500x less memory. Such a dramatic performance improvement is not unusual for symbolic vs. algorithmic adjoints. Symbolic adjoints are not always available, but when they are they invariably significantly outperform their algorithmic counterparts. They do, however, require an exact solution of the problem to be valid (for example, full convergence of iterative algorithms), which is not always a given; in these cases, the algorithmic adjoint gives the more robust derivative.

If you have existing code calling the algorithmic adjoint of g02aa, then it is very easy to switch to the symbolic adjoint: you simply set the adjoint strategy in the ad_handle to symbolic: ad_handle.set_strategy(nag::ad::symbolic);

Of course, if your NCM problem was embedded in a longer workflow, then you may have more work to do in order to compute sensitivities, but this is where the power and versatility of AD really becomes apparent. Through a combination of applying AD to your own code (using dco/c++), calling routines from the NAG AD Library, and using symbolic adjoints when they are available, sensitivities can readily be traced through the entirety of your computation. See here for more details on using dco/c++ with the NAG AD Library.

The new symbolic adjoint for g02aa is now available, see here for the documentation. Full product trials are available for the NAG Library and dco/c++.

References

[1] H. Qi and D. Sun (2006). A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. and Appl., 28(2), pp. 360-385.

[2] R. Borsdorf and N. J. Higham (2010). A preconditioned Newton algorithm for the nearest correlation matrix, IMA J. Num. Anal., 30(1), pp.94-107.

Author
Leave a Comment