Quant Finance News

In this issue:

Major Update to NAG's Algorithmic Differentiation Tool - dco/c++

NAG continues to pioneer the development of Algorithmic Differentiation (AD) software. With new research and development coming to the fore this December, we are pleased to announce a major new release of our AD software tool, dco/c++.

dco/c++ is an AD software tool for computing sensitivities of C++ codes. It embodies over 15 man-years of R&D, much of which required original research. It is an operator-overloading tool with a slick API: the tool is easy to learn, easy to use, can be applied quickly to a code base, and integrates easily with build and testing frameworks.
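
To illustrate the operator-overloading principle on which tools like dco/c++ are built, here is a minimal tangent (forward) mode sketch. It is written in Python purely for brevity and is in no way representative of the dco/c++ API; all names are illustrative.

```python
import math

# A minimal dual-number type: each value carries a tangent (derivative)
# component, and overloaded operators propagate both together.
class Dual:
    def __init__(self, v, d=0.0):
        self.v = v   # primal value
        self.d = d   # derivative w.r.t. the seeded input

    def __add__(self, other):
        return Dual(self.v + other.v, self.d + other.d)

    def __mul__(self, other):
        # product rule: (ab)' = a'b + ab'
        return Dual(self.v * other.v,
                    self.d * other.v + self.v * other.d)

def dsin(x):
    # chain rule: d/dx sin(x) = cos(x) * x'
    return Dual(math.sin(x.v), math.cos(x.v) * x.d)

def f(x):
    # the "simulation code": f(x) = x*x + sin(x)
    return x * x + dsin(x)

y = f(Dual(2.0, 1.0))   # seed dx/dx = 1
# y.v = f(2) = 4 + sin(2), y.d = f'(2) = 2*2 + cos(2)
```

An operator-overloading tool applies this idea to an entire code base by swapping the floating-point type; dco/c++'s headline adjoint (reverse) mode additionally records a tape of operations and propagates sensitivities backwards, which is far more efficient when there are many inputs and few outputs.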

What's new in dco/c++ v3.2

  • Re-engineered internals mean dco/c++ is now ~30% faster and uses ~30% less memory
  • Vector reverse mode: for simulations with more than one output, several columns of the Jacobian or Hessian can now be computed at once using vector data types
  • Parallel reverse mode: for simulations with more than one output, the columns of the Jacobian or Hessian can now easily be computed in parallel.  This can be combined with vector reverse mode.
  • Jacobian pre-accumulation: sections of the computation can be collapsed into a pre-computed Jacobian, further reducing memory use
  • Disk tape: the tape can be recorded straight to disk.  Although slower, this lets very large computations complete without resorting to checkpointing to reduce memory use
  • Tape activity logging and improved error handling
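
The vector reverse mode idea can be sketched in a few lines (an illustration only, not dco/c++ code; the function and names are made up). Instead of running one reverse sweep per output, the adjoint of every variable is carried as a small vector, so a single sweep delivers several Jacobian columns at once:

```python
import numpy as np

# f(x1, x2) = (y1, y2) with y1 = x1*x2 and y2 = x1 + x2.
# Scalar reverse mode needs one sweep per output (seed y1_bar = 1, then
# y2_bar = 1). With vector adjoints both seeds travel together as a
# length-2 array, so one sweep yields the whole Jacobian.
def jacobian_vector_reverse(x1, x2):
    # (a real tape would also record the forward-pass values here)
    y1_bar = np.array([1.0, 0.0])   # component k seeds output k
    y2_bar = np.array([0.0, 1.0])
    # reverse sweep: x_bar accumulates dy_k/dx for every k at once
    x1_bar = y1_bar * x2 + y2_bar * 1.0   # dy1/dx1 = x2, dy2/dx1 = 1
    x2_bar = y1_bar * x1 + y2_bar * 1.0   # dy1/dx2 = x1, dy2/dx2 = 1
    # row i of J is (dy_i/dx1, dy_i/dx2)
    return np.array([x1_bar, x2_bar]).T

J = jacobian_vector_reverse(3.0, 5.0)
# analytically J = [[x2, x1], [1, 1]] = [[5, 3], [1, 1]]
```

Parallel reverse mode distributes such sweeps (or blocks of seed vectors) across threads, which is why the two features combine naturally.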

Your organisation may already hold a licence for NAG's AD software - do contact us and we'll check for you. For more information on dco/c++ click here. Trials, training, consulting and help with Proof of Concept projects are all available from NAG.

Algorithmic Differentiation for Accelerators – dco/map

For those wishing to implement AD on accelerators, NAG provides dco/map. An overview is available in the Technical Poster 'High Performance Tape-Free Adjoint AD for C++11'.

Presentation Slides: Second Order Sensitivities: AAD Construction and Use for CPU and GPU - by Jacques Du Toit and Chris Kenyon

Technical Report: Batched Least Squares of Tall Skinny Matrices

NAG has produced a highly efficient batched least squares solver for NVIDIA GPUs. The solver allows matrices in a batch to have different sizes and content. The code is optimized for tall skinny matrices, which frequently arise in data fitting problems such as XVA in finance and are typically not easy to parallelize. The code is 20x to 40x faster than a batched GPU least squares solver built from the NVIDIA libraries (cuBLAS, cuSolver), giving a pronounced speedup for applications where the matrices are already in GPU memory.
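
For reference, the problem the batched solver addresses can be written down in a few lines of plain NumPy: for each pair (A_i, b_i), find the x_i minimising ||A_i x_i - b_i||. This CPU loop only pins down the semantics; NAG's solver replaces it with a single optimized GPU kernel launch, and the names below are illustrative.

```python
import numpy as np

# Reference semantics of a batched least-squares solve where the tall
# skinny matrices in the batch may have *different* numbers of rows.
def batched_lstsq(As, bs):
    return [np.linalg.lstsq(A, b, rcond=None)[0] for A, b in zip(As, bs)]

rng = np.random.default_rng(0)
# a small batch of tall skinny matrices: many rows, few columns
As = [rng.standard_normal((m, 4)) for m in (50, 80, 120)]
xs_true = [rng.standard_normal(4) for _ in As]
bs = [A @ x for A, x in zip(As, xs_true)]   # consistent right-hand sides
xs = batched_lstsq(As, bs)                  # recovers xs_true
```

The "tall skinny" shape (rows >> columns) is exactly the case where a naive one-thread-block-per-matrix GPU mapping wastes parallelism, which is what makes a purpose-built batched solver worthwhile.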

Read the full technical report 'Batched Least Squares of Tall Skinny Matrices'. The code mentioned here is available to trial; to arrange access, email us.

New Optimization routines added to the NAG Library at Mark 26.1

Two new Optimization routines have just been added to the NAG Optimization Modelling Suite within the NAG Library: Derivative-free Optimization for Data Fitting (thought to be the first commercial solver of its kind available to the public), and an Interior Point Method for Large Scale Linear Programming Problems.

NAG added its first derivative-free solver to the NAG Library approximately five years ago. Since then this field has attracted significant academic attention, resulting in numerous advances. The new Mark 26.1 derivative-free optimization solver can effectively exploit the structure of calibration problems, and we are excited to see how our users apply it.

The new Interior Point Method for Large Scale Linear Programming Problems is built upon a very efficient sparse linear algebra package and implements two variants of interior point methods: the Primal-Dual method and the Self-Dual method. The Primal-Dual method usually offers the fastest convergence and is the default choice of solver. Both implementations should offer significant improvements for large scale problems over the current LP/QP solvers in the Library, such as e04nq. Early client adoption has been strong, with several clients reporting significant speed-ups - of 20x and more - after adopting the new solver.

If you are an existing supported user of the NAG Library then it is likely that you can access the latest Optimization routines by upgrading your software. If you have any questions about this please don't hesitate to contact us.

Try the Library for 30 days with a full product trial. Apply here.

Webinar: Leverage multi-core performance with Intel Threading Building Blocks (Intel TBB) – January 2018

NAG continues to provide expert training to aid the use of Intel® Threading Building Blocks (Intel® TBB). The previous webinar was a great success; do sign up for the next series, being held online in January 2018.

On completing the series, participants will have an in-depth knowledge of TBB, how it enables parallel programming, what differentiates it from other parallel programming models, and how to use common parallel programming patterns to parallelize their own code.

The series targets an audience with intermediate or advanced programming experience and beginning-to-intermediate parallel programming experience. It is aimed specifically at programmers who are just starting to need parallel programming, but also provides material for experienced high performance computing (HPC) programmers who have had little exposure to task-based tools like TBB.

More information and registration

Making improvements to a gridding algorithm for the Square Kilometre Array (SKA) Telescope on GPUs and Xeon Phi

A big part of what we do at NAG is tuning and porting codes for the likes of Intel, NVIDIA, ARM and AMD. Although we cannot discuss this work in detail, to preserve client confidentiality, we can share it when we are asked to do similar work for collaborative projects. The Square Kilometre Array story is one example of how we can add extensive value when investigating and improving numerical codes. The story was presented recently at the Supercomputing conference in Denver.

NAG was recently asked by the Scientific Computing Group at the University of Oxford’s prestigious e-Research Centre to investigate methods for improving the performance of a convolution gridding algorithm used in radio astronomy for processing fringe visibilities, targeting Intel Knights Landing (Xeon Phi) and NVIDIA P100 GPUs. During the investigation, NAG experts used simulated Square Kilometre Array (SKA) data to observe how potential algorithm enhancements differed across the two hardware choices.

Although the SKA Radio Telescope is not due to begin collecting data until 2020, work is already underway to design and implement the software needed to process the vast amounts of data that the project will produce - hence NAG being asked to look at algorithm performance now.

NAG is sharing some of the initial comparative performance figures related to the work on the optimization of a signal processing code for large data sets for the SKA project and will publish a Technical Poster on this subject at the Supercomputing Trade Show (SC17) and Conference in Denver.

Take a look at the results here [poster and case story]

New Optimization Corner Technical Blog Series

The first blog in the NAG Optimization Corner is on “The price of derivatives – Using Finite Differences”

Derivatives play an important role in the whole field of nonlinear optimization as many of the algorithms require derivative information in one form or another. This post describes several ways to compute derivatives and focusses on the well-known finite difference approximation in detail.

Throughout the text we assume that the objective function f(x) is sufficiently smooth and is minimized without constraints.

Why are derivatives so important?

Recall how a typical nonlinear optimization solver works: it improves its estimate of the solution step by step. At each iteration the solver needs to decide the location of the new estimate; however, it has only very local information about the landscape of the function - the current function value and the derivatives. The derivatives express the slope of the function at a point, so it is natural to use derivative information to define, for example, a descent search direction, along which the solver searches for a new (better) point. In addition, if the derivatives are close to zero the function is locally flat, and they indicate (together with other pieces of information) that a local solution has been reached. It is easy to imagine that mistakes in the derivatives might mislead the solver, causing it to fail. It is therefore crucial to have a reliable way to provide derivatives whenever possible. Read the full blog post here.
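
The trade-off the blog post examines shows up in just a few lines: a forward difference needs one extra function evaluation per variable but is only first-order accurate, while a central difference doubles the cost in exchange for second-order accuracy. A sketch, with illustrative step sizes:

```python
import math

# Forward difference: f'(x) ~ (f(x+h) - f(x)) / h, error O(h)
def fd_forward(f, x, h=1e-8):
    return (f(x + h) - f(x)) / h

# Central difference: f'(x) ~ (f(x+h) - f(x-h)) / (2h), error O(h^2)
def fd_central(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

f = math.sin
exact = math.cos(1.0)          # known derivative for comparison
err_fwd = abs(fd_forward(f, 1.0) - exact)
err_cen = abs(fd_central(f, 1.0) - exact)
# for well-chosen h, the central difference is markedly more accurate
```

The step size h must balance truncation error (smaller h is better) against floating-point rounding error (larger h is better), which is precisely why badly chosen finite differences can feed a solver misleading derivative information.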

Look out for future blogs in the NAG Optimization Corner.

Technical and Numerical Research

Technical Report: A Finite Volume – Alternating Direction Implicit Approach for the Calibration of Stochastic Local Volatility Models


Calibration of stochastic local volatility (SLV) models to their underlying local volatility model is often performed by numerically solving a two-dimensional non-linear forward Kolmogorov equation. We propose a novel finite volume (FV) discretization in the numerical solution of general one- and two-dimensional forward Kolmogorov equations. The FV method does not require a transformation of the PDE. This constitutes a main advantage in the calibration of SLV models as the pertinent PDE coefficients are often non-smooth. Moreover, the FV discretization has the crucial property that the total numerical mass is conserved. Applying the FV discretization in the calibration of SLV models yields a non-linear system of ODEs. Numerical time stepping is performed by the Hundsdorfer–Verwer ADI scheme to increase the computational efficiency. The non-linearity in the system of ODEs is handled by introducing an inner iteration. Ample numerical experiments are presented that illustrate the effectiveness of the calibration procedure.

Technical Poster: Adjoint Algorithmic Differentiation of a GPU Accelerated Application

View the research in full here. To get access to this code, please contact us.

High Performance Tape-free Adjoint AD for C++11 - Introducing dco/map, a cross-platform, accelerator ready AAD tool

View the research in full here.

Read the previous issue