# Algorithmic Differentiation Tools & Adjoint Solvers

Precise derivative information at a fraction of the cost
What is Algorithmic Differentiation?

Algorithmic Differentiation (AD) is a novel way to differentiate computer code, i.e. to compute ∂F/∂s1, ∂F/∂x2,... where F is given by a computer program, e.g.

   if(x1<x2) then
F = x1*x1 + x2*x2 + exp(x3*x3) + ...
else
F = x1 + x2 + x3 + ...
endif


The derivatives are mathematically exact and computed to machine precision. Traditionally, derivative estimates would be obtained by finite differences (also called "bump and revalue"), which is slow and can be inaccurate. In contrast, AD (and adjoint AD in particular - AAD) gives precise derivative information at a fraction of the cost. NAG and expert partners provide a variety of services and software for Algorithmic Differentiation including consulting, training courses, software tools and the embedding of these in client applications plus a compiler and algorithms for AD.

## Algorithmic Differentiation Webinar

NAG Algorithmic Differentiation Services

## What does NAG Provide?

Adjoint AD is demanding to apply to non-trivial codes. The biggest (but by no means only) problem is memory use. Any production AD tool must allow this to be controlled effectively. In addition, since AD operates at source code level, 3rd party library dependencies can pose problems since the underlying source is not available to the tool. A production AD tool must have a way to handle these dependencies, ideally by being applied to the underlying (3rd party) source to create a full AD solution.

• dco/c++ integrates easily with your code base, is extremely flexible, and has been applied to huge (millions of lines) production codes
• dco/c++ allows the memory use of adjoint codes to be constrained almost arbitrarily
• dco/map is a cutting edge C++11 tape free operator overloading AD tool designed specifically to handle accelerators (GPUs, Xeon Phi, etc).  An overview is available here: New Technical Poster: High Performance Tape-Free Adjoint AD for C++11
• dco/map combines the benefits of operator overloading (easy application, one single source code) with the benefits of handwritten adjoints (very fast and minimal memory use)
• dco/map can handle the race conditions inherent in parallel adjoint codes easily and efficiently
• dco/c++ and dco/map are easily coupled to create AAD implementations of heterogeneous codes

## Adjoint Versions of Mathematical Routines

NAG provides AD versions of mathematical and statistical algorithms such as those found in the NAG Library. NAG delivers AD routines that may be used with NAG's own AD tools as well as 3rd party AD tools or handwritten adjoint codes.

NAG in partnership with RWTH Aachen University provides consulting services, training, software and support to develop AD solutions for clients. NAG has delivered AD training to several large financial institutions to give them a solid grounding in the mathematical, computer science and computational aspects of AD, common pitfalls and techniques for working around them, as well as various tricks and optimizations. NAG has also helped to embed AD tools into large Tier 1 quantitative finance libraries and has helped overcome particular memory constraints in large adjoint codes.

## NAG Fortran Compiler for AD

For simulation programs written in Fortran a version of the NAG Fortran Compiler has been extended to serve as a pre-processor to dco. The seamless integration of AD into a complex build system is facilitated. Hence, the amount of modifications to be made by the user in the original source code can be minimized or even be eliminated entirely.

The principles of AD in the context of computational finance and techniques for avoiding memory problems in adjoint codes is discussed in NAG Technical Report TR3/14 (3). Readers of the technical report can request the example programs it is based on. The example programs include a copy of dco and a trial licence so that readers can run the examples on their own machines, and apply dco to their own codes.

Algorithmic Differentiation Examples

Here are some examples of what adjoint AD has enabled in various domains.

Engineering

AD enables sensitivity analyses of huge simulations, enabling shape optimization, intelligent design and and comprehensive risk studies.

Figure 1 shows sensitivities of the drag coefficient to each point on a car's surface when it moves at high speed (left) and low speed (right). The simulation was performed with AD-enabled OpenFOAM built on top of the AD tool dco (See below for information about dco). The surface mesh had 5.5 million cells and the gradient vector was 18GB. (1)

The normal simulation on a top-end desktop took 44s, while the AD-enabled simulation took 273s. To obtain the same gradient information, on the same machine, by finite differences would take roughly 5 years.

The gradient information can now be used to optimize the shape of the car so as to reduce the drag.

(1) Towara M and Naumann U (2013). A discrete adjoint model for OpenFOAM. Procedia Comp. Sci. Volume 18.

Scientific Modelling

AD helps our understanding of climate change and improves weather predictions.

Figure 2 shows the sensitivity of the amount of water flowing through the Drake passage to changes in the topography of the ocean floor. The simulation was performed with the AD-enabled MIT Global Circulation Model (MITgcm) run on a supercomputer. The ocean was meshed with 64,800 grid points. (2)

Obtaining the gradient through finite differences took a month and a half. The adjoint AD code obtained the gradient in less than 10 minutes.

The gradient information can be used to further refine climate prediction models and our understanding of global weather, for example the high sensitivity over the South Pacific Ridge and Indonesian Throughflows even though these are far away from the Drake Passage.

(2) Utke J, Naumann U, Wunsch C, Hill C, Heimbach P, Fagan M, Tallent N and Strout M. (2008). OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes. ACM Trans. Math. Softw, 34(4) 18:1-18:36.

Finance

AD and AAD is used in finance to get sensitivities of complex instruments quickly, enabling real time risk management and hedging of quantities like xVA.

Here we show some results from a paper which studied two typical codes arising in finance: Monte Carlo and PDEs. (3)

Monte Carlo
34 0.5s 29.0s 3.0s (2.5MB) 6.0x 9.7x
62 0.7s 80.9s 5.1s (3.2MB) 7.3x 15.9x
142 1.5s 423.5s 12.4s (5.1MB) 8.3x 34.2x
222 2.3s 1010.7s 24.4s (7.1MB) 10.6x 41.4x
PDE
34 0.6s 37.7s 11.6s (535MB) 19.3x 3.3x
62 1.0s 119.5s 18.7s (919MB) 18.7x 6.4x
142 2.6s 741.2s 39s (2GB) 15.0x 19x
222 4.1s 1857.3s 60s (3GB) 14.6x 31x

Table 1: Run times and memory requirements as a function of gradient size n for Monte Carlo and PDE applications.

Table 1 shows the runtimes of a first-order adjoint code using dco vs. central finite differences on a typical finance application (option pricing under local volatility model, 10K sample paths/spatial points and 360 time steps). The second column f is the normal runtime of the application, cfd is the runtime for central finite differences and AD is adjoint AD runtime along with additional memory required (tape size). Calculations were run on a laptop so only the relative runtimes AD/f and cfd/AD are important, the latter showing the speedup of AD over finite differences.

In finance such derivative information is often used for hedging and risk calculations, so these gradients must be computed many times per day.

(3) du Toit J and Naumann U (2014). Adjoint algorithmic differentiation tool support for typical numerical patterns in computational finance. NAG Technical Report TR3/14.

Algorithmic Differentiation vs. Finite Differences

AD computes mathematical derivatives of a computer program. Conceptually the process is straightforward:

• Computers can only add, subtract, multiply and divide floating point numbers
• Any computer program (operating on floating point numbers) simply consists of many of these fundamental operations strung together, along with calls to special functions (e.g. sin, cos, exp, etc.)
• It is simple to compute the derivatives of all these basic operations
• We can use the chain rule to combine these basic derivatives to get the derivative of our computer program

This process of differentiating a code can be applied recursively to generate 2nd, 3rd, 4th or higher derivatives of any computer program. Since the code is differentiated symbolically (albeit at a very low level), the derivatives are mathematically exact and are computed to machine accuracy.

Traditionally derivative estimates are computed through finite differences (also called “bumping”). Compared with these, AD offers three clear advantages:

• Finite differences can be tricky to get right: some knowledge of the function is often required
• Higher order finite differences can be arbitrarily bad: AD will give exact gradients of any order
• Adjoint AD (see here for more details) can be orders of magnitude faster than bumping
Technical Reports and Posters