- Why AD?
- AD in more depth
- Adjoint AD Tool Support for Numerical Patterns in Finance
- Exact First- and Second-Order Greeks by Algorithmic Differentiation
- Tool-Based Approach to Algorithmic Differentiation of Adjoint Methods - Video clip
- Adjoint Algorithmic Differentiation of a GPU Accelerated Application
- Prize winning student paper: AD in optimizing a LIBOR Market Model
- Adjoint Methods in Computational Finance
- Training

Algorithmic Differentiation (AD) is a novel way to differentiate computer code, i.e. to compute $\frac{\partial F}{\partial x_1}, \frac{\partial F}{\partial x_2},\ldots$ where $F$ is given by a computer program, e.g.

if(x1<x2) then F = x1*x1 + x2*x2 + exp(x3*x3) + ... else F = x1 + x2 + x3 + ... endif

The derivatives are mathematically exact and computed to machine precision. Traditionally, derivative estimates would be obtained by finite differences (also called "bump and revalue"), which is slow and can be inaccurate. In contrast, AD (and adjoint AD in particular) gives precise derivative information at a fraction of the cost.

## Examples

Here are some examples of what adjoint AD has enabled in various domains.

**Engineering**

Figure 1: Sensitivity of drag coefficient to the surface geometry of a car at high speed (left) and low speed (right)

AD enables sensitivity analyses of huge simulations, enabling shape optimization, intelligent design and and comprehensive risk studies.

Figure 1 shows sensitivities of the drag coefficient to each point on a car's surface when it moves at high speed (left) and low speed (right). The simulation was performed with AD-enabled OpenFOAM built on top of the AD tool `dco`

(See below for information about dco). The surface mesh had 5.5 million cells and the gradient vector was 18GB. (1)

The normal simulation on a top-end desktop took 44s, while the AD-enabled simulation took 273s. To obtain the same gradient information, on the same machine, by finite differences would take roughly 5 years.

The gradient information can now be used to optimize the shape of the car so as to reduce the drag.

(1) Towara M and Naumann U (2013). A discrete adjoint model for OpenFOAM. *Procedia Comp. Sci.* Volume 18.

**Scientific Modelling**

Figure 2: MITgcm sensitivities of zonal ocean water flow through the Drake Passage to changes in bottom topography.

AD helps our understanding of climate change and improves weather predictions.

Figure 2 shows the sensitivity of the amount of water flowing through the Drake passage to changes in the topography of the ocean floor. The simulation was performed with the AD-enabled MIT Global Circulation Model (MITgcm) run on a supercomputer. The ocean was meshed with 64,800 grid points. (2)

Obtaining the gradient through finite differences took a month and a half. The adjoint AD code obtained the gradient in less than 10 minutes.

The gradient information can be used to further refine climate prediction models and our understanding of global weather, for example the high sensitivity over the South Pacific Ridge and Indonesian Throughflows even though these are far away from the Drake Passage.

(2) Utke J, Naumann U, Wunsch C, Hill C, Heimbach P, Fagan M, Tallent N and Strout M. (2008). OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes. *ACM Trans. Math. Softw*, **34(4)** 18:1-18:36.

**Finance**

AD is used in finance to get sensitivities of complex instruments quickly, enabling real time risk management and hedging of quantities like xVA.

Here we show some results from a paper which studied two typical codes arising in finance: Monte Carlo and PDEs. (3)

Monte Carlo | |||||
---|---|---|---|---|---|

$n$ | $f$ | cfd |
AD |
$\frac{\tt AD}{f}$ | $\frac{\tt cfd}{\tt AD}$ |

34 | 0.5s | 29.0s | 3.0s (2.5MB) | 6.0x | 9.7x |

62 | 0.7s | 80.9s | 5.1s (3.2MB) | 7.3x | 15.9x |

142 | 1.5s | 423.5s | 12.4s (5.1MB) | 8.3x | 34.2x |

222 | 2.3s | 1010.7s | 24.4s (7.1MB) | 10.6x | 41.4x |

PDE | |||||

34 | 0.6s | 37.7s | 11.6s (535MB) | 19.3x | 3.3x |

62 | 1.0s | 119.5s | 18.7s (919MB) | 18.7x | 6.4x |

142 | 2.6s | 741.2s | 39s (2GB) | 15.0x | 19x |

222 | 4.1s | 1857.3s | 60s (3GB) | 14.6x | 31x |

Table 1: Run times and memory requirements as a function of gradient size $n$ for Monte Carlo and PDE applications.

Table 1 shows the runtimes of a first-order adjoint code using `dco`

vs. central finite differences on a typical finance application (option pricing under local volatility model, 10K sample paths/spatial points and 360 time steps). The second column $f$ is the normal runtime of the application, ${\tt cfd}$ is the runtime for central finite differences and ${\tt AD}$ is adjoint AD runtime along with additional memory required (tape size). Calculations were run on a laptop so only the relative runtimes $\frac{\tt AD}{f}$ and $\frac{\tt cfd}{\tt AD}$ are important, the latter showing the speedup of AD over finite differences.

In finance such derivative information is often used for hedging and risk calculations, so these gradients must be computed many times per day.

(3) du Toit J and Naumann U (2014). Adjoint algorithmic differentiation tool support for typical numerical patterns in computational finance. *NAG Technical Report TR3/14*.

## Algorithmic Differentiation vs. Finite Differences

AD computes mathematical derivatives of a computer program. Conceptually the process is straightforward:

- Computers can only add, subtract, multiply and divide floating point numbers
- Any computer program (operating on floating point numbers) simply consists of many of these fundamental operations strung together, along with calls to special functions (e.g. sin, cos, exp, etc.)
- It is simple to compute the derivatives of all these basic operations
- We can use the chain rule to combine these basic derivatives to get the derivative of our computer program

This process of differentiating a code can be applied recursively to generate 2nd, 3rd, 4th or higher derivatives of any computer program. Since the code is differentiated symbolically (albeit at a very low level), the derivatives are mathematically exact and are computed to machine accuracy.

Traditionally derivative estimates are computed through finite differences (also called “bumping”). Compared with these, AD offers three clear advantages:

- Finite differences can be tricky to get right: some knowledge of the function is often required
- Higher order finite differences can be arbitrarily bad: AD will give exact gradients of any order
- Adjoint AD (see here for more details) can be orders of magnitude faster than bumping

## What does NAG Provide?

Although conceptually simple, AD (and ajoint AD in particular) is demanding to apply to a non-trivial code. When AD is used in production, high-performance, robust and flexible tools are essential to get the code running quickly and to keep it maintainable. In addition it is clear that external library dependencies break AD, since AD operates at source code level. A production AD tool must have a way to handle these dependencies, ideally by being applied to the underlying (3rd party) source to create a full AD solution.

**AD Services**

NAG in collaboration with RWTH Aachen provides consulting services, training, software and support to develop AD solutions for clients. NAG has delivered AD training to several large financial institutions to give them a solid grounding in the mathematical, computer science and computational aspects of AD, common pitfalls and techniques for working around them, as well as various tricks and optimizations. NAG has also helped to embed AD tools into large Tier 1 quantitative finance libraries and has helped overcome particular memory constraints in large adjoint codes.

**AD Tools**

NAG provides a best-in-class C++/Fortran operator-overloading AD tool called `dco`

(derivative computation through overloading). This has been used on production codes numbering in the millions of lines and running on platforms ranging from single threaded desktops to large supercomputers.

- The
`dco`

tool is developed jointly by NAG and RWTH Aachen, integrates easily with your code base and is extremely flexible - The flexibility is crucial since it allows the memory use of adjoint codes to be constrained almost arbitrarily
- It also allows a
`dco`

adjoint calculation to use accelerators such as GPGPUs

**NAG Fortran Compiler for AD**

For simulation programs written in Fortran a version of the NAG Fortran compiler has been extended to serve as a pre-processor to `dco`

. The seamless integration of AD into a complex build system is facilitated. Hence, the amount of modifications to be made by the user in the original source code can be minimized or even be eliminated entirely.

**AD Versions of Computational Algorithms**

NAG provides AD versions of computational algorithms such as those found in the NAG Library. These AD routines are compatible with `dco`

to provide maximal productivity, but versions can be provided that are compatible with any AD tool including hand-written AD code.

**How to try AD**

The principles of AD in the context of computational finance and techniques for avoiding memory problems in adjoint codes is discussed in NAG Technical Report TR3/14 (3). Readers of the technical report can request the example programs it is based on. The example programs include a copy of `dco`

and a trial licence so that readers can run the examples on their own machines, and apply `dco`

to their own codes.