Algorithmic Differentiation Tools & Adjoint Solvers
Algorithmic Differentiation (AD) is a novel way to differentiate computer code, i.e. to compute ∂F/∂s1, ∂F/∂x2,... where F is given by a computer program, e.g.
if(x1<x2) then F = x1*x1 + x2*x2 + exp(x3*x3) + ... else F = x1 + x2 + x3 + ... endif
The derivatives are mathematically exact and computed to machine precision. Traditionally, derivative estimates would be obtained by finite differences (also called "bump and revalue"), which is slow and can be inaccurate. In contrast, AD (and adjoint AD in particular - AAD) gives precise derivative information at a fraction of the cost. NAG and expert partners provide a variety of services and software for Algorithmic Differentiation including consulting, training courses, software tools and the embedding of these in client applications plus a compiler and algorithms for AD.
What does NAG Provide?
Although conceptually simple, AD (and ajoint/ AAD in particular) is demanding to apply to a non-trivial code. When AD is used in production, high-performance, robust and flexible tools are essential to get the code running quickly and to keep it maintainable. In addition it is clear that external library dependencies break AD, since AD operates at source code level. A production AD tool must have a way to handle these dependencies, ideally by being applied to the underlying (3rd party) source to create a full AD solution.
NAG provides a best-in-class C++/Fortran operator-overloading AD tool called dco (derivative computation through overloading). This has been used on production codes numbering in the millions of lines and running on platforms ranging from single threaded desktops to large supercomputers.
- The dco tool is developed jointly by NAG and RWTH Aachen, integrates easily with your code base and is extremely flexible
- The flexibility is crucial since it allows the memory use of adjoint codes to be constrained almost arbitrarily
- It also allows a dco adjoint calculation to use accelerators such as GPGPUs
NAG provides AD versions of mathematical and statistical algorithms such as those found in the NAG Library. NAG delivers AD solvers that may be used with NAG's AD tool as well as Adjoint Solvers that will work with third party AD solvers and handwritten adjoint codes.
NAG in partnership with RWTH Aachen University provides consulting services, training, software and support to develop AD solutions for clients. NAG has delivered AD training to several large financial institutions to give them a solid grounding in the mathematical, computer science and computational aspects of AD, common pitfalls and techniques for working around them, as well as various tricks and optimizations. NAG has also helped to embed AD tools into large Tier 1 quantitative finance libraries and has helped overcome particular memory constraints in large adjoint codes.
NAG Fortran Compiler for AD
For simulation programs written in Fortran a version of the NAG Fortran compiler has been extended to serve as a pre-processor to dco. The seamless integration of AD into a complex build system is facilitated. Hence, the amount of modifications to be made by the user in the original source code can be minimized or even be eliminated entirely.
How to try AD
The principles of AD in the context of computational finance and techniques for avoiding memory problems in adjoint codes is discussed in NAG Technical Report TR3/14 (3). Readers of the technical report can request the example programs it is based on. The example programs include a copy of dco and a trial licence so that readers can run the examples on their own machines, and apply dco to their own codes.
Here are some examples of what adjoint AD has enabled in various domains.
Figure 1: Sensitivity of drag coefficient to the surface geometry of a car at high speed (left) and low speed (right)
AD enables sensitivity analyses of huge simulations, enabling shape optimization, intelligent design and and comprehensive risk studies.
Figure 1 shows sensitivities of the drag coefficient to each point on a car's surface when it moves at high speed (left) and low speed (right). The simulation was performed with AD-enabled OpenFOAM built on top of the AD tool dco (See below for information about dco). The surface mesh had 5.5 million cells and the gradient vector was 18GB. (1)
The normal simulation on a top-end desktop took 44s, while the AD-enabled simulation took 273s. To obtain the same gradient information, on the same machine, by finite differences would take roughly 5 years.
The gradient information can now be used to optimize the shape of the car so as to reduce the drag.
(1) Towara M and Naumann U (2013). A discrete adjoint model for OpenFOAM. Procedia Comp. Sci. Volume 18.
Figure 2: MITgcm sensitivities of zonal ocean water flow through the Drake Passage to changes in bottom topography.
AD helps our understanding of climate change and improves weather predictions.
Figure 2 shows the sensitivity of the amount of water flowing through the Drake passage to changes in the topography of the ocean floor. The simulation was performed with the AD-enabled MIT Global Circulation Model (MITgcm) run on a supercomputer. The ocean was meshed with 64,800 grid points. (2)
Obtaining the gradient through finite differences took a month and a half. The adjoint AD code obtained the gradient in less than 10 minutes.
The gradient information can be used to further refine climate prediction models and our understanding of global weather, for example the high sensitivity over the South Pacific Ridge and Indonesian Throughflows even though these are far away from the Drake Passage.
(2) Utke J, Naumann U, Wunsch C, Hill C, Heimbach P, Fagan M, Tallent N and Strout M. (2008). OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes. ACM Trans. Math. Softw, 34(4) 18:1-18:36.
AD and AAD is used in finance to get sensitivities of complex instruments quickly, enabling real time risk management and hedging of quantities like xVA.
Here we show some results from a paper which studied two typical codes arising in finance: Monte Carlo and PDEs. (3)
Table 1: Run times and memory requirements as a function of gradient size n for Monte Carlo and PDE applications.
Table 1 shows the runtimes of a first-order adjoint code using dco vs. central finite differences on a typical finance application (option pricing under local volatility model, 10K sample paths/spatial points and 360 time steps). The second column f is the normal runtime of the application, cfd is the runtime for central finite differences and AD is adjoint AD runtime along with additional memory required (tape size). Calculations were run on a laptop so only the relative runtimes AD/f and cfd/AD are important, the latter showing the speedup of AD over finite differences.
In finance such derivative information is often used for hedging and risk calculations, so these gradients must be computed many times per day.
AD computes mathematical derivatives of a computer program. Conceptually the process is straightforward:
- Computers can only add, subtract, multiply and divide floating point numbers
- Any computer program (operating on floating point numbers) simply consists of many of these fundamental operations strung together, along with calls to special functions (e.g. sin, cos, exp, etc.)
- It is simple to compute the derivatives of all these basic operations
- We can use the chain rule to combine these basic derivatives to get the derivative of our computer program
This process of differentiating a code can be applied recursively to generate 2nd, 3rd, 4th or higher derivatives of any computer program. Since the code is differentiated symbolically (albeit at a very low level), the derivatives are mathematically exact and are computed to machine accuracy.
Traditionally derivative estimates are computed through finite differences (also called “bumping”). Compared with these, AD offers three clear advantages:
- Finite differences can be tricky to get right: some knowledge of the function is often required
- Higher order finite differences can be arbitrarily bad: AD will give exact gradients of any order
- Adjoint AD (see here for more details) can be orders of magnitude faster than bumping