This post is part of the AD Myths Debunked series.
You're curious about AD and its benefits, but you've heard that adjoint-mode AD (AAD) consumes a huge amount of memory and that you will run out fast... so you don't pursue the idea much further.
We've heard this objection many times over the years. When we ask people why they think memory consumption is so huge, we usually get a familiar narrative about the basic application of adjoint AD, and yes, a basic implementation can consume a lot of memory.
But is that always the case? Is it that difficult to keep memory usage down when using AD? Or is that a myth?
At NAG, we have applied our tool, dco/c++, to many quant libraries and have shown many times that it is very efficient in its memory usage. dco/c++ handles arbitrary C++ codes with ease: it imposes no restrictions on form or behaviour. The interface is flexible and provides all the building blocks that make it easy to integrate advanced AAD techniques and reduce memory usage almost arbitrarily. This means dco/c++ can easily handle Monte Carlo, PDE solvers, calibration codes, curve building, analytic and semi-analytic pricing of all flavours, root finders and fixed-point iterators, linear algebra and regression, while in each case allowing users to exploit structure to optimize memory use and runtime.
If you need to compute many sensitivities, AAD is the method of choice. It can be hugely faster than finite-difference methods (a.k.a. bumping). The performance of AAD is measured by the adjoint factor, the ratio (adjoint runtime) / (original runtime). With dco/c++, we observe adjoint factors between 1.1 and 20. This factor is independent of the number of sensitivities you need, which could be thousands or millions!
This seems like magic if you think about your model the way you usually do, from the inputs to the outputs, but AAD works the other way around: it runs from the outputs to the inputs. To be able to do that and get these immense speedups, the AD tool needs to reverse the flow of information in your application. Basically, this relies on recording the computation, i.e. holding an image of your application and of all its operations in memory, and stepping through that recording in reverse order. dco/c++ uses optimized data structures, in effect a heavily compressed image, by default. We've compared dco/c++ to other tools (see the chart below). The vertical axis shows memory use relative to dco/c++, while the horizontal axis is a series of small test codes. The most interesting data points are perhaps "comiso" (a code with a very large amount of floating-point computations), "libor" (Mike Giles' Libor Monte Carlo code), "Monte Carlo" (a mockup of a 1D local volatility model using Padé approximants rather than cubic splines) and "1D PDE" (a Crank–Nicolson scheme to solve the same local volatility model). The behaviour of AD tools depends quite strongly on the specifics of the code they are applied to.
On top of the optimized data structures inside dco/c++, yet more advanced techniques can be used to constrain memory usage almost arbitrarily. These techniques include checkpointing, efficient treatment of mathematical algorithms (e.g. symbolic adjoints), exploiting concurrency (e.g. pathwise adjoints), pre-accumulation strategies and much more. We collected these common techniques in so-called adjoint code design patterns. Indeed, dco/c++ views adjoint AD as computation on a graph, and the interface allows users to "play" with this graph to exploit its structure. With the help of our experience and dco/c++, these optimizations have been applied to the libraries of many clients.
NAG's AD toolset has been developed over the last 12 years, building upon a further 10 years of AD R&D experience in C++. We know how to reduce the memory footprint of AAD to a feasible level.
Myths are narratives that can sound like truths. By talking through them in some detail and sharing our experiences, we hope to help businesses navigate these issues. Results should matter; myths should not.