Algorithmic Differentiation Masterclass Series: Advanced Adjoint Techniques
Thu 1st October 2020 - Thu 26th November 2020
Online Webinar Series

Following the success of the summer AD Masterclass series, NAG is delighted to present a follow-up Masterclass series on Advanced Adjoint Techniques. Building on the material of the first series, we look at checkpointing and symbolic adjoints, advanced AD for Machine Learning, two webinars on Monte Carlo, and finally second-order sensitivities. The webinar series is open to all. If you'd like access to the material from the first series, please contact us to obtain login details.

1st October 2020 - Masterclass 1: Checkpointing and external functions: Manipulating the DAG.
Dr Viktor Mosenkis
We explore how to manipulate the DAG to move beyond a straightforward algorithmic adjoint. We look at how to make "gaps" in the tape by passivating parts of a calculation, and then how to fill those gaps in the tape interpretation with user-provided external (callback) functions. This is the most basic form of memory management and shows how to trade re-computation for storage. This process is called checkpointing, and we briefly present various checkpointing approaches.
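The core idea can be sketched in a few lines. The following is a minimal, purely illustrative reverse-mode tape in Python: the `Tape` class, its method names and the `external` interface are hypothetical and do not reflect the API of any NAG tool. An external function leaves a gap in the tape (only its inputs, the checkpoint, are stored), and the user-supplied adjoint callback fills that gap during the reverse sweep, recomputing what it needs:

```python
import math

class Tape:
    """A minimal reverse-mode tape (hypothetical, for illustration only)."""
    def __init__(self):
        self.values = []   # primal values
        self.ops = []      # recorded operations, replayed in reverse

    def var(self, v):
        self.values.append(v)
        return len(self.values) - 1

    def mul(self, a, b):
        # an ordinary taped operation: the local partials are stored
        i = self.var(self.values[a] * self.values[b])
        self.ops.append(("taped", i, [(a, self.values[b]), (b, self.values[a])]))
        return i

    def external(self, inputs, primal, adjoint):
        # a "gap" in the tape: primal's internals are passive (not recorded);
        # only the checkpoint (the inputs) is kept, and the user-provided
        # adjoint callback bridges the gap during the reverse sweep,
        # recomputing whatever it needs -- recomputation traded for storage
        x = [self.values[i] for i in inputs]
        i = self.var(primal(x))
        self.ops.append(("external", i, inputs, adjoint, x))
        return i

    def gradient(self, out):
        adj = [0.0] * len(self.values)
        adj[out] = 1.0
        for op in reversed(self.ops):
            if op[0] == "taped":
                _, o, deps = op
                for i, partial in deps:
                    adj[i] += partial * adj[o]
            else:  # external: call back into user code to fill the gap
                _, o, inputs, adjoint, x = op
                for i, g in zip(inputs, adjoint(x, adj[o])):
                    adj[i] += g
        return adj

# y = x * sin(x), with sin(x) computed passively as an external function
tape = Tape()
x = tape.var(1.0)
s = tape.external([x], lambda v: math.sin(v[0]),
                  lambda v, ybar: [math.cos(v[0]) * ybar])  # recomputes the local derivative
y = tape.mul(x, s)
grad = tape.gradient(y)  # grad[x] == sin(1) + cos(1)
```

Note that nothing from the interior of the external `sin` computation ends up on the tape; the callback recomputes its derivative from the checkpointed inputs during interpretation, which is exactly the recomputation-for-storage trade the masterclass explores.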

15th October 2020 - Masterclass 2: Checkpointing and external functions: Injecting symbolic information.
Dr Viktor Mosenkis
Now that we can make gaps in the tape, we look at various ways one might fill those gaps. Sometimes we can use symbolic information to derive an efficient adjoint for a particular section of code. We look at linear algebra, root finding and unconstrained optimisation, and present synthetic examples of these. We then look at some implementation issues that need particular attention when the code has more than one output: external function callbacks will be called more than once, and memory management requires some care.
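For a flavour of the root-finding case, here is a small sketch (function names are illustrative, not from any NAG library). The Newton iteration that solves f(x, p) = x² − p = 0 stays passive; its adjoint comes symbolically from the implicit function theorem, p̄ = −x̄ (∂f/∂p)/(∂f/∂x), so no iteration ever touches the tape:

```python
def newton_sqrt(p, tol=1e-12):
    # primal: Newton iteration for f(x, p) = x*x - p = 0;
    # the iteration itself is passive (never recorded on a tape)
    x = max(p, 1.0)
    while abs(x * x - p) > tol:
        x = x - (x * x - p) / (2.0 * x)
    return x

def newton_sqrt_adjoint(p, x, xbar):
    # symbolic adjoint via the implicit function theorem:
    # f(x, p) = 0  =>  pbar = -xbar * (df/dp) / (df/dx)
    dfdx = 2.0 * x   # df/dx = 2x at the converged solution
    dfdp = -1.0      # df/dp = -1
    return -xbar * dfdp / dfdx

x_star = newton_sqrt(4.0)                        # converges to 2.0
pbar = newton_sqrt_adjoint(4.0, x_star, 1.0)     # 1/(2*sqrt(p)) = 0.25 here
```

The adjoint costs a single linear solve at the converged point, however many Newton steps the primal took, which is why injecting symbolic information into a tape gap can be far cheaper than differentiating through the iteration.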

29th October 2020 - Masterclass 3: Guest Lecture by Professor Uwe Naumann on Advanced AD topics in Machine Learning
Machine learning has been made possible thanks to backpropagation (adjoint AD). In a nutshell, this allows optimisers to find accurate descent directions, which greatly helps the solvers to find (locally) optimal weights for a given network. However, this is by no means the only application of AD to machine learning. Network pruning uses AD to optimally remove nodes from a neural net, sometimes achieving quite dramatic compression without losing accuracy. Significance analysis uses AD to guide the sampling of training data, so that a model can be trained to the same accuracy but with less input data. We also consider a more general outlook where networks move beyond the simple forms that are used today, and start including more general information about the systems they attempt to model.

12th November 2020 - Masterclass 4: Monte Carlo.
Dr Viktor Mosenkis
We apply the machinery we've developed to look at Monte Carlo (we will not cover "American" Monte Carlo, but we could - please get in touch). By exploiting independence of the sample paths, we show how to constrain memory use and get better cache efficiency. We show how to re-use parallelism in the primal code and talk about handling the inherent race conditions. We also look at vectorisation, and close with a short review of smoothing techniques. Note this is a review only, with pointers to the literature. Effective smoothing is problem-dependent.
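As a simple illustration of the path-independence point, here is a hedged sketch (illustrative only; the function name and parameters are our own, and the payoff's kink means the pathwise derivative holds only almost everywhere). Because the sample paths are independent, each path's adjoint contribution can be accumulated immediately and its per-path "tape" discarded, so memory stays constant in the number of paths. The example estimates the vega of a GBM European call pathwise:

```python
import math, random

def pathwise_vega(s0, k, r, sigma, t, n_paths, seed=0):
    # pathwise adjoint for a GBM European call: paths are independent,
    # so each path's sensitivity is accumulated on the fly and no
    # per-path intermediate state is retained -- memory is O(1) in n_paths
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        st = s0 * math.exp((r - 0.5 * sigma * sigma) * t
                           + sigma * math.sqrt(t) * z)
        # d(discounted payoff)/d(sigma) = 1{st > k} * dst/dsigma (a.e.)
        dst_dsigma = st * (math.sqrt(t) * z - sigma * t)
        if st > k:
            acc += math.exp(-r * t) * dst_dsigma
    return acc / n_paths

# should be close to the Black-Scholes vega (about 37.5 for these inputs)
vega_est = pathwise_vega(100.0, 100.0, 0.05, 0.2, 1.0, 200000)
```

The per-path loop body is also exactly the unit one would parallelise or vectorise, with the accumulation into `acc` being the shared-state race that needs handling in a multithreaded primal.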

19th November 2020 - Masterclass 5: Guest Lecture by Professor Uwe Naumann on Adjoint Code Design Patterns applied to Monte Carlo
Producing an efficient adjoint of a non-trivial simulation program is a significant amount of work. Good tools help tremendously, but substantial user intervention is still often required. Many simulation codes share certain features. The goal of Adjoint Code Design Patterns is to exploit these features/patterns to reduce the effort required to create efficient adjoints. This talk focuses on Monte Carlo adjoints, but we will also look at patterns arising from the implicit function theorem as well as checkpointing.

26th November 2020 - Masterclass 6: Computing Hessians.
Dr Viktor Mosenkis
We present the second-order adjoint and tangent models and examine the four possible ways in which one can compute Hessians. We examine the "forward over reverse" model in some detail, since it has many implementation benefits: it allows us to re-use a first-order adjoint code. This has implications for external functions. We look at computing parts of a Hessian, both accurately and approximately, before looking at sparsity.
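The re-use point can be sketched in miniature (illustrative only; the `Dual` class and function names are our own). A hand-written first-order adjoint code, kept generic in its arithmetic type, is re-run on forward-mode dual numbers: pushing a tangent direction v through the gradient code yields the Hessian-vector product H·v in a single pass, which is precisely the "forward over reverse" combination:

```python
class Dual:
    # minimal forward-mode dual number: a value and a tangent
    def __init__(self, v, t=0.0):
        self.v, self.t = v, t
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.t + o.t)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.t * o.v + self.v * o.t)
    __rmul__ = __mul__

def grad_f(x):
    # hand-written first-order adjoint ("reverse") code for
    # f(x0, x1) = x0^2 * x1 + x1^3, written generically so that it
    # also accepts Dual numbers unchanged
    x0, x1 = x
    return [2 * x0 * x1, x0 * x0 + 3 * x1 * x1]

def hessian_vector(x, v):
    # forward over reverse: seed the inputs with tangent direction v,
    # re-run the adjoint code, and read off H @ v from the tangents
    xd = [Dual(x[0], v[0]), Dual(x[1], v[1])]
    return [g.t for g in grad_f(xd)]

# H = [[2*x1, 2*x0], [2*x0, 6*x1]], so at x = (1, 2): H @ (1, 0) = (4, 2)
hv = hessian_vector([1.0, 2.0], [1.0, 0.0])
```

No second-order code is written by hand: the existing adjoint is simply evaluated in a forward-mode arithmetic, which is the implementation benefit highlighted above, and n such products with unit vectors recover a full n-by-n Hessian column by column.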