Use of Algorithmic Differentiation (AD) leads to more efficient sampling for Hamiltonian Monte Carlo (HMC)

NAG’s AD tool dco/c++, jointly developed with RWTH Aachen University, differentiates C++ code: it can handle any C++ code whose implemented function has well-defined derivatives. Combining dco/c++ with the open-source Stan software for HMC creates new opportunities for Bayesian uncertainty quantification on existing C++ code bases. The developer effort required to get started is minimal, and the advanced functionality of dco/c++ can be used to tune performance. MCMC-based uncertainty quantification has long been seen as too computationally expensive for many real-world problems, but this is changing, and our AD tool dco/c++ has an important part to play in that shift.
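
As a rough illustration of the developer effort involved, the sketch below computes the gradient of a toy log-density with dco/c++'s first-order adjoint (reverse) mode. It follows the ga1s interface shown in NAG's documentation, but exact type and function names may differ between dco/c++ versions, so treat it as an assumption-laden outline rather than verbatim product code. The key point is the first line of the model: the code is templated on the scalar type, so the same source runs with plain double or with an AD type.

```cpp
#include <iostream>
#include "dco.hpp"  // dco/c++ header (commercial product; name taken from NAG docs)

// The model is written as a template on the scalar type, so the same source
// can be evaluated with plain double or with an AD type.
template <typename T>
T log_density(const T& x) {
    // Toy example: unnormalised log-density of a standard normal.
    return -0.5 * x * x;
}

int main() {
    using mode = dco::ga1s<double>;  // first-order adjoint mode over double
    using ad_t = mode::type;

    // Create the tape that records the forward computation.
    mode::global_tape = mode::tape_t::create();

    ad_t x = 1.5;
    mode::global_tape->register_variable(x);  // mark x as an input

    ad_t lp = log_density(x);

    // Seed the output adjoint and propagate derivatives backwards.
    dco::derivative(lp) = 1.0;
    mode::global_tape->interpret_adjoint();

    std::cout << "d(log p)/dx = " << dco::derivative(x) << "\n";  // expect -1.5

    mode::tape_t::remove(mode::global_tape);
    return 0;
}
```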

Results

The specific finding in our paper was that combining AD with HMC led to effective sample sizes that were larger by a factor of 3–4 than those of alternative methods. The raw sample size in MCMC is simply the number of iterations the sampler runs for (1,000 in the example below) once the burn-in phase is finished. The effective sample size adjusts the raw sample size to account for the auto-correlation between samples from the same MCMC chain. For example, a sampler with a higher CPU time may nevertheless generate samples with less auto-correlation, and therefore give more accurate estimates of the posterior distribution. For this reason, the effective sample size is a more important metric than CPU time when measuring the efficiency of MCMC samplers. In particular, we found that the No-U-Turn Sampler (an HMC sampler) generated samples with lower auto-correlation than the other sampler we tested, and that increasing the accuracy of the derivatives used in HMC resulted in lower auto-correlation, i.e., higher effective sample sizes. See the table below.

[Table: effective sample sizes for the samplers tested; see the paper for full results]
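
To make the effective sample size concrete: a common estimator divides the raw sample size N by a factor involving the chain's lag autocorrelations, ESS = N / (1 + 2 Σ ρ_k). The self-contained sketch below implements a simplified version of this estimator, truncating the sum at the first non-positive autocorrelation (a crude variant of Geyer's initial sequence rule). It is illustrative only and not the more sophisticated estimator Stan itself uses.

```cpp
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Simplified effective sample size: ESS = N / (1 + 2 * sum of positive-lag
// autocorrelations), with the sum truncated at the first non-positive term.
double effective_sample_size(const std::vector<double>& chain) {
    const std::size_t n = chain.size();
    const double mean = std::accumulate(chain.begin(), chain.end(), 0.0) / n;

    double var = 0.0;
    for (double x : chain) var += (x - mean) * (x - mean);
    var /= n;

    double rho_sum = 0.0;
    for (std::size_t lag = 1; lag < n; ++lag) {
        double acov = 0.0;
        for (std::size_t t = 0; t + lag < n; ++t)
            acov += (chain[t] - mean) * (chain[t + lag] - mean);
        const double rho = acov / (n * var);
        if (rho <= 0.0) break;  // stop once autocorrelation dies out
        rho_sum += rho;
    }
    return n / (1.0 + 2.0 * rho_sum);
}

int main() {
    // An AR(1) chain with coefficient 0.9: high autocorrelation, so the
    // effective sample size is far below the raw sample size of 1,000.
    std::vector<double> chain(1000);
    double x = 0.0;
    unsigned seed = 12345u;
    for (double& v : chain) {
        seed = 1664525u * seed + 1013904223u;             // tiny LCG
        const double u = (seed >> 8) / double(1u << 24);  // uniform in [0,1)
        x = 0.9 * x + (u - 0.5);                          // AR(1) update
        v = x;
    }
    std::cout << "raw N = 1000, ESS ~ " << effective_sample_size(chain) << "\n";
}
```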

The example used to generate these results had a relatively small number of inputs (five). The CPU time of adjoint AD is essentially independent of the number of inputs: one reverse sweep yields the whole gradient, whereas finite differences need one extra function evaluation per input. This means the CPU time could be faster by an order of magnitude for problems with high-dimensional inputs when dco/c++ is used instead of finite differences. Only the basic features of dco/c++ were used in this benchmark; there is scope for reducing CPU time further by using more advanced dco/c++ features, e.g., symbolic adjoints for eigenvalue computations.
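
That scaling argument can be seen directly in code. A one-sided finite-difference gradient needs n + 1 evaluations of the function for n inputs, whereas a single adjoint sweep (as in the dco/c++ sketch above) yields all n partial derivatives at a small constant multiple of the cost of one function evaluation. The sketch below shows the finite-difference side of the comparison; the function f is a hypothetical stand-in for an arbitrary log-density.

```cpp
#include <iostream>
#include <vector>

// Stand-in for a log-density with n inputs (here, a standard normal).
double f(const std::vector<double>& x) {
    double s = 0.0;
    for (double xi : x) s += -0.5 * xi * xi;
    return s;
}

// One-sided finite-difference gradient: n + 1 calls to f, so the cost grows
// linearly with the input dimension. An adjoint-mode AD tool delivers the
// same gradient from one taped evaluation plus one reverse sweep, at a cost
// that is a small constant multiple of a single call to f.
std::vector<double> fd_gradient(const std::vector<double>& x, double h = 1e-6) {
    const double f0 = f(x);  // one baseline evaluation
    std::vector<double> g(x.size());
    std::vector<double> xp = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xp[i] += h;
        g[i] = (f(xp) - f0) / h;  // one extra evaluation per input
        xp[i] = x[i];
    }
    return g;
}

int main() {
    std::vector<double> x = {0.1, 0.2, 0.3, 0.4, 0.5};  // five inputs, as in the paper
    std::vector<double> g = fd_gradient(x);
    for (std::size_t i = 0; i < g.size(); ++i)
        std::cout << "df/dx" << i << " ~ " << g[i] << "\n";  // expect -x[i]
}
```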

If you have a C++ code base and need to do Bayesian uncertainty quantification, take a look at our software. The approach could be extended from MCMC to variational inference, and from C++ code bases to Fortran code bases. Feedback is welcome from the open-source community and from potential industrial users.
