Access the most advanced AD Software in the world

Algorithmic Differentiation (AD) and Adjoint AD (AAD) are extremely powerful technologies. Applying them by hand to production-sized codes is a serious, lengthy undertaking that requires a team of specialists, and it makes code maintenance and updating more expensive and complex. For this reason, most practitioners have turned to AD tools to get sensitivities of their simulation codes.

NAG AD Software Portfolio: dco/c++, dco/map & NAG AD Library

NAG's AD Software is based on over 15 person-years of research. Organizations using these tools are reaping the extensive benefits of implementing AD methods in their computations. NAG AD Software is battle-proven, at scale, in business-critical applications.

dco/c++

(A)AD software tool for computing sensitivities of C++ codes 

  • It embodies over 15 person-years of R&D, much of which required original research
  • It's an operator-overloading tool with a slick API: the tool is easy to learn, easy to use, can be applied quickly to a code base and integrates easily with build and testing frameworks
  • Arbitrary order derivatives of any code can be computed, accurate to machine precision: the tool can answer all your sensitivity-related questions
  • Tier 1 and 2 banks have applied it to their core pricing and risk libraries and use it in production: the tool has been battle proven, at scale, in business critical applications
  • Both customer and in-house testing show that dco/c++ offers best-in-class performance, thanks to an advanced template engine and highly optimized internal data structures
  • Baseline memory use is low and the intuitive checkpointing interface allows memory use to be further controlled and constrained almost arbitrarily: the success of adjoint AD lies in balancing memory use and computation, and dco/c++ gives the user full control in a very natural way
  • The checkpointing interface allows handwritten adjoints to be specified for any part of the code, allows interfacing with GPUs, and much more: as users learn more about AD, dco/c++ allows them to implement all the tricks that people have developed to handle particular code patterns
  • It supports parallel adjoints: on modern architectures exploiting parallelism effectively is crucial, and dco/c++ allows the parallelism to be carried over into the adjoint code as well

dco/c++ Key Features

  • Code generation: A hybrid technology that combines the efficiency of source transformation with the flexibility and ease of use of an operator overloading tool
  • User-defined tape callbacks (external adjoints)
  • Slick, productivity-orientated interface
  • Very fast computation with expression templates and highly optimized tape
  • Full control over memory use
  • Supports parallelism and GPUs (in combination with dco/map)
  • Vector tangent and adjoint modes
  • Activity analysis
  • Sparsity pattern detection
  • Tape compression
  • Direct tape manipulation
  • Adjoint MPI support  

These features enable advanced DAG manipulation and let users create highly efficient adjoint implementations.
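
To give a flavour of the operator-overloading workflow, here is a minimal first-order scalar adjoint driver, sketched along the lines of NAG's published dco/c++ examples (a sketch only; exact type and function names may differ between dco/c++ versions):

```cpp
#include <dco.hpp>
#include <iostream>

int main() {
    using mode    = dco::ga1s<double>;  // first-order scalar adjoint mode
    using adouble = mode::type;         // active data type replacing double

    mode::global_tape = mode::tape_t::create();   // create the global tape

    adouble x = 3.0;
    mode::global_tape->register_variable(x);      // mark x as an input

    adouble y = sin(x) * x;                       // the computation to differentiate

    mode::global_tape->register_output_variable(y);
    dco::derivative(y) = 1.0;                     // seed the output adjoint
    mode::global_tape->interpret_adjoint();       // propagate adjoints backwards

    std::cout << "dy/dx = " << dco::derivative(x) << "\n";  // adjoint of x

    mode::tape_t::remove(mode::global_tape);      // release the tape
    return 0;
}
```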

dco/map

A tape-free, operator-overloading AD tool for C++11, designed to handle accelerators (GPUs, etc.)

  • dco/map combines the benefits of operator overloading (easy application, one single source code) with the benefits of handwritten adjoints (very fast and minimal memory use)
  • dco/map can handle the race conditions inherent in parallel adjoint codes easily and efficiently
  • First and second order tangent and adjoint
  • Produces single unified code for primal, tangent and adjoint
  • Thread safe by design: high performance array and scalar types for shared input data
  • Primal code runs as fast as the equivalent non-dco/map primal
  • Specialised high-performance array types to handle the race conditions inherent in parallel adjoints
  • Supports the whole of C++11; cross-platform
  • API for storing things you don't want to recompute
  • Easy integration with NAG’s dco/c++ via external adjoint interface

NAG AD Library

World-class adjoint numerical and statistical solvers

  • The NAG AD Library integrates seamlessly with NAG's AD tool, dco/c++, and can also be used with any other AD solution
  • Delivers derivatives that are exact to machine precision, rather than the truncated approximations obtained from finite differences (see the note after this list)
  • NAG Library users who apply AD can now use high-quality adjoint routines from NAG:
    • No need to write adjoint versions of these routines or resort to inferior replacement
  • No specific product dependency: the NAG AD Library can be used with any AD tool
  • Easy switch between symbolic and algorithmic adjoints
    • Same interface for symbolic and algorithmic adjoints, making switching quick and easy
  • NAG provides a single AD solution when utilizing both dco/c++ and the NAG AD Library
  • Use of internal representation (IR) to reverse the routine call tree
  • Differentiate output variables with respect to a specified set of input variables only
  • Adjoints need only be interpreted once per output variable, which reduces computation time when the number of outputs is small relative to the number of inputs (the usual case)
  • Adjoint routines can be used as intrinsics in dco/c++ (just like using sin(x) in C++)
  • Algorithmic adjoints of routines with user-supplied functions can be used without additional development time
  • Smooth transition from a non-dco/c++ solution to a solution with dco/c++
  • No need to copy variables when used with dco/c++ (binary compatible data types)
  • The NAG AD Library is fully documented, maintained and supported by computational experts: receive first-line technical support as and when needed
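
A note on the finite-difference comparison above (standard numerical analysis, not specific to the NAG AD Library): a one-sided finite difference with step size $h$ satisfies

$$\frac{f(x + h\,e_i) - f(x)}{h} \;=\; \frac{\partial f}{\partial x_i}(x) + \mathcal{O}(h),$$

so its accuracy is limited by the trade-off between truncation error (large $h$) and round-off error (small $h$), typically costing about half the significant digits. AD instead applies the chain rule to the operations in the code itself, so the derivatives it returns are exact up to machine precision.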

Custom AD Software Solutions

dco/c++ is a very efficient, high-productivity AD tool. However, it is sometimes desirable to produce the AD implementation in a different way, often to handle computationally expensive sections of code. Typically this means hand-writing an adjoint implementation, using a high-performance tool such as dco/map, or porting the AD code to a GPU.

NAG AD Solution Services can assist in all these cases, whether that is writing adjoints for particular pieces of code or porting code to GPUs and producing GPU adjoint implementations.

AD Software Benefits
Easy to learn and easy to use

dco/c++ is an operator-overloading tool with a slick API: the tool is easy to learn, easy to use, can be applied quickly to a code base and integrates easily with build and testing frameworks

Battle proven, at scale

Tier 1 and 2 banks have applied it to their core pricing and risk libraries and use it in production: the tool has been battle proven, at scale, in business critical applications

Sensitivity-related questions answered

Arbitrary order derivatives of any code can be computed, accurate to machine precision: the tool can answer all your sensitivity-related questions

Interfacing with GPUs, and more

The checkpointing interface allows handwritten adjoints to be specified for any part of the code, allows interfacing with GPUs, and much more: as users progress, dco/c++ allows them to implement the tricks that have been developed to handle particular code patterns.

DekaBank improves modelling accuracy and risk management with NAG® dco/c++

DekaBank wanted better risk management, more accurate pricing and to support the bank’s expanding derivatives business, all without increasing computing costs. That’s when they turned to automatic differentiation (AD), and in particular adjoint automatic differentiation (AAD). After comparing three tools, DekaBank chose NAG’s AD solution, NAG® dco/c++.

Software Details
dco/c++
dco/map
NAG AD Library

What's new in dco/c++ 4.1

  • New data type 'dco::multi_mode': This release brings a new data type that can combine arbitrary dco/c++ functionality in one go. It is intended to reduce the number of required instantiations of the underlying code. This reduces the complexity of the build system and leads to an expected reduction in compilation times. With this type, you can, e.g., record a tape and, at the same time, propagate vector tangents.
  • API update for Jacobian preaccumulator and the local gradient functionality (factory methods).
  • More precise information on memory consumption enabled by new enum (`dco::size_of_e`).
  • Enhanced debugging capabilities: besides writing the tape in CSV or DOT format, the tape can now be printed to a stream in a human-readable format.

What's new in dco/c++ 4.0

  • Code generation: A hybrid technology that combines the efficiency of source transformation with the flexibility and ease of use of an operator overloading tool, with support for primal, tangent and adjoint code. This supports all combinations of scalar and vector modes for computing first and higher derivatives.
  • The use of source transformation gives two advantages, one in the transformation step and one in the compilation step. In the transformation step, optimisations can be implemented based on properties of the underlying differentiation rules. In the compilation step, the built-in optimisation passes of steadily advancing compilers are simply inherited. The usual showstoppers for source transformation are applicability and maintainability.
  • dco/c++ 4.0 uses overloading techniques to generate a representation of the program in memory, and unparse the various modes (primal, tangent, adjoint) into a C++ file. The dynamic nature of this approach (building the representation at run time) introduces an important constraint on the code to be differentiated - the control flow is not allowed to depend on input data. We overcome this constraint for branches with smart use of modern C++ features such as lambda expressions, in combination with classical elements from the preprocessor.
  • Support for std::ldexp and std::frexp.
  • Aligned memory allocation by default. In some circumstances, this leads to more efficient memory access for vectorized operations.
  • Requires C++17 now.

What's new in dco/c++ 3.8

  • Modern tape interface based on a smart pointer implementation.
    This release comes with a new data type: 'smart_tape_ptr_t'. It consolidates the interface for global (dco::ga1s) and multiple tapes (dco::ga1sm). Additionally, this change makes the interface exception safe and avoids the explicit use of the 'remove' function. Support for the previous API is fully maintained.
  • Performance enhancement for the chunk tape.
    The performance of the chunk tape has been increased by 15-50%; the exact speed-up depends on the underlying code used in the benchmarks. The performance gap between the blob and chunk tape implementations is thereby reduced and, for some of the benchmarks, removed completely.

What's new in dco/c++ 3.7

  • New data type dco::gtas:
    This release brings a new data type for debugging and unit testing. It makes it possible to compute tangents and adjoints at the same time, which can be used to check the tangent-adjoint identity easily.
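
For reference, the identity being checked is the standard first-order consistency test: for $y = f(x)$ with Jacobian $J$, tangents $\dot{y} = J\,\dot{x}$ and adjoints $\bar{x} = J^{\mathsf T}\,\bar{y}$ must satisfy

$$\bar{y}^{\mathsf T}\,\dot{y} \;=\; \bar{y}^{\mathsf T} J\,\dot{x} \;=\; \bar{x}^{\mathsf T}\,\dot{x}$$

for arbitrary seed vectors $\dot{x}$ and $\bar{y}$, which gives a cheap way to verify that the tangent and adjoint implementations agree.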

What's new in dco/c++ 3.5 and 3.6

  • User-defined tape callbacks with C++11 lambdas:
    The user can make use of C++11 lambdas and other callables when passing a user-defined tape callback to dco/c++. This greatly enhances ease of use for checkpointing and symbolic adjoint implementations.
  • Compatibility with the NAG AD Library:
    Compatibility between upcoming NAG AD Library versions and dco/c++ versions is now guaranteed, meaning the user can update the NAG Library and dco/c++ independently. (This applies to NAG Library versions >= MK27.3 and dco/c++ >= v3.5.)
  • Enhanced performance:
    Vector modes and the adjoint vector have been updated, increasing performance by 10-20% under specific circumstances. Depending on the compiler, the auto-vectorizer previously did a poor job; this update makes it easier for the compiler to spot vectorizable loops.

What's new in dco/c++ 3.4

  • Vector Type (support for vectorization):
    The dco/c++ data type gv<DCO_BASE_TYPE, VECTOR_SIZE>::type implements a vector data type primarily useful for SSE/AVX vectorization. This type can then be used as the base type for primal, tangent and adjoint dco/c++ types (see the sketch after this list).
  • Binary Compatible Passive Type:
    This type can be used to turn parts of an adjoint computation passive, i.e. no tape activity is performed for objects of this type. Declaring a gbcp type of an adjoint type results in equal type sizes (=binary compatibility), while the gbcp type only provides access to the value object of the active type. A gbcp type can safely be cast to its value type when performing passive computations. Chaining gbcp types can be used to access any lower order of a higher order active type.
  • Thread-local (global tape):
    dco::ga1s<T>, dco::ga1v<T>, etc. are by default thread safe due to use of a thread-local global tape
  • Complex data type:
    dco::complex_t is used as a specialization of std::complex; required for Windows and old gcc versions
  • Faster compilation:
    Inlining is important to achieve best run time performance but it can increase compilation time. The user can now switch off the aggressive inlining in dco/c++ for faster compilation.
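
A minimal sketch of the composition described in the vector-type item above, assuming the dco::gv and dco::ga1s spellings used elsewhere in dco/c++ (exact names may differ between versions):

```cpp
#include <dco.hpp>

// Pack 4 doubles into one SIMD-friendly base value, then build a
// first-order adjoint type on top of that vector base type.
using vec_t     = dco::gv<double, 4>::type;
using adjoint_t = dco::ga1s<vec_t>::type;
```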

What's new in dco/c++ 3.3

  • Modulo adjoint propagation (less memory use)
    The vector of adjoints is compressed by analysing the maximum number of required distinct adjoint memory locations. During interpretation, adjoint memory, which is no longer required, is overwritten and thus reused by indexing through modulo operations. This feature is especially useful for iterative algorithms (e.g. time iterations). The required memory for the vector of adjoints usually stays constant, independent of the number of iterations. Combined with use of disk tape, almost arbitrarily sized tapes can be generated, which might be especially of interest for prototyping or validation purposes.
  • Sparse tape interpretation (debug capability to avoid NaNs)
    The adjoint interpretation of the tape can omit propagation along edges when the corresponding adjoint to be propagated is zero. This might be of use when NaNs or Infs occur as local partial derivatives (e.g. when computing square root of zero), but this local result is only used for subsequent computations which are not relevant for the overall output. This feature might have a performance impact on tape interpretation and should therefore be considered for debugging configuration only.

What's new in dco/c++ v3.2

  • Re-engineered internals mean dco/c++ is now roughly 30% faster and uses roughly 30% less memory (based on internal testing)
  • Vector reverse mode: for simulations with more than one output, several columns of the Jacobian or Hessian can now be computed at once using vector data types
  • Parallel reverse mode: for simulations with more than one output, the columns of the Jacobian or Hessian can now easily be computed in parallel.  This can be combined with vector reverse mode.
  • Jacobian pre-accumulation: sections of the computation can be collapsed into a pre-computed Jacobian, further reducing memory use
  • Disk tape: allows the tape to be recorded straight to disk.  Although slower, this allows very large computations to complete without having to use checkpointing to reduce memory use
  • Tape activity logging and improved error handling

Overview

dco/map is used to create adjoints of performance-critical sections of code, be they C++/OpenMP or CUDA. It has found application notably in accelerated XVA platforms where it helps deliver first and second order sensitivities. An overview is available here: High Performance Tape-Free Adjoint AD for C++11

  • dco/map combines the benefits of operator overloading (easy application, one single source code) with the benefits of handwritten adjoints (very fast and minimal memory use)
  • dco/map can handle the race conditions inherent in parallel adjoint codes easily and efficiently

What's new in dco/map v1.6

  • New bitwise-copyable reduction push array.  For many workloads, reduction push is still the fastest array type, and the new class makes it easy to access this performance in existing C++ codes
  • Array management functions now allocate bitwise-copyable adjoint arrays directly, making it significantly easier to integrate the array classes into existing codes and class hierarchies.  It is now easier to apply dco/map to an existing C++ code base
  • Performance improvements to atomic push arrays – these are now approximately 50% faster in double precision
  • Improved dco/map external adjoint object for easier interoperability with dco/c++
  • Enhanced MAP_PRINT functionality to give more information to the user and make it easier to process the data
  • Overhauled the training material: users should now find it easier to get up to speed with dco/map

Overview

The NAG AD Library provides expertly developed, tested, documented, and supported numerical and statistical routines that make the Algorithmic Differentiation process quicker, more efficient and more productive, and eliminate the need to write your own code or rely on unsupported code. The Library has been designed so that it can be used with or without any other Algorithmic Differentiation (AD) tool; however, the conversion from code containing calls to primal routines to an adjoint version is most seamless when combined with NAG's AD tool, dco/c++.

Using the NAG AD Library

The numerical and statistical routines in the NAG AD Library can be called from C, C++ and Fortran. Example programs are available for each adjoint routine in both C++ and Fortran. As noted above, the Library can be used with or without any other AD tool, but the conversion from primal calls to an adjoint version is most seamless when combined with dco/c++.

NAG AD Library Interfaces

The interface of a NAG AD Library routine follows closely the interface of the primal NAG Library routine on which it is based. The main differences are: real-valued variables change type to a special defined data type; an extra C pointer argument is added as the first argument; and functions (with a non-void return type) are replaced by a void function / subroutine with an extra argument to provide the return value. The same changes also apply to function / subroutine arguments, and user workspace arguments are always provided for those that perform computation. Adjoints with respect to active parameters passed in user workspace can also be obtained.
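
As an illustration of these conventions only (hypothetical routine and type names, not actual NAG identifiers; consult the NAG AD Library Manual for the real interfaces):

```cpp
// Hypothetical stand-in for the library's special active real type.
struct nag_ad_real_t { double value; double adjoint; };

// Hypothetical primal routine: returns a scalar computed from x[0..n-1].
double nag_foo(const double x[], int n);

// Its adjoint counterpart, following the conventions described above:
//  * an extra C pointer (AD handle) is added as the first argument,
//  * real-valued arguments change to the special active data type,
//  * the non-void return value becomes an extra argument of a void routine.
void nag_foo_ad(void* ad_handle, const nag_ad_real_t x[], int n,
                nag_ad_real_t* result);
```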

Technical Report: Why do we need Adjoint routines?

Documentation

The latest NAG AD Library Manual is available online.

The Library is organized into Chapters – each being documented with its own Introduction and Contents list followed by a comprehensive document for each function detailing its purpose, description, list of parameters and possible error exits. Example programs and results are also supplied. All examples are available online to facilitate their use as templates for the users' calling programs.

The NAG Library Manual - prior releases

Previous releases of the NAG Library Manual are available from here

Installer's Notes and Users' Notes

Support documentation for the installation and use of each implementation of the NAG Library is available.