$ \newcommand{\lstinline}[1]{\mathtt{#1}} \newcommand{\R}{\mathbb{R}} \newcommand{\X}{{\bf x}} \newcommand{\Y}{{\bf y}} \newcommand{\dX}{{\X^{(1)}}} \newcommand{\dXX}{{{\bf X}^{(1)}}} \newcommand{\ddX}{{\X^{(2)}}} \newcommand{\dY}{{\Y^{(1)}}} \newcommand{\dYY}{{{\bf Y}^{(1)}}} \newcommand{\ddY}{{\Y^{(2)}}} \newcommand{\dF}{{F^{(1)}}} \newcommand{\bX}{{\X_{(1)}}} \newcommand{\bY}{{\Y_{(1)}}} \newcommand{\bF}{{F_{(1)}}} \newcommand{\dbX}{{\X_{(1)}^{(2)}}} \newcommand{\dbY}{{\Y_{(1)}^{(2)}}} $

dco/c++: Derivative Code by Overloading in C++

Summary of Features: v3.4

We consider implementations of multivariate vector functions $$ f : D^{n} \times D^{n'} \rightarrow D^m \times D^{m'} : (\Y, \Y') = f(\X,\X') $$ as computer programs over some base data type $D$ (for example, single or higher precision floating-point data, intervals, convex/concave relaxations, vectors/ensembles).

We assume that the arithmetic inside $f$ is completely defined (through overloading of the elemental functions; see below) .
The $n$ active (also: independent) and $n'$ passive inputs are mapped onto $m$ active (also: dependent) and $m'$ passive outputs. The given implementation is assumed to be $k$ times continuously differentiable at all points of interest implying the existence and finiteness of the Jacobian $$ \nabla f = \nabla_\X f(\X,\X') \equiv \frac{\partial \Y}{\partial \X}(\X,\X') \in D^{m \times n}, $$ the Hessian $$ \nabla^2 f = \nabla^2 f(\X,\X') \equiv \frac{\partial^2 \Y}{\partial \X^2}(\X,\X') \in D^{m \times n \times n}, $$ if $k\geq2,$ and of potentially higher derivative tensors $$ \nabla^k f = \nabla^k f(\X,\X') \equiv \frac{\partial^k \Y}{\partial \X^k}(\X,\X') \in D^{m \times n \times \underset{k \text{times}}{\ldots} \times n}. $$ We denote $$\nabla^k f=\left(\left [\nabla^k f \right ]_i^{j_1,\ldots,j_k} \right)_{i=0,\ldots,m-1}^{j_1,\ldots,j_k=0,\ldots,n-1},$$ and we use $*$ to denote the entire range of an index. For example, $\left [\nabla f \right ]_i^*$ denotes the $i$th row and $\left [\nabla f \right ]_*^j$ the $j$th column of the Jacobian, respectively. Algorithmic differentiation is implemented by the overloading of elemental functions including the built-in functions and operators of C++ as well as user-defined higher-level elemental functions.

1 First-order Tangent Mode

1.1 Generic first-order scalar tangents

Generic first-order scalar tangent mode enables the computation of products of the Jacobian with vectors $\X^{(1)} \in D^n$ $$ \Y^{(1)}=\nabla f \cdot \X^{(1)} \in D^m $$ through provision of a generic first-order scalar tangent data type over arbitrary base data types $D$.

1.2 Generic first-order vector tangents

Generic first-order vector tangent mode enables the computation of products of the Jacobian with matrices $X^{(1)} \in D^{n \times l}$ $$ Y^{(1)}=\nabla f \cdot X^{(1)} \in D^{m \times l} $$ through provision of a generic first-order vector tangent data type over arbitrary base data types $D.$

1.3 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments which can be beneficial for vector tangent mode.

2 First-order Adjoint Mode

2.1 Generic first-order scalar adjoints

Generic first-order scalar adjoint mode enables the computation of products of the transposed Jacobian with vectors $\Y_{(1)} \in D^m$ $$ \X_{(1)}=\nabla f^T \cdot \Y_{(1)} \in D^n $$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

2.2 Generic first-order vector adjoints

Generic first-order vector adjoint mode enables the computation of products of the transposed Jacobian with matrices $Y_{(1)} \in D^{m \times l}$ $$ X_{(1)}=\nabla f^T \cdot Y_{(1)} \in D^{n \times l} $$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

2.3 Global “blob” tape

Memory of the specified size is allocated and used for storing the tape without bound checks.

2.4 Global “chunk” tape

The tape grows in chunks up to the physical memory bound.

2.5 Global “file” tape (v3.2)

Chunks are written to and read from disk.

2.6 Thread-local tape (v3.4)

The global tapes are now thread-local by default allowing an easier switch to multi-threading.

2.7 Distinct data types for values, partial derivatives and adjoints with all kinds of tapes (v3.2)

Data types for values, partial derivatives and adjoints can be set individually when defining the differentiation mode.

2.8 (Thread-)local tapes

Multiple “blob” or “chunk” tapes can be allocated, for example, to implement thread-safe adjoints.

2.9 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments. This preaccumulation can result in a decrease in tape memory requirement.

2.10 Repeated evaluation of tape

Tapes can be recorded at a given point and interpreted repeatedly.

2.11 Adjoint callback interface

The adjoint callback interface supports

2.12 User-defined local Jacobians interface

Externally preaccumulated partial derivatives can be inserted directly into the tape for use within subsequent interpretations.

2.13 User-driven preaccumulation of local Jacobians (v3.2)

An easy-to-use interface for robust preaccumulation of local Jacobians is provided. Aggressive reduction of the tape size is facilitated.

2.14 Multiple adjoint vectors for a single tape (v3.2)

Several (concurrent) interpretations of the same tape are enabled through separate (thread-local) adjoint vectors.

2.15 Modulo Adjoint Propagation (v3.3)

The vector of adjoints is compressed by analysing the maximum number of required distinct adjoint memory locations. During interpretation, adjoint memory, which is no longer required, is overwritten and thus reused by indexing through modulo operations. This feature is especially useful for iterative algorithms (e.g. time iterations). The required memory for the vector of adjoints usually stays constant, independent of the number of iterations. Combined with the use of disk tape, almost arbitrarily sized tapes can be generated, which might be especially of interest for prototyping or validation purposes.

2.16 Sparse Tape Interpretation (v3.3)

The adjoint interpretation of the tape can omit propagation along edges when the corresponding adjoint to be propagated is zero. This might be of use when NaNs or Infs occur as local partial derivatives (e.g. when computing square root of zero), but this local result is only used for subsequent computations which are not relevant for the overall output. This feature might have a performance impact on tape interpretation and should therefore be considered for debugging configuration only.

3 Second- and Higher-order Tangent Mode

3.1 Generic second-order scalar tangents

Instantiation of the generic first-order scalar tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$ \Y^{(1,2)}=\langle\nabla^2 f,\X^{(1)},\X^{(2)}\rangle \equiv \left(\X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot \X^{(2)} \right)_{i=0,\ldots,m-1} \in D^m $$ for $\X^{(1)}, \X^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(n^2) \cdot \text{Cost}(f).$ with both $\X^{(1)}$ and $\X^{(2)}$ ranging independently over the Cartesian basis vectors in $D^n.$

3.2 Generic second-order vector tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$ Y^{(1,2)}=\langle\nabla^2 f,X^{(1)},X^{(2)}\rangle \equiv \left(X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot X^{(2)} \right)_{i=0,\ldots,m-1} \in D^{m \times l_1 \times l_2} $$ for $X^{(1)} \in D^{n \times l_1},$ $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(n^2) \cdot \text{Cost}(f)$ with both $X^{(1)}$ and $X^{(2)}$ set equal to the identities in $D^n.$

3.3 Other generic second-order tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order tangent data type. Similarly, instantiation of the generic first-order scalar tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order tangent data type.

3.4 Generic third- and higher-order tangents

Instantiation of tangent types with $k$th-order tangent types yields $(k+1)$th-order tangent types.

4 Second- and Higher-order Adjoint Mode

4.1 Generic second-order scalar adjoints

Instantiation of the generic first-order scalar adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order scalar adjoint data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$ \X_{(1)}^{(2)}=\langle\Y_{(1)},\nabla^2 f,\X^{(2)}\rangle \equiv \left(\Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X^{(2)} \right)_{j=0,\ldots,n-1} \in D^n $$ for $\Y_{(1)} \in D^m$ and $\X^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(m\cdot n) \cdot \text{Cost}(f).$ with $\Y_{(1)}$ and $\X^{(2)}$ ranging over the Cartesian basis vectors in $D^m$ and $D^n$, respectively.

4.2 Generic second-order vector adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order vector adjoint data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$ X_{(1)}^{(2)}=\langle Y_{(1)},\nabla^2 f,X^{(2)}\rangle \equiv \left(Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot X^{(2)} \right)_{j=0,\ldots,n-1} \in D^{l_1 \times n \times l_2} $$ for $Y_{(1)} \in D^{m \times l_1}$ and $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(m\cdot n) \cdot \text{Cost}(f).$ with $Y_{(1)}$ and $X^{(2)}$ set equal to the identities in $D^m$ and $D^n$, respectively.

4.3 Other generic second-order adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order adjoint data type. Similarly, instantiation of the generic first-order scalar adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order adjoint data type.

Symmetry of the Hessian in its two domain dimensions yields the following additional second-order adjoint data types: Instantiation of a generic first-order tangent data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$ \langle\Y_{(2)}^{(1)},\nabla^2 f,\X^{(1)}\rangle \equiv \left(\Y_{(2)}^{(1)^T} \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X^{(1)} \right)_{j=0,\ldots,n-1} \in D^n. $$ Similarly, instantiation of a generic first-order adjoint data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$ \langle\X_{(1,2)},\Y_{(1)},\nabla^2 f\rangle \equiv \left(\Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X_{(1,2)} \right)_{j=0,\ldots,n-1} \in D^n. $$

4.4 Generic third- and higher-order adjoints

Instantiation of tangent types with $k$th-order adjoint types yields $(k+1)$th-order adjoint types. Similarly, instantiation of adjoint types with $k$th-order tangent or adjoint types yields $(k+1)$th-order adjoint types.

5 Additional Functionality

5.1 Vectorized (AVX) tangent and adjoint modes (v3.4)

For making use of vectorization, a respective vector data type can be used as base type for tangent and adjoint modes.

5.2 Debugging capabilities (v3.4)

For enabling internal debugging checks, a compile time flag was added.