$
\newcommand{\lstinline}[1]{\mathtt{#1}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\X}{{\bf x}}
\newcommand{\Y}{{\bf y}}
\newcommand{\dX}{{\X^{(1)}}}
\newcommand{\dXX}{{{\bf X}^{(1)}}}
\newcommand{\ddX}{{\X^{(2)}}}
\newcommand{\dY}{{\Y^{(1)}}}
\newcommand{\dYY}{{{\bf Y}^{(1)}}}
\newcommand{\ddY}{{\Y^{(2)}}}
\newcommand{\dF}{{F^{(1)}}}
\newcommand{\bX}{{\X_{(1)}}}
\newcommand{\bY}{{\Y_{(1)}}}
\newcommand{\bF}{{F_{(1)}}}
\newcommand{\dbX}{{\X_{(1)}^{(2)}}}
\newcommand{\dbY}{{\Y_{(1)}^{(2)}}}
$
dco/c++: Derivative Code by Overloading in C++
Summary of Features: v3.4
We consider implementations of multivariate vector functions
$$
f : D^{n} \times D^{n'} \rightarrow D^m \times D^{m'} :
(\Y, \Y') = f(\X,\X')
$$
as computer programs over some
base data type $D$
(for example, single or higher precision
floating-point data, intervals, convex/concave
relaxations, vectors/ensembles).
We assume that the arithmetic inside $f$ is completely
defined (through overloading of the elemental functions; see below)
.
The $n$ active (also: independent)
and $n'$ passive inputs are mapped onto $m$ active
(also: dependent) and $m'$ passive outputs. The given
implementation is assumed to be $k$ times continuously
differentiable at all points of interest implying the
existence and finiteness of the Jacobian
$$
\nabla f = \nabla_\X f(\X,\X') \equiv
\frac{\partial \Y}{\partial \X}(\X,\X') \in D^{m \times n},
$$
the Hessian
$$
\nabla^2 f = \nabla^2 f(\X,\X') \equiv
\frac{\partial^2 \Y}{\partial \X^2}(\X,\X') \in D^{m \times n \times n},
$$
if $k\geq2,$ and of potentially higher derivative tensors
$$
\nabla^k f = \nabla^k f(\X,\X') \equiv
\frac{\partial^k \Y}{\partial \X^k}(\X,\X') \in D^{m \times n \times \underset{k \text{times}}{\ldots} \times n}.
$$
We denote $$\nabla^k f=\left(\left [\nabla^k f \right ]_i^{j_1,\ldots,j_k} \right)_{i=0,\ldots,m-1}^{j_1,\ldots,j_k=0,\ldots,n-1},$$
and we use $*$ to denote the entire range of an index.
For example,
$\left [\nabla f \right ]_i^*$ denotes the $i$th row
and
$\left [\nabla f \right ]_*^j$ the $j$th column of the
Jacobian, respectively.
Algorithmic differentiation is implemented by the overloading
of elemental functions including the built-in functions
and operators of C++ as well as user-defined higher-level
elemental functions.
1 First-order Tangent Mode
1.1 Generic first-order scalar tangents
Generic first-order scalar tangent mode
enables the computation of products of the Jacobian with
vectors $\X^{(1)} \in D^n$
$$
\Y^{(1)}=\nabla f \cdot \X^{(1)} \in D^m
$$
through provision of a generic first-order scalar tangent data type
over arbitrary base data types $D$.
1.2 Generic first-order vector tangents
Generic first-order vector tangent mode
enables the computation of products of the Jacobian with
matrices $X^{(1)} \in D^{n \times l}$
$$
Y^{(1)}=\nabla f \cdot X^{(1)} \in D^{m \times l}
$$
through provision of a generic first-order vector tangent data type
over arbitrary base data types $D.$
1.3 Preaccumulation through use of expression templates
Expression templates enable the generation of
statically optimized gradient code at the level of
individual assignments which can be beneficial for vector
tangent mode.
2 First-order Adjoint Mode
2.1 Generic first-order scalar adjoints
Generic first-order scalar adjoint mode
enables the computation of products of the transposed Jacobian with
vectors $\Y_{(1)} \in D^m$
$$
\X_{(1)}=\nabla f^T \cdot \Y_{(1)} \in D^n
$$
through provision of a generic first-order scalar adjoint data type
over arbitrary base data types $D.$
2.2 Generic first-order vector adjoints
Generic first-order vector adjoint mode
enables the computation of products of the transposed Jacobian
with matrices $Y_{(1)} \in D^{m \times l}$
$$
X_{(1)}=\nabla f^T \cdot Y_{(1)} \in D^{n \times l}
$$
through provision of a generic first-order scalar adjoint data type
over arbitrary base data types $D.$
2.3 Global “blob” tape
Memory of the specified size is allocated and used for
storing the tape without bound checks.
2.4 Global “chunk” tape
The tape grows in chunks up to the physical memory bound.
2.5 Global “file” tape (v3.2)
Chunks are written to and read from disk.
2.6 Thread-local tape (v3.4)
The global tapes are now thread-local by default allowing an easier switch to multi-threading.
2.7 Distinct data types for values, partial derivatives and adjoints with all kinds of tapes (v3.2)
Data types for values, partial derivatives and adjoints can be set
individually when defining the differentiation mode.
2.8 (Thread-)local tapes
Multiple “blob” or “chunk” tapes can be allocated,
for example, to implement thread-safe adjoints.
2.9 Preaccumulation through use of expression templates
Expression templates enable the generation of
statically optimized gradient code at the level of
individual assignments. This preaccumulation can
result in a decrease in tape memory requirement.
2.10 Repeated evaluation of tape
Tapes can be recorded at a given point and interpreted
repeatedly.
2.11 Adjoint callback interface
The adjoint callback interface supports
- checkpointing
- symbolic adjoints of numerical methods
- combinations of tangent and adjoint modes
- preaccumulation of local derivative tensors
- creation of collections of domain-specific
higher-level elemental functions
2.12 User-defined local Jacobians interface
Externally preaccumulated partial derivatives can be
inserted directly into the tape for use within subsequent
interpretations.
2.13 User-driven preaccumulation of local Jacobians (v3.2)
An easy-to-use interface for robust preaccumulation of local Jacobians is
provided. Aggressive reduction of the tape size is facilitated.
2.14 Multiple adjoint vectors for a single tape (v3.2)
Several (concurrent) interpretations of the same tape are enabled through
separate (thread-local) adjoint vectors.
2.15 Modulo Adjoint Propagation (v3.3)
The vector of adjoints is compressed by analysing the maximum number
of required distinct adjoint memory locations. During interpretation,
adjoint memory, which is no longer required, is overwritten and thus
reused by indexing through modulo operations. This feature is
especially useful for iterative algorithms (e.g. time iterations). The
required memory for the vector of adjoints usually stays constant,
independent of the number of iterations. Combined with the use of disk
tape, almost arbitrarily sized tapes can be generated, which might be
especially of interest for prototyping or validation purposes.
2.16 Sparse Tape Interpretation (v3.3)
The adjoint interpretation of the tape can omit propagation along
edges when the corresponding adjoint to be propagated is zero. This
might be of use when NaNs or Infs occur as local partial derivatives
(e.g. when computing square root of zero), but this local result is
only used for subsequent computations which are not relevant for the
overall output. This feature might have a performance impact on tape
interpretation and should therefore be considered for debugging
configuration only.
3 Second- and Higher-order Tangent Mode
3.1 Generic second-order scalar tangents
Instantiation of the generic first-order
scalar tangent data type with the generic first-order
scalar tangent data type over a non-derivative base
data type yields the second-order scalar
tangent data type. It enables the computation of
scalar projections of the Hessian $\nabla^2 f \in
D^{m \times n \times n}$ in its two domain dimensions of length $n$
as
$$
\Y^{(1,2)}=\langle\nabla^2 f,\X^{(1)},\X^{(2)}\rangle \equiv
\left(\X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot \X^{(2)} \right)_{i=0,\ldots,m-1}
\in D^m
$$
for $\X^{(1)}, \X^{(2)} \in D^n$
over arbitrary base data types $D$.
The computational
cost of
accumulating the whole Hessian over $D$
becomes $O(n^2) \cdot \text{Cost}(f).$
with both $\X^{(1)}$ and $\X^{(2)}$ ranging independently
over the Cartesian basis vectors in $D^n.$
3.2 Generic second-order vector tangents
Instantiation of the generic first-order
vector tangent data type with the generic first-order
vector tangent data type over a non-derivative base
data type yields the second-order scalar
tangent data type. It enables the computation of
vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$
as
$$
Y^{(1,2)}=\langle\nabla^2 f,X^{(1)},X^{(2)}\rangle \equiv
\left(X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot X^{(2)} \right)_{i=0,\ldots,m-1}
\in D^{m \times l_1 \times l_2}
$$
for $X^{(1)} \in D^{n \times l_1},$ $X^{(2)} \in D^{n \times l_2}$
over arbitrary base data types $D$.
The computational
cost of
accumulating the whole Hessian over $D$
remains equal to $O(n^2) \cdot \text{Cost}(f)$
with both $X^{(1)}$ and $X^{(2)}$ set equal to the
identities in $D^n.$
3.3 Other generic second-order tangents
Instantiation of the generic first-order
vector tangent data type with the generic first-order
scalar tangent data type over a non-derivative base
data type yields a second-order tangent data type.
Similarly, instantiation of the generic first-order
scalar tangent data type with the generic first-order
vector tangent data type over a non-derivative base
data type yields a second-order tangent data type.
3.4 Generic third- and higher-order tangents
Instantiation of tangent types with $k$th-order tangent
types yields $(k+1)$th-order tangent types.
4 Second- and Higher-order Adjoint Mode
4.1 Generic second-order scalar adjoints
Instantiation of the generic first-order
scalar adjoint data type with the generic first-order
scalar tangent data type over a non-derivative base
data type yields a second-order scalar
adjoint data type. It enables the computation of
scalar projections of the Hessian $\nabla^2 f \in
D^{m \times n \times n}$ in its image and one of its
domain dimensions as
$$
\X_{(1)}^{(2)}=\langle\Y_{(1)},\nabla^2 f,\X^{(2)}\rangle \equiv
\left(\Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X^{(2)} \right)_{j=0,\ldots,n-1}
\in D^n
$$
for $\Y_{(1)} \in D^m$ and $\X^{(2)} \in D^n$
over arbitrary base data types $D$. The computational
cost of
accumulating the whole Hessian over $D$
becomes $O(m\cdot n) \cdot \text{Cost}(f).$
with $\Y_{(1)}$ and $\X^{(2)}$ ranging
over the Cartesian basis vectors in $D^m$ and $D^n$,
respectively.
4.2 Generic second-order vector adjoints
Instantiation of the generic first-order
vector adjoint data type with the generic first-order
vector tangent data type over a non-derivative base
data type yields a second-order vector
adjoint data type. It enables the computation of
vector projections of the Hessian $\nabla^2 f \in
D^{m \times n \times n}$ in its image and one of its
domain dimensions as
$$
X_{(1)}^{(2)}=\langle Y_{(1)},\nabla^2 f,X^{(2)}\rangle \equiv
\left(Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot X^{(2)} \right)_{j=0,\ldots,n-1}
\in D^{l_1 \times n \times l_2}
$$
for $Y_{(1)} \in D^{m \times l_1}$ and $X^{(2)} \in D^{n \times l_2}$
over arbitrary base data types $D$. The computational
cost of
accumulating the whole Hessian over $D$
remains equal to $O(m\cdot n) \cdot \text{Cost}(f).$
with $Y_{(1)}$ and $X^{(2)}$ set equal to the identities
in $D^m$ and $D^n$, respectively.
4.3 Other generic second-order adjoints
Instantiation of the generic first-order
vector adjoint data type with the generic first-order
scalar tangent data type over a non-derivative base
data type yields a second-order adjoint data type.
Similarly, instantiation of the generic first-order
scalar adjoint data type with the generic first-order
vector tangent data type over a non-derivative base
data type yields a second-order adjoint data type.
Symmetry of the Hessian in its two domain dimensions
yields the following additional second-order adjoint
data types:
Instantiation of a generic first-order
tangent data type with a generic first-order
adjoint data type over a non-derivative base
data type yields a second-order adjoint data type
for computing
$$
\langle\Y_{(2)}^{(1)},\nabla^2 f,\X^{(1)}\rangle \equiv
\left(\Y_{(2)}^{(1)^T} \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X^{(1)} \right)_{j=0,\ldots,n-1}
\in D^n.
$$
Similarly, instantiation of a generic first-order
adjoint data type with a generic first-order
adjoint data type over a non-derivative base
data type yields a second-order adjoint data type for
computing
$$
\langle\X_{(1,2)},\Y_{(1)},\nabla^2 f\rangle \equiv
\left(\Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \X_{(1,2)} \right)_{j=0,\ldots,n-1}
\in D^n.
$$
4.4 Generic third- and higher-order adjoints
Instantiation of tangent
types with $k$th-order adjoint types
yields $(k+1)$th-order adjoint types.
Similarly, instantiation of adjoint
types with $k$th-order tangent or adjoint
types yields $(k+1)$th-order adjoint types.
5 Additional Functionality
5.1 Vectorized (AVX) tangent and adjoint modes (v3.4)
For making use of vectorization, a respective vector data type can be
used as base type for tangent and adjoint modes.
5.2 Debugging capabilities (v3.4)
For enabling internal debugging checks, a compile time flag was added.