Automatic Differentiation

anvil provides automatic differentiation operators that take a NumericalFunction and return a new NumericalFunction computing the desired derivative. The result can be called from Python (JIT) or exported to C++ (AOT) like any other function.

Dense derivatives

Gradient

For scalar-valued functions \(f: \mathbb{R}^n \to \mathbb{R}\):

import anvil as av
from anvil.ad import gradient
import numpy as np

@av.numerical_function("cost", 5)
def cost(x: av.Tensor) -> av.Tensor:
    return (x * x).sum()

cost_grad = gradient(cost)  # NumericalFunction returning R^5

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(cost_grad(x))  # [2. 4. 6. 8. 10.]
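The analytic result \(\nabla f(x) = 2x\) can be verified without anvil by comparing against central finite differences. A pure-NumPy sketch (the helper `fd_gradient` is illustrative, not part of anvil):

```python
import numpy as np

def cost_np(x):
    # Same computation as the traced function: sum of squares.
    return np.sum(x * x)

def fd_gradient(f, x, eps=1e-6):
    # Central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(fd_gradient(cost_np, x))  # close to [2. 4. 6. 8. 10.]
```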

Jacobian

For vector-valued functions \(f: \mathbb{R}^n \to \mathbb{R}^m\):

from anvil.ad import jacobian

fn = av.NumericalFunction(
    "f",
    lambda x: av.Tensor.stack(x[0] * x[1], x[1] ** 2),
    (av.Arg(2),),
)
jac = jacobian(fn)  # NumericalFunction returning R^{2x2}

print(jac(np.array([3.0, 4.0])))
# [[4. 3.]
#  [0. 8.]]

Hessian

For scalar-valued functions \(f: \mathbb{R}^n \to \mathbb{R}\):

from anvil.ad import hessian

cost = av.NumericalFunction(
    "cost",
    lambda x: x[0] ** 2 + x[0] * x[1] + x[1] ** 2,
    (av.Arg(2),),
)
H = hessian(cost)  # NumericalFunction returning R^{2x2}

print(H(np.array([1.0, 1.0])))
# [[2. 1.]
#  [1. 2.]]
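A typical consumer of the Hessian is a Newton step, which for this quadratic cost lands exactly on the minimizer in one iteration. A pure-NumPy sketch using the gradient \((2x_0 + x_1,\ x_0 + 2x_1)\) and the Hessian values shown above (no anvil required):

```python
import numpy as np

x = np.array([1.0, 1.0])
g = np.array([2 * x[0] + x[1], x[0] + 2 * x[1]])  # analytic gradient at x
H = np.array([[2.0, 1.0], [1.0, 2.0]])            # Hessian (constant for this cost)

x_new = x - np.linalg.solve(H, g)  # Newton step: x - H^{-1} g
print(x_new)  # [0. 0.], the unique minimizer of x0^2 + x0*x1 + x1^2
```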

Multi-input functions

All operators accept an argnum parameter to specify which input to differentiate with respect to:

fn = av.NumericalFunction("f", lambda x, p: x * p, (av.Arg(3), av.Arg(3)))
jac_x = jacobian(fn, argnum=0)  # d/dx
jac_p = jacobian(fn, argnum=1)  # d/dp

Sparse derivatives

For functions with sparse derivative structure (common in optimization), use the sparse variants. These use graph coloring to minimize the number of directional derivative evaluations.

Sparse Jacobian

from anvil.ad import spjacobian

fn = av.NumericalFunction("dynamics", my_dynamics, (av.Arg(n),))
jac_fn = spjacobian(fn)  # SparseNumericalFunction

The returned SparseNumericalFunction computes only the non-zero entries of the Jacobian. The sparsity pattern is determined at trace time via symbolic sparsity analysis, and graph coloring groups structurally orthogonal columns so that far fewer JVP (Jacobian-vector product) evaluations are needed than there are columns.
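The compression idea can be illustrated independently of anvil. For a tridiagonal Jacobian, columns whose indices differ by at least 3 never share a row, so they can be grouped into 3 colors and probed with one seed vector per color instead of one JVP per column. A pure-NumPy sketch of the scheme (not anvil's actual implementation):

```python
import numpy as np

n = 6
# Tridiagonal Jacobian: column j only touches rows j-1, j, j+1.
J = (np.diag(np.arange(1.0, n + 1))
     + np.diag(np.full(n - 1, 0.5), 1)
     + np.diag(np.full(n - 1, -0.5), -1))

colors = np.arange(n) % 3          # 3 colors suffice for a tridiagonal pattern
seeds = np.stack([(colors == c).astype(float) for c in range(3)], axis=1)

compressed = J @ seeds             # 3 JVPs instead of n = 6

# Uncompress: J[i, j] is read from the product with color[j]'s seed, since no
# other column of that color touches row i.
J_rec = np.zeros_like(J)
for j in range(n):
    rows = [i for i in (j - 1, j, j + 1) if 0 <= i < n]  # known sparsity pattern
    J_rec[rows, j] = compressed[rows, colors[j]]

print(np.allclose(J_rec, J))  # True
```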

Options:

spjacobian(fn, argnum=0)                    # differentiate w.r.t. first input
spjacobian(fn, unroll=True)                 # unroll JVP loop (faster for small n)
spjacobian(fn, sep_uncompression=True)      # separate uncompression step (default)
  • unroll=True: expands the JVP loop at trace time, producing larger but sometimes faster code. Recommended for small dimensions.
  • unroll=False (default): uses vmap over the JVP, producing compact loop-based code that scales to any dimension.
  • sep_uncompression=True (default): separates the column uncompression from the JVP computation into two kernels. Often faster.

Sparse Hessian

from anvil.ad import sphessian

cost = av.NumericalFunction("cost", my_cost, (av.Arg(n),))
H_fn = sphessian(cost)  # SparseNumericalFunction (upper triangular)

Returns the upper-triangular part of the Hessian in CSC format. Uses symmetric (star) graph coloring to exploit the Hessian's symmetry.
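Because only the upper triangle is stored, consumers that need the full matrix can mirror the strict upper triangle. A SciPy sketch with a concrete 2x2 Hessian (the arrays here are illustrative, not sphessian's actual output):

```python
import numpy as np
import scipy.sparse as sp

# A concrete symmetric 2x2 Hessian (same values as the dense example above).
H_full = np.array([[2.0, 1.0], [1.0, 2.0]])

# The upper triangle in CSC format, as sphessian would store it.
U = sp.triu(sp.csc_array(H_full), format="csc")

# Mirror the strict upper triangle to recover the full symmetric matrix;
# the diagonal is added only once.
H_rec = U + sp.triu(U, k=1).T

print(np.allclose(H_rec.toarray(), H_full))  # True
```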

Reconstructing sparse matrices

In Python:

import scipy.sparse as sp

values = jac_fn(x)
J = sp.csc_array(
    (values, jac_fn.indices, jac_fn.indptr),
    shape=jac_fn.shape
)
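With concrete numbers, here is the same reconstruction for the 2x2 Jacobian [[4, 3], [0, 8]] from the dense example; the three arrays stand in for jac_fn's output and metadata:

```python
import numpy as np
import scipy.sparse as sp

# Nonzeros in CSC order: column 0 holds {row 0: 4}, column 1 holds
# {row 0: 3, row 1: 8}.
values = np.array([4.0, 3.0, 8.0])
indices = np.array([0, 0, 1])   # row index of each stored value
indptr = np.array([0, 1, 3])    # values[indptr[j]:indptr[j+1]] belong to column j

J = sp.csc_array((values, indices, indptr), shape=(2, 2))
print(J.toarray())
# [[4. 3.]
#  [0. 8.]]
```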

In C++ (example using Eigen, but any sparse matrix library works -- Eigen is not required by anvil):

using SpMat = Eigen::SparseMatrix<double, Eigen::ColMajor, int>;
Eigen::Map<const SpMat> J(
    my_fn_jac::rows, my_fn_jac::cols, my_fn_jac::nnz,
    my_fn_jac::outerStarts, my_fn_jac::innerIndices,
    out.data
);

Low-level: JVP and VJP

The higher-level operators (gradient, jacobian, hessian, spjacobian, sphessian) are built on two primitives:

  • JVP (Jacobian-Vector Product, forward mode): computes \(J \cdot v\) for a given tangent vector \(v\). Used by jacobian and spjacobian.
  • VJP (Vector-Jacobian Product, reverse mode): computes \(J^\top \cdot v\) for a given cotangent vector \(v\). Used by gradient.

These operate at the UOp graph level and are not typically used directly. See the AD math background for the mathematical details.
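What the two primitives compute can be sketched in plain NumPy for the function \(f(x) = (x_0 x_1,\ x_1^2)\) from the Jacobian section, whose Jacobian is \(J = \begin{pmatrix} x_1 & x_0 \\ 0 & 2x_1 \end{pmatrix}\). The hand-written `jvp`/`vjp` helpers below are illustrative, not anvil's API:

```python
import numpy as np

# f(x) = (x0*x1, x1^2), with Jacobian J = [[x1, x0], [0, 2*x1]].
def jvp(x, v):
    # Forward mode: J @ v, computed without materializing J.
    return np.array([x[1] * v[0] + x[0] * v[1], 2 * x[1] * v[1]])

def vjp(x, u):
    # Reverse mode: J^T @ u, computed without materializing J.
    return np.array([x[1] * u[0], x[0] * u[0] + 2 * x[1] * u[1]])

x = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])   # basis tangent: extracts a column of J
u = np.array([0.0, 1.0])   # basis cotangent: extracts a row of J

print(jvp(x, v))  # first column of J: [4. 0.]
print(vjp(x, u))  # second row of J:  [0. 8.]

# The two are adjoint: u . (J v) == (J^T u) . v for all u, v.
print(np.dot(u, jvp(x, v)) == np.dot(vjp(x, u), v))  # True
```

Probing with basis vectors like this is exactly how jacobian assembles dense Jacobians from JVPs, and why gradient needs only a single VJP for a scalar output.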