Automatic Differentiation¶
anvil provides automatic differentiation operators that take a NumericalFunction and return a new NumericalFunction computing the desired derivative. The result can be called from Python (JIT) or exported to C++ (AOT) like any other function.
Dense derivatives¶
Gradient¶
For scalar-valued functions \(f: \mathbb{R}^n \to \mathbb{R}\):
import anvil as av
from anvil.ad import gradient
import numpy as np
@av.numerical_function("cost", 5)
def cost(x: av.Tensor) -> av.Tensor:
    return (x * x).sum()
cost_grad = gradient(cost) # NumericalFunction returning R^5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(cost_grad(x)) # [2. 4. 6. 8. 10.]
Jacobian¶
For vector-valued functions \(f: \mathbb{R}^n \to \mathbb{R}^m\):
from anvil.ad import jacobian
fn = av.NumericalFunction(
    "f",
    lambda x: av.Tensor.stack(x[0] * x[1], x[1] ** 2),
    (av.Arg(2),),
)
jac = jacobian(fn) # NumericalFunction returning R^{2x2}
print(jac(np.array([3.0, 4.0])))
# [[4. 3.]
# [0. 8.]]
Hessian¶
For scalar-valued functions \(f: \mathbb{R}^n \to \mathbb{R}\):
from anvil.ad import hessian
cost = av.NumericalFunction(
    "cost",
    lambda x: x[0] ** 2 + x[0] * x[1] + x[1] ** 2,
    (av.Arg(2),),
)
H = hessian(cost) # NumericalFunction returning R^{2x2}
print(H(np.array([1.0, 1.0])))
# [[2. 1.]
# [1. 2.]]
Multi-input functions¶
All operators accept an argnum parameter to specify which input to differentiate with respect to:
fn = av.NumericalFunction("f", lambda x, p: x * p, (av.Arg(3), av.Arg(3)))
jac_x = jacobian(fn, argnum=0) # d/dx
jac_p = jacobian(fn, argnum=1) # d/dp
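The effect of argnum can be checked numerically. The following standalone NumPy sketch (independent of anvil) perturbs one argument at a time with finite differences; for the elementwise product above, the two Jacobians are diag(p) and diag(x):

```python
import numpy as np

def f(x, p):
    # same elementwise product as in the anvil example above
    return x * p

x = np.array([1.0, 2.0, 3.0])
p = np.array([4.0, 5.0, 6.0])
eps = 1e-7

# argnum=0: perturb x only, holding p fixed
J_x = np.stack([(f(x + eps * e, p) - f(x, p)) / eps for e in np.eye(3)], axis=1)

# argnum=1: perturb p only, holding x fixed
J_p = np.stack([(f(x, p + eps * e) - f(x, p)) / eps for e in np.eye(3)], axis=1)

# elementwise product: d(x*p)/dx = diag(p), d(x*p)/dp = diag(x)
```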
Sparse derivatives¶
For functions with sparse derivative structure (common in optimization), use the sparse variants. These use graph coloring to minimize the number of directional derivative evaluations:
Sparse Jacobian¶
from anvil.ad import spjacobian
fn = av.NumericalFunction("dynamics", my_dynamics, (av.Arg(n),))
jac_fn = spjacobian(fn) # SparseNumericalFunction
The returned SparseNumericalFunction computes only the non-zero entries of the Jacobian. The sparsity pattern is determined at trace time via symbolic sparsity analysis, and graph coloring determines the minimum number of JVP (Jacobian-vector product) evaluations needed.
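To illustrate the compression idea (a standalone NumPy sketch, not anvil's implementation): for a function with a tridiagonal Jacobian, columns whose indices differ by at least 3 never share a nonzero row, so they can be probed together. Three seed vectors recover all n columns instead of n unit-vector JVPs:

```python
import numpy as np

n = 6

def f(x):
    # a map with tridiagonal Jacobian structure
    y = 2.0 * x.copy()
    y[:-1] += np.sin(x[1:])   # row i also depends on x[i+1]
    y[1:] += x[:-1] ** 2      # row i also depends on x[i-1]
    return y

x = np.linspace(0.1, 0.6, n)
eps = 1e-7

# color column j with j % 3: same-colored columns are structurally orthogonal
colors = np.arange(n) % 3
compressed = np.zeros((n, 3))
for c in range(3):
    seed = (colors == c).astype(float)  # sum of all columns with color c
    compressed[:, c] = (f(x + eps * seed) - f(x)) / eps  # one JVP per color

# any nonzero entry J[i, j] can be read off at compressed[i, colors[j]]
```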
Options:
spjacobian(fn, argnum=0) # differentiate w.r.t. first input
spjacobian(fn, unroll=True) # unroll JVP loop (faster for small n)
spjacobian(fn, sep_uncompression=True) # separate uncompression step (default)
- unroll=True: expands the JVP loop at trace time, producing larger but sometimes faster code. Recommended for small dimensions.
- unroll=False (default): uses vmap over the JVP, producing compact loop-based code that scales to any dimension.
- sep_uncompression=True (default): separates the column uncompression from the JVP computation into two kernels. Often faster.
Sparse Hessian¶
from anvil.ad import sphessian
cost = av.NumericalFunction("cost", my_cost, (av.Arg(n),))
H_fn = sphessian(cost) # SparseNumericalFunction (upper triangular)
Returns the upper-triangular part of the Hessian in CSC format. Uses symmetric (star) graph coloring to exploit the Hessian's symmetry.
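When the full symmetric matrix is needed, mirror the strictly upper-triangular part across the diagonal. A SciPy sketch, using hypothetical values matching the 2x2 Hessian from the dense example above:

```python
import numpy as np
import scipy.sparse as sp

# upper triangle of the Hessian [[2, 1], [1, 2]] from the dense example
U = sp.csc_array(np.triu(np.array([[2.0, 1.0], [1.0, 2.0]])))

# add the transposed strictly-upper part to restore symmetry
H = U + sp.triu(U, k=1).T
```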
Reconstructing sparse matrices¶
In Python:
import scipy.sparse as sp
values = jac_fn(x)
J = sp.csc_array(
    (values, jac_fn.indices, jac_fn.indptr),
    shape=jac_fn.shape,
)
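For reference, the CSC triplet used above stores values column by column: indices holds each value's row, and indptr[j]:indptr[j+1] delimits column j's slice of both arrays. A small hypothetical 3x2 example:

```python
import numpy as np
import scipy.sparse as sp

# column 0 holds values [1, 3] at rows [0, 2]; column 1 holds [2] at row [1]
values = np.array([1.0, 3.0, 2.0])
indices = np.array([0, 2, 1])
indptr = np.array([0, 2, 3])

J = sp.csc_array((values, indices, indptr), shape=(3, 2))
# J.toarray() -> [[1. 0.]
#                 [0. 2.]
#                 [3. 0.]]
```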
In C++ (the example uses Eigen, but any sparse matrix library works; Eigen is not required by anvil):
using SpMat = Eigen::SparseMatrix<double, Eigen::ColMajor, int>;
Eigen::Map<const SpMat> J(
my_fn_jac::rows, my_fn_jac::cols, my_fn_jac::nnz,
my_fn_jac::outerStarts, my_fn_jac::innerIndices,
out.data
);
Low-level: JVP and VJP¶
The higher-level operators (gradient, jacobian, hessian, spjacobian, sphessian) are built on two primitives:
- JVP (Jacobian-Vector Product, forward mode): computes \(J \cdot v\) for a given tangent vector \(v\). Used by jacobian and spjacobian.
- VJP (Vector-Jacobian Product, reverse mode): computes \(J^\top \cdot v\) for a given cotangent vector \(v\). Used by gradient.
These operate at the UOp graph level and are not typically used directly. See the AD math background for the mathematical details.
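In matrix terms, a plain NumPy illustration using the 2x2 Jacobian example from above (no anvil API involved): a JVP extracts a linear combination of columns of \(J\), a VJP a linear combination of rows.

```python
import numpy as np

# f(x) = [x0*x1, x1**2], the vector function from the Jacobian example;
# its analytic Jacobian, written out for reference
def jac(x):
    return np.array([[x[1], x[0]],
                     [0.0, 2.0 * x[1]]])

x = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])   # tangent vector
u = np.array([1.0, 0.0])   # cotangent vector

jvp = jac(x) @ v           # forward mode: J @ v, here column 0 of J
vjp = jac(x).T @ u         # reverse mode: J^T @ u, here row 0 of J
# jvp -> [4. 0.],  vjp -> [4. 3.]
```

For a scalar-valued function the Jacobian has a single row, so one VJP with cotangent 1 yields the entire gradient, which is why gradient is built on reverse mode.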