Automatic differentiation¶
Differential operators¶
For a differentiable function \(f: X \to Y\) between Euclidean spaces:

- differential \(Df(x) \in \mathfrak{L}(X,Y)\) ↔ JVP: \(\mathrm{jvp}(f,x,u) = Df(x)[u]\)
- adjoint \(Df(x)^* \in \mathfrak{L}(Y,X)\) ↔ VJP: \(\mathrm{vjp}(f,x,v) = Df(x)^*[v]\)
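The two operators can be evaluated directly in JAX; the function `f` below is a hypothetical toy map \(\mathbb{R}^3 \to \mathbb{R}^2\) chosen only to illustrate the calls:

```python
import jax
import jax.numpy as jnp

# Toy map f: R^3 -> R^2 (illustrative example, not from the text).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([1.0, 0.0, 0.0])   # tangent vector in X
v = jnp.array([1.0, 0.0])        # cotangent vector in Y

# JVP: push the tangent u forward through Df(x).
y, jvp_out = jax.jvp(f, (x,), (u,))

# VJP: pull the cotangent v back through Df(x)^*.
y2, f_vjp = jax.vjp(f, x)
vjp_out, = f_vjp(v)
```

Here `jvp_out` is \(Df(x)[u]\) and `vjp_out` is \(Df(x)^*[v]\); one forward pass per tangent, one backward pass per cotangent.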
For \(X=\mathbb{R}^n, Y=\mathbb{R}^m\):

\[J_f(x) = \mathrm{vmap}(\mathrm{jvp}(f,x,\cdot), I_n)\]
\[J_f(x)^T = \mathrm{vmap}(\mathrm{vjp}(f,x,\cdot), I_m)\]
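A sketch of both identities with `jax.vmap`, reusing a hypothetical toy map. One practical caveat: `vmap` stacks its outputs along a leading axis, so pushing the rows of \(I_n\) through the JVP materializes the columns of \(J_f(x)\) as rows, i.e. the stacked array is the transpose of the matrix the identity names:

```python
import jax
import jax.numpy as jnp

# Toy map f: R^3 -> R^2 (illustrative example).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])

# JVP against the basis I_n: row i of the result is column i of J_f(x).
J_cols = jax.vmap(lambda u: jax.jvp(f, (x,), (u,))[1])(jnp.eye(3))

# VJP against the basis I_m: row i of the result is row i of J_f(x).
J_rows = jax.vmap(lambda v: jax.vjp(f, x)[1](v)[0])(jnp.eye(2))
```

So `J_cols.T` and `J_rows` both equal the Jacobian that `jax.jacobian(f)(x)` returns.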
For scalar \(f: X \to \mathbb{R}\):

\[\nabla f(x) = \mathrm{vjp}(f, x, 1)\]
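For a scalar function there is a single cotangent direction, so one VJP with cotangent \(1\) recovers the full gradient (this is exactly what reverse-mode `jax.grad` does); the quadratic `f` below is an illustrative choice:

```python
import jax
import jax.numpy as jnp

# Scalar-valued toy function f(x) = <x, x> (illustrative example).
def f(x):
    return jnp.dot(x, x)

x = jnp.array([1.0, 2.0, 3.0])

# Pull back the cotangent 1.0: a single backward pass yields the gradient.
grad_via_vjp, = jax.vjp(f, x)[1](1.0)
```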
Hessians¶
For scalar functions, the Hessian-vector product (HVP):

\[\mathrm{hvp}(f,x,u) = D(\nabla f)(x)[u]\]
Full Hessian:

\[H_f(x) = \mathrm{vmap}(\mathrm{jvp}(\mathrm{vjp}(f,\cdot,1), x, \cdot), I_n)\]
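Both formulas can be sketched as forward-over-reverse composition: a JVP through the gradient gives the HVP without ever forming the Hessian, and mapping the HVP over the identity basis materializes it. The cubic `f` below is an illustrative choice:

```python
import jax
import jax.numpy as jnp

# Scalar toy function f(x) = sum(x^3), so H_f(x) = diag(6 x) (illustrative).
def f(x):
    return jnp.sum(x ** 3)

x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([0.0, 1.0, 0.0])

# hvp(f, x, u) = D(grad f)(x)[u]: forward-mode JVP through the reverse-mode gradient.
def hvp(f, x, u):
    return jax.jvp(jax.grad(f), (x,), (u,))[1]

# Full Hessian: vmap the HVP over the rows of I_n.
H = jax.vmap(lambda u: hvp(f, x, u))(jnp.eye(x.shape[0]))
```

Forward-over-reverse is the usual default: the gradient costs one backward pass, and each Hessian column then costs only one cheap forward pass on top of it.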
Lagrangian derivatives¶
For constrained optimization \(\min_x f(x)\) s.t. \(g_\mathrm{eq}(x) = 0\):
\[\mathcal{L}(x,\lambda) = f(x) + \lambda^T g_\mathrm{eq}(x)\]
\[\nabla_x \mathcal{L} = \mathrm{vjp}(f,x,1) + \mathrm{vjp}(g_\mathrm{eq},x,\lambda)\]
\[\nabla^2_{xx} \mathcal{L} = \mathrm{vmap}(\mathrm{jvp}(\mathrm{vjp}(f,\cdot,1) + \mathrm{vjp}(g_\mathrm{eq},\cdot,\lambda), x, \cdot), I_n)\]
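The two Lagrangian formulas translate directly: one VJP per term for \(\nabla_x \mathcal{L}\), then a vmapped JVP through it for \(\nabla^2_{xx} \mathcal{L}\). The problem below (minimize \(\lVert x\rVert^2\) subject to \(x_0 + x_1 = 1\)) and the multiplier value are hypothetical, chosen so the check is easy by hand:

```python
import jax
import jax.numpy as jnp

# Hypothetical problem: min ||x||^2  s.t.  x[0] + x[1] - 1 = 0.
def f(x):
    return jnp.dot(x, x)

def g_eq(x):
    return jnp.array([x[0] + x[1] - 1.0])

x = jnp.array([0.5, 0.5])     # the constrained minimizer
lam = jnp.array([-1.0])       # the corresponding multiplier

# grad_x L = vjp(f, x, 1) + vjp(g_eq, x, lam)
def grad_L(x, lam):
    return jax.vjp(f, x)[1](1.0)[0] + jax.vjp(g_eq, x)[1](lam)[0]

# Hessian of L in x: JVP through grad_L, vmapped over the rows of I_n.
H_L = jax.vmap(lambda u: jax.jvp(lambda z: grad_L(z, lam), (x,), (u,))[1])(jnp.eye(2))
```

At this \((x, \lambda)\) the gradient of the Lagrangian vanishes (the KKT stationarity condition) and, since the constraint is linear, \(\nabla^2_{xx}\mathcal{L} = 2I\).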