Automatic differentiation¶
Differential operators¶
For a differentiable function \(f: X \to Y\) between Euclidean spaces:

- differential \(Df(x) \in \mathfrak{L}(X,Y)\) ↔ JVP: \(\mathrm{jvp}(f,x,u) = Df(x)[u]\)
- adjoint \(Df(x)^* \in \mathfrak{L}(Y,X)\) ↔ VJP: \(\mathrm{vjp}(f,x,v) = Df(x)^*[v]\)
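The two operators can be evaluated directly in JAX; the function `f` below is a hypothetical toy map \(\mathbb{R}^3 \to \mathbb{R}^2\) chosen only to illustrate the calls:

```python
import jax
import jax.numpy as jnp

# Toy map f: R^3 -> R^2 (illustrative example, not from the text).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([1.0, 0.0, 0.0])   # tangent vector in X
v = jnp.array([1.0, 0.0])        # cotangent vector in Y

# JVP: push the tangent u forward through Df(x).
y, jvp_out = jax.jvp(f, (x,), (u,))

# VJP: pull the cotangent v back through Df(x)^*.
y2, f_vjp = jax.vjp(f, x)
vjp_out, = f_vjp(v)
```

Here `jvp_out` is \(Df(x)[u]\) and `vjp_out` is \(Df(x)^*[v]\); one forward pass per tangent, one backward pass per cotangent.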
For \(X=\mathbb{R}^n, Y=\mathbb{R}^m\):

\[J_f(x) = \mathrm{vmap}(\mathrm{jvp}(f,x,\cdot), I_n)\]
\[J_f(x)^T = \mathrm{vmap}(\mathrm{vjp}(f,x,\cdot), I_m)\]
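A sketch of both identities with `jax.vmap`, reusing a hypothetical toy map. One practical caveat: `vmap` stacks its outputs along a leading axis, so pushing the rows of \(I_n\) through the JVP materializes the columns of \(J_f(x)\) as rows, i.e. the stacked array is the transpose of the matrix the identity names:

```python
import jax
import jax.numpy as jnp

# Toy map f: R^3 -> R^2 (illustrative example).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])

# JVP against the basis I_n: row i of the result is column i of J_f(x).
J_cols = jax.vmap(lambda u: jax.jvp(f, (x,), (u,))[1])(jnp.eye(3))

# VJP against the basis I_m: row i of the result is row i of J_f(x).
J_rows = jax.vmap(lambda v: jax.vjp(f, x)[1](v)[0])(jnp.eye(2))
```

So `J_cols.T` and `J_rows` both equal the Jacobian that `jax.jacobian(f)(x)` returns.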
For scalar \(f: X \to \mathbb{R}\):

\[\nabla f(x) = \mathrm{vjp}(f, x, 1)\]
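For a scalar function there is a single cotangent direction, so one VJP with cotangent \(1\) recovers the full gradient (this is exactly what reverse-mode `jax.grad` does); the quadratic `f` below is an illustrative choice:

```python
import jax
import jax.numpy as jnp

# Scalar-valued toy function f(x) = <x, x> (illustrative example).
def f(x):
    return jnp.dot(x, x)

x = jnp.array([1.0, 2.0, 3.0])

# Pull back the cotangent 1.0: a single backward pass yields the gradient.
grad_via_vjp, = jax.vjp(f, x)[1](1.0)
```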
Hessians¶
For scalar functions, the Hessian-vector product (HVP):

\[\mathrm{hvp}(f,x,u) = D(\nabla f)(x)[u]\]
Full Hessian:

\[H_f(x) = \mathrm{vmap}(\mathrm{jvp}(\mathrm{vjp}(f,\cdot,1), x, \cdot), I_n)\]
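Both formulas can be sketched as forward-over-reverse composition: a JVP through the gradient gives the HVP without ever forming the Hessian, and mapping the HVP over the identity basis materializes it. The cubic `f` below is an illustrative choice:

```python
import jax
import jax.numpy as jnp

# Scalar toy function f(x) = sum(x^3), so H_f(x) = diag(6 x) (illustrative).
def f(x):
    return jnp.sum(x ** 3)

x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([0.0, 1.0, 0.0])

# hvp(f, x, u) = D(grad f)(x)[u]: forward-mode JVP through the reverse-mode gradient.
def hvp(f, x, u):
    return jax.jvp(jax.grad(f), (x,), (u,))[1]

# Full Hessian: vmap the HVP over the rows of I_n.
H = jax.vmap(lambda u: hvp(f, x, u))(jnp.eye(x.shape[0]))
```

Forward-over-reverse is the usual default: the gradient costs one backward pass, and each Hessian column then costs only one cheap forward pass on top of it.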
Lagrangian derivatives¶
For constrained optimization \(\min_x f(x)\) s.t. \(g_\mathrm{eq}(x) = 0\):
\[\mathcal{L}(x,\lambda) = f(x) + \lambda^T g_\mathrm{eq}(x)\]
\[\nabla_x \mathcal{L} = \mathrm{vjp}(f,x,1) + \mathrm{vjp}(g_\mathrm{eq},x,\lambda)\]
\[\nabla^2_{xx} \mathcal{L} = \mathrm{vmap}(\mathrm{jvp}(\mathrm{vjp}(f,\cdot,1) + \mathrm{vjp}(g_\mathrm{eq},\cdot,\lambda), x, \cdot), I_n)\]
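The two Lagrangian formulas translate directly: one VJP per term for \(\nabla_x \mathcal{L}\), then a vmapped JVP through it for \(\nabla^2_{xx} \mathcal{L}\). The problem below (minimize \(\lVert x\rVert^2\) subject to \(x_0 + x_1 = 1\)) and the multiplier value are hypothetical, chosen so the check is easy by hand:

```python
import jax
import jax.numpy as jnp

# Hypothetical problem: min ||x||^2  s.t.  x[0] + x[1] - 1 = 0.
def f(x):
    return jnp.dot(x, x)

def g_eq(x):
    return jnp.array([x[0] + x[1] - 1.0])

x = jnp.array([0.5, 0.5])     # the constrained minimizer
lam = jnp.array([-1.0])       # the corresponding multiplier

# grad_x L = vjp(f, x, 1) + vjp(g_eq, x, lam)
def grad_L(x, lam):
    return jax.vjp(f, x)[1](1.0)[0] + jax.vjp(g_eq, x)[1](lam)[0]

# Hessian of L in x: JVP through grad_L, vmapped over the rows of I_n.
H_L = jax.vmap(lambda u: jax.jvp(lambda z: grad_L(z, lam), (x,), (u,))[1])(jnp.eye(2))
```

At this \((x, \lambda)\) the gradient of the Lagrangian vanishes (the KKT stationarity condition) and, since the constraint is linear, \(\nabla^2_{xx}\mathcal{L} = 2I\).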