NumericalFunctions

NumericalFunction is the central abstraction in anvil. It wraps a Python function written with tinygrad Tensor operations and turns it into a callable, code-generable unit.

Key mental model: Think of NumericalFunction as a compiled function specification. The Tensor class is only used inside function definitions to describe computations (similar to CasADi's SX/MX). Outside of function definitions, data flows as contiguous numpy arrays (similar to CasADi's DM). Users should never need to manipulate Tensors directly except to:

  1. Write function bodies
  2. Capture constant data (matrices, lookup tables, neural network weights) via Tensor(data) in closures

A NumericalFunction can be used in two ways:

  • Called from Python: fn(x) accepts and returns numpy arrays, dispatching to JIT-compiled native code
  • Exported to C++: av.generate_module("name", [fn]) produces standalone .hpp/.cpp files

Decorator

The simplest way to create a NumericalFunction is with the @av.numerical_function decorator:

import anvil as av

@av.numerical_function("my_fn", 1024)
def my_fn(x: av.Tensor) -> av.Tensor:
    return 0.5 * x.square()

The arguments after the name are the input shapes. For multiple inputs, pass multiple shapes:

@av.numerical_function("add", 3, 3)
def add(x: av.Tensor, y: av.Tensor) -> av.Tensor:
    return x + y
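The decorator pattern itself can be sketched in plain Python. This is a hypothetical simplification (no tracing or code generation, and this `numerical_function` is a stand-in, not anvil's implementation) that shows how a name and input shapes can be recorded at decoration time and checked on each call:

```python
import numpy as np

def numerical_function(name, *shapes):
    """Hypothetical sketch: record a name and input shapes, then
    validate and normalize the arguments on every call."""
    def decorate(fn):
        def wrapper(*arrays):
            assert len(arrays) == len(shapes), "wrong number of inputs"
            cast = []
            for a, s in zip(arrays, shapes):
                a = np.ascontiguousarray(a, dtype=np.float64)
                expected = (s,) if isinstance(s, int) else tuple(s)
                assert a.shape == expected, f"expected shape {expected}"
                cast.append(a)
            return fn(*cast)
        wrapper.__name__ = name
        return wrapper
    return decorate

@numerical_function("add", 3, 3)
def add(x, y):
    return x + y
```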

Direct construction

For more control (e.g. custom dtypes, 2D inputs), construct NumericalFunction directly:

fn = av.NumericalFunction(
    "my_function",
    lambda x: 0.5 * x.square(),
    (av.Arg(1024),),  # input shape (float64 by default)
)

The Arg constructor specifies the shape and optionally the dtype of each input:

av.Arg(1024)                              # 1D, float64 (default)
av.Arg((3, 3))                            # 2D, float64
av.Arg(1024, dtype=av.dtypes.float32)     # 1D, float32
av.Arg(0)                                 # zero-length (used for unused parameters)

Calling convention

NumericalFunction.__call__ performs lazy JIT compilation on first use: the function is traced, C++ source is generated, compiled to a shared library via clang++, and loaded via ctypes. Subsequent calls dispatch directly to native code.
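The compile-on-first-call behavior follows a standard memoization pattern. A minimal sketch, with a plain Python function standing in for the traced/compiled native code (the `_compile` step here is a placeholder, not anvil's actual pipeline):

```python
import numpy as np

class LazyFunction:
    """Sketch of lazy JIT dispatch: the expensive build step runs once,
    on the first call; later calls reuse the cached compiled object."""
    def __init__(self, fn):
        self.fn = fn
        self.compiled = None
        self.compile_count = 0  # for illustration only

    def _compile(self):
        # Stand-in for tracing + C++ codegen + clang++ + ctypes loading.
        self.compile_count += 1
        return self.fn

    def __call__(self, *args):
        if self.compiled is None:
            self.compiled = self._compile()
        return self.compiled(*args)

fn = LazyFunction(lambda x, y: x + y)
fn(np.ones(3), np.ones(3))  # first call triggers "compilation"
fn(np.ones(3), np.ones(3))  # dispatches directly to the cached function
```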

import anvil as av
import numpy as np

fn = av.NumericalFunction("add", lambda x, y: x + y, (av.Arg(3), av.Arg(3)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
result = fn(x, y)  # array([5., 7., 9.])

# Multiple outputs returned as tuple of numpy arrays
fn2 = av.NumericalFunction("split", lambda x, y: (x + y, x * y), (av.Arg(3), av.Arg(3)))
sum_result, prod_result = fn2(x, y)

# Scalar outputs
fn3 = av.NumericalFunction("norm", lambda x: (x * x).sum(), (av.Arg(3),))
s = fn3(x)  # np.float64

Inputs are cast to the declared dtype and made contiguous. Array outputs are returned as C-contiguous numpy arrays; scalar outputs are returned as numpy scalars (e.g. np.float64).
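The input normalization described above can be reproduced with numpy alone (a sketch of the equivalent operation, not anvil's internal code):

```python
import numpy as np

def normalize(a, dtype=np.float64):
    # Cast to the declared dtype and force C-contiguous layout,
    # mirroring what happens before dispatch to native code.
    return np.ascontiguousarray(a, dtype=dtype)

# A transposed int32 array: wrong dtype, not C-contiguous.
x = np.arange(6, dtype=np.int32).reshape(2, 3).T
xn = normalize(x)  # float64, C-contiguous copy with the same values
```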

Capturing constants

Constants (matrices, weights, lookup tables) are captured from closures. They are embedded as static constexpr arrays in the generated C++ and copied into workspace buffers at initialization.

import anvil as av
import numpy as np

A = av.Tensor([[1.0, 2.0], [3.0, 4.0]])  # captured constant

fn = av.NumericalFunction("matmul", lambda x: A @ x, (av.Arg(2),))
fn(np.array([1.0, 1.0]))  # array([3., 7.])

In the generated C++, the matrix A is embedded as a static constexpr double[] array and copied into the workspace during init_ws().
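The captured constant behaves like a fixed matrix baked into the function. In plain numpy, the computation above is simply:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # same data the Tensor closure captures
x = np.array([1.0, 1.0])
y = A @ x  # array([3., 7.]) -- matches the anvil output above
```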

SparseNumericalFunction

For functions with sparse outputs (typically Jacobians and Hessians), SparseNumericalFunction stores the output as a flat array of non-zero values alongside CSC (Compressed Sparse Column) metadata.

You usually don't construct SparseNumericalFunction directly -- instead, use the AD operators:

from anvil.ad import spjacobian, sphessian

fn = av.NumericalFunction("f", my_fn, (av.Arg(n),))
jac_fn = spjacobian(fn)   # SparseNumericalFunction
hess_fn = sphessian(fn)   # SparseNumericalFunction (upper triangular)

A SparseNumericalFunction has additional attributes:

  • shape: (rows, cols) of the full sparse matrix
  • nnz: number of non-zero elements
  • indices: row indices (CSC innerIndices)
  • indptr: column pointers (CSC outerStarts)
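The meaning of indices and indptr can be checked against scipy on a small matrix. The attribute names above are anvil's; scipy follows the same CSC convention:

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[1.0, 0.0],
                  [2.0, 3.0]])
M = sp.csc_array(dense)

# Column j's values live in data[indptr[j]:indptr[j+1]], with row
# positions given by the matching slice of indices.
M.data     # array([1., 2., 3.])
M.indices  # array([0, 1, 1])
M.indptr   # array([0, 2, 3])
```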

When called, it returns a flat np.ndarray of nnz values. To reconstruct the sparse matrix:

import scipy.sparse as sp

values = jac_fn(x)
J = sp.csc_array(
    (values, jac_fn.indices, jac_fn.indptr),
    shape=jac_fn.shape
)

In the generated C++ header, the sparsity pattern is exposed as static constexpr arrays:

namespace my_fn_jac {
    static constexpr int rows = 100;
    static constexpr int cols = 50;
    static constexpr int nnz = 200;
    static constexpr int innerIndices[200] = { /* row indices */ };
    static constexpr int outerStarts[51] = { /* column pointers */ };

    typedef Buffer<double, 200> OUT0_t;  // flat array of nnz values
}

Reconstructing with Eigen (Eigen is not required -- any sparse matrix library works):

using SpMat = Eigen::SparseMatrix<double, Eigen::ColMajor, int>;
Eigen::Map<const SpMat> J(
    my_fn_jac::rows, my_fn_jac::cols, my_fn_jac::nnz,
    my_fn_jac::outerStarts, my_fn_jac::innerIndices, out.data
);

Generic typing

NumericalFunction is a generic class parameterized by its input and output types:

NumericalFunction[I: tuple[Arg, ...], R: Tensor | tuple[Tensor, ...]]

The constructor has overloads for 1-4 inputs with single or dual outputs. This means your LSP (e.g. Pylance, pyright) can:

  • Infer the number of inputs: if you create NumericalFunction("f", fn, (Arg(3), Arg(3))), the type is NumericalFunction[tuple[Arg, Arg], Tensor] and the LSP will flag calls with the wrong number of arguments.
  • Distinguish single vs tuple returns: fn(x) returns np.ndarray for single-output functions and tuple[np.ndarray, ...] for multi-output functions.
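The generic parameterization can be sketched with standard typing machinery (a hypothetical simplification with made-up `Fn`/`Arg` stand-ins, not anvil's actual stubs):

```python
from typing import Any, Generic, TypeVar
import numpy as np

I = TypeVar("I")  # input spec type
R = TypeVar("R")  # return spec type

class Arg:
    def __init__(self, shape):
        self.shape = shape

class Fn(Generic[I, R]):
    """Sketch: a class generic in its input and output specs, so a
    type checker can track arity and single-vs-tuple returns via
    per-arity constructor overloads (omitted here)."""
    def __init__(self, fn, args):
        self.fn, self.args = fn, args

    def __call__(self, *arrays: np.ndarray) -> Any:
        assert len(arrays) == len(self.args), "wrong number of inputs"
        return self.fn(*arrays)

fn = Fn(lambda x, y: x + y, (Arg(3), Arg(3)))
```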

Limitations:

  • Overloads cover up to 4 inputs and up to 2 outputs. Functions with more inputs/outputs will have a less precise type (NumericalFunction[tuple[Arg, ...], Tensor | tuple[Tensor, ...]]).
  • The @av.numerical_function decorator returns a NumericalFunction but the LSP may not infer the specific generic parameters from the decorator arguments.
  • AD operators like gradient(fn) return a NumericalFunction with the same input type but the output type may not be precisely tracked.