
Getting Started

This walkthrough shows the core anvil workflow: define a function in Python, call it with JIT-compiled native code, compute its sparse Jacobian, and export everything to standalone C++.

Define a function

The simplest way to create an anvil function is with the @av.numerical_function decorator:

import anvil as av

@av.numerical_function("do_square", 1024)
def do_square(x: av.Tensor) -> av.Tensor:
    return 0.5 * x.square()

This creates a NumericalFunction instance -- the central abstraction in anvil. The decorator arguments are the function name and the input shape(s). The function body uses tinygrad Tensor operations to describe the computation.

You can also construct a NumericalFunction directly for more control (e.g. custom dtypes or multiple inputs):

fn = av.NumericalFunction(
    "do_square",
    lambda x: 0.5 * x.square(),
    (av.Arg(1024),),
)

Arg(1024) declares a 1D input of 1024 float64 elements. The Tensor class is only used inside function definitions to describe computations -- outside, data flows as numpy arrays.
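Both forms build the same kind of object. The toy, pure-Python sketch below shows the decorator-factory pattern behind that equivalence. ToyArg, ToyNumericalFunction and toy_numerical_function are hypothetical names, not anvil's API. For illustration only, it assumes that each integer shape argument becomes a 1-D float64 input spec, and its body uses plain arithmetic instead of tinygrad Tensors:

```python
# Toy stand-ins only: these are NOT anvil's classes. They illustrate how a
# decorator factory can be sugar for constructing an object directly.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyArg:
    """Hypothetical input spec: a shape plus a dtype name (assumed float64)."""
    shape: Tuple[int, ...]
    dtype: str = "float64"


class ToyNumericalFunction:
    """Hypothetical container pairing a name, a body, and input specs."""

    def __init__(self, name: str, body: Callable, args: Tuple[ToyArg, ...]):
        self.name = name
        self.body = body
        self.args = args


def toy_numerical_function(name: str, *shapes: int):
    """Decorator factory: returns a decorator that wraps the function body."""
    def wrap(body: Callable) -> ToyNumericalFunction:
        # Assumption for this toy: each int shape means a 1-D input.
        args = tuple(ToyArg((n,)) for n in shapes)
        return ToyNumericalFunction(name, body, args)
    return wrap


@toy_numerical_function("do_square", 1024)
def do_square(x):
    return 0.5 * x * x


# The equivalent direct construction, mirroring the second form above.
direct = ToyNumericalFunction("do_square", lambda x: 0.5 * x * x, (ToyArg((1024,)),))
```

After decoration, the name do_square refers to the wrapper object, not the plain function. That is why the real do_square can later be called, differentiated, and exported.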

Call from Python

The first call to a NumericalFunction JIT-compiles it to native code. Subsequent calls dispatch directly to the compiled code:

import numpy as np

x = np.random.randn(1024)
result = do_square(x)  # np.ndarray of shape (1024,)
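Conceptually, "compile once, then dispatch" resembles the caching wrapper below. This is a hypothetical sketch, not anvil's implementation. LazyCompiled and fake_compile are invented names, and the "compilation" step just returns a Python lambda:

```python
from typing import Callable, List


class LazyCompiled:
    """Hypothetical wrapper that 'compiles' its body on the first call only."""

    def __init__(self, compile_fn: Callable[[], Callable]):
        self._compile_fn = compile_fn
        self._compiled = None      # filled in on the first call
        self.compile_count = 0     # lets us observe that caching works

    def __call__(self, *args):
        if self._compiled is None:
            # First call: pay the compilation cost once and cache the result.
            self._compiled = self._compile_fn()
            self.compile_count += 1
        # Every call, including the first, dispatches to the cached callable.
        return self._compiled(*args)


def fake_compile() -> Callable[[List[float]], List[float]]:
    # Stand-in for JIT compilation: returns an elementwise 0.5 * x**2 kernel.
    return lambda xs: [0.5 * v * v for v in xs]


square = LazyCompiled(fake_compile)
first = square([1.0, 2.0])   # triggers the one-time "compile"
second = square([3.0])       # reuses the cached kernel
```

The practical consequence is that the first call is slower than the rest. Time only the calls after the first if you want to measure steady-state performance.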

Compute derivatives

anvil provides automatic differentiation that produces new NumericalFunctions:

from anvil.ad import spjacobian

jac_fn = spjacobian(do_square)  # SparseNumericalFunction
jac_values = jac_fn(x)          # flat array of non-zero values in CSC format

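Each output element 0.5 * x_i^2 depends only on the matching input x_i. The Jacobian of do_square is therefore diagonal, with entries x_i. spjacobian detects this sparsity pattern and computes only the 1024 non-zero entries instead of the full 1024x1024 matrix.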
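The diagonal structure can be checked numerically without anvil. The plain-Python sketch below uses illustrative helpers, do_square_ref and fd_jacobian, that are not part of anvil. It computes a central-difference Jacobian on a 4-element input and confirms that only the diagonal entries are non-zero and that they equal x:

```python
def do_square_ref(x):
    # Pure-Python reference for 0.5 * x.square()
    return [0.5 * v * v for v in x]


def fd_jacobian(f, x, h=1e-6):
    """Dense Jacobian by central differences: J[i][j] = d f_i / d x_j."""
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


x = [0.3, -1.2, 2.0, 0.7]
J = fd_jacobian(do_square_ref, x)
# Count entries that are meaningfully non-zero.
nnz = sum(1 for row in J for v in row if abs(v) > 1e-8)
```

With n inputs, only n of the n^2 entries are non-zero. That gap is what spjacobian exploits at n = 1024.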

Export to C++

Generate standalone C++ files with no Python dependency:

av.generate_module(
    "anvil_do_square",
    [do_square, jac_fn],
    constants={av.CodegenIntConstant("dim", 1024)},
)
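Because constants is passed as a Python set, each constant object must be hashable. The toy sketch below shows how a named integer constant could map to a line like `constexpr int dim = 1024;` in the generated header. ToyIntConstant and render_constants are hypothetical, not anvil's CodegenIntConstant or its code generator:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen => hashable, so instances can live in a set
class ToyIntConstant:
    """Hypothetical stand-in for a named integer constant."""
    name: str
    value: int

    def to_cpp(self) -> str:
        return f"constexpr int {self.name} = {self.value};"


def render_constants(constants) -> str:
    # Sort by name so the output is deterministic even though sets are unordered.
    return "\n".join(c.to_cpp() for c in sorted(constants, key=lambda c: c.name))


constants = {ToyIntConstant("dim", 1024)}
header_snippet = render_constants(constants)
```

Exporting sizes as named constants lets C++ callers refer to `dim` instead of repeating the literal 1024.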

This produces anvil_do_square.hpp and anvil_do_square.cpp. The header declares the following (abridged; /* ... */ comments mark elided details):

namespace anvil_do_square {

template <typename T, std::size_t... Ns> struct Buffer { /* ... */ };

constexpr int dim = 1024;

namespace do_square {
    typedef Buffer<double, 1024> IN0_t;
    typedef Buffer<double, 1024> OUT0_t;
    typedef Buffer<signed char, /*ws_size*/> WS_t;

    WS_t init_ws();
    void call(const IN0_t& in0, const OUT0_t& out0, const WS_t& ws);
}

namespace do_square_jac {
    // Sparse output metadata (CSC format)
    static constexpr int rows = 1024;
    static constexpr int cols = 1024;
    static constexpr int nnz = 1024;
    static constexpr int innerIndices[1024] = { /* row indices */ };
    static constexpr int outerStarts[1025] = { /* column pointers */ };

    typedef Buffer<double, 1024> IN0_t;
    typedef Buffer<double, 1024> OUT0_t;  // flat array of nnz values
    typedef Buffer<signed char, /*ws_size*/> WS_t;

    WS_t init_ws();
    void call(const IN0_t& in0, const OUT0_t& out0, const WS_t& ws);
}

} // namespace anvil_do_square
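The sparse metadata follows the usual CSC convention. The non-zeros of column j sit at positions outerStarts[j] through outerStarts[j+1] - 1, both of innerIndices (their row numbers) and of the flat value array. That is why outerStarts has cols + 1 = 1025 entries. The arrays are elided above, so the diagonal pattern used here is an assumption derived from the diagonal Jacobian: innerIndices = 0, 1, ..., n-1 and outerStarts = 0, 1, ..., n. The sketch below applies that pattern at n = 4 in plain Python:

```python
def csc_to_dense(rows, cols, inner_indices, outer_starts, values):
    """Expand CSC arrays into a dense row-major list of lists.

    Column j's non-zeros occupy positions outer_starts[j] .. outer_starts[j+1]-1
    of inner_indices (row numbers) and values (the numbers themselves).
    """
    dense = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        for k in range(outer_starts[j], outer_starts[j + 1]):
            dense[inner_indices[k]][j] = values[k]
    return dense


n = 4                                # small stand-in for dim = 1024
x = [0.3, -1.2, 2.0, 0.7]
inner_indices = list(range(n))       # assumed: column i's single entry is in row i
outer_starts = list(range(n + 1))    # assumed: n + 1 pointers [0, 1, ..., n]
values = x                           # d(0.5 * x_i**2)/dx_i = x_i
J = csc_to_dense(n, n, inner_indices, outer_starts, values)
```

The same loop structure works in C++ over do_square_jac::innerIndices, do_square_jac::outerStarts and the OUT0_t buffer.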

Use in C++

#include "anvil_do_square.hpp"
#include <cstdio>

int main() {
    using namespace anvil_do_square;

    auto in = do_square::IN0_t::alloc();
    auto out = do_square::OUT0_t::alloc();
    auto ws = do_square::init_ws();

    // Fill input
    for (std::size_t i = 0; i < do_square::IN0_t::size; i++)
        in.data[i] = static_cast<double>(i);

    // Call
    do_square::call(in, out, ws);

    std::printf("out[0] = %f\n", out.data[0]);  // 0.0
    std::printf("out[1] = %f\n", out.data[1]);  // 0.5

    // Clean up
    do_square::IN0_t::free(in);
    do_square::OUT0_t::free(out);
    do_square::WS_t::free(ws);
}

Next steps