Core
Bases: FunctionBase
cuda_arch (cached property)
CUDA architecture string (e.g. 'sm_89') for NVRTC compilation.
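As a rough illustration of how such a cached architecture string might be produced, the sketch below builds an 'sm_XY' string from a compute capability and wraps it in the standard NVRTC `--gpu-architecture` option. The `DeviceInfo` class and its `major`/`minor`/`lookups` fields are hypothetical names invented for this example, not part of the documented API.

```python
from functools import cached_property


class DeviceInfo:
    """Hypothetical stand-in for an object that knows the GPU compute capability."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor
        self.lookups = 0  # counts how often the property body actually runs

    @cached_property
    def cuda_arch(self) -> str:
        # Computed once, then stored on the instance by cached_property.
        self.lookups += 1
        return f"sm_{self.major}{self.minor}"


def nvrtc_arch_option(arch: str) -> str:
    """Format the NVRTC compile option that selects the target architecture."""
    return f"--gpu-architecture={arch}"


dev = DeviceInfo(8, 9)
first = dev.cuda_arch
second = dev.cuda_arch  # served from the cache, body does not run again
option = nvrtc_arch_option(first)
```

Because the value is cached, repeated accesses do not repeat the underlying device query, which is presumably why the property is marked as cached.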
outputs (cached property)
Output args, inferred from tracing.
buf_names (cached property)
Returns a dict with unique names for each buffer (input/output/intermediate).
intermediate_buf_offsets (cached property)
Returns a dict with the offset, in bytes, of each intermediate buffer within a global workspace vector. Each offset is aligned to 16 bytes (the NEON vector size; this alignment is guaranteed by Buffer::alloc).
ws_size (cached property)
Returns the size in bytes of the global workspace vector.
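The relationship between intermediate_buf_offsets and ws_size can be sketched as a simple bump allocator that rounds every offset up to a 16-byte boundary. This is a minimal sketch under assumptions: the helper names (`align_up`, `plan_workspace`), the buffer names, and the choice not to pad the final size are all hypothetical and may differ from the actual implementation.

```python
ALIGN = 16  # bytes; the alignment stated for intermediate buffers


def align_up(n: int, alignment: int = ALIGN) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def plan_workspace(sizes: dict[str, int]) -> tuple[dict[str, int], int]:
    """Assign each buffer an aligned byte offset and return (offsets, total size).

    `sizes` maps a (hypothetical) buffer name to its size in bytes.
    """
    offsets: dict[str, int] = {}
    cursor = 0
    for name, nbytes in sizes.items():
        cursor = align_up(cursor)  # start every buffer on a 16-byte boundary
        offsets[name] = cursor
        cursor += nbytes
    # Assumption: the total is the end of the last buffer, without trailing padding.
    return offsets, cursor


offsets, ws_size = plan_workspace({"tmp0": 40, "tmp1": 12, "tmp2": 64})
# offsets == {"tmp0": 0, "tmp1": 48, "tmp2": 64}, ws_size == 128
```

The offsets stay aligned only if the workspace base address is itself 16-byte aligned, which is the role the description assigns to Buffer::alloc.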
gpu_kernel_sources (cached property)
Maps kernel function_name to its source code string (for GPU embedding).
gpu_buf_nbytes (cached property)
Maps each buffer to its byte size (for GPU allocation).
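For a dense buffer, the byte size is typically the element count times the element size. The sketch below shows that arithmetic only; the `BufferSpec` type, its fields, and the buffer names are hypothetical and do not reflect the library's actual Buffer class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferSpec:
    """Hypothetical description of a dense buffer: a shape plus bytes per element."""

    shape: tuple[int, ...]
    itemsize: int  # e.g. 4 for float32, 2 for float16


def nbytes(buf: BufferSpec) -> int:
    """Byte size of a dense buffer: product of the dimensions times the item size."""
    return math.prod(buf.shape) * buf.itemsize


specs = {
    "x": BufferSpec(shape=(4, 8), itemsize=4),   # 32 float32 values
    "tmp0": BufferSpec(shape=(16,), itemsize=2),  # 16 float16 values
}
buf_nbytes = {name: nbytes(spec) for name, spec in specs.items()}
```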
gpu_kernel_buf_indices (cached property)
For each SINK kernel, returns the ordered (arg_index, Buffer) pairs for GPU dispatch.
Bases: NumericalFunction[I, Tensor]