Problem statement
The BLAS level-2 geru kernel (general rank-1 update) has no Triton implementation in kernel_course.triton_ops. The README BLAS table shows no Triton support for geru, leaving a classic BLAS-2 operation without a Triton example in this project.
Without a Triton geru kernel:
- users cannot study how rank-1 updates map onto Triton's execution model,
- there is no Triton baseline for performance comparison against the PyTorch and CuTe geru implementations,
- the Triton backend is missing a key building block for more complex kernels.
Proposed solution
Add a Triton implementation of geru under kernel_course.triton_ops that matches the Python/PyTorch semantics and showcases tiling and memory access patterns for rank-1 updates.
Concretely:
- Introduce kernel_course/triton_ops/geru.py, defining a Triton JIT kernel and a Python wrapper.
- Implement $A = A + \alpha x y^\top$ by updating tiles of A using appropriate blocking strategies.
- Ensure numerical equivalence with the Python reference across supported dtypes.
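For reference, here is the update written element-wise for an $M \times N$ matrix $A$, a length-$M$ vector $x$ and a length-$N$ vector $y$, followed by its split into tiles. The tile sizes $B_M \times B_N$ are illustrative symbols, not values taken from the project. The transpose is unconjugated, which is what the "u" in geru denotes in BLAS.

$$
A_{ij} \leftarrow A_{ij} + \alpha\, x_i\, y_j, \qquad 0 \le i < M,\; 0 \le j < N
$$

$$
I_p = \{\, pB_M, \dots, \min((p+1)B_M,\, M) - 1 \,\}, \qquad J_q = \{\, qB_N, \dots, \min((q+1)B_N,\, N) - 1 \,\}
$$

$$
A_{I_p, J_q} \leftarrow A_{I_p, J_q} + \alpha\, x_{I_p}\, y_{J_q}^\top, \qquad 0 \le p < \lceil M / B_M \rceil,\; 0 \le q < \lceil N / B_N \rceil
$$

Each tile depends only on its own slices of $x$ and $y$, so all tiles can be updated in parallel with no reduction. The $\min$ clamp at the matrix edge corresponds to the guard checks needed when $M$ or $N$ is not a multiple of the tile size.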
Alternatives considered
Relying solely on PyTorch or CuTe for geru would:
- reduce opportunities to demonstrate rank-1 updates in Triton,
- leave the Triton column incomplete in the README BLAS table,
- limit the breadth of Triton examples in kernel-course.
Implementation details
- Add kernel_course/triton_ops/geru.py containing:
  - a Triton kernel operating on tiles of A with the corresponding elements of x and y,
  - a wrapper that configures the grid and block sizes consistently with the other Triton kernels.
- Handle arbitrary matrix sizes, including sizes that are not multiples of the block size, with boundary guard checks.
- Integrate with future tests and benchmarks.
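The tiling and guard-check scheme can be prototyped on the CPU before the Triton kernel exists. The sketch below is plain Python (no Triton, no GPU). It emulates a 2D launch grid: each emulated program (pid_m, pid_n) updates one block_m x block_n tile of A. It also masks out-of-range rows and columns, mirroring what tl.program_id, tl.arange and masked tl.load/tl.store would do in a real kernel. The function names geru_reference and geru_tiled_emulation and the default block sizes are hypothetical, not part of kernel_course.

```python
# CPU-only sketch with hypothetical names: it emulates a Triton-style 2D grid
# for A += alpha * x y^T, but it is not Triton code.
from typing import List

Matrix = List[List[float]]


def cdiv(a: int, b: int) -> int:
    """Ceiling division, the usual way to size a launch grid."""
    return (a + b - 1) // b


def geru_reference(alpha: float, x: List[float], y: List[float], A: Matrix) -> Matrix:
    """Returns A + alpha * x y^T as a new matrix (A is left unchanged)."""
    return [[A[i][j] + alpha * x[i] * y[j] for j in range(len(y))] for i in range(len(x))]


def geru_tiled_emulation(
    alpha: float,
    x: List[float],
    y: List[float],
    A: Matrix,
    block_m: int = 4,
    block_n: int = 4,
) -> Matrix:
    """Updates A in place, one block_m x block_n tile per emulated program."""
    m, n = len(x), len(y)
    if len(A) != m or any(len(row) != n for row in A):
        raise ValueError("A must have shape len(x) x len(y)")
    grid = (cdiv(m, block_m), cdiv(n, block_n))
    for pid_m in range(grid[0]):  # stands in for tl.program_id(0)
        for pid_n in range(grid[1]):  # stands in for tl.program_id(1)
            # Offsets covered by this tile (pid * BLOCK + tl.arange(0, BLOCK) in Triton).
            offs_m = [pid_m * block_m + k for k in range(block_m)]
            offs_n = [pid_n * block_n + k for k in range(block_n)]
            # Guard checks: masks are False past the matrix edge.
            mask_m = [i < m for i in offs_m]
            mask_n = [j < n for j in offs_n]
            # Masked loads of the x and y slices; out-of-range lanes read 0.0.
            x_tile = [x[i] if ok else 0.0 for i, ok in zip(offs_m, mask_m)]
            y_tile = [y[j] if ok else 0.0 for j, ok in zip(offs_n, mask_n)]
            for a, i in enumerate(offs_m):
                for b, j in enumerate(offs_n):
                    # Masked store: only in-bounds elements of A are written.
                    if mask_m[a] and mask_n[b]:
                        A[i][j] += alpha * x_tile[a] * y_tile[b]
    return A


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [1.0, 0.0, -1.0, 2.0, 0.5]
    A = [[float(i + j) for j in range(len(y))] for i in range(len(x))]
    expected = geru_reference(2.0, x, y, A)
    # 2x2 tiles on a 3x5 matrix exercise partial tiles along both edges.
    result = geru_tiled_emulation(2.0, x, y, [row[:] for row in A], block_m=2, block_n=2)
    assert result == expected
    print("tiled emulation matches reference")
```

In an actual Triton kernel, the two inner loops would collapse into a single broadcasted outer product of the loaded x and y slices, with a 2D mask built from mask_m and mask_n. A 2D grid is only one option. A 1D grid over row blocks that loops over column tiles is another, and which one performs better would have to be measured with the planned benchmarks.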
Use case
The Triton geru kernel will:
- illustrate a rank-1 update in Triton for educational purposes,
- provide a basis for performance comparisons to other backends,
- serve as a foundational operation for building more advanced kernels.
Related work
- Existing Triton kernels: triton_ops.copy, triton_ops.swap.
- Triton examples for matrix updates and BLAS-2 operations.
Additional context
This issue is part of expanding Triton coverage to BLAS level-2 operations listed in the README, including geru.