Description
Problem statement
The BLAS level-2 geru kernel (general rank-1 update) does not yet have a CuTe backend implementation in this project. Its CuTe cell in the README BLAS table is empty, so this operation lacks full cross-backend coverage.
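For reference, with $A$ of shape $m \times n$, $x$ of length $m$ and $y$ of length $n$, geru computes elementwise:

$$
A_{ij} \leftarrow A_{ij} + \alpha\, x_i\, y_j, \qquad i = 1,\dots,m,\quad j = 1,\dots,n.
$$

Following the reference BLAS naming, the `u` suffix (as in `cgeru`/`zgeru`) means $y$ is used unconjugated, i.e. $\alpha x y^\top$; `gerc` is the conjugated variant $\alpha x y^H$, and for real types both reduce to `ger`.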
Without a CuTe geru kernel:
- users cannot learn how rank-1 updates are expressed using CuTe’s layout and tiling abstractions,
- there is no CuTe performance baseline to compare against the PyTorch and Triton geru implementations,
- CuTe-based higher-level examples lack a standard rank-1 update primitive.
Proposed solution
Implement a CuTe-based geru kernel that matches the Python reference semantics and fits within the project’s backend structure.
Concretely:
- Add a CuTe geru kernel in the appropriate CuTe backend directory, implementing $A \leftarrow A + \alpha x y^\top$.
- Use CuTe primitives to describe the matrix layout, vector access, and thread scheduling for the rank-1 update.
- Align the public API with other backends to allow uniform dispatch.
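A minimal pure-Python sketch of the reference semantics the CuTe kernel would need to match, plus one possible shape for uniform dispatch. The names `geru_reference`, `GERU_BACKENDS` and `geru` are hypothetical (not the project's existing API), and `A` is a row-major list of lists purely for readability; the argument order `(alpha, x, y, A)` loosely follows BLAS.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[complex]]  # row-major list of rows, shape m x n


def geru_reference(alpha: complex, x: Sequence[complex],
                   y: Sequence[complex], A: Matrix) -> Matrix:
    """In-place A <- A + alpha * x * y^T. y is NOT conjugated (unlike gerc)."""
    m, n = len(x), len(y)
    if len(A) != m or any(len(row) != n for row in A):
        raise ValueError(f"A must be {m}x{n} to match len(x)={m}, len(y)={n}")
    for i in range(m):
        ax = alpha * x[i]  # hoist alpha * x[i] out of the inner loop
        row = A[i]
        for j in range(n):
            row[j] += ax * y[j]
    return A


# Hypothetical registry: each backend (reference, PyTorch, Triton, CuTe)
# registers a callable with the same signature so callers dispatch uniformly.
GERU_BACKENDS: Dict[str, Callable[..., Matrix]] = {"reference": geru_reference}


def geru(alpha, x, y, A, backend: str = "reference") -> Matrix:
    try:
        impl = GERU_BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown geru backend {backend!r}; available: {sorted(GERU_BACKENDS)}"
        ) from None
    return impl(alpha, x, y, A)
```

With `A = [[1, 0], [0, 1]]`, `geru(1, [1j, 2], [1, 1j], A)` leaves `A == [[1+1j, -1], [2, 1+2j]]`; the `-1` entry comes from `1j * 1j`, which would be `1` if `y` were conjugated as in gerc.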
Alternatives considered
Omitting a CuTe geru, or reusing another backend's implementation in its place, would:
- reduce the educational impact of comparing CuTe to PyTorch/Triton on BLAS-2 operations,
- leave the CuTe column incomplete in the README BLAS table,
- limit CuTe’s role as a first-class backend.
Implementation details
- Establish file layout and build rules for CuTe kernels.
- Implement geru using CuTe abstractions for rank-1 updates over 2D layouts.
- Ensure the results match the Python reference (within floating-point tolerance).
- Integrate with the planned tests and benchmarks for geru.
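The decomposition a CuTe kernel might use can be emulated on the host to check indexing before writing device code. This is a sketch under stated assumptions: `A` is a flat buffer stored column-major with leading dimension `lda`, as in BLAS; the hypothetical `layout` helper only mimics CuTe's idea of a layout as a (shape, stride) function from coordinates to offsets and is not the CuTe API; each `tile_m x tile_n` tile stands in for one thread block that loads its slices of `x` (pre-scaled by `alpha`) and `y` once; edge tiles are clamped, which is the job predication does in a real kernel. The tile sizes are arbitrary placeholders.

```python
from typing import List, Tuple


def layout(shape: Tuple[int, int], stride: Tuple[int, int]):
    """CuTe-style layout sketch: maps a 2D coordinate to a linear offset.
    Plain Python, not the CuTe API."""
    def offset(i: int, j: int) -> int:
        assert 0 <= i < shape[0] and 0 <= j < shape[1], "coordinate out of shape"
        return i * stride[0] + j * stride[1]
    return offset


def geru_tiled(alpha, x, y, a: List[complex], lda: int,
               tile_m: int = 4, tile_n: int = 4) -> List[complex]:
    """In-place a <- a + alpha * x * y^T on a column-major buffer (no conjugation)."""
    m, n = len(x), len(y)
    if lda < max(1, m):
        raise ValueError("lda must be >= max(1, m)")
    A = layout((m, n), (1, lda))  # column-major: element (i, j) lives at i + j*lda
    for bi in range(0, m, tile_m):          # each (bi, bj) tile ~ one thread block
        for bj in range(0, n, tile_n):
            rows = range(bi, min(bi + tile_m, m))  # clamp edge tiles (predication)
            cols = range(bj, min(bj + tile_n, n))
            ax = {i: alpha * x[i] for i in rows}   # x slice, pre-scaled, loaded once
            ys = {j: y[j] for j in cols}            # y slice, loaded once
            for j in cols:                          # each (i, j) ~ one thread's element
                for i in rows:                      # i fastest: contiguous in memory
                    a[A(i, j)] += ax[i] * ys[j]
    return a
```

On a GPU the two inner loops would be spread across threads; assigning consecutive threads to consecutive `i` makes their accesses to `A` contiguous in column-major storage, which is what allows coalesced loads and stores. Padding rows between `m` and `lda` are never touched.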
Use case
The CuTe geru kernel will:
- demonstrate rank-1 updates in CuTe,
- enable detailed performance comparisons across backends,
- serve as a building block for more complex CuTe-based kernels.
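For cross-backend comparisons, a simple traffic model helps, because geru is memory-bound: each element of `A` is read and written once for a single multiply-add. The hypothetical helpers below compute the idealized minimum DRAM traffic, the real-valued flop count $2mn$, effective bandwidth from a measured time, and arithmetic intensity, which approaches `1 / elem_bytes` flop per byte (about 0.25 for FP32). They assume ideal traffic; caching, write-allocate behavior, and complex arithmetic (more flops per element) all change the numbers.

```python
def geru_min_bytes(m: int, n: int, elem_bytes: int) -> int:
    """Idealized DRAM traffic: read and write every element of A, read x and y once."""
    return elem_bytes * (2 * m * n + m + n)


def geru_real_flops(m: int, n: int) -> int:
    """Real-valued count: one multiply and one add per element of A
    (ignores the m multiplications that form alpha * x)."""
    return 2 * m * n


def effective_bandwidth_gbs(m: int, n: int, elem_bytes: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s for one geru call that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return geru_min_bytes(m, n, elem_bytes) / seconds / 1e9


def arithmetic_intensity(m: int, n: int, elem_bytes: int) -> float:
    """Flops per byte of idealized traffic; tends to 1 / elem_bytes for large m, n."""
    return geru_real_flops(m, n) / geru_min_bytes(m, n, elem_bytes)
```

Comparing each backend's `effective_bandwidth_gbs` against the device's peak memory bandwidth gives a metric that does not depend on which backend produced the timing.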
Related work
- CuTe/CUTLASS examples of rank-1 updates.
- Standard BLAS geru implementations.
Additional context
This issue complements the geru Python/PyTorch/Triton feature requests and contributes to full CuTe coverage of BLAS-2 operations.