[FEATURE REQUEST] gemv CuTe kernel implementation #23

@LoserCheems

Problem statement

The BLAS level-2 gemv kernel has no CuTe backend implementation in this project. The README BLAS table shows an empty CuTe column for gemv, which prevents full cross-backend coverage for matrix-vector multiply.

Without a CuTe gemv kernel:

  • users cannot see how GEMV is expressed with CuTe’s layout and tiling abstractions,
  • there is no CuTe performance baseline to compare against PyTorch and Triton gemv,
  • CuTe-based higher-level modules lack a fundamental building block.

Proposed solution

Implement a CuTe-based gemv kernel matching the Python reference semantics and aligning with the project’s backend structure.

Concretely:

  • Add a CuTe gemv kernel in the appropriate CuTe backend directory, implementing $y = \alpha A x + \beta y$.
  • Use CuTe primitives to describe the matrix layout, vector access, and thread scheduling.
  • Ensure API parity with other backends so callers can dispatch to CuTe gemv uniformly.
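For concreteness, the semantics the CuTe kernel would need to reproduce can be sketched in plain Python. This is only an illustration of $y = \alpha A x + \beta y$ for a dense matrix; the function name `gemv_reference` and its argument order are assumptions made for this sketch and are not taken from the project's actual Python reference.

```python
def gemv_reference(alpha, A, x, beta, y):
    """Return alpha * A @ x + beta * y for a dense A given as a list of rows.

    Hypothetical helper for illustration; not the project's API.
    """
    m = len(A)
    n = len(x)
    if len(y) != m or any(len(row) != n for row in A):
        raise ValueError("shape mismatch: A must be m x n, x length n, y length m")
    out = []
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += A[i][j] * x[j]      # dot product of row i with x
        out.append(alpha * acc + beta * y[i])
    return out


A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
y = [1.0, 1.0]
result = gemv_reference(2.0, A, x, 3.0, y)
```

Each output element depends on exactly one row of `A`, which is why a natural first GPU mapping assigns rows (or row tiles) to threads or thread blocks.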

Alternatives considered

Alternatives such as omitting CuTe gemv or relying solely on other backends would:

  • reduce the educational impact of comparing CuTe to PyTorch/Triton on GEMV,
  • leave the CuTe column partially empty in the README BLAS table,
  • limit the ability to build CuTe-based end-to-end examples.

Implementation details

  • Establish the file layout and build integration for CuTe kernels.
  • Implement gemv using CuTe constructs for row-major/column-major layouts and tiling.
  • Match numerical behaviour and broadcasting of scalars alpha and beta with the Python reference.
  • Integrate with planned testing and benchmarking for gemv.
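The layout and tiling points above can be modelled without any GPU code. The sketch below is plain Python, not the CuTe API: it mimics the idea that a layout is a (shape, stride) pair mapping a coordinate to an offset in a flat buffer, and it walks the matrix in tiles. All names (`make_layout`, `offset`, `gemv_tiled`) and the tile sizes are hypothetical choices for illustration.

```python
def make_layout(m, n, order="row"):
    """Return ((m, n), (stride_i, stride_j)) for a flat m x n buffer."""
    if order == "row":
        return (m, n), (n, 1)   # row-major: consecutive j are adjacent
    if order == "col":
        return (m, n), (1, m)   # column-major: consecutive i are adjacent
    raise ValueError("order must be 'row' or 'col'")


def offset(layout, i, j):
    """Map coordinate (i, j) to a flat index via the strides."""
    _, (si, sj) = layout
    return i * si + j * sj


def gemv_tiled(alpha, buf, layout, x, beta, y, tile_m=2, tile_n=2):
    """Compute alpha * A @ x + beta * y, visiting A in tile_m x tile_n tiles."""
    (m, n), _ = layout
    out = [beta * yi for yi in y]
    for i0 in range(0, m, tile_m):                    # row tile (a "block")
        for i in range(i0, min(i0 + tile_m, m)):      # row in tile (a "thread")
            acc = 0.0
            for j0 in range(0, n, tile_n):            # column tile
                for j in range(j0, min(j0 + tile_n, n)):
                    acc += buf[offset(layout, i, j)] * x[j]
            out[i] += alpha * acc
    return out


# The same 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored in both orders.
row_buf = [1, 2, 3, 4, 5, 6]
col_buf = [1, 4, 2, 5, 3, 6]
x = [1, 0, -1]
y = [10, 20]
y_row = gemv_tiled(1.0, row_buf, make_layout(2, 3, "row"), x, 0.5, y)
y_col = gemv_tiled(1.0, col_buf, make_layout(2, 3, "col"), x, 0.5, y)
```

Both calls produce the same result because only the stride pair changes, not the loop structure. A real kernel that reduces across threads would change the summation order, so parity tests against the reference would likely need a floating-point tolerance rather than exact equality.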

Use case

The CuTe gemv kernel will:

  • demonstrate GEMV implementation in CuTe,
  • enable detailed performance comparisons across backends,
  • support more complex CuTe-based BLAS and Transformer kernels.

Related work

  • CuTe/CUTLASS examples of GEMV/GEMM.
  • Standard BLAS gemv implementations.

Additional context

This issue complements the gemv Python/PyTorch/Triton feature requests and helps complete the CuTe column in the README BLAS table.
