[FEATURE REQUEST] `gemm` Python kernel implementation #30

### Problem statement

The BLAS level-3 `gemm` operation (general matrix-matrix multiply) currently has no pure Python reference implementation in `kernel_course.python_ops`. In the README BLAS table, the `gemm` row has no Python entry, even though `gemm` is a central building block for many higher-level algorithms.

Without a Python `gemm` kernel:
- there is no simple, framework-agnostic reference for $C = \alpha A B + \beta C$,
- backend implementations lack a canonical numerical baseline for correctness checks,
- educational materials cannot show a clear Python prototype for GEMM within this project.

### Proposed solution

Add a Python reference implementation for `gemm` under `kernel_course.python_ops`, following the conventions of existing Python kernels.

Concretely:
- Introduce `kernel_course/python_ops/gemm.py` implementing $C = \alpha A B + \beta C$.
- Support 2D matrices `A`, `B`, and `C` with compatible shapes.
- Emphasize clarity and correctness over performance, as this is a reference implementation.
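
Written out element by element, the update above could be stated as follows. This is a sketch that assumes no transpose options (the issue does not mention any), with `A` of shape $m \times k$, `B` of shape $k \times n$, and `C` of shape $m \times n$; the index names $i$, $j$, $p$ are chosen here for illustration only.

$$
C_{ij} \leftarrow \alpha \sum_{p=1}^{k} A_{ip} B_{pj} + \beta C_{ij}, \qquad 1 \le i \le m, \quad 1 \le j \le n
$$

The old value of $C_{ij}$ on the right-hand side is read before the new value is written, so each output element depends only on row $i$ of `A`, column $j$ of `B`, and its own previous value.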

### Alternatives considered

Using NumPy or PyTorch as the implicit reference would:
- tie semantics to a particular external library,
- diverge from the project’s pattern of explicit Python reference kernels,
- complicate reasoning about correctness independently of backend implementations.

### Implementation details

- Add `kernel_course/python_ops/gemm.py` with a top-level `gemm` function.
- Accept scalars `alpha`, `beta`, matrices `A`, `B`, and `C`.
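- Implement the computation with straightforward nested loops (or an equally readable algorithm), and validate shapes before computing: the column count of `A` must equal the row count of `B`, and `C` must match the shape of the product `A B`.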
- Update `kernel_course/python_ops/__init__.py` to export `gemm` if appropriate.
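
As a starting point for discussion, the items above might look roughly like the sketch below. Several details are assumptions not fixed by this issue: matrices are represented as nested Python lists, the argument order follows BLAS (`alpha, A, B, beta, C`), the result is returned as a new matrix rather than written into `C`, and the helper `_shape` is hypothetical. The existing kernels in `python_ops` may use a different container type or signature, in which case this should be adapted to match.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # assumption: row-major nested lists


def _shape(name: str, M: Matrix) -> Tuple[int, int]:
    """Hypothetical helper: return (rows, cols) of a rectangular matrix."""
    if not M or not M[0]:
        raise ValueError(f"{name} must be a non-empty 2D matrix")
    cols = len(M[0])
    if any(len(row) != cols for row in M):
        raise ValueError(f"{name} has rows of unequal length")
    return len(M), cols


def gemm(alpha: float, A: Matrix, B: Matrix, beta: float, C: Matrix) -> Matrix:
    """Return alpha * (A @ B) + beta * C as a new nested list."""
    m, k = _shape("A", A)
    k_b, n = _shape("B", B)
    if k != k_b:
        raise ValueError(f"inner dimensions differ: A is {m}x{k}, B is {k_b}x{n}")
    if _shape("C", C) != (m, n):
        raise ValueError(f"C must have shape {m}x{n}")

    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Dot product of row i of A with column j of B.
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            out[i][j] = alpha * acc + beta * C[i][j]
    return out
```

For example, `gemm(2.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.5, [[1, 1], [1, 1]])` returns `[[38.5, 44.5], [86.5, 100.5]]`. Returning a fresh matrix keeps the sketch free of aliasing concerns; an in-place update of `C` would be closer to BLAS semantics if the project prefers that.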

### Use case

The Python `gemm` kernel will:
- define the mathematical semantics of GEMM for all backends,
- provide a simple CPU-only reference for correctness and teaching,
- act as a baseline for testing and benchmarking backend implementations.
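
To illustrate the baseline role, here is a minimal sketch of how a backend could be checked against the reference. Everything named here is hypothetical: `reference_gemm` is a compact stand-in for the proposed kernel (nested lists, BLAS argument order, no shape checks), `matches_reference` is an illustrative helper, and `buggy_gemm` is a deliberately faulty candidate. The tolerances are placeholders, not values the project has agreed on.

```python
import math


def reference_gemm(alpha, A, B, beta, C):
    """Compact stand-in for the reference kernel: alpha * (A @ B) + beta * C."""
    return [
        [alpha * sum(a * b for a, b in zip(row, col)) + beta * c
         for col, c in zip(zip(*B), c_row)]  # zip(*B) iterates columns of B
        for row, c_row in zip(A, C)
    ]


def matches_reference(candidate, alpha, A, B, beta, C,
                      rel_tol=1e-9, abs_tol=1e-12):
    """Return True if candidate agrees with the reference element-wise."""
    expected = reference_gemm(alpha, A, B, beta, C)
    actual = candidate(alpha, A, B, beta, C)
    if len(actual) != len(expected):
        return False
    return all(
        len(r_act) == len(r_exp)
        and all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                for x, y in zip(r_act, r_exp))
        for r_act, r_exp in zip(actual, expected)
    )


def buggy_gemm(alpha, A, B, beta, C):
    """Hypothetical faulty backend: forgets the beta * C term."""
    return reference_gemm(alpha, A, B, 0.0, C)
```

With `A = [[1, 2], [3, 4]]`, `B = [[5, 6], [7, 8]]`, and `C = [[1, 1], [1, 1]]`, calling `matches_reference(buggy_gemm, 2.0, A, B, 0.5, C)` returns `False`, while the same call with `beta = 0.0` returns `True`. This suggests test inputs should use a nonzero `beta` and a nonzero `C` so the accumulation term is actually exercised.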

### Related work

- Existing Python kernels: `python_ops.copy`, `python_ops.swap`.
- Standard BLAS `gemm` operations.

### Additional context

This issue starts filling the `gemm` row in the README BLAS table with a Python reference implementation.
