
Add support for experimental wheel variants (i.e., wheelnext) #2092

@mhamann

Description


Is your feature request related to a problem? Please describe.
Today, installing llama-cpp-python on machines with different GPU backends (CUDA, ROCm, Metal, etc.) requires separate package names, custom extra indexes, or installer-level logic to select the correct wheel. This creates friction for downstream tools (CLIs, orchestrators, and packaging systems) that want to provide a “just works” experience, especially when users don’t know which backend they need. Even a simple manual install requires the developer to pick exactly the right wheel by hand.
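
For context, a minimal sketch (in Python) of the kind of backend-detection and index-routing logic downstream tools end up carrying today. The detection heuristics and index URLs below are illustrative placeholders, not official endpoints:

    import platform
    import shutil

    def detect_backend() -> str:
        """Best-effort guess of the local accelerator backend."""
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            return "metal"
        if shutil.which("nvidia-smi"):
            return "cuda"
        if shutil.which("rocminfo"):
            return "rocm"
        return "cpu"

    def pip_args_for(backend: str) -> list[str]:
        """Translate the detected backend into pip arguments."""
        # Placeholder index URLs -- real projects point these at their own
        # per-backend wheel indexes.
        extra_index = {
            "cuda": "https://example.invalid/whl/cuda",
            "rocm": "https://example.invalid/whl/rocm",
        }.get(backend)
        args = ["pip", "install", "llama-cpp-python"]
        if extra_index:
            args += ["--extra-index-url", extra_index]
        return args

    if __name__ == "__main__":
        print(" ".join(pip_args_for(detect_backend())))

Wheel variants aim to make this entire layer unnecessary, because the installer itself performs the hardware-aware selection.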

Describe the solution you'd like
Add support for WheelNext-compatible experimental wheel variants when building and publishing wheels.

This would allow llama-cpp-python to produce a single package version that provides multiple backend-aware binary wheels, each annotated with variant metadata (e.g., GPU type, CUDA version, ROCm version).

Installers that understand the WheelNext spec (now used experimentally by PyTorch, uv, and others) can automatically select the correct backend wheel based on the system’s hardware/software configuration, without custom index URLs, separate packages, or manual backend flags.
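
To make the selection step concrete, here is a minimal sketch of matching a system’s properties against variant-annotated wheels. The property names, filenames, and matching rule are invented for illustration and do not reflect the actual WheelNext metadata format:

    from dataclasses import dataclass, field

    @dataclass
    class VariantWheel:
        filename: str
        # Illustrative variant properties (GPU type, toolkit version) --
        # not the real WheelNext field names.
        properties: dict[str, str] = field(default_factory=dict)

    CANDIDATES = [
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.cuda124.whl",
                     {"gpu": "cuda", "cuda_version": "12.4"}),
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.rocm62.whl",
                     {"gpu": "rocm", "rocm_version": "6.2"}),
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.whl",
                     {}),  # CPU-only fallback: no variant requirements
    ]

    def select_wheel(system: dict[str, str]) -> VariantWheel:
        """Pick the most specific wheel whose variant properties the local
        system satisfies; the plain wheel (no properties) is the fallback."""
        compatible = [
            w for w in CANDIDATES
            if all(system.get(k) == v for k, v in w.properties.items())
        ]
        return max(compatible, key=lambda w: len(w.properties))

    # Example: a CUDA 12.4 machine picks the cuda124 wheel; anything else
    # falls back to the plain CPU wheel.
    print(select_wheel({"gpu": "cuda", "cuda_version": "12.4"}).filename)
    print(select_wheel({"gpu": "metal"}).filename)

A real installer would match version ranges rather than exact strings; the point is only that backend selection moves into the installer, keyed off standardized variant metadata.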

Key pieces:

  • Generate wheels with variant metadata following the experimental WheelNext (wheel variants) conventions.
  • Publish per-backend wheels using the standardized naming + metadata fields.
  • Ensure that CPU-only wheels remain available as fallback.

This would significantly simplify installation for all users and remove backend-selection logic from downstream tools. Wheel variants are fully backward-compatible, so existing workflows won't be disrupted.

Describe alternatives you've considered

  • Separate package names per backend (e.g., llama-cpp-python-cuda): fragments packaging and forces manual selection.
  • Extras for backend variants (pip install llama-cpp-python[cuda]): still requires external detection and doesn’t integrate with hardware-aware installer selection.
  • Custom index URLs for backend wheels: brittle and requires orchestration logic outside Python packaging.
  • CLI-backed installation routing (what many downstream projects do currently): reinvents the wheel in every project and provides an inconsistent experience for end users.

All of these solutions put the burden on downstream tooling rather than on standardized wheel metadata.

Additional context
