
Add support for experimental wheel variants (i.e., wheelnext) #2092

@mhamann

Description


Is your feature request related to a problem? Please describe.
Today, installing llama-cpp-python on machines with different GPU backends (CUDA, ROCm, Metal, etc.) requires separate package names, custom extra indexes, or installer-level logic to select the correct wheel. This creates friction for downstream tools (CLIs, orchestrators, and packaging systems) that want to provide a “just works” experience, especially when users don’t know which backend they need. Even a simple manual install requires the developer to pick exactly the right wheel by hand.
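
For context, a minimal sketch (in Python) of the kind of backend-detection and index-routing logic downstream tools end up carrying today. The detection heuristics and index URLs below are illustrative placeholders, not official endpoints:

    import platform
    import shutil

    def detect_backend() -> str:
        """Best-effort guess of the local accelerator backend."""
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            return "metal"
        if shutil.which("nvidia-smi"):
            return "cuda"
        if shutil.which("rocminfo"):
            return "rocm"
        return "cpu"

    def pip_args_for(backend: str) -> list[str]:
        """Translate the detected backend into pip arguments."""
        # Placeholder index URLs -- real projects point these at their own
        # per-backend wheel indexes.
        extra_index = {
            "cuda": "https://example.invalid/whl/cuda",
            "rocm": "https://example.invalid/whl/rocm",
        }.get(backend)
        args = ["pip", "install", "llama-cpp-python"]
        if extra_index:
            args += ["--extra-index-url", extra_index]
        return args

    if __name__ == "__main__":
        print(" ".join(pip_args_for(detect_backend())))

Wheel variants aim to make this entire layer unnecessary, because the installer itself performs the hardware-aware selection.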

Describe the solution you'd like
Add support for WheelNext-compatible experimental wheel variants when building and publishing wheels.

This would allow llama-cpp-python to produce a single package version that provides multiple backend-aware binary wheels, each annotated with variant metadata (e.g., GPU type, CUDA version, ROCm version).

Installers that understand the WheelNext spec (now used experimentally by PyTorch, uv, and others) can automatically select the correct backend wheel based on the system’s hardware/software configuration, without custom index URLs, separate packages, or manual backend flags.
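
To make the selection step concrete, here is a minimal sketch of matching a system’s properties against variant-annotated wheels. The property names, filenames, and matching rule are invented for illustration and do not reflect the actual WheelNext metadata format:

    from dataclasses import dataclass, field

    @dataclass
    class VariantWheel:
        filename: str
        # Illustrative variant properties (GPU type, toolkit version) --
        # not the real WheelNext field names.
        properties: dict[str, str] = field(default_factory=dict)

    CANDIDATES = [
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.cuda124.whl",
                     {"gpu": "cuda", "cuda_version": "12.4"}),
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.rocm62.whl",
                     {"gpu": "rocm", "rocm_version": "6.2"}),
        VariantWheel("llama_cpp_python-0.3.0-cp312-cp312-linux_x86_64.whl",
                     {}),  # CPU-only fallback: no variant requirements
    ]

    def select_wheel(system: dict[str, str]) -> VariantWheel:
        """Pick the most specific wheel whose variant properties the local
        system satisfies; the plain wheel (no properties) is the fallback."""
        compatible = [
            w for w in CANDIDATES
            if all(system.get(k) == v for k, v in w.properties.items())
        ]
        return max(compatible, key=lambda w: len(w.properties))

    # Example: a CUDA 12.4 machine picks the cuda124 wheel; anything else
    # falls back to the plain CPU wheel.
    print(select_wheel({"gpu": "cuda", "cuda_version": "12.4"}).filename)
    print(select_wheel({"gpu": "metal"}).filename)

A real installer would match version ranges rather than exact strings; the point is only that backend selection moves into the installer, keyed off standardized variant metadata.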

Key pieces:

  • Generate wheels with variant metadata following the experimental WheelNext (wheel variants) conventions.
  • Publish per-backend wheels using the standardized naming + metadata fields.
  • Ensure that CPU-only wheels remain available as fallback.

This would significantly simplify installation for all users and remove backend-selection logic from downstream tools. Wheel variants are fully backward-compatible, so existing workflows won't be disrupted.

Describe alternatives you've considered

  • Separate package names per backend (e.g., llama-cpp-python-cuda): fragments packaging and forces manual selection.
  • Extras for backend variants (pip install llama-cpp-python[cuda]): still requires external detection and doesn’t integrate with hardware-aware installer selection.
  • Custom index URLs for backend wheels: brittle and requires orchestration logic outside Python packaging.
  • CLI-backed installation routing (what many downstream projects do currently): reinvents the wheel in every project and provides an inconsistent experience for end users.

All of these solutions put the burden on downstream tooling rather than on standardized wheel metadata.

Additional context
