Deploying Gemma-3-1b-it with NVIDIA GPU P2000 - gets error #3305

@antonios-nokia

Description

System Info

I tried to deploy a small model in Docker using the suggested commands below.

  • WSL2
  • Docker
  • NVIDIA Quadro P2000 GPU
  • CUDA 12.7 (as reported by nvidia-smi)
  • nvidia-smi shows the GPU

model=google/gemma-3-1b-it
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume":/data \
  -e HF_HOME=/data/huggingface \
  -e HF_TOKEN= \
  ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id "$model"

Error

│ 5206 │ │ │ │ return importlib.import_module( │
│ 5207 │ │ │ │ │ "torch._decomp.decompositions" │
│ 5208 │ │ │ │ )._replication_pad(input, pad) │
│ ❱ 5209 │ return torch._C._nn.pad(input, pad, mode, value) │
│ 5210 │
│ 5211 │
│ 5212 # TODO: Fix via pytorch/pytorch#75798
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ input = <repr-error 'CUDA error: no kernel image is available for │ │
│ │ execution on the device\nCUDA kernel errors might be │ │
│ │ asynchronously reported at some other API call, so the │ │
│ │ stacktrace below might be incorrect.\nFor debugging consider │ │
│ │ passing CUDA_LAUNCH_BLOCKING=1\nCompile with │ │
│ │ TORCH_USE_CUDA_DSA to enable device-side assertions.\n'> │ │
│ │ mode = 'constant' │ │
│ │ pad = (0, 0, 0, 1) │ │
│ │ value = None │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: no kernel image is available for execution on the
device
CUDA kernel errors might be asynchronously reported at some other API call, so
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
rank=0
2025-07-27T16:25:13.820012Z ERROR text_generation_launcher: Shard 0 failed to start
2025-07-27T16:25:13.820095Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
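
"no kernel image is available for execution on the device" generally means that the compiled CUDA code being launched was not built for this GPU's architecture. The Quadro P2000 is a Pascal card (compute capability 6.1). The sketch below is one way to check this, assuming PyTorch is importable in the environment where it runs (for example, inside the container). `device_is_covered` and `parse_arch` are hypothetical helper names, and the check only covers the architectures PyTorch itself was built for, not any custom kernels shipped in the image.

```python
import re


def parse_arch(arch):
    """Split strings like 'sm_61', 'compute_90' or 'sm_90a' into parts."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)([a-z]?)", arch)
    if m is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    kind, major, minor, suffix = m.groups()
    return kind, int(major), int(minor), suffix


def device_is_covered(capability, arch_list):
    """Return True if any entry in arch_list can run on the given device.

    Rules applied (a simplification, stated here as assumptions):
      - 'sm_XY' binaries run on devices with the same major version
        and a minor version >= Y.
      - 'compute_XY' PTX can be JIT-compiled for any device >= X.Y.
      - entries with a letter suffix (e.g. 'sm_90a') need an exact match.
    """
    dev = tuple(capability)
    for arch in arch_list:
        kind, major, minor, suffix = parse_arch(arch)
        if suffix:
            if (major, minor) == dev:
                return True
        elif kind == "sm":
            if major == dev[0] and minor <= dev[1]:
                return True
        elif (major, minor) <= dev:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed here; run this inside the container.")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("architectures in this PyTorch build:", archs)
            print("covered:", device_is_covered(cap, archs))
        else:
            print("CUDA is not available to PyTorch in this environment.")
```

If the check reports that the device is not covered, the driver and toolkit versions are unlikely to be the cause; the binaries themselves would need to include kernels for this architecture.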

CUDA version/toolkit

user@DESKTOP-I0E79EK:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

user@DESKTOP-I0E79EK:~$ nvidia-smi
Sun Jul 27 17:27:49 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01 Driver Version: 566.24 CUDA Version: 12.7 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P2000 On | 00000000:01:00.0 Off | N/A |
| N/A 52C P8 N/A / 5... | 0MiB / 4096MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
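
Note that the two version numbers above measure different things: nvcc reports the locally installed toolkit (12.0), while the "CUDA Version" in nvidia-smi is the highest CUDA version the driver supports (12.7). Neither shows the GPU's compute capability, which is what the "no kernel image" error depends on. The sketch below queries it, assuming the installed nvidia-smi supports the `compute_cap` query field (recent drivers do). `parse_gpu_csv` and `query_gpus` are hypothetical helper names.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; compute_cap is assumed to be supported
# by the installed driver's nvidia-smi.
QUERY_FIELDS = ["name", "driver_version", "compute_cap"]


def parse_gpu_csv(text):
    """Parse 'csv,noheader' output from nvidia-smi into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        rows.append(dict(zip(QUERY_FIELDS, (f.strip() for f in rec))))
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi query failed: {exc}")
```

Running the same query on the host and inside the container is a quick way to confirm that both see the same device.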

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Check that the P2000 GPU is passed through to the Ubuntu VM
  2. Deploy the container with the TGI image

Expected behavior

I would expect the model to deploy successfully. It downloaded the weights, but the shard then failed to start with the CUDA error shown above.
