Deploying Gemma-3-1b-it on an NVIDIA Quadro P2000 fails with "no kernel image is available"

### System Info

I tried to deploy a small model in Docker using the commands below, as suggested. Environment:
- WSL2
- Docker
- NVIDIA Quadro P2000 GPU
- CUDA 12.7 (as reported by `nvidia-smi`)
- `nvidia-smi` shows the GPU

```shell
model=google/gemma-3-1b-it
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    -e HF_HOME=/data/huggingface \
    -e HF_TOKEN=<token> \
    ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id $model
```


**Error** (end of the launcher log):

```
│   5206 │   │   │   │   return importlib.import_module(                       │
│   5207 │   │   │   │   │   "torch._decomp.decompositions"                    │
│   5208 │   │   │   │   )._replication_pad(input, pad)                        │
│ ❱ 5209 │   return torch._C._nn.pad(input, pad, mode, value)                  │
│   5210                                                                       │
│   5211                                                                       │
│   5212 # TODO: Fix via https://github.com/pytorch/pytorch/issues/75798       │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ input = <repr-error 'CUDA error: no kernel image is available for        │ │
│ │         execution on the device\nCUDA kernel errors might be             │ │
│ │         asynchronously reported at some other API call, so the           │ │
│ │         stacktrace below might be incorrect.\nFor debugging consider     │ │
│ │         passing CUDA_LAUNCH_BLOCKING=1\nCompile with                     │ │
│ │         `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n'>       │ │
│ │  mode = 'constant'                                                       │ │
│ │   pad = (0, 0, 0, 1)                                                     │ │
│ │ value = None                                                             │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: no kernel image is available for execution on the
device
CUDA kernel errors might be asynchronously reported at some other API call, so
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 rank=0
2025-07-27T16:25:13.820012Z ERROR text_generation_launcher: Shard 0 failed to start
2025-07-27T16:25:13.820095Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```

**CUDA version / toolkit**

```
user@DESKTOP-I0E79EK:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

user@DESKTOP-I0E79EK:~$ nvidia-smi
Sun Jul 27 17:27:49 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 566.24       CUDA Version: 12.7     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2000                   On  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8              N/A / 5... |      0MiB /  4096MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
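The CUDA error `no kernel image is available for execution on the device` generally means the CUDA code being run was not compiled for the GPU's architecture. The Quadro P2000 is a Pascal card with compute capability 6.1, so a quick sanity check is to compare that value against the oldest architecture the container image targets. The sketch below assumes a driver recent enough to support `nvidia-smi --query-gpu=compute_cap`; `MIN_CC`, `cc_at_least` and `check_gpus` are hypothetical names, and the `7.5` default is a placeholder, not a documented TGI requirement.

```shell
#!/bin/sh
# HYPOTHETICAL threshold: replace with the lowest compute capability the
# image was actually built for. 7.5 is only a placeholder value.
MIN_CC="${MIN_CC:-7.5}"

# cc_at_least HAVE NEED
# Exit status 0 if compute capability HAVE >= NEED (both "major.minor").
cc_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}
  need_major=${2%%.*}; need_minor=${2#*.}
  if [ "$have_major" -ne "$need_major" ]; then
    [ "$have_major" -gt "$need_major" ]
  else
    [ "$have_minor" -ge "$need_minor" ]
  fi
}

# Print each GPU's name and compute capability and compare it to MIN_CC.
# nvidia-smi prints lines such as "Quadro P2000, 6.1" with this query.
check_gpus() {
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS= read -r line; do
    name=${line%, *}   # everything before the last ", "
    cc=${line##*, }    # everything after the last ", "
    if cc_at_least "$cc" "$MIN_CC"; then
      echo "$name: compute capability $cc >= $MIN_CC"
    else
      echo "$name: compute capability $cc < $MIN_CC (kernels for this GPU may be missing)"
    fi
  done
}

# Only query the driver when nvidia-smi is available on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  check_gpus
fi
```

On the P2000 this would report compute capability 6.1; whether that is below what the TGI 3.3.4 image supports depends on how the image was built, which this check does not determine by itself.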

### Information

- [ ] Docker
- [ ] The CLI directly

### Tasks

- [ ] An officially supported command
- [ ] My own modifications

### Reproduction

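1. Confirm the Quadro P2000 is passed through to the Ubuntu VM.
2. Start the TGI container with the `docker run` command above.
3. Shard 0 fails to start with `RuntimeError: CUDA error: no kernel image is available for execution on the device`.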

### Expected behavior

I expected the model to deploy successfully. The weights downloaded, but the shard then failed on the GPU with the CUDA error shown above.

Deploying Gemma-3-1b-it with NVIDIA GPU P2000 - gets error #3305

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Deploying Gemma-3-1b-it with NVIDIA GPU P2000 - gets error #3305

Description

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions