Gemma3: CUDA error: an illegal memory access was encountered. #3321

@Behnamhb

Description

System Info

TGI version: 3.3.4
Model: Gemma 3 27B
GPU: H100

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I think this issue is related to Flash Attention v2: Dao-AILab/flash-attention#1311. A newer Flash Attention v3 (beta) has been released for H100 GPUs. I don't think it is good practice for a project as big as TGI to depend on a pinned version: eight months ago you updated Flash Attention to 2.6.1, and 2.8.3 is now available. I checked your Dockerfile, and there is no easy way to update it ourselves.
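For reference, the error shows up when serving the model through the standard Docker setup. This is a sketch of the launch command based on the usual TGI Docker instructions; the volume path and the exact model id (`google/gemma-3-27b-it`) are assumptions on my part, not copied from logs:

```shell
# Launch TGI 3.3.4 with Gemma 3 27B on an H100 (sketch, not verbatim from the failing run)
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:3.3.4 \
  --model-id google/gemma-3-27b-it
```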

Expected behavior

Update the pinned Flash Attention version (2.8.3 is current, and Flash Attention 3 targets Hopper GPUs such as the H100), or make the pin easy to override when building the image.
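As a way to confirm which Flash Attention build actually ships in a given image, something like the following could work; this assumes the `flash_attn` Python package is importable inside the container, which I have not verified:

```shell
# Print the flash_attn version bundled in the TGI image (assumes the package is importable)
docker run --rm --entrypoint python \
  ghcr.io/huggingface/text-generation-inference:3.3.4 \
  -c "import flash_attn; print(flash_attn.__version__)"
```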
