System Info
TGI version: 3.3.4
Model: Gemma 3 27B
GPU: H100
Information
Tasks
Reproduction
I think this issue is related to Flash Attention v2: Dao-AILab/flash-attention#1311. A newer Flash Attention version 3 has been released (beta) for H100 GPUs. I don't think it is good practice for a large project like TGI to depend on a pinned version: the Flash Attention version was bumped to 2.6.1 eight months ago, and the current release is 2.8.3. I checked your Dockerfile, and there is no easy way to update this dependency.
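To illustrate what I mean by "an easy way to update": the version could be exposed as a build argument instead of being hard-coded. The sketch below is only an example, not taken from the actual TGI Dockerfile; the stage name, base image, and `FLASH_ATTN_VERSION` argument are all hypothetical.

```dockerfile
# Hypothetical sketch: expose the flash-attention version as a build arg
# so it can be bumped at build time without editing the Dockerfile.
ARG FLASH_ATTN_VERSION=2.8.3

FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS flash-attn-builder
# Re-declare the global ARG so it is visible inside this stage.
ARG FLASH_ATTN_VERSION
RUN apt-get update && apt-get install -y python3-pip && \
    pip install ninja packaging torch && \
    # MAX_JOBS bounds parallel nvcc jobs so the build does not exhaust RAM.
    MAX_JOBS=4 pip install --no-build-isolation flash-attn==${FLASH_ATTN_VERSION}
```

With something like this, bumping the dependency would be a one-flag change, e.g. `docker build --build-arg FLASH_ATTN_VERSION=2.8.3 .`, rather than a Dockerfile edit.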
Expected behavior
Update Flash Attention to a current release (e.g. 2.8.3), make the pinned version easy to override in the Dockerfile, and consider supporting Flash Attention v3 on H100 GPUs.