Gemma3: CUDA error: an illegal memory access was encountered. #3321

@Behnamhb

Description

System Info

TGI version: 3.3.4
Model: Gemma 3 27B
GPU: H100

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I think this issue is related to Flash Attention v2: Dao-AILab/flash-attention#1311. A newer Flash Attention v3 (beta) has been released for H100 GPUs. I don't think it is good practice for a project as big as TGI to depend on a pinned version: eight months ago you updated Flash Attention to 2.6.1, and 2.8.3 is now available. I checked your Dockerfile, and there is no easy way to update it ourselves.
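For reference, the error shows up when serving the model through the standard Docker setup. This is a sketch of the launch command based on the usual TGI Docker instructions; the volume path and the exact model id (`google/gemma-3-27b-it`) are assumptions on my part, not copied from logs:

```shell
# Launch TGI 3.3.4 with Gemma 3 27B on an H100 (sketch, not verbatim from the failing run)
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:3.3.4 \
  --model-id google/gemma-3-27b-it
```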

Expected behavior

Update the pinned Flash Attention version (2.8.3 is current, and Flash Attention 3 targets Hopper GPUs such as the H100), or make the pin easy to override when building the image.
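As a way to confirm which Flash Attention build actually ships in a given image, something like the following could work; this assumes the `flash_attn` Python package is importable inside the container, which I have not verified:

```shell
# Print the flash_attn version bundled in the TGI image (assumes the package is importable)
docker run --rm --entrypoint python \
  ghcr.io/huggingface/text-generation-inference:3.3.4 \
  -c "import flash_attn; print(flash_attn.__version__)"
```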
