Commit 152c73a
Adds flash sparse attention interface
Enables calling sparse Flash attention CUDA kernels through custom autograd helpers.
Registers fake implementations and padding logic so torch.compile stays compatible with varying head shapes.

1 parent 6bf01c4
1 file changed (+760, −0 lines)