Commit 152c73a
Adds flash sparse attention interface
Enables calling sparse Flash attention CUDA kernels through custom autograd helpers.
Registers fake implementations and padding logic so torch.compile stays compatible with varying head shapes.

1 parent 6bf01c4
1 file changed (+760, −0 lines)