Commit 1cbd2f9
Unifies attention kernels with bias+mask windowing
Refactors attention paths to accept external attention bias and boolean causal mask, replacing zoh/dt-based masking and cache-position logic. Introduces a generic mask preparer that applies top-k windowing (optionally causal-aware), and standardizes interfaces across SDPA, Flash, Triton, and Flex implementations.
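The "generic mask preparer that applies top-k windowing (optionally causal-aware)" could look roughly like the sketch below. This is a hypothetical reconstruction, not the commit's actual code: the function name `prepare_topk_window_mask` and its signature are assumptions; the idea is to keep, per query, only the `window_size` key positions with the highest bias, intersected with a causal mask when requested.

```python
import torch

def prepare_topk_window_mask(bias, window_size, causal=False):
    """Keep the top-`window_size` highest-bias key positions per query.

    bias: [batch, heads, q_len, k_len] additive attention bias.
    Returns a boolean mask of the same shape; True = position kept.
    (Hypothetical sketch of a top-k windowing mask preparer.)
    """
    b, h, q_len, k_len = bias.shape
    scores = bias.clone()
    if causal:
        # Forbid future positions before selecting the window; the diagonal
        # offset aligns the last query with the last key.
        causal_mask = torch.ones(
            q_len, k_len, dtype=torch.bool, device=bias.device
        ).tril(k_len - q_len)
        scores = scores.masked_fill(~causal_mask, float("-inf"))
    k = min(window_size, k_len)
    topk_idx = scores.topk(k, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk_idx, True)
    if causal:
        # Drop any -inf positions top-k may have picked for short rows.
        mask &= causal_mask
    return mask
```

A boolean mask like this can then be handed to every backend (SDPA, Flash, Triton, Flex) in the same form, which is what makes the interfaces standardizable.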
Removes the zoh/dt projection and related parameters, repeats KV tensors for GQA, and consistently applies additive masks. Updates benchmarks to generate bias/mask inputs, rename keep_window_size to window_size, adjust head dims, and harmonize result handling and output labeling.
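Repeating KV for GQA and folding a boolean mask into the additive bias might be wired together as below. Again a sketch under assumptions: `repeat_kv` and `attend` are illustrative names, and the commit's kernels (Flash, Triton, Flex) would consume the mask differently; only the SDPA-style path is shown.

```python
import torch
import torch.nn.functional as F

def repeat_kv(x, n_rep):
    """Expand KV heads to match query heads for grouped-query attention.

    x: [batch, num_kv_heads, seq_len, head_dim]. (Illustrative helper.)
    """
    if n_rep == 1:
        return x
    b, kvh, s, d = x.shape
    return x[:, :, None].expand(b, kvh, n_rep, s, d).reshape(b, kvh * n_rep, s, d)

def attend(q, k, v, bias, bool_mask):
    """SDPA path: apply the mask additively on top of the external bias.

    bool_mask is True where attention is allowed; masked-out positions
    become -inf in the additive mask. (Hypothetical sketch.)
    """
    n_rep = q.shape[1] // k.shape[1]
    k, v = repeat_kv(k, n_rep), repeat_kv(v, n_rep)
    attn_mask = bias.masked_fill(~bool_mask, float("-inf"))
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

Applying the mask additively (as -inf entries in the bias) keeps the semantics identical across backends that accept a float mask, which is the consistency the commit message claims.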
Improves API consistency, simplifies experimentation with custom biases, and aligns masking semantics across kernels for more reliable benchmarking.

1 parent bcdf7ad
1 file changed, +199 −251 lines