andy-yang-1 commented Aug 8, 2025

Add a general sparse attention backend:

def sparse_attention_fwd(
    query: torch.Tensor,           # [B, H, D]
    key: torch.Tensor,             # [B, H // gqa, S, D]
    value: torch.Tensor,           # [B, H // gqa, S, D]
    sparse_list: torch.Tensor,     # [B, H, S]
    sparse_len: torch.Tensor,      # [B, H]
    block_seq: int = 256,
) -> torch.Tensor:

query[b, h] attends only to the sparse_len[b, h] tokens whose indices are listed in sparse_list[b, h, :sparse_len[b, h]], gathered from key[b, h // gqa] and value[b, h // gqa].
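
For reference, a minimal eager-mode PyTorch sketch of these gather-then-attend semantics (this is not the kernel added in this PR; the softmax scaling and gqa handling below are my assumptions):

import torch

def sparse_attention_ref(query, key, value, sparse_list, sparse_len, gqa=1):
    # query: [B, H, D]; key/value: [B, H // gqa, S, D]
    # sparse_list: [B, H, S] token indices; sparse_len: [B, H] valid counts
    B, H, D = query.shape
    out = torch.zeros_like(query)
    for b in range(B):
        for h in range(H):
            n = int(sparse_len[b, h])
            idx = sparse_list[b, h, :n]              # selected token positions
            k = key[b, h // gqa, idx]                # [n, D]
            v = value[b, h // gqa, idx]              # [n, D]
            scores = (query[b, h] @ k.T) / D ** 0.5  # [n], assumed 1/sqrt(D) scaling
            probs = torch.softmax(scores, dim=-1)
            out[b, h] = probs @ v                    # [D]
    return out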

Try:

python3 sparse_attention_hub/sparse_attention/efficient_attention/sparse_attention_backend.py
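
A hypothetical call site, assuming the module above exports sparse_attention_fwd under that name; the device and index dtypes below are guesses, not documented requirements of the kernel:

import torch
from sparse_attention_hub.sparse_attention.efficient_attention.sparse_attention_backend import (
    sparse_attention_fwd,
)

B, H, S, D, gqa = 2, 8, 1024, 64, 2
q = torch.randn(B, H, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H // gqa, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H // gqa, S, D, device="cuda", dtype=torch.float16)
# Each row of sparse_list is a permutation of token positions; sparse_len picks how many are used.
sparse_list = torch.argsort(torch.rand(B, H, S, device="cuda"), dim=-1)
sparse_len = torch.randint(1, S + 1, (B, H), device="cuda")
out = sparse_attention_fwd(q, k, v, sparse_list, sparse_len, block_seq=256)
print(out.shape)  # expected: [B, H, D]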

And a general bias sparse attention backend:

def bias_sparse_attention_fwd(
    query: torch.Tensor,           # [B, H, D]
    key: torch.Tensor,             # [B, H // gqa, S, D]
    value: torch.Tensor,           # [B, H // gqa, S, D]
    sparse_list: torch.Tensor,     # [B, H, S]
    sparse_len: torch.Tensor,      # [B, H]
    weight_list: torch.Tensor,     # [B, H, S]
    block_seq: int = 256,
) -> torch.Tensor:

Try:

python3 sparse_attention_hub/sparse_attention/efficient_attention/bias_sparse_attention_backend.py
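
If weight_list[b, h, i] is an additive bias on the attention logit of the i-th selected token (an assumption about the bias semantics; it could instead weight the probabilities), the eager reference only changes in the score computation:

def bias_sparse_attention_ref(query, key, value, sparse_list, sparse_len,
                              weight_list, gqa=1):
    # Same layout as sparse_attention_ref, plus weight_list: [B, H, S],
    # assumed aligned with sparse_list along the last dimension.
    B, H, D = query.shape
    out = torch.zeros_like(query)
    for b in range(B):
        for h in range(H):
            n = int(sparse_len[b, h])
            idx = sparse_list[b, h, :n]
            k = key[b, h // gqa, idx]
            v = value[b, h // gqa, idx]
            # Assumed: bias is added to the logits before softmax.
            scores = (query[b, h] @ k.T) / D ** 0.5 + weight_list[b, h, :n]
            probs = torch.softmax(scores, dim=-1)
            out[b, h] = probs @ v
    return out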

See #17

