@sahiljoshi515 commented on Dec 9, 2025

🧱 Baseline Addition PR

📘 Description

Baseline: Compressed Retrieval Attention (Do you think this is a better name?)
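For reviewers skimming the idea, here is a minimal sketch of the LSH-style bucketed retrieval that the K / L / top_t hyperparameters below suggest. This is my reading of the scheme (K hash bits per table, L tables, and a per-query candidate `budget` standing in for top_t bucket selection), not the PR's actual implementation:

```python
# Minimal sketch of LSH-style bucketed retrieval for attention.
# Assumptions: K = hyperplane bits per table, L = number of tables; `budget`
# is a stand-in for top_t bucket selection. Not the PR's actual code.
import torch
import torch.nn.functional as F

def lsh_bucket_ids(x, planes):
    """x: (n, d), planes: (L, K, d) -> integer bucket ids of shape (L, n)."""
    bits = (torch.einsum("lkd,nd->lkn", planes, x) > 0).long()  # sign bits
    weights = 2 ** torch.arange(planes.shape[1], device=x.device)
    return (bits * weights[None, :, None]).sum(dim=1)  # bucket id in [0, 2^K)

def bucketed_attention(q, k, v, K=4, L=30, budget=64):
    """q: (nq, d); k, v: (nk, d). Each query attends only to the `budget`
    keys that share its bucket in the most of the L hash tables."""
    d = q.shape[-1]
    planes = torch.randn(L, K, d, device=q.device)
    qb, kb = lsh_bucket_ids(q, planes), lsh_bucket_ids(k, planes)
    # Collision counts: in how many tables do query i and key j share a bucket?
    counts = (qb[:, :, None] == kb[:, None, :]).sum(dim=0)      # (nq, nk)
    top = counts.topk(min(budget, k.shape[0]), dim=-1).indices  # (nq, budget)
    sel_k, sel_v = k[top], v[top]                               # (nq, budget, d)
    scores = torch.einsum("nd,nbd->nb", q, sel_k) / d ** 0.5
    return torch.einsum("nb,nbd->nd", F.softmax(scores, dim=-1), sel_v)

# Toy usage: 8 queries over 128 keys in 32 dims.
q = torch.randn(8, 32)
k, v = torch.randn(128, 32), torch.randn(128, 32)
out = bucketed_attention(q, k, v)  # (8, 32)
```

The dense (nq, nk) collision-count matrix is only for readability; a real masker would gather per-bucket index lists instead.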

✅ Baseline Checklist

(Follows docs/baselines/developer_guide.md)

| Item | Status | Notes |
| --- | --- | --- |
| 🧩 1. Baseline code | [x] | This is a new baseline. |
| ⚙️ 4. Performance profiled | [x] | See below. |
| 📖 5. Documentation created | [x] | docs/baselines/Bucket_Attn.md |

📊 Results Summary

**Quality**

Results in different setups. Our results:

|  | niah_multikey_3 | qa_2 |
| --- | --- | --- |
| Our code (10x sparsity) | 100 | 56 |

**Performance profiling (H200)**

Hyperparameters: K = 4, L = 30, top_t = 4

📈 Masker Overhead Analysis:

  • Sparse Attention: 4.400 ms
  • Baseline (no maskers): 0.299 ms
  • Masker overhead: 4.101 ms (1373.8%)
    ❌ High overhead - significant masker cost

Hyperparameters: K = 4, L = 60, top_t = 4

📈 Masker Overhead Analysis:

  • Sparse Attention: 7.318 ms
  • Baseline (no maskers): 0.299 ms
  • Masker overhead: 7.019 ms (2350.5%)
    ❌ High overhead - significant masker cost (percentage derivation sketched below)
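For reference, my reading of how the overhead percentage above is computed, i.e. masker cost relative to the no-masker baseline (the small gap to the printed numbers would come from rounding of the raw timings):

```python
# Reconstruction of the reported overhead percentage (an assumption, not the
# repo's actual profiling script).
sparse_ms = 7.318      # sparse attention with maskers (K=4, L=60, top_t=4)
baseline_ms = 0.299    # attention with no maskers
overhead_ms = sparse_ms - baseline_ms             # 7.019 ms
overhead_pct = 100.0 * overhead_ms / baseline_ms  # ~2347.5% vs. 2350.5% reported
print(f"Masker overhead: {overhead_ms:.3f} ms ({overhead_pct:.1f}%)")
```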

Sahil Joshi and others added 8 commits December 8, 2025 14:26
- Updated terminology and clarified descriptions in the Bucket Attention documentation.
- Removed 'RACE-Style Aggregation' from section title.
- Added example configuration and experimental setup details for sparse attention.
@sahiljoshi515 changed the title from "Bucket attn" to "Baseline: Bucket Attention" on Dec 9, 2025
@sahiljoshi515 requested a review from apd10 on December 10, 2025 at 14:45