
Commit a724423

Updates README with streamlined benchmarks and adds contributing section
Simplifies benchmark script names and descriptions for better clarity. Removes MQAR benchmark section as it's no longer part of the core suite. Adds comprehensive contributing guidelines including bug reporting, feature requests, and development workflow to encourage community participation.
1 parent 630ce7a commit a724423


README.md

Lines changed: 33 additions & 10 deletions
@@ -179,34 +179,29 @@ python -c "import flash_dma_cuda; print('✅ Flash DMA CUDA extension imported s
**Note**: Flash Dynamic Mask Attention requires CUDA compute capability 8.0+ for optimal performance. Earlier architectures are not supported.

## Benchmarking

Flash-DMA provides comprehensive benchmarking tools to evaluate performance across different configurations:

### Forward Pass Equivalence
```bash
-python benchmarks/benchmark_forward_equivalence.py
+python benchmarks/forward_equivalence.py
```
Validates numerical consistency between Python reference and CUDA implementation.

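
As context for readers (not part of this commit), an equivalence check of this kind usually runs identical inputs through a plain PyTorch reference and the optimized kernel, then compares outputs within a tolerance. The sketch below is a minimal, hypothetical illustration; the `optimized_attn` callable and its signature are assumptions, not the actual interface of `forward_equivalence.py`.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a forward-equivalence check; not the repository's actual script.
def check_forward_equivalence(optimized_attn, batch=2, heads=8, seqlen=512, dim=64):
    q, k, v = (torch.randn(batch, heads, seqlen, dim, device="cuda", dtype=torch.float16)
               for _ in range(3))
    ref = F.scaled_dot_product_attention(q, k, v)   # pure-PyTorch reference path
    out = optimized_attn(q, k, v)                   # optimized kernel under test
    max_err = (out - ref).abs().max().item()
    print(f"max abs error: {max_err:.3e}")
    # fp16 kernels rarely match bit-for-bit, so compare within a tolerance.
    assert torch.allclose(out, ref, atol=1e-2, rtol=1e-3), "outputs diverged"
```
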
### Performance Benchmarking
```bash
-python benchmarks/benchmark_forward_performance.py
+python benchmarks/forward_performance.py
```
-Compares Flash-DMA against standard Flash Attention across various sequence lengths and batch sizes.
+Compares Flash-DMA against standard SDPA across various sequence lengths and batch sizes.

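
Purely as illustration (not part of the diff), a comparison against SDPA typically warms up the kernel, synchronizes the GPU around a timed loop, and sweeps sequence lengths. Everything in this sketch, including the argument names, is assumed rather than taken from the benchmark script.

```python
import time
import torch
import torch.nn.functional as F

# Illustrative timing harness; `attn_fn` could be SDPA or an optimized kernel.
def time_attention(attn_fn, batch=2, heads=8, seqlen=4096, dim=64, iters=20):
    q, k, v = (torch.randn(batch, heads, seqlen, dim, device="cuda", dtype=torch.float16)
               for _ in range(3))
    for _ in range(3):              # warm-up iterations excluded from timing
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        attn_fn(q, k, v)
    torch.cuda.synchronize()        # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters

# Baseline sweep over sequence lengths using standard SDPA.
for seqlen in (1024, 4096, 16384):
    ms = time_attention(F.scaled_dot_product_attention, seqlen=seqlen) * 1e3
    print(f"seqlen={seqlen:6d}  SDPA: {ms:.2f} ms/iter")
```
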
### Gradient Computation
```bash
-python benchmarks/benchmark_grad.py
+python benchmarks/grad_equivalence.py
```
Tests backward pass implementation and gradient equivalence.

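
For orientation only (not part of this commit), gradient equivalence is typically checked by backpropagating an identical loss through both implementations and comparing the input gradients. The `optimized_attn` callable below is a stand-in, not the script's real interface.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a gradient-equivalence check against the SDPA reference.
def check_grad_equivalence(optimized_attn, batch=1, heads=4, seqlen=256, dim=64):
    q = torch.randn(batch, heads, seqlen, dim, device="cuda",
                    dtype=torch.float16, requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)

    F.scaled_dot_product_attention(q, k, v).sum().backward()
    ref_grads = [t.grad.clone() for t in (q, k, v)]
    for t in (q, k, v):
        t.grad = None               # clear reference grads before the second pass

    optimized_attn(q, k, v).sum().backward()
    for name, t, g_ref in zip("qkv", (q, k, v), ref_grads):
        err = (t.grad - g_ref).abs().max().item()
        print(f"d{name} max abs error: {err:.3e}")
```
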

-### Multi-Query Associative Recall
-```bash
-python benchmarks/benchmark_mqar.py
-```
-Evaluates performance on long-range reasoning tasks.

## Troubleshooting

@@ -254,10 +249,37 @@ print_memory_stats()
torch.cuda.empty_cache()
```
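
The `print_memory_stats()` helper referenced in this hunk's context is defined earlier in the README and is not visible here. As a hedged sketch only, such a helper usually wraps the standard `torch.cuda` memory counters; the project's real helper may differ.

```python
import torch

# Hypothetical stand-in for the README's print_memory_stats(); the real helper may differ.
def print_memory_stats():
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"allocated: {allocated:.1f} MiB | reserved: {reserved:.1f} MiB | peak: {peak:.1f} MiB")
```
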

+## Contributing
+
+We welcome contributions from the community! Flash-DMA is an open-source project and we value all types of contributions.
+
+### How to Contribute
+
+- **Report bugs**: Found a bug? Please [open an issue](https://github.com/SmallDoges/flash-dmattn/issues/new/choose)
+- **Request features**: Have an idea for improvement? [Let us know](https://github.com/SmallDoges/flash-dmattn/issues/new/choose)
+- **Submit code**: Ready to contribute code? Check our [Contributing Guide](CONTRIBUTING.md)
+- **Improve docs**: Help us make the documentation better
+
+### Quick Start for Contributors
+
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature-name`
+3. Make your changes and test them
+4. Submit a pull request
+
+For detailed instructions, see our [Contributing Guide](CONTRIBUTING.md).
+
+### Code of Conduct
+
+This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.

## License

This project is licensed under the BSD 3-Clause License. See [LICENSE](LICENSE) for details.

## Citation
If you use Flash-DMA in your research, please cite:
@@ -274,6 +296,7 @@ If you use Flash-DMA in your research, please cite:
}
```

## Acknowledgments
This project builds upon and integrates several excellent works:

