@@ -179,34 +179,29 @@ python -c "import flash_dma_cuda; print('✅ Flash DMA CUDA extension imported s
 
 **Note**: Flash Dynamic Mask Attention requires CUDA compute capability 8.0+ for optimal performance. Earlier architectures are not supported.
 
+
 ## Benchmarking
 
 Flash-DMA provides comprehensive benchmarking tools to evaluate performance across different configurations:
 
 ### Forward Pass Equivalence
 ```bash
-python benchmarks/benchmark_forward_equivalence.py
+python benchmarks/forward_equivalence.py
 ```
 Validates numerical consistency between the Python reference and the CUDA implementation.
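+
+For reference, the pattern such a check follows is to compare a low-precision output against a float32 baseline within a tolerance. Below is a minimal sketch using PyTorch's built-in SDPA as a stand-in for both sides; substitute the Flash-DMA kernel call on the marked line (the exact entry point depends on your build):
+
+```python
+import torch
+import torch.nn.functional as F
+
+q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
+           for _ in range(3))
+
+# float32 reference output
+ref = F.scaled_dot_product_attention(q.float(), k.float(), v.float())
+# half-precision implementation under test; replace with the Flash-DMA call
+out = F.scaled_dot_product_attention(q, k, v)
+
+torch.testing.assert_close(out.float(), ref, rtol=2e-2, atol=2e-2)
+print("max abs diff:", (out.float() - ref).abs().max().item())
+```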
 
 ### Performance Benchmarking
 ```bash
-python benchmarks/benchmark_forward_performance.py
+python benchmarks/forward_performance.py
 ```
-Compares Flash-DMA against standard Flash Attention across various sequence lengths and batch sizes.
+Compares Flash-DMA against standard SDPA across various sequence lengths and batch sizes.
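+
+The standard way to time GPU kernels fairly is with CUDA events after a warmup, since GPU launches are asynchronous. A minimal harness for the SDPA baseline is sketched below; time the Flash-DMA call the same way to compare:
+
+```python
+import torch
+import torch.nn.functional as F
+
+def time_fn(fn, iters=50, warmup=10):
+    for _ in range(warmup):
+        fn()
+    start = torch.cuda.Event(enable_timing=True)
+    end = torch.cuda.Event(enable_timing=True)
+    start.record()
+    for _ in range(iters):
+        fn()
+    end.record()
+    torch.cuda.synchronize()
+    return start.elapsed_time(end) / iters  # milliseconds per call
+
+q, k, v = (torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)
+           for _ in range(3))
+print(f"SDPA: {time_fn(lambda: F.scaled_dot_product_attention(q, k, v)):.3f} ms")
+```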
 
 ### Gradient Computation
 ```bash
-python benchmarks/benchmark_grad.py
+python benchmarks/grad_equivalence.py
 ```
 Tests the backward-pass implementation and verifies gradient equivalence.
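+
+One common way to validate a backward pass is a finite-difference gradient check in double precision on small tensors (float64 keeps the numerical error tractable). A sketch follows; replace the lambda body with the Flash-DMA call to exercise its backward pass:
+
+```python
+import torch
+import torch.nn.functional as F
+
+q, k, v = (torch.randn(1, 2, 8, 16, device="cuda", dtype=torch.float64,
+                       requires_grad=True)
+           for _ in range(3))
+
+# gradcheck compares analytic gradients against finite differences
+assert torch.autograd.gradcheck(
+    lambda q, k, v: F.scaled_dot_product_attention(q, k, v), (q, k, v))
+print("gradient check passed")
+```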
 
-### Multi-Query Associative Recall
-```bash
-python benchmarks/benchmark_mqar.py
-```
-Evaluates performance on long-range reasoning tasks.
-
 
 ## Troubleshooting
 
@@ -254,10 +249,37 @@ print_memory_stats()
 torch.cuda.empty_cache()
 ```
 
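+
+Note that `print_memory_stats` above is a helper, not a PyTorch built-in; a minimal illustrative version could be:
+
+```python
+import torch
+
+def print_memory_stats():
+    # Report current, reserved, and peak GPU memory in GiB.
+    gib = 1024 ** 3
+    print(f"allocated: {torch.cuda.memory_allocated() / gib:.2f} GiB, "
+          f"reserved: {torch.cuda.memory_reserved() / gib:.2f} GiB, "
+          f"peak: {torch.cuda.max_memory_allocated() / gib:.2f} GiB")
+```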
+
+## Contributing
+
+We welcome contributions from the community! Flash-DMA is an open-source project, and we value contributions of all kinds.
+
+### How to Contribute
+
+- **Report bugs**: Found a bug? Please [open an issue](https://github.com/SmallDoges/flash-dmattn/issues/new/choose)
+- **Request features**: Have an idea for an improvement? [Let us know](https://github.com/SmallDoges/flash-dmattn/issues/new/choose)
+- **Submit code**: Ready to contribute code? Check our [Contributing Guide](CONTRIBUTING.md)
+- **Improve docs**: Help us make the documentation better
+
+### Quick Start for Contributors
+
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature-name`
+3. Make your changes and test them
+4. Submit a pull request
+
+For detailed instructions, see our [Contributing Guide](CONTRIBUTING.md).
+
+### Code of Conduct
+
+This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.
+
+
 ## License
 
 This project is licensed under the BSD 3-Clause License. See [LICENSE](LICENSE) for details.
 
+
 ## Citation
 
 If you use Flash-DMA in your research, please cite:
@@ -274,6 +296,7 @@ If you use Flash-DMA in your research, please cite:
 }
 ```
 
+
 ## Acknowledgments
 
 This project builds upon and integrates several excellent works: