# Contributing to Flash Dynamic Mask Attention

Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.

It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.

However you choose to contribute, please be mindful and respect our [code of conduct](https://github.com/SmallDoges/flash-dmattn/blob/main/CODE_OF_CONDUCT.md).

## Ways to contribute

There are several ways you can contribute to Flash-DMA:

* Fix outstanding issues with the existing code.
* Submit issues related to bugs or desired new features.
* Implement new attention mechanisms or optimizations.
* Contribute to the examples, benchmarks, or documentation.
* Improve CUDA kernel performance.

If you don't know where to start, there is a special [Good First Issue](https://github.com/SmallDoges/flash-dmattn/contribute) listing. It will give you a list of open issues that are beginner-friendly and help you start contributing to open-source.

> All contributions are equally valuable to the community. 🥰

## Fixing outstanding issues

If you notice an issue with the existing code and have a fix in mind, feel free to [start contributing](#create-a-pull-request) and open a Pull Request!

## Submitting a bug-related issue or feature request

Do your best to follow these guidelines when submitting a bug-related issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.

### Did you find a bug?

The Flash-DMA library is robust and reliable thanks to users who report the problems they encounter.

Before you report an issue, we would really appreciate it if you could **make sure the bug was not already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code.

Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:

* Your **OS type and version** and **Python**, **PyTorch**, and **CUDA** versions.
* Your **GPU model** and **CUDA Compute Capability**.
* A short, self-contained code snippet that allows us to reproduce the bug in less than 30s.
* The *full* traceback if an exception is raised.
* Attach any other additional information, like screenshots, you think may help.

To get the environment information automatically, run:

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
```

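The GPU model and compute capability asked for above can be queried the same way, for example:

```bash
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name()}'); print(f'Compute capability: {torch.cuda.get_device_capability()}')"
```
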
### Do you want a new feature?

If there is a new feature you'd like to see in Flash-DMA, please open an issue and describe:

1. What is the *motivation* behind this feature? Is it related to performance optimization, memory efficiency, or new attention mechanisms?

2. Describe your requested feature in as much detail as possible. The more you can tell us about it, the better we'll be able to help you.

3. Provide a *code snippet* that demonstrates the feature's usage (see the illustrative sketch after this list).

4. If the feature is related to a paper, please include a link.

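For point 3, even a short, clearly hypothetical sketch helps. The import path, function name, and `window_size` argument below are placeholders meant to show the level of detail that is useful, not existing library API:

```python
# Hypothetical usage sketch for a feature request; the names below are placeholders,
# not the library's actual API.
import torch
from flash_dmattn import flash_dmattn_func  # assumed import path

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Requested feature: a sliding-window limit on how far back each query can attend.
out = flash_dmattn_func(q, k, v, window_size=256)
```
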
## Do you want to implement a new attention mechanism?

New attention mechanisms and optimizations are constantly being developed. If you want to implement a new mechanism, please provide:

* A short description of the attention mechanism and a link to the paper.
* Link to the implementation if it is open-sourced.
* Performance benchmarks compared to existing methods (see the timing sketch after this list).
* CUDA compute capability requirements.

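For the benchmarking point, here is a minimal sketch of one way to time a candidate kernel with CUDA events; `attention_fn` is a placeholder for whatever callable you are measuring:

```python
import torch


def time_cuda_ms(attention_fn, *args, iters: int = 50) -> float:
    """Return the average runtime of attention_fn(*args) in milliseconds, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(5):  # warm-up iterations
        attention_fn(*args)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        attention_fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters
```
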
## Do you want to add documentation?

We're always looking for improvements that make the documentation clearer and more accurate. Please let us know about typos and any content that is missing, unclear, or inaccurate.

## Create a Pull Request

Before writing any code, we strongly advise you to search through the existing PRs or issues to make sure nobody is already working on the same thing.

You will need basic `git` proficiency, along with **Python 3.8+** and **CUDA 11.8+**, to contribute to Flash-DMA.

### Development Setup

1. Fork the [repository](https://github.com/SmallDoges/flash-dmattn) by clicking on the **Fork** button.

2. Clone your fork to your local disk, and add the base repository as a remote:

    ```bash
    git clone https://github.com/<your Github handle>/flash-dmattn.git
    cd flash-dmattn
    git remote add upstream https://github.com/SmallDoges/flash-dmattn.git
    ```

3. Create a new branch to hold your development changes:

    ```bash
    git checkout -b a-descriptive-name-for-my-changes
    ```

    🚨 **Do not** work on the `main` branch!

4. Set up a development environment:

    ```bash
    # Ensure CUDA environment is properly set up
    export CUDA_HOME=/usr/local/cuda # Adjust path as needed

    # Install in development mode
    pip install -e .

    # Install development dependencies
    pip install pytest numpy
    ```
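
    To confirm that the extension built and loads correctly, a quick import check helps (assuming the package's import name is `flash_dmattn`; adjust if your environment differs):

    ```bash
    python -c "import flash_dmattn; print('flash_dmattn imported successfully')"
    ```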

5. Develop the features in your branch.

    As you work on your code, you should make sure the test suite passes:

    ```bash
    python -m pytest tests/ -v
    ```

    Flash-DMA also includes performance benchmarks. Run them to ensure your changes don't regress performance:

    ```bash
    python benchmarks/forward_performance.py
    python benchmarks/forward_equivalence.py
    ```

    For CUDA development, ensure your changes compile across supported architectures:

    ```bash
    python setup.py build_ext --inplace
    ```
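
    If rebuilding for every supported architecture is slow while iterating, and assuming the build goes through PyTorch's CUDA extension machinery (which `setup.py build_ext` suggests), the `TORCH_CUDA_ARCH_LIST` environment variable can usually restrict compilation to your local GPU:

    ```bash
    # Example: build only for SM 8.0 during development; use the default list for release builds
    TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext --inplace
    ```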

6. Once you're happy with your changes, add changed files using `git add` and record your changes with `git commit`:

    ```bash
    git add .
    git commit -m "A descriptive commit message"
    ```

    Please write [good commit messages](https://chris.beams.io/posts/git-commit/).

7. Go to your fork on GitHub and click on **Pull Request** to open a pull request.

### Pull request checklist

☐ The pull request title should summarize your contribution.<br>
☐ If your pull request addresses an issue, please mention the issue number in the pull request description so that the issue and the PR are linked.<br>
☐ To indicate a work in progress, please prefix the title with `[WIP]`.<br>
☐ Make sure existing tests pass.<br>
☐ If adding a new feature, also add tests for it.<br>
☐ If implementing new CUDA kernels, ensure they work across all supported compute capabilities (SM 8.0+).<br>
☐ All public methods must have informative docstrings.<br>
☐ Performance benchmarks should not regress significantly.<br>

### Tests

An extensive test suite is included to test the library behavior and performance. Tests can be found in the [tests](https://github.com/SmallDoges/flash-dmattn/tree/main/tests) folder and benchmarks in the [benchmarks](https://github.com/SmallDoges/flash-dmattn/tree/main/benchmarks) folder.

We use `pytest` for testing. From the root of the repository, run:

```bash
python -m pytest tests/ -v
```

For performance testing:

```bash
python -m pytest benchmarks/ -v
```
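
To iterate on a specific area, pytest's `-k` keyword filter is handy; the keyword below is only an example:

```bash
python -m pytest tests/ -v -k "forward"
```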

### CUDA Development Guidelines

When contributing CUDA code:

1. **Test across architectures**: Ensure your code works on SM 8.0, 9.0, and 10.0.
2. **Memory efficiency**: Profile memory usage and ensure there are no memory leaks (see the sketch after this list).
3. **Performance**: Benchmark against existing implementations.
4. **Documentation**: Document kernel parameters and expected performance characteristics.

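For the memory-efficiency point, peak device memory around a kernel launch can be measured with PyTorch's built-in counters; a minimal sketch (the call under test is left as a placeholder):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the kernel or attention call under test here ...

torch.cuda.synchronize()
peak_mib = torch.cuda.max_memory_allocated() / 2**20
print(f"Peak device memory: {peak_mib:.1f} MiB")
```
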
### Code Style

We follow standard Python code style guidelines:

* Use descriptive variable names
* Add type hints where applicable
* Follow PEP 8 guidelines
* Add docstrings to all public functions (see the example below)

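As a rough illustration of the expected docstring and type-hint style (this function is illustrative only, not part of the library):

```python
from typing import Optional

import torch


def scaled_attention_scores(
    query: torch.Tensor,
    key: torch.Tensor,
    scale: Optional[float] = None,
) -> torch.Tensor:
    """Compute scaled dot-product attention scores.

    Args:
        query: Tensor of shape (batch, heads, seq_len_q, head_dim).
        key: Tensor of shape (batch, heads, seq_len_k, head_dim).
        scale: Optional scaling factor; defaults to 1/sqrt(head_dim).

    Returns:
        Tensor of attention scores with shape (batch, heads, seq_len_q, seq_len_k).
    """
    if scale is None:
        scale = query.shape[-1] ** -0.5
    return torch.matmul(query, key.transpose(-2, -1)) * scale
```
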
For CUDA code:

* Use clear variable names
* Comment complex kernel logic
* Follow NVIDIA CUDA best practices

## Security

If you discover a security vulnerability, please send an e-mail to the maintainers. All security vulnerabilities will be promptly addressed.

## Questions?

If you have questions about contributing, feel free to ask in the [GitHub Discussions](https://github.com/SmallDoges/flash-dmattn/discussions) or open an issue.

Thank you for contributing to Flash Dynamic Mask Attention! 🚀