
Commit 69de751

Adds GitHub issue templates and PR template
2 parents 5ba76a0 + c0fd207

4 files changed (+229, -0)

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
---
name: Bug report
about: Create a report to help us improve Flash-DMA
title: '[BUG] '
labels: 'bug'
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Import flash_dmattn
2. Run the following code:
```python
# Paste your code here
```
3. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment Information**
Please run the following and paste the output:
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
```

**Additional context**
- OS: [e.g. Ubuntu 20.04, Windows 10, macOS 12]
- Python version: [e.g. 3.9.7]
- Flash-DMA version: [e.g. 0.1.0]
- CUDA Compute Capability: [e.g. 8.6]

**Error traceback**
If applicable, add the full error traceback:
```
Paste the full traceback here
```

**Debugging Information**
Add any other context about the problem here, including:
- Sequence lengths and batch sizes you're using
- Whether this works with standard PyTorch SDPA
- Any custom modifications to the code
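For reference, a minimal sketch of the kind of repro snippet the "To Reproduce" section asks for. It only builds inputs; the failing call is left as a placeholder, since the exact flash_dmattn entry point and expected tensor layout depend on the bug being reported (the shapes below are the example values used elsewhere in these templates).

```python
# Repro skeleton (a sketch, not part of the template itself).
import torch
import flash_dmattn  # the package named in step 1 above

# Example shapes; adjust to the configuration that triggers your bug.
batch, heads, seqlen, head_dim = 1, 16, 4096, 64
q = torch.randn(batch, heads, seqlen, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Replace this placeholder comment with the exact flash_dmattn call that fails:
# out = flash_dmattn.<failing_op>(q, k, v, ...)
```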
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
name: Feature request
about: Suggest an idea for Flash-DMA
title: '[FEATURE] '
labels: 'enhancement'
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Implementation details**
If you have thoughts on implementation:
- Would this require CUDA kernel changes?
- Does this affect the Python API?
- Are there performance implications?
- Any compatibility concerns with different GPU architectures?

**Use case**
Describe your specific use case:
- What sequence lengths are you working with?
- What is your target application (e.g., long document processing, code generation)?
- How would this feature improve your workflow?

**Additional context**
Add any other context or screenshots about the feature request here.

**Related work**
If this feature is inspired by a paper or existing implementation, please provide:
- Link to paper/implementation
- Brief explanation of the technique
- Why it would be valuable for Flash-DMA users
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
---
name: Performance issue
about: Report performance problems or optimization opportunities
title: '[PERFORMANCE] '
labels: 'performance'
assignees: ''

---

**Performance Issue Description**
Describe the performance problem you're experiencing.

**Current Performance**
Please provide benchmark results:
- Sequence length: [e.g., 4096, 8192, 16384]
- Batch size: [e.g., 1, 2, 4]
- Number of heads: [e.g., 16, 32]
- Head dimension: [e.g., 64, 128]
- Current speed: [e.g., 15.2 ms/iteration]
- Memory usage: [e.g., 8.5 GB]

**Expected Performance**
What performance would you expect, and why?
- Expected speed: [e.g., <10 ms/iteration]
- Comparison baseline: [e.g., PyTorch SDPA, Flash Attention]

**Environment Information**
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
```

**Benchmark Code**
Provide the code you used for benchmarking:
```python
# Paste your benchmark code here
```

**Profiling Information**
If you have profiling data (from nsys, nvprof, or the PyTorch profiler), please include relevant excerpts.

**System Information**
- GPU model and memory: [e.g., RTX 4090 24GB]
- CUDA Compute Capability: [e.g., 8.9]
- CPU: [e.g., Intel i9-12900K]
- RAM: [e.g., 32GB DDR4]

**Additional Context**
- Is this a regression from a previous version?
- Have you tried different batch sizes or sequence lengths?
- Any specific attention patterns (causal, full, custom masks)?
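For reference, a minimal sketch of the kind of benchmark the "Benchmark Code" section asks for, timing the PyTorch SDPA comparison baseline that the template names. It uses only standard PyTorch APIs; the shapes are the template's example values, and a real report would time the flash_dmattn path the same way for an apples-to-apples comparison.

```python
# Benchmark sketch for the SDPA baseline (a reference, not the template's required code).
import torch
import torch.nn.functional as F

batch, heads, seqlen, head_dim = 1, 16, 4096, 64
q, k, v = (torch.randn(batch, heads, seqlen, head_dim,
                       device="cuda", dtype=torch.float16) for _ in range(3))

# Warm up so kernel-launch and autotuning overhead does not skew the timing.
for _ in range(10):
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
end.record()
torch.cuda.synchronize()

print(f"SDPA baseline: {start.elapsed_time(end) / iters:.2f} ms/iteration")
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GB")
```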

.github/pull_request_template.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Pull Request Template

## Description
Please provide a clear and concise description of your changes.

## Type of Change
Please check the relevant option(s):

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Performance optimization
- [ ] CUDA kernel improvement
- [ ] Code refactoring

## Related Issues
Please link any related issues:
- Fixes #(issue number)
- Related to #(issue number)

## Changes Made
Please describe the changes you made:

### Code Changes
- [ ] Modified Python API
- [ ] Updated CUDA kernels
- [ ] Changed build system
- [ ] Updated dependencies

### Documentation
- [ ] Updated README
- [ ] Updated API documentation
- [ ] Added examples
- [ ] Updated benchmarks

## Testing
Please describe the tests you ran to verify your changes:

- [ ] Existing tests pass: `python -m pytest tests/ -v`
- [ ] Added new tests for new functionality
- [ ] Benchmarks show no performance regression
- [ ] Tested on multiple GPU architectures (if applicable)

### Test Configuration
- OS: [e.g., Ubuntu 20.04]
- Python: [e.g., 3.9.7]
- PyTorch: [e.g., 2.1.0]
- CUDA: [e.g., 11.8]
- GPU: [e.g., RTX 4090]

## Performance Impact
If this change affects performance, please provide benchmarks:

### Before
```
# Benchmark results before your changes
```

### After
```
# Benchmark results after your changes
```

## Breaking Changes
If this PR introduces breaking changes, please describe:
- What breaks
- How users can migrate their code
- Why the breaking change is necessary

## Checklist
Please check all that apply:

- [ ] My code follows the project's style guidelines
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published

### CUDA-specific (if applicable)
- [ ] CUDA kernels compile without warnings
- [ ] Tested on SM 8.0+ architectures
- [ ] Memory usage has been profiled
- [ ] No memory leaks detected

## Additional Notes
Any additional information that reviewers should know:

## Screenshots (if applicable)
If your changes include visual elements or performance improvements, please add screenshots or graphs.
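For the CUDA-specific checklist, a quick sketch for confirming the local GPU meets the SM 8.0+ target named above before running the test suite:

```python
# Verify the local GPU's compute capability (the SM 8.0+ threshold is the
# one named in the checklist above).
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: SM {major}.{minor}")
assert (major, minor) >= (8, 0), "Checklist targets SM 8.0+ architectures"
```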
