2 changes: 2 additions & 0 deletions .gitignore
@@ -1 +1,3 @@
.env
dataset/*.parquet
dataset/submissions/*
15 changes: 15 additions & 0 deletions README.md
@@ -34,3 +34,18 @@ python export.py

The script will create a directory at the specified output path containing the dataset in Parquet format. If `--output_dir` is not provided, it will save to `dataset` in the current working directory.

## Tests
The deduplication scripts can be tested by running:
```bash
python test_dedup.py
# or, if pytest is installed:
python -m pytest test_dedup.py -v
```
The test builds a synthetic 50-entry dataset with the following features:
- **Exact duplicates**: First 5 entries use identical code
- **Fuzzy duplicates**: Next 5 entries use similar code with small variations
- **Multiple run modes**: `leaderboard`, `test`, `benchmark`
- **Mixed success states**: Both `True` and `False` values for `run_passed`
- **Realistic struct data**: Complex nested structures for `run_result`, `run_compilation`, `run_meta`, and `run_system_info`
- **Proper timestamps**: All timestamp fields include timezone information
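
As a rough illustration of the layout described above, here is a minimal sketch of how such a fixture could be built with only the standard library. It is not the code in `test_dedup.py`: the function name `make_fake_entries`, the fields `submission_id`, `code`, `run_mode`, and `submission_time`, and the contents of `run_result` are assumptions made for this example. Only `run_passed`, `run_result`, and the three run modes come from the list above.

```python
import random
from datetime import datetime, timedelta, timezone

# Run modes named in the README; everything else below is hypothetical.
RUN_MODES = ["leaderboard", "test", "benchmark"]


def make_fake_entries(n=50, seed=0):
    """Build n fake submissions: 5 exact duplicates, 5 fuzzy duplicates, rest unique."""
    rng = random.Random(seed)
    base_time = datetime(2025, 1, 1, tzinfo=timezone.utc)  # timezone-aware
    exact_code = "def kernel(x):\n    return x * 2\n"
    entries = []
    for i in range(n):
        if i < 5:
            # Exact duplicates: byte-for-byte identical code.
            code = exact_code
        elif i < 10:
            # Fuzzy duplicates: same code with a small per-entry variation.
            code = f"def kernel(x):\n    # variant {i}\n    return x * 2\n"
        else:
            # Remaining entries: distinct code.
            code = f"def kernel_{i}(x):\n    return x + {i}\n"
        passed = i % 2 == 0  # mix of True and False
        entries.append({
            "submission_id": i,
            "code": code,
            "run_mode": RUN_MODES[i % len(RUN_MODES)],
            "run_passed": passed,
            # Nested struct; the inner field names are placeholders.
            "run_result": {"score": rng.random(), "passed_checks": passed},
            "submission_time": base_time + timedelta(minutes=i),
        })
    return entries


entries = make_fake_entries()
```

A fixed seed keeps the fixture reproducible across runs, so a deduplication test can assert on exact counts rather than on ranges.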
8 changes: 7 additions & 1 deletion dataset/README.md
@@ -11,7 +11,13 @@ tags:
license: mit
---

If you use GPUMODE/amd-kernels-2025 in your work, please cite:
This dataset was created from the first and second AMD $100K kernel competitions. It contains roughly 110K kernels for fp8-gemm, moe, mla, all2all, gemm+reducescatter, and allgather+gemm, optimized to run on MI300. Learn more at gpumode.com/v2/news

For the full list of kernel competitions we have run and are currently running, check out https://github.com/gpu-mode/reference-kernels, which also contains details on the reference kernels and their input shapes and distributions.

We plan to add kernels optimized for NVFP4 on Blackwell next.

If you use this dataset in your work, please cite:

```bibtex
@inproceedings{