Demo video: dLLM.decoding.demo.mov
- [2025-10-14] Dream integration coming soon!
- Extremely Greedy Parallel strategy: compares the predicted tokens with the reference answer and remasks only the tokens that do not match (a minimal sketch follows this list).
- Use a trained filter
- Upon detection of an
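For concreteness, here is a minimal sketch of the comparison-and-remask rule described above. The function name, mask ID, and tensor layout are illustrative assumptions, not the repository's actual API.

```python
# Sketch of the Extremely Greedy Parallel comparison: keep predictions that
# match the reference answer, remask the ones that do not. Names are illustrative.
import torch

def egp_remask(predicted_ids: torch.Tensor,
               reference_ids: torch.Tensor,
               mask_id: int) -> torch.Tensor:
    """Keep tokens that match the reference; remask the ones that do not."""
    keep = predicted_ids.eq(reference_ids)      # True where the prediction is correct
    return torch.where(keep, predicted_ids, torch.full_like(predicted_ids, mask_id))

# Toy example: positions 1 and 3 disagree with the reference and get remasked.
pred = torch.tensor([11, 22, 33, 44])
ref = torch.tensor([11, 99, 33, 77])
print(egp_remask(pred, ref, mask_id=0))         # tensor([11,  0, 33,  0])
```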
Experiments on GSM8K, MATH, HumanEval, and MBPP show that our approach significantly improves throughput (up to 22.58× faster) while maintaining model accuracy, demonstrating strong generalization and practicality. Each method was evaluated at two generation lengths (256 and 1024) across the four datasets. Performance is reported with three metrics: TPS (tokens/sec), speedup, and accuracy. The highest throughput and speedup values for each configuration are highlighted in bold.
- Install dependencies
pip install -r requirements.txt
- Run the program
- Test single questions (a conceptual sketch of the decoding loop follows this list)
python generate.py
- Run evaluations
./eval_llada.sh
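Conceptually, each decoding step predicts every masked position in parallel and the trained filter decides which of those predictions to commit; the remaining positions stay masked for the next step. The sketch below illustrates that loop under assumed interfaces: `parallel_decode`, `model_step`, and `filter_keep` are hypothetical names standing in for the diffusion LM and the filter, not the repository's actual functions (see generate.py for the real implementation).

```python
# Conceptual sketch of filter-guided parallel decoding over a 1-D token sequence.
import torch

def parallel_decode(model_step, filter_keep, x: torch.Tensor,
                    mask_id: int, max_steps: int = 64) -> torch.Tensor:
    """Iteratively fill masked positions, committing only filter-approved tokens."""
    for _ in range(max_steps):
        masked = x.eq(mask_id)
        if not masked.any():                      # every position is decoded
            break
        pred_ids, features = model_step(x)        # predict all positions in parallel
        keep = filter_keep(features) & masked     # filter picks tokens to commit now
        if not keep.any():                        # avoid stalling: commit one token
            keep = torch.zeros_like(masked)
            keep[int(masked.nonzero()[0])] = True
        x = torch.where(keep, pred_ids, x)
    return x

# Toy demonstration with dummy callables ("always predict 7", keep everything).
x0 = torch.tensor([5, 0, 0, 9])                   # 0 plays the role of [MASK]
out = parallel_decode(lambda t: (torch.full_like(t, 7), None),
                      lambda feats: torch.ones(4, dtype=torch.bool),
                      x0, mask_id=0)
print(out)                                        # tensor([5, 7, 7, 9])
```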
- Download the FLAN dataset to small_model_train/flan
- Run the following script
./generate_training_data.sh
You can use training.ipynb directly to train new filter models on your own datasets.
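As a rough illustration only, a filter of this kind can be as small as a per-token binary classifier. The sketch below trains one on synthetic features with PyTorch; the feature dimension, architecture, and labels are assumptions for the example, not the contents of training.ipynb.

```python
# Minimal sketch: train a small binary "keep this token?" filter with PyTorch.
# Feature dimension, architecture, and synthetic data are assumptions.
import torch
import torch.nn as nn

feat_dim = 16                                   # per-token feature size (assumed)
filter_model = nn.Sequential(                   # tiny MLP filter
    nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# Synthetic training data: per-token features and EGP-style 0/1 labels
# (1 = the prediction matched the reference, i.e. safe to decode in parallel).
features = torch.randn(1024, feat_dim)
labels = torch.randint(0, 2, (1024, 1)).float()

opt = torch.optim.Adam(filter_model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(filter_model(features), labels)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```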
We would like to thank the authors of LLaDA and Fast-dLLM for their excellent work and open-source contributions.
If you find our work useful, please consider citing our paper.
@misc{bao2025learningparallelacceleratingdiffusion,
title={Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding},
author={Wenrui Bao and Zhiben Chen and Dan Xu and Yuzhang Shang},
year={2025},
eprint={2509.25188},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.25188},
}
