Milestone 1: Support Dense Model Deterministic Inference
- Batch Invariant ops
- Implement the Paddle version of batch_invariant_ops
- Other ops
- Refer to implement the Paddle version of
- Build batch invariance testing scripts for request level and operator level
- Make the Cascade Append Attention backend and FA3 batch invariant
- Support Dense Model
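As a rough illustration of what an operator-level batch invariance test could check, the sketch below compares one row's result when it is processed alone against its result inside a larger batch, bit for bit. It is a pure-Python stand-in under stated assumptions, not Paddle code: `invariant_row_sum`, `variant_row_sum`, and `is_batch_invariant` are hypothetical names, and the "variant" op only mimics a kernel whose reduction split changes with batch size.

```python
def accumulate(values):
    """Left-to-right float accumulation in a fixed order."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(row, chunk):
    """Sum a row as per-chunk partial sums, then combine the partials in order."""
    partials = [accumulate(row[i:i + chunk]) for i in range(0, len(row), chunk)]
    return accumulate(partials)


def invariant_row_sum(batch, chunk=2):
    """Hypothetical batch-invariant op: the chunk size never depends on the batch."""
    return [chunked_sum(row, chunk) for row in batch]


def variant_row_sum(batch):
    """Hypothetical non-invariant op: the chunk size changes with the batch size,
    mimicking a kernel that picks a different reduction split under load."""
    chunk = 2 if len(batch) == 1 else 4
    return [chunked_sum(row, chunk) for row in batch]


def is_batch_invariant(op, row, fillers):
    """Operator-level check: the row's result must be bitwise identical
    whether it runs alone or alongside other rows."""
    alone = op([row])[0]
    batched = op([row] + fillers)[0]
    return alone.hex() == batched.hex()


# Values chosen so that float rounding makes the summation order visible.
row = [1e16, 1.0, -1e16, 1.0]
fillers = [[0.5, 0.25, 0.125, 1.0]]

print(is_batch_invariant(invariant_row_sum, row, fillers))  # True
print(is_batch_invariant(variant_row_sum, row, fillers))    # False
```

A request-level test would follow the same pattern one layer up: send the same prompt alone and mixed with other requests, then compare the generated token IDs (and, if available, logprobs) for exact equality.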
Milestone 2: Support CUDAGraph, ChunkPrefill, PrefixCache, MoE Model
- Support CUDAGraph
- Support ChunkPrefill
- Support PrefixCache
- Support MoE Model
Milestone 3: Support SpeculativeDecoding, Parallelism, Quantization
- Support SpecDecoding
- MTP
- Parallelism
- TP
- EP
- Quantization
- BlockWise FP8
Milestone 4: Support RL training
- RL
Related resources
- Blog: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
- batch_invariant_ops: https://github.com/thinking-machines-lab/batch_invariant_ops
- SGLang: [Feature] Support deterministic inference with Batch Invariant Ops sgl-project/sglang#10278
- vLLM: Batch-invariant Inference
- slime: [FEAT] Deterministic rollout THUDM/slime#361
gongshaotian