[BugFix] Fix spec decoding max_tokens scheduling perf issue #29542
Conversation
This affected performance with async scheduling, since requests could be unscheduled before they were finished and then had to be scheduled again for the last 1-2 tokens.

Signed-off-by: Nick Hill <nhill@redhat.com>
Code Review
This pull request addresses a performance issue related to speculative decoding with asynchronous scheduling. The change correctly adjusts the calculation of max_total_tokens to account for speculative placeholder tokens. This prevents requests from being prematurely unscheduled when they are near their max_tokens limit, which previously required them to be rescheduled to generate the final tokens. The logic is sound, ensuring that enough tokens can be scheduled for speculative verification while still respecting the model's maximum length. The implementation is clear and well-commented, and it should effectively resolve the described performance bottleneck.
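The budget adjustment described above can be sketched as follows. This is an illustrative outline only, not vLLM's actual scheduler code: the function name `max_total_tokens` and its parameters are assumptions made for the example.

```python
def max_total_tokens(num_computed_tokens: int,
                     remaining_output_tokens: int,
                     num_spec_tokens: int,
                     model_max_len: int) -> int:
    """Upper bound on token positions to schedule for a request this step.

    Hypothetical sketch: padding the request's own limit with up to
    `num_spec_tokens` speculative placeholder positions lets verification
    run in the same step, so a request near its max_tokens limit is not
    unscheduled and then rescheduled just to emit its final 1-2 tokens.
    """
    # Budget derived from the request's remaining max_tokens allowance,
    # plus placeholder slots for speculative draft tokens.
    request_budget = (num_computed_tokens
                      + remaining_output_tokens
                      + num_spec_tokens)
    # Still respect the model's maximum context length.
    return min(request_budget, model_max_len)
```

For example, a request with 100 computed tokens, 2 output tokens remaining, and 2 speculative tokens gets a budget of 104 positions, while the cap at `model_max_len` still binds near the context limit.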
Signed-off-by: Nick Hill <nhill@redhat.com>
    start = start_req_idx
    end = end_req_idx
    sliced_cu_num_generated_tokens = None

    def slice_request(self, req_idx: int, num_positions: int):
Changing how logprobs are sliced to ensure we don't include extra positions beyond the number of output tokens.

This affected performance with async scheduling, since requests could be unscheduled before they were finished and then had to be scheduled again for the last 1-2 tokens.

Thanks @benchislett for finding this.
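The slicing change can be sketched like this. This is a standalone illustration, not the actual method from the diff (which is an instance method taking `self`): `cu_num_generated_tokens` is assumed to be a cumulative-count array marking each request's span in a flat logprobs buffer, and the other names are assumptions for the example.

```python
def slice_request(cu_num_generated_tokens: list[int],
                  logprobs: list[float],
                  req_idx: int,
                  num_positions: int) -> list[float]:
    """Return at most `num_positions` logprob entries for request `req_idx`.

    Hypothetical sketch: clamping the slice end to `num_positions` drops
    any extra positions beyond the number of output tokens actually
    produced (e.g. unused speculative placeholder positions).
    """
    # Span of this request's entries in the flat per-token buffer.
    start = cu_num_generated_tokens[req_idx]
    end = cu_num_generated_tokens[req_idx + 1]
    # Never return more positions than the request actually generated.
    end = min(end, start + num_positions)
    return logprobs[start:end]
```

With `cu_num_generated_tokens = [0, 3, 6]`, request 0 occupies indices 0..3; asking for only 2 positions returns the first 2 entries rather than the full span.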