
ST-XX (Collaborator) commented Nov 20, 2025

Motivation

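Previously, enabling guided decoding caused a fallback to ENABLE_V1_KVCACHE_SCHEDULER=0; this PR removes that fallback.
Support guided decoding together with the V1 scheduler.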

Modifications

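  1. Remove the fallback logic.
  2. Add guided decoding backend initialization to the V1 scheduler.
  3. Add logic in the V1 scheduler to detect when the prefill (P) phase is complete, with support for chunked prefill.
  4. For V1 scheduling with PD disaggregation, upgrade the logic by which the decode (D) phase obtains the prefill token.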
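A minimal sketch of what item 3 involves, under stated assumptions: with chunked prefill, one prompt is processed over several steps, and a guided-decoding request should only be treated as "prefill done" in the step that schedules its final chunk. The `ScheduledRequest` fields and the `prefill_done_indices` helper are hypothetical illustrations, not FastDeploy's actual `_get_p_done_idxs_gd`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScheduledRequest:
    # Hypothetical per-request bookkeeping, for illustration only.
    prompt_len: int            # total number of prompt tokens
    prefill_start_index: int   # first prompt token scheduled in this step
    prefill_end_index: int     # one past the last prompt token scheduled in this step
    guided_decoding: bool      # whether the request uses guided decoding


def prefill_done_indices(batch: List[ScheduledRequest]) -> List[int]:
    """Return batch indices of guided-decoding requests whose prefill
    completes in the current step.

    A request finishes prefill in this step only if the step covers the
    last prompt token: start < prompt_len <= end. Requests still in an
    earlier chunk (end < prompt_len) and requests already decoding
    (start >= prompt_len) are excluded.
    """
    done = []
    for idx, req in enumerate(batch):
        if not req.guided_decoding:
            continue
        if req.prefill_start_index < req.prompt_len <= req.prefill_end_index:
            done.append(idx)
    return done


if __name__ == "__main__":
    batch = [
        ScheduledRequest(1000, 0, 512, True),      # first chunk only
        ScheduledRequest(1000, 512, 1000, True),   # final chunk: prefill done
        ScheduledRequest(1000, 1000, 1001, True),  # already decoding
        ScheduledRequest(300, 0, 300, False),      # no guided decoding
    ]
    print(prefill_done_indices(batch))  # [1]
```

The half-open check `start < prompt_len <= end` is what keeps decode-phase requests out: once prefill is over, their scheduled range starts at or beyond the prompt length.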

Usage or Command

ENABLE_V1_KVCACHE_SCHEDULER=1
Usage is the same as before; see structured_outputs.md for details.

Accuracy Tests

The JSON-format validation success-rate test for guided decoding passed.

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot (bot) commented Nov 20, 2025

Thanks for your contribution!

ST-XX requested review from Jiang-Jia-Jun and kevincheng2 and removed the request for kevincheng2 on November 20, 2025 07:16
Copilot AI (Contributor) left a comment


Pull request overview

This PR enables guided decoding support when ENABLE_V1_KVCACHE_SCHEDULER=1. Previously, enabling guided decoding would automatically force the V1 KVCache scheduler to be disabled. This change removes that limitation and adds the necessary logic to support guided decoding with the V1 scheduler.

Key changes:

  • Removed the automatic fallback logic that disabled the V1 scheduler when guided decoding was enabled
  • Added guided decoding backend initialization in the V1 scheduler's prefill phase
  • Enhanced the V1 scheduler to support chunked prefill with guided decoding and improved the prefill/decode phase separation logic
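The first key change can be pictured as a before/after on how the scheduler choice is resolved. This is a hypothetical model, not the code removed from worker_process.py or args_utils.py: the function names are invented, and treating an unset ENABLE_V1_KVCACHE_SCHEDULER as disabled is an assumption made purely for illustration.

```python
from typing import Mapping


def v1_scheduler_enabled_before(env: Mapping[str, str], guided_decoding: bool) -> bool:
    """Hypothetical model of the removed fallback: guided decoding
    silently overrode ENABLE_V1_KVCACHE_SCHEDULER=1."""
    # Assumption: an unset variable is treated as "0" here for illustration.
    enabled = env.get("ENABLE_V1_KVCACHE_SCHEDULER", "0") == "1"
    if guided_decoding and enabled:
        return False  # forced back to the non-V1 scheduler
    return enabled


def v1_scheduler_enabled_after(env: Mapping[str, str], guided_decoding: bool) -> bool:
    """Hypothetical model after this PR: the environment variable alone
    decides; guided_decoding no longer affects the choice."""
    del guided_decoding  # kept only so both functions share a signature
    return env.get("ENABLE_V1_KVCACHE_SCHEDULER", "0") == "1"


env = {"ENABLE_V1_KVCACHE_SCHEDULER": "1"}
print(v1_scheduler_enabled_before(env, guided_decoding=True))  # False
print(v1_scheduler_enabled_after(env, guided_decoding=True))   # True
```

Removing the override means a user who sets the variable to 1 actually gets the V1 scheduler, which is why the remaining changes must make guided decoding work on that path.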

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Changed files:
  • fastdeploy/worker/worker_process.py: Removes the check that forced ENABLE_V1_KVCACHE_SCHEDULER to 0 when guided decoding was enabled.
  • fastdeploy/engine/args_utils.py: Removes the duplicate check that disabled the V1 scheduler for guided decoding.
  • fastdeploy/worker/gpu_model_runner.py: Adds guided decoding initialization in insert_tasks_v1, implements prefill token extraction for the decode phase in PD disaggregation, and enhances _get_p_done_idxs_gd to identify completed prefill phases with chunked prefill support.
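In PD disaggregation, the decode instance continues a sequence whose first output token was generated on the prefill instance. A minimal sketch, assuming the decode side must advance its grammar state past that token before applying its first mask. `ToyGrammarMatcher` and `init_decode_matcher` are hypothetical names; they are not the API of FastDeploy or of any guided-decoding backend, and the toy matcher accepts a single fixed token sequence instead of a real grammar.

```python
from typing import List, Set


class ToyGrammarMatcher:
    """Hypothetical stand-in for a guided-decoding backend's matcher.

    It accepts exactly one fixed token sequence, which is enough to show
    how matcher state advances; real backends match a full grammar.
    """

    def __init__(self, expected: List[int]) -> None:
        self.expected = expected
        self.pos = 0

    def accept_token(self, token: int) -> bool:
        # Advance only if the token is the next one the "grammar" allows.
        if self.pos < len(self.expected) and self.expected[self.pos] == token:
            self.pos += 1
            return True
        return False

    def allowed_next(self) -> Set[int]:
        # Tokens that would be left unmasked at the next decoding step.
        if self.pos < len(self.expected):
            return {self.expected[self.pos]}
        return set()


def init_decode_matcher(matcher: ToyGrammarMatcher, prefill_tokens: List[int]) -> ToyGrammarMatcher:
    """On the decode (D) instance, replay the token(s) already produced by
    the prefill (P) instance so the grammar state matches the output so far."""
    for tok in prefill_tokens:
        if not matcher.accept_token(tok):
            raise ValueError(f"prefill token {tok} violates the grammar")
    return matcher


matcher = init_decode_matcher(ToyGrammarMatcher([10, 11, 12]), prefill_tokens=[10])
print(matcher.allowed_next())  # {11}
```

Without the replay, the decode side would still allow token 10 at its first step, producing output that no longer matches the requested structure.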

ST-XX and others added 2 commits November 24, 2025 17:15
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
codecov-commenter:

Codecov Report

❌ Patch coverage is 14.28571% with 18 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5ff93d4).

Files with missing lines:
  • fastdeploy/worker/gpu_model_runner.py: patch coverage 14.28%, 17 lines missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5140   +/-   ##
==========================================
  Coverage           ?   57.16%           
==========================================
  Files              ?      317           
  Lines              ?    38471           
  Branches           ?     5774           
==========================================
  Hits               ?    21991           
  Misses             ?    14705           
  Partials           ?     1775           
Flag coverage:
  • GPU: 57.16% <14.28%> (Δ ?)
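The percentages in this report can be reproduced from the raw counts, assuming Codecov's documented ratio hits / (hits + misses + partials), in which partially covered lines count against coverage. The patch line split (3 covered out of 21) is inferred from the 14.28571% figure and the 18 missing lines; it is not stated in the report.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Line coverage as a percentage; partial lines count against coverage,
    so the ratio is hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Project totals from the coverage diff above (21991 + 14705 + 1775 = 38471 lines).
print(f"{coverage_pct(21991, 14705, 1775):.2f}%")  # 57.16%

# Patch: 17 missing + 1 partial; 14.28571% is consistent with 3 covered lines of 21.
print(f"{coverage_pct(3, 17, 1):.5f}%")  # 14.28571%
```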

Flags with carried forward coverage won't be shown.
