Pull requests: vllm-project/vllm

Pull requests list

[CPU]Parallelize over tokens in int4 moe
#29600 opened Nov 27, 2025 by xiangze-arm
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl
Labels: new-model, qwen, speculative-decoding, v1
#29594 opened Nov 27, 2025 by EanWang211123
5 tasks
[Bugfix] Fix HunyuanVL XD-RoPE
Labels: ready
#29593 opened Nov 27, 2025 by ywang96
5 tasks
[CPU]Update CPU PyTorch to 2.9.0
Labels: ci/build
#29589 opened Nov 27, 2025 by scydas
5 tasks
Flashrl
Labels: nvidia, v1
#29586 opened Nov 27, 2025 by devpatelio
[BugFix] Optional tokenizer argument when loading GGUF models
#29582 opened Nov 27, 2025 by sts07142
1 of 5 tasks
[BugFix] Fix new nightly failures
Labels: ready, ready-run-all-tests, v1
#29578 opened Nov 27, 2025 by LucasWilkinson
[Doc] Add 20251202 vLLM Malaysia Meetup Info
Labels: documentation
#29577 opened Nov 27, 2025 by tjtanaa
5 tasks
platform: optimized grouped topk op
#29575 opened Nov 27, 2025 by xinyu-intel
Make PyTorch profiler gzip and CUDA time dump configurable
Labels: documentation, nvidia, v1
#29568 opened Nov 27, 2025 by zhangruoxu
5 tasks
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement
Labels: nvidia, ready
#29558 opened Nov 27, 2025 by yewentao256
[Frontend] Remap -O to -cc commandline flag
Labels: ci/build, documentation
#29557 opened Nov 27, 2025 by gmagogsfm
[CI/Build] Skip ray tests on ROCm
Labels: rocm, v1
#29556 opened Nov 26, 2025 by rjrock
3 of 5 tasks
[responsesAPI] support input output messages for non harmony models
Labels: frontend, gpt-oss
#29549 opened Nov 26, 2025 by qandrew
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement.
Labels: ready
#29546 opened Nov 26, 2025 by yewentao256
[Bugfix] Fix DeepSeek R1 MTP weight loading
Labels: deepseek
#29545 opened Nov 26, 2025 by MatthewBonanni (Draft)
3 of 5 tasks
[BugFix] Fix spec decoding max_tokens scheduling perf issue
Labels: ready, v1
#29542 opened Nov 26, 2025 by njhill
[Attention] Update attention imports
Labels: kv-connector, nvidia, ready, rocm, speculative-decoding, tpu, v1
#29540 opened Nov 26, 2025 by MatthewBonanni
3 of 5 tasks
[Frontend] Add streaming tool-call support to Responses API (non-Harmony)
Labels: frontend, gpt-oss
#29513 opened Nov 26, 2025 by sumitaryal
5 tasks