[BugFix] Optional tokenizer argument when loading GGUF models #29582
Conversation
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Code Review
This pull request improves the experience of loading GGUF models by making the --tokenizer argument optional. The changes automatically detect the correct GGUF file from a HuggingFace repository based on the quantization type. The implementation is mostly solid, but I've identified two high-severity issues. One is related to overly broad exception handling which could hide bugs, and the other is a potential silent failure if the GGUF tokenizer file isn't found, which could lead to incorrect model behavior. Addressing these will make the new feature more robust.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Injae Ryou <injaeryou@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
@Isotr0py
- list_filtered_repo_files
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
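The commit title suggests a small helper for listing repository files by pattern; a possible sketch assuming huggingface_hub (the function name matches the commit, but the signature and body are guesses):

```python
import fnmatch

from huggingface_hub import list_repo_files


def list_filtered_repo_files(
    repo_id: str, pattern: str, revision: str | None = None
) -> list[str]:
    """List files in a Hugging Face repo whose names match a glob pattern."""
    files = list_repo_files(repo_id, revision=revision)
    return [f for f in files if fnmatch.fnmatch(f, pattern)]


# e.g. list_filtered_repo_files("unsloth/Qwen3-0.6B-GGUF", "*IQ1_S*.gguf")
```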
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Force-pushed from cbd7d7b to ac41103
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Isotr0py left a comment
LGTM now! Thanks for fixing this!
Purpose
fixes: #29563
Make the --tokenizer argument optional when loading GGUF models (local and remote).

As-Is:
vllm serve <gguf_model> --tokenizer <tokenizer>

To-Be:
vllm serve <gguf_model> (--tokenizer optional)

Test Plan
Test Result
vllm serve unsloth/Qwen3-0.6B-GGUF:IQ1_S
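For reference, the `<repo>:<quant_type>` form used above could be resolved to a concrete GGUF file roughly as follows (a sketch under assumed behavior, not vLLM's actual implementation):

```python
from huggingface_hub import hf_hub_download, list_repo_files

model = "unsloth/Qwen3-0.6B-GGUF:IQ1_S"
repo_id, _, quant = model.partition(":")

# Choose the repo file whose name carries the requested quant type.
candidates = [f for f in list_repo_files(repo_id)
              if f.endswith(".gguf") and quant in f]
local_path = hf_hub_download(repo_id, filename=candidates[0])
print(local_path)  # local GGUF file that can be loaded directly
```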
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model.