Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`
Your output of above commands here
🐛 Describe the bug
While using the LoRA feature in vllm-ascend v0.11.0rc0, I noticed that the responses generated by the LoRA adapter and the base model are identical. This does not happen with standard vLLM. Could there be an issue here? Below is my test startup script, followed by the comparison I used to check the outputs:
vllm serve /llm/model/Qwen1.5-4B-Chat --max-lora-rank 64 --max-loras 1 --max-cpu-loras 100 --enable-lora --lora-modules test-lora=/llm/lora_models/Qwen1.5-4B-Chat_lora_huanhuan
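A minimal sketch of how the two responses can be compared through the OpenAI-compatible API, assuming the server listens on the default port 8000, the base model is addressed by its path, and the prompt text is only illustrative:

```python
# Compare base-model output vs. LoRA-adapter output from the same vLLM server.
# Assumptions: default port 8000, base model addressed by its path, greedy decoding.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]

# Query the base model (served under its model path by default).
base = client.chat.completions.create(
    model="/llm/model/Qwen1.5-4B-Chat",
    messages=messages,
    temperature=0.0,
)

# Query the LoRA adapter registered via --lora-modules test-lora=...
lora = client.chat.completions.create(
    model="test-lora",
    messages=messages,
    temperature=0.0,
)

print("base:", base.choices[0].message.content)
print("lora:", lora.choices[0].message.content)
# On vllm-ascend v0.11.0rc0 the two outputs come back identical;
# on standard vLLM the LoRA adapter produces a different response.
```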