[BugFix] This PR aims to fix the precision issue of the LoRA feature in vllm-ascend #4046
Conversation
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
Signed-off-by: liuchenbing <chenliumail@163.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix a precision issue with LoRA features and enable bfloat16 kernels more broadly. The change in vllm_ascend/lora/punica_npu.py correctly casts the input tensor x to torch.float32 in add_lora_embedding, which aligns with the kernel's expectation and should resolve the precision problem. However, the changes in the C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical compilation issue. While the calls to the bfloat16_t kernels are now unconditional, their definitions remain inside conditional compilation blocks (#if (__CCE_AICORE__ >= 220)). This will cause build failures on hardware with __CCE_AICORE__ < 220. Please address this by making the kernel definitions unconditional as well.
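To make the failure mode concrete, the following is a minimal, self-contained C++ sketch of the pattern the review describes; FEATURE_LEVEL and launch_bfloat16 are hypothetical stand-ins for __CCE_AICORE__ and the bfloat16_t kernel, not code from this repository.

#include <cstdio>

// Default to a "new" core so this example builds as-is; compiling with
// -DFEATURE_LEVEL=100 reproduces the failure the review warns about.
#ifndef FEATURE_LEVEL
#define FEATURE_LEVEL 220
#endif

#if FEATURE_LEVEL >= 220
// The definition only exists for newer cores, mirroring the TYPE_DECLARE(bfloat16_t)
// macros that sit inside the `#if (__CCE_AICORE__ >= 220)` blocks.
void launch_bfloat16() { std::puts("bfloat16 kernel launched"); }
#endif

int main() {
    // The PR makes this call unconditional; when FEATURE_LEVEL < 220 the definition
    // above is compiled out and the build fails with an undefined identifier.
    launch_bfloat16();
    return 0;
}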
bgmv_expand_bfloat16_t<<<blockDim, nullptr, stream>>>(x, weight, indices, indicesSize, yIn, yOut, batchSize,
                                                      numTokensPerCore, maxLoRARank, outputHiddenDim,
                                                      sliceOffset, outputFullDim);
You've removed the conditional compilation directive for the bgmv_expand_bfloat16_t kernel call, making it unconditional. However, the kernel's definition using BGMV_EXPAND_TYPE_DECLARE(bfloat16_t) at line 346 is still inside an #if (__CCE_AICORE__ >= 220) block. This will cause a compilation error on platforms where __CCE_AICORE__ < 220, as the function will be called but not defined. You should also remove the conditional compilation around the kernel declaration to fix this.
bgmv_shrink_bfloat16_t<<<blockDim, nullptr, stream>>>(x, weight, indices, indicesSize, y, batchSize, numTokensPerCore,
                                                      inputHiddenDim, maxLoRARank, scale);
You've removed the conditional compilation directive for the bgmv_shrink_bfloat16_t kernel call, making it unconditional. However, the kernel's definition using BGMV_SHRINK_TYPE_DECLARE(bfloat16_t) at line 230 is still inside an #if (__CCE_AICORE__ >= 220) block. This will cause a compilation error on platforms where __CCE_AICORE__ < 220, as the function will be called but not defined. You should also remove the conditional compilation around the kernel declaration to fix this.
sgmv_expand_bfloat16_t<<<blockDim, nullptr, stream>>>(x, weight, loraIndices, loraIndicesSize,
                                                      seqLen, seqLenSize, yIn, yOut, batchSize,
                                                      numTokensPerCore, maxLoRARank, outputHiddenDim,
                                                      sliceOffset, outputFullDim);
You've removed the conditional compilation directive for the sgmv_expand_bfloat16_t kernel call, making it unconditional. However, the kernel's definition using SGMV_EXPAND_TYPE_DECLARE(bfloat16_t) at line 361 is still inside an #if (__CCE_AICORE__ >= 220) block. This will cause a compilation error on platforms where __CCE_AICORE__ < 220, as the function will be called but not defined. You should also remove the conditional compilation around the kernel declaration to fix this.
sgmv_shrink_bfloat16_t<<<blockDim, nullptr, stream>>>(x, weight, loraIndices, loraIndicesSize,
                                                      seqLen, seqLenSize,
                                                      y, batchSize,
                                                      numTokensPerCore, inputHiddenDim, maxLoRARank,
                                                      scale);
You've removed the conditional compilation directive for the sgmv_shrink_bfloat16_t kernel call, making it unconditional. However, the kernel's definition using SGMV_SHRINK_TYPE_DECLARE(bfloat16_t) at line 246 is still inside an #if (__CCE_AICORE__ >= 220) block. This will cause a compilation error on platforms where __CCE_AICORE__ < 220, as the function will be called but not defined. You should also remove the conditional compilation around the kernel declaration to fix this.
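The same resolution applies to all four kernels: remove the core-version guard around the bfloat16_t declaration so it matches the now-unconditional launch. A compilable sketch of that pattern, with a stub standing in for the real SGMV_SHRINK_TYPE_DECLARE macro (whose body is not shown in this thread), is:

#include <cstdio>

// Hypothetical stub: the real macro instantiates the Ascend kernel for a given element
// type; here it only defines a plain function so the sketch compiles on its own.
#define SGMV_SHRINK_TYPE_DECLARE(T) void sgmv_shrink_##T() { std::puts("sgmv_shrink " #T); }

// Previously wrapped in `#if (__CCE_AICORE__ >= 220) ... #endif`; dropping the guard
// keeps the definition in step with the unconditional launch in the host code.
SGMV_SHRINK_TYPE_DECLARE(bfloat16_t)

int main() {
    sgmv_shrink_bfloat16_t();  // mirrors the unconditional launch site
    return 0;
}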
This PR can fix 2 bugs:
@liuchenbing Could you consider fixing this according to the Gemini review comments?
This PR is a duplicate of #4141. We'll concentrate on that one, and this one will be closed.
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?
pytest tests/lora/test_llama_tp.py::test_llama_lora -s