@@ -6,78 +6,78 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
6
7  ### Generative Models
8
9- | Model | Support | Note |
10- | -------------------------------| -----------| ----------------------------------------------------------------------|
11- | DeepSeek V3/3.1 | ✅ | |
12- | DeepSeek V3.2 EXP | ✅ | |
13- | DeepSeek R1 | ✅ | |
14- | DeepSeek Distill (Qwen/LLama) | ✅ | |
15- | Qwen3 | ✅ | |
16- | Qwen3-based | ✅ | |
17- | Qwen3-Coder | ✅ | |
18- | Qwen3-Moe | ✅ | |
19- | Qwen3-Next | ✅ | |
20- | Qwen2.5 | ✅ | |
21- | Qwen2 | ✅ | |
22- | Qwen2-based | ✅ | |
23- | QwQ-32B | ✅ | |
24- | LLama2/3/3.1 | ✅ | |
25- | Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
26- | Baichuan | ✅ | |
27- | Baichuan2 | ✅ | |
28- | Phi-4-mini | ✅ | |
29- | MiniCPM | ✅ | |
30- | MiniCPM3 | ✅ | |
31- | Ernie4.5 | ✅ | |
32- | Ernie4.5-Moe | ✅ | |
33- | Gemma-2 | ✅ | |
34- | Gemma-3 | ✅ | |
35- | Phi-3/4 | ✅ | |
36- | Mistral/Mistral-Instruct | ✅ | |
37- | GLM-4.5 | ✅ | |
38- | GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
39- | GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
40- | ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |
41- | DeepSeek V2.5 | 🟡 | Need test |
42- | Mllama | 🟡 | Need test |
43- | MiniMax-Text | 🟡 | Need test |
9+ | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
10+ | -------------------------------| -----------| ----------------------------------------------------------------------| ------ | -------------------- | ------ | ----------------- | ------------------------ | ------ | ---------------------- | ------------------ | ----------------- | ------------------- | ----------------- | --------------- | ------------------------------- | -------------------- | -------------------- | --------------- | --------------------- | ----- |
11+ | DeepSeek V3/3.1 | ✅ | |||||||||||||||||||
12+ | DeepSeek V3.2 EXP | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ❌ | | | 163840 | | [DeepSeek-V3.2-Exp tutorial](../../tutorials/DeepSeek-V3.2-Exp.md) |
13+ | DeepSeek R1 | ✅ | |||||||||||||||||||
14+ | DeepSeek Distill (Qwen/LLama) | ✅ | |||||||||||||||||||
15+ | Qwen3 | ✅ | |||||||||||||||||||
16+ | Qwen3-based | ✅ | |||||||||||||||||||
17+ | Qwen3-Coder | ✅ | |||||||||||||||||||
18+ | Qwen3-Moe | ✅ | |||||||||||||||||||
19+ | Qwen3-Next | ✅ | |||||||||||||||||||
20+ | Qwen2.5 | ✅ | |||||||||||||||||||
21+ | Qwen2 | ✅ | |||||||||||||||||||
22+ | Qwen2-based | ✅ | |||||||||||||||||||
23+ | QwQ-32B | ✅ | |||||||||||||||||||
24+ | LLama2/3/3.1 | ✅ | |||||||||||||||||||
25+ | Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |||||||||||||||||||
26+ | Baichuan | ✅ | |||||||||||||||||||
27+ | Baichuan2 | ✅ | |||||||||||||||||||
28+ | Phi-4-mini | ✅ | |||||||||||||||||||
29+ | MiniCPM | ✅ | |||||||||||||||||||
30+ | MiniCPM3 | ✅ | |||||||||||||||||||
31+ | Ernie4.5 | ✅ | |||||||||||||||||||
32+ | Ernie4.5-Moe | ✅ | |||||||||||||||||||
33+ | Gemma-2 | ✅ | |||||||||||||||||||
34+ | Gemma-3 | ✅ | |||||||||||||||||||
35+ | Phi-3/4 | ✅ | |||||||||||||||||||
36+ | Mistral/Mistral-Instruct | ✅ | |||||||||||||||||||
37+ | GLM-4.5 | ✅ | |||||||||||||||||||
38+ | GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |||||||||||||||||||
39+ | GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |||||||||||||||||||
40+ | ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |||||||||||||||||||
41+ | DeepSeek V2.5 | 🟡 | Need test |||||||||||||||||||
42+ | Mllama | 🟡 | Need test |||||||||||||||||||
43+ | MiniMax-Text | 🟡 | Need test |||||||||||||||||||
44
45  ### Pooling Models
46
47- | Model | Support | Note |
48- | -------------------------------| -----------| ----------------------------------------------------------------------|
49- | Qwen3-Embedding | ✅ | |
50- | Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |
51- | XLM-RoBERTa-based | ❌ | [1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |
47+ | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
48+ | -------------------------------| -----------| ----------------------------------------------------------------------| ------ | -------------------- | ------ | ----------------- | ------------------------ | ------ | ---------------------- | ------------------ | ----------------- | ------------------- | ----------------- | --------------- | ------------------------------- | -------------------- | -------------------- | --------------- | --------------------- | ----- |
49+ | Qwen3-Embedding | ✅ | |||||||||||||||||||
50+ | Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |||||||||||||||||||
51+ | XLM-RoBERTa-based | ❌ | [1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |||||||||||||||||||
52
53  ## Multimodal Language Models
54
55  ### Generative Models
56
57- | Model | Support | Note |
58- | --------------------------------| ---------------| ----------------------------------------------------------------------|
59- | Qwen2-VL | ✅ | |
60- | Qwen2.5-VL | ✅ | |
61- | Qwen3-VL | ✅ | |
62- | Qwen3-VL-MOE | ✅ | |
63- | Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |
64- | QVQ | ✅ | |
65- | LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
66- | InternVL2 | ✅ | |
67- | InternVL2.5 | ✅ | |
68- | Qwen2-Audio | ✅ | |
69- | Aria | ✅ | |
70- | LLaVA-Next | ✅ | |
71- | LLaVA-Next-Video | ✅ | |
72- | MiniCPM-V | ✅ | |
73- | Mistral3 | ✅ | |
74- | Phi-3-Vison/Phi-3.5-Vison | ✅ | |
75- | Gemma3 | ✅ | |
76- | LLama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
77- | LLama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
78- | Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |
79- | Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |
80- | GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |
81- | InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |
82- | Whisper | ❌ | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |
83- | Ultravox | 🟡 | Need test |
57+ | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
58+ | --------------------------------| ---------------| ----------------------------------------------------------------------| ------ | -------------------- | ------ | ----------------- | ------------------------ | ------ | ---------------------- | ------------------ | ----------------- | ------------------- | ----------------- | --------------- | ------------------------------- | -------------------- | -------------------- | --------------- | --------------------- | ----- |
59+ | Qwen2-VL | ✅ | |||||||||||||||||||
60+ | Qwen2.5-VL | ✅ | |||||||||||||||||||
61+ | Qwen3-VL | ✅ | |||||||||||||||||||
62+ | Qwen3-VL-MOE | ✅ | |||||||||||||||||||
63+ | Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |||||||||||||||||||
64+ | QVQ | ✅ | |||||||||||||||||||
65+ | LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |||||||||||||||||||
66+ | InternVL2 | ✅ | |||||||||||||||||||
67+ | InternVL2.5 | ✅ | |||||||||||||||||||
68+ | Qwen2-Audio | ✅ | |||||||||||||||||||
69+ | Aria | ✅ | |||||||||||||||||||
70+ | LLaVA-Next | ✅ | |||||||||||||||||||
71+ | LLaVA-Next-Video | ✅ | |||||||||||||||||||
72+ | MiniCPM-V | ✅ | |||||||||||||||||||
73+ | Mistral3 | ✅ | |||||||||||||||||||
74+ | Phi-3-Vison/Phi-3.5-Vison | ✅ | |||||||||||||||||||
75+ | Gemma3 | ✅ | |||||||||||||||||||
76+ | LLama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
77+ | LLama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
78+ | Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |||||||||||||||||||
79+ | Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |||||||||||||||||||
80+ | GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |||||||||||||||||||
81+ | InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |||||||||||||||||||
82+ | Whisper | ❌ | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |||||||||||||||||||
83+ | Ultravox | 🟡 | Need test |||||||||||||||||||