
[Metax] deepseek ops import error #5214

@StareAtYou

Description


Test environment

platform: maca
paddle: dev20251120
paddle_metax_gpu: dev20251121
fastdeploy: develop branch, based on ff26158

(screenshot)

Problem description

Right after building the fastdeploy package, the deepseek-related op interfaces can be imported normally:

from fastdeploy.model_executor.ops.gpu import fused_rotary_position_encoding

After running run_model.py, importing the deepseek op fails.

run_model.py source

import os
os.environ["MACA_VISIBLE_DEVICES"] = "6,7"
os.environ["FD_MOE_BACKEND"] = "cutlass"
os.environ["PADDLE_XCCL_BACKEND"] = "metax_gpu"
os.environ["FLAGS_weight_only_linear_arch"] = "80"
os.environ["FD_METAX_KVCACHE_MEM"] = "8"
os.environ["FD_ENC_DEC_BLOCK_NUM"] = "0"
# os.environ["FD_METAX_DENSE_QUANT_TYPE"] = "wint8"

# "/root/model/ERNIE-4.5-21B-A3B-Paddle"
# "/root/model/ERNIE-4.5-0.3B-Paddle"
# "/root/model/ERNIE-4.5-21B-A3B-Thinking"

import fastdeploy
llm = fastdeploy.LLM(model="/root/model/ERNIE-4.5-VL-28B-A3B-Thinking",
                     tensor_parallel_size=1,
                     load_choices="default_v1",
                     engine_worker_queue_port=8899,
                     quantization="wint8",
                     disable_custom_all_reduce=True,
)

prompts = [
        # "who are you?",
        "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?",
]
# sampling_params = fastdeploy.SamplingParams(top_p=0.0, max_tokens=2047, temperature=0.0)
sampling_params = fastdeploy.SamplingParams(top_p=0.95, max_tokens=128, temperature=0.6)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs.text
    print(f"Prompt: {prompt!r}")
    print(f"Generated: {generated_text!r}")
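To make the before/after comparison reproducible in one place, the import check can be wrapped in a small helper (the `check_import` helper is hypothetical, added here only for illustration; the module path and symbol name are taken from the report above). Running it once before `fastdeploy.LLM(...)` and once after `llm.generate(...)` should show "ok" turning into a failure if the engine run is what breaks the op import:

```python
import importlib
import traceback


def check_import(module_path: str, attr: str) -> str:
    """Try to import `attr` from `module_path`; return 'ok' or the error."""
    try:
        mod = importlib.import_module(module_path)
        getattr(mod, attr)
        return "ok"
    except Exception as e:  # ImportError, AttributeError, etc.
        traceback.print_exc()
        return f"failed: {type(e).__name__}: {e}"


# Check the same op import that works right after the build
print(check_import("fastdeploy.model_executor.ops.gpu",
                   "fused_rotary_position_encoding"))
```

Printing the full traceback (rather than just the error message) may also reveal which underlying shared library or module actually fails to load after the model run.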

Terminal output

Import test right after the fd package was built:

(screenshot)

Immediately afterwards, when running run_model.py:

(screenshot)

Then retesting the original import:

(screenshot)
