#
# Copyright (c) Alibaba, Inc. and its affiliates.
# @file basic_example_chatglm4.py
#
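#
# This example downloads the GLM-4-9B-Chat weights, converts them to the
# DashInfer model format, and runs a small batch of prompts concurrently
# through the engine. Run it from this directory so that the relative
# config path resolves, e.g.:
#   python basic_example_chatglm4.py             # convert and run with default weights
#   python basic_example_chatglm4.py --quantize  # enable dynamic weight quantization during conversion
#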
import os
import copy
import time
import random
import argparse
import subprocess
from jinja2 import Template
from concurrent.futures import ThreadPoolExecutor

from dashinfer.helper import EngineHelper, ConfigManager


def download_model(model_id, revision, source="modelscope"):
    print(f"Downloading model {model_id} (revision: {revision}) from {source}")
    if source == "modelscope":
        from modelscope import snapshot_download
        model_dir = snapshot_download(model_id, revision=revision)
    elif source == "huggingface":
        from huggingface_hub import snapshot_download
        model_dir = snapshot_download(repo_id=model_id)
    else:
        raise ValueError(f"Unknown model source: {source}")

    print(f"Saved model to path {model_dir}")

    return model_dir


def create_test_prompt(default_gen_cfg=None):
    input_list = [
        "浙江的省会在哪",
        "Where is the capital of Zhejiang?",
        "将“温故而知新”翻译成英文,并解释其含义",
    ]

    user_msg = {"role": "user", "content": ""}
    assistant_msg = {"role": "assistant", "content": ""}

    # GLM-4 chat markup: "[gMASK] <sop> " prefix, then the user turn,
    # then an empty assistant header for the model to complete
    prompt_template = Template(
        "[gMASK] <sop> " + "<|{{user_role}}|>\n" + "{{user_content}}" +
        "<|{{assistant_role}}|>\n")

    gen_cfg_list = []
    prompt_list = []
    for text in input_list:
        user_msg["content"] = text
        prompt = prompt_template.render(user_role=user_msg["role"],
                                        user_content=user_msg["content"],
                                        assistant_role=assistant_msg["role"])
        prompt_list.append(prompt)
        if default_gen_cfg is not None:
            # each request gets its own generation config with a fresh seed
            gen_cfg = copy.deepcopy(default_gen_cfg)
            gen_cfg["seed"] = random.randint(0, 10000)
            gen_cfg_list.append(gen_cfg)

    return prompt_list, gen_cfg_list


def process_request(request_list, engine_helper: EngineHelper):

    def print_inference_result(request):
        msg = "***********************************\n"
        msg += f"* Answer (dashinfer) for Request {request.id}\n"
        msg += "***********************************\n"
        msg += f"** context_time: {request.context_time} s, generate_time: {request.generate_time} s\n\n"
        msg += f"** encoded input, len: {request.in_tokens_len} **\n{request.in_tokens}\n\n"
        msg += f"** encoded output, len: {request.out_tokens_len} **\n{request.out_tokens}\n\n"
        msg += f"** text input **\n{request.in_text}\n\n"
        msg += f"** text output **\n{request.out_text}\n\n"
        print(msg)

    def done_callback(future):
        request = future.argument
        future.result()  # re-raises any exception from the worker thread
        print_inference_result(request)

    # create a thread pool sized to the engine's maximum batch size
    executor = ThreadPoolExecutor(
        max_workers=engine_helper.engine_config["engine_max_batch"])

    try:
        # submit all requests to the thread pool
        futures = []
        for request in request_list:
            future = executor.submit(engine_helper.process_one_request, request)
            future.argument = request
            future.add_done_callback(done_callback)
            futures.append(future)
    finally:
        # block until every submitted request has finished
        executor.shutdown(wait=True)

    return


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--quantize', action='store_true')
    args = parser.parse_args()

    config_file = "../model_config/config_chatglm4_9b.json"
    config = ConfigManager.get_config_from_json(config_file)
    config["convert_config"]["do_dynamic_quantize_convert"] = args.quantize

    # locate the installed dashinfer package so the engine can find its daemon binary,
    # then map the NUMA nodes listed in config["device_ids"] to the engine's env vars
    cmd = "pip show dashinfer | grep 'Location' | cut -d ' ' -f 2"
    package_location = subprocess.run(cmd,
                                      stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE,
                                      shell=True,
                                      text=True)
    package_location = package_location.stdout.strip()
    os.environ["AS_DAEMON_PATH"] = package_location + "/dashinfer/allspark/bin"
    os.environ["AS_NUMA_NUM"] = str(len(config["device_ids"]))
    os.environ["AS_NUMA_OFFSET"] = str(config["device_ids"][0])

    ## download original model
    ## download model from huggingface
    # original_model = {
    #     "source": "huggingface",
    #     "model_id": "THUDM/glm-4-9b-chat",
    #     "revision": "",
    #     "model_path": ""
    # }

    ## download model from modelscope
    original_model = {
        "source": "modelscope",
        "model_id": "ZhipuAI/glm-4-9b-chat",
        "revision": "master",
        "model_path": ""
    }
    original_model["model_path"] = download_model(original_model["model_id"],
                                                  original_model["revision"],
                                                  original_model["source"])

    ## init EngineHelper class
    engine_helper = EngineHelper(config)
    engine_helper.verbose = True
    engine_helper.init_tokenizer(original_model["model_path"])

    ## convert huggingface model to dashinfer model
    ## only one conversion is required
    engine_helper.convert_model(original_model["model_path"])

    ## inference
    engine_helper.init_engine()

    prompt_list, gen_cfg_list = create_test_prompt(
        engine_helper.default_gen_cfg)
    request_list = engine_helper.create_request(prompt_list, gen_cfg_list)

    global_start = time.time()
    process_request(request_list, engine_helper)
    global_end = time.time()

    total_timecost = global_end - global_start
    # engine_helper.print_inference_result_all(request_list)
    engine_helper.print_profiling_data(request_list, total_timecost)
    print(f"total timecost: {total_timecost} s")

    engine_helper.uninit_engine()