
Commit e8ef7e4

Author: yejunjin

add dashinfer worker to servicize (#32)

1 parent: 93a7ad7

File tree

8 files changed: +583 -0 lines changed

documents/CN/examples_python.md

Lines changed: 60 additions & 0 deletions
@@ -296,7 +296,67 @@ Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
```
# 4_fastchat

[fastchat](https://github.com/lm-sys/FastChat) is an open-source serving platform for training, serving, and evaluating large language model chatbots. It lets an inference engine backend be plugged into the platform as a worker, providing an OpenAI-API-compatible service.

In [examples/python/4_fastchat/dashinfer_worker.py](../../examples/python/4_fastchat/dashinfer_worker.py), we provide example code that implements a worker with FastChat and DashInfer. By simply replacing the default `fastchat.serve.model_worker` in the FastChat service components with `dashinfer_worker`, users get a solution that is both compatible with the OpenAI API and makes efficient use of CPU resources for inference.

## Step 1: Install FastChat
```shell
pip install "fschat[model_worker]"
```

## Step 2: Start FastChat Services
```shell
python -m fastchat.serve.controller
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```

## Step 3: Launch dashinfer_worker
```shell
python dashinfer_worker.py --model-path qwen/Qwen-7B-Chat ../model_config/config_qwen_v10_7b.json
```
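
As a quick sanity check that `dashinfer_worker` has registered with the controller, you can list the models exposed by the OpenAI-compatible server started in Step 2. Below is a minimal sketch, assuming the `requests` package is installed and the default localhost:8000 address:

```python
# Minimal sketch: query the OpenAI-compatible /v1/models endpoint (Step 2 server)
# to confirm that the worker launched in Step 3 has registered.
import requests

resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([model["id"] for model in resp.json()["data"]])  # expect "Qwen-7B-Chat" to be listed
```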

## Step 4: Send an HTTP Request with cURL to the OpenAI-API-Compatible Endpoint
```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen-7B-Chat",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
```
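
The same endpoint can also be called from Python. The following is a minimal sketch, assuming the `openai` package (pre-1.0 interface) is installed; it sends the same chat request as the cURL command above:

```python
# Minimal sketch: send a chat completion request through the OpenAI-compatible server.
import openai

openai.api_key = "EMPTY"                      # FastChat does not validate the API key
openai.api_base = "http://localhost:8000/v1"  # openai_api_server started in Step 2

completion = openai.ChatCompletion.create(
    model="Qwen-7B-Chat",
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
print(completion.choices[0].message.content)
```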

## Quick Start with Docker
In addition, we provide a convenient Docker image that lets you quickly deploy an OpenAI-API-compatible HTTP service with dashinfer_worker integrated.

When starting the Docker container, make sure to replace the paths in angle brackets with actual paths, following the command format below:
```shell
docker run -d \
    --network host \
    -v <host_path_to_your_model>:<container_path_to_your_model> \
    -v <host_path_to_dashinfer_json_config_file>:<container_path_to_dashinfer_json_config_file> \
    dashinfer/fschat_ubuntu_x86:v1.2.1 \
    -m <container_path_to_your_model> \
    <container_path_to_dashinfer_json_config_file>
```
- `<host_path_to_your_model>`: the path on the host where the ModelScope/HuggingFace model is stored
- `<container_path_to_your_model>`: the path inside the container to which the model directory is mounted
- `<host_path_to_dashinfer_json_config_file>`: the path of the DashInfer JSON configuration file on the host
- `<container_path_to_dashinfer_json_config_file>`: the path inside the container to which the DashInfer JSON configuration file is mounted
- `-m` option: the ModelScope/HuggingFace model path inside the container, which depends on the container-side path used in the `-v` bindings. If this is a standard ModelScope/HuggingFace path (e.g. `qwen/Qwen-7B-Chat`), you do not need to bind a host model path into the container; the container will download the model for you automatically.

Below is an example that launches a Qwen-7B-Chat model service, with the default host localhost and port 8000:
```shell
docker run -d \
    --network host \
    -v ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat:/workspace/qwen/Qwen-7B-Chat \
    -v examples/python/model_config/config_qwen_v10_7b.json:/workspace/config_qwen_v10_7b.json \
    dashinfer/fschat_ubuntu_x86:v1.2.1 \
    -m /workspace/qwen/Qwen-7B-Chat \
    /workspace/config_qwen_v10_7b.json
```
# Model Configuration Files

Some example configs are provided under the `<path_to_dashinfer>/examples/python/model_config` directory.

documents/EN/examples_python.md

Lines changed: 60 additions & 0 deletions
@@ -297,6 +297,66 @@ Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
```

# 4_fastchat

[FastChat](https://github.com/lm-sys/FastChat) is an open-source platform designed for training, serving, and evaluating large language model chatbots. It lets an inference engine backend be integrated into the platform as a worker, providing services compatible with the OpenAI API.

In [examples/python/4_fastchat/dashinfer_worker.py](../../examples/python/4_fastchat/dashinfer_worker.py), we provide sample code demonstrating how to implement a worker with FastChat and DashInfer. Users can simply substitute `dashinfer_worker` for the default `fastchat.serve.model_worker` in the FastChat service components, thereby obtaining a solution that is not only compatible with the OpenAI API but also uses CPU resources efficiently for inference.

## Step 1: Install FastChat
```shell
pip install "fschat[model_worker]"
```

## Step 2: Start FastChat Services
```shell
python -m fastchat.serve.controller
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```

## Step 3: Launch dashinfer_worker
```shell
python dashinfer_worker.py --model-path qwen/Qwen-7B-Chat ../model_config/config_qwen_v10_7b.json
```
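
To verify that `dashinfer_worker` has registered with the controller, you can list the models exposed by the OpenAI API server started in Step 2. The snippet below is a minimal sketch, assuming the `requests` package is installed and the default localhost:8000 address:

```python
# Minimal sketch: query /v1/models on the OpenAI-compatible server (Step 2)
# to confirm the worker from Step 3 has registered.
import requests

resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([model["id"] for model in resp.json()["data"]])  # "Qwen-7B-Chat" should be listed
```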

## Step 4: Send HTTP Request via cURL to Access the OpenAI API-Compatible Endpoint
```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen-7B-Chat",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
```
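
The endpoint can also be accessed programmatically. Below is a minimal sketch, assuming the `openai` Python package (pre-1.0 interface) is installed; it issues the same chat request as the cURL command above:

```python
# Minimal sketch: chat completion request via the OpenAI-compatible server.
import openai

openai.api_key = "EMPTY"                      # FastChat does not validate the API key
openai.api_base = "http://localhost:8000/v1"  # openai_api_server started in Step 2

completion = openai.ChatCompletion.create(
    model="Qwen-7B-Chat",
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
print(completion.choices[0].message.content)
```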

## Quick Start with Docker
Furthermore, we provide a convenient Docker image, enabling rapid deployment of an HTTP service that integrates dashinfer_worker and is compatible with the OpenAI API. Execute the following command, making sure to replace the bracketed paths with actual paths:
```shell
docker run -d \
    --network host \
    -v <host_path_to_your_model>:<container_path_to_your_model> \
    -v <host_path_to_dashinfer_json_config_file>:<container_path_to_dashinfer_json_config_file> \
    dashinfer/fschat_ubuntu_x86:v1.2.1 \
    -m <container_path_to_your_model> \
    <container_path_to_dashinfer_json_config_file>
```
- `<host_path_to_your_model>`: The path on the host where ModelScope/HuggingFace models reside.
- `<container_path_to_your_model>`: The path within the container for mounting ModelScope/HuggingFace models.
- `<host_path_to_dashinfer_json_config_file>`: The location of the DashInfer JSON configuration file on the host.
- `<container_path_to_dashinfer_json_config_file>`: The destination path in the container for the DashInfer JSON configuration file.
- The `-m` flag denotes the path to the ModelScope/HuggingFace model within the container, which is determined by the host-to-container path binding specified in `-v`. If this refers to a standard ModelScope/HuggingFace path (e.g., `qwen/Qwen-7B-Chat`), there's no need to bind the model path from the host; the container will automatically download the model for you.

Below is an example of launching a Qwen-7B-Chat model service, with the default host set to localhost and the port to 8000:
```shell
docker run -d \
    --network host \
    -v ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat:/workspace/qwen/Qwen-7B-Chat \
    -v examples/python/model_config/config_qwen_v10_7b.json:/workspace/config_qwen_v10_7b.json \
    dashinfer/fschat_ubuntu_x86:v1.2.1 \
    -m /workspace/qwen/Qwen-7B-Chat \
    /workspace/config_qwen_v10_7b.json
```
# Model Configuration Files

The `<path_to_dashinfer>/examples/python/model_config` directory provides several configuration examples.
