Commit 0174d94
x574chen and Xiaotong Chen authored

multimodal: update doc and model path in launcher (#48)

* update multimodal doc and requirement
* update model path

Co-authored-by: Xiaotong Chen <“cxt459847@alibaba-inc.com”>

1 parent 97108ec · commit 0174d94

3 files changed: +15, -9 lines changed


docs/sphinx/vlm/vlm_offline_inference_en.rst

Lines changed: 11 additions & 7 deletions
@@ -97,26 +97,30 @@ You can also use OpenAI's Python client library:
                 },
             ],
         }],
-        stream=False,
+        stream=True,
         max_completion_tokens=1024,
         temperature=0.1,
     )
 
+    full_response = ""
+    for chunk in response:
+        full_response += chunk.choices[0].delta.content
+        print(".", end="")
+
+    print(f"\nFull Response: \n{full_response}")
+
 Launching with CLI
 -------------------------
 You can also opt to install dashinfer-vlm locally and use command line to launch server.
 
 1. Pull dashinfer docker image (see :ref:`docker-label`)
 2. Install TensorRT Python package, and download TensorRT GA build from NVIDIA Developer Zone.
 
-Example: TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64
-
 .. code-block:: bash
 
-    pip install tensorrt
-    wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-    tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-    export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib
+    wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.5.0/tars/TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+    tar -xvzf TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+    export LD_LIBRARY_PATH=`pwd`/TensorRT-10.5.0.18/lib
 
 3. Install dashinfer Python Package from `release <https://github.com/modelscope/dash-infer/releases>`_
 4. Install dashinfer-vlm: ``pip install dashinfer-vlm``.
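
With stream=True the client returns an iterator of chunks rather than a single completion, which is what the new loop consumes. For context, a minimal self-contained sketch of the full streamed call; the base_url, api_key, prompt, and image URL are assumptions for illustration, and only the stream handling mirrors the documented change:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed server address

    response = client.chat.completions.create(
        model="model",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }],
        stream=True,
        max_completion_tokens=1024,
        temperature=0.1,
    )

    # With stream=True the response is an iterator of chunks; delta.content may
    # be None on some chunks (e.g. the initial role-only delta), so guard it.
    full_response = ""
    for chunk in response:
        full_response += chunk.choices[0].delta.content or ""
        print(".", end="")

    print(f"\nFull Response: \n{full_response}")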

multimodal/dashinfer_vlm/api_server/server.py

Lines changed: 2 additions & 1 deletion
@@ -76,7 +76,8 @@ def init():
     context.set("chat_format", chat_format)
 
     # -----------------------Convert Model------------------------
-    output_dir = "/root/.cache/as_model/" + model.split("/")[-1]
+    home_dir = os.environ.get("HOME") or "/root"
+    output_dir = os.path.join(home_dir, ".cache/as_model/", model.split("/")[-1])
     model_name = "model"
     data_type = "bfloat16"
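
This makes the converted-model cache respect the caller's HOME instead of hard-coding /root, falling back to /root only when HOME is unset. A minimal sketch of the resolved path (the model id below is a hypothetical stand-in; the path logic mirrors the two added lines):

    import os

    model = "qwen/Qwen2-VL-7B-Instruct"  # hypothetical model id for illustration
    home_dir = os.environ.get("HOME") or "/root"
    output_dir = os.path.join(home_dir, ".cache/as_model/", model.split("/")[-1])
    print(output_dir)  # e.g. /home/user/.cache/as_model/Qwen2-VL-7B-Instruct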

multimodal/requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
+tensorrt==10.5.0
 av
 numpy==1.24.3
 requests==2.32.3
@@ -6,7 +7,7 @@ transformers>=4.45.0
 cachetools>=5.4.0
 six
 tiktoken
-openai==1.52.2
+openai>=1.56.2
 shortuuid
 fastapi
 pydantic_settings
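
The new tensorrt==10.5.0 pin matches the TensorRT 10.5.0.18 tarball referenced in the doc above, and the openai pin is relaxed from ==1.52.2 to >=1.56.2. A quick sanity-check sketch, assuming both packages are installed in the current environment:

    # Verify the environment matches the updated pins.
    import openai
    import tensorrt

    print("tensorrt:", tensorrt.__version__)  # expect 10.5.0
    print("openai:", openai.__version__)      # expect >= 1.56.2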
