3 files changed: +15 −9 lines changed

@@ -97,26 +97,30 @@ You can also use OpenAI's Python client library:
             },
         ],
     }],
-    stream=False,
+    stream=True,
     max_completion_tokens=1024,
     temperature=0.1,
 )

+full_response = ""
+for chunk in response:
+    full_response += chunk.choices[0].delta.content
+    print(".", end="")
+
+print(f"\nFull Response:\n{full_response}")
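With ``stream=True`` the loop above concatenates each chunk's ``delta.content``. On many OpenAI-compatible servers the final streamed chunk carries ``delta.content = None``, so a guard before concatenating is safer. Below is a minimal sketch of that accumulation logic using ``SimpleNamespace`` stand-ins for real chunk objects; ``make_chunk`` and the sample texts are hypothetical, not part of the dashinfer API.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for streamed chunks; real chunks come from the
# OpenAI client when stream=True. The last chunk's content is often None.
def make_chunk(content):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))]
    )

stream = [make_chunk("Hello"), make_chunk(", world"), make_chunk(None)]

full_response = ""
for chunk in stream:
    piece = chunk.choices[0].delta.content
    if piece:  # guard: skip chunks whose delta carries no text
        full_response += piece

print(full_response)  # → Hello, world
```

Without the guard, concatenating the ``None`` from the final chunk would raise a ``TypeError``.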
 Launching with CLI
 -------------------------
 You can also opt to install dashinfer-vlm locally and use the command line to launch the server.

 1. Pull the dashinfer docker image (see :ref:`docker-label`).
 2. Install the TensorRT Python package, and download the TensorRT GA build from NVIDIA Developer Zone.

-   Example: TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64
-
 .. code-block:: bash

-   pip install tensorrt
-   wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-   tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-   export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib
+   wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.5.0/tars/TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+   tar -xvzf TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+   export LD_LIBRARY_PATH=`pwd`/TensorRT-10.5.0.18/lib

 3. Install the dashinfer Python package from `release <https://github.com/modelscope/dash-infer/releases>`_.
 4. Install dashinfer-vlm: ``pip install dashinfer-vlm``.
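Note that the ``export LD_LIBRARY_PATH`` in step 2 only affects the current shell. If you drive the server from a Python wrapper instead, the variable can be set programmatically before spawning it. A sketch under the assumption that the TensorRT tarball was unpacked in the current working directory (the extraction path is illustrative):

```python
import os

# Assumed extraction location of the TensorRT tarball; adjust to your setup.
trt_lib = os.path.join(os.getcwd(), "TensorRT-10.5.0.18", "lib")

# Prepend the TensorRT lib dir so shared libraries resolve for this
# process and any child process it spawns (e.g. the server).
existing = os.environ.get("LD_LIBRARY_PATH", "")
os.environ["LD_LIBRARY_PATH"] = trt_lib + (":" + existing if existing else "")

print(os.environ["LD_LIBRARY_PATH"])
```

Environment changes made this way are inherited by subprocesses but do not persist after the Python process exits.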
@@ -76,7 +76,8 @@ def init():
     context.set("chat_format", chat_format)

     # -----------------------Convert Model------------------------
-    output_dir = "/root/.cache/as_model/" + model.split("/")[-1]
+    home_dir = os.environ.get("HOME") or "/root"
+    output_dir = os.path.join(home_dir, ".cache/as_model/", model.split("/")[-1])
     model_name = "model"
     data_type = "bfloat16"
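The change above replaces the hardcoded ``/root`` prefix with the user's ``HOME``, falling back to ``/root`` when the variable is unset (as in some container images). A small sketch of the resulting path logic; the helper name and model ID are illustrative, not taken from the dashinfer code:

```python
import os

def resolve_output_dir(model, home=None):
    # Fall back to /root when HOME is unavailable, mirroring
    # os.environ.get("HOME") or "/root" from the patch above.
    home_dir = home or "/root"
    return os.path.join(home_dir, ".cache/as_model/", model.split("/")[-1])

print(resolve_output_dir("qwen/Qwen2-VL-7B-Instruct", home="/home/user"))
# → /home/user/.cache/as_model/Qwen2-VL-7B-Instruct
print(resolve_output_dir("qwen/Qwen2-VL-7B-Instruct"))
# → /root/.cache/as_model/Qwen2-VL-7B-Instruct
```

``os.path.join`` also normalizes the trailing slash in ``.cache/as_model/``, which the old string concatenation relied on being present.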
@@ -1,3 +1,4 @@
+tensorrt==10.5.0
 av
 numpy==1.24.3
 requests==2.32.3
@@ -6,7 +7,7 @@ transformers>=4.45.0
 cachetools>=5.4.0
 six
 tiktoken
-openai==1.52.2
+openai>=1.56.2
 shortuuid
 fastapi
 pydantic_settings