update multimodal doc and requirement (#47)

x574chen · Xiaotong Chen · web-flow · commit 97108eccb705 · 2024-12-18T19:26:55.000+08:00
Co-authored-by: Xiaotong Chen <“cxt459847@alibaba-inc.com”>
diff --git a/docs/sphinx/vlm/vlm_offline_inference_en.rst b/docs/sphinx/vlm/vlm_offline_inference_en.rst
@@ -107,15 +107,19 @@ Launching with CLI
 You can also opt to install dashinfer-vlm locally and use command line to launch server.
 
 1. Pull dashinfer docker image (see :ref:`docker-label`)
-2. Download and extract the TensorRT GA build
+2. Install TensorRT Python package, and download TensorRT GA build from NVIDIA Developer Zone.
+
+Example: TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64
 
 .. code-block:: bash
 
+   pip install tensorrt
    wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
    tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-   export TRT_LIBPATH=`pwd`/TensorRT-10.6.0.26
+   export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib
 
-3. Install ``dashinfer-vlm``: ``pip install dashinfer-vlm``.
+3. Install dashinfer Python Package from `release <https://github.com/modelscope/dash-infer/releases>`_
+4. Install dashinfer-vlm: ``pip install dashinfer-vlm``.
 
 Now you can launch server with command line:
 
diff --git a/multimodal/Dockerfile b/multimodal/Dockerfile
@@ -6,6 +6,7 @@ RUN mkdir /root/code/
 COPY ./dashinfer_vlm /root/code/dashinfer_vlm
 COPY ./setup.py code/
 COPY ./requirements.txt /root/code/requirements.txt
+RUN python3 -m pip install https://github.com/modelscope/dash-infer/releases/download/v2.0.0-rc2/dashinfer-2.0.0rc2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
 RUN python3 -m pip install -r /root/code/requirements.txt --index-url=http://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
 
 RUN python3 -m pip install -e /root/code/
diff --git a/multimodal/requirements.txt b/multimodal/requirements.txt
@@ -1,4 +1,3 @@
-dashinfer
 av
 numpy==1.24.3
 requests==2.32.3
@@ -12,7 +11,8 @@ shortuuid
 fastapi
 pydantic_settings
 uvicorn
-cmake==3.22.6 
+cmake==3.22.6
 modelscope
 aiohttp
 onnx
+torchvision
diff --git a/multimodal/resource/dashinfer-vlm-arch.png b/multimodal/resource/dashinfer-vlm-arch.png