----------------------------------------------------

[](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
[](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs_deploy.yml)
[](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/develop)

This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec_inf/launch_server.sh), [`vllm.slurm`](vec_inf/vllm.slurm), [`multinode_vllm.slurm`](vec_inf/multinode_vllm.slurm) and [`models.yaml`](vec_inf/config/models.yaml) accordingly.

## Installation
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:

```bash
pip install vec-inf
```
Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package.
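
If you go the container route, a typical flow might look like the sketch below; the image tag and run flags are illustrative assumptions rather than values fixed by this repository:

```bash
# Build an image from the repository root (the tag name is an example)
docker build -t vec-inf-env .

# Start an interactive shell with GPU access; adjust the flags to your setup
docker run --gpus all -it --rm vec-inf-env bash
```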
## Usage

### `launch` command

The `launch` command allows users to deploy a model as a Slurm job. If the job launches successfully, a URL endpoint is exposed for the user to send inference requests.

We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
```bash
vec-inf launch Meta-Llama-3.1-8B-Instruct
```

You should see an output like the following:

<img width="600" alt="launch_img" src="https://github.com/user-attachments/assets/ab658552-18b2-47e0-bf70-e539c3b898d5">
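
Once the server is ready, you can send requests to the exposed endpoint. Here is a minimal sketch using `curl`, assuming an OpenAI-compatible base URL of the form `http://<server_host>:<port>/v1`; substitute the endpoint reported for your job:

```bash
# <server_host> and <port> are placeholders; use the endpoint reported for your job
curl http://<server_host>:<port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "What is the capital of Canada?"}]
      }'
```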
#### Overrides

Models that are already supported by `vec-inf` are launched using the [default parameters](vec_inf/config/models.yaml). You can override these values by providing additional parameters; use `vec-inf launch --help` to see the full list of parameters that can be overridden. For example, to override `qos`:
```bash
vec-inf launch Meta-Llama-3.1-8B-Instruct --qos <new_qos>
```

#### Custom models
You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
* Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT`.
* Your model weights directory should contain HuggingFace format weights.
* You should create a custom configuration file for your model and specify its path by setting the environment variable `VEC_INF_CONFIG`. Check the [default parameters](vec_inf/config/models.yaml) file for the format of the config file; all the parameters for the model should be specified there.
* For other model launch parameters, you can reference the default values for similar models using the [`list` command](#list-command).

Here is an example of deploying a custom [Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M) model, which is not in the default list of models, using a custom user config. In this case, the model weights are assumed to be downloaded to a `model-weights` directory inside the user's home directory. The weights directory follows the naming convention, so it would be named `Qwen2.5-7B-Instruct-1M`. The following YAML file would need to be created; let's say it is named `/h/<username>/my-model-config.yaml`.
```yaml
models:
  Qwen2.5-7B-Instruct-1M:
    model_family: Qwen2.5
    model_variant: 7B-Instruct-1M
    model_type: LLM
    num_gpus: 2
    num_nodes: 1
    vocab_size: 152064
    max_model_len: 1010000
    max_num_seqs: 256
    pipeline_parallelism: true
    enforce_eager: false
    qos: m2
    time: 08:00:00
    partition: a40
    data_type: auto
    venv: singularity
    log_dir: default
    model_weights_parent_dir: /h/<username>/model-weights
```
You would then set the `VEC_INF_CONFIG` path using:

```bash
export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
```
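
With the config file in place, the custom model should then be launchable by the name used as its key in the config, in the same way as a built-in model:

```bash
vec-inf launch Qwen2.5-7B-Instruct-1M
```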
Alternatively, you can use launch parameters to set these values instead of a user-defined config.
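
As a rough sketch, a launch along these lines could stand in for the config file; the flag names below simply mirror the config keys above and are assumptions, so check `vec-inf launch --help` for the exact set your version supports:

```bash
# Flag names mirror the config keys above and are illustrative;
# verify them with `vec-inf launch --help`.
vec-inf launch Qwen2.5-7B-Instruct-1M \
  --model-weights-parent-dir /h/<username>/model-weights \
  --partition a40 \
  --qos m2 \
  --max-num-seqs 256
```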
### `status` command
You can check the inference server status by providing the Slurm job ID to the `status` command:
```bash