
Commit 4db040c

Merge branch 'main' into dependabot/github_actions/docker/build-push-action-6.18.0
2 parents 292b219 + aa7800f commit 4db040c

26 files changed: +2935 -991 lines changed

README.md

Lines changed: 13 additions & 6 deletions
@@ -10,7 +10,7 @@
 [![vLLM](https://img.shields.io/badge/vllm-0.8.5.post1-blue)](https://docs.vllm.ai/en/v0.8.5.post1/index.html)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update the environment variables in [`vec_inf/client/slurm_vars.py`](vec_inf/client/slurm_vars.py), and the model config for cached model weights in [`vec_inf/config/models.yaml`](vec_inf/config/models.yaml) accordingly.
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).

 ## Installation
 If you are using the Vector cluster environment, and you don't need any customization to the inference server environment, run the following to install package:
@@ -20,6 +20,11 @@ pip install vec-inf
 ```
 Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.

+If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files; there are 3 ways to do it:
+* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
+* The package would try to look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you would need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](vec_inf/config/).
+* The package would also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
+
 ## Usage

 Vector Inference provides 2 user interfaces, a CLI and an API
@@ -61,7 +66,8 @@ You can also launch your own custom model as long as the model architecture is [
 * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT` ($MODEL_VARIANT is OPTIONAL).
 * Your model weights directory should contain HuggingFace format weights.
 * You should specify your model configuration by:
-    * Creating a custom configuration file for your model and specify its path via setting the environment variable `VEC_INF_CONFIG`. Check the [default parameters](vec_inf/config/models.yaml) file for the format of the config file. All the parameters for the model should be specified in that config file.
+    * Creating a custom configuration file for your model and specify its path via setting the environment variable `VEC_INF_MODEL_CONFIG` (This one will supersede `VEC_INF_CONFIG_DIR` if that is also set). Check the [default parameters](vec_inf/config/models.yaml) file for the format of the config file. All the parameters for the model should be specified in that config file.
+    * Add your model configuration to the cached `models.yaml` in your cluster environment (if you have write access to the cached configuration directory).
     * Using launch command options to specify your model setup.
 * For other model launch parameters you can reference the default values for similar models using the [`list` command ](#list-command).

@@ -89,10 +95,10 @@ models:
       --compilation-config: 3
 ```

-You would then set the `VEC_INF_CONFIG` path using:
+You would then set the `VEC_INF_MODEL_CONFIG` path using:

 ```bash
-export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
+export VEC_INF_MODEL_CONFIG=/h/<username>/my-model-config.yaml
 ```

 **NOTE**
@@ -103,10 +109,11 @@ export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml

 #### Other commands

-* `status`: Check the model status by providing its Slurm job ID, `--json-mode` supported.
+* `batch-launch`: Launch multiple model inference servers at once; currently ONLY single-node models are supported.
+* `status`: Check the model status by providing its Slurm job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
-* `list`: List all available model names, or view the default/cached configuration of a specific model, `--json-mode` supported.
+* `list`: List all available model names, or view the default/cached configuration of a specific model.
 * `cleanup`: Remove old log directories. You can filter by `--model-family`, `--model-name`, `--job-id`, and/or `--before-job-id`. Use `--dry-run` to preview what would be deleted.

 For more details on the usage of these commands, refer to the [User Guide](https://vectorinstitute.github.io/vector-inference/user_guide/)
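
As an illustrative sketch of the `cleanup` filters listed in the README diff above (the model family value is a placeholder, not a command from the docs):

```bash
# Preview which old log directories would be removed for one model family,
# without actually deleting anything.
vec-inf cleanup --model-family Meta-Llama-3.1 --dry-run
```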

docs/index.md

Lines changed: 6 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Vector Inference: Easy inference on Slurm clusters

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update the environment variables in [`vec_inf/client/slurm_vars.py`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/client/slurm_vars.py), and the model config for cached model weights in [`vec_inf/config/models.yaml`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config/models.yaml) accordingly.
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).

 ## Installation

@@ -11,3 +11,8 @@ pip install vec-inf
 ```

 Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
+
+If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files; there are 3 ways to do it:
+* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
+* The package would try to look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you would need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](vec_inf/config/).
+* The package would also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
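
A minimal sketch of the third option above; the directory path is a placeholder, and the `cp` line assumes you already have locally edited copies of the two config files. Only `environment.yaml`, `models.yaml`, and `VEC_INF_CONFIG_DIR` are names taken from the docs:

```bash
# Hypothetical location; any directory readable on the cluster works.
mkdir -p ~/vec-inf-config
# Copies of environment.yaml and models.yaml, edited for your cluster.
cp environment.yaml models.yaml ~/vec-inf-config/
export VEC_INF_CONFIG_DIR=~/vec-inf-config
```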

docs/user_guide.md

Lines changed: 55 additions & 5 deletions
@@ -4,7 +4,7 @@

 ### `launch` command

-The `launch` command allows users to deploy a model as a slurm job. If the job successfully launches, a URL endpoint is exposed for the user to send requests for inference.
+The `launch` command allows users to launch an OpenAI-compatible model inference server as a slurm job. If the job successfully launches, a URL endpoint is exposed for the user to send requests for inference.

 We will use the Llama 3.1 model as example, to launch an OpenAI compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
@@ -58,7 +58,8 @@ You can also launch your own custom model as long as the model architecture is [
 * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT` ($MODEL_VARIANT is OPTIONAL).
 * Your model weights directory should contain HuggingFace format weights.
 * You should specify your model configuration by:
-    * Creating a custom configuration file for your model and specify its path via setting the environment variable `VEC_INF_CONFIG`. Check the [default parameters](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config/models.yaml) file for the format of the config file. All the parameters for the model should be specified in that config file.
+    * Creating a custom configuration file for your model and specify its path via setting the environment variable `VEC_INF_MODEL_CONFIG` (This one will supersede `VEC_INF_CONFIG_DIR` if that is also set). Check the [default parameters](vec_inf/config/models.yaml) file for the format of the config file. All the parameters for the model should be specified in that config file.
+    * Add your model configuration to the cached `models.yaml` in your cluster environment (if you have write access to the cached configuration directory).
     * Using launch command options to specify your model setup.
 * For other model launch parameters you can reference the default values for similar models using the [`list` command ](#list-command).

@@ -85,10 +86,10 @@ models:
       --max-num-seqs: 256
 ```

-You would then set the `VEC_INF_CONFIG` path using:
+You would then set the `VEC_INF_MODEL_CONFIG` path using:

 ```bash
-export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
+export VEC_INF_MODEL_CONFIG=/h/<username>/my-model-config.yaml
 ```

 **NOTE**
@@ -97,6 +98,53 @@ export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
 * For GPU partitions with non-Ampere architectures, e.g. `rtx6000`, `t4v2`, BF16 isn't supported. For models that have BF16 as the default type, when using a non-Ampere GPU, use FP16 instead, i.e. `--dtype: float16`.
 * Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set them for models that require multiple nodes of GPUs.

+### `batch-launch` command
+
+The `batch-launch` command allows users to launch multiple inference servers at once; here is an example of launching 2 models:
+
+```bash
+vec-inf batch-launch DeepSeek-R1-Distill-Qwen-7B Qwen2.5-Math-PRM-7B
+```
+
+You should see an output like the following:
+
+```
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Job Config ┃ Value ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ Slurm Job ID │ 17480109 │
+│ Slurm Job Name │ BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5-Math-PRM-7B │
+│ Model Name │ DeepSeek-R1-Distill-Qwen-7B │
+│ Partition │ a40 │
+│ QoS │ m2 │
+│ Time Limit │ 08:00:00 │
+│ Num Nodes │ 1 │
+│ GPUs/Node │ 1 │
+│ CPUs/Task │ 16 │
+│ Memory/Node │ 64G │
+│ Log Directory │ /h/marshallw/.vec-inf-logs/BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5… │
+│ Model Name │ Qwen2.5-Math-PRM-7B │
+│ Partition │ a40 │
+│ QoS │ m2 │
+│ Time Limit │ 08:00:00 │
+│ Num Nodes │ 1 │
+│ GPUs/Node │ 1 │
+│ CPUs/Task │ 16 │
+│ Memory/Node │ 64G │
+│ Log Directory │ /h/marshallw/.vec-inf-logs/BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5… │
+└────────────────┴─────────────────────────────────────────────────────────────────────────┘
+```
+
+The inference servers will begin launching only after all requested resources have been allocated, preventing resource waste. Unlike the `launch` command, `batch-launch` does not accept additional launch parameters from the command line. Users must either:
+
+- Specify a batch launch configuration file using the `--batch-config` option, or
+- Ensure model launch configurations are available at the default location (cached config or user-defined `VEC_INF_CONFIG`)
+
+Since batch launches use heterogeneous jobs, users can request different partitions and resource amounts for each model. After launch, you can monitor individual servers using the standard commands (`status`, `metrics`, etc.) by providing the specific Slurm job ID for each server (e.g. 17480109+0, 17480109+1).
+
+**NOTE**
+* Currently only models that can fit on a single node (regardless of the node type) are supported; multi-node launches will be available in a future update.
+
 ### `status` command

 You can check the inference server status by providing the Slurm job ID to the `status` command:
@@ -138,7 +186,9 @@ There are 5 possible states:
 * **FAILED**: Inference server in an unhealthy state. Job failed reason will be shown.
 * **SHUTDOWN**: Inference server is shutdown/cancelled.

-Note that the base URL is only available when model is in `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.
+**Note**
+* The base URL is only available when model is in `READY` state.
+* For servers launched with `batch-launch`, the job ID should follow the format of "MAIN_JOB_ID+OFFSET" (e.g. 17480109+0, 17480109+1).

 ### `metrics` command
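
As a usage sketch of the note above, checking one server from a batch launch would look like the following; the job ID is the one from the earlier `batch-launch` example output, and the positional-argument form simply mirrors the other CLI examples in the guide:

```bash
# Check the first server of the heterogeneous batch job (offset +0).
vec-inf status 17480109+0
```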

examples/slurm_dependency/run_downstream.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@

 if len(sys.argv) < 2:
     raise ValueError("Expected server job ID as the first argument.")
-job_id = int(sys.argv[1])
+job_id = sys.argv[1]

 vi_client = VecInfClient()
 print(f"Waiting for SLURM job {job_id} to be ready...")
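
The change above keeps the job ID as a string rather than casting it to `int`. A minimal standalone sketch (not part of the repository) of why the cast breaks heterogeneous batch-launch IDs of the form "MAIN_JOB_ID+OFFSET":

```python
# Heterogeneous Slurm job IDs such as "17480109+0" are not plain integers,
# so int() raises ValueError and would reject a valid batch-launch job ID.
for job_id in ["17480109", "17480109+0"]:
    try:
        print(f"{job_id!r} parses as int: {int(job_id)}")
    except ValueError:
        print(f"{job_id!r} must be passed through as a string")
```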

tests/test_imports.py

Lines changed: 3 additions & 2 deletions
@@ -22,11 +22,12 @@ def test_imports(self):
             import vec_inf.client._exceptions
             import vec_inf.client._helper
             import vec_inf.client._slurm_script_generator
+            import vec_inf.client._slurm_templates
+            import vec_inf.client._slurm_vars
             import vec_inf.client._utils
             import vec_inf.client.api
             import vec_inf.client.config
-            import vec_inf.client.models
-            import vec_inf.client.slurm_vars  # noqa: F401
+            import vec_inf.client.models  # noqa: F401

         except ImportError as e:
             pytest.fail(f"Import failed: {e}")
