This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
## Installation
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
```bash
pip install vec-inf
```
Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
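A minimal sketch of building the image (run from the repository root; the tag name is illustrative):

```bash
docker build -t vec-inf:latest .
```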
If you'd like to use `vec-inf` on your own Slurm cluster, you will need to update the configuration files. There are 3 ways to do this:
* Clone the repository and update the `environment.yaml` and `models.yaml` files in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
* The package looks for cached configuration files in your environment before falling back to the default configuration. The default cached configuration directory is `/model-weights/vec-inf-shared`; create an `environment.yaml` and a `models.yaml` there, following the format of the files in [`vec_inf/config`](vec_inf/config/).
* The package also looks for the environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set `VEC_INF_CONFIG_DIR` to point to that location, as shown in the sketch below.
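A minimal sketch of this last option (the path is illustrative):

```bash
export VEC_INF_CONFIG_DIR=/path/to/your/vec-inf-configs
```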
## Usage
Vector Inference provides two user interfaces: a CLI and an API.
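As a quick illustration of the API side, here is a minimal sketch that launches a model programmatically (the `VecInfClient` class, `launch_model` method, and `slurm_job_id` attribute are assumed here; check the package's API reference for the exact interface):

```python
from vec_inf.client import VecInfClient

# Create a client and launch a model by name (names assumed; see API reference)
client = VecInfClient()
response = client.launch_model("Meta-Llama-3.1-8B-Instruct")
print(response.slurm_job_id)
```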
You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/latest/models/supported_models.html):
* Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT` ($MODEL_VARIANT is OPTIONAL).
* Your model weights directory should contain HuggingFace format weights.
* You should specify your model configuration by:
  * Creating a custom configuration file for your model and specifying its path via the environment variable `VEC_INF_MODEL_CONFIG` (this supersedes `VEC_INF_CONFIG_DIR` if both are set). Check the [default parameters](vec_inf/config/models.yaml) file for the format of the config file; all parameters for the model should be specified there.
  * Adding your model configuration to the cached `models.yaml` in your cluster environment (if you have write access to the cached configuration directory).
  * Using launch command options to specify your model setup.
* For other model launch parameters, you can reference the default values of similar models using the [`list` command](#list-command).
A custom config file follows the same structure as the default `models.yaml`, for example (most of the entry is elided here):

```yaml
models:
  ...
    --compilation-config: 3
```
You would then set the `VEC_INF_MODEL_CONFIG` path using:
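For example (the path is illustrative):

```bash
export VEC_INF_MODEL_CONFIG=/path/to/your/model-config.yaml
```

The CLI provides the following additional commands: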
* `batch-launch`: Launch multiple model inference servers at once. Currently ONLY single-node models are supported.
* `status`: Check the model status by providing its Slurm job ID.
* `metrics`: Stream performance metrics to the console.
* `shutdown`: Shut down a model by providing its Slurm job ID.
* `list`: List all available model names, or view the default/cached configuration of a specific model.
* `cleanup`: Remove old log directories. You can filter by `--model-family`, `--model-name`, `--job-id`, and/or `--before-job-id`. Use `--dry-run` to preview what would be deleted.
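For example, to preview what a cleanup of a hypothetical model family would delete (flag values are illustrative):

```bash
vec-inf cleanup --model-family llama --dry-run
```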
For more details on the usage of these commands, refer to the [User Guide](https://vectorinstitute.github.io/vector-inference/user_guide/).
### `launch` command
The `launch` command allows users to launch an OpenAI-compatible model inference server as a Slurm job. If the job launches successfully, a URL endpoint is exposed for the user to send requests for inference.
We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
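A minimal invocation (additional launch options can be appended as flags):

```bash
vec-inf launch Meta-Llama-3.1-8B-Instruct
```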
A few notes on launch parameters:

* For GPU partitions with non-Ampere architectures (e.g. `rtx6000`, `t4v2`), BF16 isn't supported. For models that have BF16 as the default type, use FP16 instead when running on a non-Ampere GPU, i.e. `--dtype: float16`.
* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.
### `batch-launch` command
The `batch-launch` command allows users to launch multiple inference servers at once. Here is an example of launching 2 models:
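A sketch, assuming `batch-launch` takes the model names as positional arguments (the model names are illustrative):

```bash
vec-inf batch-launch Meta-Llama-3.1-8B-Instruct Qwen2.5-7B-Instruct
```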
The inference servers will begin launching only after all requested resources have been allocated, preventing resource waste. Unlike the `launch` command, `batch-launch` does not accept additional launch parameters from the command line. Users must either:
- Specify a batch launch configuration file using the `--batch-config` option, or
- Ensure model launch configurations are available at the default location (cached config or user-defined `VEC_INF_MODEL_CONFIG`)
Since batch launches use heterogeneous jobs, users can request different partitions and resource amounts for each model. After launch, you can monitor individual servers using the standard commands (`status`, `metrics`, etc.) by providing the specific Slurm job ID for each server (e.g. 17480109+0, 17480109+1).
**NOTE**
* Currently, only models that can fit on a single node (regardless of the node type) are supported; multi-node launches will be available in a future update.
### `status` command
You can check the inference server status by providing the Slurm job ID to the `status` command:
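For example, with an illustrative job ID:

```bash
vec-inf status 17480109
```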
There are 5 possible states:

* **PENDING**: Job submitted to Slurm, waiting for resources. Job pending reason will be shown.
* **LAUNCHING**: Job is running but the inference server is not ready yet.
* **READY**: Inference server is running and ready to take requests.
* **FAILED**: Inference server in an unhealthy state. Job failed reason will be shown.
* **SHUTDOWN**: Inference server is shutdown/cancelled.
**Note**
* The base URL is only available when the model is in `READY` state.
* For servers launched with `batch-launch`, the job ID should follow the format `MAIN_JOB_ID+OFFSET` (e.g. 17480109+0, 17480109+1).
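For instance, to check the first server from a batch launch (job ID illustrative):

```bash
vec-inf status 17480109+0
```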