Commit e0b194c

Update description for --model-weights-parent-dir, add instructions for launching custom models
1 parent 6f43e54 commit e0b194c

File tree

2 files changed: +12 −2 lines changed

README.md

Lines changed: 11 additions & 1 deletion
````diff
@@ -9,6 +9,7 @@ pip install vec-inf
 Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package
 
 ## Launch an inference server
+### `launch` command
 We will use the Llama 3.1 model as an example; to launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
 ```bash
 vec-inf launch Meta-Llama-3.1-8B-Instruct
@@ -17,8 +18,14 @@ You should see an output like the following:
 
 <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/9d29947a-2708-4131-9a78-4484d2361da3">
 
-The model would be launched using the [default parameters](vec_inf/models/models.csv), you can override these values by providing additional parameters, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html). You will need to specify model launching parameters for custom models.
+The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+* Your model weights directory name should follow the convention `$MODEL_FAMILY-$MODEL_VARIANT`.
+* Your model weights directory should contain HF-format weights.
+* The following launch parameters fall back to default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`. All other launch parameters must be specified for custom models.
+* Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+* For other model launch parameters, you can reference the default values of similar models using the [`list` command](#list-command).
 
+### `status` command
 You can check the inference server status by providing the Slurm job ID to the `status` command:
 ```bash
 vec-inf status 13014393
@@ -38,6 +45,7 @@ There are 5 possible states:
 
 Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.
 
+### `metrics` command
 Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
 ```bash
 vec-inf metrics 13014393
@@ -47,13 +55,15 @@ And you will see the performance metrics streamed to your console, note that the
 
 <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
 
+### `shutdown` command
 Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
 ```bash
 vec-inf shutdown 13014393
 
 > Shutting down model with Slurm Job ID: 13014393
 ```
 
+### `list` command
 You can view the full list of available models by running the `list` command:
 ```bash
 vec-inf list
````
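The custom-model conventions added above (directory named `$MODEL_FAMILY-$MODEL_VARIANT`, HF-format weights inside) can be sketched as a small pre-flight check. This helper is purely illustrative and not part of vec-inf; the `validate_weights_dir` name and the use of `config.json` as the HF-format marker are assumptions for the sketch.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of vec-inf): checks that a custom model
# weights directory follows the conventions described in the README above.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9.]+-[A-Za-z0-9.\-]+$")  # $MODEL_FAMILY-$MODEL_VARIANT
HF_MARKERS = {"config.json"}  # file typically present in an HF-format checkpoint


def validate_weights_dir(path: str) -> list:
    """Return a list of problems; an empty list means the directory looks launchable."""
    problems = []
    d = Path(path)
    if NAME_PATTERN.match(d.name) is None:
        problems.append(
            f"name {d.name!r} does not follow $MODEL_FAMILY-$MODEL_VARIANT"
        )
    if not d.is_dir():
        problems.append(f"{path} is not a directory")
    elif not any((d / marker).is_file() for marker in HF_MARKERS):
        problems.append("no config.json found; expected HF-format weights")
    return problems
```

For example, a directory `Meta-Llama-3.1-8B-Instruct` containing a `config.json` passes, while a directory whose name has no `-` separator is flagged.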

vec_inf/cli/_cli.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,7 +84,7 @@ def cli():
     "--model-weights-parent-dir",
     type=str,
     default="/model-weights",
-    help="Path to parent directory containing model weights",
+    help="Path to parent directory containing model weights, defaults to '/model-weights' for supported models",
 )
 @click.option(
     "--pipeline-parallelism",
```
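The defaulting behavior this help text describes — supported models resolve against `/model-weights`, custom models pass their own parent directory — can be sketched with stdlib `argparse`. This is an illustration only; vec-inf's actual CLI defines the option with click, as shown in the diff above.

```python
import argparse

# Illustrative sketch using stdlib argparse, mirroring the click option above.
parser = argparse.ArgumentParser(prog="vec-inf-launch-sketch")
parser.add_argument(
    "--model-weights-parent-dir",
    type=str,
    default="/model-weights",  # supported models live here by default
    help="Path to parent directory containing model weights "
         "(defaults to '/model-weights' for supported models)",
)

# Supported models fall back to the default parent directory...
args = parser.parse_args([])
print(args.model_weights_parent_dir)  # /model-weights

# ...while custom models override it explicitly.
args = parser.parse_args(["--model-weights-parent-dir", "/h/user_name/my_weights"])
print(args.model_weights_parent_dir)  # /h/user_name/my_weights
```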
