Commit e0b194c

Update description for --model-weights-parent-dir, add instructions for launching custom models
1 parent 6f43e54 commit e0b194c

File tree

2 files changed: +12 −2 lines changed

README.md

Lines changed: 11 additions & 1 deletion
````diff
@@ -9,6 +9,7 @@ pip install vec-inf
 Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package
 
 ## Launch an inference server
+### `launch` command
 We will use the Llama 3.1 model as an example; to launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
 ```bash
 vec-inf launch Meta-Llama-3.1-8B-Instruct
@@ -17,8 +18,14 @@ You should see an output like the following:
 
 <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/9d29947a-2708-4131-9a78-4484d2361da3">
 
-The model would be launched using the [default parameters](vec_inf/models/models.csv), you can override these values by providing additional parameters, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html). You will need to specify model launching parameters for custom models.
+The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+* Your model weights directory name should follow the convention `$MODEL_FAMILY-$MODEL_VARIANT`.
+* Your model weights directory should contain HF-format weights.
+* The following launch parameters fall back to default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`. All other launch parameters must be specified for custom models.
+* Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+* For other model launch parameters, you can reference the default values of similar models using the [`list` command](#list-command).
 
+### `status` command
 You can check the inference server status by providing the Slurm job ID to the `status` command:
 ```bash
 vec-inf status 13014393
@@ -38,6 +45,7 @@ There are 5 possible states:
 
 Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.
 
+### `metrics` command
 Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
 ```bash
 vec-inf metrics 13014393
@@ -47,13 +55,15 @@ And you will see the performance metrics streamed to your console, note that the
 
 <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
 
+### `shutdown` command
 Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
 ```bash
 vec-inf shutdown 13014393
 
 > Shutting down model with Slurm Job ID: 13014393
 ```
 
+### `list` command
 You can view the full list of available models by running the `list` command:
 ```bash
 vec-inf list
````
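The custom-model conventions added above (directory named `$MODEL_FAMILY-$MODEL_VARIANT`, HF-format weights inside) can be sketched as a small pre-flight check. This helper is purely illustrative and not part of vec-inf; the `validate_weights_dir` name and the use of `config.json` as the HF-format marker are assumptions for the sketch.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of vec-inf): checks that a custom model
# weights directory follows the conventions described in the README above.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9.]+-[A-Za-z0-9.\-]+$")  # $MODEL_FAMILY-$MODEL_VARIANT
HF_MARKERS = {"config.json"}  # file typically present in an HF-format checkpoint


def validate_weights_dir(path: str) -> list:
    """Return a list of problems; an empty list means the directory looks launchable."""
    problems = []
    d = Path(path)
    if NAME_PATTERN.match(d.name) is None:
        problems.append(
            f"name {d.name!r} does not follow $MODEL_FAMILY-$MODEL_VARIANT"
        )
    if not d.is_dir():
        problems.append(f"{path} is not a directory")
    elif not any((d / marker).is_file() for marker in HF_MARKERS):
        problems.append("no config.json found; expected HF-format weights")
    return problems
```

For example, a directory `Meta-Llama-3.1-8B-Instruct` containing a `config.json` passes, while a directory whose name has no `-` separator is flagged.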

vec_inf/cli/_cli.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,7 +84,7 @@ def cli():
     "--model-weights-parent-dir",
     type=str,
     default="/model-weights",
-    help="Path to parent directory containing model weights",
+    help="Path to parent directory containing model weights, defaults to '/model-weights' for supported models",
 )
 @click.option(
     "--pipeline-parallelism",
```
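The defaulting behavior this help text describes — supported models resolve against `/model-weights`, custom models pass their own parent directory — can be sketched with stdlib `argparse`. This is an illustration only; vec-inf's actual CLI defines the option with click, as shown in the diff above.

```python
import argparse

# Illustrative sketch using stdlib argparse, mirroring the click option above.
parser = argparse.ArgumentParser(prog="vec-inf-launch-sketch")
parser.add_argument(
    "--model-weights-parent-dir",
    type=str,
    default="/model-weights",  # supported models live here by default
    help="Path to parent directory containing model weights "
         "(defaults to '/model-weights' for supported models)",
)

# Supported models fall back to the default parent directory...
args = parser.parse_args([])
print(args.model_weights_parent_dir)  # /model-weights

# ...while custom models override it explicitly.
args = parser.parse_args(["--model-weights-parent-dir", "/h/user_name/my_weights"])
print(args.model_weights_parent_dir)  # /h/user_name/my_weights
```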
