
Commit 52a6ff3

Update docs
1 parent 1ab7e1a commit 52a6ff3

3 files changed: +37 −19 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Models that are already supported by `vec-inf` would be launched using the cache
 #### Other commands
 
 * `batch-launch`: Launch multiple model inference servers at once, currently ONLY single node models supported,
-* `status`: Check the model status by providing its Slurm job ID.
+* `status`: Check the status of all `vec-inf` jobs, or a specific job by providing its job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.

docs/user_guide.md

Lines changed: 34 additions & 17 deletions
@@ -149,35 +149,52 @@ Since batch launches use heterogeneous jobs, users can request different partiti
 
 ### `status` command
 
-You can check the inference server status by providing the Slurm job ID to the `status` command:
+You can check the status of all inference servers launched through `vec-inf` by running the `status` command:
+```bash
+vec-inf status
+```
+
+And you should see an output like this:
+```
+┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Job ID    ┃ Model Name ┃ Status  ┃ Base URL              ┃
+┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ 1434429   │ Qwen3-8B   │ READY   │ http://gpu113:8080/v1 │
+│ 1434584   │ Qwen3-14B  │ READY   │ http://gpu053:8080/v1 │
+│ 1435035+0 │ Qwen3-32B  │ PENDING │ UNAVAILABLE           │
+│ 1435035+1 │ Qwen3-14B  │ PENDING │ UNAVAILABLE           │
+└───────────┴────────────┴─────────┴───────────────────────┘
+```
+
+If you want to check why a specific job is pending or failing, append the job ID to the `status` command:
 
 ```bash
-vec-inf status 15373800
+vec-inf status 1435035+1
 ```
 
 If the server is pending for resources, you should see an output like this:
 
 ```
-┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃ Job Status     ┃ Value                      ┃
-┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ Model Name     │ Meta-Llama-3.1-8B-Instruct │
-│ Model Status   │ PENDING                    │
-│ Pending Reason │ Resources                  │
-│ Base URL       │ UNAVAILABLE                │
-└────────────────┴────────────────────────────┘
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
+┃ Job Status     ┃ Value       ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
+│ Model Name     │ Qwen3-14B   │
+│ Model Status   │ PENDING     │
+│ Pending Reason │ Resources   │
+│ Base URL       │ UNAVAILABLE │
+└────────────────┴─────────────┘
 ```
 
 When the server is ready, you should see an output like this:
 
 ```
-┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃ Job Status   ┃ Value                      ┃
-┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ Model Name   │ Meta-Llama-3.1-8B-Instruct │
-│ Model Status │ READY                      │
-│ Base URL     │ http://gpu042:8080/v1      │
-└──────────────┴────────────────────────────┘
+┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Job Status   ┃ Value                 ┃
+┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ Model Name   │ Qwen3-14B             │
+│ Model Status │ READY                 │
+│ Base URL     │ http://gpu105:8080/v1 │
+└──────────────┴───────────────────────┘
 ```
 
 There are 5 possible states:

vec_inf/README.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 * `launch`: Specify a model family and other optional parameters to launch an OpenAI compatible inference server.
 * `batch-launch`: Specify a list of models to launch multiple OpenAI compatible inference servers at the same time.
-* `status`: Check the model status by providing its Slurm job ID.
+* `status`: Check the status of all `vec-inf` jobs, or a specific job by providing its job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.
@@ -14,6 +14,7 @@ Use `--help` to see all available options
 
 * `launch_model`: Launch an OpenAI compatible inference server.
 * `batch_launch_models`: Launch multiple OpenAI compatible inference servers.
+* `fetch_running_jobs`: Get the running `vec-inf` job IDs.
 * `get_status`: Get the status of a running model.
 * `get_metrics`: Get the performance metrics of a running model.
 * `shutdown_model`: Shutdown a running model.
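For orientation, here is a minimal sketch of how these API functions might be used together, including the new `fetch_running_jobs`. The `VecInfClient` class name, the `vec_inf.client` import path, and the response attributes are assumptions; only the method names come from the list above:

```python
# Hypothetical usage sketch of the vec-inf Python API. The import path,
# the VecInfClient class name, and the response attributes below are
# assumptions; only the method names come from the API list above.
import time

from vec_inf.client import VecInfClient  # assumed import path

client = VecInfClient()

# Launch a single model; assume the response carries the Slurm job ID.
launch = client.launch_model("Qwen3-14B")
job_id = launch.slurm_job_id  # assumed attribute

# Poll until the server reports READY, then print its base URL.
while True:
    status = client.get_status(job_id)   # assumed to return a status object
    if status.server_status == "READY":  # assumed attribute
        break
    time.sleep(30)
print(f"Server ready at {status.base_url}")  # assumed attribute

# List all running vec-inf job IDs, then shut this one down.
print(client.fetch_running_jobs())
client.shutdown_model(job_id)
```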
