
Commit 52a6ff3

Update docs
1 parent 1ab7e1a commit 52a6ff3

3 files changed: +37 −19 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Models that are already supported by `vec-inf` would be launched using the cache
 #### Other commands
 
 * `batch-launch`: Launch multiple model inference servers at once, currently ONLY single node models supported,
-* `status`: Check the model status by providing its Slurm job ID.
+* `status`: Check the status of all `vec-inf` jobs, or a specific job by providing its job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.

docs/user_guide.md

Lines changed: 34 additions & 17 deletions
@@ -149,35 +149,52 @@ Since batch launches use heterogeneous jobs, users can request different partiti
 
 ### `status` command
 
-You can check the inference server status by providing the Slurm job ID to the `status` command:
+You can check the status of all inference servers launched through `vec-inf` by running the `status` command:
+```bash
+vec-inf status
+```
+
+And you should see an output like this:
+```
+┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Job ID    ┃ Model Name ┃ Status  ┃ Base URL              ┃
+┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ 1434429   │ Qwen3-8B   │ READY   │ http://gpu113:8080/v1 │
+│ 1434584   │ Qwen3-14B  │ READY   │ http://gpu053:8080/v1 │
+│ 1435035+0 │ Qwen3-32B  │ PENDING │ UNAVAILABLE           │
+│ 1435035+1 │ Qwen3-14B  │ PENDING │ UNAVAILABLE           │
+└───────────┴────────────┴─────────┴───────────────────────┘
+```
+
+If you want to check why a specific job is pending or failing, append the job ID to the `status` command:
 
 ```bash
-vec-inf status 15373800
+vec-inf status 1435035+1
 ```
 
 If the server is pending for resources, you should see an output like this:
 
 ```
-┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃ Job Status     ┃ Value                      ┃
-┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ Model Name     │ Meta-Llama-3.1-8B-Instruct │
-│ Model Status   │ PENDING                    │
-│ Pending Reason │ Resources                  │
-│ Base URL       │ UNAVAILABLE                │
-└────────────────┴────────────────────────────┘
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
+┃ Job Status     ┃ Value       ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
+│ Model Name     │ Qwen3-14B   │
+│ Model Status   │ PENDING     │
+│ Pending Reason │ Resources   │
+│ Base URL       │ UNAVAILABLE │
+└────────────────┴─────────────┘
 ```
 
 When the server is ready, you should see an output like this:
 
 ```
-┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃ Job Status   ┃ Value                      ┃
-┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ Model Name   │ Meta-Llama-3.1-8B-Instruct │
-│ Model Status │ READY                      │
-│ Base URL     │ http://gpu042:8080/v1      │
-└──────────────┴────────────────────────────┘
+┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Job Status   ┃ Value                 ┃
+┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ Model Name   │ Qwen3-14B             │
+│ Model Status │ READY                 │
+│ Base URL     │ http://gpu105:8080/v1 │
+└──────────────┴───────────────────────┘
 ```
 
 There are 5 possible states:

vec_inf/README.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 * `launch`: Specify a model family and other optional parameters to launch an OpenAI compatible inference server.
 * `batch-launch`: Specify a list of models to launch multiple OpenAI compatible inference servers at the same time.
-* `status`: Check the model status by providing its Slurm job ID.
+* `status`: Check the status of all `vec-inf` jobs, or a specific job by providing its job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.
@@ -14,6 +14,7 @@ Use `--help` to see all available options
 
 * `launch_model`: Launch an OpenAI compatible inference server.
 * `batch_launch_models`: Launch multiple OpenAI compatible inference servers.
+* `fetch_running_jobs`: Get the running `vec-inf` job IDs.
 * `get_status`: Get the status of a running model.
 * `get_metrics`: Get the performance metrics of a running model.
 * `shutdown_model`: Shutdown a running model.
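For orientation, here is a minimal sketch of how these API functions might be used together, including the new `fetch_running_jobs`. The `VecInfClient` class name, the `vec_inf.client` import path, and the response attributes are assumptions; only the method names come from the list above:

```python
# Hypothetical usage sketch of the vec-inf Python API. The import path,
# the VecInfClient class name, and the response attributes below are
# assumptions; only the method names come from the API list above.
import time

from vec_inf.client import VecInfClient  # assumed import path

client = VecInfClient()

# Launch a single model; assume the response carries the Slurm job ID.
launch = client.launch_model("Qwen3-14B")
job_id = launch.slurm_job_id  # assumed attribute

# Poll until the server reports READY, then print its base URL.
while True:
    status = client.get_status(job_id)   # assumed to return a status object
    if status.server_status == "READY":  # assumed attribute
        break
    time.sleep(30)
print(f"Server ready at {status.base_url}")  # assumed attribute

# List all running vec-inf job IDs, then shut this one down.
print(client.fetch_running_jobs())
client.shutdown_model(job_id)
```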
