@@ -13,7 +13,28 @@ vec-inf launch Meta-Llama-3.1-8B-Instruct
```
You should see an output like the following:

- <img width="600" alt="launch_img" src="https://github.com/user-attachments/assets/62fa818b-57dd-47de-b094-18aa91747f2d">
+ ```
+ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+ ┃ Job Config ┃ Value ┃
+ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+ │ Slurm Job ID │ 16060964 │
+ │ Job Name │ Meta-Llama-3.1-8B-Instruct │
+ │ Model Type │ LLM │
+ │ Vocabulary Size │ 128256 │
+ │ Partition │ a40 │
+ │ QoS │ m2 │
+ │ Time Limit │ 08:00:00 │
+ │ Num Nodes │ 1 │
+ │ GPUs/Node │ 1 │
+ │ CPUs/Task │ 16 │
+ │ Memory/Node │ 64G │
+ │ Model Weights Directory │ /model-weights/Meta-Llama-3.1-8B-Instruct │
+ │ Log Directory │ /h/vi_user/.vec-inf-logs/Meta-Llama-3.1 │
+ │ vLLM Arguments: │ │
+ │ --max-model-len: │ 131072 │
+ │ --max-num-seqs: │ 256 │
+ └─────────────────────────┴───────────────────────────────────────────┘
+ ```
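The Slurm Job ID in the first row is the handle that the `status` and `metrics` commands (covered below) take as their argument; for example, with the job shown above:

```bash
# Check on the server using the Slurm job ID from the launch output
vec-inf status 16060964

# Stream performance metrics for the same job
vec-inf metrics 16060964
```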

#### Overrides

@@ -70,7 +91,7 @@ You would then set the `VEC_INF_CONFIG` path using:
export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
```

- Note that there are other parameters that can also be added to the config but not shown in this example, check the [`ModelConfig`](vec_inf/client/config.py) for details.
+ Note that there are other parameters that can also be added to the config but are not shown in this example; check the [`ModelConfig`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/client/config.py) for details.
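Once the environment variable points at your config, later launches read from it; a minimal sketch, where `my-model` is a placeholder for whatever model name your YAML actually defines:

```bash
export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
# `my-model` is a hypothetical entry defined in my-model-config.yaml
vec-inf launch my-model
```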

### `status` command

@@ -82,11 +103,28 @@ vec-inf status 15373800

If the server is pending for resources, you should see an output like this:

- <img width="400" alt="status_pending_img" src="https://github.com/user-attachments/assets/b659c302-eae1-4560-b7a9-14eb3a822a2f">
+ ```
+ ┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+ ┃ Job Status ┃ Value ┃
+ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+ │ Model Name │ Meta-Llama-3.1-8B-Instruct │
+ │ Model Status │ PENDING │
+ │ Pending Reason │ Resources │
+ │ Base URL │ UNAVAILABLE │
+ └────────────────┴────────────────────────────┘
+ ```
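If you would rather block until the job stops pending, a simple loop over the status output works; a minimal sketch that just greps the rendered table for the READY state:

```bash
# Re-check every 30 seconds until the status table reports READY
until vec-inf status 15373800 | grep -q "READY"; do
    sleep 30
done
echo "Server is ready"
```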

When the server is ready, you should see an output like this:

- <img width="400" alt="status_ready_img" src="https://github.com/user-attachments/assets/672986c2-736c-41ce-ac7c-1fb585cdcb0d">
+ ```
+ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+ ┃ Job Status ┃ Value ┃
+ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+ │ Model Name │ Meta-Llama-3.1-8B-Instruct │
+ │ Model Status │ READY │
+ │ Base URL │ http://gpu042:8080/v1 │
+ └──────────────┴────────────────────────────┘
+ ```
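Once the status is READY, requests go straight to the Base URL; for example, a minimal chat completion with curl, assuming the server exposes the usual OpenAI-compatible endpoints that vLLM provides (substitute the host and port from your own status output):

```bash
curl http://gpu042:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```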

There are 5 possible states:

@@ -107,7 +145,23 @@ vec-inf metrics 15373800

And you will see the performance metrics streamed to your console; note that the metrics are updated at a 2-second interval.

- <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/3ee143d0-1a71-4944-bbd7-4c3299bf0339">
+ ```
+ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
+ ┃ Metric ┃ Value ┃
+ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
+ │ Prompt Throughput │ 10.9 tokens/s │
+ │ Generation Throughput │ 34.2 tokens/s │
+ │ Requests Running │ 1 reqs │
+ │ Requests Waiting │ 0 reqs │
+ │ Requests Swapped │ 0 reqs │
+ │ GPU Cache Usage │ 0.1% │
+ │ CPU Cache Usage │ 0.0% │
+ │ Avg Request Latency │ 2.6 s │
+ │ Total Prompt Tokens │ 441 tokens │
+ │ Total Generation Tokens │ 1748 tokens │
+ │ Successful Requests │ 14 reqs │
+ └─────────────────────────┴─────────────────┘
+ ```
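As a rough sanity check on a snapshot like this one, the totals work out to about 1748 / 14 ≈ 125 generated tokens per successful request.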

### `shutdown` command

@@ -135,7 +189,28 @@ You can also view the default setup for a specific supported model by providing
vec-inf list Meta-Llama-3.1-8B-Instruct
```

- <img width="500" alt="list_model_img" src="https://github.com/user-attachments/assets/34e53937-2d86-443e-85f6-34e408653ddb">
+ ```
+ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+ ┃ Model Config ┃ Value ┃
+ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+ │ model_name │ Meta-Llama-3.1-8B-Instruct │
+ │ model_family │ Meta-Llama-3.1 │
+ │ model_variant │ 8B-Instruct │
+ │ model_type │ LLM │
+ │ gpus_per_node │ 1 │
+ │ num_nodes │ 1 │
+ │ cpus_per_task │ 16 │
+ │ mem_per_node │ 64G │
+ │ vocab_size │ 128256 │
+ │ qos │ m2 │
+ │ time │ 08:00:00 │
+ │ partition │ a40 │
+ │ model_weights_parent_dir │ /model-weights │
+ │ vLLM Arguments: │ │
+ │ --max-model-len: │ 131072 │
+ │ --max-num-seqs: │ 256 │
+ └──────────────────────────┴────────────────────────────┘
+ ```

The `launch`, `list`, and `status` commands support `--json-mode`, where the command output is structured as a JSON string.

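For scripting, the JSON output is easier to consume than the rendered tables; for example, a minimal sketch that pulls one field out of the status output with `jq` (the key name here is an assumption, so print the raw JSON once to confirm what the fields are actually called):

```bash
# `.model_status` is a hypothetical key; check the real --json-mode output first
vec-inf status 15373800 --json-mode | jq -r '.model_status'
```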