
Commit 11da920

Merge branch 'main' into feature/misc-fixes

2 parents: 1938311 + 6d21da4

File tree: 8 files changed (+181, -151 lines)


MODEL_TRACKING.md

Lines changed: 7 additions & 2 deletions
```diff
@@ -166,8 +166,8 @@ This document tracks all model weights available in the `/model-weights` directo
 | Model | Configuration |
 |:------|:-------------|
 | `Qwen3-14B` ||
-| `Qwen3-8B` | |
-| `Qwen3-32B` | |
+| `Qwen3-8B` | |
+| `Qwen3-32B` | |
 | `Qwen3-235B-A22B` ||
 | `Qwen3-Embedding-8B` ||
 
@@ -187,6 +187,11 @@ This document tracks all model weights available in the `/model-weights` directo
 | `DeepSeek-Coder-V2-Lite-Instruct` ||
 | `deepseek-math-7b-instruct` ||
 
+### OpenAI: GPT-OSS
+| Model | Configuration |
+|:------|:-------------|
+| `gpt-oss-120b` ||
+
 ### Other LLM Models
 | Model | Configuration |
 |:------|:-------------|
```

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -7,7 +7,7 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vLLM-0.10.1.1-blue)](https://docs.vllm.ai/en/v0.10.1.1/)
+[![vLLM](https://img.shields.io/badge/vLLM-0.11.0-blue)](https://docs.vllm.ai/en/v0.11.0/)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)
 
 This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
@@ -20,7 +20,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
+Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
 
 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
````

docs/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -12,7 +12,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 pip install vec-inf
 ```
 
-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
+Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
 
 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config), then install from source by running `pip install .`.
````

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [project]
 name = "vec-inf"
-version = "0.7.1"
+version = "0.7.2"
 description = "Efficient LLM inference on Slurm clusters using vLLM."
 readme = "README.md"
 authors = [{name = "Marshall Wang", email = "marshall.wang@vectorinstitute.ai"}]
```

vec_inf/cli/_helper.py

Lines changed: 42 additions & 27 deletions
```diff
@@ -36,6 +36,43 @@ def __init__(self, model_name: str, params: dict[str, Any]):
         self.model_name = model_name
         self.params = params
 
+    def _add_resource_allocation_details(self, table: Table) -> None:
+        """Add resource allocation details to the table."""
+        optional_fields = [
+            ("account", "Account"),
+            ("work_dir", "Working Directory"),
+            ("resource_type", "Resource Type"),
+            ("partition", "Partition"),
+            ("qos", "QoS"),
+        ]
+        for key, label in optional_fields:
+            if self.params.get(key):
+                table.add_row(label, self.params[key])
+
+    def _add_vllm_config(self, table: Table) -> None:
+        """Add vLLM configuration details to the table."""
+        if self.params.get("vllm_args"):
+            table.add_row("vLLM Arguments:", style="magenta")
+            for arg, value in self.params["vllm_args"].items():
+                table.add_row(f" {arg}:", str(value))
+
+    def _add_env_vars(self, table: Table) -> None:
+        """Add environment variable configuration details to the table."""
+        if self.params.get("env"):
+            table.add_row("Environment Variables", style="magenta")
+            for arg, value in self.params["env"].items():
+                table.add_row(f" {arg}:", str(value))
+
+    def _add_bind_paths(self, table: Table) -> None:
+        """Add bind path configuration details to the table."""
+        if self.params.get("bind"):
+            table.add_row("Bind Paths", style="magenta")
+            for path in self.params["bind"].split(","):
+                host = target = path
+                if ":" in path:
+                    host, target = path.split(":")
+                table.add_row(f" {host}:", target)
+
     def format_table_output(self) -> Table:
         """Format output as rich Table.
 
@@ -59,16 +96,7 @@ def format_table_output(self) -> Table:
         table.add_row("Vocabulary Size", self.params["vocab_size"])
 
         # Add resource allocation details
-        if self.params.get("account"):
-            table.add_row("Account", self.params["account"])
-        if self.params.get("work_dir"):
-            table.add_row("Working Directory", self.params["work_dir"])
-        if self.params.get("resource_type"):
-            table.add_row("Resource Type", self.params["resource_type"])
-        if self.params.get("partition"):
-            table.add_row("Partition", self.params["partition"])
-        if self.params.get("qos"):
-            table.add_row("QoS", self.params["qos"])
+        self._add_resource_allocation_details(table)
         table.add_row("Time Limit", self.params["time"])
         table.add_row("Num Nodes", self.params["num_nodes"])
         table.add_row("GPUs/Node", self.params["gpus_per_node"])
@@ -84,23 +112,10 @@ def format_table_output(self) -> Table:
         )
         table.add_row("Log Directory", self.params["log_dir"])
 
-        # Add vLLM configuration details
-        table.add_row("vLLM Arguments:", style="magenta")
-        for arg, value in self.params["vllm_args"].items():
-            table.add_row(f" {arg}:", str(value))
-
-        # Add environment variable configuration details
-        table.add_row("Environment Variables", style="magenta")
-        for arg, value in self.params["env"].items():
-            table.add_row(f" {arg}:", str(value))
-
-        # Add bind path configuration details
-        table.add_row("Bind Paths", style="magenta")
-        for path in self.params["bind"].split(","):
-            host = target = path
-            if ":" in path:
-                host, target = path.split(":")
-            table.add_row(f" {host}:", target)
+        # Add configuration details
+        self._add_vllm_config(table)
+        self._add_env_vars(table)
+        self._add_bind_paths(table)
 
         return table
 
```
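The new `_add_bind_paths` helper parses Apptainer/Singularity-style bind strings: comma-separated entries, each either a bare path or a `host:target` pair. A minimal standalone sketch of that convention (the `parse_bind_paths` name and sample paths are illustrative, not part of the package):

```python
# Illustrative sketch only -- mirrors the splitting logic in _add_bind_paths.
# A bare path binds to itself; "host:target" maps a host path into the container.
def parse_bind_paths(bind: str) -> list[tuple[str, str]]:
    pairs = []
    for path in bind.split(","):
        host = target = path  # bare path: host and target are the same
        if ":" in path:
            host, target = path.split(":")
        pairs.append((host, target))
    return pairs


print(parse_bind_paths("/scratch:/scratch,/model-weights"))
# [('/scratch', '/scratch'), ('/model-weights', '/model-weights')]
```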

vec_inf/client/_helper.py

Lines changed: 61 additions & 26 deletions
```diff
@@ -196,23 +196,14 @@ def _process_env_vars(self, env_arg: str) -> dict[str, str]:
                 print(f"WARNING: Could not parse env var: {line}")
         return env_vars
 
-    def _get_launch_params(self) -> dict[str, Any]:
-        """Prepare launch parameters, set log dir, and validate required fields.
-
-        Returns
-        -------
-        dict[str, Any]
-            Dictionary of prepared launch parameters
+    def _apply_cli_overrides(self, params: dict[str, Any]) -> None:
+        """Apply CLI argument overrides to params.
 
-        Raises
-        ------
-        MissingRequiredFieldsError
-            If required fields are missing or tensor parallel size is not specified
-            when using multiple GPUs
+        Parameters
+        ----------
+        params : dict[str, Any]
+            Dictionary of launch parameters to override
         """
-        params = self.model_config.model_dump(exclude_none=True)
-
-        # Override config defaults with CLI arguments
         if self.kwargs.get("vllm_args"):
             vllm_args = self._process_vllm_args(self.kwargs["vllm_args"])
             for key, value in vllm_args.items():
@@ -225,13 +216,29 @@ def _get_launch_params(self) -> dict[str, Any]:
                 params["env"][key] = str(value)
             del self.kwargs["env"]
 
+        if self.kwargs.get("bind") and params.get("bind"):
+            params["bind"] = f"{params['bind']},{self.kwargs['bind']}"
+            del self.kwargs["bind"]
+
         for key, value in self.kwargs.items():
             params[key] = value
 
-        # Check for required fields without default vals, will raise an error if missing
-        utils.check_required_fields(params)
+    def _validate_resource_allocation(self, params: dict[str, Any]) -> None:
+        """Validate resource allocation and parallelization settings.
 
-        # Validate resource allocation and parallelization settings
+        Parameters
+        ----------
+        params : dict[str, Any]
+            Dictionary of launch parameters to validate
+
+        Raises
+        ------
+        MissingRequiredFieldsError
+            If tensor parallel size is not specified when using multiple GPUs
+        ValueError
+            If total # of GPUs requested is not a power of two
+            If mismatch between total # of GPUs requested and parallelization settings
+        """
         if (
             int(params["gpus_per_node"]) > 1
             and params["vllm_args"].get("--tensor-parallel-size") is None
@@ -252,19 +259,18 @@ def _get_launch_params(self) -> dict[str, Any]:
                 "Mismatch between total number of GPUs requested and parallelization settings"
             )
 
-        # Convert gpus_per_node and resource_type to gres
-        resource_type = params.get("resource_type")
-        if resource_type:
-            params["gres"] = f"gpu:{resource_type}:{params['gpus_per_node']}"
-        else:
-            params["gres"] = f"gpu:{params['gpus_per_node']}"
+    def _setup_log_files(self, params: dict[str, Any]) -> None:
+        """Set up log directory and file paths.
 
-        # Create log directory
+        Parameters
+        ----------
+        params : dict[str, Any]
+            Dictionary of launch parameters to set up log files
+        """
         params["log_dir"] = Path(params["log_dir"], params["model_family"]).expanduser()
         params["log_dir"].mkdir(parents=True, exist_ok=True)
         params["src_dir"] = SRC_DIR
 
-        # Construct slurm log file paths
         params["out_file"] = (
             f"{params['log_dir']}/{self.model_name}.%j/{self.model_name}.%j.out"
         )
@@ -275,6 +281,35 @@ def _get_launch_params(self) -> dict[str, Any]:
             f"{params['log_dir']}/{self.model_name}.$SLURM_JOB_ID/{self.model_name}.$SLURM_JOB_ID.json"
         )
 
+    def _get_launch_params(self) -> dict[str, Any]:
+        """Prepare launch parameters, set log dir, and validate required fields.
+
+        Returns
+        -------
+        dict[str, Any]
+            Dictionary of prepared launch parameters
+        """
+        params = self.model_config.model_dump(exclude_none=True)
+
+        # Override config defaults with CLI arguments
+        self._apply_cli_overrides(params)
+
+        # Check for required fields without default vals, will raise an error if missing
+        utils.check_required_fields(params)
+
+        # Validate resource allocation and parallelization settings
+        self._validate_resource_allocation(params)
+
+        # Convert gpus_per_node and resource_type to gres
+        resource_type = params.get("resource_type")
+        if resource_type:
+            params["gres"] = f"gpu:{resource_type}:{params['gpus_per_node']}"
+        else:
+            params["gres"] = f"gpu:{params['gpus_per_node']}"
+
+        # Setup log files
+        self._setup_log_files(params)
+
         # Convert path to string for JSON serialization
         for field in params:
             if field in ["vllm_args", "env"]:
```
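The refactor decomposes `_get_launch_params` into single-purpose steps: apply CLI overrides, check required fields, validate the GPU topology, build the Slurm `gres` string, and set up log files. A rough standalone sketch of the topology check and `gres` construction (assumed semantics and a hypothetical function name; the package's actual checks live in `_validate_resource_allocation`):

```python
# Hypothetical standalone sketch (not the package's API): the GPU-topology
# rules described in the _validate_resource_allocation docstring, plus the
# gres string built afterwards in _get_launch_params.
def validate_and_build_gres(
    num_nodes: int,
    gpus_per_node: int,
    tensor_parallel_size: int,
    pipeline_parallel_size: int = 1,
    resource_type: str | None = None,
) -> str:
    total_gpus = num_nodes * gpus_per_node
    # Power-of-two check via bit trick: n & (n - 1) == 0 for n = 1, 2, 4, 8, ...
    if total_gpus < 1 or total_gpus & (total_gpus - 1) != 0:
        raise ValueError("Total number of GPUs requested is not a power of two")
    if tensor_parallel_size * pipeline_parallel_size != total_gpus:
        raise ValueError(
            "Mismatch between total number of GPUs requested and parallelization settings"
        )
    # gres format: gpu:<count>, or gpu:<type>:<count> when a resource type is set
    if resource_type:
        return f"gpu:{resource_type}:{gpus_per_node}"
    return f"gpu:{gpus_per_node}"


print(validate_and_build_gres(2, 4, tensor_parallel_size=8, resource_type="a100"))
# gpu:a100:4
```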

vec_inf/client/_utils.py

Lines changed: 54 additions & 5 deletions
```diff
@@ -108,15 +108,64 @@ def is_server_running(
     if isinstance(log_content, str):
         return log_content
 
-    status: Union[str, tuple[ModelStatus, str]] = ModelStatus.LAUNCHING
+    # Patterns that indicate fatal errors (not just warnings)
+    fatal_error_patterns = [
+        "traceback",
+        "exception",
+        "fatal error",
+        "critical error",
+        "failed to",
+        "could not",
+        "unable to",
+        "error:",
+    ]
+
+    # Patterns to ignore (non-fatal warnings/info messages)
+    ignore_patterns = [
+        "deprecated",
+        "futurewarning",
+        "userwarning",
+        "deprecationwarning",
+        "slurmstepd: error:",  # SLURM cancellation messages (often after server started)
+    ]
+
+    ready_signature_found = False
+    fatal_error_line = None
 
     for line in log_content:
-        if "error" in line.lower():
-            status = (ModelStatus.FAILED, line.strip("\n"))
+        line_lower = line.lower()
+
+        # Check for ready signature first - if found, server is running
         if MODEL_READY_SIGNATURE in line:
-            status = "RUNNING"
+            ready_signature_found = True
+            # Continue checking to see if there are errors after startup
+
+        # Check for fatal errors (only if we haven't seen ready signature yet)
+        if not ready_signature_found:
+            # Skip lines that match ignore patterns
+            if any(ignore_pattern in line_lower for ignore_pattern in ignore_patterns):
+                continue
 
-    return status
+            # Check for fatal error patterns
+            for pattern in fatal_error_patterns:
+                if pattern in line_lower:
+                    # Additional check: skip if it's part of a warning message
+                    # (warnings often contain "error:" but aren't fatal)
+                    if "warning" in line_lower and "error:" in line_lower:
+                        continue
+                    fatal_error_line = line.strip("\n")
+                    break
+
+    # If we found a fatal error, mark as failed
+    if fatal_error_line:
+        return (ModelStatus.FAILED, fatal_error_line)
+
+    # If ready signature was found and no fatal errors, server is running
+    if ready_signature_found:
+        return "RUNNING"
+
+    # Otherwise, still launching
+    return ModelStatus.LAUNCHING
 
 
 def get_base_url(slurm_job_name: str, slurm_job_id: str, log_dir: str) -> str:
```
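The rewritten `is_server_running` no longer fails a job on any line containing "error": warning-like lines are skipped, fatal patterns only count before the ready signature appears, and errors logged after startup (such as Slurm cancellation messages) do not flip a RUNNING server to FAILED. A self-contained toy run of the same classification logic (the ready signature and log lines below are made up for illustration):

```python
# Toy reproduction of the new classification; MODEL_READY_SIGNATURE and the
# log lines below are placeholders, not the package's real values.
MODEL_READY_SIGNATURE = "INFO:     Application startup complete."

FATAL = ["traceback", "exception", "fatal error", "critical error",
         "failed to", "could not", "unable to", "error:"]
IGNORE = ["deprecated", "futurewarning", "userwarning",
          "deprecationwarning", "slurmstepd: error:"]


def classify(log_lines: list[str]) -> str:
    """Mirror of the commit's scan: ready flag wins, warning-like lines are ignored."""
    ready, fatal_line = False, None
    for line in log_lines:
        low = line.lower()
        if MODEL_READY_SIGNATURE in line:
            ready = True
        if not ready:
            if any(p in low for p in IGNORE):
                continue
            for p in FATAL:
                if p in low and not ("warning" in low and "error:" in low):
                    fatal_line = line.strip("\n")
                    break
    if fatal_line:
        return f"FAILED: {fatal_line}"
    return "RUNNING" if ready else "LAUNCHING"


print(classify(["Loading model weights..."]))                    # LAUNCHING
print(classify(["DeprecationWarning: use --foo instead"]))       # LAUNCHING (ignored)
print(classify(["Traceback (most recent call last):"]))          # FAILED: ...
print(classify([MODEL_READY_SIGNATURE,
                "slurmstepd: error: *** JOB CANCELLED ***"]))    # RUNNING
```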
