Commit b5c3607

Add Quick search support for ensemble composing model parameter ranges
1 parent 2b36a7b commit b5c3607

8 files changed: +1282 -21 lines changed

docs/config_search.md

Lines changed: 85 additions & 1 deletion
@@ -1,4 +1,8 @@
 <!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+<!--
 Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License");
@@ -240,7 +244,8 @@ manual sweep:
 
 _This mode has the following limitations:_
 
-- If model config parameters are specified, they can contain only one possible combination of parameters
+- Top-level models can contain only one possible combination of model config parameters
+- Composing models (ensemble/BLS sub-models) can specify parameter ranges if they follow specific patterns (see [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges))
 
 This mode uses a hill climbing algorithm to search the configuration space, looking for
 the maximal objective value within the specified constraints. In the majority of cases
@@ -262,6 +267,77 @@ profile_models:
 
 ---
 
+### **Ensemble Composing Model Parameter Ranges**
+
+When profiling ensemble or BLS models in Quick search mode, composing models (sub-models) can specify parameter ranges for `instance_group` count. This enables optimization of composing models with different resource requirements, such as:
+
+- CPU-bound models (tokenizers, preprocessors) that may benefit from higher instance counts
+- GPU-bound models (inference models, embeddings) with limited GPU memory
+
+**Supported Instance Count Patterns:**
+
+Model Analyzer supports two types of instance count sequences that map to Quick search's coordinate system:
+
+1. **Powers of 2**: `[1, 2, 4, 8, 16, 32]` or subsets like `[2, 4, 8]`
+   - Maps to exponential search dimensions
+   - Recommended for most use cases
+
+2. **Contiguous sequences**: `[1, 2, 3, 4, 5]` or ranges like `[5, 6, 7, 8]`
+   - Maps to linear search dimensions
+   - Useful for fine-grained control
+
+**Important Notes:**
+
+- Only composing models can specify instance count ranges in Quick mode
+- Top-level models (non-composing) must still have a single parameter combination
+- For arbitrary value lists (e.g., `[1, 3, 7, 15]`), use Optuna search mode instead
+- Composing models are identified using `cpu_only_composing_models` or `bls_composing_models` configuration
+
+---
+
+_An example with ensemble containing CPU tokenizer and GPU inference model:_
+
+```yaml
+model_repository: /path/to/model/repository/
+
+run_config_search_mode: quick
+export_path: /tmp/results
+override_output_model_repository: true
+
+cpu_only_composing_models:
+  - tokenizer
+
+profile_models:
+  tokenizer:
+    model_config_parameters:
+      instance_group:
+        - kind: KIND_CPU
+          count: [1, 2, 4, 8, 16, 32] # Powers of 2 sequence
+      dynamic_batching:
+        max_queue_delay_microseconds: [0]
+
+  inference_model:
+    model_config_parameters:
+      instance_group:
+        - kind: KIND_GPU
+          count: [1, 2, 4, 8] # Subset of powers of 2
+      dynamic_batching:
+        max_queue_delay_microseconds: [0]
+
+  ensemble_model:
+    model_config_parameters:
+      dynamic_batching:
+        max_queue_delay_microseconds: [0]
+```
+
+In this example:
+- The tokenizer (CPU model) searches instance counts from 1 to 32
+- The inference model (GPU model) searches instance counts from 1 to 8
+- Quick search explores both dimensions in parallel to find the optimal combination
+- The ensemble model itself has fixed parameters (single combination)
+
+---
+
 ### **Limiting Batch Size, Instance Group, and Client Concurrency**
 
 Using the `--run-config-search-<min/max>...` config options you have the ability to clamp the algorithm's upper or lower bounds for the model's batch size and instance group count, as well as the client's request concurrency.
@@ -398,6 +474,10 @@ _This mode has the following limitations:_
 
 Ensemble models can be optimized using the Quick Search mode's hill climbing algorithm to search the composing models' configuration spaces in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value; with runtimes under one hour (compared to the days it would take a brute force run to complete) for ensembles that contain up to four composing models.
 
+**Composing Model Parameter Ranges:**
+
+Composing models within ensembles can specify instance count ranges to optimize models with different resource requirements (e.g., CPU tokenizers vs GPU inference models). See [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges) for details on supported patterns and configuration examples.
+
 After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.
 
 ---
@@ -412,6 +492,10 @@ _This mode has the following limitations:_
 
 BLS models can be optimized using the Quick Search mode's hill climbing algorithm to search the BLS composing models' configuration spaces, as well as the BLS model's instance count, in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value; with runtimes under one hour (compared to the days it would take a brute force run to complete) for BLS models that contain up to four composing models.
 
+**Composing Model Parameter Ranges:**
+
+BLS composing models can specify instance count ranges to optimize models with different resource requirements. Models are identified using the `bls_composing_models` configuration parameter. See [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges) for details on supported patterns and configuration examples.
+
 After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.
 
 ---
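
The two instance count patterns described in the docs/config_search.md change (powers of 2 and contiguous sequences) can be illustrated with a small standalone check. This is only a sketch and not Model Analyzer's own validation code: the function name `classify_instance_counts` and its return labels are hypothetical. It also assumes that a powers-of-2 list must use consecutive powers (each value double the previous one); the docs do not say whether gapped subsets such as `[1, 4, 16]` are accepted, nor which pattern wins for lists like `[1, 2]` that fit both.

```python
from typing import List


def classify_instance_counts(counts: List[int]) -> str:
    """Hypothetical helper: label a list of instance counts.

    Returns "powers_of_2", "contiguous", or "unsupported".
    """
    # Instance counts must be positive; an empty list has no pattern.
    if not counts or any(c < 1 for c in counts):
        return "unsupported"

    pairs = list(zip(counts, counts[1:]))

    # Powers of 2: every value has exactly one bit set, and (assumption)
    # each value is double the previous one.
    all_powers = all(c & (c - 1) == 0 for c in counts)
    if all_powers and all(b == 2 * a for a, b in pairs):
        return "powers_of_2"

    # Contiguous: each value is exactly one more than the previous one.
    if all(b - a == 1 for a, b in pairs):
        return "contiguous"

    # Anything else (e.g. [1, 3, 7, 15]) is the case the docs route to Optuna.
    return "unsupported"


if __name__ == "__main__":
    for example in ([1, 2, 4, 8, 16, 32], [5, 6, 7, 8], [1, 3, 7, 15]):
        print(example, "->", classify_instance_counts(example))
```

In this sketch the powers-of-2 test runs first, so a list that satisfies both patterns is labeled `powers_of_2`; that ordering is a choice made here for illustration only.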

model_analyzer/config/generate/quick_run_config_generator.py

Lines changed: 58 additions & 7 deletions
@@ -25,6 +25,7 @@
 from model_analyzer.config.run.model_run_config import ModelRunConfig
 from model_analyzer.config.run.run_config import RunConfig
 from model_analyzer.constants import LOGGER_NAME
+from model_analyzer.model_analyzer_exceptions import TritonModelAnalyzerException
 from model_analyzer.perf_analyzer.perf_config import PerfAnalyzerConfig
 from model_analyzer.result.run_config_measurement import RunConfigMeasurement
 from model_analyzer.triton.model.model_config import ModelConfig
@@ -425,33 +426,52 @@ def _get_next_model_config_variant(
             )
 
         model_config_params = deepcopy(model.model_config_parameters())
+
+        # Extract user-specified instance_group kind before removing it
+        instance_kind = self._extract_instance_group_kind(model_config_params)
+        if not instance_kind:
+            # Fallback to cpu_only flag
+            instance_kind = "KIND_CPU" if model.cpu_only() else "KIND_GPU"
+
         if model_config_params:
+            # Remove parameters that are controlled by search dimensions
             model_config_params.pop("max_batch_size", None)
+            model_config_params.pop("instance_group", None)
 
-            # This is guaranteed to only generate one combination (check is in config_command)
+            # Generate combinations from remaining parameters
+            # For composing models, this may include dynamic_batching settings, etc.
             param_combos = GeneratorUtils.generate_combinations(model_config_params)
-            assert len(param_combos) == 1
 
-            param_combo = param_combos[0]
+            # Top-level models must have exactly 1 combination (validated earlier)
+            # Composing models can have 1 combination (non-searchable params are fixed)
+            if len(param_combos) > 1:
+                raise TritonModelAnalyzerException(
+                    f"Model {model.model_name()} has multiple parameter combinations "
+                    f"after removing searchable parameters. This should have been caught "
+                    f"during config validation."
+                )
+
+            param_combo = param_combos[0] if param_combos else {}
         else:
             param_combo = {}
 
-        kind = "KIND_CPU" if model.cpu_only() else "KIND_GPU"
+        # Add instance_group with count from dimension and kind from config
         instance_count = self._calculate_instance_count(dimension_values)
-
         param_combo["instance_group"] = [
             {
                 "count": instance_count,
-                "kind": kind,
+                "kind": instance_kind,
             }
         ]
 
+        # Add max_batch_size from dimension if applicable
         if "max_batch_size" in dimension_values:
             param_combo["max_batch_size"] = self._calculate_model_batch_size(
                 dimension_values
             )
 
-        if model.supports_dynamic_batching():
+        # Add default dynamic_batching if model supports it and not already specified
+        if model.supports_dynamic_batching() and "dynamic_batching" not in param_combo:
             param_combo["dynamic_batching"] = {}
 
         model_config_variant = BaseModelConfigGenerator.make_model_config_variant(
@@ -463,6 +483,37 @@ def _get_next_model_config_variant(
 
         return model_config_variant
 
+    def _extract_instance_group_kind(self, model_config_params: dict) -> str:
+        """
+        Extract the 'kind' field from instance_group in model_config_parameters.
+
+        Returns empty string if not found or if instance_group is not specified.
+        """
+        if not model_config_params or "instance_group" not in model_config_params:
+            return ""
+
+        instance_group = model_config_params["instance_group"]
+
+        # Handle various nested list structures from config parsing
+        if isinstance(instance_group, list) and len(instance_group) > 0:
+            # Handle nested structure: [[ {...} ]]
+            while (
+                isinstance(instance_group, list)
+                and len(instance_group) > 0
+                and isinstance(instance_group[0], list)
+            ):
+                instance_group = instance_group[0]
+
+            # Now should have [{...}] structure
+            if (
+                isinstance(instance_group, list)
+                and len(instance_group) > 0
+                and isinstance(instance_group[0], dict)
+            ):
+                return instance_group[0].get("kind", "")
+
+        return ""
+
     def _create_next_model_run_config(
         self,
         model: ModelProfileSpec,
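
To see how the kind resolution in this diff behaves on different inputs, the logic can be pulled out into free functions and run without Model Analyzer installed. The sketch below mirrors `_extract_instance_group_kind` and the `cpu_only` fallback from `_get_next_model_config_variant`. The wrapper name `resolve_instance_kind` and the boolean `cpu_only` argument (standing in for `model.cpu_only()`) are hypothetical, and the sample inputs assume `kind` arrives as a plain string, which the diff does not confirm for every config parsing path.

```python
from typing import Any, Optional


def extract_instance_group_kind(model_config_params: Optional[dict]) -> str:
    """Standalone copy of the diff's extraction logic (no class needed)."""
    if not model_config_params or "instance_group" not in model_config_params:
        return ""

    instance_group: Any = model_config_params["instance_group"]

    # Unwrap nested lists such as [[{...}]] until the first element
    # is no longer a list.
    while (
        isinstance(instance_group, list)
        and instance_group
        and isinstance(instance_group[0], list)
    ):
        instance_group = instance_group[0]

    # Expect [{...}] here; read 'kind' from the first entry only.
    if (
        isinstance(instance_group, list)
        and instance_group
        and isinstance(instance_group[0], dict)
    ):
        return instance_group[0].get("kind", "")

    return ""


def resolve_instance_kind(model_config_params: Optional[dict], cpu_only: bool) -> str:
    """Hypothetical wrapper: user-specified kind wins, else fall back to cpu_only."""
    kind = extract_instance_group_kind(model_config_params)
    if not kind:
        kind = "KIND_CPU" if cpu_only else "KIND_GPU"
    return kind


if __name__ == "__main__":
    flat = {"instance_group": [{"kind": "KIND_CPU", "count": [1, 2, 4]}]}
    nested = {"instance_group": [[{"kind": "KIND_GPU", "count": [1, 2]}]]}
    print(resolve_instance_kind(flat, cpu_only=False))
    print(resolve_instance_kind(nested, cpu_only=True))
    print(resolve_instance_kind({}, cpu_only=True))
```

Note that an explicit `kind` in the config takes precedence over the `cpu_only` flag, so a model listed as CPU-only but configured with `KIND_GPU` resolves to `KIND_GPU` in this sketch, matching the order of checks in the diff.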
