Add Quick search support for ensemble composing model parameter ranges
Fix copyrights
Fix tests
Update tests
Allow user to specify composing models instead of relying on auto-discovery
Warn when there is a non-existent composing model.
Update copyrights
Update copyrights
Fix model name YAML
Correctly get kind
Properly set CPU/GPU kind
docs/config_search.md
Lines changed: 91 additions & 14 deletions
@@ -1,17 +1,6 @@
 <!--
-Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
+SPDX-FileCopyrightText: Copyright (c) 2020-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
 -->
 
 # Table of Contents
@@ -240,7 +229,8 @@ manual sweep:
 
 _This mode has the following limitations:_
 
-- If model config parameters are specified, they can contain only one possible combination of parameters
+- Top-level models can contain only one possible combination of model config parameters
+- Composing models (ensemble/BLS sub-models) can specify parameter ranges if they follow specific patterns (see [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges))
 
 This mode uses a hill climbing algorithm to search the configuration space, looking for
 the maximal objective value within the specified constraints. In the majority of cases
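The limitation above says composing-model ranges must follow specific patterns. As a minimal sketch of the powers-of-2 pattern listed in the new section (for example `[1, 2, 4, 8, 16, 32]` or a subset like `[2, 4, 8]`), a range could be validated and mapped to integer coordinates as below. The helper name `powers_of_two_coordinates`, the assumption that coordinate `i` corresponds to `2**i`, and the requirement that the run has no gaps are all illustrative assumptions, not Model Analyzer's actual implementation. Only the powers-of-2 pattern is checked.

```python
from typing import List, Optional


def powers_of_two_coordinates(counts: List[int]) -> Optional[List[int]]:
    """Return log2 coordinates if `counts` is an ascending, gap-free run of
    powers of 2; otherwise return None.

    Hypothetical helper: assumes coordinate i corresponds to 2**i.
    """
    if not counts:
        return None
    coords = []
    for count in counts:
        # A positive power of 2 has exactly one bit set.
        if count < 1 or (count & (count - 1)) != 0:
            return None
        coords.append(count.bit_length() - 1)  # exact log2 for powers of 2
    # Assumed requirement: no gaps, e.g. [2, 4, 8] is accepted but [2, 8] is not.
    expected = list(range(coords[0], coords[0] + len(coords)))
    return coords if coords == expected else None


print(powers_of_two_coordinates([1, 2, 4, 8, 16, 32]))  # [0, 1, 2, 3, 4, 5]
print(powers_of_two_coordinates([2, 4, 8]))             # [1, 2, 3]
print(powers_of_two_coordinates([1, 3, 5]))             # None
```

Mapping counts to consecutive integer coordinates is what lets a hill-climbing search treat "one step up" as "double the instance count" under this assumption.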
@@ -262,6 +252,85 @@ profile_models:
 
 ---
 
+### **Ensemble Composing Model Parameter Ranges**
+
+When profiling ensemble or BLS models in Quick search mode, composing models (sub-models) can specify parameter ranges for `instance_group` count. This enables optimization of composing models with different resource requirements, such as:
+
+- CPU-bound models (tokenizers, preprocessors) that may benefit from higher instance counts
+- GPU-bound models (inference models, embeddings) with limited GPU memory
+
+**Supported Instance Count Patterns:**
+
+Model Analyzer supports two types of instance count sequences that map to Quick search's coordinate system:
+
+1. **Powers of 2**: `[1, 2, 4, 8, 16, 32]` or subsets like `[2, 4, 8]`
+- Only the ensemble is listed in `profile_models`; composing models are auto-discovered from `ensemble_scheduling`
+- The `ensemble_composing_models` section provides configurations for auto-discovered models
+- The tokenizer (CPU model) searches instance counts from 1 to 32
+- The inference model (GPU model) searches instance counts from 1 to 8
+- Quick search explores both dimensions in parallel to find the optimal combination
+- The ensemble model itself uses default parameters
+- Any models specified in `ensemble_composing_models` that don't exist in the ensemble will be ignored with a warning
+
+**Instance Group Kind:**
+
+The `kind` field (`KIND_CPU` or `KIND_GPU`) in `instance_group` is respected when explicitly specified. This allows you to control whether a model runs on CPU or GPU directly in the config without needing the separate `cpu_only_composing_models` option.
+
+Priority order for determining instance kind:
+1. **Explicit `kind` in `instance_group`** (highest priority): if you specify `kind: KIND_CPU` or `kind: KIND_GPU`, that value is used
+2. **`cpu_only_composing_models` config**: models listed here will use `KIND_CPU`
+3. **Default to `KIND_GPU`** (lowest priority): if neither is specified, models default to GPU instances
+
+This means you can override `cpu_only_composing_models` by explicitly specifying `kind: KIND_GPU` in the `instance_group`.
+
+---
+
 ### **Limiting Batch Size, Instance Group, and Client Concurrency**
 
 Using the `--run-config-search-<min/max>...` config options, you have the ability to clamp the algorithm's upper or lower bounds for the model's batch size and instance group count, as well as the client's request concurrency.
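The priority order for `instance_group` kind described in the new section can be expressed as a small resolution function. This is a minimal sketch: `resolve_instance_kind` and its arguments are hypothetical names, the instance group is modeled as a plain dict rather than Model Analyzer's real config objects, and the model names `tokenizer` and `inference` are only examples.

```python
from typing import Any, Dict, Iterable


def resolve_instance_kind(
    model_name: str,
    instance_group: Dict[str, Any],
    cpu_only_composing_models: Iterable[str],
) -> str:
    """Pick KIND_CPU or KIND_GPU using the documented priority order.

    Hypothetical helper; `instance_group` is a plain dict such as
    {"count": [1, 2, 4], "kind": "KIND_CPU"}.
    """
    # 1. An explicit kind in instance_group always wins.
    explicit_kind = instance_group.get("kind")
    if explicit_kind in ("KIND_CPU", "KIND_GPU"):
        return explicit_kind
    # 2. Models listed in cpu_only_composing_models run on CPU.
    if model_name in set(cpu_only_composing_models):
        return "KIND_CPU"
    # 3. Otherwise default to GPU instances.
    return "KIND_GPU"


cpu_only = ["tokenizer"]
print(resolve_instance_kind("tokenizer", {"count": [1, 2, 4]}, cpu_only))  # KIND_CPU
print(resolve_instance_kind("tokenizer", {"kind": "KIND_GPU"}, cpu_only))  # KIND_GPU
print(resolve_instance_kind("inference", {"count": [1, 2]}, cpu_only))     # KIND_GPU
```

The second call shows the override case: the model is listed in `cpu_only_composing_models`, but the explicit `kind: KIND_GPU` takes precedence.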
@@ -398,6 +467,10 @@ _This mode has the following limitations:_
 
 Ensemble models can be optimized using the Quick Search mode's hill climbing algorithm to search the composing models' configuration spaces in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value, with runtimes under one hour (compared to the days it would take a brute-force run to complete) for ensembles that contain up to four composing models.
 
+**Composing Model Parameter Ranges:**
+
+Composing models within ensembles can specify instance count ranges to optimize models with different resource requirements (e.g., CPU tokenizers vs. GPU inference models). See [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges) for details on supported patterns and configuration examples.
+
 After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.
 
 ---
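For ensembles, the new section describes two behaviors: composing models are auto-discovered from `ensemble_scheduling`, and entries in `ensemble_composing_models` that are not part of the ensemble are ignored with a warning. A minimal sketch of that filtering follows. It assumes the ensemble config is a dict shaped like Triton's `ensemble_scheduling` (a `step` list whose entries carry `model_name`) and that `ensemble_composing_models` is a list of dicts keyed by `model_name`; the function `select_composing_models` and that list shape are hypothetical.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("composing_models_sketch")


def select_composing_models(
    ensemble_config: Dict[str, Any],
    ensemble_composing_models: List[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Map each discovered composing model to its user-provided config.

    Discovered models without a user entry map to {} (default parameters).
    User entries naming models outside the ensemble are dropped with a warning.
    """
    # Auto-discover composing models from the ensemble's scheduling steps.
    steps = ensemble_config.get("ensemble_scheduling", {}).get("step", [])
    discovered = [step["model_name"] for step in steps]

    user_configs = {entry["model_name"]: entry for entry in ensemble_composing_models}
    for name in user_configs:
        if name not in discovered:
            logger.warning("Composing model %s is not part of the ensemble; ignoring it", name)

    return {name: user_configs.get(name, {}) for name in discovered}


ensemble = {
    "name": "my_ensemble",
    "ensemble_scheduling": {
        "step": [{"model_name": "tokenizer"}, {"model_name": "inference"}]
    },
}
user = [
    {"model_name": "tokenizer",
     "instance_group": [{"count": [1, 2, 4, 8, 16, 32], "kind": "KIND_CPU"}]},
    {"model_name": "typo_model", "instance_group": [{"count": [1, 2]}]},
]
selected = select_composing_models(ensemble, user)
print(sorted(selected))  # ['inference', 'tokenizer']
```

Here `typo_model` triggers a warning and is dropped, while `inference` has no user entry and keeps default parameters.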
@@ -412,6 +485,10 @@ _This mode has the following limitations:_
 
 BLS models can be optimized using the Quick Search mode's hill climbing algorithm to search the BLS composing models' configuration spaces, as well as the BLS model's instance count, in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value, with runtimes under one hour (compared to the days it would take a brute-force run to complete) for BLS models that contain up to four composing models.
 
+**Composing Model Parameter Ranges:**
+
+BLS composing models can specify instance count ranges to optimize models with different resource requirements. Models are identified using the `bls_composing_models` configuration parameter. See [Ensemble Composing Model Parameter Ranges](#ensemble-composing-model-parameter-ranges) for details on supported patterns and configuration examples.
+
 After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.