Commit b51483d

Add trt support for bundles (#479)
Add the TensorRT benchmark results for some new NGC bundles.

### Description
A few sentences describing the changes proposed in this pull request.

### Status
**Ready/Work in progress/Hold**

### Please ensure all the checkboxes:
<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->
- [x] Codeformat tests passed locally by running `./runtests.sh --codeformat`.
- [ ] In-line docstrings updated.
- [ ] Update `version` and `changelog` in `metadata.json` if changing an existing bundle.
- [ ] Please ensure the naming rules in config files meet our requirements (please refer to: `CONTRIBUTING.md`).
- [ ] Ensure versions of packages such as `monai`, `pytorch` and `numpy` are correct in `metadata.json`.
- [ ] Descriptions should be consistent with the content, such as `eval_metrics` of the provided weights and TorchScript modules.
- [ ] Files larger than 25MB are excluded and replaced by providing download links in `large_file.yml`.
- [ ] Avoid using path that contains personal information within config files (such as use `/home/your_name/` for `"bundle_root"`).

---------

Signed-off-by: binliu <binliu@nvidia.com>
Co-authored-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com>
1 parent 6b55743 commit b51483d

12 files changed: +158, -6 lines changed


ci/bundle_custom_data.py

Lines changed: 3 additions & 0 deletions
@@ -49,6 +49,9 @@
     "spleen_ct_segmentation": {},
     "endoscopic_tool_segmentation": {},
     "pathology_tumor_detection": {},
+    "pathology_nuclei_classification": {},
+    "pathology_nuclick_annotation": {"use_trace": True},
+    "wholeBody_ct_segmentation": {"use_trace": True},
     "pancreas_ct_dints_segmentation": {
         "use_trace": True,
         "converter_kwargs": {"truncate_long_and_double": True, "torch_executed_ops": ["aten::upsample_trilinear3d"]},

models/pancreas_ct_dints_segmentation/configs/metadata.json

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.4.3",
+    "version": "0.4.4",
     "changelog": {
+        "0.4.4": "update the benchmark results of TensorRT",
         "0.4.3": "add support for TensorRT conversion and inference",
         "0.4.2": "update search function to match monai 1.2",
         "0.4.1": "fix the wrong GPU index issue of multi-node",

models/pancreas_ct_dints_segmentation/docs/README.md

Lines changed: 2 additions & 2 deletions
@@ -84,8 +84,8 @@ This bundle supports acceleration with TensorRT. The table below displays the sp

 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 54611.72 | 19240.66 | 16104.8 | 11443.57 | 2.84 | 3.39 | 4.77 | 1.68 |
-| end2end | 133.93 | 43.41 | 35.65 | 26.63 | 3.09 | 3.76 | 5.03 | 1.63 |
+| model computation | 133.93 | 43.41 | 35.65 | 26.63 | 3.09 | 3.76 | 5.03 | 1.63 |
+| end2end | 54611.72 | 19240.66 | 16104.8 | 11443.57 | 2.84 | 3.39 | 4.77 | 1.68 |

 Where:
 - `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{
+    "imports": [
+        "$import glob",
+        "$import os",
+        "$import pathlib",
+        "$import json",
+        "$import torch_tensorrt"
+    ],
+    "handlers#0#_disabled_": true,
+    "network_def": "$torch.jit.load(@bundle_root + '/models/model_trt.ts')",
+    "evaluator#amp": false
+}
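This new config is an override meant to be merged on top of the bundle's `inference.json` (see the `run` command in the README changes below): it disables the first handler (index 0), replaces `network_def` with the pre-exported TensorRT TorchScript module, and turns off AMP for the evaluator. A minimal plain-Python sketch of what the `network_def` expression evaluates to (the bundle path and input shape are placeholders):

```python
# Rough Python equivalent of the "network_def" override above. The bundle path
# and input shape are placeholders; AMP stays off, matching "evaluator#amp": false.
import torch
import torch_tensorrt  # noqa: F401  registers the TensorRT runtime needed to deserialize the module

bundle_root = "/path/to/bundle"  # placeholder
model = torch.jit.load(bundle_root + "/models/model_trt.ts").eval().cuda()

with torch.no_grad():
    x = torch.randn(1, 4, 128, 128, device="cuda")  # illustrative input only
    y = model(x)
print(y.shape)
```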

models/pathology_nuclei_classification/configs/metadata.json

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.1.4",
+    "version": "0.1.5",
     "changelog": {
+        "0.1.5": "add support for TensorRT conversion and inference",
         "0.1.4": "fix the wrong GPU index issue of multi-node",
         "0.1.3": "remove error dollar symbol in readme",
         "0.1.2": "add RAM warning",

models/pathology_nuclei_classification/docs/README.md

Lines changed: 37 additions & 0 deletions
@@ -139,6 +139,31 @@ A graph showing the validation F1-score over 100 epochs.

 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v3.png) <br>

+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 9.99 | 14.14 | 4.62 | 2.37 | 0.71 | 2.16 | 4.22 | 5.97 |
+| end2end | 412.95 | 408.88 | 351.64 | 286.85 | 1.01 | 1.17 | 1.44 | 1.43 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 8.6.1+cuda12.0
+- Torch-TensorRT Version: 1.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version:3.8.10
+- CUDA version: 12.1
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.

@@ -182,6 +207,18 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 python -m monai.bundle run --config_file configs/inference.json
 ```

+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision:
+
+```
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16>
+```
+
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] S. Graham, Q. D. Vu, S. E. A. Raza, A. Azam, Y-W. Tsang, J. T. Kwak and N. Rajpoot. "HoVer-Net: Simultaneous Segmentation and Classification of Nuclei in Multi-Tissue Histology Images." Medical Image Analysis, Sept. 2019. [[doi](https://doi.org/10.1016/j.media.2019.101563)]
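The two CLI commands added to this README also have Python-level counterparts; a hedged sketch, assuming `monai.bundle.trt_export` and `monai.bundle.run` accept the same arguments as the CLI flags and that it is executed from the bundle root:

```python
# Hedged sketch: Python-level counterparts of the two CLI commands above,
# assuming monai.bundle.trt_export / monai.bundle.run mirror the CLI flags
# and that this is run from the bundle root directory.
from monai.bundle import run, trt_export

# Export the checkpoint to a TensorRT-based TorchScript model (fp16 chosen here).
trt_export(
    net_id="network_def",
    filepath="models/model_trt.ts",
    ckpt_file="models/model.pt",
    meta_file="configs/metadata.json",
    config_file="configs/inference.json",
    precision="fp16",
)

# Run inference with the TensorRT overrides merged on top of the base config.
run(config_file=["configs/inference.json", "configs/inference_trt.json"])
```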

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{
+    "imports": [
+        "$import glob",
+        "$import json",
+        "$import pathlib",
+        "$import os",
+        "$import torch_tensorrt"
+    ],
+    "handlers#0#_disabled_": true,
+    "network_def": "$torch.jit.load(@bundle_root + '/models/model_trt.ts')",
+    "evaluator#amp": false
+}

models/pathology_nuclick_annotation/configs/metadata.json

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.1.4",
+    "version": "0.1.5",
     "changelog": {
+        "0.1.5": "add support for TensorRT conversion and inference",
         "0.1.4": "fix the wrong GPU index issue of multi-node",
         "0.1.3": "remove error dollar symbol in readme",
         "0.1.2": "add RAM usage with CachDataset",

models/pathology_nuclick_annotation/docs/README.md

Lines changed: 37 additions & 0 deletions
@@ -125,6 +125,31 @@ A graph showing the validation mean Dice over 50 epochs.

 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_nuclick_annotation_val_dice_v2.png) <br>

+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 3.27 | 4.31 | 2.12 | 1.73 | 0.76 | 1.54 | 1.89 | 2.49 |
+| end2end | 705.32 | 752.64 | 290.45 | 347.07 | 0.94 | 2.43 | 2.03 | 2.17 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 8.6.1+cuda12.0
+- Torch-TensorRT Version: 1.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version:3.8.10
+- CUDA version: 12.1
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.

@@ -168,6 +193,18 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 python -m monai.bundle run --config_file configs/inference.json
 ```

+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision:
+
+```
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --use_trace "True"
+```
+
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Koohbanani, Navid Alemi, et al. "NuClick: a deep learning framework for interactive segmentation of microscopic images." Medical Image Analysis 65 (2020): 101771. https://arxiv.org/abs/2005.14511.
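The `model computation` row in the table added above is measured on a random input with pre- and post-processing excluded. A hedged sketch of that style of measurement (the model path, input shape and iteration counts are illustrative; this is not the benchmarking script used for the table):

```python
# Hedged sketch of a "model computation"-style measurement: time only the model's
# forward pass on a random input, with pre- and post-processing excluded. The model
# path, input shape and iteration counts are illustrative placeholders.
import torch
import torch_tensorrt  # noqa: F401  needed so the TensorRT TorchScript module can be deserialized

model = torch.jit.load("models/model_trt.ts").eval().cuda()
x = torch.randn(1, 5, 128, 128, device="cuda")  # placeholder NuClick-style input

with torch.no_grad():
    for _ in range(10):          # warm-up so one-time initialization is not timed
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()

print(f"mean latency: {start.elapsed_time(end) / 100:.2f} ms")
```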

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+    "imports": [
+        "$import glob",
+        "$import os",
+        "$import torch_tensorrt"
+    ],
+    "handlers#0#_disabled_": true,
+    "network_def": "$torch.jit.load(@bundle_root + '/models/model_trt.ts')",
+    "evaluator#amp": false
+}
