
Commit 46411c1

OutisLi, njzjz, and pre-commit-ci[bot] authored
Profiling bug fix when both enable_profiler and profiling are set to true (#4855)
When `enable_profiler` is set to true, the profiling results are saved in JSON format under `tensorboard_log_dir`. Without this fix, setting both `enable_profiler` and `profiling` to true raises `RuntimeError: Trace is already saved`, because the same trace is exported a second time. This commit therefore ignores the `profiling` option whenever `enable_profiler` is true.

## Summary by CodeRabbit

* **Bug Fixes**
  * Refined the profiling trace export logic to prevent duplicate or unintended trace exports when the new profiler is enabled. Profiling traces are now exported only if profiling is enabled and the new profiler is not active.
* **Documentation**
  * Updated the profiling documentation to clarify that when the new profiler is enabled, profiling results are saved to the TensorBoard log instead of a Chrome JSON file.

---------

Signed-off-by: LI TIANCHENG <137472077+OutisLi@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <njzjz@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 91ebe34 commit 46411c1


2 files changed, +2 -2 lines changed


deepmd/pt/train/training.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1147,7 +1147,7 @@ def log_loss_valid(_task_key="Default"):
         log.info(
             f"The profiling trace has been saved under {self.tensorboard_log_dir}"
         )
-    if self.profiling:
+    if not self.enable_profiler and self.profiling:
         prof.export_chrome_trace(self.profiling_file)
         log.info(
             f"The profiling trace has been saved to: {self.profiling_file}"
```
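To see why the extra `not self.enable_profiler` check removes the error, here is a minimal sketch under stated assumptions. `SingleSaveTrace` and `finish_profiling` are hypothetical stand-ins, not deepmd-kit or PyTorch code: the class only models a trace that refuses a second save, mirroring the reported `RuntimeError: Trace is already saved`, and the TensorBoard write is collapsed into a single `save()` call for illustration.

```python
class SingleSaveTrace:
    """Hypothetical stand-in for a profiler trace that can be saved only once.

    It mimics the behavior behind ``RuntimeError: Trace is already saved``;
    it is not the real torch.profiler object.
    """

    def __init__(self):
        self.saved_to = []  # paths this trace has been written to

    def save(self, path):
        # A second save of the same trace is rejected, as in the reported error.
        if self.saved_to:
            raise RuntimeError("Trace is already saved")
        self.saved_to.append(path)


def finish_profiling(trace, enable_profiler, profiling,
                     tensorboard_log_dir, profiling_file, apply_fix=True):
    """Hypothetical sketch of the end-of-training export decision."""
    if enable_profiler:
        # Simplification: the TensorBoard output is modeled as one save() call.
        trace.save(tensorboard_log_dir)
    # Before the fix the condition was only `profiling`; after it, the
    # Chrome JSON export is skipped whenever enable_profiler is on.
    if apply_fix:
        export_chrome = not enable_profiler and profiling
    else:
        export_chrome = profiling
    if export_chrome:
        trace.save(profiling_file)
    return trace.saved_to
```

With both flags true, the guarded path writes only to the TensorBoard directory, while `apply_fix=False` reproduces the double save and raises `RuntimeError`. With only `profiling` set, the Chrome JSON export still runs as before.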

deepmd/utils/argcheck.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -3140,7 +3140,7 @@ def training_args(
     doc_disp_avg = (
         "Display the average loss over the display interval for training sets."
     )
-    doc_profiling = "Export the profiling results to the Chrome JSON file for performance analysis, driven by the legacy TensorFlow profiling API or PyTorch Profiler. The output file will be saved to `profiling_file`."
+    doc_profiling = "Export the profiling results to the Chrome JSON file for performance analysis, driven by the legacy TensorFlow profiling API or PyTorch Profiler. The output file will be saved to `profiling_file`. In the PyTorch backend, when enable_profiler is True, this option is ignored, since the profiling results will be saved to the TensorBoard log."
     doc_profiling_file = "Output file for profiling."
     doc_enable_profiler = "Export the profiling results to the TensorBoard log for performance analysis, driven by TensorFlow Profiler (available in TensorFlow 2.3) or PyTorch Profiler. The log will be saved to `tensorboard_log_dir`."
     doc_tensorboard = "Enable tensorboard"
```
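The updated `doc_profiling` text describes a precedence rule for the PyTorch backend: `enable_profiler` wins and `profiling` is ignored. A minimal sketch of that rule follows. `profiling_outputs` is a hypothetical helper and the `example` dict is an illustrative training section; only the option names (`profiling`, `profiling_file`, `enable_profiler`, `tensorboard_log_dir`) come from the documentation strings above, and the values are made up for the example.

```python
def profiling_outputs(training):
    """Report where profiling results go in the PyTorch backend (sketch).

    `training` is a hypothetical dict shaped like the training section of an
    input file; this is not deepmd-kit's own argument checking.
    """
    if training.get("enable_profiler", False):
        # enable_profiler takes precedence: results go to the TensorBoard log.
        return {"tensorboard": training["tensorboard_log_dir"]}
    if training.get("profiling", False):
        # profiling is only honored when enable_profiler is off.
        return {"chrome_json": training["profiling_file"]}
    return {}


# Illustrative values only.
example = {
    "profiling": True,
    "profiling_file": "timeline.json",
    "enable_profiler": True,
    "tensorboard_log_dir": "log",
}
print(profiling_outputs(example))  # {'tensorboard': 'log'}
```

Turning `enable_profiler` off in the same dict would route the results to `profiling_file` instead.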
