Commit 176c746

fix: print summary on local_rank=0 (#4597)
fix #4595

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved training summary display in distributed environments. Only the process with `LOCAL_RANK == 0` now prints the training summary, so the log no longer contains duplicate copies of it.
1 parent ea966e8 commit 176c746

2 files changed: 6 additions, 2 deletions

deepmd/pd/entrypoints/main.py

Lines changed: 3 additions & 1 deletion

@@ -46,6 +46,7 @@
 )
 from deepmd.pd.utils.env import (
     DEVICE,
+    LOCAL_RANK,
 )
 from deepmd.pd.utils.finetune import (
     get_finetune_rules,
@@ -232,7 +233,8 @@ def train(
     output: str = "out.json",
 ) -> None:
     log.info("Configuration path: %s", input_file)
-    SummaryPrinter()()
+    if LOCAL_RANK == 0:
+        SummaryPrinter()()
     with open(input_file) as fin:
         config = json.load(fin)
     # ensure suffix, as in the command line help, we say "path prefix of checkpoint files"

deepmd/pt/entrypoints/main.py

Lines changed: 3 additions & 1 deletion

@@ -61,6 +61,7 @@
 )
 from deepmd.pt.utils.env import (
     DEVICE,
+    LOCAL_RANK,
 )
 from deepmd.pt.utils.finetune import (
     get_finetune_rules,
@@ -254,7 +255,8 @@ def train(
     output: str = "out.json",
 ) -> None:
     log.info("Configuration path: %s", input_file)
-    SummaryPrinter()()
+    if LOCAL_RANK == 0:
+        SummaryPrinter()()
     with open(input_file) as fin:
         config = json.load(fin)
     # ensure suffix, as in the command line help, we say "path prefix of checkpoint files"
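
The guard added in both entry points can be illustrated in isolation. The following is a minimal sketch, not the project's code: it assumes the local rank comes from a `LOCAL_RANK` environment variable (as launchers such as `torchrun` set it) and falls back to 0 for single-process runs. The helper names `get_local_rank` and `print_summary_once` are hypothetical and do not appear in the diff.

```python
import os


def get_local_rank(environ=None) -> int:
    """Return the local rank of this process, defaulting to 0 when unset.

    Assumption: the launcher exports LOCAL_RANK; a plain single-process
    run has no such variable and is treated as rank 0.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get("LOCAL_RANK", 0))


def print_summary_once(print_summary, local_rank: int) -> bool:
    """Call print_summary only on local rank 0 and report whether it ran."""
    if local_rank == 0:
        print_summary()
        return True
    return False


if __name__ == "__main__":
    # Stand-in for SummaryPrinter()() in the diff above.
    print_summary_once(lambda: print("training summary"), get_local_rank())
```

Because the check uses the local rank rather than the global rank, a multi-node job under this scheme would still print the summary once per node, since each node has its own local rank 0.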
