Skip to content

Conversation

@HydrogenSulfate
Copy link
Collaborator

@HydrogenSulfate HydrogenSulfate commented Aug 5, 2025

Support DPA2/DPA3 C++ inference with lammps

Note

The intermediate representation (IR) of PaddlePaddle's computation graph does not support the Dict[str, Tensor] data type. Considering the special nature of comm_dict, we replaced it with List[Tensor], which allowed us to successfully run inference for DPA2 and DPA3.

  • DPA2
    4c116161d304da381ffeed968857be1f

  • DPA3
    863dfb7c8f285149cfdaf6aa4c6849fc

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Added support for distributed computation and message passing in PaddlePaddle backend, including a custom operator for efficient border data exchange.
    • Enabled optional retrieval of intermediate network outputs during evaluation.
    • Introduced input statistics computation for model fitting.
    • Added dynamic loading of Paddle custom operator library and integration in model descriptor and repformer layers.
  • Improvements

    • Enhanced model freezing and input signature flexibility with dynamic batch sizes and expanded communication tensor support.
    • Improved tensor operations for better device handling, explicit broadcasting, and type casting consistency.
    • Refined handling of atomic virial outputs and descriptor communication.
    • Added support for spin and non-spin modes in parallel descriptor computations using the custom operator.
    • Simplified device placement in tensor creation and removed redundant explicit device transfers.
    • Improved input validation and type annotations across models and descriptors.
    • Replaced ad-hoc logging in C++ API with a structured logger for better debug output.
    • Updated parameter documentation for improved clarity.
  • Bug Fixes

    • Fixed tensor padding, masking, and device placement issues for improved robustness in distributed and parallel scenarios.
    • Corrected handling of neighbor list masking and indexing with explicit tensor fills to avoid broadcast errors.
  • Build/Chores

    • Added new build scripts and CMake configurations for PaddlePaddle custom operator integration.
    • Improved support for PaddlePaddle model detection and operator library loading in both Python and C++ APIs.
    • Implemented full message passing setup in C++ API replacing previous exceptions.
    • Added Python setup script for Paddle custom operator extension build.

Copilot AI review requested due to automatic review settings August 5, 2025 11:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for DPA2/DPA3 C++ inference with LAMMPS integration for the Paddle framework. The changes implement necessary Paddle operators and build infrastructure to enable message passing and communication patterns required for DPA2/DPA3 models in distributed computing environments.

  • Adds custom Paddle operators for border communication (message passing) needed for DPA2/DPA3
  • Updates C++ API to support Paddle model loading and inference with communication dictionary
  • Fixes various tensor operations and API calls to be compatible with Paddle framework requirements

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
source/op/pd/setup.py Setup script for building Paddle C++ extension
source/op/pd/comm.cc Core C++ implementation of border communication operators
source/op/pd/CMakeLists.txt Build configuration for Paddle operators
source/install/build_cc_pd.sh Build script for C++ Paddle inference
source/api_cc/src/common.cc Updates to load Paddle operators and detect Paddle models
source/api_cc/src/DeepPotPD.cc Main Paddle C++ API implementation with communication support
source/api_cc/include/DeepPotPD.h Header updates for communication tensor storage
deepmd/pd/utils/stat.py Statistics computation utilities with type distinction support
deepmd/pd/utils/nlist.py Neighbor list utilities with API compatibility fixes
deepmd/pd/model/descriptor/*.py Various descriptor implementations with communication support
deepmd/pd/entrypoints/main.py Model freezing updates for static graph compilation
Comments suppressed due to low confidence (3)

source/op/pd/comm.cc:228

  • [nitpick] Consider using a named constant or compile-time check instead of runtime sizeof comparison for better performance and clarity.
                             communicator_tensor, nlocal_tensor, nghost_tensor);

source/op/pd/comm.cc:246

  • Incomplete documentation block. The comment should be properly formatted as a standard C++ comment or completed as a proper docblock.
                       paddle::Tensor& recv_g1_tensor_grad) {

deepmd/pd/utils/stat.py:129

  • The function has incomplete documentation. The existing docstring should describe all parameters and the return value, not just the purpose.
def _post_process_stat(

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 5, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

This update introduces a PaddlePaddle custom operator library (deepmd_op_pd) with full build and runtime integration across C++, Python, and CMake. New Python modules handle dynamic operator loading and interface with the custom op in descriptor code. The C++ inference backend and operator library support MPI-based border communication, with new logic for message passing and ghost atom handling. Numerous type annotation, tensor casting, and device handling improvements are made throughout the Paddle model codebase.

Changes

Cohort / File(s) Change Summary
Paddle Custom Operator: Build & Integration
source/op/pd/CMakeLists.txt, source/op/pd/comm.cc, source/op/pd/setup.py, deepmd/pd/cxx_op.py
Adds CMake and Python build scripts for the Paddle custom op (deepmd_op_pd), implements MPI-based border communication in C++, and introduces a Python module for dynamic loading of the operator.
Descriptor Integration with Custom Op
deepmd/pd/model/descriptor/repflows.py, deepmd/pd/model/descriptor/repformers.py
Integrates the custom operator into descriptor forward passes, supporting both spin and non-spin cases, with conditional fallback if the op is unavailable. Adjusts tensor padding, splitting, and communication logic.
Descriptor Type Annotation & Input Handling
deepmd/pd/model/descriptor/dpa1.py, deepmd/pd/model/descriptor/dpa2.py, deepmd/pd/model/descriptor/dpa3.py, deepmd/pd/model/descriptor/se_a.py, deepmd/pd/model/descriptor/se_t_tebd.py, deepmd/pd/model/model/make_model.py, deepmd/pd/model/model/ener_model.py
Updates type annotations for comm_dict parameters from optional dicts to optional lists of tensors, adjusts tensor casting (from .to() to .astype()), and refines input checks and device handling.
Descriptor Tensor Indexing & Masking
deepmd/pd/model/descriptor/env_mat.py, deepmd/pd/model/descriptor/se_atten.py, deepmd/pd/model/model/make_model.py, deepmd/pd/utils/exclude_mask.py, deepmd/pd/utils/nlist.py
Adds explicit broadcast=False to paddle.take_along_axis calls, replaces scalar paddings with tensor-based paddings, and removes explicit device placement in tensor operations.
Atomic Model & Fitting Enhancements
deepmd/pd/model/atomic_model/base_atomic_model.py, deepmd/pd/model/atomic_model/dp_atomic_model.py, deepmd/pd/model/task/fitting.py
Refines type annotations, adds hooks for intermediate output retrieval in fitting networks, implements input statistics computation, and updates statistics logic to use new flags instead of output definitions.
Model Output, Freezing, and Transformation
deepmd/pd/entrypoints/main.py, deepmd/pd/model/model/ener_model.py, deepmd/pd/model/model/transform_output.py
Updates freezing input specs, handles atomic virial outputs more explicitly, and refines output casting and device handling.
C++ Inference Backend & Operator Loading
source/api_cc/include/DeepPotPD.h, source/api_cc/src/DeepPotPD.cc, source/api_cc/src/common.cc
Adds support for Paddle backend, enables unconditional loading of the custom op, fully implements message passing and ghost atom logic, and updates tensor input handling for Paddle models.
CMake Build System
source/CMakeLists.txt, source/api_cc/CMakeLists.txt
Adjusts subdirectory inclusion for Paddle op, cleans up CMake files, and ensures correct build order and conditions.
Statistics Utilities
deepmd/pd/utils/stat.py
Refactors output statistics computation to use boolean flags for type distinction and intensiveness, updates shape handling, and clarifies function signatures and comments.
Miscellaneous Model & Network Changes
deepmd/pd/model/descriptor/repformer_layer.py, deepmd/pd/model/network/network.py
Removes explicit device placement in tensor creation and simplifies tensor handling.

Sequence Diagram(s)

sequenceDiagram
    participant PythonUser
    participant deepmd.pd.cxx_op
    participant deepmd_op_pd (Custom Op)
    participant DescriptorModule
    participant PaddleRuntime

    PythonUser->>deepmd.pd.cxx_op: import cxx_op
    deepmd.pd.cxx_op->>deepmd_op_pd (Custom Op): Try dynamic load ("deepmd_op_pd")
    deepmd.pd.cxx_op-->>PythonUser: Expose ENABLE_CUSTOMIZED_OP, paddle_ops_deepmd

    PythonUser->>DescriptorModule: Call .forward(..., comm_dict)
    DescriptorModule->>deepmd.pd.cxx_op: Use paddle_ops_deepmd.border_op (if enabled)
    alt Custom Op enabled
        DescriptorModule->>deepmd_op_pd (Custom Op): border_op(sendlist, ... , g1, ...)
        deepmd_op_pd (Custom Op)->>PaddleRuntime: MPI communication, tensor exchange
        deepmd_op_pd (Custom Op)-->>DescriptorModule: Return processed tensor
    else Custom Op not enabled
        DescriptorModule-->>PythonUser: Raise NotImplementedError
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Suggested labels

LAMMPS

Suggested reviewers

  • iProzd
  • njzjz

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

config->EnableNewIR(true);
config->EnableCustomPasses({"add_shadow_output_after_dead_parameter_pass"},
true);
// config->SwitchIrOptim(false);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
config_fl->EnableNewIR(true);
config_fl->EnableCustomPasses({"add_shadow_output_after_dead_parameter_pass"},
true);
// config_fl->SwitchIrOptim(false);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pd/model/descriptor/repflows.py (1)

595-673: Comprehensive parallel mode implementation with proper spin support

This implementation correctly handles both regular and spin modes in parallel execution:

Strengths:

  • Proper detection of spin mode via comm_dict length
  • Correct tensor padding and reshaping for both modes
  • Appropriate use of the custom border_op with CPU-placed integer tensors
  • Clean separation of real/virtual embeddings in spin mode
  • Proper use of concat_switch_virtual for spin tensor rearrangement

Potential concerns:

  • The assertion assert len(comm_dict) >= 6 appears twice (lines 595, 646) - consider consolidating
  • Complex nested conditionals for spin vs non-spin modes could benefit from helper methods

Consider extracting helper methods to reduce complexity:

def _prepare_node_embedding_for_parallel(self, node_ebd, comm_dict, nloc, nall):
    """Helper to prepare node embeddings for parallel processing."""
    has_spin = len(comm_dict) >= 7
    if not has_spin:
        return self._pad_regular_embedding(node_ebd, nloc, nall), nloc, nall
    else:
        return self._prepare_spin_embedding(node_ebd, nloc, nall)

def _pad_regular_embedding(self, node_ebd, nloc, nall):
    """Pad regular (non-spin) node embeddings."""
    # Current padding logic here
    
def _prepare_spin_embedding(self, node_ebd, nloc, nall):
    """Prepare spin mode embeddings with proper splitting and concatenation."""
    # Current spin logic here
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d20ef07 and 75c56b1.

📒 Files selected for processing (4)
  • deepmd/pd/model/atomic_model/base_atomic_model.py (13 hunks)
  • deepmd/pd/model/descriptor/repflows.py (8 hunks)
  • deepmd/pd/model/descriptor/repformers.py (8 hunks)
  • source/api_cc/CMakeLists.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • source/api_cc/CMakeLists.txt
  • deepmd/pd/model/descriptor/repformers.py
  • deepmd/pd/model/atomic_model/base_atomic_model.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
📚 Learning: refactoring between test classes `testinferdeeppotdpapt` and `testinferdeeppotdpaptnopbc` is address...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.

Applied to files:

  • deepmd/pd/model/descriptor/repflows.py
🧬 Code Graph Analysis (1)
deepmd/pd/model/descriptor/repflows.py (3)
deepmd/pd/utils/spin.py (1)
  • concat_switch_virtual (6-29)
deepmd/pd/model/descriptor/repformers.py (1)
  • border_op (57-71)
deepmd/pd/model/network/network.py (1)
  • Tensor (30-33)
🪛 Ruff (0.12.2)
deepmd/pd/model/descriptor/repflows.py

449-450: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
🔇 Additional comments (6)
deepmd/pd/model/descriptor/repflows.py (6)

13-16: LGTM: Clean conditional import structure

The import of custom operator components is well-structured and follows the expected pattern for optional dependencies.


41-43: LGTM: Appropriate import for spin functionality

The import of concat_switch_virtual is correctly added to support the spin mode functionality implemented later in the file.


58-79: LGTM: Robust fallback mechanism for custom operators

The conditional setup provides a clean fallback when custom operators are not available, with clear error messaging that references documentation. The assignment pattern matches other descriptor modules.


446-446: LGTM: Type annotation update aligns with interface changes

The change from Optional[dict] to Optional[list[paddle.Tensor]] correctly reflects the new communication interface used across descriptor modules.


450-451: LGTM: Appropriate guards for dynamic mode assertions

The addition of paddle.in_dynamic_mode() guards around mapping assertions is correct. These assertions are only meaningful in dynamic mode where JIT tracing doesn't apply.

Also applies to: 530-531, 576-577, 587-588


457-457: LGTM: Tensor compatibility improvement

Replacing scalar -1 with paddle.full_like(..., -1) ensures proper tensor operations and type consistency. This is a good practice for tensor frameworks.

Also applies to: 480-480

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
source/api_cc/src/DeepPotPD.cc (2)

213-213: Remove commented code.

This commented line should be removed from production code as flagged by static analysis.

-  // config->SwitchIrOptim(false);

222-222: Remove commented code.

This commented line should be removed from production code as flagged by static analysis.

-  // config_fl->SwitchIrOptim(false);
🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPD.cc (1)

14-165: Consider moving the Logger class to a separate header file.

The Logger class is a substantial addition (150+ lines) that provides comprehensive logging functionality. While the implementation looks solid with proper RAII semantics and configurable levels, it would be better placed in a dedicated header file for reusability across the codebase.

Consider extracting this to source/api_cc/include/Logger.h and source/api_cc/src/Logger.cc for better modularity and potential reuse in other components.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75c56b1 and 6f3c323.

📒 Files selected for processing (2)
  • deepmd/pd/utils/nlist.py (8 hunks)
  • source/api_cc/src/DeepPotPD.cc (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pd/utils/nlist.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.
📚 Learning: refactoring between test classes `testinferdeeppotdpapt` and `testinferdeeppotdpaptnopbc` is address...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: in `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional an...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: for code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#3875
File: doc/model/train-fitting-dos.md:107-107
Timestamp: 2024-10-08T15:32:11.479Z
Learning: For code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language specification as per user njzjz's preference.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
🧬 Code Graph Analysis (1)
source/api_cc/src/DeepPotPD.cc (1)
source/api_cc/src/common.cc (2)
  • load_op_library (412-439)
  • load_op_library (412-412)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
🔇 Additional comments (10)
source/api_cc/src/DeepPotPD.cc (10)

197-197: LGTM: Op library loading enabled.

The uncommented call to deepmd::load_op_library() correctly enables the Paddle custom operator library loading, which is essential for DPA2/DPA3 inference support as mentioned in the PR objectives.


269-344: Enhanced logging improves debugging capabilities.

The replacement of ad-hoc std::cout statements with structured logging using the new Logger class significantly improves code maintainability and debugging capabilities. The debug-level logging of hyperparameters will be valuable for troubleshooting model issues.


376-376: Input tensor name updated to match Paddle model specifications.

The change from "coord" to "extended_coord" aligns with the updated Paddle model freezing specifications mentioned in the AI summary.


380-380: Input tensor name updated to match Paddle model specifications.

The change from "atype" to "extended_atype" aligns with the updated Paddle model freezing specifications mentioned in the AI summary.


388-448: Message passing implementation enables distributed computation.

The implementation replaces the previous exception with comprehensive MPI-based message passing logic. The pointer-to-integer conversion workaround (lines 431-438) is a clever solution to Paddle's limitation with double pointer tensors.

Key improvements:

  • Proper tensor setup for MPI communication parameters
  • Safe type casting with assertions
  • Efficient pointer address conversion for sendlist handling

The assertion on Line 421 ensures pointer size compatibility before the conversion, which is good defensive programming.


445-448: Tensor handle promoted to member variable.

Converting mapping_tensor from a local to a member variable (this->mapping_tensor) is likely necessary to maintain tensor lifetime during asynchronous Paddle inference operations.


451-455: Tensor handle promoted to member variable.

Converting firstneigh_tensor from a local to a member variable (this->firstneigh_tensor) is likely necessary to maintain tensor lifetime during asynchronous Paddle inference operations.


457-461: Redundant parentheses removed in CopyFromCpu call.

The removal of redundant parentheses in CopyFromCpu(fparam.data()) improves code readability without changing functionality.


562-562: Input tensor name updated to match Paddle model specifications.

The change from "coord" to "extended_coord" in the second predictor aligns with the updated Paddle model freezing specifications.


567-567: Input tensor name updated to match Paddle model specifications.

The change from "atype" to "extended_atype" in the second predictor aligns with the updated Paddle model freezing specifications.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
deepmd/pd/model/task/fitting.py (1)

629-629: Remove invalid .to(device=...) method call.

This matches the previous review comment - PaddlePaddle tensors don't support the .to(device=...) method. This should be removed or replaced with appropriate PaddlePaddle tensor placement.

-                ).to(device=descriptor.place)  # jit assertion
+                )  # jit assertion
🧹 Nitpick comments (1)
deepmd/pd/model/task/fitting.py (1)

98-101: Use ternary operator for cleaner code.

The conditional assignment can be simplified using a ternary operator as suggested by static analysis.

-        if callable(merged):
-            sampled = merged()
-        else:
-            sampled = merged
+        sampled = merged() if callable(merged) else merged
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb6d855 and e698253.

📒 Files selected for processing (1)
  • deepmd/pd/model/task/fitting.py (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.
🪛 Ruff (0.12.2)
deepmd/pd/model/task/fitting.py

98-101: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

Replace if-else-block with sampled = merged() if callable(merged) else merged

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
🔇 Additional comments (10)
deepmd/pd/model/task/fitting.py (10)

7-7: LGTM!

The Callable import is correctly added and used in the new compute_input_stats method signature.


75-161: Well-implemented input statistics computation.

The method correctly handles both frame and atomic parameter statistics with proper:

  • Support for both callable and list inputs
  • Protection against division by zero
  • Tensor operations for computing means and standard deviations
  • Parameter updates using paddle.assign

337-338: LGTM!

The new attribute for controlling middle output evaluation is properly initialized.


461-463: LGTM!

The setter method provides a clean interface for controlling middle output evaluation.


523-525: LGTM! Proper tensor casting for PaddlePaddle.

The change from .to() to .astype() is correct for PaddlePaddle tensor type casting operations.


614-618: LGTM! Well-implemented middle output handling for mixed types.

The conditional middle output computation using call_until_last is correctly implemented and properly guarded by the evaluation flag.


625-640: LGTM! Comprehensive middle output handling for non-mixed types.

The implementation correctly:

  • Handles middle outputs for each atom type separately
  • Uses proper masking with paddle.tile and paddle.where
  • Accumulates results across all types

655-657: LGTM! Improved masking with paddle.full_like.

The use of paddle.full_like(atom_property, 0.0) for masking is more explicit and appropriate than using paddle.zeros_like.


662-662: LGTM! Proper boolean mask casting.

The use of .astype("bool") for explicit boolean casting is correct and clear.


665-666: LGTM! Improved return structure.

Returning a results dictionary provides better structure and supports the new middle output functionality while maintaining the original output under self.var_name.

@HydrogenSulfate HydrogenSulfate force-pushed the support_dpa2/3_infernce branch from dfa0dd0 to c2e337f Compare August 6, 2025 14:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
source/api_cc/src/DeepPotPD.cc (1)

213-213: Remove commented-out code or document its purpose.

These commented-out SwitchIrOptim(false) calls should either be removed entirely or properly documented if they're intended for future debugging/configuration use.

Also applies to: 222-222

🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPD.cc (1)

14-165: Consider moving Logger class to a separate header file.

This Logger implementation is comprehensive but adds significant code bulk to the DeepPotPD implementation file. For better modularity and reusability across the codebase, consider extracting it to a dedicated logging utility header.

The Logger class could be moved to something like source/api_cc/include/logging.h to enable reuse across other C++ components.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2e337f and 89743ff.

📒 Files selected for processing (1)
  • source/api_cc/src/DeepPotPD.cc (8 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.
📚 Learning: refactoring between test classes `testinferdeeppotdpapt` and `testinferdeeppotdpaptnopbc` is address...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: in `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional an...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: for code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#3875
File: doc/model/train-fitting-dos.md:107-107
Timestamp: 2024-10-08T15:32:11.479Z
Learning: For code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language specification as per user njzjz's preference.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
🧬 Code Graph Analysis (1)
source/api_cc/src/DeepPotPD.cc (1)
source/api_cc/src/common.cc (2)
  • load_op_library (412-439)
  • load_op_library (412-412)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Analyze (python)
🔇 Additional comments (8)
source/api_cc/src/DeepPotPD.cc (8)

197-197: LGTM: Critical for DPA2/DPA3 support.

Enabling deepmd::load_op_library() is essential for loading the PaddlePaddle custom operator library (deepmd_op_pd) that enables DPA2/DPA3 inference. This aligns with the PR's core objective.


269-295: Excellent logging improvements.

The structured logging replaces ad-hoc debug prints with proper leveled logging, providing valuable debugging information for DPA2/DPA3 model initialization while being controllable via log levels.

Also applies to: 309-328, 333-344


376-376: LGTM: Updated tensor input names for DPA2/DPA3.

The change from generic names to "extended_coord" and "extended_atype" aligns with the updated input naming conventions for DPA2/DPA3 models that support extended coordinate and atom type tensors for message passing.

Also applies to: 380-380


407-415: LGTM: Proper data type handling for sendnum.

The conditional type conversion for sendnum properly handles cases where the source data type differs from the expected int32_t, ensuring compatibility across different platforms.


421-421: Good defensive programming with pointer size assertion.

The assertion ensures that std::intptr_t is 64-bit before the pointer-to-integer conversion, preventing potential issues on platforms with different pointer sizes.


426-438: Clever workaround for PaddlePaddle limitation.

The conversion of pointer arrays to intptr_t addresses PaddlePaddle's inability to construct tensors from double pointers. The detailed comment explains the rationale well.


445-445: LGTM: Proper tensor handle lifetime management.

Storing tensor handles as member variables ensures they persist for the lifetime of inference operations, preventing premature destruction that could cause runtime errors.

Also applies to: 451-451


457-468: LGTM: Clean parameter tensor handling.

The optional parameter tensor setup properly handles both fparam and aparam cases with appropriate reshaping and data copying. Using local unique_ptr is correct for these immediate-use tensors.

@HydrogenSulfate HydrogenSulfate force-pushed the support_dpa2/3_infernce branch from 5a3f647 to 6c4309b Compare August 7, 2025 09:08
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
deepmd/pd/entrypoints/main.py (1)

376-377: Why is do_atomic_virial hardcoded to False?

The change from a dynamic boolean input to a hardcoded False value could break functionality for users who need atomic virial calculations. Is this intentional? If so, please document the reasoning. If temporary, consider adding a TODO comment.

source/api_cc/src/DeepPotPD.cc (1)

213-213: Remove or uncomment the configuration lines.

These commented-out configuration lines appear to be debug settings. Either remove them entirely or uncomment them with appropriate conditions if they're needed for debugging.

// Remove these lines entirely:
// config->SwitchIrOptim(false);
// config_fl->SwitchIrOptim(false);

Also applies to: 222-222

🧹 Nitpick comments (2)
deepmd/pd/model/descriptor/repformers.py (1)

207-208: Minor docstring improvement.

Removing the default value mention from the docstring makes the parameter documentation cleaner and more precise.

source/api_cc/src/DeepPotPD.cc (1)

20-165: Consider using existing logging framework instead of custom implementation.

While the Logger class is well-implemented with proper features (log levels, colors, timestamps, stream interface), introducing a custom logging solution in a large codebase like deepmd-kit may not be ideal. Consider using an established logging library like spdlog or integrating with the project's existing logging infrastructure.

The implementation itself is solid with proper RAII, move semantics, and thread-safety considerations for the stream interface.

If keeping the custom logger, consider these improvements:

// Add thread safety for static members
#include <mutex>
class Logger {
private:
  static std::mutex config_mutex;
  // ... existing members
public:
  static void setLevel(Level level) {
    std::lock_guard<std::mutex> lock(config_mutex);
    minLevel = level;
  }
  // Similar for other setters
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89743ff and 6c4309b.

📒 Files selected for processing (5)
  • deepmd/pd/entrypoints/main.py (2 hunks)
  • deepmd/pd/model/descriptor/repflows.py (8 hunks)
  • deepmd/pd/model/descriptor/repformers.py (8 hunks)
  • deepmd/pd/utils/serialization.py (3 hunks)
  • source/api_cc/src/DeepPotPD.cc (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pd/utils/serialization.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.
📚 Learning: when reviewing prs, avoid making refactor suggestions that are not directly related to the pr's chan...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.

Applied to files:

  • deepmd/pd/entrypoints/main.py
📚 Learning: refactoring between test classes `testinferdeeppotdpapt` and `testinferdeeppotdpaptnopbc` is address...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
  • deepmd/pd/model/descriptor/repformers.py
  • deepmd/pd/model/descriptor/repflows.py
📚 Learning: in `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional an...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: for code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#3875
File: doc/model/train-fitting-dos.md:107-107
Timestamp: 2024-06-13T16:32:13.786Z
Learning: For code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language specification as per user njzjz's preference.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
🧬 Code Graph Analysis (2)
source/api_cc/src/DeepPotPD.cc (1)
source/api_cc/src/common.cc (2)
  • load_op_library (412-439)
  • load_op_library (412-412)
deepmd/pd/model/descriptor/repformers.py (3)
deepmd/pd/utils/spin.py (1)
  • concat_switch_virtual (6-29)
deepmd/pd/model/descriptor/repflows.py (1)
  • border_op (60-74)
deepmd/pd/model/network/network.py (1)
  • Tensor (30-33)
🪛 Ruff (0.12.2)
deepmd/pd/model/descriptor/repflows.py

449-450: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Analyze (python)
🔇 Additional comments (20)
deepmd/pd/entrypoints/main.py (2)

371-373: LGTM! Enhanced flexibility with dynamic batch sizing.

The change from fixed batch size [1, ...] to dynamic batch size [-1, ...] for coordinate, atom type, and box tensors improves the model's flexibility to handle variable batch sizes during inference.


392-412: Well-structured communication tensor specifications.

The expanded comm_dict input specification properly supports the new communication tensor interface with individual specs for send_list, send_proc, recv_proc, send_num, recv_num, and communicator. The updated input names (extended_coord, extended_atype) also align with the broader codebase conventions.

deepmd/pd/model/descriptor/repflows.py (4)

13-16: Excellent custom operator integration with proper fallback.

The conditional import and assignment of border_op provides a clean fallback mechanism when the custom Paddle operator library isn't available. The stub function with clear error message guides users to the documentation, and the spin utility import is properly added.

Also applies to: 41-43, 58-79


450-451: Improved robustness with dynamic mode checks and tensor operations.

The addition of paddle.in_dynamic_mode() guards before assertions prevents errors during static compilation, and the replacement of scalar -1 with paddle.full_like(..., -1) ensures proper tensor type compatibility.

Also applies to: 457-457, 480-480, 530-531, 576-577, 587-588


446-446: Proper type standardization for communication interface.

The change from Optional[dict[str, paddle.Tensor]] to Optional[list[paddle.Tensor]] aligns with the codebase-wide standardization of the communication tensor interface.


595-675: Well-implemented parallel communication with spin support.

The parallel mode implementation properly distinguishes between spin and non-spin cases:

  • Non-spin: Simple padding along the atom dimension
  • Spin: Split real/virtual parts, concatenate along features, pad, then use concat_switch_virtual

The integration with border_op follows the correct pattern with CPU-placed integer tensors and proper tensor reshaping.

deepmd/pd/model/descriptor/repformers.py (4)

13-16: Consistent custom operator integration.

The implementation matches the pattern in repflows.py with proper conditional import and fallback mechanism for the border_op custom operator.

Also applies to: 38-40, 55-76


411-411: Consistent type standardization.

The comm_dict parameter type change aligns with the broader codebase standardization effort.


413-414: Improved conditional logic with proper mode checks.

The conditional logic properly handles both parallel and non-parallel execution paths with appropriate dynamic mode checks to prevent assertion errors during static compilation.

Also applies to: 439-439, 447-448, 465-467, 476-478


483-560: Consistent parallel communication implementation.

The parallel mode implementation follows the same well-structured pattern as in repflows.py, properly handling both spin and non-spin cases with appropriate tensor padding, border_op communication, and concat_switch_virtual for spin mode.

source/api_cc/src/DeepPotPD.cc (10)

14-19: New header includes for logging functionality.

The additional includes support the new Logger class implementation. The includes are appropriate for the functionality being added.


269-275: Good logging integration replacing debug prints.

The replacement of std::cout statements with the structured logger is a significant improvement. The logging provides clear information about model loading and CINN compilation status.

Also applies to: 277-284, 286-294


308-327: Excellent debugging output for model introspection.

The addition of debug logs to print input/output tensor names for both predictors will be very helpful for debugging and understanding the model interface. The lambda function for formatting the output is clean and reusable.


332-343: Good addition of hyperparameter debug logging.

Adding debug output for all the loaded hyperparameters (do_message_passing, rcut, ntypes, etc.) provides valuable insight into the model configuration during initialization.


387-438: Comprehensive message passing implementation.

This is a major improvement - replacing the previous exception-throwing placeholder with full implementation of message passing tensor setup. The code properly handles:

  • Multiple MPI communication tensors (send_proc, recv_proc, send_num, recv_num, communicator, send_list)
  • Data type conversion for sendnum to int32_t
  • Workaround for Paddle's lack of double pointer support by converting to intptr_t

The pointer address conversion approach is a clever workaround for Paddle's limitations.

One potential concern: The assertion assert(sizeof(std::intptr_t) == 8); assumes 64-bit systems. Consider adding a more descriptive error message or handling 32-bit systems gracefully:

static_assert(sizeof(std::intptr_t) == 8, 
    "This implementation requires 64-bit pointers for Paddle compatibility");

439-447: Proper mapping tensor implementation.

The replacement of the previous exception with actual mapping tensor creation is good. The code correctly:

  • Creates mapping with proper remapping via fwd_map
  • Uses int64_t as expected by the model
  • Stores as member variable for lifetime management

450-453: Member variable assignment for tensor handles.

Converting firstneigh_tensor from local to member variable ensures the tensor handle's lifetime extends beyond the method scope, which is important for proper resource management in the Paddle inference context.


456-459: Consistent tensor handling patterns.

The changes to use std::unique_ptr<paddle_infer::Tensor> and remove redundant parentheses in CopyFromCpu calls improve code consistency and readability.

Also applies to: 462-467


197-197: Confirm custom operator library and loader implementation

I wasn’t able to locate a built deepmd_op_pd shared library or the definition of deepmd::load_op_library() in the repository. Please verify that:

  • Your CMake setup (e.g. CMakeLists.txt) adds and installs deepmd_op_pd as a shared library.
  • The deepmd::load_op_library() function is implemented and calls the correct library filename (e.g. libdeepmd_op_pd.so, .dll, or .dylib).

375-375: Verified consistency of tensor input names across C++ and model definitions
Renamed inputs "extended_coord" and "extended_atype" are used in source/api_cc/src/DeepPotPD.cc (lines 375 & 379) and throughout the Python model definitions (e.g. InputSpec names and tests). No remaining calls to GetInputHandle("coord") or GetInputHandle("atype") were found.
No further action required.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c4309b and b929ae4.

📒 Files selected for processing (3)
  • deepmd/pd/entrypoints/main.py (2 hunks)
  • deepmd/pd/utils/serialization.py (3 hunks)
  • source/api_cc/src/DeepPotPD.cc (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • deepmd/pd/entrypoints/main.py
  • deepmd/pd/utils/serialization.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/atomic_model/base_atomic_model.py:202-202
Timestamp: 2024-10-16T21:49:57.401Z
Learning: When reviewing PRs, avoid making refactor suggestions that are not directly related to the PR's changes. For example, in `deepmd/dpmodel/atomic_model/base_atomic_model.py`, do not suggest simplifying `for kk in ret_dict.keys()` to `for kk in ret_dict` unless it's relevant to the PR.
📚 Learning: refactoring between test classes `testinferdeeppotdpapt` and `testinferdeeppotdpaptnopbc` is address...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: in `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional an...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
📚 Learning: for code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language...
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#3875
File: doc/model/train-fitting-dos.md:107-107
Timestamp: 2024-06-13T16:32:13.786Z
Learning: For code blocks in `doc/model/train-fitting-dos.md` that display commands, use 'txt' as the language specification as per user njzjz's preference.

Applied to files:

  • source/api_cc/src/DeepPotPD.cc
🧬 Code Graph Analysis (1)
source/api_cc/src/DeepPotPD.cc (2)
source/api_cc/src/common.cc (2)
  • load_op_library (412-439)
  • load_op_library (412-412)
source/api_cc/tests/test_deeppot_pd.cc (5)
  • firstneigh (225-225)
  • firstneigh (287-287)
  • firstneigh (370-370)
  • firstneigh (448-448)
  • firstneigh (510-510)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test C++ (false)
🔇 Additional comments (6)
source/api_cc/src/DeepPotPD.cc (6)

197-197: LGTM: Operator library loading is now properly enabled.

This uncomments the call to deepmd::load_op_library() which is essential for loading the Paddle custom operator library (deepmd_op_pd) as mentioned in the PR objectives.


269-344: LGTM: Improved logging with proper formatting.

The replacement of std::cout statements with structured logging calls using the new Logger class significantly improves code quality. The addition of debug output for hyperparameters will be valuable for troubleshooting. This addresses the previous review comments about debug print statements by implementing proper logging infrastructure.


439-447: LGTM: Mapping tensor implementation is correct.

The mapping tensor setup properly handles the remapping through fwd_map and the conditional logic correctly handles cases where mapping may or may not be present.


450-453: LGTM: Proper tensor handle lifetime management.

Converting firstneigh_tensor to a member variable (this->firstneigh_tensor) ensures the tensor handle remains valid throughout the computation, which is essential for proper memory management.


456-466: LGTM: Clean variable declarations and simplified method calls.

The removal of redundant parentheses in CopyFromCpu calls improves code clarity. The variable declaration changes are appropriate.


387-438: Ensure message‐passing path is tested and pointer size check is portable

I wasn’t able to find any tests exercising the do_message_passing == 1 && nghost > 0 branch in DeepPotPD.cc (lines 387–438). Please verify or add test coverage for this scenario to avoid regressions.

Also, to improve portability on 32-bit systems, replace the runtime assert with a compile‐time check:

- assert(sizeof(std::intptr_t) == 8);
+ static_assert(sizeof(std::intptr_t) >= sizeof(void*),
+               "intptr_t must be large enough to hold pointer values");

@codecov
Copy link

codecov bot commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 37.19165% with 331 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.35%. Comparing base (102a2ed) to head (6417f8c).
⚠️ Report is 76 commits behind head on devel.

Files with missing lines Patch % Lines
source/op/pd/comm.cc 0.00% 165 Missing ⚠️
source/api_cc/src/DeepPotPD.cc 53.28% 59 Missing and 5 partials ⚠️
deepmd/pd/model/descriptor/repflows.py 32.60% 31 Missing ⚠️
deepmd/pd/model/descriptor/repformers.py 33.33% 30 Missing ⚠️
deepmd/pd/model/task/fitting.py 50.87% 28 Missing ⚠️
deepmd/pd/model/atomic_model/dp_atomic_model.py 46.66% 8 Missing ⚠️
deepmd/pd/cxx_op.py 77.77% 2 Missing ⚠️
deepmd/pd/model/atomic_model/base_atomic_model.py 90.90% 1 Missing ⚠️
deepmd/pd/utils/stat.py 80.00% 1 Missing ⚠️
source/api_cc/src/common.cc 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4870      +/-   ##
==========================================
- Coverage   84.71%   84.35%   -0.36%     
==========================================
  Files         699      701       +2     
  Lines       68128    68567     +439     
  Branches     3541     3573      +32     
==========================================
+ Hits        57716    57843     +127     
- Misses       9277     9583     +306     
- Partials     1135     1141       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@HydrogenSulfate
Copy link
Collaborator Author

@njzjz this PR is ready for review.

Copy link
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state of this PR looks good to me.

@njzjz
Copy link
Member

njzjz commented Aug 8, 2025

@coderabbitai resolve

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 8, 2025

✅ Actions performed

Comments resolved.

@njzjz njzjz added this pull request to the merge queue Aug 8, 2025
Merged via the queue into deepmodeling:devel with commit 91ebe34 Aug 8, 2025
60 checks passed
@HydrogenSulfate HydrogenSulfate deleted the support_dpa2/3_infernce branch August 13, 2025 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants