Skip to content

Conversation

@caic99
Copy link
Member

@caic99 caic99 commented May 16, 2025

node_ebd_ext contains embedding on expanded atoms, which might be large for a large cut-off. Current implementation do matmul first, then gather tensor by the neighbors. This introduces saving sub_node_ext_update of size nf * nall * ndim in each repflow layer, where nall might be multiple times larger than nloc.
This PR do gathering first, then compute matmul. After this PR, the peak memory size is of O(nlayer * nf * nloc), unrelated to nall.

Summary by CodeRabbit

  • New Features

    • Introduced an option to enable or disable local mapping in descriptors, enhancing flexibility in descriptor computations.
  • Bug Fixes

    • Improved consistency in neighbor list and embedding processing related to local mapping usage.
  • Tests

    • Added extensive tests verifying descriptor output consistency with and without local mapping enabled.
  • Chores

    • Updated configuration, serialization, and argument handling to support the new local mapping option.

@caic99 caic99 requested review from Copilot and iProzd May 16, 2025 05:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes memory usage in the repflow layer by reordering the operations to gather the node embeddings before performing the matrix multiplication.

  • Reorders the gathering of node embedding data using _make_nei_g1 before the matmul operation.
  • Reduces peak memory usage by avoiding creation of a large intermediate tensor.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 16, 2025

📝 Walkthrough
## Walkthrough

A new boolean flag `use_loc_mapping` was introduced across the descriptor classes, argument parsing, and serialization logic in both PyTorch and dpmodel implementations. This flag controls whether local mapping and remapping of neighbor lists and embeddings are applied. Method signatures, serialization, and tests were updated accordingly, and comprehensive tests were added to verify output consistency with and without local mapping. Additionally, the `optim_edge_update` and `forward` methods in `RepFlowLayer` were refactored to move neighbor gathering outside `optim_edge_update` and add fallback handling for missing extended embeddings.

## Changes

| File(s)                                                                                   | Change Summary                                                                                                                                                                                                                                                                                                                                                       |
|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `deepmd/pt/model/descriptor/repflow_layer.py`                                            | Refactored `optim_edge_update` and `forward` method signatures and logic to move neighbor gathering (`_make_nei_g1`) from `optim_edge_update` to `forward`, added fallback for missing `node_ebd_ext`, and updated internal logic and calls accordingly.                                                                     |
| `deepmd/pt/model/descriptor/dpa3.py`,<br>`deepmd/dpmodel/descriptor/dpa3.py`             | Added `use_loc_mapping` parameter to constructors, stored as attribute, passed to repflows block, included in serialization, and updated forward logic to conditionally select atom types tensor based on this flag and parallel mode.                                                                                      |
| `deepmd/pt/model/descriptor/repflows.py`,<br>`deepmd/dpmodel/descriptor/repflows.py`     | Added `use_loc_mapping` parameter to constructors, stored as attribute, included in serialization, and updated forward logic to control local mapping/remapping and neighbor list handling based on the flag.                                                                         |
| `deepmd/utils/argcheck.py`                                                               | Added `"use_loc_mapping"` as an optional boolean argument with default `True` in `descrpt_dpa3_args()`.                                                                                                                                                                                                                      |
| `source/tests/pt/model/test_loc_mapping.py`                                              | Added new test module with tests for output consistency of `DescrptDPA3` with and without local mapping, covering multiple configurations and precisions.                                                                                                                             |
| `source/tests/universal/dpmodel/descriptor/test_descriptor.py`                           | Updated `DescriptorParamDPA3` to accept and propagate `use_loc_mapping`, and parameterized tests to cover both values of this flag.                                                                                                                                                   |

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DescrptDPA3
    participant DescrptBlockRepflows
    participant RepFlowLayer

    User->>DescrptDPA3: forward(coord, atype, nlist, mapping, ...)
    DescrptDPA3->>DescrptBlockRepflows: forward(nlist, extended_coord, extended_atype, extended_atype_embd, mapping, comm_dict)
    alt use_loc_mapping == False
        DescrptBlockRepflows->>DescrptBlockRepflows: Remap nlist and embeddings using mapping
    else use_loc_mapping == True
        DescrptBlockRepflows->>DescrptBlockRepflows: Use original nlist and embeddings
    end
    loop for each RepFlowLayer
        DescrptBlockRepflows->>RepFlowLayer: forward(node_ebd, node_ebd_ext, ...)
    end
    DescrptBlockRepflows-->>DescrptDPA3: node_ebd
    DescrptDPA3-->>User: descriptor output

Possibly related PRs

  • feat(pt/dp/jax/xp): add DPA3 descriptor #4609: Introduces the DPA3 descriptor and related classes including RepFlowLayer and DescrptBlockRepflows. Related by class and method but does not modify the optim_edge_update method or its signature, which are changed in this PR.

Suggested reviewers

  • iProzd

</details>

<!-- walkthrough_end -->

<!-- announcements_start -->

> [!NOTE]
> <details>
> <summary>⚡️ AI Code Reviews for VS Code, Cursor, Windsurf</summary>
> 
> CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
> Learn more [here](http://coderabbit.ai/ide).
> 
> </details>

---

> [!NOTE]
> <details>
> <summary>⚡️ Faster reviews with caching</summary>
> 
> CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure `Review - Disable Cache` at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the `Data Retention` setting under your Organization Settings.
> Enjoy the performance boost—your workflow just got faster.
> 
> </details>

<!-- announcements_end -->
<!-- internal state start -->


<!-- DwQgtGAEAqAWCWBnSTIEMB26CuAXA9mAOYCmGJATmriQCaQDG+Ats2bgFyQAOFk+AIwBWJBrngA3EsgEBPRvlqU0AgfFwA6NPEgQAfACgjoCEYDEZyAAUASpETZWaCrKNxU3bABsvkCiQBHbGlcABpIcVwvOkgAIm5KADMuImpYSkgMRRJIEmYBOlp4DCJIAsT8f0hmamZvUNjIAHc0ZAcBZnUaejlIbEQMhm0GAE4RiPwI9MglEm5mbK9iogB6WfnaMABrdT85/ER1SvkACltIMwAWAHZLy4BKDRhphlhMUnRaWn9EAeRcaZvDC0JYlfiJKY5AAGWSUAH0SAJaAiAB64KERMiISrhJoIV4KDC4bQYZB5ApfZbICp8Ego7iYJT0agsZCMxiYMqiFg5LzOD54sh9Q5gtCQPkUD4MPBgfCJCFUIr9DRGACSWABOWlFH8RJQzG40TYROo8HwGHCmuq1Ao8BR1W84kN8CG4nNkBONVwdS8934CSobqwHiSlTY9ES8AoiFw/A10wlHxh2QRSNR6MxpJxkAqPnwTRivVSmttYKtP0dy0z2L4AlaMXd5HgRFgAkqiCe7jZ3F4+DQBP8QSj0nsaAkVc5xRoFHD8GoORoWb4UPacNhJHTcOw3Fo84xcvs8AAXtCMBCAFSZNA+SCX4HwZgY4q5fuwPa8aTsU3uxJefMe/xuF/fM/T5WRKFxdIqhha8vAxE4rQwRwCj4A86QZYEYhZZhED9IYsAKQ8iAweBI3wqJ5ETDIAU5GE/wYeDEOQjID3o690AIHDHjcaZe24A4Yled4cn8KRoxHAFUEqJRa3kAMaU6MtpmLKD/WUIMcyjGNwn8WhsAYKsrUXGtDxPLkaQXaYvR9NTAzNDAngAQUSacplQBYijI78LUhHgSDQLZqjyY5hTQD5UF0/SYgIdAewofAUQfecvHkKEAHkTgwMCMjvC9Mno+4oUgl031QYolASTC9QPGCfChTsEGQfBuHEToj28/VeykZA2AWFxcnlF14DIBh5CadRYHwPBGCBIhDOmJC2FtIZfArLxcCaiEV2wAQ1xTOlcC3Hc9xVAwLEgABhFhjQ2+xHBqFweJyc0UoUVh2DjNzkE8G9B2CGNmlaa0lDKeR+gyeArHio9aEgyYHCIUgYxifxEkoUtShi9kainEkqyYUkkEXUbmgm9AeFEs1+jezpY2fK11mYTYdljQCDiOfqTiYVhdgAJgADn5gAGABmWgRl5gA2UREgAVjQEZZYARmuKXLjQNB+bQWhJclgXEmlkZLmlgRJYER5njKg0lgkt5YytNA4r7AlKk6o0vw0pRiXgLxkHpu2ad2ToW1jQj/GiCRMFjV2CnIMi518GKCcOGSc1dyTkFsJ4ADlJkSbAKBLGYkGlX57P4Pg0boOsGEClpvvi8cmRQeMtWu9gOyMfRjHAKAyHoA80DwQhSHIQNBPbokuF4fhhFEcRutBhQZJUNRNG0XQwEMEwoC7FA2SwIeCGIMh1In96p78NAmjupx+t6JgV9UdQtB0bue9MAxGdoFYWpWDySBeDWNIBgtoWqVBWIBYCTQ4TZQoBobgsgOAGFiKgs6lhHKqhPmPec9AHB33kAeISJRpBPUgFCZqrUES0FIEdXcNAMRsABIoIiGBqAFxyPXGawl6AxUAnyBgllTz7TTAdDEDIqBMIyONAE5Cmx7XhIiWgRU9gLHHEpU8SwYziOcGgKRtIiTDhSk8dUKdLLuRIMw2GvkPzjimsgP8c0GBTGoG9Tw3RyGrnXJubc9CSAYl6D6J0KUqzJkUaItEGIZFvjCRuMR6BgSQkPj2EJYIoRwhqFsDc8iiBKyiWTOiRMolA34f2Qs8gxRFH8GIB0614DOldOXGq8jvFKPybI2J6Z6qWzJEsRS85/jTCnJQWc85MgkGbK2V2KlKBVmRtwFuZjyGUIfNQ2hviTpqlbuQmkLQKDKKClYy00xDjEXYVULhGyPFYwYIIlqZR8AdNaUieCZ5ID2iyvgZxHy4RFGYH6dknIVnmnYp0pRXSPRvI+bBd5mRfkPgtnAHItiqbIEfpZVxukSCRnIPQMFLz0BsnsEsQR4I5EiORPErh/g1F0BMaSGg2twhigJkUIM7EuHa2bknFg7jTwTIUXEglvRHaGlkKEjJAVskCtyRiCZRd3T4spZElAEJdioCyLGXO5Bwiu0Vc8g5jyoLjQGKdC6sF/iTAoS1VZdB1nHQYaTSSWAoS7OcAcgsVQrnRUmAyX4ciBUGqfAy/yA8toGoheyLI4pzSkD4H65AhTtGnSRQMbhJC2gIBcr5JsLY2x8BmRjWyHUTjpMydK+AcJZV+imtHLaKzmBrI3N6jE7IpxWrdfslRbaiTxT0qSsUiRYI10CjFcG5LwkevSC6iN8SNWPIpvgJudLngiRCGSq0BNcDxV8NAhJ9BHEulUSSPoGBiGkFoKdc6jl1rqXspa3ySgGASm8ptXI9JKgeNdp4AQJLciGPEKQgwUAACyljJp4ObGw3AHC+gOroFwKESgIQNqbXQ+cJwBheESOEA1XACAUFeBoaAWJsyzrRPhyoRGSNLnCHaoVtBKOEdgMR0jFBcNaM4BMZjrHaM5n8lxmMfAAC8cR1yxD9Fvbj1G2MYkAEmE5DkP+iofR9DNBMOAJw5kClTGZN8ZaTp6TLGaM1jozQhjunjNsfCGjagXAhOQFE7EcTkm9BGd4zWYNjA+T+qhDYOYAAxP8TQAAyaBwIUC80hkgcwma/1wP/RYwDECgPqQRyBcxoGwPC5QBBsgoTAcgGBqxrDzk5G9YxxT2K04UD2bQDT2HcMUvTJZjz2ZVNKNayZ7MsBeZdeswkog0RUyVYI3p0z+Uib9f05xyViAtjTYm4gJoi3sxoDXJx1b7H0AbaJnNhb7nuvbfW8trbhVIAKeixCTt9WsNabw4dgb5GuNpRtSCrwABtMbVmlwAF0zO0M649vjvWtvMpKMNoH322vbaylN4HE24cxn22D+wK2Edrd2zGVH62keHRqPNnHcJTsY4oOd58z7WiJv89wIL+YwsRai9/eLiWlBAKUClsB6WoHBeyxFvLBXUGxC7p/Znf8AHs5AVziBtAGQizy8goX6DICYOwWfPB91nCEIhOeoDjlxk3zbPgaInIJF6MsRkKE4NYFfMlT2ZYxS2RfB9b5KEAARKXLU3dWEciLDElP/XJ23fpAj4R1DIFoF8oTyxwf0HDrglu6gE7Hg6uyAYtprwp40kwiDHYek5j5KUVAMZKhYQPos4kZ6cjUG3fAAQeBq+JITS7q07vPe4AAEL0S2DT6BiB/fmiEyHyo9LXc3cYeBxQzKY0rUgJHDPv7oRm58IAjJ2Qnx+17YoKKvDJjlRdGM0i5DuaNqKGIDfi7l2XtXTGpxNX7CAPnvNavnEIiyASP8Nj9+jLv5yOSQoVYXCuulWR+UIy+0QXga+SgF+weJAxyOQ+cN4UIB0/cdAcI1Av+GIxkrsqA4MtAAA3PwCWCanAXGK9FaJGNGLGHRF8q2pxMgKWigZhMiBgQkB9hwOEBwJ8gwL9lCACl6gMNfnvFhvPNFF/qgJqFgM3rvr5LgL/rkPkAAWCHAg1FqDCjFOWJlsFjIN3gAOTUiVB1aHIQaAxoqzQxCJDxTMCeKaYaA875iIAnAaDOHnYxToQkrqCvQri2H2FNAdg3ZOEuHdJIoUwkB2L9CvTcx8o9DTTYBISCFz7OBziL5yKwSO40zsANjTTsg0pLorohE/B1q+Q3bGFT4tzPrYBFBggEyuhkAdQHhyEJAKEUhVGlC1ruKqp7BBAhB0A6R5AkjIBxHAFXoYK3p2SD4TCPqiAvpBhvroSfoNjxo7R/rsBJ5AZQCOTO49D4DG7+RSG6L6LkLW70R27cAO4ejIZDzrTkLQAUDBB8GTGbqD7B5iB6o64+aJoe6c5e4+5+4txVaxY/zi5JYc6pbgIUBrBy4C4qgbFbEV6YADq4C17140BHEDA24MCnHLBcBG5wSTEB6fHt7e6+71SFYgaKCkTDT0BB53GvF8ArS+A1RfGgItRd5fI95aEOHtIIAurMkUA/G+4aBwhwjFDqDClYF75npeCVHQjHG241D24lA6KSIW7wJkkUmRgxDeEZ5LAngT4lZMlEm/ESnlHSkgxQixBymYkKlnElCxBeZWjp7J4nj0Bn7souAwlFYalUk7KGHur6kmGGnfG4DEl/ExRTh9pRQz7sTz7JHRDkLgGr4AKtqJKspJ7vavQiE1LYRv4f7Viuw0i5l/6KGUhgjFAdF1gJGKqJmQHJmekXTmHUkwqWEsA2HYZ2Gcl+GBEaCuFWp3YdlATaEaABHOE9lRZ8kCny7j4qhK7Aai4xYbAs4S7JZgnc6dkdiIKK5oLXpYKjzq63wPTa7ppIxGD67kCG47Em77EqmuRW7oknE2nnGcpwkaHTBt7Blsm1y97aH+4fH7r2AEZl4JLwlV4cTIkN6j6t7j6lHWJWgor2KErp6zFdHYBRihKPlKn/nIFoioEsGNFxL5BTpChQgn7wrn77yQDaqcKUBrq0pCEnIeFgg1RMFMjoH4UIiEUmlQjsGcHcGcG8GmG5B9LFC4KEHFDIxMrkKsEEUEqoBVLzyvRU5QYWFWHkIsVoHSUcUEptq3SIBvBNFU6UDXKTCETcVngqmIAcZfLhD9kYDwrMC8EpqAiMightFbQYVEApl4rPbpEeSUku4VT9yfSIQkA3x3kbgPmirnG/hhRPAADq06x+10ZFGYqAVFFcaJEV8pUVmFqAtxwQ4QUIHlF+Pw+lWEiS6EjIvRcinGJV/RPY5SkAhaVYLZ1hRVOVnlseE6DGEKxelikxVF8ViVJFyVbpF+6V7I4VGJWJuVyAAW14AwhVPlFFhaKlrZSqGI/Q6FHVwRLw10cRB+nsLo7ppwI1rAKVDxgVVUsYh6ziNKAxp6wxZCDJkx/kBIcC+hvptW/pMFglFWr19IHhFEPAVO3VI2ragyCBWksY/Ijg7ATwaUYksE8Bx5I4EZ2+A6BuBeYUmV01xVLisYW6O6yAgoRcbEvgHl/591ipbl4yky+a4oRMB89A/+pZRAyA44Yo7VNNraVQoqNs9AgoLqpFY1FFVFzKYxz+hI26xuBe/47I9CYo26mAiACkr66Aty0kywxiRg5gox046tr5OQT6Mx96ZK8xhcixPAyxR6qxgGiAXcl0zxtJBGpWMGXq8GlWV2kAwpoph0cI3ZNlMWlWr27Kn2AAqiROaB9lOOEJxjHUSL9knY5pReaCQOdpdkpr7SRP7YHQ/ghpAKHfZNeB9pHfZAnWEIzTGBXUnb9ineLcKFldaR1TiZeSnflenV5gSeQhOZ3t3t+Vyf8dFoCUuSCVLmlhAr4RuflmSZPpBmcu7eVp7YhkpgEXdhxvDtDkdnRjhcwXCEwNJKjupXhb/kfbvaxZpeSCHW9mwp9lvWxnXaJg3R5VwEXe9l9lRj9jWI/anTqjTCla/TfSXW6R9kJpaJ/TDsnU/WnRnVVtdn6fsg1vdptqTjvYuKxQffsmfegxpfhdg7hWxb/lpaNhA9vdaDTYA2HR/Txkdj/Q3SLcdZQ8XZ9iA2A6TlA7/Z3f8d3e+SyX3eyQPX4UzguXFsCWziudLhCVPQLigmgnOWAEYMzrLsuaCZIxloOQ4QrrI8LjuWruPBrgQmSrrg7QYGeaFQ8rsabgcaqbjZFTzU6m+JUtilcbGJHNKc9FtB3ekVyi3ntQyi7W8a7r3Z+RyRo0I95lTg1B4NY65MXoBcyOXuJZXoieBaid4e2VaTNZ5aoTYTqceP4r9Q0ScubhE/6pcp7ZMcUBUSDK3pk/jTXraCiTkP7DkE6Zni6fwHgB0W6cXR6anUQapLdeEETbLdArqrSDqK7ECCCIAUDE9crjegbchUbTMNMboshWhB+pbQPEsb+rbQBsNCY1AFdP48PnwKctBrBhVohsE/3euUKSKTneKUPUo/MGPZzhPVI/c4ghiJAAYJACrnCWbvoohnUy3RY74KJl4wC/84C83EkwiS/qkyQIhv2WCzTSnei7aZ5doyLgo1/CIz/HgD7IgCsPyK8KIFsFo0rro3ufoweVrkY+YSY45IfJSGHcpswxC3segJKHDXqBaVi8sPaQ4+TJcY6Ike42SlCNC8+dypMFaJxmSrDTdMgP4O7bikvBQUMRpNFsGb8nLugZKI4YVFE2yHyzdBRaKrhegH+CUEsnSETFWKo587y0QPy7dPHsZUWc0UoaUGoEDOyI/mIPFCRM4gTJGO6+MVgKza0TnJMEaoMMy5MUM7kJM3wNM65eM9LTunLTfJ6jkDUEoKdHrSrmMYbQq9MCbes2bZs/xNsxlT+isQc+sSrhax9D45Vn8wC1AFCI5O20SCcJafedlTTbEOELibqkA14MJh3eEOK+tLO3cVw92/8fnGerq6oy1Aa2gCLEa+zScOTi6szsS77GS5KBS7XDI7ORAPi68yo+PeCZCbu9S9uRgruafPS/goeUy8JCy1jbiTy8CzY1NXY9ixfp248X40PnSRum+TTnTk0P2+zb+aDWnvE4SsBQi6BQ03XhBfnsB7E5vtUzEC08XGIL0/IOqwXJq70K3m07qQUzniwjVAh8Fsh/3k8AAMoPg+zOApQo28P8khnGmlNorO1nMUX/U3J3LUFCtKnFQEhxOl4JNYcMqItgWNMN7/nSEzD5gaiVvQi3MCPrkD6nOwd6RFqbr+D1EQit5T10HIs5Ogejvgead4eomoALWSkkf0Bkc9Mgr9TUcUC0fyD0ezLtNMdz3Su92hndK5wDMZCpsjO7rBbZvoxTMuX4zJvODNNb79p0qluLN3oTErPVvRtzFbNfq7PNuRCHOO0nMweu0XNlZwZ+Je1sf5gccPN+3PPPjD2LnKPvOrky5Qk/OVNSkym2OufYkQvt3LsFawnwvqc4dIlac0Cou2HyeeVD2ddIfGs9dPNwjCMj1DfiMutPuy4vs/Oz0lYteL1te4KIZ7fdcMf5NIOubkepXHuEuQkPsfOXdjf5YTckfkLDtN1ZOxD2ZbcjvN32NkfBeasBdsKPTHMSewf3dXPL091GmCnZ1inHcvO/dneAISOfPPvy7jfhmTfmnbet2y1QsLeFabHLcxgae4dNObcZOw9ZPjm4/y74/+0neDdvPnePvpZXeU8z2gbReY8e3tc3Pt4hOCMdiC/PMvXRL9exe/GHcE8mlVNmmyk88eXCbbeLdFay9QatfXM4/BmhkaBvcngfe6Buai39f3vDdqOS8C4g+G9g/bdQ8P7c8Q/40I+WI0cxDI9a64vyNGDYgFyCIrA0AxiktiOk/J+HRgfLCvs6Pvt6MJ7fuMtELMunlY0Z/VDb7xlQgZ940dU+9lT5c75Qc5BWCyDQCf1gCVmR+Esg21xhSkEVClxVjuhiSkTioaKEiHDIxnpHmt7a8kmrMA+u0PiGh5AezlzRL/nRK1q+Tk3kPYv8atd1GL70ULjro8MkYxjz8iwhZfIgY7UtxQTh45iqVW450Z+sYxjmoDBeXkylyv7l8hgaaGViEG/4kAeOEOEgAFhVJxUJo2cWqiYljDIZigEkJoAq3XTMdaAiARDDX2ThEwRowPSargMHz4CZ+e0LDKSQMAd4jUmIFPk1U/ZjIqAwIVsmwE5DodGQ7qVZpzQ0jYFow3+KtuL0qCfVfCbrD1szRQBLNnoYkN6GoGgy1sIQhHSgMRzNJVhvUg2eMjFXZrhA1BMYWQNEEsp7AU42AdiH7WdIdRMBBgqIj8EODuhx4BgoJPADAD0YgovwfvgoDiIbRwgH4AyDYI1C/4DB46NCBGzhA0AkQ2NLQf+QbRZ5y4mgvPAFldhvU3wig85pYktBoDbefDOLiBUETmstQNnboIABQCNOqKz36U0igiAFQNEGZCJJihx/KoTkwu7L81WcRT6HpAaQJ5MGVRecMygAG/5cMEyPNK7E4xdVKavAh9IcB9BjI6wuAJThgHcSdwDAIRdongGpCqUjI6QxoTgRW45DeWbcA0Ll3oBbUwQ2cRwK30+pQhDKhcdAj4GfQCRuSoRHwfZDABXUlAeoAgNECYE7C3CpIWDItFmSz5AgqFNxiNBIDOdiBhMafqNHIGAIsC66LlOmVvqZlLEEeWQGwk6DOIsyGkZIZvl8giCVWHcJ4NQNkQZ9BiaaSMCiBiADA6ABhPgIBEjIGQ1ASwOQlTRaHugniZ4ZsBwldJhEXQnCMmKKnii8A5wqJXgTMGoBK1/BTldyJX2RSNx4AHONxP4GnSHApACgJGh8CKbG1BBFAT6qUI6qH8HuTAHUPPHID+pegbjeUaaCYpdNpoeAyERUlAQHAeolYVfoSEjYFx1a7IbwUgDNpxFU4KXKoVfGYFtRSOcwlYSWwWbltlmhnVZpTgq7m0quVtJtvszq6tsLof5Ttohkv4hl+et/BgPfxponADqSJEIJ/1wBgDwg2YsARAKGxQCYBcAzjEe08RTRCMJAJPiEFT4JZlyNfLPiUBkYy8SsmYuBg/lwDh1uAzvKTFRS7p/kQBV/XMXfx2q3cTCQ472uCKn7ExZAE4tzFOO4Yzjsx1/PMQWPA5LiWEK4pTGuNIFQisgWGLcZw2nGg1ZxOYu3r8UPGLib2n8ePq2PbEp8VgB1MSBUPZyi9SemwiEjXxAk58FmH7HBB4kL79Ri+f7MhOuwo6Ko+SnzKwAcSyFcI/KmpGQlynJjnluWVjG8pbjN6isnGQ6CVm42CDSsvGZrEGsRL4AN9fO/xTQhq0j7HVKOp6VOFaCySpRwetfMdt0hZ4IjYIsgFGskOiHuggyI3CgOhMkSYSgY0nHzn70FbG8OqIrQNvRPNyuRN+/EMuCkSokjgoQJwOdpAHmq+xO6qhNNMY3/SVCFw+ARGK5VdykTgUwYDUB2IJqN0voi/GSVpMOJISgwIxMtkszNplc1m8YutgsR2bW09mziO2vV0KwBYdW5cOXkvQV7DjUJ4JOSXolDLdkmxK4FsYnxJG/iSI/468H9097k8wJ2on3gpkg7JDpucPbFku3uIx934O8f9GGhwDHw6WHQyeFxioA3xYJ8gB+NkCoDPx14b8LeAYA6kn4xS8oxAHCEpihU0CbPQuJvEMAzTe4nwXmCLFliyx+YAsS4LQEuCHTEgSsWWCMCVj8wRgJASWNcCRC0AGAtwQoCLBIAixbgtAAQBtK2kQBIA1wWWJLBGCJBrg/MXWLLESDqxLgIwa4CMC1gfSRg4sEWAwGFj8wGAvMEgCMAYAGxVYtACEN3Fml1gQZiQWgNcDFgMBLgQsa6bzEuCIgGAdYJWDcESBCxLgvMWWJcHpkiwjYVMkYN9IJnbTeYIwXWNcAYAQzpYssbFCMD2n7TgZvMa4ErDGAiwlYXM1WGgElgqBeYtAXmGrJ+kdS8Z/MAGdjNZlCxsUSsRICLD2kGzMZVM5WKbGVlaykQMMmGbzAECXBdZ20h6SLLxkqAlYksBgErDQB0BbkmskgPzESBjABAwsEgLzGVkjA0AosMGYkEjnuy/pMsjmYDIEBixEgRswOQwDFjXTLgAgXmbQBViywnpiIXmDHN3ZoBZYKcqAP2GFnXASA+082QIGxmkyhYkc7WSQCFi7SpZGsSWCzKTlCxgZ8sIWHXPQAixoZSgJWLQCFhozZYz04WLzGxlCxJYJMlQAdKFi2zPZ/ML6ebJFgTzkZgibObcgKD0zZYAgEeRbIDn3S2ZksAOf7KbmVyr588lmYfP5l/TSZysEgHcC+D0z/Zs8rWMrNBlSzGZssB6QwC+m8ySAn0oWHjInlXSH5SsaOfzBFhCxRYaAAGZjL3l7TgZSsBgOvIEAuy2ZyGWWLtJQU/TfpUAOaYdAWlLSeRK0ylIkn0BAA= -->

<!-- internal state end -->
<!-- finishing_touch_checkbox_start -->

<details open="true">
<summary>✨ Finishing Touches</summary>

- [ ] <!-- {"checkboxId": "7962f53c-55bc-4827-bfbf-6a18da830691"} --> 📝 Generate Docstrings

</details>

<!-- finishing_touch_checkbox_end -->
<!-- tips_start -->

---

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

<details>
<summary>❤️ Share</summary>

- [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai)
- [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai)
- [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai)
- [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)

</details>

<details>
<summary>🪧 Tips</summary>

### Chat

There are 3 ways to chat with [CodeRabbit](https://coderabbit.ai?utm_source=oss&utm_medium=github&utm_campaign=deepmodeling/deepmd-kit&utm_content=4744):

- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
  - `I pushed a fix in commit <commit_id>, please review it.`
  - `Explain this complex logic.`
  - `Open a follow-up GitHub issue for this discussion.`
- Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples:
  - `@coderabbitai explain this code block.`
  -	`@coderabbitai modularize this function.`
- PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
  - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.`
  - `@coderabbitai read src/utils.ts and explain its main purpose.`
  - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.`
  - `@coderabbitai help me debug CodeRabbit configuration file.`

### Support

Need help? Create a ticket on our [support page](https://www.coderabbit.ai/contact-us/support) for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

### CodeRabbit Commands (Invoked using PR comments)

- `@coderabbitai pause` to pause the reviews on a PR.
- `@coderabbitai resume` to resume the paused reviews.
- `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
- `@coderabbitai full review` to do a full review from scratch and review all the files again.
- `@coderabbitai summary` to regenerate the summary of the PR.
- `@coderabbitai generate docstrings` to [generate docstrings](https://docs.coderabbit.ai/finishing-touches/docstrings) for this PR.
- `@coderabbitai generate sequence diagram` to generate a sequence diagram of the changes in this PR.
- `@coderabbitai resolve` resolve all the CodeRabbit review comments.
- `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository.
- `@coderabbitai help` to get help.

### Other keywords and placeholders

- Add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed.
- Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description.
- Add `@coderabbitai` anywhere in the PR title to generate the title automatically.

### CodeRabbit Configuration File (`.coderabbit.yaml`)

- You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository.
- Please see the [configuration documentation](https://docs.coderabbit.ai/guides/configure-coderabbit) for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json`

### Documentation and Community

- Visit our [Documentation](https://docs.coderabbit.ai) for detailed information on how to use CodeRabbit.
- Join our [Discord Community](http://discord.gg/coderabbit) to get help, request features, and share feedback.
- Follow us on [X/Twitter](https://twitter.com/coderabbitai) for updates and announcements.

</details>

<!-- tips_end -->

@caic99 caic99 marked this pull request as draft May 16, 2025 05:28
@codecov
Copy link

codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.69%. Comparing base (43e0288) to head (ad23558).

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4744      +/-   ##
==========================================
- Coverage   84.69%   84.69%   -0.01%     
==========================================
  Files         697      697              
  Lines       67474    67473       -1     
  Branches     3540     3541       +1     
==========================================
- Hits        57147    57145       -2     
+ Misses       9197     9196       -1     
- Partials     1130     1132       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
deepmd/pt/model/descriptor/dpa3.py (1)

472-478: Minor: collapse trivial if-else

Once the above fix is applied the conditional disappears; you can simplify with the ternary suggested by Ruff:

atype = extended_atype[:, :nloc]  # identical for all cases
🧰 Tools
🪛 Ruff (0.8.2)

473-476: Use ternary operator atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype instead of if-else-block

Replace if-else-block with atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype

(SIM108)

deepmd/pt/model/descriptor/repflows.py (1)

458-464: Unused loop index & readability

Variable idx in for idx, ll in enumerate(self.layers): is never used.
Rename to _ to silence linters and convey intent:

-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
🧰 Tools
🪛 Ruff (0.8.2)

464-464: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad23558 and 7569f78.

⛔ Files ignored due to path filters (1)
  • mem.pickle is excluded by !**/*.pickle
📒 Files selected for processing (4)
  • deepmd/pt/model/descriptor/dpa3.py (1 hunks)
  • deepmd/pt/model/descriptor/repflow_layer.py (6 hunks)
  • deepmd/pt/model/descriptor/repflows.py (4 hunks)
  • deepmd/pt/train/training.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt/model/descriptor/repflow_layer.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/pt/train/training.py (1)
source/tests/pt/model/test_model.py (1)
  • step (414-416)
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/dpa3.py

473-476: Use ternary operator atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype instead of if-else-block

Replace if-else-block with atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype

(SIM108)

deepmd/pt/model/descriptor/repflows.py

464-464: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (2)
deepmd/pt/model/descriptor/dpa3.py (1)

479-486: Pass node_ebd_inp after slicing

After fixing the slice, keep the call unchanged; no further code change is required.
If you decide to support [nall] in the future, remove the shape assertion inside repflows instead.

deepmd/pt/model/descriptor/repflows.py (1)

458-464: Gathering with mapping – ensure dtype/device compatibility

torch.gather will fail if mapping is on CPU while nlist is on GPU (or vice-versa).
Earlier code expanded tensors on the same device. Please confirm mapping is already on the same device and uses torch.long; otherwise add:

mapping = mapping.to(nlist.device, dtype=torch.long)
🧰 Tools
🪛 Ruff (0.8.2)

464-464: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

@caic99 caic99 marked this pull request as ready for review May 16, 2025 07:27
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pt/model/descriptor/repflows.py (1)

462-462: Consider renaming unused loop variable.

The loop control variable idx is not used within the loop body. Consider renaming it to _ or _idx to indicate it's intentionally unused.

-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
🧰 Tools
🪛 Ruff (0.8.2)

462-462: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7569f78 and cbaf7fd.

📒 Files selected for processing (1)
  • deepmd/pt/model/descriptor/repflows.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py

462-462: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (5)
deepmd/pt/model/descriptor/repflows.py (5)

376-376: Good change in the function signature.

The parameter extended_atype_embd is now required (non-optional) in the function signature, which aligns with the PR's memory optimization goal by requiring the embedding tensor upfront.


402-404: Simplifying code by direct assignment.

The direct assignment of extended_atype_embd to atype_embd with a shape check is cleaner than the previous implementation and matches the PR's goal of improving data flow.


456-461: Great memory optimization approach.

Setting node_ebd_ext = None and remapping nlist directly is a key part of reducing memory usage. This matches the PR objective of avoiding the creation of large intermediate tensors, helping to reduce peak memory from O(nlayer * nf * nall * ndim) to O(nlayer * nf * nloc + 1 * nf * nall).


465-465: Effective conditional approach to tensor creation.

Only creating and populating node_ebd_ext when comm_dict is not None is an important optimization that avoids unnecessary memory usage when not needed.


523-523: Properly adjusted layer forward call.

The explicit passing of node_ebd as the first argument to each layer's forward method aligns with the reordering of operations described in the PR objectives - enabling the gather-then-matmul approach instead of matmul-then-gather.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
deepmd/dpmodel/descriptor/repflows.py (1)

174-206: 🛠️ Refactor suggestion

use_ext_ebd is stored but never referenced

The NumPy implementation now accepts use_ext_ebd but the flag is
not used anywhere in the class (unlike the Torch counterpart). This
creates behavioural drift between the two back-ends.

Consider wiring the flag into call() the same way the Torch version
does, or remove the argument to avoid a misleading API.

🧹 Nitpick comments (3)
deepmd/pt/model/descriptor/repflows.py (1)

469-476: Unused loop variable idx — rename for clarity

for idx, ll in enumerate(self.layers): never uses idx.
Rename to _ or drop enumerate:

-for idx, ll in enumerate(self.layers):
+for ll in self.layers:

This silences Ruff warning B007 and improves readability.

🧰 Tools
🪛 Ruff (0.8.2)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

deepmd/dpmodel/descriptor/repflows.py (1)

491-497: Minor: drop unused idx in layer loop

for idx, ll in enumerate(self.layers):idx is never read.
Replace with for ll in self.layers: to avoid the dead variable.

🧰 Tools
🪛 Ruff (0.8.2)

492-492: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

source/tests/pt/model/test_dpa3.py (1)

48-70: Test Cartesian explosion – consider parametrisation throttling

Adding use_ext_ebd doubles the existing cartesian product of
parameters. The outer loop already iterates over ≈ 200 combinations;
doubling it may push CI runtime over limits.

If exhaustive coverage is unnecessary, mark a reduced subset with
pytest.mark.parametrize or filter combinations (e.g. run
use_ext_ebd=True only for one representative setting).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbaf7fd and 29667c5.

📒 Files selected for processing (5)
  • deepmd/dpmodel/descriptor/repflows.py (2 hunks)
  • deepmd/pt/model/descriptor/dpa3.py (6 hunks)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
  • deepmd/utils/argcheck.py (1 hunks)
  • source/tests/pt/model/test_dpa3.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • deepmd/utils/argcheck.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt/model/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (1)
deepmd/pt/model/descriptor/repflows.py (1)

404-408: Shape assertion may be too strict after removing the slice

atype_embd = extended_atype_embd followed by
assert list(atype_embd.shape) == [nframes, nloc, self.n_dim]

If extended_atype_embd still contains embeddings for all atoms (nall),
the assertion will fail. Either restore the slice ([:, :nloc, :]) or relax
the assertion:

-assert list(atype_embd.shape) == [nframes, nloc, self.n_dim]
+assert atype_embd.shape[0] == nframes and atype_embd.shape[-1] == self.n_dim
+atype_embd = atype_embd[:, :nloc, :]   # ensure local part

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pt/model/descriptor/repflows.py (1)

469-469: Rename unused loop variable.

The loop control variable idx is not used within the loop body.

-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
🧰 Tools
🪛 Ruff (0.8.2)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b7cdfa and 5594c56.

📒 Files selected for processing (3)
  • deepmd/dpmodel/descriptor/dpa3.py (7 hunks)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
  • source/tests/pt/model/test_dpa3.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • source/tests/pt/model/test_dpa3.py
  • deepmd/dpmodel/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (4)
deepmd/pt/model/descriptor/repflows.py (4)

188-188: Good feature addition to enhance memory efficiency.

Adding this flag to control embedding handling is a clean way to implement the optimization described in the PR.


456-468: Reversed gather and matmul order as intended.

This is the core of the optimization described in the PR. By performing the gather operation first, we reduce the size of the tensors involved in matrix multiplication operations, which reduces memory usage while preserving numerical results.


472-476: Conditional computation of node_ebd_ext reduces memory usage.

This conditional approach ensures that node_ebd_ext is only computed when needed, rather than always maintaining a large tensor throughout the computation. This directly addresses the PR objective of reducing peak memory usage.


534-537: Updated method call signature to support the optimization.

The explicit passing of node_ebd as the first argument aligns with changes in the RepFlowLayer.forward method.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
deepmd/pt/model/descriptor/repflows.py (1)

469-469: Unused loop control variable.

The loop control variable idx is not used within the loop body.

-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
🧰 Tools
🪛 Ruff (0.8.2)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5594c56 and ac6677e.

📒 Files selected for processing (1)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (5)
deepmd/pt/model/descriptor/repflows.py (5)

188-188: LGTM: New parameter added for memory optimization.

The new use_ext_ebd parameter is well-defined with a default value of False to maintain backward compatibility while enabling the performance optimization described in the PR.


219-219: LGTM: Instance attribute properly stored.

The flag is correctly stored as an instance attribute for later use in the forward method.


456-460: LGTM: Conditional expansion of mapping.

This code block properly initializes the 3D mapping tensor for extended embeddings only when needed, which aligns with the optimization goal.


473-477: LGTM: Core optimization implemented properly.

This is the key change described in the PR - gathering node embeddings before matrix multiplication to reduce peak memory usage. The code correctly implements the optimization by gathering node_ebd into node_ebd_ext when use_ext_ebd is True.


535-547: LGTM: Method call updated appropriately.

The forward method call has been updated to explicitly pass node_ebd as the first argument, consistent with the changes in the RepFlowLayer class.

@iProzd
Copy link
Collaborator

iProzd commented May 20, 2025

iProzd@28803d9 maybe you can refer to and be consistent with this?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
deepmd/pt/model/descriptor/dpa3.py (1)

100-116: ⚠️ Potential issue

use_loc_mapping parameter missing – constructor raises NameError

self.use_loc_mapping = use_loc_mapping (line 165) references a variable that
is not in the function signature (only use_ext_ebd was added). Importing
the module therefore fails immediately.

-        use_tebd_bias: bool = False,
-        type_map: Optional[list[str]] = None,
-        use_ext_ebd: bool = False,
+        use_tebd_bias: bool = False,
+        type_map: Optional[list[str]] = None,
+        use_ext_ebd: bool = False,
+        use_loc_mapping: bool = True,

Remember to propagate the new flag to:

  • serialize / deserialize
  • the tests (already parameterised)
🧰 Tools
🪛 Ruff (0.11.9)

107-107: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

♻️ Duplicate comments (1)
deepmd/pt/model/descriptor/dpa3.py (1)

476-485: ⚠️ Potential issue

Still passes nall-sized embeddings to repflows

When comm_dict is not None and use_ext_ebd is False,
atype = extended_atype keeps shape [nf, nall], violating the assertion
inside repflows.forward ([nf, nloc, n_dim]).
This was flagged in a previous review but not fully resolved.

Simplest fix → always slice to local atoms:

-        if comm_dict is None or self.use_ext_ebd:
-            atype = extended_atype[:, :nloc]
-        else:
-            atype = extended_atype
+        atype = extended_atype[:, :nloc]

node_ebd_ext is already reconstructed inside repflows, so no functionality
is lost.

🧰 Tools
🪛 Ruff (0.11.9)

476-476: Local variable parrallel_mode is assigned to but never used

Remove assignment to unused variable parrallel_mode

(F841)

🧹 Nitpick comments (5)
deepmd/pt/model/network/utils.py (2)

101-103: Memory blow-up when building 4-D angle mask

a_nlist_mask_3d = a_nlist_mask[:, :, :, None] & a_nlist_mask[:, :, None, :]
creates a tensor of shape nf × nloc × a_nnei × a_nnei; for typical
water-sized systems with nloc=10 k and a_nnei=64 this is already ~320 MB
(BOOL). The subsequent 4-D dihedral mask scales as d_nnei³.

Consider:

  • building the indices with torch.nonzero on the 2-D mask to avoid the
    O(N²) expansion, or
  • restricting it to the actually used neighbor count (torch.sum(mask_i)).

60-66: Docstring misses use_loc_mapping argument

get_graph_index accepts use_loc_mapping, but the parameter list and
description omit it. Please add for clarity and to keep sphinx/autodoc happy.

deepmd/pt/model/descriptor/dpa3.py (1)

476-478: Remove unused variable parrallel_mode

The assignment is never referenced and triggers Ruff F841.

-        parrallel_mode = comm_dict is not None
🧰 Tools
🪛 Ruff (0.11.9)

476-476: Local variable parrallel_mode is assigned to but never used

Remove assignment to unused variable parrallel_mode

(F841)

deepmd/pt/model/descriptor/repflows.py (2)

469-469: Rename unused loop variable.

The loop control variable idx is not used within the loop body. This was also flagged by static analysis.

-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):

Or simply remove the enumerate if the index isn't needed:

-for idx, ll in enumerate(self.layers):
+for ll in self.layers:
🧰 Tools
🪛 Ruff (0.11.9)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)


464-468: Consider a more robust approach for the gather operation.

The current implementation reshapes both mapping and nlist to 2D before gathering, then reshapes back. This approach works but might be susceptible to dimension mismatches if tensor shapes change in the future.

A more robust approach would be to use 3D tensors with proper dimension expansion:

-            nlist = torch.gather(
-                mapping.reshape(nframes, -1),
-                1,
-                nlist.reshape(nframes, -1),
-            ).reshape(nlist.shape)
+            # Gather directly on 3D tensor, preserving dimensions
+            nlist = mapping.gather(
+                dim=1,
+                index=nlist.unsqueeze(-1).expand(-1, -1, -1, self.n_dim)
+            ).squeeze(-1)

This makes the dimensional requirements explicit and less prone to subtle errors.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac6677e and a349de1.

📒 Files selected for processing (6)
  • deepmd/pt/model/descriptor/dpa3.py (6 hunks)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
  • deepmd/pt/model/network/utils.py (1 hunks)
  • deepmd/utils/argcheck.py (1 hunks)
  • source/tests/pt/model/test_loc_mapping.py (1 hunks)
  • source/tests/universal/dpmodel/descriptor/test_descriptor.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/utils/argcheck.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/pt/model/network/utils.py (2)
deepmd/pt/model/descriptor/dpa3.py (1)
  • forward (436-503)
deepmd/pt/model/descriptor/repflows.py (1)
  • forward (373-555)
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/repflows.py

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

deepmd/pt/model/descriptor/dpa3.py

165-165: Undefined name use_loc_mapping

(F821)


476-476: Local variable parrallel_mode is assigned to but never used

Remove assignment to unused variable parrallel_mode

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (7)
deepmd/pt/model/descriptor/dpa3.py (1)

384-385: serialize should store the new mapping flag

use_loc_mapping (once added) must be saved and restored to maintain model
behaviour after reload.

     "use_ext_ebd": self.use_ext_ebd,
+    "use_loc_mapping": self.use_loc_mapping,

(And load it in deserialize).

source/tests/universal/dpmodel/descriptor/test_descriptor.py (1)

547-551: Test parameterisation will fail until constructor accepts use_loc_mapping

The newly added "use_loc_mapping": (True, False) expands the cartesian
product, but the DescrptDPA3 constructor currently lacks the argument (see
error above). Once the constructor is fixed the tests should pass.

source/tests/pt/model/test_loc_mapping.py (1)

1-259: Great comprehensive test suite for local mapping functionality.

The test file is well-structured with thorough test coverage across multiple parameter combinations. It correctly validates that outputs remain consistent whether local mapping is enabled or disabled, which is essential for ensuring backward compatibility while optimizing the code.

deepmd/pt/model/descriptor/repflows.py (4)

456-476: Performance optimization aligns with PR objectives.

The implementation of the use_ext_ebd flag and the associated logic changes successfully implement the performance optimization described in the PR objectives. By gathering node embeddings before performing matrix multiplication operations, the code reduces memory usage from O(nlayer * nf * nall * ndim) to O(nlayer * nf * nloc), which can be significant when nall is much larger than nloc.

🧰 Tools
🪛 Ruff (0.11.9)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)


188-188: Good default for backward compatibility.

Setting use_ext_ebd=False as the default value ensures backward compatibility with existing code.


404-407: Simplification of the embedding handling.

The changes to the atype_embd assignment simplify the code by removing unnecessary conditional logic and slicing.


537-538: Explicit parameter passing improves code clarity.

Explicitly passing node_ebd as the first argument to the layer's forward method improves code clarity and maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
deepmd/pt/model/descriptor/dpa3.py (3)

165-165: Redundant assignment of use_loc_mapping

The use_loc_mapping flag is already assigned on line 128, making this second assignment redundant.

-        self.use_loc_mapping = use_loc_mapping

476-476: Remove unused variable parrallel_mode

The variable parrallel_mode is assigned but never used, as also flagged by static analysis.

-        parrallel_mode = comm_dict is not None
🧰 Tools
🪛 Ruff (0.11.9)

476-476: Local variable parrallel_mode is assigned to but never used

Remove assignment to unused variable parrallel_mode

(F841)


487-487: Explicit call to forward method

Changed from self.repflows(...) to self.repflows.forward(...). While functionally equivalent, this style change makes the code more explicit.

You could revert to using self.repflows(...) for consistency with other code, as Python's method resolution should handle this correctly.

deepmd/pt/model/descriptor/repflows.py (1)

469-469: Unused loop variable

The loop control variable idx is not used within the loop body.

-        for idx, ll in enumerate(self.layers):
+        for _, ll in enumerate(self.layers):

or simply:

-        for idx, ll in enumerate(self.layers):
+        for ll in self.layers:
🧰 Tools
🪛 Ruff (0.11.9)

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a349de1 and 3ccefcc.

📒 Files selected for processing (5)
  • deepmd/dpmodel/descriptor/dpa3.py (7 hunks)
  • deepmd/dpmodel/descriptor/repflows.py (3 hunks)
  • deepmd/pt/model/descriptor/dpa3.py (6 hunks)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
  • deepmd/utils/argcheck.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • deepmd/utils/argcheck.py
  • deepmd/dpmodel/descriptor/repflows.py
  • deepmd/dpmodel/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/dpa3.py

476-476: Local variable parrallel_mode is assigned to but never used

Remove assignment to unused variable parrallel_mode

(F841)

deepmd/pt/model/descriptor/repflows.py

469-469: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Analyze (python)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (10)
deepmd/pt/model/descriptor/dpa3.py (4)

94-95: Added new parameter use_loc_mapping for memory optimization

This new parameter enables the optimization described in PR #4744, allowing the descriptor to gather node embeddings before the matrix multiplication operation. This reduces peak memory usage, making it independent of the number of expanded atoms (nall).

Also applies to: 114-114


128-128: Constructor consistently propagates the use_loc_mapping flag

The flag is properly stored as an instance attribute and passed to the DescrptBlockRepflows constructor, ensuring consistent behavior between the descriptor and its components.

Also applies to: 157-157


384-384: Appropriate serialization of the use_loc_mapping flag

This ensures the flag is preserved when saving and loading models.


480-485: Conditional selection of atom types based on mapping mode

This change is at the core of the optimization, making the atom type selection depend on both the communication dictionary and the mapping mode. When use_loc_mapping is True, it avoids passing the full nall embeddings and prevents runtime assertion errors during distributed or GPU inference.

The removal of the nall variable is also appropriate since it's no longer used.

deepmd/pt/model/descriptor/repflows.py (6)

186-186: Added use_loc_mapping parameter to control node embedding gathering

This parameter, defaulting to True, controls whether local mapping and remapping of neighbor lists and embeddings are applied. This is a key part of the memory optimization strategy described in PR #4744.

Also applies to: 219-219


378-378: Renamed parameter for clarity

Renamed parameter from an implicit input to an explicit extended_atype_embd, making the function signature clearer about what's being passed.


404-406: Simplified variable assignment and assertion

Directly assigns extended_atype_embd to atype_embd and asserts the correct shape. This is cleaner than the previous conditional assignment.


456-468: Core optimization: conditionally gather node embeddings

This is the heart of the optimization described in PR #4744. When use_loc_mapping is False, it reshapes and expands the mapping and gathers node embeddings before the matrix multiplication. This reduces memory usage by avoiding large intermediate tensors, especially when using a large cut-off radius.


473-477: Conditional node embedding gathering in loop

Extends the conditional logic to the loop, ensuring consistent behavior based on the use_loc_mapping flag. When use_loc_mapping is True and comm_dict is None, node_ebd_ext is set to None, avoiding unnecessary memory usage.


537-537: Explicit parameter in layer forward call

Changed from implicit positional parameter to explicitly named node_ebd. This makes the code more readable and maintainable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
deepmd/pt/model/descriptor/dpa3.py (2)

128-128: Redundant assignment of use_loc_mapping.

The use_loc_mapping attribute is assigned twice in the constructor (lines 128 and 165), which is unnecessary.

-        self.use_loc_mapping = use_loc_mapping
         self.repflow_args = init_subclass_params(repflow, RepFlowArgs)
         self.activation_function = activation_function

Also applies to: 165-165


476-484: Consider using a ternary operator for code simplification.

The current implementation introduces a new variable parallel_mode but the conditional assignment could be simplified using a ternary operator.

-        parallel_mode = comm_dict is not None
-        # nall = extended_coord.view(nframes, -1).shape[1] // 3
-        if parallel_mode:
-            atype = extended_atype
-        else:
-            atype = extended_atype[:, :nloc]
+        # Cast the input to internal precision
+        extended_coord = extended_coord.to(dtype=self.prec)
+        nframes, nloc, nnei = nlist.shape
+        atype = extended_atype if comm_dict is not None else extended_atype[:, :nloc]
🧰 Tools
🪛 Ruff (0.11.9)

481-484: Use ternary operator atype = extended_atype if parallel_mode else extended_atype[:, :nloc] instead of if-else-block

Replace if-else-block with atype = extended_atype if parallel_mode else extended_atype[:, :nloc]

(SIM108)

deepmd/pt/model/descriptor/repflows.py (1)

467-467: Unused loop control variable.

The loop control variable idx is not used in the loop body.

-        for idx, ll in enumerate(self.layers):
+        for _, ll in enumerate(self.layers):

or alternatively:

-        for idx, ll in enumerate(self.layers):
+        for ll in self.layers:
🧰 Tools
🪛 Ruff (0.11.9)

467-467: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d751e44 and 91611e2.

📒 Files selected for processing (3)
  • deepmd/pt/model/descriptor/dpa3.py (6 hunks)
  • deepmd/pt/model/descriptor/repflows.py (6 hunks)
  • source/tests/pt/model/test_loc_mapping.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • source/tests/pt/model/test_loc_mapping.py
🧰 Additional context used
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/repflows.py

467-467: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

deepmd/pt/model/descriptor/dpa3.py

481-484: Use ternary operator atype = extended_atype if parallel_mode else extended_atype[:, :nloc] instead of if-else-block

Replace if-else-block with atype = extended_atype if parallel_mode else extended_atype[:, :nloc]

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (10)
deepmd/pt/model/descriptor/dpa3.py (5)

114-114: New parameter added correctly.

The new use_loc_mapping parameter with default value of True maintains backward compatibility while enabling the memory optimization mentioned in the PR objectives.


157-157: Propagating parameter to DescrptBlockRepflows constructor.

Correctly passing the use_loc_mapping parameter to the DescrptBlockRepflows constructor ensures consistent behavior.


384-384: Serialization updated correctly.

Adding use_loc_mapping to the serialized data ensures the parameter is preserved when saving/loading models.


481-481: Unused variable removed.

Removing the unused nall variable improves code clarity and reduces potential confusion.

🧰 Tools
🪛 Ruff (0.11.9)

481-484: Use ternary operator atype = extended_atype if parallel_mode else extended_atype[:, :nloc] instead of if-else-block

Replace if-else-block with atype = extended_atype if parallel_mode else extended_atype[:, :nloc]

(SIM108)


487-487: Method invocation style improved.

Using explicit .forward() method call instead of implicit invocation improves code readability and makes the intent clearer.

deepmd/pt/model/descriptor/repflows.py (5)

186-186: New parameter added correctly.

The use_loc_mapping parameter with default value of True maintains backward compatibility while enabling the memory optimization from the PR objectives.

Also applies to: 219-219


378-378: Parameter handling and shape assertion changes.

The code now properly accepts extended embeddings but asserts the correct shape for atype_embd. This is a core part of the optimization that allows handling embeddings differently based on whether local mapping is used.

Also applies to: 404-406


456-466: Memory optimization by changing gather order.

This change implements the core optimization described in the PR objectives. When use_loc_mapping is True, it follows the original approach. When False, it gathers node embeddings before performing operations, which reduces memory usage with large tensors.


471-475: Added fallback for missing node_ebd_ext.

This conditional handles the case when use_loc_mapping is False and allows gathering the embeddings directly from node_ebd. This completes the optimization by ensuring embeddings are available for subsequent operations.


534-534: Explicit parameter passing.

Using explicit parameter naming improves code readability and makes the function call more clear.

@caic99 caic99 requested a review from Copilot May 22, 2025 06:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR aims to improve the performance and peak memory usage of the descriptor computation by gathering the node embeddings prior to matrix multiplication. Key changes include:

  • Adding a new boolean parameter “use_loc_mapping” throughout the codebase.
  • Updating several modules (tests, argcheck, repflows, repflow_layer, dpa3) to conditionally use local mapping for reduced memory overhead.
  • Adjusting the logic in both PyTorch and DeepMD descriptor implementations to support the new local mapping option.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
source/tests/universal/dpmodel/descriptor/test_descriptor.py Updated tests to include “use_loc_mapping” parameter.
source/tests/pt/model/test_loc_mapping.py New test file to verify behavior with and without local mapping.
deepmd/utils/argcheck.py Added argument for “use_loc_mapping”.
deepmd/pt/model/descriptor/repflows.py Updated the forward pass to handle local mapping flag conditionally.
deepmd/pt/model/descriptor/repflow_layer.py Refactored parameter names and branch logic for local mapping.
deepmd/pt/model/descriptor/dpa3.py Added “use_loc_mapping” parameter and updated serialization.
deepmd/dpmodel/descriptor/repflows.py Added “use_loc_mapping” propagation in the dpmodel descriptor.
deepmd/dpmodel/descriptor/dpa3.py Added “use_loc_mapping” parameter and updated serialization accordingly.

@caic99
Copy link
Member Author

caic99 commented May 22, 2025

iProzd@28803d9 maybe you can refer to and be consistent with this?

@iProzd I've modified the codes in alignment with your commit, and adopt your UTs on this PR and pass them. However other UTs are failing. I suspect that the numerical error while scattering and gathering features should be blamed. Any ideas?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you need to abort modifications in this file, which means to keep exact modifications in commit iProzd@28803d9 and I passed all the uts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you need to abort modifications in this file

Did you mean you'll submit another PR to this branch?

# nf * nloc * nnei * node/edge_dim
sub_node_ext_update = _make_nei_g1(sub_node_ext_update, nlist)
# nf * nloc * node/edge_dim
sub_node_ext_update = torch.matmul(nei_node_ebd, node_ext)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed a _make_nei_g1 here. But when you use locak mapping, it's more efficient to keep the old implementation.

"use_loc_mapping",
bool,
optional=True,
default=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a docstr for use_loc_mapping

@caic99
Copy link
Member Author

caic99 commented May 28, 2025

Please see PR #4772.

@caic99 caic99 closed this May 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants