perf: gather node embedding before matmul #4744
Conversation
Pull Request Overview
This PR optimizes memory usage in the repflow layer by reordering operations to gather the node embeddings before performing the matrix multiplication.
- Reorders the gathering of node embedding data using `_make_nei_g1` to before the matmul operation.
- Reduces peak memory usage by avoiding the creation of a large intermediate tensor.
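The reordering relies on the fact that a row-wise linear transform commutes with gathering rows: `(X @ W)[idx]` equals `X[idx] @ W`. The NumPy sketch below (illustrative shapes and names, not the actual `repflow_layer` code) demonstrates the equivalence; gathering first means the large per-layer tensors are sized by the local atoms rather than by all extended atoms.

```python
import numpy as np

rng = np.random.default_rng(0)
nall, nloc, nnei, n_dim = 100, 20, 8, 16  # illustrative sizes; nall >> nloc in practice

node_ebd_ext = rng.standard_normal((nall, n_dim))  # embeddings of all extended atoms
weight = rng.standard_normal((n_dim, n_dim))       # a linear layer's weight
nlist = rng.integers(0, nall, size=(nloc, nnei))   # neighbour indices of local atoms

# matmul-then-gather: transforms all nall rows, then picks the needed ones
out_a = (node_ebd_ext @ weight)[nlist]             # materialises an (nall, n_dim) intermediate

# gather-then-matmul: picks the (nloc, nnei) rows first, then transforms them
out_b = node_ebd_ext[nlist] @ weight               # intermediate is (nloc, nnei, n_dim)

assert np.allclose(out_a, out_b)                   # identical results, smaller peak tensor
```

Because each output row is the same sum-of-products in both orderings, the results match exactly; only the size of the retained intermediate changes.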
## Walkthrough
A new boolean flag `use_loc_mapping` was introduced across the descriptor classes, argument parsing, and serialization logic in both PyTorch and dpmodel implementations. This flag controls whether local mapping and remapping of neighbor lists and embeddings are applied. Method signatures, serialization, and tests were updated accordingly, and comprehensive tests were added to verify output consistency with and without local mapping. Additionally, the `optim_edge_update` and `forward` methods in `RepFlowLayer` were refactored to move neighbor gathering outside `optim_edge_update` and add fallback handling for missing extended embeddings.
## Changes
| File(s) | Change Summary |
|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `deepmd/pt/model/descriptor/repflow_layer.py` | Refactored `optim_edge_update` and `forward` method signatures and logic to move neighbor gathering (`_make_nei_g1`) from `optim_edge_update` to `forward`, added fallback for missing `node_ebd_ext`, and updated internal logic and calls accordingly. |
| `deepmd/pt/model/descriptor/dpa3.py`,<br>`deepmd/dpmodel/descriptor/dpa3.py` | Added `use_loc_mapping` parameter to constructors, stored as attribute, passed to repflows block, included in serialization, and updated forward logic to conditionally select atom types tensor based on this flag and parallel mode. |
| `deepmd/pt/model/descriptor/repflows.py`,<br>`deepmd/dpmodel/descriptor/repflows.py` | Added `use_loc_mapping` parameter to constructors, stored as attribute, included in serialization, and updated forward logic to control local mapping/remapping and neighbor list handling based on the flag. |
| `deepmd/utils/argcheck.py` | Added `"use_loc_mapping"` as an optional boolean argument with default `True` in `descrpt_dpa3_args()`. |
| `source/tests/pt/model/test_loc_mapping.py` | Added new test module with tests for output consistency of `DescrptDPA3` with and without local mapping, covering multiple configurations and precisions. |
| `source/tests/universal/dpmodel/descriptor/test_descriptor.py` | Updated `DescriptorParamDPA3` to accept and propagate `use_loc_mapping`, and parameterized tests to cover both values of this flag. |
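As a usage sketch of the new argument, the flag would appear in the descriptor section of an input script; a minimal fragment might look like the following (other required `dpa3` fields are omitted here for brevity):

```json
{
    "descriptor": {
        "type": "dpa3",
        "use_loc_mapping": true
    }
}
```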
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant DescrptDPA3
participant DescrptBlockRepflows
participant RepFlowLayer
User->>DescrptDPA3: forward(coord, atype, nlist, mapping, ...)
DescrptDPA3->>DescrptBlockRepflows: forward(nlist, extended_coord, extended_atype, extended_atype_embd, mapping, comm_dict)
alt use_loc_mapping == False
DescrptBlockRepflows->>DescrptBlockRepflows: Remap nlist and embeddings using mapping
else use_loc_mapping == True
DescrptBlockRepflows->>DescrptBlockRepflows: Use original nlist and embeddings
end
loop for each RepFlowLayer
DescrptBlockRepflows->>RepFlowLayer: forward(node_ebd, node_ebd_ext, ...)
end
DescrptBlockRepflows-->>DescrptDPA3: node_ebd
DescrptDPA3-->>User: descriptor output
```

Possibly related PRs
Suggested reviewers
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff            @@
##            devel    #4744    +/-   ##
=========================================
- Coverage   84.69%   84.69%   -0.01%
=========================================
  Files         697      697
  Lines       67474    67473       -1
  Branches     3540     3541       +1
=========================================
- Hits        57147    57145       -2
+ Misses       9197     9196       -1
- Partials     1130     1132       +2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🧹 Nitpick comments (2)

deepmd/pt/model/descriptor/dpa3.py (1)

472-478: Minor: collapse the trivial if-else. Once the above fix is applied the conditional disappears; you can simplify with the ternary suggested by Ruff:

```python
atype = extended_atype[:, :nloc]  # identical for all cases
```

🧰 Tools: 🪛 Ruff (0.8.2) — 473-476: Use ternary operator `atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype` instead of if-else block (SIM108)

deepmd/pt/model/descriptor/repflows.py (1)

458-464: Unused loop index and readability. The variable `idx` in `for idx, ll in enumerate(self.layers):` is never used. Rename it to `_` to silence linters and convey intent:

```diff
-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
```

🧰 Tools: 🪛 Ruff (0.8.2) — 464-464: Loop control variable `idx` not used within loop body; rename unused `idx` to `_idx` (B007)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `mem.pickle` is excluded by `!**/*.pickle`

📒 Files selected for processing (4)
- `deepmd/pt/model/descriptor/dpa3.py` (1 hunks)
- `deepmd/pt/model/descriptor/repflow_layer.py` (6 hunks)
- `deepmd/pt/model/descriptor/repflows.py` (4 hunks)
- `deepmd/pt/train/training.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/model/descriptor/repflow_layer.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/pt/train/training.py (1)
source/tests/pt/model/test_model.py (1)
- `step` (414-416)
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/dpa3.py
473-476: Use ternary operator atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype instead of if-else-block
Replace if-else-block with atype = extended_atype[:, :nloc] if comm_dict is None else extended_atype
(SIM108)
deepmd/pt/model/descriptor/repflows.py
464-464: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Analyze (python)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true)
🔇 Additional comments (2)

deepmd/pt/model/descriptor/dpa3.py (1)

479-486: Pass `node_ebd_inp` after slicing. After fixing the slice, keep the call unchanged; no further code change is required. If you decide to support `[nall]` in the future, remove the shape assertion inside `repflows` instead.

deepmd/pt/model/descriptor/repflows.py (1)

458-464: Gathering with `mapping` – ensure dtype/device compatibility. `torch.gather` will fail if `mapping` is on CPU while `nlist` is on GPU (or vice versa). Earlier code expanded tensors on the same device. Please confirm `mapping` is already on the same device and uses `torch.long`; otherwise add:

```python
mapping = mapping.to(nlist.device, dtype=torch.long)
```

🧰 Tools: 🪛 Ruff (0.8.2) — 464-464: Loop control variable `idx` not used within loop body; rename unused `idx` to `_idx` (B007)
Actionable comments posted: 0
🧹 Nitpick comments (1)

deepmd/pt/model/descriptor/repflows.py (1)

462-462: Consider renaming the unused loop variable. The loop control variable `idx` is not used within the loop body. Consider renaming it to `_` or `_idx` to indicate it is intentionally unused.

```diff
-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
```

🧰 Tools: 🪛 Ruff (0.8.2) — 462-462: Loop control variable `idx` not used within loop body; rename unused `idx` to `_idx` (B007)
📜 Review details
📒 Files selected for processing (1)
- `deepmd/pt/model/descriptor/repflows.py` (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py
462-462: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
🔇 Additional comments (5)

deepmd/pt/model/descriptor/repflows.py (5)

376-376: Good change in the function signature. The parameter `extended_atype_embd` is now required (non-optional), which aligns with the PR's memory-optimization goal by requiring the embedding tensor upfront.

402-404: Simplified code by direct assignment. The direct assignment of `extended_atype_embd` to `atype_embd` with a shape check is cleaner than the previous implementation and matches the PR's goal of improving data flow.

456-461: Great memory-optimization approach. Setting `node_ebd_ext = None` and remapping `nlist` directly is a key part of reducing memory usage. This matches the PR objective of avoiding the creation of large intermediate tensors, helping to reduce peak memory from O(nlayer * nf * nall * ndim) to O(nlayer * nf * nloc + 1 * nf * nall).

465-465: Effective conditional approach to tensor creation. Only creating and populating `node_ebd_ext` when `comm_dict` is not None is an important optimization that avoids unnecessary memory usage when it is not needed.

523-523: Properly adjusted layer forward call. The explicit passing of `node_ebd` as the first argument to each layer's `forward` method aligns with the reordering of operations described in the PR objectives, enabling the gather-then-matmul approach instead of matmul-then-gather.
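The complexity claim above can be made concrete with a back-of-envelope estimate. The sizes below are arbitrary illustrations (one frame, float32), not measurements from the PR:

```python
# Peak-activation estimate for the two operation orders (illustrative sizes).
nlayer, nf, nloc, nall, ndim = 6, 1, 10_000, 40_000, 128
bytes_per_elem = 4  # float32

# before: an nall-sized embedding tensor is retained for every layer
before = nlayer * nf * nall * ndim * bytes_per_elem
# after: nloc-sized tensors per layer plus a single nall-sized buffer
after = (nlayer * nf * nloc + 1 * nf * nall) * ndim * bytes_per_elem

print(f"before: {before / 2**20:.1f} MiB, after: {after / 2**20:.1f} MiB")
# → before: 117.2 MiB, after: 48.8 MiB
```

The saving grows with the ghost-atom count, since `nall - nloc` is paid once rather than once per layer.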
for more information, see https://pre-commit.ci
… into fix-node-ebd-ext
Actionable comments posted: 1
🔭 Outside diff range comments (1)

deepmd/dpmodel/descriptor/repflows.py (1)

174-206: 🛠️ Refactor suggestion: `use_ext_ebd` is stored but never referenced. The NumPy implementation now accepts `use_ext_ebd`, but the flag is not used anywhere in the class (unlike the Torch counterpart). This creates behavioural drift between the two back-ends. Consider wiring the flag into `call()` the same way the Torch version does, or remove the argument to avoid a misleading API.
🧹 Nitpick comments (3)

deepmd/pt/model/descriptor/repflows.py (1)

469-476: Unused loop variable `idx` — rename for clarity. `for idx, ll in enumerate(self.layers):` never uses `idx`. Rename it to `_` or drop `enumerate`:

```diff
-for idx, ll in enumerate(self.layers):
+for ll in self.layers:
```

This silences Ruff warning B007 and improves readability.

deepmd/dpmodel/descriptor/repflows.py (1)

491-497: Minor: drop the unused `idx` in the layer loop. In `for idx, ll in enumerate(self.layers):`, `idx` is never read. Replace with `for ll in self.layers:` to avoid the dead variable.

source/tests/pt/model/test_dpa3.py (1)

48-70: Test Cartesian explosion – consider parametrisation throttling. Adding `use_ext_ebd` doubles the existing cartesian product of parameters. The outer loop already iterates over ≈ 200 combinations; doubling it may push CI runtime over limits. If exhaustive coverage is unnecessary, mark a reduced subset with `pytest.mark.parametrize` or filter combinations (e.g. run `use_ext_ebd=True` only for one representative setting).
📜 Review details
📒 Files selected for processing (5)
- `deepmd/dpmodel/descriptor/repflows.py` (2 hunks)
- `deepmd/pt/model/descriptor/dpa3.py` (6 hunks)
- `deepmd/pt/model/descriptor/repflows.py` (6 hunks)
- `deepmd/utils/argcheck.py` (1 hunks)
- `source/tests/pt/model/test_dpa3.py` (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- deepmd/utils/argcheck.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/model/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py
469-469: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
🔇 Additional comments (1)

deepmd/pt/model/descriptor/repflows.py (1)

404-408: Shape assertion may be too strict after removing the slice. `atype_embd = extended_atype_embd` is followed by `assert list(atype_embd.shape) == [nframes, nloc, self.n_dim]`. If `extended_atype_embd` still contains embeddings for all atoms (`nall`), the assertion will fail. Either restore the slice (`[:, :nloc, :]`) or relax the assertion:

```diff
-assert list(atype_embd.shape) == [nframes, nloc, self.n_dim]
+assert atype_embd.shape[0] == nframes and atype_embd.shape[-1] == self.n_dim
+atype_embd = atype_embd[:, :nloc, :]  # ensure local part
```
Actionable comments posted: 0
🧹 Nitpick comments (1)

deepmd/pt/model/descriptor/repflows.py (1)

469-469: Rename the unused loop variable. The loop control variable `idx` is not used within the loop body.

```diff
-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
```

🧰 Tools: 🪛 Ruff (0.8.2) — 469-469: Loop control variable `idx` not used within loop body; rename unused `idx` to `_idx` (B007)
📜 Review details
📒 Files selected for processing (3)
- `deepmd/dpmodel/descriptor/dpa3.py` (7 hunks)
- `deepmd/pt/model/descriptor/repflows.py` (6 hunks)
- `source/tests/pt/model/test_dpa3.py` (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- source/tests/pt/model/test_dpa3.py
- deepmd/dpmodel/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py
469-469: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
🔇 Additional comments (4)
deepmd/pt/model/descriptor/repflows.py (4)
188-188: Good feature addition to enhance memory efficiency. Adding this flag to control embedding handling is a clean way to implement the optimization described in the PR.

456-468: Reversed gather and matmul order as intended. This is the core of the optimization described in the PR. By performing the gather operation first, we reduce the size of the tensors involved in matrix multiplication, which reduces memory usage while preserving numerical results.

472-476: Conditional computation of `node_ebd_ext` reduces memory usage. This conditional approach ensures that `node_ebd_ext` is only computed when needed, rather than always maintaining a large tensor throughout the computation. This directly addresses the PR objective of reducing peak memory usage.

534-537: Updated method call signature to support the optimization. The explicit passing of `node_ebd` as the first argument aligns with changes in the `RepFlowLayer.forward` method.
Actionable comments posted: 1
🧹 Nitpick comments (1)

deepmd/pt/model/descriptor/repflows.py (1)

469-469: Unused loop control variable. The loop control variable `idx` is not used within the loop body.

```diff
-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
```

🧰 Tools: 🪛 Ruff (0.8.2) — 469-469: Loop control variable `idx` not used within loop body; rename unused `idx` to `_idx` (B007)
📜 Review details
📒 Files selected for processing (1)
- `deepmd/pt/model/descriptor/repflows.py` (6 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/descriptor/repflows.py
469-469: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
🔇 Additional comments (5)
deepmd/pt/model/descriptor/repflows.py (5)
188-188: LGTM: new parameter added for memory optimization. The new `use_ext_ebd` parameter is well defined, with a default value of `False` to maintain backward compatibility while enabling the performance optimization described in the PR.

219-219: LGTM: instance attribute properly stored. The flag is correctly stored as an instance attribute for later use in the forward method.

456-460: LGTM: conditional expansion of mapping. This code block properly initializes the 3-D mapping tensor for extended embeddings only when needed, which aligns with the optimization goal.

473-477: LGTM: core optimization implemented properly. This is the key change described in the PR – gathering node embeddings before matrix multiplication to reduce peak memory usage. The code correctly implements the optimization by gathering `node_ebd` into `node_ebd_ext` when `use_ext_ebd` is True.

535-547: LGTM: method call updated appropriately. The forward method call has been updated to explicitly pass `node_ebd` as the first argument, consistent with the changes in the `RepFlowLayer` class.
iProzd@28803d9 maybe you can refer to and be consistent with this?
Actionable comments posted: 2
🔭 Outside diff range comments (1)

deepmd/pt/model/descriptor/dpa3.py (1)

100-116: ⚠️ Potential issue: `use_loc_mapping` parameter missing – constructor raises `NameError`. `self.use_loc_mapping = use_loc_mapping` (line 165) references a variable that is not in the function signature (only `use_ext_ebd` was added). Importing the module therefore fails immediately.

```diff
- use_tebd_bias: bool = False,
- type_map: Optional[list[str]] = None,
- use_ext_ebd: bool = False,
+ use_tebd_bias: bool = False,
+ type_map: Optional[list[str]] = None,
+ use_ext_ebd: bool = False,
+ use_loc_mapping: bool = True,
```

Remember to propagate the new flag to:
- `serialize`/`deserialize`
- the tests (already parameterised)

🧰 Tools: 🪛 Ruff (0.11.9) — 107-107: Do not use mutable data structures for argument defaults; replace with `None` and initialize within the function (B006)
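Propagating a constructor flag through serialize/deserialize can be sketched generically. The class below is a hypothetical minimal stand-in, not the real `DescrptDPA3` API; the defaulting behaviour on deserialization is an assumption for illustration:

```python
# Minimal sketch of round-tripping a constructor flag through serialization
# (hypothetical class; the real descriptor serializes many more fields).
class Descriptor:
    def __init__(self, use_loc_mapping: bool = True) -> None:
        self.use_loc_mapping = use_loc_mapping

    def serialize(self) -> dict:
        return {"use_loc_mapping": self.use_loc_mapping}

    @classmethod
    def deserialize(cls, data: dict) -> "Descriptor":
        # a missing key falls back to the default, so models saved before
        # the flag existed still load
        return cls(use_loc_mapping=data.get("use_loc_mapping", True))


restored = Descriptor.deserialize(Descriptor(use_loc_mapping=False).serialize())
assert restored.use_loc_mapping is False
```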
♻️ Duplicate comments (1)

deepmd/pt/model/descriptor/dpa3.py (1)

476-485: ⚠️ Potential issue: still passes `nall`-sized embeddings to `repflows`. When `comm_dict is not None` and `use_ext_ebd` is `False`, `atype = extended_atype` keeps shape `[nf, nall]`, violating the assertion inside `repflows.forward` (`[nf, nloc, n_dim]`). This was flagged in a previous review but not fully resolved. Simplest fix → always slice to local atoms:

```diff
- if comm_dict is None or self.use_ext_ebd:
-     atype = extended_atype[:, :nloc]
- else:
-     atype = extended_atype
+ atype = extended_atype[:, :nloc]
```

`node_ebd_ext` is already reconstructed inside `repflows`, so no functionality is lost.

🧰 Tools: 🪛 Ruff (0.11.9) — 476-476: Local variable `parrallel_mode` is assigned to but never used; remove the assignment (F841)
🧹 Nitpick comments (5)

deepmd/pt/model/network/utils.py (2)

101-103: Memory blow-up when building the 4-D angle mask. `a_nlist_mask_3d = a_nlist_mask[:, :, :, None] & a_nlist_mask[:, :, None, :]` creates a tensor of shape `nf × nloc × a_nnei × a_nnei`; for typical water-sized systems with `nloc = 10 k` and `a_nnei = 64` this is already ~320 MB (bool). The subsequent 4-D dihedral mask scales as `d_nnei³`. Consider:
- building the indices with `torch.nonzero` on the 2-D mask to avoid the O(N²) expansion, or
- restricting it to the actually used neighbour count (`torch.sum(mask_i)`).

60-66: Docstring misses the `use_loc_mapping` argument. `get_graph_index` accepts `use_loc_mapping`, but the parameter list and description omit it. Please add it for clarity and to keep sphinx/autodoc happy.

deepmd/pt/model/descriptor/dpa3.py (1)

476-478: Remove the unused variable `parrallel_mode`. The assignment is never referenced and triggers Ruff F841.

```diff
- parrallel_mode = comm_dict is not None
```

deepmd/pt/model/descriptor/repflows.py (2)

469-469: Rename the unused loop variable. The loop control variable `idx` is not used within the loop body. This was also flagged by static analysis.

```diff
-for idx, ll in enumerate(self.layers):
+for _, ll in enumerate(self.layers):
```

Or simply remove the `enumerate` if the index isn't needed:

```diff
-for idx, ll in enumerate(self.layers):
+for ll in self.layers:
```

464-468: Consider a more robust approach for the gather operation. The current implementation reshapes both `mapping` and `nlist` to 2-D before gathering, then reshapes back. This approach works but might be susceptible to dimension mismatches if tensor shapes change in the future. A more robust approach would be to use 3-D tensors with explicit dimension expansion:

```diff
- nlist = torch.gather(
-     mapping.reshape(nframes, -1),
-     1,
-     nlist.reshape(nframes, -1),
- ).reshape(nlist.shape)
+ # Gather directly on 3D tensor, preserving dimensions
+ nlist = mapping.gather(
+     dim=1,
+     index=nlist.unsqueeze(-1).expand(-1, -1, -1, self.n_dim)
+ ).squeeze(-1)
```

This makes the dimensional requirements explicit and less prone to subtle errors.
📜 Review details
📒 Files selected for processing (6)
- `deepmd/pt/model/descriptor/dpa3.py` (6 hunks)
- `deepmd/pt/model/descriptor/repflows.py` (6 hunks)
- `deepmd/pt/model/network/utils.py` (1 hunks)
- `deepmd/utils/argcheck.py` (1 hunks)
- `source/tests/pt/model/test_loc_mapping.py` (1 hunks)
- `source/tests/universal/dpmodel/descriptor/test_descriptor.py` (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/utils/argcheck.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/pt/model/network/utils.py (2)

deepmd/pt/model/descriptor/dpa3.py (1)
- `forward` (436-503)

deepmd/pt/model/descriptor/repflows.py (1)
- `forward` (373-555)
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/repflows.py
469-469: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
deepmd/pt/model/descriptor/dpa3.py
165-165: Undefined name use_loc_mapping
(F821)
476-476: Local variable parrallel_mode is assigned to but never used
Remove assignment to unused variable parrallel_mode
(F841)
🔇 Additional comments (7)
deepmd/pt/model/descriptor/dpa3.py (1)
384-385:serializeshould store the new mapping flag
use_loc_mapping(once added) must be saved and restored to maintain model
behaviour after reload."use_ext_ebd": self.use_ext_ebd, + "use_loc_mapping": self.use_loc_mapping,(And load it in
deserialize).source/tests/universal/dpmodel/descriptor/test_descriptor.py (1)
547-551: Test parameterisation will fail until constructor acceptsuse_loc_mappingThe newly added
"use_loc_mapping": (True, False)expands the cartesian
product, but theDescrptDPA3constructor currently lacks the argument (see
error above). Once the constructor is fixed the tests should pass.source/tests/pt/model/test_loc_mapping.py (1)
1-259: Great comprehensive test suite for local mapping functionality.The test file is well-structured with thorough test coverage across multiple parameter combinations. It correctly validates that outputs remain consistent whether local mapping is enabled or disabled, which is essential for ensuring backward compatibility while optimizing the code.
deepmd/pt/model/descriptor/repflows.py (4)
456-476: Performance optimization aligns with PR objectives.The implementation of the
use_ext_ebdflag and the associated logic changes successfully implement the performance optimization described in the PR objectives. By gathering node embeddings before performing matrix multiplication operations, the code reduces memory usage from O(nlayer * nf * nall * ndim) to O(nlayer * nf * nloc), which can be significant whennallis much larger thannloc.🧰 Tools
🪛 Ruff (0.11.9)
469-469: Loop control variable `idx` not used within loop body

Rename unused `idx` to `_idx` (B007)
188-188: Good default for backward compatibility.

Setting `use_ext_ebd=False` as the default value ensures backward compatibility with existing code.

404-407: Simplification of the embedding handling.

The changes to the `atype_embd` assignment simplify the code by removing unnecessary conditional logic and slicing.

537-538: Explicit parameter passing improves code clarity.

Explicitly passing `node_ebd` as the first argument to the layer's `forward` method improves code clarity and maintainability.
Actionable comments posted: 0
🧹 Nitpick comments (4)
deepmd/pt/model/descriptor/dpa3.py (3)
165-165: Redundant assignment of `use_loc_mapping`

The `use_loc_mapping` flag is already assigned on line 128, making this second assignment redundant.

-        self.use_loc_mapping = use_loc_mapping

476-476: Remove unused variable `parrallel_mode`

The variable `parrallel_mode` is assigned but never used, as also flagged by static analysis.

-        parrallel_mode = comm_dict is not None

🧰 Tools
🪛 Ruff (0.11.9)
476-476: Local variable `parrallel_mode` is assigned to but never used

Remove assignment to unused variable `parrallel_mode` (F841)
487-487: Explicit call to `forward` method

Changed from `self.repflows(...)` to `self.repflows.forward(...)`. While functionally equivalent, this style change makes the code more explicit. You could revert to using `self.repflows(...)` for consistency with other code, as Python's method resolution should handle this correctly.

deepmd/pt/model/descriptor/repflows.py (1)

469-469: Unused loop variable

The loop control variable `idx` is not used within the loop body.

-        for idx, ll in enumerate(self.layers):
+        for _, ll in enumerate(self.layers):

or simply:

-        for idx, ll in enumerate(self.layers):
+        for ll in self.layers:

🧰 Tools
🪛 Ruff (0.11.9)
469-469: Loop control variable `idx` not used within loop body

Rename unused `idx` to `_idx` (B007)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- deepmd/dpmodel/descriptor/dpa3.py (7 hunks)
- deepmd/dpmodel/descriptor/repflows.py (3 hunks)
- deepmd/pt/model/descriptor/dpa3.py (6 hunks)
- deepmd/pt/model/descriptor/repflows.py (6 hunks)
- deepmd/utils/argcheck.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- deepmd/utils/argcheck.py
- deepmd/dpmodel/descriptor/repflows.py
- deepmd/dpmodel/descriptor/dpa3.py
🧰 Additional context used
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/dpa3.py
476-476: Local variable parrallel_mode is assigned to but never used
Remove assignment to unused variable parrallel_mode
(F841)
deepmd/pt/model/descriptor/repflows.py
469-469: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
🔇 Additional comments (10)
deepmd/pt/model/descriptor/dpa3.py (4)
94-95: Added new parameter `use_loc_mapping` for memory optimization

This new parameter enables the optimization described in PR #4744, allowing the descriptor to gather node embeddings before the matrix multiplication operation. This reduces peak memory usage, making it independent of the number of expanded atoms (`nall`).

Also applies to: 114-114

128-128: Constructor consistently propagates the `use_loc_mapping` flag

The flag is properly stored as an instance attribute and passed to the `DescrptBlockRepflows` constructor, ensuring consistent behavior between the descriptor and its components.

Also applies to: 157-157
384-384: Appropriate serialization of the `use_loc_mapping` flag

This ensures the flag is preserved when saving and loading models.
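A minimal sketch of the round-trip this comment refers to (the class and fields are simplified stand-ins; the real `DescrptDPA3` serializes many more entries):

```python
# Minimal sketch of flag serialization; only the use_loc_mapping
# round-trip is modeled, the real descriptor stores many more fields.
class TinyDescriptor:
    def __init__(self, use_loc_mapping=True):
        self.use_loc_mapping = use_loc_mapping

    def serialize(self):
        return {"use_loc_mapping": self.use_loc_mapping}

    @classmethod
    def deserialize(cls, data):
        return cls(use_loc_mapping=data["use_loc_mapping"])

restored = TinyDescriptor.deserialize(
    TinyDescriptor(use_loc_mapping=False).serialize()
)
assert restored.use_loc_mapping is False
```

Without both halves of the round-trip, a non-default flag would silently revert to its default after a model reload.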
480-485: Conditional selection of atom types based on mapping mode

This change is at the core of the optimization, making the atom type selection depend on both the communication dictionary and the mapping mode. When `use_loc_mapping` is `True`, it avoids passing the full `nall` embeddings and prevents runtime assertion errors during distributed or GPU inference.

The removal of the `nall` variable is also appropriate since it's no longer used.

deepmd/pt/model/descriptor/repflows.py (6)
186-186: Added `use_loc_mapping` parameter to control node embedding gathering

This parameter, defaulting to `True`, controls whether local mapping and remapping of neighbor lists and embeddings are applied. This is a key part of the memory optimization strategy described in PR #4744.

Also applies to: 219-219

378-378: Renamed parameter for clarity

Renamed the parameter from an implicit input to an explicit `extended_atype_embd`, making the function signature clearer about what's being passed.

404-406: Simplified variable assignment and assertion

Directly assigns `extended_atype_embd` to `atype_embd` and asserts the correct shape. This is cleaner than the previous conditional assignment.

456-468: Core optimization: conditionally gather node embeddings

This is the heart of the optimization described in PR #4744. When `use_loc_mapping` is `False`, it reshapes and expands the mapping and gathers node embeddings before the matrix multiplication. This reduces memory usage by avoiding large intermediate tensors, especially when using a large cut-off radius.

473-477: Conditional node embedding gathering in loop

Extends the conditional logic to the loop, ensuring consistent behavior based on the `use_loc_mapping` flag. When `use_loc_mapping` is `True` and `comm_dict` is `None`, `node_ebd_ext` is set to `None`, avoiding unnecessary memory usage.

537-537: Explicit parameter in layer forward call

Changed from an implicit positional parameter to the explicitly named `node_ebd`. This makes the code more readable and maintainable.
Actionable comments posted: 0
🧹 Nitpick comments (3)
deepmd/pt/model/descriptor/dpa3.py (2)
128-128: Redundant assignment of `use_loc_mapping`.

The `use_loc_mapping` attribute is assigned twice in the constructor (lines 128 and 165), which is unnecessary.

-        self.use_loc_mapping = use_loc_mapping
         self.repflow_args = init_subclass_params(repflow, RepFlowArgs)
         self.activation_function = activation_function

Also applies to: 165-165

476-484: Consider using a ternary operator for code simplification.

The current implementation introduces a new variable `parallel_mode`, but the conditional assignment could be simplified using a ternary operator.

-        parallel_mode = comm_dict is not None
-        # nall = extended_coord.view(nframes, -1).shape[1] // 3
-        if parallel_mode:
-            atype = extended_atype
-        else:
-            atype = extended_atype[:, :nloc]
+        # Cast the input to internal precision
+        extended_coord = extended_coord.to(dtype=self.prec)
+        nframes, nloc, nnei = nlist.shape
+        atype = extended_atype if comm_dict is not None else extended_atype[:, :nloc]

🧰 Tools
🪛 Ruff (0.11.9)
481-484: Use ternary operator `atype = extended_atype if parallel_mode else extended_atype[:, :nloc]` instead of `if`-`else` block

Replace `if`-`else` block with `atype = extended_atype if parallel_mode else extended_atype[:, :nloc]` (SIM108)
deepmd/pt/model/descriptor/repflows.py (1)
467-467: Unused loop control variable.

The loop control variable `idx` is not used in the loop body.

-        for idx, ll in enumerate(self.layers):
+        for _, ll in enumerate(self.layers):

or alternatively:

-        for idx, ll in enumerate(self.layers):
+        for ll in self.layers:

🧰 Tools
🪛 Ruff (0.11.9)
467-467: Loop control variable `idx` not used within loop body

Rename unused `idx` to `_idx` (B007)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/pt/model/descriptor/dpa3.py (6 hunks)
- deepmd/pt/model/descriptor/repflows.py (6 hunks)
- source/tests/pt/model/test_loc_mapping.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- source/tests/pt/model/test_loc_mapping.py
🧰 Additional context used
🪛 Ruff (0.11.9)
deepmd/pt/model/descriptor/repflows.py
467-467: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
deepmd/pt/model/descriptor/dpa3.py
481-484: Use ternary operator atype = extended_atype if parallel_mode else extended_atype[:, :nloc] instead of if-else-block
Replace if-else-block with atype = extended_atype if parallel_mode else extended_atype[:, :nloc]
(SIM108)
🔇 Additional comments (10)
deepmd/pt/model/descriptor/dpa3.py (5)
114-114: New parameter added correctly.

The new `use_loc_mapping` parameter with a default value of `True` maintains backward compatibility while enabling the memory optimization mentioned in the PR objectives.

157-157: Propagating the parameter to the `DescrptBlockRepflows` constructor.

Correctly passing the `use_loc_mapping` parameter to the `DescrptBlockRepflows` constructor ensures consistent behavior.

384-384: Serialization updated correctly.

Adding `use_loc_mapping` to the serialized data ensures the parameter is preserved when saving/loading models.

481-481: Unused variable removed.

Removing the unused `nall` variable improves code clarity and reduces potential confusion.

🧰 Tools
🪛 Ruff (0.11.9)
481-484: Use ternary operator `atype = extended_atype if parallel_mode else extended_atype[:, :nloc]` instead of `if`-`else` block

Replace `if`-`else` block with `atype = extended_atype if parallel_mode else extended_atype[:, :nloc]` (SIM108)
487-487: Method invocation style improved.

Using an explicit `.forward()` method call instead of implicit invocation improves code readability and makes the intent clearer.

deepmd/pt/model/descriptor/repflows.py (5)

186-186: New parameter added correctly.

The `use_loc_mapping` parameter with a default value of `True` maintains backward compatibility while enabling the memory optimization from the PR objectives.

Also applies to: 219-219

378-378: Parameter handling and shape assertion changes.

The code now properly accepts extended embeddings but asserts the correct shape for `atype_embd`. This is a core part of the optimization that allows handling embeddings differently based on whether local mapping is used.

Also applies to: 404-406

456-466: Memory optimization by changing gather order.

This change implements the core optimization described in the PR objectives. When `use_loc_mapping` is `True`, it follows the original approach. When `False`, it gathers node embeddings before performing operations, which reduces memory usage with large tensors.

471-475: Added fallback for missing `node_ebd_ext`.

This conditional handles the case when `use_loc_mapping` is `False` and allows gathering the embeddings directly from `node_ebd`. This completes the optimization by ensuring embeddings are available for subsequent operations.

534-534: Explicit parameter passing.

Using explicit parameter naming improves code readability and makes the function call clearer.
Pull Request Overview
This PR aims to improve performance and reduce peak memory usage of the descriptor computation by gathering the node embeddings prior to matrix multiplication. Key changes include:
- Adding a new boolean parameter “use_loc_mapping” throughout the codebase.
- Updating several modules (tests, argcheck, repflows, repflow_layer, dpa3) to conditionally use local mapping for reduced memory overhead.
- Adjusting the logic in both PyTorch and DeepMD descriptor implementations to support the new local mapping option.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| source/tests/universal/dpmodel/descriptor/test_descriptor.py | Updated tests to include “use_loc_mapping” parameter. |
| source/tests/pt/model/test_loc_mapping.py | New test file to verify behavior with and without local mapping. |
| deepmd/utils/argcheck.py | Added argument for “use_loc_mapping”. |
| deepmd/pt/model/descriptor/repflows.py | Updated the forward pass to handle local mapping flag conditionally. |
| deepmd/pt/model/descriptor/repflow_layer.py | Refactored parameter names and branch logic for local mapping. |
| deepmd/pt/model/descriptor/dpa3.py | Added “use_loc_mapping” parameter and updated serialization. |
| deepmd/dpmodel/descriptor/repflows.py | Added “use_loc_mapping” propagation in the dpmodel descriptor. |
| deepmd/dpmodel/descriptor/dpa3.py | Added “use_loc_mapping” parameter and updated serialization accordingly. |
@iProzd I've modified the code in alignment with your commit, adopted your UTs in this PR, and they pass. However, other UTs are failing. I suspect the numerical error introduced while scattering and gathering features is to blame. Any ideas?
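One plausible source of such mismatches is that scatter/gather changes the order of floating-point summation, and floating-point addition is not associative. A toy demonstration (not the actual failing computation):

```python
# Floating-point addition is order-dependent: at magnitude 1e17 a
# double's ulp is 16, so adding 1.0 to -1e17 is simply absorbed.
a, b, c = 1e17, -1e17, 1.0
assert (a + b) + c == 1.0  # cancel the large terms first, then add 1.0
assert a + (b + c) == 0.0  # the 1.0 is lost before the cancellation
```

So reordering a reduction can legitimately shift results beyond a tight tolerance even when both orderings are mathematically equivalent.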
Maybe you need to abort modifications in this file, which means to keep exact modifications in commit iProzd@28803d9 and I passed all the uts.
Maybe you need to abort modifications in this file
Did you mean you'll submit another PR to this branch?
        # nf * nloc * nnei * node/edge_dim
        sub_node_ext_update = _make_nei_g1(sub_node_ext_update, nlist)
        # nf * nloc * node/edge_dim
        sub_node_ext_update = torch.matmul(nei_node_ebd, node_ext)
You missed a `_make_nei_g1` here. But when you use local mapping, it's more efficient to keep the old implementation.
| "use_loc_mapping", | ||
| bool, | ||
| optional=True, | ||
| default=True, |
Add a docstring for `use_loc_mapping`.
Please see PR #4772. |
`node_ebd_ext` contains the embedding on expanded atoms, which might be large for a large cut-off. The current implementation does the matmul first, then gathers the tensor by neighbors. This requires saving `sub_node_ext_update` of size `nf * nall * ndim` in each repflow layer, where `nall` might be multiple times larger than `nloc`.

This PR does the gathering first, then computes the matmul. After this PR, the peak memory size is O(nlayer * nf * nloc), unrelated to `nall`.
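The reordering can be illustrated with a toy NumPy sketch. The gather below is a simplified stand-in for `_make_nei_g1`, and all shapes and names are illustrative, not the real repflow code:

```python
import numpy as np

rng = np.random.default_rng(0)
nf, nloc, nall, nnei, ndim = 2, 3, 9, 4, 5
node_ebd_ext = rng.standard_normal((nf, nall, ndim))  # embeddings on expanded atoms
weight = rng.standard_normal((ndim, ndim))
nlist = rng.integers(0, nall, size=(nf, nloc, nnei))  # neighbor indices into nall

def gather_nei(x, nlist):
    # simplified stand-in for _make_nei_g1: nf x nall x d -> nf x nloc x nnei x d
    nf_, _, d = x.shape
    flat = nlist.reshape(nf_, -1)
    out = np.take_along_axis(x, flat[:, :, None].repeat(d, axis=-1), axis=1)
    return out.reshape(nf_, nlist.shape[1], nlist.shape[2], d)

# before this PR: matmul over all nall atoms, then gather neighbors
old_order = gather_nei(node_ebd_ext @ weight, nlist)
# after this PR: gather neighbors first, then matmul on the gathered tensor
new_order = gather_nei(node_ebd_ext, nlist) @ weight

# the linear map commutes with the gather, so the results agree
assert np.allclose(old_order, new_order)
```

Each output element is the same dot product in both orderings, so the results match; the difference is that the post-PR path never materializes a matmul result over all `nall` expanded atoms.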