
Conversation


@HydrogenSulfate HydrogenSulfate commented Nov 11, 2025

This pull request updates the PaddlePaddle dependencies to nightly builds, refactors device handling throughout the codebase to support more flexible device selection (including XPU), improves compatibility, and adds better flag management for tests. The most significant changes are grouped below.

Dependency Updates:

  • Updated PaddlePaddle and PaddlePaddle-GPU installation in .github/workflows/test_python.yml and .github/workflows/test_cuda.yml to use nightly builds (3.3.0.dev20251204) from new package URLs, improving access to the latest features and fixes. [1] [2]

Device Handling Refactor:

  • Replaced usage of paddle.device.cuda.device_count() and related CUDA-specific APIs with more general paddle.device.device_count() and paddle.device.empty_cache() in multiple locations, enabling support for devices beyond CUDA (e.g., XPU). [1] [2] [3] [4] [5]
  • Updated logic for setting and retrieving device information in deepmd/pd/utils/env.py to use paddle.device.get_device(), ensuring correct device assignment for both CPU and GPU/XPU scenarios.
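
A minimal sketch of the resulting selection logic, assuming only the generic paddle.device.* APIs named above (variable names are illustrative, not the literal env.py code):

```python
import paddle

# Prefer an accelerator (GPU/XPU) when one is visible; otherwise use CPU.
if paddle.device.device_count() > 0:
    DEVICE = paddle.device.get_device()  # e.g. "gpu:0" or "xpu:0"
else:
    DEVICE = "cpu"
paddle.device.set_device(DEVICE)

# Cache clearing is likewise device-agnostic:
if DEVICE != "cpu":
    paddle.device.empty_cache()
```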

Device Compatibility Improvements:

  • Enhanced get_generator in deepmd/pd/utils/utils.py to support XPU devices and added a warning for unsupported device types, improving compatibility and error messaging.
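
A self-contained sketch of the warning path described above (the helper below is hypothetical and only mirrors the device dispatch; the real get_generator constructs Paddle generators rather than returning a boolean):

```python
import warnings

SUPPORTED_DEVICE_PREFIXES = ("cpu", "gpu", "xpu")

def is_generator_device_supported(device: str) -> bool:
    """Mirror get_generator's device check: warn and signal fallback."""
    if device.split(":")[0] in SUPPORTED_DEVICE_PREFIXES:
        return True
    warnings.warn(
        f"Device {device!r} is not supported for random generators; "
        "a None generator will be used instead.",
        stacklevel=2,  # attribute the warning to the caller (Ruff B028)
    )
    return False
```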

Test Flag Management:

  • Added explicit management of the FLAGS_use_stride_kernel Paddle flag in source/tests/pd/test_multitask.py to ensure proper test isolation, restoring flag values after tests (see the sketch after this list). [1] [2]
  • Set FLAGS_use_stride_compute_kernel environment variable to 0 in workflow files to control kernel usage during tests. [1] [2]
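
A sketch of the capture/restore pattern using Paddle's paddle.get_flags/paddle.set_flags API (the test class below is illustrative, not copied from test_multitask.py):

```python
import unittest

import paddle

FLAG = "FLAGS_use_stride_kernel"

class StrideFlagIsolationTest(unittest.TestCase):
    def setUp(self):
        # Capture the current flag value so other tests are unaffected.
        self._saved = paddle.get_flags([FLAG])[FLAG]
        paddle.set_flags({FLAG: False})  # temporarily disable for this test

    def tearDown(self):
        # Restore the original value regardless of the test outcome.
        paddle.set_flags({FLAG: self._saved})
```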

Distributed Training Check:

  • Improved NCCL initialization check in distributed training setup to handle cases where NCCL is not compiled, preventing assertion errors.
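
A hedged sketch of the relaxed check; the two introspection helpers are assumptions standing in for whatever entrypoints/main.py actually calls, so getattr keeps the sketch runnable even when they are absent:

```python
import paddle

def assert_nccl_if_compiled() -> None:
    """Skip the NCCL version assertion when Paddle was not built with NCCL."""
    compiled_with_nccl = getattr(
        paddle.device, "is_compiled_with_nccl", lambda: False
    )()
    if not compiled_with_nccl:
        return  # e.g. XPU builds use other communication libraries
    nccl_version = getattr(paddle.device, "get_nccl_version", lambda: 0)()
    assert nccl_version > 0, "Paddle was compiled with NCCL but none was found"
```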

Copilot AI review requested due to automatic review settings November 11, 2025 12:55
Copilot finished reviewing on behalf of HydrogenSulfate November 11, 2025 12:59

Copilot AI left a comment


Pull Request Overview

This pull request updates the PaddlePaddle backend to use generic device management APIs instead of CUDA-specific ones, improving compatibility with different hardware backends like XPU. The changes replace paddle.device.cuda.* calls with paddle.device.* equivalents and enhance device selection logic.

  • Replaced CUDA-specific API calls with generic device APIs across multiple files
  • Enhanced get_generator function to support XPU devices and handle unsupported device types gracefully
  • Modified device selection in env.py to use paddle.device.get_device() instead of explicit GPU-LOCAL_RANK construction

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

  • source/tests/pd/conftest.py: Updated cache clearing to use the generic paddle.device.empty_cache() instead of the CUDA-specific version
  • deepmd/pd/utils/utils.py: Added XPU generator support and warning handling for unsupported devices; added a warnings import
  • deepmd/pd/utils/env.py: Changed device selection to use paddle.device.get_device() and paddle.device.device_count() for multi-backend support
  • deepmd/pd/utils/auto_batch_size.py: Updated the device availability check and cache clearing to generic APIs
  • deepmd/pd/entrypoints/main.py: Modified the NCCL check to support non-NCCL backends and updated the device count API



coderabbitai bot commented Nov 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Replaced CUDA-specific Paddle calls with device-agnostic APIs, added XPU generator support, relaxed the NCCL assertion when Paddle isn't built with NCCL, conditionalized device cache clearing in test teardown, and updated CI workflows to use nightly/pre-release Paddle wheels and a new env flag.

Changes

  • Device API abstraction (deepmd/pd/entrypoints/main.py, deepmd/pd/utils/auto_batch_size.py, deepmd/pd/utils/env.py, source/tests/pd/conftest.py): Replaced CUDA-only calls with device-agnostic APIs (paddle.device.cuda.device_count() → paddle.device.device_count(), paddle.device.cuda.empty_cache() → paddle.device.empty_cache()); env.py now selects the device via paddle.device.get_device()/paddle.device.set_device(DEVICE) when devices are present.
  • Multi-GPU / NCCL check (deepmd/pd/entrypoints/main.py): Relaxed the NCCL assertion: only require nccl_version > 0 when Paddle reports it was compiled with NCCL; skip the check when Paddle is not compiled with NCCL.
  • Device support & RNG (deepmd/pd/utils/utils.py): Added XPU handling (xpu / xpu:*) for generator creation; imported warnings; unsupported devices now emit a UserWarning and return None instead of raising.
  • Tests teardown update (source/tests/pd/conftest.py): Teardown clears the device cache only when the active device is not CPU (checked via paddle.device.get_device()), using paddle.device.empty_cache().
  • CI workflow updates (.github/workflows/test_cuda.yml, .github/workflows/test_python.yml): Switched CI to nightly/pre-release Paddle wheel URLs and added FLAGS_use_stride_compute_kernel: 0 to the workflow envs.
  • Tests: flags handling (source/tests/pd/test_multitask.py): Tests capture and restore FLAGS_use_stride_kernel, temporarily disable it during tests, and set learning_rate.start_lr in some setups.
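
As a hedged illustration, the conditional teardown described for source/tests/pd/conftest.py could look like this pytest sketch (the fixture name is illustrative):

```python
import paddle
import pytest

@pytest.fixture(autouse=True)
def clear_device_cache_after_test():
    yield
    # Clear the allocator cache only when a non-CPU device is active;
    # paddle.device.empty_cache() is the device-agnostic replacement
    # for the old paddle.device.cuda.empty_cache().
    if paddle.device.get_device() != "cpu":
        paddle.device.empty_cache()
```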

Sequence Diagram(s)

sequenceDiagram
  participant Runner as App / Test runner
  participant Env as env.py
  participant Main as entrypoints/main.py
  participant Utils as utils.get_generator
  participant Paddle as Paddle API

  Runner->>Env: init
  Env->>Paddle: paddle.device.device_count()
  alt device count > 0
    Paddle-->>Env: n>0
    Env->>Paddle: paddle.device.get_device()
    Paddle-->>Env: DEVICE (e.g., "gpu:0"/"xpu:0")
    Env->>Paddle: paddle.device.set_device(DEVICE)
  else no devices
    Paddle-->>Env: 0
    Env->>Paddle: paddle.device.set_device("cpu")
  end

  Runner->>Main: init multi-GPU
  Main->>Paddle: compiled_with_nccl?
  alt compiled_with_nccl == true
    Main->>Paddle: get_nccl_version()
    Paddle-->>Main: nccl_version
    alt nccl_version > 0
      Main-->>Runner: proceed
    else
      Main-->>Runner: assert/fail
    end
  else
    Main-->>Runner: skip NCCL check
  end

  Runner->>Utils: request RNG for DEVICE
  Utils->>Paddle: inspect DEVICE
  alt DEVICE startswith "xpu"
    Utils->>Paddle: create XPU generator
    Paddle-->>Utils: generator
  else DEVICE startswith "gpu" or DEVICE == "cpu"
    Utils->>Paddle: create GPU/CPU generator
    Paddle-->>Utils: generator
  else unsupported
    Utils-->>Runner: warn + return None
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Files needing extra attention:
    • deepmd/pd/entrypoints/main.py — verify compiled_with_nccl detection and get_nccl_version() semantics.
    • deepmd/pd/utils/utils.py — XPU generator APIs and device-string handling.
    • deepmd/pd/utils/env.py — device selection logic and interaction with local-rank conventions.
    • CI workflow files — validate wheel URLs and the effect of FLAGS_use_stride_compute_kernel.

Suggested reviewers

  • njzjz
  • wanghan-iapcm
  • iProzd

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately summarizes the main objective of the pull request: adapting code for hardware compatibility by replacing CUDA-specific APIs with generic device APIs.
  • Description check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f25e16 and fa7bb22.

📒 Files selected for processing (5)
  • deepmd/pd/entrypoints/main.py (2 hunks)
  • deepmd/pd/utils/auto_batch_size.py (2 hunks)
  • deepmd/pd/utils/env.py (1 hunks)
  • deepmd/pd/utils/utils.py (2 hunks)
  • source/tests/pd/conftest.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Always run ruff check . and ruff format . before committing changes to Python code

Files:

  • deepmd/pd/utils/auto_batch_size.py
  • source/tests/pd/conftest.py
  • deepmd/pd/entrypoints/main.py
  • deepmd/pd/utils/env.py
  • deepmd/pd/utils/utils.py
🪛 Ruff (0.14.4)
deepmd/pd/utils/utils.py

357-357: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: Agent
  • GitHub Check: CodeQL analysis (python)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Analyze (python)
🔇 Additional comments (7)
source/tests/pd/conftest.py (1)

9-9: LGTM! Device-agnostic API correctly applied.

The change from paddle.device.cuda.empty_cache() to paddle.device.empty_cache() aligns with the PR objective to support multiple hardware backends.

deepmd/pd/entrypoints/main.py (2)

98-98: LGTM! Relaxed NCCL assertion for multi-backend support.

The assertion now allows multi-GPU initialization when PaddlePaddle is not compiled with NCCL, which is appropriate for non-CUDA backends like XPU that may use different communication libraries.


217-217: LGTM! Device-agnostic API correctly applied.

The change from paddle.device.cuda.device_count() to paddle.device.device_count() enables support for multiple hardware backends. Note that the method name get_ngpus() now returns the count of any available compute devices (GPU, XPU, etc.), which is appropriate for the broader hardware support.

deepmd/pd/utils/auto_batch_size.py (2)

31-39: LGTM! Device-agnostic API correctly applied.

The change from paddle.device.cuda.device_count() to paddle.device.device_count() enables support for multiple hardware backends. Note that the method name is_gpu_available() now effectively checks for any compute device availability (GPU, XPU, etc.), not just GPU, which aligns with the broader hardware support goal.


54-54: LGTM! Device-agnostic API correctly applied.

The change from paddle.device.cuda.empty_cache() to paddle.device.empty_cache() enables proper memory cleanup across different hardware backends.

deepmd/pd/utils/utils.py (1)

349-354: LGTM! XPU device support correctly implemented.

The XPU device branches follow the same pattern as GPU device handling and correctly initialize XPU generators for both single-device ("xpu") and indexed-device ("xpu:N") configurations.

deepmd/pd/utils/env.py (1)

32-37: The review comment is based on an incorrect assumption about the DEVICE environment variable's purpose.

The DEVICE environment variable is designed as a CPU-override flag, not a general device selector. This is confirmed by:

  1. Identical implementation pattern across frameworks: Both the PyTorch (deepmd/pt/utils/env.py line 40-43) and PaddlePaddle implementations use the same logic—only checking if DEVICE == "cpu", never using its value to select non-CPU devices like "xpu:0".

  2. Architectural design: For GPU/accelerator selection in distributed training, both frameworks use LOCAL_RANK (PyTorch) or rely on get_device() with device assignment logic (PaddlePaddle). The DEVICE variable is separate and specifically for the CPU-forcing use case.

  3. Consistency: The comment "Make sure DDP uses correct device if applicable" and the pattern across both backends indicate this is intentional—DEVICE forces CPU when needed, while the framework handles proper GPU selection via other mechanisms.

The current code correctly respects the DEVICE environment variable for its intended purpose: forcing CPU execution when DEVICE == "cpu" is set. The suggested fix to support arbitrary device strings like "xpu:0" would require XPU-specific handling logic, which is a feature addition unrelated to the stated PR goal of respecting the DEVICE environment variable.

Likely an incorrect or invalid review comment.
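
A compact sketch of the CPU-override convention described above, using the same generic APIs (illustrative, not the literal env.py code):

```python
import os

import paddle

# DEVICE is a CPU-override flag: it can force CPU but never selects accelerators.
if os.environ.get("DEVICE") == "cpu" or paddle.device.device_count() == 0:
    DEVICE = "cpu"
else:
    DEVICE = paddle.device.get_device()  # framework-assigned, e.g. "gpu:0"
paddle.device.set_device(DEVICE)
```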

@HydrogenSulfate HydrogenSulfate marked this pull request as draft November 12, 2025 08:37

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 23.07692% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.28%. Comparing base (0ad7cbf) to head (6bb94db).

Files with missing lines (patch coverage):
  • deepmd/pd/utils/utils.py: 14.28% (6 lines missing ⚠️)
  • deepmd/pd/entrypoints/main.py: 0.00% (2 lines missing ⚠️)
  • deepmd/pd/utils/auto_batch_size.py: 50.00% (1 line missing ⚠️)
  • deepmd/pd/utils/env.py: 50.00% (1 line missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #5047      +/-   ##
==========================================
- Coverage   84.28%   84.28%   -0.01%     
==========================================
  Files         709      709              
  Lines       70561    70567       +6     
  Branches     3618     3618              
==========================================
  Hits        59474    59474              
- Misses       9922     9927       +5     
- Partials     1165     1166       +1     


@HydrogenSulfate HydrogenSulfate marked this pull request as ready for review December 8, 2025 08:48

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python.yml (1)

35-35: Clarify CI stability strategy for nightly PaddlePaddle wheels.

Switching to nightly/pre-release PaddlePaddle wheels (3.3.0.dev20251204) can introduce instability and make test failures harder to attribute. Consider:

  • Whether this is a temporary measure for testing device-agnostic compatibility or a long-term strategy
  • Adding fallback/retry logic if the specific nightly build becomes unavailable
  • Documenting the rationale and expected timeline for returning to stable releases

The --no-index --find-links pattern is strict; if the nightly wheel URL becomes stale or the specific version is missing, CI will fail. Consider wrapping this in a retry mechanism or providing a fallback stable version.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb6ccf4 and acab160.

📒 Files selected for processing (2)
  • .github/workflows/test_cuda.yml (2 hunks)
  • .github/workflows/test_python.yml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_cuda.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-24T22:42:23.128Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T22:42:23.128Z
Learning: Install backend dependencies (TensorFlow, PyTorch, JAX, or Paddle) before building C++ components, and set TENSORFLOW_ROOT and PYTORCH_ROOT environment variables

Applied to files:

  • .github/workflows/test_python.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build wheels for cp311-macosx_arm64
🔇 Additional comments (1)
.github/workflows/test_python.yml (1)

60-70: Inconsistent environment variable scope across pytest invocations.

FLAGS_use_stride_compute_kernel: 0 is set for the main pytest run (line 63) but not for the subsequent "Test TF2 eager mode" step (line 65). Verify whether this omission is intentional or if the flag should also be applied there.

For consistency and clarity, consider either:

  • Applying the flag to both pytest invocations if both need stride kernel handling
  • Documenting why the TF2 test intentionally omits this flag

@HydrogenSulfate
Collaborator Author

Hello @njzjz, this PR is ready for review.

@njzjz njzjz added this pull request to the merge queue Dec 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 8, 2025