-
Notifications
You must be signed in to change notification settings - Fork 584
refactor: pass hopper deepgemm include directory through python #2090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a TVM FFI to set DeepGEMM JIT include directories, centralizes include-dir state in the DeepGEMM compiler via a static accessor/mutator, updates fused MoE build inputs and runtime initialization to call the new FFI, and adds a small CUDA header include. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python initializer
participant TVM as TVM FFI layer
participant Binding as deepgemm_jit_setup (CUDA binding)
participant Compiler as deep_gemm::jit::Compiler
Py->>TVM: call set_deepgemm_jit_include_dirs([path])
TVM->>Binding: invoke exported function
Binding->>Binding: convert tvm::ffi::Array<String> → vector<filesystem::path>
Binding->>Compiler: Compiler::setIncludeDirs(paths)
Compiler->>Compiler: set static JIT include dirs (getJitIncludeDirs ref)
Compiler-->>Binding: return
Binding-->>TVM: return
TVM-->>Py: return
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @yzh119, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the robustness of DeepGEMM JIT compilation by eliminating a fragile dependency on shell commands for discovering include directories. Instead, it establishes a direct communication channel where Python APIs programmatically provide the necessary paths to the C++ DeepGEMM compiler, ensuring consistent and reliable behavior across diverse user environments. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors how the deepgemm JIT include directory is located, moving from a shell-based discovery (pip show) to a more robust approach where the path is passed from Python to the C++ backend. This is a great improvement, making the build process less dependent on the environment. The changes are well-contained and logical. I've found a potential race condition in the Compiler singleton initialization that could lead to JIT compilation failures depending on call order, and I've suggested a fix. I also have a minor suggestion to improve performance in one of the new C++ functions. Overall, a solid refactoring.
| [[nodiscard]] bool isValid() const { return !includeDirs_.empty(); } | ||
|
|
||
| // Set include directories before the singleton is initialized | ||
| static void setIncludeDirs(std::vector<std::filesystem::path> const& dirs) { | ||
| setJitIncludeDirs(dirs); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential race condition here depending on the initialization order. The Compiler singleton's constructor copies the JIT include directories into the includeDirs_ member. If getInstance() is called before setIncludeDirs(), includeDirs_ will be initialized with an empty vector and will not be updated when setIncludeDirs() is called later. This would cause JIT compilation to fail because of missing include paths.
To make this more robust, I suggest removing the includeDirs_ member and directly using getJitIncludeDirs() in isValid() and build(). This ensures that the most up-to-date include directories are always used.
This would involve:
- Removing the
includeDirs_member variable. - Updating the constructor to remove the initialization of
includeDirs_. - Updating
isValid()as suggested below. - Updating
build()to loop overgetJitIncludeDirs()instead ofincludeDirs_.
[[nodiscard]] bool isValid() const { return !getJitIncludeDirs().empty(); }
// Set include directories before the singleton is initialized
static void setIncludeDirs(std::vector<std::filesystem::path> const& dirs) {
setJitIncludeDirs(dirs);
}
| std::vector<std::filesystem::path> dirs; | ||
| for (const auto& dir : include_dirs) { | ||
| dirs.push_back(std::filesystem::path(std::string(dir))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a slight performance improvement and to follow modern C++ best practices, you can reserve memory for the vector to avoid potential reallocations and use emplace_back to construct the path in-place.
std::vector<std::filesystem::path> dirs;
dirs.reserve(include_dirs.size());
for (const auto& dir : include_dirs) {
dirs.emplace_back(std::string(dir));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu(2 hunks)csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh(2 hunks)csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh(2 hunks)csrc/nv_internal/tensorrt_llm/deep_gemm/runtime.cuh(2 hunks)flashinfer/fused_moe/core.py(1 hunks)flashinfer/jit/fused_moe.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-12T03:35:17.583Z
Learnt from: raayandhar
Repo: flashinfer-ai/flashinfer PR: 2070
File: include/flashinfer/gemm/bf16_gemm_cutlass_template.h:145-160
Timestamp: 2025-11-12T03:35:17.583Z
Learning: In flashinfer GEMM implementations (e.g., include/flashinfer/gemm/bf16_gemm_cutlass_template.h, fp8_gemm_cutlass_template.h), it is acceptable to catch and silently ignore std::runtime_error exceptions in getWorkspaceSizeImpl when probing multiple GEMM configurations, as some configurations may legitimately fail due to SMEM constraints. This pattern should include a comment like "// Swallow errors when SMEM exceeds maximum allowed" to document the rationale.
Applied to files:
csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuhcsrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu
🧬 Code graph analysis (1)
flashinfer/fused_moe/core.py (1)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu (2)
set_deepgemm_jit_include_dirs(1206-1212)set_deepgemm_jit_include_dirs(1206-1206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (5)
flashinfer/jit/fused_moe.py (1)
149-149: LGTM - Binding file path updatedThe change aligns with the new unified binding file that includes the
set_deepgemm_jit_include_dirsFFI export.csrc/nv_internal/tensorrt_llm/deep_gemm/runtime.cuh (1)
68-68: LGTM - Consistent use of CUDA Driver APIThe migration from CUDA Runtime API to CUDA Driver API is consistent throughout the file.
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu (1)
1204-1217: LGTM - FFI binding correctly forwards include directoriesThe implementation properly converts TVM array types to C++ filesystem paths and forwards them to the DeepGEMM compiler.
csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh (1)
42-54: LGTM - Proper macro guard prevents conflictsThe
#ifndefguard is a good practice to prevent macro redefinition errors, especially given the noted potential conflict withtvm_ffi_utils.h.flashinfer/fused_moe/core.py (1)
327-334: Verify initialization order to prevent race conditionThe DeepGEMM include directories are set after the module is loaded. The
Compilersingleton incompiler.cuhcopies the static include dirs vector to a member variableincludeDirs_during initialization (line 475). If theCompilersingleton is initialized anywhere during the module load process (before line 333 executes), the member variable won't be updated with the correct paths, causing compilation failures.Run this script to verify that no code path during module loading triggers
Compiler::getInstance():Consider documenting the initialization order requirement or refactoring the
Compilerto use the static include dirs directly instead of copying to a member variable.
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (1)
124-132: Consider thread safety for concurrent access.The static vector is now mutable through
setJitIncludeDirs()without synchronization. IfsetIncludeDirs()is called concurrently withbuild()operations or from multiple threads, there could be data races when reading/writing the vector.Given the typical usage pattern (set once during initialization, then read during compilation), this is likely acceptable, but consider adding a comment documenting that
setIncludeDirs()should be called during initialization before any compilation occurs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu(1 hunks)csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh(4 hunks)flashinfer/jit/fused_moe.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (7)
csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (4)
124-132: LGTM! Race condition from past reviews is now resolved.The implementation correctly addresses the initialization race condition raised in previous reviews. By returning a reference to the static vector and providing a setter, the code ensures that all consumers see the same up-to-date include directories without any stale copies.
238-238: LGTM! Correct usage of the static vector.The code now correctly calls
getJitIncludeDirs()directly instead of using a potentially stale member copy. This ensures thatisValid()andbuild()always see the current include directories.Also applies to: 316-316
240-243: LGTM! Clean API for external configuration.The new
setIncludeDirs()public static method provides a clean interface for external code to configure include directories before compilation, which aligns well with the PR's objective of passing the directory through Python APIs.
473-473: LGTM! Constructor simplified correctly.The constructor no longer copies the static include directories to a member variable, which was the source of the initialization race condition. The removal is correct and complete.
flashinfer/jit/fused_moe.py (1)
149-151: LGTM! Correct integration of the new DeepGEMM setup.The change correctly integrates the new
deepgemm_jit_setup.cusource file into the module generation, making the FFI functionset_deepgemm_jit_include_dirsavailable for configuration. The binding file change from the SM100-specific version to the generic version aligns with the refactoring objective.csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu (2)
25-31: LGTM! Clean TVM FFI wrapper implementation.The function correctly converts TVM array of strings to a
std::vector<std::filesystem::path>and forwards to the Compiler API. The implementation is straightforward and properly integrates the Python-to-C++ configuration path.
35-36: LGTM! Correct TVM FFI export.The macro correctly exports the function as
set_deepgemm_jit_include_dirs, making it callable from Python/TVM.
|
[FAILED] Pipeline #38602487: 1/17 passed |
|
/bot run |
|
[CANCELING] Pipeline #38631200: canceled |
|
/bot run |
|
[SUCCESS] Pipeline #38637572: 10/18 passed |
|
Failed UTs are not relevant (will be fixed in #2097). |
cyx-6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
/bot run |
📌 Description
This PR implements the refactor mentioned in
https://github.com/flashinfer-ai/flashinfer/pull/1969/files#r2461856020
In our current design we rely on calling
pip show flashinfer-python 2>/dev/null || uv pip show flashinfer-python 2>/dev/nullto obtain deepgemm jit include directory, which is error-prune (e.g. if user do not havepipavailable in the environment it will fail), in this PR we pass the deepgemm jit include directory through python APIs.🔍 Related Issues
#1969
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes
cc @djmmoss @jiahanc @nvmbreughe
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.