
Conversation

@yzh119
Collaborator

@yzh119 yzh119 commented Nov 14, 2025

📌 Description

This PR implements the refactor mentioned in
https://github.com/flashinfer-ai/flashinfer/pull/1969/files#r2461856020

In our current design we rely on calling pip show flashinfer-python 2>/dev/null || uv pip show flashinfer-python 2>/dev/null to obtain the DeepGEMM JIT include directory, which is error-prone (e.g. it fails if the user does not have pip available in the environment). In this PR we pass the DeepGEMM JIT include directory through Python APIs instead.
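For illustration, the Python side of this change could look roughly like the sketch below; the import path and helper name are assumptions, while set_deepgemm_jit_include_dirs and the FLASHINFER_CSRC_DIR/nv_internal/tensorrt_llm path come from the changes in this PR.

  # Hypothetical sketch; only the exported function name and the include path
  # are taken from this PR, the surrounding wiring is illustrative.
  import os

  from flashinfer.jit import env as jit_env  # assumed import path


  def _set_deepgemm_include_dirs(module) -> None:
      """Pass the DeepGEMM JIT include directory to the loaded fused MoE module."""
      include_dir = os.path.join(
          str(jit_env.FLASHINFER_CSRC_DIR), "nv_internal", "tensorrt_llm"
      )
      module.set_deepgemm_jit_include_dirs([include_dir])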

🔍 Related Issues

#1969

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

cc @djmmoss @jiahanc @nvmbreughe

Summary by CodeRabbit

  • New Features

    • Fused MoE modules now set DeepGEMM JIT include directories at runtime; a runtime-exposed entry point lets the module provide include paths during initialization.
    • Module generation includes the DeepGEMM JIT setup so those paths are applied before runtime use.
  • Chores

    • JIT compiler API updated to centrally manage and accept external include directories.
    • Minor header/build adjustments to support the new initialization flow.


@coderabbitai
Contributor

coderabbitai bot commented Nov 14, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a TVM FFI function to set DeepGEMM JIT include directories, centralizes include-dir state in the DeepGEMM compiler via a static accessor/mutator, updates fused MoE build inputs and runtime initialization to call the new FFI function, and adds a small CUDA header include.

Changes

  • DeepGEMM compiler API & state (csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh): Replace the instance member includeDirs_ with a lazily-initialized static getJitIncludeDirs() returning a reference; add setJitIncludeDirs() and a public static Compiler::setIncludeDirs(); update the constructor and isValid(), and replace uses with the static accessor.
  • CUDA FFI bindings, new setup & binding swap (csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu, csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu): Add a new exported function set_deepgemm_jit_include_dirs (a wrapper in the flashinfer namespace) that converts tvm::ffi::Array<tvm::ffi::String> to std::vector<std::filesystem::path> and calls deep_gemm::jit::Compiler::setIncludeDirs(); swap which binding source is used by the module (see the sketch after this list).
  • Python runtime init (flashinfer/fused_moe/core.py): After loading the Cutlass fused MoE backend, import the JIT env and call the exported set_deepgemm_jit_include_dirs() with the constructed DeepGEMM include path (FLASHINFER_CSRC_DIR/nv_internal/tensorrt_llm).
  • JIT build inputs (flashinfer/jit/fused_moe.py): Swap the binding kernel source to flashinfer_cutlass_fused_moe_binding.cu and add deepgemm_jit_setup.cu to the CUDA sources compiled into the fused MoE module.
  • Header include tweak (csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh): Add #include <cuda.h> at the top of the header.
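To make the new flow concrete, here is a minimal sketch of what the FFI setup source could look like. It follows the conversion and export described above; the header paths, the export macro name, and other details are assumptions rather than the actual file contents.

  // Hypothetical sketch of csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu.
  // Includes are approximate; the real file may rely on flashinfer's
  // tvm_ffi_utils.h helper and a different export macro.
  #include <filesystem>
  #include <string>
  #include <vector>

  #include <tvm/ffi/container/array.h>
  #include <tvm/ffi/function.h>
  #include <tvm/ffi/string.h>

  #include "tensorrt_llm/deep_gemm/compiler.cuh"  // path is an assumption

  namespace flashinfer {

  // Convert the FFI string array into filesystem paths and hand them to the
  // DeepGEMM JIT compiler before any JIT compilation runs.
  void set_deepgemm_jit_include_dirs(tvm::ffi::Array<tvm::ffi::String> include_dirs) {
    std::vector<std::filesystem::path> dirs;
    dirs.reserve(include_dirs.size());
    for (const auto& dir : include_dirs) {
      dirs.emplace_back(std::string(dir));
    }
    deep_gemm::jit::Compiler::setIncludeDirs(dirs);
  }

  }  // namespace flashinfer

  // Export the function under the name the Python side calls.
  TVM_FFI_DLL_EXPORT_TYPED_FUNC(set_deepgemm_jit_include_dirs,
                                flashinfer::set_deepgemm_jit_include_dirs);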

Sequence Diagram(s)

sequenceDiagram
    participant Py as Python initializer
    participant TVM as TVM FFI layer
    participant Binding as deepgemm_jit_setup (CUDA binding)
    participant Compiler as deep_gemm::jit::Compiler

    Py->>TVM: call set_deepgemm_jit_include_dirs([path])
    TVM->>Binding: invoke exported function
    Binding->>Binding: convert tvm::ffi::Array<String> → vector<filesystem::path>
    Binding->>Compiler: Compiler::setIncludeDirs(paths)
    Compiler->>Compiler: set static JIT include dirs (getJitIncludeDirs ref)
    Compiler-->>Binding: return
    Binding-->>TVM: return
    TVM-->>Py: return

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review thread-safety and lifetime of the static vector returned by getJitIncludeDirs().
  • Verify the FFI string → path conversion, handling of empty/invalid entries, and that the exported symbol name matches Python usage.
  • Confirm the Python call happens before any JIT compilation paths run.

Possibly related PRs

Suggested reviewers

  • djmmoss
  • jiahanc
  • nvmbreughe
  • yongwww
  • wenscarl
  • aleozlx

Poem

🐰 I hop through code with tiny paws so fleet,

I stitch include paths where C++ and Python meet,
A bind, a call, the JIT awakes with cheer,
Static paths in place, the compiler purrs near,
Hooray — a rabbit's nibble, neat and sweet 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title clearly summarizes the main change: refactoring how the DeepGEMM include directory is passed, using Python APIs instead of shell commands.
  • Description check: ✅ Passed. The PR description is well-structured, addresses all required template sections, explains the rationale for the change, references related issues, and includes pre-commit verification and testing notes.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90a0149 and bcd9bb7.

📒 Files selected for processing (2)
  • flashinfer/fused_moe/core.py (1 hunks)
  • flashinfer/jit/fused_moe.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • flashinfer/fused_moe/core.py
  • flashinfer/jit/fused_moe.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs


@gemini-code-assist
Contributor

Summary of Changes

Hello @yzh119, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of DeepGEMM JIT compilation by eliminating a fragile dependency on shell commands for discovering include directories. Instead, it establishes a direct communication channel where Python APIs programmatically provide the necessary paths to the C++ DeepGEMM compiler, ensuring consistent and reliable behavior across diverse user environments.

Highlights

  • Refactored DeepGEMM Include Path Discovery: The method for obtaining DeepGEMM JIT include directories has been refactored from relying on error-prone shell commands (pip show or uv pip show) to using more robust Python APIs.
  • New C++ API for Include Paths: A new C++ function set_deepgemm_jit_include_dirs is now exposed via TVM_FFI, allowing Python to directly pass include directory paths to the DeepGEMM compiler.
  • DeepGEMM Compiler Update: The deep_gemm::jit::Compiler in C++ now includes a static setIncludeDirs method to receive the include paths programmatically, removing the need for internal shell command execution.
  • Python Integration: The Python codebase (flashinfer/fused_moe/core.py) has been updated to retrieve the necessary include path from the jit_env and pass it to the C++ backend using the new API.
  • File Renaming: The file flashinfer_cutlass_fused_moe_sm100_binding.cu has been renamed to flashinfer_cutlass_fused_moe_binding.cu.
  • Macro Renaming for Conflict Avoidance: The CHECK_CUDA macro in jit_utils.cuh and runtime.cuh has been renamed to CHECK_CUDA_DRIVER and guarded with #ifndef to prevent potential redefinition conflicts.
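The guarded rename described in the last bullet might look roughly like the following; the exact error handling in jit_utils.cuh and runtime.cuh is not reproduced here and is an assumption.

  // Hypothetical sketch of the renamed, guarded CUDA Driver API check macro.
  #include <cuda.h>

  #include <cstdio>
  #include <cstdlib>

  #ifndef CHECK_CUDA_DRIVER
  #define CHECK_CUDA_DRIVER(call)                                        \
    do {                                                                 \
      CUresult status_ = (call);                                         \
      if (status_ != CUDA_SUCCESS) {                                     \
        const char* msg_ = nullptr;                                      \
        cuGetErrorString(status_, &msg_);                                \
        std::fprintf(stderr, "CUDA driver error %d (%s) at %s:%d\n",     \
                     static_cast<int>(status_), msg_ ? msg_ : "unknown", \
                     __FILE__, __LINE__);                                \
        std::abort();                                                    \
      }                                                                  \
    } while (0)
  #endif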
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review: /gemini review - Performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary - Provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist - Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help: /gemini help - Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors how the deepgemm JIT include directory is located, moving from a shell-based discovery (pip show) to a more robust approach where the path is passed from Python to the C++ backend. This is a great improvement, making the build process less dependent on the environment. The changes are well-contained and logical. I've found a potential race condition in the Compiler singleton initialization that could lead to JIT compilation failures depending on call order, and I've suggested a fix. I also have a minor suggestion to improve performance in one of the new C++ functions. Overall, a solid refactoring.

Comment on lines 238 to 243
[[nodiscard]] bool isValid() const { return !includeDirs_.empty(); }

// Set include directories before the singleton is initialized
static void setIncludeDirs(std::vector<std::filesystem::path> const& dirs) {
  setJitIncludeDirs(dirs);
}
Contributor


high

There's a potential race condition here depending on the initialization order. The Compiler singleton's constructor copies the JIT include directories into the includeDirs_ member. If getInstance() is called before setIncludeDirs(), includeDirs_ will be initialized with an empty vector and will not be updated when setIncludeDirs() is called later. This would cause JIT compilation to fail because of missing include paths.

To make this more robust, I suggest removing the includeDirs_ member and directly using getJitIncludeDirs() in isValid() and build(). This ensures that the most up-to-date include directories are always used.

This would involve:

  1. Removing the includeDirs_ member variable.
  2. Updating the constructor to remove the initialization of includeDirs_.
  3. Updating isValid() as suggested below.
  4. Updating build() to loop over getJitIncludeDirs() instead of includeDirs_.
  [[nodiscard]] bool isValid() const { return !getJitIncludeDirs().empty(); }

  // Set include directories before the singleton is initialized
  static void setIncludeDirs(std::vector<std::filesystem::path> const& dirs) {
    setJitIncludeDirs(dirs);
  }

Comment on lines 1207 to 1210
std::vector<std::filesystem::path> dirs;
for (const auto& dir : include_dirs) {
  dirs.push_back(std::filesystem::path(std::string(dir)));
}
Contributor


medium

For a slight performance improvement and to follow modern C++ best practices, you can reserve memory for the vector to avoid potential reallocations and use emplace_back to construct the path in-place.

  std::vector<std::filesystem::path> dirs;
  dirs.reserve(include_dirs.size());
  for (const auto& dir : include_dirs) {
    dirs.emplace_back(std::string(dir));
  }

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a79b78 and b36e1fb.

📒 Files selected for processing (6)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu (2 hunks)
  • csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (2 hunks)
  • csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh (2 hunks)
  • csrc/nv_internal/tensorrt_llm/deep_gemm/runtime.cuh (2 hunks)
  • flashinfer/fused_moe/core.py (1 hunks)
  • flashinfer/jit/fused_moe.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-12T03:35:17.583Z
Learnt from: raayandhar
Repo: flashinfer-ai/flashinfer PR: 2070
File: include/flashinfer/gemm/bf16_gemm_cutlass_template.h:145-160
Timestamp: 2025-11-12T03:35:17.583Z
Learning: In flashinfer GEMM implementations (e.g., include/flashinfer/gemm/bf16_gemm_cutlass_template.h, fp8_gemm_cutlass_template.h), it is acceptable to catch and silently ignore std::runtime_error exceptions in getWorkspaceSizeImpl when probing multiple GEMM configurations, as some configurations may legitimately fail due to SMEM constraints. This pattern should include a comment like "// Swallow errors when SMEM exceeds maximum allowed" to document the rationale.

Applied to files:

  • csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu
🧬 Code graph analysis (1)
flashinfer/fused_moe/core.py (1)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu (2)
  • set_deepgemm_jit_include_dirs (1206-1212)
  • set_deepgemm_jit_include_dirs (1206-1206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (5)
flashinfer/jit/fused_moe.py (1)

149-149: LGTM - Binding file path updated

The change aligns with the new unified binding file that includes the set_deepgemm_jit_include_dirs FFI export.

csrc/nv_internal/tensorrt_llm/deep_gemm/runtime.cuh (1)

68-68: LGTM - Consistent use of CUDA Driver API

The migration from CUDA Runtime API to CUDA Driver API is consistent throughout the file.

csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu (1)

1204-1217: LGTM - FFI binding correctly forwards include directories

The implementation properly converts TVM array types to C++ filesystem paths and forwards them to the DeepGEMM compiler.

csrc/nv_internal/tensorrt_llm/deep_gemm/jit_utils.cuh (1)

42-54: LGTM - Proper macro guard prevents conflicts

The #ifndef guard is a good practice to prevent macro redefinition errors, especially given the noted potential conflict with tvm_ffi_utils.h.

flashinfer/fused_moe/core.py (1)

327-334: Verify initialization order to prevent race condition

The DeepGEMM include directories are set after the module is loaded. The Compiler singleton in compiler.cuh copies the static include dirs vector to a member variable includeDirs_ during initialization (line 475). If the Compiler singleton is initialized anywhere during the module load process (before line 333 executes), the member variable won't be updated with the correct paths, causing compilation failures.

Run this script to verify that no code path during module loading triggers Compiler::getInstance():

Consider documenting the initialization order requirement or refactoring the Compiler to use the static include dirs directly instead of copying to a member variable.

@yzh119
Collaborator Author

yzh119 commented Nov 17, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !142 has been created, and the CI pipeline #38602487 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (1)

124-132: Consider thread safety for concurrent access.

The static vector is now mutable through setJitIncludeDirs() without synchronization. If setIncludeDirs() is called concurrently with build() operations or from multiple threads, there could be data races when reading/writing the vector.

Given the typical usage pattern (set once during initialization, then read during compilation), this is likely acceptable, but consider adding a comment documenting that setIncludeDirs() should be called during initialization before any compilation occurs.
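One possible shape for the accessor pair, with the documentation comment the review asks for (a sketch only; the actual compiler.cuh code is not reproduced here):

  // Sketch of the static include-dir state; the names getJitIncludeDirs /
  // setJitIncludeDirs follow the walkthrough, the bodies are assumptions.
  #include <filesystem>
  #include <vector>

  namespace deep_gemm::jit {

  // Lazily-initialized storage shared by all Compiler consumers.
  inline std::vector<std::filesystem::path>& getJitIncludeDirs() {
    static std::vector<std::filesystem::path> include_dirs;
    return include_dirs;
  }

  // NOTE: not synchronized. Call during initialization, before any JIT
  // compilation (i.e. before Compiler::build() can run on any thread).
  inline void setJitIncludeDirs(std::vector<std::filesystem::path> const& dirs) {
    getJitIncludeDirs() = dirs;
  }

  }  // namespace deep_gemm::jit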

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b36e1fb and 482b3b8.

📒 Files selected for processing (3)
  • csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu (1 hunks)
  • csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (4 hunks)
  • flashinfer/jit/fused_moe.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (7)
csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh (4)

124-132: LGTM! Race condition from past reviews is now resolved.

The implementation correctly addresses the initialization race condition raised in previous reviews. By returning a reference to the static vector and providing a setter, the code ensures that all consumers see the same up-to-date include directories without any stale copies.


238-238: LGTM! Correct usage of the static vector.

The code now correctly calls getJitIncludeDirs() directly instead of using a potentially stale member copy. This ensures that isValid() and build() always see the current include directories.

Also applies to: 316-316


240-243: LGTM! Clean API for external configuration.

The new setIncludeDirs() public static method provides a clean interface for external code to configure include directories before compilation, which aligns well with the PR's objective of passing the directory through Python APIs.


473-473: LGTM! Constructor simplified correctly.

The constructor no longer copies the static include directories to a member variable, which was the source of the initialization race condition. The removal is correct and complete.

flashinfer/jit/fused_moe.py (1)

149-151: LGTM! Correct integration of the new DeepGEMM setup.

The change correctly integrates the new deepgemm_jit_setup.cu source file into the module generation, making the FFI function set_deepgemm_jit_include_dirs available for configuration. The binding file change from the SM100-specific version to the generic version aligns with the refactoring objective.

csrc/fused_moe/cutlass_backend/deepgemm_jit_setup.cu (2)

25-31: LGTM! Clean TVM FFI wrapper implementation.

The function correctly converts TVM array of strings to a std::vector<std::filesystem::path> and forwards to the Compiler API. The implementation is straightforward and properly integrates the Python-to-C++ configuration path.


35-36: LGTM! Correct TVM FFI export.

The macro correctly exports the function as set_deepgemm_jit_include_dirs, making it callable from Python/TVM.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #38602487: 1/17 passed

@yzh119
Collaborator Author

yzh119 commented Nov 17, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !142 has been updated with latest changes, and the CI pipeline #38631200 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #38631200: canceled

@yzh119
Collaborator Author

yzh119 commented Nov 17, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !142 has been updated with latest changes, and the CI pipeline #38637572 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[SUCCESS] Pipeline #38637572: 10/18 passed

@yzh119
Collaborator Author

yzh119 commented Nov 18, 2025

Failed UTs are not relevant (will be fixed in #2097).

@yzh119 yzh119 enabled auto-merge (squash) November 18, 2025 07:54
Collaborator

@cyx-6 cyx-6 left a comment


LGTM

@yzh119 yzh119 requested a review from aleozlx as a code owner November 28, 2025 03:42
@yzh119
Collaborator Author

yzh119 commented Nov 28, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !142 has been updated with latest changes, and the CI pipeline #39288620 is currently running. I'll report back once the pipeline job completes.
