
Conversation

@kahyunnam (Contributor) commented Nov 14, 2025

📌 Description

Download flashinfer-cubin and flashinfer-jit-cache to avoid compilation. (If a JIT kernel is not in flashinfer-jit-cache, it will still JIT-compile at test runtime. We could set export FLASHINFER_DISABLE_JIT=1 to avoid this, but then many tests that use JIT kernels absent from flashinfer-jit-cache would be skipped.)
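
A minimal sketch of that setup, assuming the cu129 wheel index discussed later in this thread (the index URL and CUDA stream are assumptions, not part of this diff):

  # Install prebuilt artifacts so tests don't compile kernels at runtime.
  pip install flashinfer-cubin
  pip install --extra-index-url https://flashinfer.ai/whl/cu129 flashinfer-jit-cache

  # Optional: forbid JIT compilation entirely. Tests whose kernels are
  # absent from flashinfer-jit-cache would then be skipped.
  export FLASHINFER_DISABLE_JIT=1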

🔍 Related Issues

The issue was discussed on Slack: "Ideally, we would move that compilation off-line which would reduce test time & make kernel hang detection much easier."

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Summary by CodeRabbit

  • Chores

    • Improved runtime install flow to detect CUDA and compute an effective JIT architecture mapping, then install matching precompiled kernel artifacts and local package sources; steps run only outside dry-run mode and verify installation by showing config.
    • Simplified build parallelism calculation to a constant division by 8 (with existing guards).
  • Bug Fixes

    • Missing precompiled kernel artifacts now cause an explicit error/abort instead of a warning.


@coderabbitai bot (Contributor) commented Nov 14, 2025

Walkthrough

Adds non-dry-run runtime initialization to install and verify prebuilt kernel artifacts and local Python sources, and simplifies the MAX_JOBS calculation for JIT cache wheel builds by using a constant divisor for memory-based job estimation.

Changes

  • Kernel install & test script (scripts/task_test_blackwell_kernels.sh):
    On non-dry-run: print CUDA_VERSION; compute JIT_ARCH_EFFECTIVE with a special-case mapping for 12.0/cu129; set DIST_CUBIN_DIR and DIST_JIT_CACHE_DIR under dist/${CUDA_VERSION}; install flashinfer-cubin from DIST_CUBIN_DIR, or exit with an error if it is missing; install flashinfer-jit-cache from DIST_JIT_CACHE_DIR, or exit with an error if it is missing; then pip install -e . and run python -m flashinfer show-config in /tmp.
  • JIT cache wheel build script (scripts/build_flashinfer_jit_cache_whl.sh):
    Replace the conditional, architecture-dependent MAX_JOBS calculation with a single rule: compute jobs as MEM_AVAILABLE_GB / 8 (no aarch64 special case), then enforce MAX_JOBS >= 1 and cap at NPROC.
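
A sketch of that rule, with variable names taken from the summary above (the actual script may differ in detail):

  # Estimate parallel build jobs from available memory: one job per 8 GB.
  MEM_AVAILABLE_GB=$(awk '/MemAvailable/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
  NPROC=$(nproc)
  MAX_JOBS=$(( MEM_AVAILABLE_GB / 8 ))       # constant divisor, no aarch64 special case
  (( MAX_JOBS < 1 )) && MAX_JOBS=1           # always run at least one job
  (( MAX_JOBS > NPROC )) && MAX_JOBS=$NPROC  # cap by available CPUs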

Sequence Diagram(s)

sequenceDiagram
    participant Script as task_test_blackwell_kernels.sh
    participant Env as Environment
    participant FS as Filesystem (dist/)
    participant Pip as pip
    participant Python as python

    rect rgb(240,248,255)
    Note over Script: Start (only if DRY_RUN unset)
    Script->>Env: Check DRY_RUN
    alt DRY_RUN not set
        Script->>Env: Echo CUDA_VERSION
        Script->>Script: Compute JIT_ARCH_EFFECTIVE (special-case 12.0/cu129)
        Script->>Script: Set DIST_CUBIN_DIR = dist/${CUDA_VERSION}/cubin
        Script->>Script: Set DIST_JIT_CACHE_DIR = dist/${CUDA_VERSION}/jit-cache
    else DRY_RUN set
        Script-->>Env: Skip install steps
    end
    end

    rect rgb(230,255,240)
    Note over Script,FS: Install prebuilt artifacts (error if missing)
    Script->>FS: Check DIST_CUBIN_DIR for wheels
    alt cubin wheels found
        Script->>Pip: pip install <cubin wheel>
        Pip-->>Script: success
    else missing
        Script-->>Env: exit with error ("missing flashinfer-cubin artifact")
    end
    Script->>FS: Check DIST_JIT_CACHE_DIR for wheels
    alt jit-cache wheels found
        Script->>Pip: pip install <jit-cache wheel>
        Pip-->>Script: success
    else missing
        Script-->>Env: exit with error ("missing flashinfer-jit-cache artifact")
    end
    end

    rect rgb(255,250,230)
    Note over Script,Python: Install local package & verify
    Script->>Pip: pip install -e . -v --no-deps
    Pip-->>Script: installed
    Script->>Python: (cd /tmp && python -m flashinfer show-config)
    Python-->>Script: config output / verification
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Small number of script changes with straightforward control-flow.
  • Pay extra attention to:
    • Correctness of JIT_ARCH_EFFECTIVE mapping logic (12.0 / cu129 -> 12.0a vs 12.0f).
    • Exit behavior when prebuilt wheels are missing (explicit error exits).
    • The new unconditional memory divisor change in build_flashinfer_jit_cache_whl.sh and its impact on aarch64 builds.

Possibly related PRs

Suggested reviewers

  • yongwww

Poem

🐰 I sniff the CUDA version with a hop and cheer,
I hunt the cubins where the dist files appear,
If kernels are missing I shout, "No more play!"
Then pip my roots and verify — off I sway. 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and concisely summarizes the main objective of the PR: moving compilation off-line to reduce test time.
  • Description check: Passed. The PR description covers the main purpose, includes discussion of fallback behavior, references the related Slack discussion, and completes all pre-commit and testing checklist items.
  • Docstring coverage: Passed. No functions were found in the changed files, so the docstring coverage check was skipped.


@kahyunnam force-pushed the knam/unit-testing-move-compilation-offline branch from c9c6768 to 375ca18 on November 14, 2025 at 01:16
@kahyunnam self-assigned this Nov 14, 2025
@kahyunnam (Contributor, Author) commented:

/bot run

@kahyunnam marked this pull request as ready for review on November 14, 2025 at 01:18
@flashinfer-bot (Collaborator) commented:

GitLab MR !137 has been created, and the CI pipeline #38459095 is currently running. I'll report back once the pipeline job completes.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
scripts/task_test_blackwell_kernels.sh (2)

41-50: Inconsistent verbosity flags in sequential pip installations.

Lines 43 and 45 use -q (quiet) for kernel installations, while line 49 uses -v (verbose) for local source installation. This inconsistency makes it unclear whether the verbosity change is intentional and may make output harder to parse in CI logs.

Standardize the verbosity flags across all pip installations in this initialization block:

  # Install precompiled kernels
  echo "Installing flashinfer-cubin from PyPI/index..."
  pip install -q flashinfer-cubin
  echo "Installing flashinfer-jit-cache for ${CUDA_STREAM} from https://flashinfer.ai/whl/${CUDA_STREAM} ..."
  pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache
  echo ""

  # Install local python sources
- pip install -e . -v --no-deps
+ pip install -e . -q --no-deps

Alternatively, if verbose output is intentional for debugging local installs, add a comment explaining the choice.


41-50: Verify that the custom PyPI index URL for flashinfer-jit-cache is reliable.

The script hardcodes the index URL https://flashinfer.ai/whl/${CUDA_STREAM} and expects it to always be available and contain the flashinfer-jit-cache package for the detected CUDA stream. If this URL becomes unavailable or if a CUDA stream version is not published, the pip install will fail and halt all subsequent tests.

Add error handling and diagnostics to surface issues clearly:

  echo "Installing flashinfer-jit-cache for ${CUDA_STREAM} from https://flashinfer.ai/whl/${CUDA_STREAM} ..."
- pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache
+ if ! pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache; then
+     echo "❌ ERROR: Failed to install flashinfer-jit-cache for CUDA stream ${CUDA_STREAM}"
+     echo "   Index URL: https://flashinfer.ai/whl/${CUDA_STREAM}"
+     exit 1
+ fi

Can you confirm that the custom index URL is stable and that all supported CUDA streams (cu128, cu129, cu130) are consistently published with the corresponding flashinfer-jit-cache package?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54101e9 and 375ca18.

📒 Files selected for processing (1)
  • scripts/task_test_blackwell_kernels.sh (1 hunks)
🔇 Additional comments (1)
scripts/task_test_blackwell_kernels.sh (1)

52-55: Verify that python -m flashinfer show-config is an appropriate verification step.

The verification runs python -m flashinfer show-config to confirm successful installation. However, this assumes:

  1. The show-config subcommand exists in the flashinfer module
  2. The command is idempotent and doesn't modify the environment
  3. The command completes quickly without external dependencies

If this command fails (e.g., due to missing dependencies, invalid environment, or a transient issue), the entire test run is aborted before any tests can run, which may be overly strict for a verification step.

Can you confirm:

  • That python -m flashinfer show-config is a lightweight, read-only command that verifies the installation without side effects?
  • What the expected output is and whether it should be validated beyond the exit code?
  • Whether a failed verification should block all tests or only warn/skip? (A possible non-fatal variant is sketched below.)
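
If warn-and-skip is preferred, one possible non-fatal variant (a sketch, not the script's current behavior):

  # Verify the install, but only warn on failure instead of aborting the run.
  if ! (cd /tmp && python -m flashinfer show-config); then
      echo "WARNING: flashinfer show-config failed; continuing with tests anyway"
  fi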

@flashinfer-bot (Collaborator) commented:

[FAILED] Pipeline #38459095: 3/17 passed

@kahyunnam closed this Nov 14, 2025
@kahyunnam reopened this Nov 14, 2025
@kahyunnam changed the title from "Reduce test time by moving compilation off-line" to "[To merge AFTER flashinfer-ci changes updated] Reduce test time by moving compilation off-line" on Nov 14, 2025
@bkryu (Collaborator) left a comment

Thanks @kahyunnam! Left a comment about the behavior when the jit-cache & cubin wheels are not found.

@kahyunnam force-pushed the knam/unit-testing-move-compilation-offline branch from 0a7f5c7 to 3f6dc1c on November 22, 2025 at 01:26
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
scripts/task_test_blackwell_kernels.sh (3)

44-45: Validate that relative paths are robust to working directory assumptions.

Lines 44–45 construct distribution paths as ../dist/${CUDA_VERSION}/${JIT_ARCH_EFFECTIVE}/.... This assumes the script is invoked from a specific directory (likely the repository root). If the script is called from a different directory, these paths will fail silently or point to unintended locations.

Consider either:

  1. Using $(dirname "${BASH_SOURCE[0]}") to anchor paths relative to the script location (see the sketch after this list).
  2. Adding explicit validation that DIST_CUBIN_DIR and DIST_JIT_CACHE_DIR are accessible before attempting installation.
  3. Documenting the expected working directory requirement in a comment.
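
A sketch of option 1, assuming the directory layout quoted above (the trailing cubin/jit-cache subdirectory names are taken from the walkthrough):

  # Anchor dist paths to the script's own location, not the caller's CWD.
  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  DIST_CUBIN_DIR="${SCRIPT_DIR}/../dist/${CUDA_VERSION}/${JIT_ARCH_EFFECTIVE}/cubin"
  DIST_JIT_CACHE_DIR="${SCRIPT_DIR}/../dist/${CUDA_VERSION}/${JIT_ARCH_EFFECTIVE}/jit-cache"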

33-42: Simplify JIT_ARCH mapping logic for clarity.

The nested conditional on lines 34–39 is difficult to follow. The logic maps only 12.0 to architecture-specific suffixes (12.0a for cu129, 12.0f otherwise), while other values pass through unchanged. Consider extracting this into a helper function or adding comments to explain the mapping rules and why 12.0 is special.
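
For example, a small helper would make the rules explicit (the function name and JIT_ARCH input are assumptions; the mapping itself follows the walkthrough above):

  # Map a requested JIT arch to the effective one; only 12.0 is special-cased.
  map_jit_arch() {
      local arch="$1" cuda_version="$2"
      if [[ "${arch}" == "12.0" ]]; then
          # cu129 maps to 12.0a; other streams map to 12.0f
          [[ "${cuda_version}" == "cu129" ]] && echo "12.0a" || echo "12.0f"
      else
          echo "${arch}"  # all other values pass through unchanged
      fi
  }
  JIT_ARCH_EFFECTIVE="$(map_jit_arch "${JIT_ARCH}" "${CUDA_VERSION}")"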


28-29: Add validation or explicit handling for CUDA_VERSION.

Line 28 echoes CUDA_VERSION but does not validate that it is set to an expected value (cu128, cu129, cu130, etc.). If CUDA_VERSION is unset or malformed, the script will still proceed and construct invalid paths. Consider adding a check to fail fast if the value is unexpected, or document the assumption that CUDA_VERSION is always set by the caller.
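
A sketch of such a fail-fast guard, with the stream values taken from this thread:

  # Fail fast if CUDA_VERSION is unset or not a known stream.
  case "${CUDA_VERSION}" in
      cu128|cu129|cu130) ;;  # known streams, proceed
      *)
          echo "ERROR: unexpected or unset CUDA_VERSION='${CUDA_VERSION}'" >&2
          exit 1
          ;;
  esac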

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a7f5c7 and 3f6dc1c.

📒 Files selected for processing (2)
  • scripts/build_flashinfer_jit_cache_whl.sh (1 hunks)
  • scripts/task_test_blackwell_kernels.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (3)
scripts/build_flashinfer_jit_cache_whl.sh (1)

14-15: Clarify rationale for changing MAX_JOBS divisor from architecture-dependent to constant.

The change removes the conditional logic and always divides by 8, whereas the original divided by 4 on x86_64. While this simplifies the calculation, it may result in fewer parallel jobs on non-aarch64 systems, potentially increasing build time.

Was this change validated on both architectures? If not, consider testing build times on x86_64 to confirm acceptable performance.

scripts/task_test_blackwell_kernels.sh (2)

65-72: Approve artifact installation and verification flow.

The addition of local source installation and verification via python -m flashinfer show-config is well-structured. Running the verification in /tmp isolates side effects and ensures the installed packages work in a clean environment. The error handling is appropriate for this stage.


51-53: Clarify intent regarding exit 1 statements at lines 52 and 60.

The current code contains exit 1 at both locations (lines 52 and 60). The review comment references a prior resolution where you stated you "removed the 'exit 1' for both the cubin / jit-cache else logic," but I cannot access the prior conversation to verify this claim.

Please confirm:

  • Was removing the exit 1 statements intentionally reverted?
  • Is the current behavior (hard error on missing artifacts) the intended behavior?
  • If the intent was to warn and continue with JIT compilation fallback, these statements need to be replaced with warnings.

Without access to the prior review thread, I cannot determine whether this is an oversight or intentional. The developer must clarify the design decision.
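
If warn-and-continue is the intent, a sketch of the fallback (shown for the jit-cache wheel; the cubin branch would mirror it):

  # Install the prebuilt jit-cache wheel if present; otherwise warn and let
  # kernels JIT-compile during the tests instead of aborting the run.
  if ls "${DIST_JIT_CACHE_DIR}"/*.whl >/dev/null 2>&1; then
      pip install "${DIST_JIT_CACHE_DIR}"/*.whl
  else
      echo "WARNING: no flashinfer-jit-cache wheel in ${DIST_JIT_CACHE_DIR}; kernels will JIT-compile at test time"
  fi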

