
@w169q169 w169q169 commented Dec 23, 2025

This patch adds support for Programmatic Dependent Kernel Launch (PDL) in the TVM CUDA FFI layer. PDL enables launching dependent kernels on the GPU without host intervention, improving performance and expressiveness for dynamic CUDA workloads.

Refer to NVIDIA documentation for PDL semantics:
https://docs.nvidia.com/cuda/cuda-programming-guide/03-advanced/advanced-host-programming.html#programmatic-dependent-kernel-launch

In addition, this patch extends the CUDA FFI layer to support cooperative kernel launches via cuLaunchCooperativeKernel. Cooperative kernels allow grid-wide synchronization and are required for certain multi-stage or producer–consumer GPU workloads. When a kernel is marked for cooperative launch, it will be dispatched using cuLaunchCooperativeKernel instead of the standard cuLaunchKernel.

Refer to NVIDIA documentation for cooperative kernel usage:
https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cooperative-groups.html#when-to-use-cudalaunchcooperativekernel
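The dispatch rule described above (cooperative kernels go through cuLaunchCooperativeKernel, PDL kernels through cuLaunchKernelEx, everything else through the standard cuLaunchKernel) can be sketched as follows. This is a simplified model, not the PR's actual code: the flag struct and function names here are hypothetical stand-ins for the per-kernel metadata.

```cpp
#include <cassert>

// Hypothetical per-kernel launch flags; the actual metadata fields in the PR
// may differ (e.g. has_programmatic_dependent_launch in FunctionInfo).
struct KernelLaunchFlags {
  bool programmatic_dependent_launch = false;
  bool cooperative = false;
};

// Which CUDA driver API entry point a launch would go through.
enum class LaunchAPI { kLaunchKernel, kLaunchKernelEx, kLaunchCooperativeKernel };

// Sketch of the dispatch rule: cooperative launch takes priority, then PDL,
// then the standard path.
LaunchAPI SelectLaunchAPI(const KernelLaunchFlags& flags) {
  if (flags.cooperative) return LaunchAPI::kLaunchCooperativeKernel;
  if (flags.programmatic_dependent_launch) return LaunchAPI::kLaunchKernelEx;
  return LaunchAPI::kLaunchKernel;
}
```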

@gemini-code-assist

Summary of Changes

Hello @w169q169, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request integrates support for Programmatic Dependent Kernel Launch (PDL) into the TVM CUDA FFI. This enhancement is designed to improve the performance and expressiveness of dynamic CUDA workloads by allowing GPU kernels to directly launch subsequent kernels, thereby reducing the overhead associated with host CPU synchronization. The changes involve modifying the CUDA runtime to recognize and act upon a new flag in function metadata, which then triggers the use of advanced CUDA kernel launch APIs.

Highlights

  • Programmatic Dependent Kernel Launch (PDL) Support: Added core support for Programmatic Dependent Kernel Launch (PDL) within the TVM CUDA FFI layer, enabling GPU kernels to launch other kernels without host intervention.
  • Conditional Kernel Launch Mechanism: Implemented a conditional kernel launch mechanism that utilizes cuLaunchKernelEx with CUlaunchConfig and CUlaunchAttribute when PDL is enabled, specifically setting CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION.
  • Function Metadata Extension: Extended the FunctionInfo metadata structure to include a has_programmatic_dependent_launch boolean flag, which is persisted during serialization and deserialization for both JSON and binary formats.
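The conditional launch path described in the highlights can be sketched as below. The real CUlaunchConfig/CUlaunchAttribute types come from cuda.h; they are replaced here by simplified stand-ins (an assumption, so the snippet stays self-contained and compilable without the CUDA toolkit), but the attribute wiring mirrors what the PR's PDL path does.

```cpp
#include <cassert>

// Simplified stand-ins for the CUDA driver API types from cuda.h, used only
// to illustrate how the PDL attribute is attached to the launch config.
enum CUlaunchAttributeID {
  CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION = 1
};
struct CUlaunchAttribute {
  CUlaunchAttributeID id;
  struct { int programmaticStreamSerializationAllowed; } value;
};
struct CUlaunchConfig {
  unsigned gridDimX, gridDimY, gridDimZ;
  unsigned blockDimX, blockDimY, blockDimZ;
  unsigned sharedMemBytes;
  CUlaunchAttribute* attrs;
  unsigned numAttrs;
};

// Build a launch config; when `use_pdl` is set, attach the programmatic
// stream serialization attribute. The config is zero-initialized first, as
// the review below recommends, so no field is left with garbage data.
CUlaunchConfig MakeLaunchConfig(bool use_pdl, CUlaunchAttribute* attr_storage) {
  CUlaunchConfig cfg = {};
  cfg.gridDimX = cfg.gridDimY = cfg.gridDimZ = 1;
  cfg.blockDimX = cfg.blockDimY = cfg.blockDimZ = 1;
  if (use_pdl) {
    attr_storage->id = CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION;
    attr_storage->value.programmaticStreamSerializationAllowed = 1;
    cfg.attrs = attr_storage;
    cfg.numAttrs = 1;
  }
  return cfg;
}
```

In the real code path, the resulting config would then be passed to cuLaunchKernelEx instead of calling cuLaunchKernel.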




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for Programmatic Dependent Kernel Launch (PDL) in the CUDA FFI, which is a great feature for dynamic workloads. The implementation correctly uses cuLaunchKernelEx when the feature is enabled for a function.

My review focuses on improving robustness and ensuring backward compatibility. I've pointed out a couple of areas for improvement:

  • In cuda_module.cc, I suggest zero-initializing CUDA launch configuration structs to prevent potential issues with uninitialized data.
  • In meta_data.h and file_utils.cc, I've identified issues with serialization that could break backward compatibility. Specifically, the new has_programmatic_dependent_launch flag needs to be handled correctly when loading older module metadata, both for JSON and binary formats. The binary format handling, in particular, seems to have a breaking change.

Please take a look at the detailed comments. Addressing these points will make the changes more robust and maintainable.
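The backward-compatibility concern about the JSON format can be illustrated with a small sketch. The JSON object is modeled here as a plain key→value map (an assumption; the real code uses TVM's metadata readers), but the defaulting rule is the point: when older metadata predates the new flag, the key is absent and must default to false rather than fail to load.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a decoded JSON metadata object: only the keys that were
// actually serialized are present.
using JsonObject = std::map<std::string, bool>;

// Read the new flag with a backward-compatible default: metadata written
// before this PR has no such key, and should behave as "PDL disabled".
bool ReadPdlFlag(const JsonObject& obj) {
  auto it = obj.find("has_programmatic_dependent_launch");
  return it == obj.end() ? false : it->second;
}
```

The binary format needs the analogous treatment, e.g. via a format version check, since a blind extra read would mis-parse older streams.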


tqchen commented Dec 23, 2025

Thanks a lot for the contribution. To align with the existing implementation, we can use a launch param tag; see the LaunchParamConfig implementation. The launch param tag can be specified as part of tir.kernel_launch_params, which then sets the LaunchParamConfig setting.

We can use:

constexpr const char* kUseProgramaticDependentLaunch = "tir.use_programtic_dependent_launch";

  • add a field use_programtic_dependent_launch_ to LaunchConfig
  • expose a function use_programtic_dependent_launch() which can be queried from LaunchConfig

This way we can directly specify PDL properties in TVMScript as a kernel function attribute.
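The tag-scanning scheme suggested above could look roughly like this. This is a hypothetical, stripped-down class (the real LaunchParamConfig carries grid/block dimension mappings and other state); the tag string and member names are copied verbatim from the suggestion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tag spelling taken verbatim from the comment above.
constexpr const char* kUseProgramaticDependentLaunch =
    "tir.use_programtic_dependent_launch";

// Sketch of a LaunchParamConfig-style class: launch param tags are scanned
// once at construction, and the PDL flag can then be queried at launch time.
class LaunchConfigSketch {
 public:
  explicit LaunchConfigSketch(const std::vector<std::string>& launch_param_tags) {
    for (const std::string& tag : launch_param_tags) {
      if (tag == kUseProgramaticDependentLaunch) {
        use_programtic_dependent_launch_ = true;
      }
    }
  }
  // Queried by the launch path to decide between cuLaunchKernel and
  // cuLaunchKernelEx with the PDL attribute.
  bool use_programtic_dependent_launch() const {
    return use_programtic_dependent_launch_;
  }

 private:
  bool use_programtic_dependent_launch_ = false;
};
```

With this in place, the TVMScript side only needs to append the tag to the kernel's launch params for the runtime to pick it up.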

@w169q169 w169q169 marked this pull request as ready for review December 24, 2025 03:26
@w169q169
Author

Thanks a lot for the contribution. To align with the existing implementation, we can use a launch param tag; see the LaunchParamConfig implementation. The launch param tag can be specified as part of tir.kernel_launch_params, which then sets the LaunchParamConfig setting.

We can use:

constexpr const char* kUseProgramaticDependentLaunch = "tir.use_programtic_dependent_launch";

  • add a field use_programtic_dependent_launch_ to LaunchConfig
  • expose a function use_programtic_dependent_launch() which can be queried from LaunchConfig

This way we can directly specify PDL properties in TVMScript as a kernel function attribute.

Thanks for the detailed guidance. I have updated the PR according to your suggestions. Please let me know if there are any further comments or improvements needed.


silentCoder-dev commented Dec 24, 2025

Could you extend this implementation to include cuLaunchCooperativeKernel support? This is needed for issue tile-ai/tilelang/issues/1501

@w169q169
Author

Could you extend this implementation to include cuLaunchCooperativeKernel support? This is needed for issue tile-ai/tilelang/issues/1501

OK, I’d be happy to work on it.

@w169q169 w169q169 changed the title [CUDA][FFI] Add support for Programmatic Dependent Kernel Launch (PDL) in TVM CUDA FFI [CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel Dec 24, 2025