Conversation

@usberkeley (Contributor) commented Nov 27, 2025

Purpose

1) Fix #29591
2) Delete lmcache/disagg_prefill_lmcache_v0.py: the vLLM V0 architecture has already been deprecated.

When CUDA is used with multiprocessing, Python's default start method (fork) can cause CUDA context re-initialization failures in child processes.
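As a minimal sketch of the fix pattern (an illustration, not the PR's exact diff; the `worker` function here is hypothetical), a standalone example script forces the `spawn` start method before creating any child processes:

```python
import multiprocessing


def worker(rank):
    # In the real example this is where CUDA would be touched
    # (e.g. torch.cuda.set_device). Under the default 'fork' method on
    # Linux, a CUDA context already created in the parent cannot be
    # re-initialized here, raising
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess".
    return rank


if __name__ == "__main__":
    # Force 'spawn' so each child starts a fresh interpreter with no
    # inherited CUDA state. force=True overrides any method that was
    # already set earlier in the process.
    multiprocessing.set_start_method("spawn", force=True)
    with multiprocessing.Pool(2) as pool:
        print(pool.map(worker, [0, 1]))  # → [0, 1]
```

Note that `spawn` requires the `if __name__ == "__main__":` guard, since child processes re-import the main module.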

Test Plan

None

Test Result

None


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot commented Nov 27, 2025

Documentation preview: https://vllm--29592.org.readthedocs.build/en/29592/

@mergify mergify bot added labels documentation (Improvements or additions to documentation), nvidia, kv-connector on Nov 27, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a CUDA compatibility issue in the kv_cache_sharing_lmcache_v1.py example. By setting the multiprocessing start method to 'spawn', it prevents potential CUDA context initialization failures in child processes that can occur with the default 'fork' method. The implementation is sound, using multiprocessing.set_start_method('spawn', force=True) to ensure the correct context is established, which is a robust approach for a standalone example script. The changes are well-justified and improve the reliability of the example in CUDA environments. The code looks good, and I have no high or critical severity comments.
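For context on why `force=True` is the robust choice mentioned above (an illustrative sketch, not part of the PR diff): `multiprocessing.set_start_method` raises `RuntimeError` if the start method has already been fixed, and `force=True` overrides it without error:

```python
import multiprocessing

# First call fixes the start method for this process.
multiprocessing.set_start_method("spawn", force=True)

try:
    # A second call without force=True raises RuntimeError once the
    # context has already been set.
    multiprocessing.set_start_method("spawn")
except RuntimeError as exc:
    print("second call failed:", exc)

# force=True re-applies the setting without error, which is why the
# example script uses it defensively.
multiprocessing.set_start_method("spawn", force=True)
print(multiprocessing.get_start_method())  # → spawn
```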

@usberkeley usberkeley force-pushed the fix_kv_cache_sharing_lmcache branch from 0391a6c to e6b7a9b Compare November 27, 2025 08:34
…_cache_sharing_lmcache_v1.py

Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
@usberkeley usberkeley force-pushed the fix_kv_cache_sharing_lmcache branch from 03739f9 to 5d6749a Compare November 27, 2025 08:40
@usberkeley (Contributor, Author) commented Nov 27, 2025

Hi @YaoJiayi

Could you please take a look and review when you have time? Thank you very much!

@LucasWilkinson (Collaborator)

cc @NickLucche @njhill (please redirect if you are not the right person; not sure who the LMCache owner is)


Labels

documentation (Improvements or additions to documentation), kv-connector, nvidia


Development

Successfully merging this pull request may close these issues.

[Bug] RuntimeError: Cannot re-initialize CUDA in forked subprocess when using lmcache/kv_cache_sharing_lmcache_v1

2 participants