Deadlock repro example with TokioMultiThreadExecutor + read_files
#1536
Note: this is not a PR to merge. It is a report (issue) that needs code examples to show how to reproduce the problem. Treat it as a bug report.
TokioMultiThreadExecutor Deadlock with sync_channel
Problem
read_files() deadlocks when called from a tokio worker thread using the current TokioMultiThreadExecutor.
Sequence:
1. read_files() creates sync_channel(0) and calls task_executor.spawn()
2. read_files() returns an iterator that blocks on receiver.recv()
Root cause: Tokio's schedule_task() uses with_current() to detect worker context and schedules the task locally for performance. However, sync_channel.recv() blocks the thread at the OS level, so the locally scheduled task never executes.
Reproduction
Minimal (pure tokio example)
cargo test --test executor_deadlock_repro test_deadlock_minimal_repro
Demonstrates the core issue: spawning from a worker thread, then blocking on a sync primitive.
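The same core issue can be modeled without tokio. The sketch below is a toy model under stated assumptions, not tokio's scheduler: `LocalQueue` and `blocked_worker` are hypothetical names, the queue stands in for a worker's local run queue, and `recv_timeout` replaces `recv` only so that the example terminates instead of hanging.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, RecvTimeoutError};
use std::time::Duration;

/// A queued unit of work, standing in for a spawned task.
type Task = Box<dyn FnOnce() + Send>;

/// Toy stand-in for a worker's local run queue (hypothetical, not a tokio type).
struct LocalQueue {
    tasks: VecDeque<Task>,
}

impl LocalQueue {
    /// Models the local-scheduling fast path: the task is only queued and
    /// would run once the owning thread returns to its scheduling loop.
    fn spawn_local(&mut self, task: Task) {
        self.tasks.push_back(task);
    }
}

/// Queues the producer locally, then blocks the same thread on the channel.
fn blocked_worker(timeout: Duration) -> Result<u32, RecvTimeoutError> {
    let mut queue = LocalQueue { tasks: VecDeque::new() };
    // Capacity 0 makes this a rendezvous channel, as in the report.
    let (tx, rx) = sync_channel::<u32>(0);

    queue.spawn_local(Box::new(move || {
        let _ = tx.send(42);
    }));

    // The thread that owns `queue` parks here, so the queued task never runs.
    // With a plain `recv()` this call would never return.
    rx.recv_timeout(timeout)
}
```

In this model `blocked_worker(Duration::from_millis(100))` returns `Err(RecvTimeoutError::Timeout)`: the sender is still alive inside the queued task, but nothing ever runs it.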
Full delta-kernel reproduction
cargo test --test executor_deadlock_repro test_tokio_multi_thread_executor_deadlock -- --ignored
Times out after 5 seconds. Marked #[ignore] since it's expected to hang.
Workaround verification
cargo test --test executor_deadlock_repro test_tokio_background_executor_works
Shows that TokioBackgroundExecutor avoids the issue by using a separate OS thread.
Solutions
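The workaround verified above relies on running spawned work on a separate OS thread. A minimal std-only sketch of that idea follows. `BackgroundExecutor` and `read_items` are hypothetical names for illustration and do not reflect the actual TokioBackgroundExecutor implementation. `read_items` only mirrors the shape of the sequence described under Problem.

```rust
use std::sync::mpsc::{channel, sync_channel, IntoIter, Sender};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical executor that runs every job on one dedicated OS thread.
struct BackgroundExecutor {
    jobs: Sender<Job>,
}

impl BackgroundExecutor {
    fn new() -> Self {
        let (jobs, inbox) = channel::<Job>();
        // The thread exits once every `Sender` has been dropped.
        thread::spawn(move || {
            for job in inbox {
                job();
            }
        });
        Self { jobs }
    }

    /// Hands the job to the background thread instead of the caller's thread.
    fn spawn(&self, job: Job) {
        self.jobs.send(job).expect("background thread exited");
    }
}

/// Spawns a producer and returns a blocking iterator over a rendezvous channel.
fn read_items(executor: &BackgroundExecutor, items: Vec<u32>) -> IntoIter<u32> {
    let (tx, rx) = sync_channel::<u32>(0);
    executor.spawn(Box::new(move || {
        for item in items {
            if tx.send(item).is_err() {
                break; // the consumer dropped the iterator
            }
        }
    }));
    // Blocking here is safe: the producer runs on a different OS thread.
    rx.into_iter()
}
```

Because the producer never shares a thread with the consumer, blocking on the returned iterator cannot starve it, whichever thread the caller is on.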
Immediate workaround
Use TokioBackgroundExecutor instead of TokioMultiThreadExecutor.
Long-term fixes
Option 1: Replace sync_channel with a tokio async channel
Requires StorageHandler::read_files() to return an async iterator
Option 2: Force remote scheduling
Option 3: Use tokio::task::block_in_place
Technical Details
Call stack
Tokio scheduling logic
In tokio/src/runtime/scheduler/multi_thread/worker.rs:
The optimization assumes the worker continues executing tasks. sync_channel.recv() violates this assumption.
Debugged using: