Commit 5cdc1b5
[yugabyte#26880] DocDB: Prevent deadlock between txn load and compaction that causes peer to enter a stuck state
Summary:
**Problem**
Background compactions that use a filter (for instance, on tables where CDC is enabled) could deadlock with the transaction loader, leaving the peer stuck. The compaction thread can wait for the loader to finish while holding `RunningTransactionContext::mutex_`, and the loader can block on that same mutex, so neither thread makes progress.
**Issue seen**
In one of the internal clusters, we saw 2 peers stuck with the following traces, unable to process any consensus operations.
The peer gets stuck while processing a replicated op: it holds `ReplicaState::update_lock_` and waits for the loader thread to complete.
```
@ 0x7fcfb39f27a9 __pthread_cond_timedwait
@ 0x555e3409bcd4 yb::tablet::TransactionLoader::WaitLoaded()
@ 0x555e33ff760e yb::tablet::Tablet::ApplyKeyValueRowOperations()
@ 0x555e33ff6dce yb::tablet::Tablet::ApplyOperation()
@ 0x555e33ff69f2 yb::tablet::Tablet::ApplyRowOperations()
@ 0x555e33fafeed yb::tablet::WriteOperation::DoReplicated()
@ 0x555e33fa129d yb::tablet::Operation::Replicated()
@ 0x555e33fa3690 yb::tablet::OperationDriver::ReplicationFinished()
@ 0x555e32f29082 yb::consensus::ConsensusRound::NotifyReplicationFinished()
@ 0x555e32f81091 yb::consensus::ReplicaState::ApplyPendingOperationsUnlocked()
@ 0x555e32f8039a yb::consensus::ReplicaState::AdvanceCommittedOpIdUnlocked()
@ 0x555e32f65d49 yb::consensus::RaftConsensus::UpdateReplica()
@ 0x555e32f41e4f yb::consensus::RaftConsensus::Update()
@ 0x555e3437f846 yb::tserver::ConsensusServiceImpl::UpdateConsensus()
@ 0x555e32fce4f8 std::__1::__function::__func<>::operator()()
@ 0x555e32fcf0fe yb::consensus::ConsensusServiceIf::Handle()
Total number of threads: 2
```
The loader thread waits on `RunningTransactionContext::mutex_`:
```
@ 0x7fcfb39f582a __lll_lock_wait
@ 0x7fcfb39eead8 __GI___pthread_mutex_lock
@ 0x555e340a9003 yb::tablet::TransactionParticipant::Impl::LoadTransaction()
@ 0x555e340984ce yb::tablet::TransactionLoader::Executor::Execute()
@ 0x555e347d9cd8 yb::Thread::SuperviseThread()
@ 0x7fcfb39ec1c9 start_thread
@ 0x7fcfb3c3de72 __GI___clone
Total number of threads: 2
```
`RunningTransactionContext::mutex_` is held by the background compaction thread, which is itself waiting for the loader to complete, causing a deadlock.
```
@ 0x7fcfb39f27a9 __pthread_cond_timedwait
@ 0x555e3409bcd4 yb::tablet::TransactionLoader::WaitLoaded()
@ 0x555e340ae872 yb::tablet::TransactionParticipant::Cleanup()
@ 0x555e330db3bf yb::docdb::(anonymous namespace)::DocDBIntentsCompactionFilter::CompactionFinished()
@ 0x555e33c79d65 rocksdb::CompactionJob::ProcessKeyValueCompaction()
@ 0x555e33c76f6a rocksdb::CompactionJob::Run()
@ 0x555e33cadea9 rocksdb::DBImpl::BackgroundCompaction()
@ 0x555e33cabd95 rocksdb::DBImpl::BackgroundCallCompaction()
@ 0x555e347a847b yb::(anonymous namespace)::PriorityThreadPoolWorker::Run()
@ 0x555e347d9cd8 yb::Thread::SuperviseThread()
@ 0x7fcfb39ec1c9 start_thread
@ 0x7fcfb3c3de72 __GI___clone
Total number of threads: 2
```
**Fix**
This revision fixes the issue by calling `WaitLoaded` outside the scope of `RunningTransactionContext::mutex_`, so the compaction thread no longer holds the mutex while it waits for the loader.
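The lock-ordering idea can be illustrated with a minimal, self-contained sketch. This is not the YugabyteDB code: `Loader`, `Participant`, `LoadAll`, `Cleanup`, and `RunCleanupDuringLoad` are hypothetical stand-ins, and only the pattern (wait for the loader first, take the mutex afterwards) reflects the change described here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the transaction loader: WaitLoaded() blocks
// until MarkLoaded() has been called.
class Loader {
 public:
  void WaitLoaded() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return loaded_; });
  }

  void MarkLoaded() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      loaded_ = true;
    }
    cond_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  bool loaded_ = false;
};

// Hypothetical stand-in for the transaction participant.
class Participant {
 public:
  // Loader thread: takes mutex_ for every loaded transaction, then signals
  // that loading is complete.
  void LoadAll(int count) {
    for (int i = 0; i < count; ++i) {
      std::lock_guard<std::mutex> lock(mutex_);
      ++loaded_txns_;
    }
    loader_.MarkLoaded();
  }

  // Compaction thread: waits for the loader BEFORE taking mutex_. Waiting
  // while holding mutex_ would block LoadAll() and deadlock both threads.
  int Cleanup() {
    loader_.WaitLoaded();
    std::lock_guard<std::mutex> lock(mutex_);
    int cleaned = loaded_txns_;
    loaded_txns_ = 0;
    return cleaned;
  }

 private:
  Loader loader_;
  std::mutex mutex_;
  int loaded_txns_ = 0;
};

// Starts the compaction thread first, then the loader thread, and returns
// the number of transactions the compaction thread cleaned up.
int RunCleanupDuringLoad(int count) {
  assert(count >= 0);
  Participant participant;
  int cleaned = -1;
  std::thread compaction([&] { cleaned = participant.Cleanup(); });
  std::thread loader([&] { participant.LoadAll(count); });
  compaction.join();
  loader.join();
  return cleaned;
}
```

In this sketch, moving `loader_.WaitLoaded()` below the `lock_guard` in `Cleanup()` would reproduce the stuck state whenever the compaction thread acquires the mutex before loading finishes.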
Jira: DB-16294
Test Plan:
Jenkins
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestCompactionDoesntDeadlockWithTxnLoader
The test fails without the changes.
Reviewers: sergei, esheng
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D43354
1 parent 0816b2f, commit 5cdc1b5
5 files changed (+77, -2 lines) under src/yb: integration-tests, tablet