Commit 44a67f1
[yugabyte#25689] DocDB: Fix throw_bad_weak_ptr issue when attempting to abort transactions
Summary:
One of the stress tests crashed with the following stack trace:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
* frame #0: 0x00007f4ecfd66acf libc.so.6`raise + 271
frame #1: 0x00007f4ecfd39ea5 libc.so.6`abort + 295
frame #2: 0x00005606b0891403 yb-server`abort_message + 195
frame #3: 0x00005606b0890f9c yb-server`demangling_terminate_handler() + 268
frame #4: 0x00005606b0890c66 yb-server`std::__terminate(void (*)()) + 6
frame #5: 0x00005606b0892bab yb-server`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
frame #6: 0x00005606b0892b3f yb-server`__cxa_throw + 111
frame #7: 0x00005606ae47cd1e yb-server`std::__1::__throw_bad_weak_ptr[abi:ue170006]() at shared_ptr.h:137:5
frame #8: 0x00005606af954382 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::shared_ptr<yb::tablet::RunningTransaction>::shared_ptr[abi:ue170006]<yb::tablet::RunningTransaction, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::tablet::RunningTransaction>::element_type @ 0x0000162400000001) at shared_ptr.h:704:13
frame #9: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::enable_shared_from_this<yb::tablet::RunningTransaction>::shared_from_this[abi:ue170006](this=0x00001624cd5da018) at shared_ptr.h:1954:17
frame #10: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] yb::tablet::RunningTransaction::Abort(this=0x00001624cd5da018, client=0x00001624fd7d7f10, callback=yb::TransactionStatusCallback @ 0x00007f4babc39700, lock=0x00007f4babc396e0)>, std::__1::unique_lock<std::__1::mutex>*) at running_transaction.cc:200:34
frame #11: 0x00005606af953ccf yb-server`yb::tablet::TransactionParticipant::Impl::Abort(this=<unavailable>, id=<unavailable>, callback=<unavailable>)>) at transaction_participant.cc:707:45
frame #12: 0x00005606af95d7d4 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, yb::StronglyTypedUuid<yb::TransactionId_Tag>*) at transaction_participant.cc:1355:7
frame #13: 0x00005606af95d3e5 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(this=<unavailable>, cutoff=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4babc398b8, exclude_txn_id=<unavailable>) at transaction_participant.cc:2700:17
```
The trace indicates that the underlying `RunningTransaction` had already been destroyed by the time we called `shared_from_this()` on it. This happens because we release the transaction participant's lock before creating a shared reference to the `RunningTransaction` instance we are trying to abort.
This diff fixes the issue by creating the shared reference before releasing the participant's mutex and using that reference afterwards.
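The ordering described above can be illustrated with a minimal, self-contained sketch. This is not the actual YugabyteDB code: `Participant`, `Txn`, `Add`, `Remove`, and `PinForAbort` are hypothetical names chosen for illustration, and the real `RunningTransaction::Abort` does considerably more. The sketch only shows the shared reference being taken while the mutex is still held, so that a concurrent removal cannot destroy the object before the reference exists.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for RunningTransaction.
struct Txn : std::enable_shared_from_this<Txn> {
  explicit Txn(std::string txn_id) : id(std::move(txn_id)) {}
  std::string id;
};

// Hypothetical stand-in for the transaction participant.
class Participant {
 public:
  void Add(const std::string& id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_[id] = std::make_shared<Txn>(id);
  }

  // Drops the participant's own reference, as a concurrent cleanup might.
  void Remove(const std::string& id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_.erase(id);
  }

  // Takes the shared reference BEFORE releasing the mutex. Once the lock is
  // dropped, another thread may call Remove(), but the returned pointer
  // keeps the Txn alive for the caller.
  std::shared_ptr<Txn> PinForAbort(const std::string& id) {
    std::unique_lock<std::mutex> lock(mutex_);
    auto it = txns_.find(id);
    if (it == txns_.end()) {
      return nullptr;
    }
    std::shared_ptr<Txn> pinned = it->second->shared_from_this();
    lock.unlock();  // Safe: `pinned` already owns a reference.
    return pinned;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<Txn>> txns_;
};
```

In this sketch, a caller that obtains a pointer from `PinForAbort("t1")` can keep using it after `Remove("t1")` runs. If the code instead unlocked first and called `shared_from_this()` afterwards, a removal in between could leave no owning `shared_ptr`, which is the situation in which the crash above was observed.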
Jira: DB-14948
Test Plan: Jenkins
Reviewers: esheng
Reviewed By: esheng
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D413841
Parent: f5dc7f1
1 file changed: +3 -2 (new line 185 added; old lines 200 and 206 replaced by new lines 201 and 207)