Commit 44a67f1

[yugabyte#25689] DocDB: Fix throw_bad_weak_ptr issue when attempting to abort transactions
Summary:
One of the stress tests crashed with the following trace:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x00007f4ecfd66acf libc.so.6`raise + 271
    frame #1: 0x00007f4ecfd39ea5 libc.so.6`abort + 295
    frame #2: 0x00005606b0891403 yb-server`abort_message + 195
    frame #3: 0x00005606b0890f9c yb-server`demangling_terminate_handler() + 268
    frame #4: 0x00005606b0890c66 yb-server`std::__terminate(void (*)()) + 6
    frame #5: 0x00005606b0892bab yb-server`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00005606b0892b3f yb-server`__cxa_throw + 111
    frame #7: 0x00005606ae47cd1e yb-server`std::__1::__throw_bad_weak_ptr[abi:ue170006]() at shared_ptr.h:137:5
    frame #8: 0x00005606af954382 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::shared_ptr<yb::tablet::RunningTransaction>::shared_ptr[abi:ue170006]<yb::tablet::RunningTransaction, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::tablet::RunningTransaction>::element_type @ 0x0000162400000001) at shared_ptr.h:704:13
    frame #9: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::enable_shared_from_this<yb::tablet::RunningTransaction>::shared_from_this[abi:ue170006](this=0x00001624cd5da018) at shared_ptr.h:1954:17
    frame #10: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] yb::tablet::RunningTransaction::Abort(this=0x00001624cd5da018, client=0x00001624fd7d7f10, callback=yb::TransactionStatusCallback @ 0x00007f4babc39700, lock=0x00007f4babc396e0)>, std::__1::unique_lock<std::__1::mutex>*) at running_transaction.cc:200:34
    frame #11: 0x00005606af953ccf yb-server`yb::tablet::TransactionParticipant::Impl::Abort(this=<unavailable>, id=<unavailable>, callback=<unavailable>)>) at transaction_participant.cc:707:45
    frame #12: 0x00005606af95d7d4 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, yb::StronglyTypedUuid<yb::TransactionId_Tag>*) at transaction_participant.cc:1355:7
    frame #13: 0x00005606af95d3e5 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(this=<unavailable>, cutoff=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4babc398b8, exclude_txn_id=<unavailable>) at transaction_participant.cc:2700:17
```

This suggests that the underlying `RunningTransaction` is already being destroyed when we call `shared_from_this()` on it. This happens because we release the transaction participant's lock before creating a shared reference to the `RunningTransaction` instance we are trying to abort.

This diff fixes the issue by creating the shared reference before releasing the participant's mutex, and then using that reference afterwards.
Jira: DB-14948

Test Plan: Jenkins

Reviewers: esheng

Reviewed By: esheng

Subscribers: rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D41384
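The failure mode in the trace can be reproduced in isolation. The following is a minimal sketch, not YugabyteDB code: `Txn` and `SharedFromThisThrowsOnceUnowned` are hypothetical names chosen for illustration. It shows that once the last `std::shared_ptr` owner is gone, `shared_from_this()` throws `std::bad_weak_ptr`, which is the exception that terminated the process above because nothing caught it.

```cpp
#include <memory>

// Hypothetical stand-in for a RunningTransaction-like object.
struct Txn : std::enable_shared_from_this<Txn> {
  explicit Txn(bool* threw) : threw_(threw) {}

  ~Txn() {
    // The destructor runs after the last strong reference was released, so
    // the internal weak_ptr is expired and shared_from_this() must throw.
    try {
      auto self = shared_from_this();
      (void)self;
    } catch (const std::bad_weak_ptr&) {
      *threw_ = true;
    }
  }

  bool* threw_;  // Records whether the exception was observed.
};

// Returns true if shared_from_this() threw once no owner remained.
bool SharedFromThisThrowsOnceUnowned() {
  bool threw = false;
  {
    auto txn = std::make_shared<Txn>(&threw);
    // While `txn` is alive, shared_from_this() succeeds and adds an owner.
    auto alias = txn->shared_from_this();
    if (txn.use_count() != 2) {
      return false;
    }
  }  // Both owners go out of scope here and ~Txn() runs.
  return threw;
}
```

In the crash above the same exception escaped uncaught, so the runtime called `std::terminate` and the process aborted with SIGABRT.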
1 parent f5dc7f1 commit 44a67f1

1 file changed: +3 -2 lines changed

src/yb/tablet/running_transaction.cc

Lines changed: 3 additions & 2 deletions
@@ -182,6 +182,7 @@ void RunningTransaction::Abort(client::YBClient* client,
   abort_waiters_.push_back(std::move(callback));
   auto status_tablet = this->status_tablet();
   abort_request_in_progress_ = true;
+  auto shared_self = shared_from_this();
   lock->unlock();
   VLOG_WITH_PREFIX(3) << "Abort request: " << was_empty;
   if (!was_empty) {
@@ -197,13 +198,13 @@ void RunningTransaction::Abort(client::YBClient* client,
       nullptr /* tablet */,
       client,
       &req,
-      [status_tablet, self = shared_from_this(), weak_context = context_.RetainWeak()](
+      [status_tablet, shared_self, weak_context = context_.RetainWeak()](
           const Status& status, const tserver::AbortTransactionResponsePB& response) {
         auto context_lock = weak_context.lock();
         if (!context_lock) {
           return;
         }
-        self->AbortReceived(status_tablet, status, response);
+        shared_self->AbortReceived(status_tablet, status, response);
       }),
       &abort_handle_);
 }
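
The ordering that the diff enforces can be sketched outside the codebase. The example below is an illustration under stated assumptions, not the actual YugabyteDB implementation: `Participant`, `Txn`, `Add`, `Remove`, and the integer ids are hypothetical. The only point it makes is that the strong reference is taken while the owner's mutex is still held, so a concurrent removal after `unlock()` cannot destroy the object before the callback runs.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical transaction object owned by a participant-like registry.
class Txn : public std::enable_shared_from_this<Txn> {
 public:
  explicit Txn(int id) : id_(id) {}

  // Called with `lock` held. Returns a callback that can run after unlock.
  std::function<int()> Abort(std::unique_lock<std::mutex>* lock) {
    // Take the strong reference first, while the registry still owns us.
    auto shared_self = shared_from_this();
    lock->unlock();
    // From here on the registry may drop its reference at any time, but
    // shared_self keeps this object alive for the callback.
    return [shared_self] { return shared_self->id_; };
  }

 private:
  int id_;
};

// Hypothetical registry that guards its transactions with a mutex.
class Participant {
 public:
  void Add(int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_[id] = std::make_shared<Txn>(id);
  }

  void Remove(int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_.erase(id);
  }

  // Returns an empty function if the transaction is unknown.
  std::function<int()> Abort(int id) {
    std::unique_lock<std::mutex> lock(mutex_);
    auto it = txns_.find(id);
    if (it == txns_.end()) {
      return nullptr;
    }
    return it->second->Abort(&lock);  // Releases the lock internally.
  }

 private:
  std::mutex mutex_;
  std::unordered_map<int, std::shared_ptr<Txn>> txns_;
};
```

With this ordering, a caller can obtain the callback from `Abort(7)`, then call `Remove(7)`, and still invoke the callback safely. If `shared_from_this()` were instead called after `lock->unlock()`, a removal in that window could release the last reference first.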
