Skip to content

Commit 1104ee6

Browse files
committed
[yugabyte#9934] [docdb] Don't update rocksdb_dir on Remote Bootstrap
Summary: Data Race Issue on Remote Bootstrap with rocksdb_dir: ``` [ts-4] WARNING: ThreadSanitizer: data race (pid=8370) [ts-4] Write of size 8 at 0x7b50002606a8 by thread T46 (mutexes: write M257685187220342284): [ts-4] #0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::assign(char const*, unsigned long) <null> (libc++.so.1+0xd5fa5) [ts-4] #1 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator=(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) <null> (libc++.so.1+0xd5e8a) [ts-4] #2 yb::tablet::KvStoreInfo::LoadFromPB(yb::tablet::KvStoreInfoPB const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tablet/tablet_metadata.cc:201:15 (libtablet.so+0x36795c) [ts-4] #3 yb::tablet::RaftGroupMetadata::LoadFromSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tablet/tablet_metadata.cc:525:5 (libtablet.so+0x36b36b) [ts-4] #4 yb::tablet::RaftGroupMetadata::ReplaceSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tablet/tablet_metadata.cc:586:3 (libtablet.so+0x36c078) [ts-4] #5 yb::tserver::RemoteBootstrapClient::Finish() /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/remote_bootstrap_client.cc:421:3 (libtserver.so+0x1d1699) [ts-4] #6 yb::tserver::TSTabletManager::StartRemoteBootstrap(yb::consensus::StartRemoteBootstrapRequestPB const&) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/ts_tablet_manager.cc:1099:3 (libtserver.so+0x267088) [ts-4] #7 yb::tserver::ConsensusServiceImpl::StartRemoteBootstrap(yb::consensus::StartRemoteBootstrapRequestPB const*, yb::consensus::StartRemoteBootstrapResponsePB*, yb::rpc::RpcContext) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/tablet_service.cc:2767:31 (libtserver.so+0x21844b) ``` ``` [ts-4] Previous read of size 8 at 0x7b50002606a8 by thread T21: [ts-4] #0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__get_long_size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20210813185027-9a29e26965-centos7-x86_64-clang7/installed/tsan/libcxx/include/c++/v1/string:1468:34 (libtablet.so+0x36d6a2) [ts-4] #1 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20210813185027-9a29e26965-centos7-x86_64-clang7/installed/tsan/libcxx/include/c++/v1/string:941 (libtablet.so+0x36d6a2) [ts-4] #2 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::empty() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20210813185027-9a29e26965-centos7-x86_64-clang7/installed/tsan/libcxx/include/c++/v1/string:957 (libtablet.so+0x36d6a2) [ts-4] #3 yb::tablet::RaftGroupMetadata::data_root_dir() const /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tablet/tablet_metadata.cc:773 (libtablet.so+0x36d6a2) [ts-4] #4 yb::tserver::TSTabletManager::CreateReportedTabletPB(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::master::ReportedTabletPB*) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/ts_tablet_manager.cc:1840:68 (libtserver.so+0x26c6a2) [ts-4] #5 yb::tserver::TSTabletManager::GenerateTabletReport(yb::master::TabletReportPB*, bool) /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/ts_tablet_manager.cc:1925:5 (libtserver.so+0x26d072) [ts-4] #6 yb::tserver::Heartbeater::Thread::TryHeartbeat() /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/heartbeater.cc:371:32 (libtserver.so+0x1a2331) [ts-4] #7 yb::tserver::Heartbeater::Thread::DoHeartbeat() /nfusr/centos-gcp-cloud/jenkins-worker-r2ttyq/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-339/build/tsan-clang7-dynamic-ninja/../../src/yb/tserver/heartbeater.cc:530:19 (libtserver.so+0x1a3678) ``` Test Plan: ybd tsan --gtest_filter LoadBalancerMultiTableTest.GlobalLeaderBalancing ybd tsan --gtest_filter LoadBalancerMultiTableTest.GlobalLoadBalancing Reviewers: sergei Reviewed By: sergei Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D12906
1 parent 04e23a5 commit 1104ee6

File tree

3 files changed

+25
-15
lines changed

3 files changed

+25
-15
lines changed

src/yb/tablet/tablet_metadata.cc

Lines changed: 18 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -196,9 +196,13 @@ Status KvStoreInfo::LoadTablesFromPB(
196196
return Status::OK();
197197
}
198198

199-
Status KvStoreInfo::LoadFromPB(const KvStoreInfoPB& pb, const TableId& primary_table_id) {
199+
Status KvStoreInfo::LoadFromPB(const KvStoreInfoPB& pb,
200+
const TableId& primary_table_id,
201+
bool local_superblock) {
200202
kv_store_id = KvStoreId(pb.kv_store_id());
201-
rocksdb_dir = pb.rocksdb_dir();
203+
if (local_superblock) {
204+
rocksdb_dir = pb.rocksdb_dir();
205+
}
202206
lower_bound_key = pb.lower_bound_key();
203207
upper_bound_key = pb.upper_bound_key();
204208
has_been_fully_compacted = pb.has_been_fully_compacted();
@@ -489,18 +493,19 @@ Status RaftGroupMetadata::LoadFromDisk() {
489493

490494
RaftGroupReplicaSuperBlockPB superblock;
491495
RETURN_NOT_OK(ReadSuperBlockFromDisk(&superblock));
492-
RETURN_NOT_OK_PREPEND(LoadFromSuperBlock(superblock),
496+
RETURN_NOT_OK_PREPEND(LoadFromSuperBlock(superblock, /* local_superblock = */ true),
493497
"Failed to load data from superblock protobuf");
494498
state_ = kInitialized;
495499
return Status::OK();
496500
}
497501

498-
Status RaftGroupMetadata::LoadFromSuperBlock(const RaftGroupReplicaSuperBlockPB& superblock) {
502+
Status RaftGroupMetadata::LoadFromSuperBlock(const RaftGroupReplicaSuperBlockPB& superblock,
503+
bool local_superblock) {
499504
if (!superblock.has_kv_store()) {
500505
// Backward compatibility for tablet=KV-store=raft-group.
501506
RaftGroupReplicaSuperBlockPB superblock_migrated(superblock);
502507
RETURN_NOT_OK(MigrateSuperblock(&superblock_migrated));
503-
RETURN_NOT_OK(LoadFromSuperBlock(superblock_migrated));
508+
RETURN_NOT_OK(LoadFromSuperBlock(superblock_migrated, local_superblock));
504509
return Flush();
505510
}
506511

@@ -522,7 +527,9 @@ Status RaftGroupMetadata::LoadFromSuperBlock(const RaftGroupReplicaSuperBlockPB&
522527
primary_table_id_ = superblock.primary_table_id();
523528
colocated_ = superblock.colocated();
524529

525-
RETURN_NOT_OK(kv_store_.LoadFromPB(superblock.kv_store(), primary_table_id_));
530+
RETURN_NOT_OK(kv_store_.LoadFromPB(superblock.kv_store(),
531+
primary_table_id_,
532+
local_superblock));
526533

527534
wal_dir_ = superblock.wal_dir();
528535
tablet_data_state_ = superblock.tablet_data_state();
@@ -571,7 +578,7 @@ Status RaftGroupMetadata::Flush() {
571578
std::lock_guard<MutexType> lock(data_mutex_);
572579
ToSuperBlockUnlocked(&pb);
573580
}
574-
RETURN_NOT_OK(ReplaceSuperBlockUnlocked(pb));
581+
RETURN_NOT_OK(SaveToDiskUnlocked(pb));
575582
TRACE("Metadata flushed");
576583

577584
return Status::OK();
@@ -580,16 +587,16 @@ Status RaftGroupMetadata::Flush() {
580587
Status RaftGroupMetadata::ReplaceSuperBlock(const RaftGroupReplicaSuperBlockPB &pb) {
581588
{
582589
MutexLock l(flush_lock_);
583-
RETURN_NOT_OK_PREPEND(ReplaceSuperBlockUnlocked(pb), "Unable to replace superblock");
590+
RETURN_NOT_OK_PREPEND(SaveToDiskUnlocked(pb), "Unable to replace superblock");
584591
}
585592

586-
RETURN_NOT_OK_PREPEND(LoadFromSuperBlock(pb),
593+
RETURN_NOT_OK_PREPEND(LoadFromSuperBlock(pb, /* local_superblock = */ false),
587594
"Failed to load data from superblock protobuf");
588595

589596
return Status::OK();
590597
}
591598

592-
Status RaftGroupMetadata::ReplaceSuperBlockUnlocked(const RaftGroupReplicaSuperBlockPB &pb) {
599+
Status RaftGroupMetadata::SaveToDiskUnlocked(const RaftGroupReplicaSuperBlockPB &pb) {
593600
flush_lock_.AssertAcquired();
594601

595602
string path = fs_manager_->GetRaftGroupMetadataPath(raft_group_id_);
@@ -966,7 +973,7 @@ Result<RaftGroupMetadataPtr> RaftGroupMetadata::CreateSubtabletMetadata(
966973
ToSuperBlock(&superblock);
967974

968975
RaftGroupMetadataPtr metadata(new RaftGroupMetadata(fs_manager_, raft_group_id_));
969-
RETURN_NOT_OK(metadata->LoadFromSuperBlock(superblock));
976+
RETURN_NOT_OK(metadata->LoadFromSuperBlock(superblock, /* local_superblock = */ true));
970977
metadata->raft_group_id_ = raft_group_id;
971978
metadata->wal_dir_ = GetSubRaftGroupWalDir(raft_group_id);
972979
metadata->kv_store_.lower_bound_key = lower_bound_key;

src/yb/tablet/tablet_metadata.h

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -134,7 +134,9 @@ struct KvStoreInfo {
134134
rocksdb_dir(rocksdb_dir_),
135135
snapshot_schedules(snapshot_schedules_.begin(), snapshot_schedules_.end()) {}
136136

137-
CHECKED_STATUS LoadFromPB(const KvStoreInfoPB& pb, const TableId& primary_table_id);
137+
CHECKED_STATUS LoadFromPB(const KvStoreInfoPB& pb,
138+
const TableId& primary_table_id,
139+
bool local_superblock);
138140

139141
CHECKED_STATUS LoadTablesFromPB(
140142
const google::protobuf::RepeatedPtrField<TableInfoPB>& pbs, const TableId& primary_table_id);
@@ -529,13 +531,14 @@ class RaftGroupMetadata : public RefCountedThreadSafe<RaftGroupMetadata> {
529531
CHECKED_STATUS LoadFromDisk();
530532

531533
// Update state of metadata to that of the given superblock PB.
532-
CHECKED_STATUS LoadFromSuperBlock(const RaftGroupReplicaSuperBlockPB& superblock);
534+
CHECKED_STATUS LoadFromSuperBlock(const RaftGroupReplicaSuperBlockPB& superblock,
535+
bool local_superblock);
533536

534537
CHECKED_STATUS ReadSuperBlock(RaftGroupReplicaSuperBlockPB *pb);
535538

536539
// Fully replace superblock.
537540
// Requires 'flush_lock_'.
538-
CHECKED_STATUS ReplaceSuperBlockUnlocked(const RaftGroupReplicaSuperBlockPB &pb);
541+
CHECKED_STATUS SaveToDiskUnlocked(const RaftGroupReplicaSuperBlockPB &pb);
539542

540543
// Requires 'data_mutex_'.
541544
void ToSuperBlockUnlocked(RaftGroupReplicaSuperBlockPB* superblock) const REQUIRES(data_mutex_);

src/yb/tserver/remote_bootstrap_client.cc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -409,7 +409,7 @@ Status RemoteBootstrapClient::Finish() {
409409
CHECK(started_);
410410

411411
CHECK(downloaded_wal_);
412-
CHECK(downloaded_rocksdb_files_) << "files not downloaded";;
412+
CHECK(downloaded_rocksdb_files_) << "files not downloaded";
413413

414414
RETURN_NOT_OK(WriteConsensusMetadata());
415415

0 commit comments

Comments
 (0)