Skip to content

Commit bd4874b

Browse files
committed
[yugabyte#13358] YSQL: Fix DDL atomicity stress test failure in tsan build
Summary: The DDL atomicity stress tests failed more on pg15 branch with an error like: ``` WARNING: ThreadSanitizer: data race (pid=180911) Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0): #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3) #1 profile_init_flags <null> (libkrb5.so.3+0xfb056) #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546) #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90) #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5) #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da) #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159) #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072) #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3) #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d) #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431) #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe) yugabyte#12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca) yugabyte#13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807) yugabyte#14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1) yugabyte#15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b) yugabyte#16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e) yugabyte#17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e) yugabyte#18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34) yugabyte#19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e) yugabyte#20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e) yugabyte#21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c) yugabyte#22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5) yugabyte#23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb) ``` It appears that the function `yb::pgwrapper::LibPqTestBase::Connect` isn't thread safe. I restructured the code to make the connections in a single thread and then pass them to various concurrent threads for testing. Jira: DB-2996 Test Plan: ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17 Verified that no more tsan errors. Reviewers: fizaa Reviewed By: fizaa Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D37111
1 parent da9b281 commit bd4874b

File tree

1 file changed

+26
-24
lines changed

1 file changed

+26
-24
lines changed

src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc

Lines changed: 26 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -92,11 +92,11 @@ class PgDdlAtomicityStressTest
9292
return std::get<2>(GetParam());
9393
}
9494

95-
Status TestDdl(const std::vector<std::string>& ddl, const int iteration);
95+
Status TestDdl(PGConn* conn, const std::vector<std::string>& ddl, const int iteration);
9696

97-
Status TestConcurrentIndex(const int num_iterations);
97+
Status TestConcurrentIndex(PGConn* conn, const int num_iterations);
9898

99-
Status TestDml(const int num_iterations);
99+
Status TestDml(PGConn* conn, const int num_iterations);
100100

101101
template<class... Args>
102102
Result<bool> ExecuteFormatWithRetry(PGConn* conn, const std::string& format, Args&&... args) {
@@ -161,13 +161,12 @@ std::string PgDdlAtomicityStressTest::database() {
161161
}
162162

163163
Status PgDdlAtomicityStressTest::TestDdl(
164-
const std::vector<std::string>& ddls, const int num_iterations) {
165-
auto conn = VERIFY_RESULT(Connect());
164+
PGConn* conn, const std::vector<std::string>& ddls, const int num_iterations) {
166165
for (int i = 0; i < num_iterations; ++i) {
167166
for (const auto& ddl : ddls) {
168167
auto stmt = Format(ddl, kTable, i);
169168
LOG(INFO) << "Executing stmt " << stmt;
170-
while (!VERIFY_RESULT(DoExecuteWithRetry(&conn, stmt))) {
169+
while (!VERIFY_RESULT(DoExecuteWithRetry(conn, stmt))) {
171170
LOG(INFO) << "Retry executing stmt " << stmt;
172171
}
173172
}
@@ -224,35 +223,33 @@ Result<bool> PgDdlAtomicityStressTest::DoExecuteWithRetry(PGConn* conn, const st
224223
return s;
225224
}
226225

227-
Status PgDdlAtomicityStressTest::TestConcurrentIndex(const int num_iterations) {
228-
auto conn = VERIFY_RESULT(Connect());
226+
Status PgDdlAtomicityStressTest::TestConcurrentIndex(PGConn* conn, const int num_iterations) {
229227
for (int i = 0; i < num_iterations; ++i) {
230228
bool index_created = false;
231229
while (!index_created) {
232230
// If concurrent index creation fails, it does not clean up the invalid index. Thus to
233231
// make the statement idempotent, drop the index if the create index failed before retrying.
234232
index_created = VERIFY_RESULT(ExecuteFormatWithRetry(
235-
&conn, "CREATE INDEX idx_$0 ON $1(key)", i, kTable));
233+
conn, "CREATE INDEX idx_$0 ON $1(key)", i, kTable));
236234
if (!index_created) {
237235
auto stmt = Format("DROP INDEX IF EXISTS idx_$0", i);
238-
while (!VERIFY_RESULT(ExecuteFormatWithRetry(&conn, stmt))) {
236+
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, stmt))) {
239237
LOG(INFO) << "Retry executing stmt " << stmt;
240238
}
241239
}
242240
}
243241
auto stmt = Format("DROP INDEX idx_$0", i);
244-
while (!VERIFY_RESULT(ExecuteFormatWithRetry(&conn, stmt))) {
242+
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, stmt))) {
245243
LOG(INFO) << "Retry executing stmt " << stmt;
246244
}
247245
}
248246
return Status::OK();
249247
}
250248

251-
Status PgDdlAtomicityStressTest::TestDml(const int num_iterations) {
252-
auto conn = VERIFY_RESULT(Connect());
249+
Status PgDdlAtomicityStressTest::TestDml(PGConn* conn, const int num_iterations) {
253250
for (int i = 1; i <= num_iterations;) {
254251
if (VERIFY_RESULT(ExecuteFormatWithRetry(
255-
&conn, "UPDATE $0 SET value = 'value_$1' WHERE key = $1", kTable, i))) {
252+
conn, "UPDATE $0 SET value = 'value_$1' WHERE key = $1", kTable, i))) {
256253
++i;
257254
}
258255
}
@@ -288,45 +285,50 @@ TEST_P(PgDdlAtomicityStressTest, StressTest) {
288285
// exists when it is executed. Each thread uses its own connection for its entire duration.
289286

290287
// Create a thread to add and drop columns.
291-
thread_holder.AddThreadFunctor([this, num_iterations] {
288+
auto conn1 = ASSERT_RESULT(Connect());
289+
thread_holder.AddThreadFunctor([this, &conn1, num_iterations] {
292290
std::vector<std::string> ddls = {
293291
"ALTER TABLE $0 ADD COLUMN col_$1 TEXT",
294292
"ALTER TABLE $0 DROP COLUMN col_$1"
295293
};
296-
ASSERT_OK(TestDdl(ddls, num_iterations));
294+
ASSERT_OK(TestDdl(&conn1, ddls, num_iterations));
297295
LOG(INFO) << "Thread to add and drop columns has completed";
298296
});
299297

300298
// Create a thread to add and drop columns with default values.
301-
thread_holder.AddThreadFunctor([this, num_iterations] {
299+
auto conn2 = ASSERT_RESULT(Connect());
300+
thread_holder.AddThreadFunctor([this, &conn2, num_iterations] {
302301
std::vector<std::string> ddls = {
303302
"ALTER TABLE $0 ADD COLUMN col_def_$1 TEXT DEFAULT 'def'",
304303
"ALTER TABLE $0 DROP COLUMN col_def_$1"
305304
};
306-
ASSERT_OK(TestDdl(ddls, num_iterations));
305+
ASSERT_OK(TestDdl(&conn2, ddls, num_iterations));
307306
LOG(INFO) << "Thread to add and drop columns with default values has completed";
308307
});
309308

310309
// Create a thread to create/drop an index on this table.
311-
thread_holder.AddThreadFunctor([this, num_iterations] {
310+
auto conn3 = ASSERT_RESULT(Connect());
311+
thread_holder.AddThreadFunctor([this, &conn3, num_iterations] {
312312
std::vector<std::string> ddls = {
313313
"CREATE INDEX NONCONCURRENTLY non_concurrent_idx_$1 ON $0(key)",
314314
"DROP INDEX non_concurrent_idx_$1"
315315
};
316-
ASSERT_OK(TestDdl(ddls, num_iterations));
316+
ASSERT_OK(TestDdl(&conn3, ddls, num_iterations));
317317
LOG(INFO) << "Thread to create/drop an index has completed";
318318
});
319319

320320
// ConcurrentIndex is a very long running operation. Cleaning up a failed ConcurrentIndex is
321321
// also a DDL, and this can be a very long running test. Reduce the number of iterations.
322-
thread_holder.AddThreadFunctor([this, num_iterations] {
323-
ASSERT_OK(TestConcurrentIndex(num_iterations / 2));
322+
auto conn4 = ASSERT_RESULT(Connect());
323+
thread_holder.AddThreadFunctor([this, &conn4, num_iterations] {
324+
ASSERT_OK(TestConcurrentIndex(&conn4, num_iterations / 2));
324325
LOG(INFO) << "Thread to run concurrent index has completed";
325326
});
326327

327328
// Create a thread to update the rows on this table.
328-
thread_holder.AddThreadFunctor([this, num_iterations] {
329-
ASSERT_OK(TestDml(num_iterations));
329+
auto conn5 = ASSERT_RESULT(Connect());
330+
thread_holder.AddThreadFunctor([this, &conn5, num_iterations] {
331+
ASSERT_OK(TestDml(&conn5, num_iterations));
330332
LOG(INFO) << "Thread to update the rows has completed";
331333
});
332334

0 commit comments

Comments
 (0)