Commit 02ced43

[yugabyte#22821] YSQL: Preserve local limit in a multi-page read
Summary:
### Issue
We set the read time explicitly in pg_op.cc for subsequent pages. This sets the read time on the consistent read point. The consistent read point treats this as a new read point and clears all the stored local limit values. This causes the local limit to advance, leading to read restart errors.

### Fix
Clear the paging read time field before passing the response back to the pg layer from the tserver's pg client session, except for catalog sessions, since they do not follow the used_read_time logic.

### History
1c2e37d was introduced to use the same read time across all pages of a read. It worked as follows: docdb sets a read_time in the paging state and passes it back to Pg. Pg then copies the same paging state into the next RPC to docdb. Docdb then picks the read time from the paging state to ensure that the same snapshot is used.

b97a881 was a follow-up fix after 1c2e37d. After this fix, Pg sets the read time in subsequent RPC requests using the read time from the paging state, instead of relying on docdb to use the read time from the copied paging state.

Currently, we do not need the paging state read time (sans upgrade reasons) because:
1. Plain sessions have used_read_time logic that sets the read time of subsequent RPCs.
2. DDL sessions use distributed transactions from the start, and the used_read_time for those is also handled.

Catalog sessions require the paging read time at the moment. We do not clear the paging read time for catalog requests when sending the response back to pggate.

**Upgrade/Rollback safety:**
The used read time logic is present in the local proxy. Since the local proxy is upgraded along with the Pg layer, no upgrade issues are expected.
Jira: DB-11718

Test Plan: Jenkins
```
./yb_build.sh --cxx-test pg_read_visibility-test --gtest-filter MultiPageScan
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest.TestPagingInSerializableIsolation
./yb_build.sh --gtest_filter PgLibPqTest.PagingReadRestart
```

Reviewers: pjain, sergei, dmitry

Reviewed By: pjain, sergei, dmitry

Subscribers: svc_phabricator, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35750
1 parent 788434a commit 02ced43
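
The Issue and Fix sections of the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyReadPoint`, `ToyPagingState`, `MakePagingState`, and `LocalLimitsAfterNextPage` are hypothetical names invented for illustration and are not YugabyteDB classes or functions. The model assumes that an explicitly supplied read time makes the read point drop its per-tablet local limits, and that the paging state returned with a page carries a read time only for catalog sessions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a consistent read point; not the YugabyteDB class.
struct ToyReadPoint {
  uint64_t read_time = 0;
  std::map<std::string, uint64_t> local_limits;  // tablet id -> local limit

  // An explicitly supplied read time is treated as a brand-new read point,
  // so all stored local limits are dropped (the behavior described under "Issue").
  void SetReadTime(uint64_t t) {
    read_time = t;
    local_limits.clear();
  }
};

// Hypothetical paging state: the read time is optional.
struct ToyPagingState {
  std::optional<uint64_t> read_time;
};

// Mirrors the branch in the diff: only catalog sessions keep a paging read time.
ToyPagingState MakePagingState(bool is_catalog_session, uint64_t used_read_time) {
  ToyPagingState state;
  if (is_catalog_session) {
    state.read_time = used_read_time;
  }
  return state;
}

// Models the pg layer preparing the next page: a paging read time, if present,
// is sent as an explicit read time. Returns how many local limits survive.
std::size_t LocalLimitsAfterNextPage(ToyReadPoint& read_point,
                                     const ToyPagingState& state) {
  if (state.read_time) {
    read_point.SetReadTime(*state.read_time);
  }
  return read_point.local_limits.size();
}
```

In this model, a paging state without a read time leaves the stored local limits untouched, while a paging state that still carries a read time wipes them even though the read time value itself is unchanged. That is the effect the fix avoids for non-catalog sessions.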

2 files changed: +45 -4 lines changed

src/yb/tserver/pg_client_session.cc

Lines changed: 21 additions & 4 deletions
@@ -465,10 +465,27 @@ struct PerformData {
         // Prevent further paging reads from read restart errors.
         // See the ProcessUsedReadTime(...) function for details.
         *op_resp.mutable_paging_state()->mutable_read_time() = resp.catalog_read_time();
-      }
-      if (transaction && transaction->isolation() == IsolationLevel::SERIALIZABLE_ISOLATION) {
-        // Delete read time from paging state since a read time is not used in serializable
-        // isolation level.
+      } else {
+        // Clear read time for the next page here unless absolutely necessary.
+        //
+        // Otherwise, if we do not clear read time here, a request for the
+        // next page with this read time can be sent back by the pg layer.
+        // Explicit read time in the request clears out existing local limits
+        // since the pg client session incorrectly believes that this passed
+        // read time is new. However, paging read time is simply a copy of
+        // the previous read time.
+        //
+        // Rely on
+        // 1. Either pg client session to set the read time.
+        //    See pg_client_session.cc's SetupSession
+        //    and transaction.cc's SetReadTimeIfNeeded
+        //    and batcher.cc's ExecuteOperations
+        // 2. Or transaction used read time logic in transaction.cc
+        // 3. Or plain session's used read time logic in CheckPlainSessionPendingUsedReadTime
+        // to set the read time for the next page.
+        //
+        // Catalog sessions are not handled by the above logic, so
+        // we set the paging read time above.
         op_resp.mutable_paging_state()->clear_read_time();
       }
     }

src/yb/yql/pgwrapper/pg_local_limit_optimization-test.cc

Lines changed: 24 additions & 0 deletions
@@ -167,6 +167,10 @@ class PgLocalLimitOptimizationTest : public PgMiniTestBase {
       // Force the scan in a single page ...
       ASSERT_OK(read_conn.Execute(Format(
           "SET yb_fetch_row_limit = $0", 2 * kNumInitialRows)));
+    } else {
+      // ... or multiple pages.
+      ASSERT_OK(read_conn.Execute(Format(
+          "SET yb_fetch_row_limit = $0", kNumInitialRows / 100)));
     }
     PopulateReadConnCache(read_conn);

@@ -308,6 +312,26 @@ TEST_F(PgLocalLimitOptimizationTest, SinglePageScan) {
   InsertRowConcurrentlyWithTableScan();
 }

+// Before #22821, in a multi-page scan, for each subsequent page scan,
+// the read time was set by pggate explicitly based on the used time
+// returned by the response for the previous page.
+//
+// This behavior of overriding the read time also resets the per-tablet
+// local limit map. There is no reason for pggate to send read time
+// explicitly since the read time does not change across multiple pages.
+//
+// This test ensures that there is no read restart error just because the
+// scan spans multiple pages. Fails without #22821.
+TEST_F(PgLocalLimitOptimizationTest, MultiPageScan) {
+  // Test Config
+  is_single_tablet_ = true;
+  is_single_page_scan_ = false;
+  scan_cmd_ = ScanCmd::kOrdered;
+
+  // Run Test
+  InsertRowConcurrentlyWithTableScan();
+}
+
 // In a multi-tablet scan, the read time is
 // 1. Either picked on the local tserver proxy. (This test).
 // 2. Or picked on the first tserver that the scan hits.
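
The two `yb_fetch_row_limit` settings in the test setup hunk can be checked with simple arithmetic. The sketch below is an illustration only: the value of `kNumInitialRows` is not shown in this diff, so 1000 is an assumed placeholder, and `NumPages` is a hypothetical helper. It also assumes the row limit is the only thing that bounds a page, ignoring any size-based fetch limit.

```cpp
#include <cstdint>

// Assumed placeholder; the real kNumInitialRows is not visible in this diff.
constexpr int64_t kNumInitialRows = 1000;

// Hypothetical helper: pages needed to scan `rows` rows when each page
// returns at most `row_limit` rows (ceiling division, row_limit > 0).
constexpr int64_t NumPages(int64_t rows, int64_t row_limit) {
  return (rows + row_limit - 1) / row_limit;
}

// Single-page configuration: the limit is twice the table size.
static_assert(NumPages(kNumInitialRows, 2 * kNumInitialRows) == 1);

// Multi-page configuration: the limit is 1/100 of the table size.
static_assert(NumPages(kNumInitialRows, kNumInitialRows / 100) == 100);
```

Under these assumptions the multi-page configuration issues on the order of a hundred page requests, which gives the concurrent insert many chances to land between pages.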
