Skip to content

Commit 49f1b8c

Browse files
committed
[yugabyte#24431] YSQL: Supplement read restart error with more info
Summary: ### Motivation Read restart errors are a common occurence in YugabyteDB. Mitigation strategy varies case-by-case. Well known approaches: 1. Use READ COMMITTED isolation level. A popular mitigation strategy. 2. Increase ysql_output_buffer_size. This is applicable for large SELECTs that have a low read restart rate. 3. Use the deferrable mode. This is for background tasks that scan a lot of rows. 4. Leverage time sync service. Best option but still early. However, these mitigitation strategies do not always work. Such cases require addressing the underlying problem. Currently, the logs lack sufficient detail, leading to speculation rather than concrete diagnosis. ### Examples #### Running CREATE/DROP TEMP TABLE concurrently results in a read restart error ``` 2025-04-04 01:57:32.353 UTC [587443] ERROR: Restart read required at: read: { days: 20182 time: 01:57:32.350502 } local_limit: { days: 20182 time: 01:57:32.350502 } global_limit: <min> in_txn_limit: <max> serial_no: 0 } (yb_max_query_layer_retries set to 0 are exhausted) ``` Users do not expect this error because TEMP TABLEs are local to each pg backend. Moreover, increasing ysql_output_buffer_size does not mitigate the error here because read restart errors on DDLs cannot be retried yet. That said, users find it helpful to understand the root cause of the error. After this revision, the log becomes ``` 2025-04-04 01:57:32.353 UTC [587443] DETAIL: restart_read_time: { read: { days: 20182 time: 01:57:32.350502 } local_limit: { days: 20182 time: 01:57:32.350502 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { read: { days: 20182 time: 01:57:32.096749 } local_limit: { days: 20182 time: 01:57:32.098414 } global_limit: { days: 20182 time: 01:57:32.596749 } in_txn_limit: <max> serial_no: 0 }, table: template1.pg_yb_invalidation_messages [ysql_schema=pg_catalog] [00000001000030008001000000001f90], tablet: 00000000000000000000000000000000, leader_uuid: f1ac558309244cbab385761b531980f0, key: SubDocKey(DocKey(CoTableId=901f0000-0000-0180-0030-000001000000, [], [13515, 10]), [SystemColumnId(0)]), start_time: 2025-04-04 01:57:32.094965+00 ``` **Cause:** The read restart error is returned when accessing the `pg_yb_invalidation_messages` table in the above example. **Mitigation:** Avoid creating TEMP TABLEs from mutliple sessions concurrently. Note: The issue is fixed by table-level locking. #### CREATE INDEX NONCONCURRENTLY CREATE INDEX NONCONCURRENTLY accesses both the user table as well as syscatalog. How can the user identify whether the read restart error occurred due to a concurrent DDL or DML? ``` [ts-1] 2025-04-04 04:25:47.694 UTC [665450] DETAIL: restart_read_time: { read: { days: 20182 time: 04:25:47.512991 } local_limit: { days: 20182 time: 04:25:47.512991 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { read: { days: 20182 time: 04:25:47.012991 } local_limit: { days: 20182 time: 04:25:47.512991 } global_limit: { days: 20182 time: 04:25:47.512991 } in_txn_limit: <max> serial_no: 0 }, table: yugabyte.test_table [ysql_schema=public] [000034cb000030008000000000004000], tablet: bf9d2bf8908b43d4b8b5ff75b169bf11, leader_uuid: 6d60cea932ca41498fee5ad07c375d18, key: SubDocKey(DocKey(0x0a73, [5], []), [ColumnId(1)]) ``` **Cause:** In the above log, the error occurs on the user table. **Mitigation:** Stop concurrent writes when creating the INDEX. The read restart can also be on a catalog table. Example: ``` [ts-1] 2025-04-04 02:17:48.392 UTC [592811] DETAIL: restart_read_time: { read: { days: 20182 time: 02:17:48.326938 } local_limit: { days: 20182 time: 02:17:48.326938 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { read: { days: 20182 time: 02:17:48.023542 } local_limit: { days: 20182 time: 02:17:48.024082 } global_limit: { days: 20182 time: 02:17:48.523542 } in_txn_limit: <max> serial_no: 0 }, table: yugabyte.pg_attribute [ysql_schema=pg_catalog] [000034cb0000300080010000000004e1], tablet: 00000000000000000000000000000000, leader_uuid: e941d4ee43bc40a8b4303eab317f52fd, key: SubDocKey(DocKey(CoTableId=e1040000-0000-0180-0030-0000cb340000, [], [16384, 5]), []), start_time: 2025-04-04 02:17:48.022154+00 [ts-1] 2025-04-04 02:17:48.392 UTC [592811] STATEMENT: CREATE INDEX NONCONCURRENTLY non_concurrent_idx_0 ON test_table(key) ``` #### Read restarts on absent keys Users sometimes report that they notice read restart errors even though they are no conflicting keys. To better observe such scenarios, print a key on which read restart error occurred. There can be multiple such keys, so we print the first one we encounter. Here's an example fixed by yugabyte#25214. Setup ``` CREATE TABLE keys(k1 INT, k2 INT, PRIMARY KEY(k1 ASC, k2 ASC)); INSERT INTO keys(k1, k2) VALUES (1, 1); ``` The following query ``` INSERT INTO keys(k1, k2) VALUES (0, 0), (9, 9) ON CONFLICT DO NOTHING ``` results in a read restart error even though none of 0 or 9 conflict with 1. ``` 2025-04-04 17:55:03.013 UTC [955800] DETAIL: restart_read_time: { read: { days: 20182 time: 17:55:03.012147 } local_limit: { days: 20182 time: 17:55:03.012147 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { read: { days: 20182 time: 17:55:02.983152 } local_limit: { days: 20182 time: 17:55:03.483152 } global_limit: { days: 20182 time: 17:55:03.483152 } in_txn_limit: { days: 20182 time: 17:55:03.011777 } serial_no: 0 }, table: yugabyte.keys [ysql_schema=public] [000034cb000030008000000000004000], tablet: 2a19ede71f7e4389b1ca682e87318b32, leader_uuid: 7010bbec8ec94110b5e2316bca5319f4, key: SubDocKey(DocKey([], [1, 1]), [SystemColumnId(0)]), start_time: 2025-04-04 17:55:03.010743+00 ``` This helps us understand where the read restart error occurred. It occurs on key = (1, 1), not relevant to the INSERT ON CONFLICT statement. The key field helps diagnose such scenarios. **Mitigation:** See 1bcb09c for the fix. #### Local Limit Fixes Commits d67ba12 and 02ced43 fix some bugs with the local llimit mechanism. Identifying the value of local limit when the issue occurs is fairly straightforward. The error message has local limit. ``` [ts-1] 2025-04-04 02:17:48.392 UTC [592811] DETAIL: restart_read_time: { read: { days: 20182 time: 02:17:48.326938 } local_limit: { days: 20182 time: 02:17:48.326938 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { read: { days: 20182 time: 02:17:48.023542 } local_limit: { days: 20182 time: 02:17:48.024082 } global_limit: { days: 20182 time: 02:17:48.523542 } in_txn_limit: <max> serial_no: 0 }, table: yugabyte.pg_attribute [ysql_schema=pg_catalog] [000034cb0000300080010000000004e1], tablet: 00000000000000000000000000000000, leader_uuid: e941d4ee43bc40a8b4303eab317f52fd, key: SubDocKey(DocKey(CoTableId=e1040000-0000-0180-0030-0000cb340000, [], [16384, 5]), []), start_time: 2025-04-04 02:17:48.022154+00 [ts-1] 2025-04-04 02:17:48.392 UTC [592811] STATEMENT: CREATE INDEX NONCONCURRENTLY non_concurrent_idx_0 ON test_table(key) ``` #### pg_partman faces read restart errors pg_partman faces a read restart error on max query ``` ERROR: Restart read required at: { read: { physical: 1730622229379200 } local_limit: { physical: 1730622229379200 } global_limit: <min> in_txn_limit: <max> serial_no: 0 } CONTEXT: SQL statement "SELECT max(id)::text FROM public.t1_p100000" ``` **Cause:** This is because there is no range index on the tablet 1_p100000 to retrieve the max value efficiently using the index. A log such as ``` 2025-04-04 18:55:31.936 UTC [963409] DETAIL: restart_read_time: { read: { days: 20182 time: 18:55:31.934321 } local_limit: { days: 20182 time: 18:55:31.934321 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }, original_read_time: { days: 20182 time: 18:55:31.738805 }, table: yugabyte.tokens_idx [ysql_schema=public] [000034cb000030008000000000004003], tablet: {tokens_idx_tablet_id1}, leader_uuid: {ts2_peer_id}, key: SubDocKey(DocKey([], [300, "G%\xf8S\xde\xb0h\xe6)\xa3G0\x96\xae\x9e9\xf7r\xbe\xc3\x00\x00!!"]), [SystemColumnId(0)]), start_time: 2025-04-04 18:55:31.933012+00 ``` informs us the table on which the read restart error occurred. If the table is an index such as tokens_idx, the query is using an index scan. **Upgrade/Rollback safety:** Adds a new key restart_read_key to tserver.proto. The key is marked optional. When absent an empty value is printed in its stead. Jira: DB-13338 Test Plan: Jenkins Backport-through: 2024.2 Reviewers: smishra, pjain Reviewed By: pjain Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D42883
1 parent 58c2f45 commit 49f1b8c

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

41 files changed

+258
-113
lines changed

docs/content/preview/architecture/transactions/read-restart-error.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ Read restart errors are raised to maintain the _read-after-commit-visibility_ gu
2424
YugabyteDB doesn't require atomic clocks, but instead allows a configurable setting for maximum clock skew. Time synchronization protocols such as NTP synchronize commodity hardware clocks periodically to keep the skew low and bounded. Additionally, YugabyteDB has optimizations to resolve this ambiguity internally with best-effort. However, when it can't resolve the error internally, YugabyteDB outputs a `read restart` error to the external client, similar to the following:
2525

2626
```output
27-
ERROR: Query error: Restart read required at: { read: { physical: 1656351408684482 } local_limit: { physical: 1656351408684482 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }
27+
ERROR: Query error: Restart read required
2828
```
2929

3030
The following scenario describes how clock skew can result in the above mentioned ambiguity around data visibility in detail:

src/postgres/src/backend/utils/error/elog.c

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4624,6 +4624,27 @@ yb_errdetail_from_status(const char *fmt, const size_t nargs, const char **args)
46244624
return 0; /* return value does not matter */
46254625
}
46264626

4627+
int
4628+
yb_errdetail_log_from_status(const char *fmt, const size_t nargs, const char **args)
4629+
{
4630+
Assert(!IsMultiThreadedMode());
4631+
4632+
ErrorData *edata = &errordata[errordata_stack_depth];
4633+
MemoryContext oldcontext;
4634+
4635+
recursion_depth++;
4636+
CHECK_STACK_DEPTH();
4637+
oldcontext = MemoryContextSwitchTo(edata->assoc_context);
4638+
4639+
YB_EVALUATE_MESSAGE_FROM_STATUS(edata->domain, detail_log,
4640+
false /* appendval */ ,
4641+
true /* translateit */ , nargs, args);
4642+
4643+
MemoryContextSwitchTo(oldcontext);
4644+
recursion_depth--;
4645+
return 0; /* return value does not matter */
4646+
}
4647+
46274648
/*
46284649
* Set the given field to the given value (assumed to be palloc-d) or, if the
46294650
* new value is null, pstrdup the existing value of the field to ensure that

src/postgres/src/backend/utils/misc/pg_yb_utils.c

Lines changed: 37 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -857,7 +857,10 @@ void
857857
GetStatusMsgAndArgumentsByCode(const uint32_t pg_err_code, YbcStatus s,
858858
const char **msg_buf, size_t *msg_nargs,
859859
const char ***msg_args, const char **detail_buf,
860-
size_t *detail_nargs, const char ***detail_args)
860+
size_t *detail_nargs, const char ***detail_args,
861+
const char **detail_log_buf,
862+
size_t *detail_log_nargs,
863+
const char ***detail_log_args)
861864
{
862865
const char *status_msg = YBCMessageAsCString(s);
863866
size_t status_nargs;
@@ -871,6 +874,9 @@ GetStatusMsgAndArgumentsByCode(const uint32_t pg_err_code, YbcStatus s,
871874
*detail_buf = NULL;
872875
*detail_nargs = 0;
873876
*detail_args = NULL;
877+
*detail_log_buf = NULL;
878+
*detail_log_nargs = 0;
879+
*detail_log_args = NULL;
874880
elog(DEBUG2, "status_msg=%s pg_err_code=%d", status_msg, pg_err_code);
875881

876882
switch (pg_err_code)
@@ -891,6 +897,36 @@ GetStatusMsgAndArgumentsByCode(const uint32_t pg_err_code, YbcStatus s,
891897
*detail_nargs = status_nargs;
892898
*detail_args = status_args;
893899
break;
900+
case ERRCODE_YB_RESTART_READ:
901+
*msg_buf = "Restart read required";
902+
*msg_nargs = 0;
903+
*msg_args = NULL;
904+
905+
/*
906+
* Read restart errors occur when writes fall within the uncertianty
907+
* interval [read_time, global_limit).
908+
*
909+
* Moreover, read_time can be less than the current time since it
910+
* is picked as the docdb tablet's safe time as an optimization in
911+
* some cases.
912+
*
913+
* As a consequence, read_time may be lower than the commit time of the
914+
* previous transaction from the same session.
915+
*
916+
* In this case, a read restart error may be issued to move the read
917+
* time past the commit time.
918+
*
919+
* To capture such cases, print the start time of the statement. This
920+
* allows comparison between the start time and the original read time.
921+
*/
922+
*detail_log_buf = psprintf("%s, stmt_start_time: %s, txn_start_time: %s, iso:%d",
923+
status_msg,
924+
timestamptz_to_str(GetCurrentStatementStartTimestamp()),
925+
timestamptz_to_str(GetCurrentTransactionStartTimestamp()),
926+
XactIsoLevel);
927+
*detail_log_nargs = status_nargs;
928+
*detail_log_args = status_args;
929+
break;
894930
case ERRCODE_YB_DEADLOCK:
895931
*msg_buf = "deadlock detected";
896932
*msg_nargs = 0;

src/postgres/src/include/pg_yb_utils.h

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1160,7 +1160,10 @@ void GetStatusMsgAndArgumentsByCode(const uint32_t pg_err_code, YbcStatus s,
11601160
const char ***msg_args,
11611161
const char **detail_buf,
11621162
size_t *detail_nargs,
1163-
const char ***detail_args);
1163+
const char ***detail_args,
1164+
const char **detail_log_buf,
1165+
size_t *detail_log_nargs,
1166+
const char ***detail_log_args);
11641167

11651168
bool YbIsBatchedExecution();
11661169
void YbSetIsBatchedExecution(bool value);
@@ -1201,21 +1204,30 @@ YbOptSplit *YbGetSplitOptions(Relation rel);
12011204
const char *funcname = YBCStatusFuncname(_status); \
12021205
const char *msg_buf = NULL; \
12031206
const char *detail_buf = NULL; \
1207+
const char *detail_log_buf = NULL; \
12041208
size_t msg_nargs = 0; \
12051209
size_t detail_nargs = 0; \
1210+
size_t detail_log_nargs = 0; \
12061211
const char **msg_args = NULL; \
12071212
const char **detail_args = NULL; \
1213+
const char **detail_log_args = NULL; \
12081214
GetStatusMsgAndArgumentsByCode(pg_err_code, _status, \
12091215
&msg_buf, &msg_nargs, &msg_args, \
12101216
&detail_buf, &detail_nargs, \
1211-
&detail_args); \
1217+
&detail_args, &detail_log_buf, \
1218+
&detail_log_nargs, \
1219+
&detail_log_args); \
12121220
YBCFreeStatus(_status); \
12131221
if (errstart(adjusted_elevel, TEXTDOMAIN)) \
12141222
{ \
12151223
Assert(msg_buf); \
12161224
yb_errmsg_from_status(msg_buf, msg_nargs, msg_args); \
12171225
if (detail_buf) \
12181226
yb_errdetail_from_status(detail_buf, detail_nargs, detail_args); \
1227+
if (detail_log_buf) \
1228+
yb_errdetail_log_from_status(detail_log_buf, \
1229+
detail_log_nargs, \
1230+
detail_log_args); \
12191231
yb_set_pallocd_error_file_and_func(filename, funcname); \
12201232
errcode(pg_err_code); \
12211233
errhidecontext(true); \

src/postgres/src/include/utils/elog.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -508,6 +508,7 @@ extern void yb_errfinish(const char *filename, int lineno, const char *funcname)
508508
extern int yb_external_errcode(int sqlerrcode);
509509
extern int yb_errmsg_from_status(const char *fmt, const size_t nargs, const char **args);
510510
extern int yb_errdetail_from_status(const char *fmt, const size_t nargs, const char **args);
511+
extern int yb_errdetail_log_from_status(const char *fmt, const size_t nargs, const char **args);
511512
extern void yb_set_pallocd_error_file_and_func(const char *filename, const char *funcname);
512513
extern sigjmp_buf *yb_get_exception_stack(void);
513514
extern void yb_set_exception_stack(sigjmp_buf *new_sigjmp_buf);

src/yb/client/async_rpc.cc

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -471,10 +471,21 @@ bool AsyncRpcBase<Req, Resp>::CommonResponseCheck(const Status& status) {
471471
auto restart_read_time = ReadHybridTime::FromRestartReadTimePB(resp_);
472472
if (restart_read_time) {
473473
auto read_point = batcher_->read_point();
474+
auto tablet_id = req_.tablet_id();
475+
HybridTime original_read_time;
474476
if (read_point) {
475-
read_point->RestartRequired(req_.tablet_id(), restart_read_time);
477+
original_read_time = read_point->GetReadTime(tablet_id).read;
478+
read_point->RestartRequired(tablet_id, restart_read_time);
476479
}
477-
Failed(STATUS(TryAgain, Format("Restart read required at: $0", restart_read_time), Slice(),
480+
auto leader_uuid = tablet().current_leader_uuid();
481+
auto table_name = table()->name().ToString();
482+
auto key = resp_.restart_read_key();
483+
Failed(STATUS(TryAgain,
484+
Format("restart_read_time: $0, original_read_time: $1"
485+
", table: $2, tablet: $3, leader_uuid: $4, key: $5",
486+
restart_read_time, original_read_time,
487+
table_name, tablet_id, leader_uuid, key),
488+
Slice(),
478489
TransactionError(TransactionErrorCode::kReadRestartRequired)));
479490
return false;
480491
}

src/yb/common/common_fwd.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ class ConstContiguousRow;
3232
class DocHybridTime;
3333
class EncodedDocHybridTime;
3434
class HybridTime;
35+
struct ReadRestartData;
3536
class MissingValueProvider;
3637
class TableProperties;
3738
class TransactionStatusManager;

src/yb/common/consistent_read_point.cc

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@ void ConsistentReadPoint::RestartRequired(const TabletId& tablet,
8888

8989
void ConsistentReadPoint::RestartRequiredUnlocked(
9090
const TabletId& tablet, const ReadHybridTime& restart_time) {
91-
DCHECK(read_time_) << "Unexpected restart without a read time set";
91+
LOG_IF(DFATAL, !read_time_) << "Unexpected restart without a read time set";
9292
restart_read_ht_.MakeAtLeast(restart_time.read);
9393
// We should inherit per-tablet restart time limits before restart, doing it lazily.
9494
if (restarts_.empty()) {
@@ -199,7 +199,8 @@ void ConsistentReadPoint::SetInTxnLimit(HybridTime value) {
199199

200200
ReadHybridTime ConsistentReadPoint::GetReadTime() const {
201201
std::lock_guard lock(mutex_);
202-
DCHECK_LE(read_time_.read, read_time_.global_limit);
202+
LOG_IF(DFATAL, read_time_.read > read_time_.global_limit)
203+
<< "Read time cannot be higher than the global limit" << read_time_;
203204
return read_time_;
204205
}
205206

src/yb/common/doc_hybrid_time.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -205,4 +205,9 @@ inline std::ostream& operator<<(std::ostream& os, const DocHybridTime& ht) {
205205
return os << ht.ToString();
206206
}
207207

208+
struct MaxSeenHtData {
209+
EncodedDocHybridTime max_seen_ht{DocHybridTime::kMin};
210+
std::string max_seen_ht_key;
211+
};
212+
208213
} // namespace yb

src/yb/common/hybrid_time.h

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -298,4 +298,27 @@ inline HybridTime operator "" _usec_ht(unsigned long long microseconds) { // NOL
298298

299299
using hybrid_time_literals::operator"" _usec_ht;
300300

301+
struct ReadRestartData {
302+
HybridTime restart_time;
303+
std::string key;
304+
305+
bool is_valid() const {
306+
return restart_time.is_valid();
307+
}
308+
309+
void MakeAtLeast(ReadRestartData rhs) {
310+
if (!rhs.restart_time.is_valid()) {
311+
return;
312+
}
313+
if (!restart_time.is_valid() || restart_time < rhs.restart_time) {
314+
restart_time = rhs.restart_time;
315+
key = rhs.key;
316+
}
317+
}
318+
319+
std::string ToString() const {
320+
return YB_STRUCT_TO_STRING(restart_time, key);
321+
}
322+
};
323+
301324
} // namespace yb

0 commit comments

Comments
 (0)