From dcaa98ec58c168c107825be34f515d3cb3b50fd7 Mon Sep 17 00:00:00 2001 From: Lev Kokotov Date: Fri, 14 Nov 2025 10:56:40 -0800 Subject: [PATCH 1/4] Some cleanup --- docs/configuration/pgdog.toml/rewrite.md | 47 +++++++++++---- docs/features/sharding/sharding-functions.md | 61 ++++++++++++++++++-- docs/features/transaction-mode.md | 26 +++++++++ 3 files changed, 119 insertions(+), 15 deletions(-) diff --git a/docs/configuration/pgdog.toml/rewrite.md b/docs/configuration/pgdog.toml/rewrite.md index 4d63942..665780c 100644 --- a/docs/configuration/pgdog.toml/rewrite.md +++ b/docs/configuration/pgdog.toml/rewrite.md @@ -17,16 +17,16 @@ split_inserts = "error" | Field | Description | Default | | --- | --- | --- | -| `enabled` | Master toggle; when `false`, PgDog parses but never applies rewrite plans. | `false` | -| `shard_key` | Behaviour when an `UPDATE` changes a sharding key.
`error` rejects the statement.
`rewrite` migrates the row between shards.
`ignore` forwards it unchanged. | `"error"` | -| `split_inserts` | Behaviour when a sharded table receives a multi-row `INSERT`.
`error` rejects the statement.
`rewrite` fans the rows out to their shards.
`ignore` forwards it unchanged. | `"error"` | +| `enabled` | Master toggle: when `false`, PgDog parses but never applies rewrite plans. | `false` | +| `shard_key` | Behaviour when an `UPDATE` changes a sharding key: `error` rejects the statement,
`rewrite` migrates the row between shards,
`ignore` forwards it unchanged. | `"error"` | +| `split_inserts` | Behaviour when a sharded table receives a multi-row `INSERT`: `error` rejects the statement, `rewrite` fans the rows out to their shards, `ignore` forwards it unchanged. | `"error"` | !!! note "Two-phase commit" - PgDog recommends enabling [`general.two_phase_commit`](general.md#two_phase_commit) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails. + PgDog recommends enabling [two-phase commit](../../features/sharding/2pc.md) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails. ## Runtime overrides -The admin database exposes these toggles via `SET`: +The admin database exposes these toggles via `SET` command: ```postgresql SET rewrite_enabled TO true; -- mirrors [rewrite].enabled @@ -34,12 +34,39 @@ SET rewrite_shard_key_updates TO rewrite; -- error | rewrite | ignore SET rewrite_split_inserts TO rewrite; -- error | rewrite | ignore ``` -Switches apply to subsequent sessions once the cluster reloads configuration. Session-level overrides allow canary testing before persisting them in `pgdog.toml`. +The setting changes are applied immediately. These overrides allow canary testing before persisting them in `pgdog.toml`. ## Limitations -* Shard-key rewrites require the `WHERE` clause to resolve to a single row; otherwise PgDog rolls back and raises `rewrite.shard_key="rewrite" is not yet supported ...`. -* Split INSERT rewrites must run outside explicit transactions so PgDog can orchestrate per-shard `BEGIN`/`COMMIT` cycles. Inside a transaction PgDog returns `25001` and leaves the client transaction intact. -* Both features fall back to `error` semantics while `rewrite.enabled = false` or when PgDog cannot determine a target shard. +### Sharding key updates -See [feature docs](../../features/sharding/sharding-functions.md#rewrite-behaviour) for walkthroughs of these flows. +Sharding key rewrites in an `UPDATE` clause have to resolve to a single row. If the sharding key isn't unique or the `WHERE` clause has an incorrect `OR` condition, for example, PgDog will rollback the transaction and raise an error. + +For example: + +```postgresql +UPDATE users SET id = 5 WHERE admin = true; +``` + +On a single-shard deployment, this would raise an unique index violation error. On a cross-shard deployment, PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. + +### Multi-tuple inserts + +`INSERT` statements with multiple tuples have to be executed outside of an explicit transaction. PgDog needs to start a cross-shard transaction to safely commit the rows to multiple shards, and an existing transaction will interfere with its internal state. + +For example: + +```postgresql +BEGIN; +INSERT INTO users VALUES ($1, $2), ($3, $4); +``` + +This scenario will raise an error (code `25001`). + +### Default behavior + +Both split inserts and sharding key updates fallback to raising an error if `enabled` is set to `false`. + +### Read more + +- [Rewrite behavior](../../features/sharding/sharding-functions.md#rewrite-behaviour) diff --git a/docs/features/sharding/sharding-functions.md b/docs/features/sharding/sharding-functions.md index 568c598..3510cb4 100644 --- a/docs/features/sharding/sharding-functions.md +++ b/docs/features/sharding/sharding-functions.md @@ -180,14 +180,65 @@ shard = 0 This will send all queries that don't specify a schema or use a schema without a mapping to shard zero. -## Rewrite behaviour +## Rewrite behavior -PgDog can transparently move writes between shards when [`rewrite`](../../configuration/pgdog.toml/rewrite.md) is enabled. +PgDog can transparently move writes between shards when [`rewrite`](../../configuration/pgdog.toml/rewrite.md) feature is enabled. -* **Shard-key updates** (`rewrite.shard_key = "rewrite"`) delete the matching row from its current shard and re-insert it on the shard implied by the new key. Exactly one row must match the `WHERE` clause; PgDog aborts rewrites that affect multiple rows or unresolved shards. -* **Split INSERTs** (`rewrite.split_inserts = "rewrite"`) decompose multi-row `INSERT` statements so each shard receives only the rows it owns. PgDog opens per-shard transactions and can escalate to two-phase commit when configured to preserve atomicity across shards. +### Sharding key updates -Both features require `rewrite.enabled = true`, operate only on sharded tables, and fall back to returning errors when PgDog cannot determine a safe rewrite plan. Running them alongside [`general.two_phase_commit`](../../configuration/pgdog.toml/general.md#two_phase_commit) is recommended to guarantee atomic outcomes. +Sharding key updates handles the situation when a query is changing the value of the sharding key, which could require the row to be moved to a different shard. + +For example: + +```postgresql +UPDATE users SET id = $2 WHERE id = $1; +``` + +If configured, PgDog can rewrite this query into three statements, executed inside a cross-shard transaction: + +```postgresql +SELECT * FROM users WHERE id = $2; /* query 1 */ +DELETE FROM users WHERE id = $2; /* query 2 */ +INSERT INTO users VALUES (/* row fetched in query #1 */]); /* query 3 */ +``` + +#### Limitations + +The row returned by _query 1_, constructed from the `WHERE` clause of the original `UPDATE` statement, have to match exactly one row. If that's not the case, the operation will be aborted and PgDog will raise an error. + +### Multi-tuple inserts + +Multi-tuple `INSERT` statements may write rows that belong on separate shards. To handle this situation, if configured, PgDog can rewrite the statement into multiple, single-tuple statements, and send them to their respective shards in a cross-shard transaction. + +For example: + +```postgresql +INSERT INTO users (id, email) VALUES ($1, $2), ($3, $4); +``` + +This statement will be rewritten into the following two queries: + +```postgresql +INSERT INTO users (id, email) VALUES ($1, $2); +INSERT INTO users (id, email) VALUES ($1, $2); +``` + +#### Limitations + +Since PgDog starts a cross-shard transaction to make this operation atomic, the original `INSERT` statement must not be sent inside an explicit transaction by the client. If that's the case, PgDog will abort the operation and return an error. + +### Configuration + +Both features require `enabled` flag to be set to `true`, for example: + +```toml +[rewrite] +enabled = true +split_inserts = "rewrite" +shard_key = "rewrite" +``` + +If a safe rewrite plan cannot be determined, PgDog will abort the transaction and return an error. To guarantee cross-shard atomicity of the operation, consider enabling [two-phase commit](2pc). ## Read more diff --git a/docs/features/transaction-mode.md b/docs/features/transaction-mode.md index 105198e..b874379 100644 --- a/docs/features/transaction-mode.md +++ b/docs/features/transaction-mode.md @@ -58,9 +58,35 @@ This is performed efficiently, and server parameters are updated only if they di 1. The database has a primary and replica(s) 2. The database has more than one shard 3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) setting is set to `"full"` + 4. [`query_parser_enabled`](../configuration/pgdog.toml/general.md#query_parser_enabled) is set to `true` This is to avoid unnecessary overhead of using `pg_query` (however small), when we don't absolutely have to. +## Advisory locks + +Advisory locks are an implementation of distributed locking in PostgreSQL. They are set on the server connection and released when the client removes the lock or disconnects. + +For example: + +```postgresql +SELECT pg_advisory_lock(1234); +``` + +In transaction mode, server connections are re-used between clients, so additional care needs to be taken to keep the server connection tied to the client that created the lock. + +PgDog is able to detect advisory lock usage and will pin the server connection to the client connection until one of the following conditions are met: + +1. The client releases the lock with `pg_advisory_unlock` +2. The client disconnects + +!!! note "Performance" + If multiple clients use advisory locks and don't release them quickly, the effectiveness of transaction pooling will be reduced because server connections will not be effectively re-used between client transactions. + +### Limitions + +PgDog doesn't keep track of multiple advisory locks inside client connections. If a client acquires two different locks, for example, and only releases one, the server connection will still be returned back to the pool with the acquired lock. + + ### Connection parameters Most Postgres connection drivers support passing parameters in the connection URL. Using the special `options` setting, each parameter is set using the `-c` flag, for example: From 6214de27f55770993f259f31d1ab939cbcb7fd51 Mon Sep 17 00:00:00 2001 From: Lev Kokotov Date: Fri, 14 Nov 2025 13:27:22 -0800 Subject: [PATCH 2/4] Format --- docs/configuration/pgdog.toml/rewrite.md | 8 ++++---- docs/features/sharding/sharding-functions.md | 8 ++++---- docs/features/transaction-mode.md | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/configuration/pgdog.toml/rewrite.md b/docs/configuration/pgdog.toml/rewrite.md index 665780c..52f1292 100644 --- a/docs/configuration/pgdog.toml/rewrite.md +++ b/docs/configuration/pgdog.toml/rewrite.md @@ -18,8 +18,8 @@ split_inserts = "error" | Field | Description | Default | | --- | --- | --- | | `enabled` | Master toggle: when `false`, PgDog parses but never applies rewrite plans. | `false` | -| `shard_key` | Behaviour when an `UPDATE` changes a sharding key: `error` rejects the statement,
`rewrite` migrates the row between shards,
`ignore` forwards it unchanged. | `"error"` | -| `split_inserts` | Behaviour when a sharded table receives a multi-row `INSERT`: `error` rejects the statement, `rewrite` fans the rows out to their shards, `ignore` forwards it unchanged. | `"error"` | +| `shard_key` | Behavior when an `UPDATE` changes a sharding key: `error` rejects the statement,
`rewrite` migrates the row between shards,
`ignore` forwards it unchanged. | `"error"` | +| `split_inserts` | Behavior when a sharded table receives a multi-row `INSERT`: `error` rejects the statement, `rewrite` fans the rows out to their shards, `ignore` forwards it unchanged. | `"error"` | !!! note "Two-phase commit" PgDog recommends enabling [two-phase commit](../../features/sharding/2pc.md) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails. @@ -48,7 +48,7 @@ For example: UPDATE users SET id = 5 WHERE admin = true; ``` -On a single-shard deployment, this would raise an unique index violation error. On a cross-shard deployment, PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. +On a single-shard deployment, this would raise a unique index violation error. On a cross-shard deployment, PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. ### Multi-tuple inserts @@ -69,4 +69,4 @@ Both split inserts and sharding key updates fallback to raising an error if `ena ### Read more -- [Rewrite behavior](../../features/sharding/sharding-functions.md#rewrite-behaviour) +- [Rewrite behavior](../../features/sharding/sharding-functions.md#rewrite-behavior) diff --git a/docs/features/sharding/sharding-functions.md b/docs/features/sharding/sharding-functions.md index 3510cb4..01b4b79 100644 --- a/docs/features/sharding/sharding-functions.md +++ b/docs/features/sharding/sharding-functions.md @@ -199,12 +199,12 @@ If configured, PgDog can rewrite this query into three statements, executed insi ```postgresql SELECT * FROM users WHERE id = $2; /* query 1 */ DELETE FROM users WHERE id = $2; /* query 2 */ -INSERT INTO users VALUES (/* row fetched in query #1 */]); /* query 3 */ +INSERT INTO users VALUES (/* row fetched in query #1 */); /* query 3 */ ``` #### Limitations -The row returned by _query 1_, constructed from the `WHERE` clause of the original `UPDATE` statement, have to match exactly one row. If that's not the case, the operation will be aborted and PgDog will raise an error. +The row returned by _query 1_, constructed from the `WHERE` clause of the original `UPDATE` statement, has to match exactly one row. If that's not the case, the operation will be aborted and PgDog will raise an error. ### Multi-tuple inserts @@ -220,7 +220,7 @@ This statement will be rewritten into the following two queries: ```postgresql INSERT INTO users (id, email) VALUES ($1, $2); -INSERT INTO users (id, email) VALUES ($1, $2); +INSERT INTO users (id, email) VALUES ($3, $4); ``` #### Limitations @@ -229,7 +229,7 @@ Since PgDog starts a cross-shard transaction to make this operation atomic, the ### Configuration -Both features require `enabled` flag to be set to `true`, for example: +Both features require the `enabled` flag to be set to `true`, for example: ```toml [rewrite] diff --git a/docs/features/transaction-mode.md b/docs/features/transaction-mode.md index b874379..9ac3628 100644 --- a/docs/features/transaction-mode.md +++ b/docs/features/transaction-mode.md @@ -82,7 +82,7 @@ PgDog is able to detect advisory lock usage and will pin the server connection t !!! note "Performance" If multiple clients use advisory locks and don't release them quickly, the effectiveness of transaction pooling will be reduced because server connections will not be effectively re-used between client transactions. -### Limitions +### Limitations PgDog doesn't keep track of multiple advisory locks inside client connections. If a client acquires two different locks, for example, and only releases one, the server connection will still be returned back to the pool with the acquired lock. From d9087aebd143967df22aa7d5115be34a094535c6 Mon Sep 17 00:00:00 2001 From: Lev Kokotov Date: Fri, 14 Nov 2025 13:30:14 -0800 Subject: [PATCH 3/4] Spelling --- docs/configuration/pgdog.toml/rewrite.md | 4 ++-- docs/features/sharding/sharding-functions.md | 12 ++++++------ docs/features/transaction-mode.md | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/configuration/pgdog.toml/rewrite.md b/docs/configuration/pgdog.toml/rewrite.md index 52f1292..d5ea571 100644 --- a/docs/configuration/pgdog.toml/rewrite.md +++ b/docs/configuration/pgdog.toml/rewrite.md @@ -26,7 +26,7 @@ split_inserts = "error" ## Runtime overrides -The admin database exposes these toggles via `SET` command: +The admin database exposes these toggles via the `SET` command: ```postgresql SET rewrite_enabled TO true; -- mirrors [rewrite].enabled @@ -48,7 +48,7 @@ For example: UPDATE users SET id = 5 WHERE admin = true; ``` -On a single-shard deployment, this would raise a unique index violation error. On a cross-shard deployment, PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. +On a single-shard deployment, this would raise a unique index violation error. On a cross-shard deployment, the PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. ### Multi-tuple inserts diff --git a/docs/features/sharding/sharding-functions.md b/docs/features/sharding/sharding-functions.md index 01b4b79..0c09914 100644 --- a/docs/features/sharding/sharding-functions.md +++ b/docs/features/sharding/sharding-functions.md @@ -3,11 +3,11 @@ icon: material/function --- # Sharding functions -The sharding function inside PgDog transforms column values in SQL queries to specific shard numbers. They are in turn used for routing queries to one or more databases in the [configuration](../../configuration/pgdog.toml/databases.md). +The sharding function inside PgDog transforms column values in SQL queries to specific shard numbers, which are in turn used for routing queries to one or more databases in the [configuration](../../configuration/pgdog.toml/databases.md). ## How it works -PgDog sharding function is based on PostgreSQL declarative partitions. This choice is intentional: it allows data to be sharded both inside PgDog and inside PostgreSQL, with the use of the same partition functions. +The PgDog sharding function is based on PostgreSQL declarative partitions. This choice is intentional: it allows data to be sharded both inside PgDog and inside PostgreSQL, with the use of the same partition functions. PgDog supports all three PostgreSQL partition functions and uses them for sharding data between nodes: @@ -81,14 +81,14 @@ values = [1, 2, 3] shard = 0 ``` -This example will route all queries with `user_id` equals to one, two or three to shard zero. Unlike [hash](#hash) sharding, a value <-> shard mapping is required for _all_ values of the sharding key. If a value is used that doesn't have a mapping, the query will be sent to [all shards](cross-shard.md). +This example will route all queries with `user_id` equal to one, two, or three to shard zero. Unlike [hash](#hash) sharding, a value <-> shard mapping is required for _all_ values of the sharding key. If a value is used that doesn't have a mapping, the query will be sent to [all shards](cross-shard.md). !!! note "Required configuration" The `[[sharded_tables]]` configuration entry is still required for list and range sharding. It specifies the data type of the column, which tells PgDog how to parse its value at runtime. ## Range -Sharding by range function is similar to [list](#list) sharding function, except instead of specifying the values explicitly, you can specify a bounding range. All values which are included in the range will be sent to the specified shard, for example: +Sharding by range is similar to [list](#list) sharding, except instead of specifying the values explicitly, you can specify a bounding range. All values that are included in the range will be sent to the specified shard, for example: ```toml [[sharded_mappings]] @@ -182,11 +182,11 @@ This will send all queries that don't specify a schema or use a schema without a ## Rewrite behavior -PgDog can transparently move writes between shards when [`rewrite`](../../configuration/pgdog.toml/rewrite.md) feature is enabled. +PgDog can transparently move writes between shards when the [`rewrite`](../../configuration/pgdog.toml/rewrite.md) feature is enabled. ### Sharding key updates -Sharding key updates handles the situation when a query is changing the value of the sharding key, which could require the row to be moved to a different shard. +Sharding key updates handle the situation when a query is changing the value of the sharding key, which could require the row to be moved to a different shard. For example: diff --git a/docs/features/transaction-mode.md b/docs/features/transaction-mode.md index 9ac3628..62ac04d 100644 --- a/docs/features/transaction-mode.md +++ b/docs/features/transaction-mode.md @@ -57,7 +57,7 @@ This is performed efficiently, and server parameters are updated only if they di 1. The database has a primary and replica(s) 2. The database has more than one shard - 3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) setting is set to `"full"` + 3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) is set to `"full"` 4. [`query_parser_enabled`](../configuration/pgdog.toml/general.md#query_parser_enabled) is set to `true` This is to avoid unnecessary overhead of using `pg_query` (however small), when we don't absolutely have to. @@ -74,7 +74,7 @@ SELECT pg_advisory_lock(1234); In transaction mode, server connections are re-used between clients, so additional care needs to be taken to keep the server connection tied to the client that created the lock. -PgDog is able to detect advisory lock usage and will pin the server connection to the client connection until one of the following conditions are met: +PgDog is able to detect advisory lock usage and will pin the server connection to the client connection until one of the following conditions is met: 1. The client releases the lock with `pg_advisory_unlock` 2. The client disconnects From aac5c1eb3c39816de676986a278cd95cf525ab36 Mon Sep 17 00:00:00 2001 From: Lev Kokotov Date: Fri, 14 Nov 2025 13:35:17 -0800 Subject: [PATCH 4/4] Fix SQL example --- docs/features/sharding/sharding-functions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/features/sharding/sharding-functions.md b/docs/features/sharding/sharding-functions.md index 0c09914..00af225 100644 --- a/docs/features/sharding/sharding-functions.md +++ b/docs/features/sharding/sharding-functions.md @@ -197,9 +197,9 @@ UPDATE users SET id = $2 WHERE id = $1; If configured, PgDog can rewrite this query into three statements, executed inside a cross-shard transaction: ```postgresql -SELECT * FROM users WHERE id = $2; /* query 1 */ -DELETE FROM users WHERE id = $2; /* query 2 */ -INSERT INTO users VALUES (/* row fetched in query #1 */); /* query 3 */ +SELECT * FROM users WHERE id = $1; /* query 1 */ +DELETE FROM users WHERE id = $1; /* query 2 */ +INSERT INTO users VALUES ($1, $2); /* row fetched in query #1 */ ``` #### Limitations