diff --git a/docs/configuration/pgdog.toml/rewrite.md b/docs/configuration/pgdog.toml/rewrite.md index 4d63942..d5ea571 100644 --- a/docs/configuration/pgdog.toml/rewrite.md +++ b/docs/configuration/pgdog.toml/rewrite.md @@ -17,16 +17,16 @@ split_inserts = "error" | Field | Description | Default | | --- | --- | --- | -| `enabled` | Master toggle; when `false`, PgDog parses but never applies rewrite plans. | `false` | -| `shard_key` | Behaviour when an `UPDATE` changes a sharding key.
`error` rejects the statement.
`rewrite` migrates the row between shards.
`ignore` forwards it unchanged. | `"error"` | -| `split_inserts` | Behaviour when a sharded table receives a multi-row `INSERT`.
`error` rejects the statement.
`rewrite` fans the rows out to their shards.
`ignore` forwards it unchanged. | `"error"` | +| `enabled` | Master toggle: when `false`, PgDog parses but never applies rewrite plans. | `false` | +| `shard_key` | Behavior when an `UPDATE` changes a sharding key: `error` rejects the statement,
`rewrite` migrates the row between shards,
`ignore` forwards it unchanged. | `"error"` | +| `split_inserts` | Behavior when a sharded table receives a multi-row `INSERT`: `error` rejects the statement, `rewrite` fans the rows out to their shards, `ignore` forwards it unchanged. | `"error"` | !!! note "Two-phase commit" - PgDog recommends enabling [`general.two_phase_commit`](general.md#two_phase_commit) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails. + PgDog recommends enabling [two-phase commit](../../features/sharding/2pc.md) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails. ## Runtime overrides -The admin database exposes these toggles via `SET`: +The admin database exposes these toggles via the `SET` command: ```postgresql SET rewrite_enabled TO true; -- mirrors [rewrite].enabled @@ -34,12 +34,39 @@ SET rewrite_shard_key_updates TO rewrite; -- error | rewrite | ignore SET rewrite_split_inserts TO rewrite; -- error | rewrite | ignore ``` -Switches apply to subsequent sessions once the cluster reloads configuration. Session-level overrides allow canary testing before persisting them in `pgdog.toml`. +The setting changes are applied immediately. These overrides allow canary testing before persisting them in `pgdog.toml`. ## Limitations -* Shard-key rewrites require the `WHERE` clause to resolve to a single row; otherwise PgDog rolls back and raises `rewrite.shard_key="rewrite" is not yet supported ...`. -* Split INSERT rewrites must run outside explicit transactions so PgDog can orchestrate per-shard `BEGIN`/`COMMIT` cycles. Inside a transaction PgDog returns `25001` and leaves the client transaction intact. -* Both features fall back to `error` semantics while `rewrite.enabled = false` or when PgDog cannot determine a target shard. +### Sharding key updates -See [feature docs](../../features/sharding/sharding-functions.md#rewrite-behaviour) for walkthroughs of these flows. +Sharding key rewrites in an `UPDATE` clause have to resolve to a single row. If the sharding key isn't unique or the `WHERE` clause has an incorrect `OR` condition, for example, PgDog will rollback the transaction and raise an error. + +For example: + +```postgresql +UPDATE users SET id = 5 WHERE admin = true; +``` + +On a single-shard deployment, this would raise a unique index violation error. On a cross-shard deployment, the PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows. + +### Multi-tuple inserts + +`INSERT` statements with multiple tuples have to be executed outside of an explicit transaction. PgDog needs to start a cross-shard transaction to safely commit the rows to multiple shards, and an existing transaction will interfere with its internal state. + +For example: + +```postgresql +BEGIN; +INSERT INTO users VALUES ($1, $2), ($3, $4); +``` + +This scenario will raise an error (code `25001`). + +### Default behavior + +Both split inserts and sharding key updates fallback to raising an error if `enabled` is set to `false`. + +### Read more + +- [Rewrite behavior](../../features/sharding/sharding-functions.md#rewrite-behavior) diff --git a/docs/features/sharding/sharding-functions.md b/docs/features/sharding/sharding-functions.md index 568c598..00af225 100644 --- a/docs/features/sharding/sharding-functions.md +++ b/docs/features/sharding/sharding-functions.md @@ -3,11 +3,11 @@ icon: material/function --- # Sharding functions -The sharding function inside PgDog transforms column values in SQL queries to specific shard numbers. They are in turn used for routing queries to one or more databases in the [configuration](../../configuration/pgdog.toml/databases.md). +The sharding function inside PgDog transforms column values in SQL queries to specific shard numbers, which are in turn used for routing queries to one or more databases in the [configuration](../../configuration/pgdog.toml/databases.md). ## How it works -PgDog sharding function is based on PostgreSQL declarative partitions. This choice is intentional: it allows data to be sharded both inside PgDog and inside PostgreSQL, with the use of the same partition functions. +The PgDog sharding function is based on PostgreSQL declarative partitions. This choice is intentional: it allows data to be sharded both inside PgDog and inside PostgreSQL, with the use of the same partition functions. PgDog supports all three PostgreSQL partition functions and uses them for sharding data between nodes: @@ -81,14 +81,14 @@ values = [1, 2, 3] shard = 0 ``` -This example will route all queries with `user_id` equals to one, two or three to shard zero. Unlike [hash](#hash) sharding, a value <-> shard mapping is required for _all_ values of the sharding key. If a value is used that doesn't have a mapping, the query will be sent to [all shards](cross-shard.md). +This example will route all queries with `user_id` equal to one, two, or three to shard zero. Unlike [hash](#hash) sharding, a value <-> shard mapping is required for _all_ values of the sharding key. If a value is used that doesn't have a mapping, the query will be sent to [all shards](cross-shard.md). !!! note "Required configuration" The `[[sharded_tables]]` configuration entry is still required for list and range sharding. It specifies the data type of the column, which tells PgDog how to parse its value at runtime. ## Range -Sharding by range function is similar to [list](#list) sharding function, except instead of specifying the values explicitly, you can specify a bounding range. All values which are included in the range will be sent to the specified shard, for example: +Sharding by range is similar to [list](#list) sharding, except instead of specifying the values explicitly, you can specify a bounding range. All values that are included in the range will be sent to the specified shard, for example: ```toml [[sharded_mappings]] @@ -180,14 +180,65 @@ shard = 0 This will send all queries that don't specify a schema or use a schema without a mapping to shard zero. -## Rewrite behaviour +## Rewrite behavior -PgDog can transparently move writes between shards when [`rewrite`](../../configuration/pgdog.toml/rewrite.md) is enabled. +PgDog can transparently move writes between shards when the [`rewrite`](../../configuration/pgdog.toml/rewrite.md) feature is enabled. -* **Shard-key updates** (`rewrite.shard_key = "rewrite"`) delete the matching row from its current shard and re-insert it on the shard implied by the new key. Exactly one row must match the `WHERE` clause; PgDog aborts rewrites that affect multiple rows or unresolved shards. -* **Split INSERTs** (`rewrite.split_inserts = "rewrite"`) decompose multi-row `INSERT` statements so each shard receives only the rows it owns. PgDog opens per-shard transactions and can escalate to two-phase commit when configured to preserve atomicity across shards. +### Sharding key updates -Both features require `rewrite.enabled = true`, operate only on sharded tables, and fall back to returning errors when PgDog cannot determine a safe rewrite plan. Running them alongside [`general.two_phase_commit`](../../configuration/pgdog.toml/general.md#two_phase_commit) is recommended to guarantee atomic outcomes. +Sharding key updates handle the situation when a query is changing the value of the sharding key, which could require the row to be moved to a different shard. + +For example: + +```postgresql +UPDATE users SET id = $2 WHERE id = $1; +``` + +If configured, PgDog can rewrite this query into three statements, executed inside a cross-shard transaction: + +```postgresql +SELECT * FROM users WHERE id = $1; /* query 1 */ +DELETE FROM users WHERE id = $1; /* query 2 */ +INSERT INTO users VALUES ($1, $2); /* row fetched in query #1 */ +``` + +#### Limitations + +The row returned by _query 1_, constructed from the `WHERE` clause of the original `UPDATE` statement, has to match exactly one row. If that's not the case, the operation will be aborted and PgDog will raise an error. + +### Multi-tuple inserts + +Multi-tuple `INSERT` statements may write rows that belong on separate shards. To handle this situation, if configured, PgDog can rewrite the statement into multiple, single-tuple statements, and send them to their respective shards in a cross-shard transaction. + +For example: + +```postgresql +INSERT INTO users (id, email) VALUES ($1, $2), ($3, $4); +``` + +This statement will be rewritten into the following two queries: + +```postgresql +INSERT INTO users (id, email) VALUES ($1, $2); +INSERT INTO users (id, email) VALUES ($3, $4); +``` + +#### Limitations + +Since PgDog starts a cross-shard transaction to make this operation atomic, the original `INSERT` statement must not be sent inside an explicit transaction by the client. If that's the case, PgDog will abort the operation and return an error. + +### Configuration + +Both features require the `enabled` flag to be set to `true`, for example: + +```toml +[rewrite] +enabled = true +split_inserts = "rewrite" +shard_key = "rewrite" +``` + +If a safe rewrite plan cannot be determined, PgDog will abort the transaction and return an error. To guarantee cross-shard atomicity of the operation, consider enabling [two-phase commit](2pc). ## Read more diff --git a/docs/features/transaction-mode.md b/docs/features/transaction-mode.md index 105198e..62ac04d 100644 --- a/docs/features/transaction-mode.md +++ b/docs/features/transaction-mode.md @@ -57,10 +57,36 @@ This is performed efficiently, and server parameters are updated only if they di 1. The database has a primary and replica(s) 2. The database has more than one shard - 3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) setting is set to `"full"` + 3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) is set to `"full"` + 4. [`query_parser_enabled`](../configuration/pgdog.toml/general.md#query_parser_enabled) is set to `true` This is to avoid unnecessary overhead of using `pg_query` (however small), when we don't absolutely have to. +## Advisory locks + +Advisory locks are an implementation of distributed locking in PostgreSQL. They are set on the server connection and released when the client removes the lock or disconnects. + +For example: + +```postgresql +SELECT pg_advisory_lock(1234); +``` + +In transaction mode, server connections are re-used between clients, so additional care needs to be taken to keep the server connection tied to the client that created the lock. + +PgDog is able to detect advisory lock usage and will pin the server connection to the client connection until one of the following conditions is met: + +1. The client releases the lock with `pg_advisory_unlock` +2. The client disconnects + +!!! note "Performance" + If multiple clients use advisory locks and don't release them quickly, the effectiveness of transaction pooling will be reduced because server connections will not be effectively re-used between client transactions. + +### Limitations + +PgDog doesn't keep track of multiple advisory locks inside client connections. If a client acquires two different locks, for example, and only releases one, the server connection will still be returned back to the pool with the acquired lock. + + ### Connection parameters Most Postgres connection drivers support passing parameters in the connection URL. Using the special `options` setting, each parameter is set using the `-c` flag, for example: