Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 124 additions & 0 deletions docs/features/sharding/unique-ids.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
---
icon: material/identifier
---
# Unique IDs

!!! note "Experimental feature"
This feature is new and experimental. Please [report](https://github.com/pgdogdev/pgdog/issues) any issues you encounter.

To generate unique identifiers, regular PostgreSQL databases use [sequences](https://www.postgresql.org/docs/current/sql-createsequence.html). For example, `BIGSERIAL` and `SERIAL` columns get their values by calling:

```
SELECT nextval('sequence_name');
```

This guarantees that these columns contain unique and monotonically increasing integers.

<!--The `BIGSERIAL` data type is used to identify rows in ORMs like ActiveRecord (Rails) and Django, so making them work in sharded databases is pretty important.-->

If your database is sharded however, using sequences will create identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.

## How it works

The unique ID algorithm implemented by PgDog is based on three inputs:

- Current system time in milliseconds
- Unique identifier for the PgDog node (e.g. `hostname`)
- An internal sequence

The unique node identifier ensures that two different instances of PgDog can't produce the same ID at the same time. Additionally, the internal sequence allows for submillisecond ID creation in very busy deployments.

Once configured, you can fetch unique IDs using a standard SQL command:

=== "Command"
```postgresql
SHOW pgdog.unique_id;
```
=== "Output"
```
unique_id
----------------
29888761298944

```


### Configuration

#### Node identifier

To make IDs _globally_ unique, a different node identifier is required for each instance in a PgDog deployment.

If you're using our [Helm chart](https://github.com/pgdogdev/helm), this is taken care of automatically when deploying it as a [`StatefulSet`](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) resource:

```yaml
statefulSet:
enabled: true
```

Otherwise, you need to ensure each PgDog instance has a different **`NODE_ID`** environment variable configured at startup. The variable can contain anything, as long as it ends with `-<number>` (hyphen and a number).

For example, if you have a three node deployment, they could be identified as follows:

=== "Node 1"
```bash
export NODE_ID=pgdog-prod-0
```
=== "Node 2"
```bash
export NODE_ID=pgdog-prod-1
```
=== "Node 3"
```bash
export NODE_ID=pgdog-prod-2
```

When configured correctly, you're able to get each node's identifier by querying the [admin](../../administration/index.md) database, for example:

=== "Command"
```
SHOW INSTANCE_ID;
```
=== "Output"
```
instance_id
----------------
pgdog-prod-0
```

!!! note "Maximum number of nodes"
Due to how the ID generation algorithm is implemented, PgDog allows up to a maximum
of **1024** instances (starting at 0) in the same deployment.


#### Minimum ID

If you're migrating data from an existing database, you can ensure that all IDs generated by PgDog start at a minimum value. This is configurable in [`pgdog.toml`](../../configuration/pgdog.toml/general.md), like so:

```toml
[general]
unique_id_min = 5_000_000
```

When set, all generated IDs are guaranteed to be larger than this value.

## Limitations

The generated unique IDs are 64-bit signed integers, matching the `BIGINT` (and `BIGSERIAL`) PostgreSQL format. However, since they are time-based, subsequently generated IDs will have gaps, for example:

```
678973936041
678944576104
678948770152
```

This is normally not an issue, since PostgreSQL sequences are not guaranteed to be gap-free either, but this is something to be aware of for applications that attempt to detect rolled back transactions.

Additionally, because PgDog reserves only 41 bits for the timestamp portion of the identifier, the IDs have a maximum value. Currently, the available
ID range is **69.73 years**, set to overflow on **August 3, 2095**. We expect databases to use 128-bit integers by then, expanding the ID range almost indefinitely.

### Generation rate

Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently **4,096** IDs per millisecond.

When it's reached, PgDog will pause ID generation until the clock ticks to the next millisecond. This gives it an effective ID generation rate of _4,096,000 / second / node_, which should be sufficient for most deployments.
Loading