pg_retry

A PostgreSQL extension that provides retry functionality for SQL statements on transient errors with exponential backoff.

Overview

pg_retry allows you to automatically retry SQL statements that fail due to transient errors such as serialization failures, deadlocks, lock timeouts, or query cancellations. It implements exponential backoff with jitter to avoid thundering herd problems.

⚠️ Caution

This is an exploration library. If you still want to use it, you are welcome to.

I'm not expecting any contributions at this time, because this project is a personal exploration of PostgreSQL internals and extension development. However, comments and discussions are welcome.

This library intentionally violates a fundamental computer science principle: that engineers should be made explicitly aware of errors and failures. By automatically retrying failed operations, it abstracts away transient errors that might contain valuable debugging information or indicate deeper systemic issues.

Features

  • Automatic retries on configurable transient errors
  • Exponential backoff with configurable base and maximum delays
  • Jitter to prevent synchronized retries
  • Subtransaction isolation for each retry attempt
  • Comprehensive validation to prevent misuse
  • Configurable GUC parameters for defaults
  • Detailed logging of retry attempts

Installation

Prerequisites

  • PostgreSQL 17+
  • C compiler and build tools

Install from PGXN

Install the PGXN client if it is not already available on your system (choose whichever fits your environment):

pip install pgxnclient          # cross-platform
brew install pgxnclient         # macOS with Homebrew
apt-get install pgxnclient      # Debian/Ubuntu

Then install the published package directly from the PGXN registry:

pgxn install pg_retry

The distribution page with package metadata and downloadable artifacts is available at https://manager.pgxn.org/distributions/pg_retry/1.0.0.

Build and Install

# Clone or download the extension
cd pg_retry

# Build the extension
make

# Install (may require sudo)
make install

# Run tests (optional)
make installcheck

Enable the Extension

CREATE EXTENSION pg_retry;

Function Signature

retry.retry(
  sql TEXT,                          -- the SQL statement to run (exactly one statement)
  max_tries INT DEFAULT 3,           -- total attempts = 1 + retries; must be >= 1
  base_delay_ms INT DEFAULT 50,      -- initial backoff delay in milliseconds
  max_delay_ms INT DEFAULT 1000,     -- cap for exponential backoff
  retry_sqlstates TEXT[] DEFAULT ARRAY['40001','40P01','55P03','57014']
) RETURNS INT                       -- number of rows processed/returned by the statement

Retryable SQLSTATEs

By default, the following SQLSTATEs are considered retryable:

  • 40001: serialization_failure
  • 40P01: deadlock_detected
  • 55P03: lock_not_available
  • 57014: query_canceled (e.g., statement_timeout)

Usage Examples

Basic Usage

-- Simple retry with defaults
SELECT retry.retry('UPDATE accounts SET balance = balance - 100 WHERE id = 1');

Custom Retry Parameters

-- More aggressive retries for critical operations
SELECT retry.retry(
    'INSERT INTO audit_log (event, timestamp) VALUES (''login'', NOW())',
    5,        -- max_tries
    100,      -- base_delay_ms
    5000,     -- max_delay_ms
    ARRAY['40001', '40P01', '55P03', '57014', '53300']  -- defaults plus 53300 (too_many_connections)
);

Handling Different Statement Types

-- DML operations return affected rows
SELECT retry.retry('UPDATE users SET last_login = NOW() WHERE id = 123');

-- SELECT returns number of rows returned
SELECT retry.retry('SELECT * FROM large_table WHERE status = ''active''');

-- DDL/utility operations return 0
SELECT retry.retry('CREATE INDEX idx_big_table_status ON big_table (status)');

Configuration (GUC Parameters)

You can set default values using PostgreSQL GUC parameters:

-- Set global defaults
ALTER SYSTEM SET pg_retry.default_max_tries = 5;
ALTER SYSTEM SET pg_retry.default_base_delay_ms = 100;
ALTER SYSTEM SET pg_retry.default_max_delay_ms = 5000;
ALTER SYSTEM SET pg_retry.default_sqlstates = '40001,40P01,55P03,57014,53300';

-- Reload configuration
SELECT pg_reload_conf();
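
These defaults can also be overridden at narrower scope using standard PostgreSQL settings machinery. Whether per-session and per-database overrides are allowed depends on how pg_retry registers its GUCs, so treat the following as an assumption rather than documented behavior:

-- Hypothetical per-session override (assumes the GUCs are user-settable)
SET pg_retry.default_max_tries = 10;

-- Hypothetical per-database default (app_db is a placeholder name)
ALTER DATABASE app_db SET pg_retry.default_base_delay_ms = 200;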

Safety and Validation

The extension includes several safety checks:

Single Statement Only

Exactly one SQL statement is allowed per call:

-- This works
SELECT retry.retry('SELECT 42');

-- This fails
SELECT retry.retry('SELECT 1; SELECT 2');

No Transaction Control

Transaction control statements are prohibited:

-- These all fail
SELECT retry.retry('BEGIN; SELECT 1; COMMIT');
SELECT retry.retry('SAVEPOINT sp1; SELECT 1; RELEASE sp1');
SELECT retry.retry('ROLLBACK');

Parameter Validation

Input parameters are validated:

-- These fail
SELECT retry.retry('SELECT 1', 0);           -- max_tries < 1
SELECT retry.retry('SELECT 1', 3, -1);       -- negative delay
SELECT retry.retry('SELECT 1', 3, 1000, 500); -- base > max delay

Retry Behavior

Exponential Backoff Algorithm

delay = min(max_delay_ms, base_delay_ms * (2^(attempt-1)))
jitter = (2 * random() - 1) * (delay * 0.2)  -- ±20%
final_delay = max(1ms, delay + jitter)

Example Delays (base_delay_ms=50, max_delay_ms=1000)

  • Attempt 1: ~50ms ± 10ms
  • Attempt 2: ~100ms ± 20ms
  • Attempt 3: ~200ms ± 40ms
  • Attempt 4: ~400ms ± 80ms
  • Attempt 5: ~800ms ± 160ms
  • Attempt 6+: ~1000ms ± 200ms
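
This schedule can be reproduced directly in SQL as a sanity check. The query below is a standalone sketch of the documented formula (jitter omitted), not code from the extension itself:

-- Delay before each retry for base_delay_ms = 50, max_delay_ms = 1000
SELECT attempt,
       LEAST(1000, 50 * power(2, attempt - 1)) AS delay_ms
FROM generate_series(1, 6) AS attempt;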

Logging

Each retry attempt is logged as a WARNING:

WARNING: pg_retry: attempt 2/3 failed with SQLSTATE 40001: could not serialize access due to concurrent update
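
The WARNING level here is ordinary PostgreSQL message handling, so the usual settings apply. For example, a session that does not want to see retry noise can raise its client threshold (standard PostgreSQL behavior, not a pg_retry feature):

-- Hide retry warnings from this session's output; they may still reach the
-- server log depending on log_min_messages.
SET client_min_messages = error;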

Error Handling

  • Retryable errors: Automatically retried up to max_tries
  • Non-retryable errors: Immediately rethrown (see the example after this list)
  • Exhausted retries: Last error is rethrown with full context
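
For example, a unique-constraint violation (SQLSTATE 23505) is not in the default retry list, so it surfaces on the first attempt. The table below is purely illustrative:

-- Fails immediately with SQLSTATE 23505 if a row with id 1 already exists;
-- 23505 is not retryable by default, so no backoff or retry occurs.
SELECT retry.retry('INSERT INTO users (id) VALUES (1)');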

Performance Considerations

  • Each retry runs in a subtransaction
  • SPI overhead for statement execution
  • Exponential backoff prevents resource exhaustion
  • Jitter prevents thundering herd problems

Limitations

  • Only supports single SQL statements
  • No support for transaction control
  • Cannot retry certain operations (COPY FROM STDIN, large objects, cursors)
  • Function is marked VOLATILE and PARALLEL RESTRICTED

Testing

Run the regression tests:

make installcheck

System Tests

Use the system harness for concurrency, pgbench, and GUC coverage. This target spins up a disposable cluster, installs pg_retry, and runs the pytest suite in system_tests/:

make systemtest
# include pgbench-heavy suites
make systemtest SYSTEMTEST_PYTEST_FLAGS="--pgbench"
# replay captured workloads
make systemtest SYSTEMTEST_PYTEST_FLAGS="--pgreplay"
# run dangerous fault-injection suites (requires env vars, see docs)
make systemtest SYSTEMTEST_PYTEST_FLAGS="--faults"
# skip reinstall when CI already ran `make install`
make systemtest SYSTEMTEST_SKIP_INSTALL=1

See system_tests/README.md for details on prerequisites and the helper SQL.

The test suite includes basic sanity checks. Future enhancements will expand testing capabilities as PostgreSQL internals are explored further.

License

Copyright (c) 2025, Prince Roshan

This extension is released under the terms of the PostgreSQL License. See the LICENSE file for the full license text.

Contributing

While contributions are not expected for this exploration project, you are welcome to:

  • Open issues to discuss the approach or report bugs
  • Comment on design decisions and implementation details
  • Fork the repository for your own experiments
