- [Identifying Snapshot vs CDC Messages](#identifying-snapshot-vs-cdc-messages)
- [Usage](#usage)
- [Examples](#examples)
- [Availability](#availability)
- [Configuration](#configuration)
- [API](#api)
- [Exposed Metrics](#exposed-metrics)
- [Snapshot Metrics](#snapshot-metrics)
- [Compatibility](#compatibility)
- [Elasticsearch Version Compatibility](#elasticsearch-version-compatibility)
- [Breaking Changes](#breaking-changes)
## 📸 Snapshot Feature

**Capture existing data before starting CDC!** The snapshot feature enables initial data synchronization, ensuring Elasticsearch receives both historical and real-time data.

✨ **Key Highlights:**

- **Zero Data Loss**: Consistent point-in-time snapshot using PostgreSQL's `pg_export_snapshot()`
- **Chunk-Based Processing**: Memory-efficient processing of large tables
- **Multi-Instance Support**: Parallel processing across multiple instances for faster snapshots
- **Crash Recovery**: Automatic resume from failures with chunk-level tracking
- **No Duplicates**: Seamless transition from snapshot to CDC mode
- **Flexible Modes**: Choose between `initial`, `never`, or `snapshot_only` based on your needs
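
The three modes differ only in whether a snapshot is taken and whether CDC starts afterwards, as described for `cdc.snapshot.mode` in the configuration table. The sketch below restates that decision in Go. It is an illustration only: `Plan` and `planFor` are hypothetical names, not part of the library's API, and the `previousSnapshotExists` flag stands in for however the connector actually tracks a completed snapshot.

```go
package main

import "fmt"

// Plan describes what a single run does. Hypothetical type for illustration.
type Plan struct {
	TakeSnapshot bool
	StartCDC     bool
}

// planFor maps a snapshot mode to the documented behavior.
// It is a hypothetical helper, not a function exported by the library.
func planFor(mode string, previousSnapshotExists bool) (Plan, error) {
	switch mode {
	case "initial":
		// Snapshot only if none exists yet, then continue with CDC.
		return Plan{TakeSnapshot: !previousSnapshotExists, StartCDC: true}, nil
	case "never":
		// Skip the snapshot and start CDC immediately.
		return Plan{TakeSnapshot: false, StartCDC: true}, nil
	case "snapshot_only":
		// Take the snapshot and exit without starting CDC.
		return Plan{TakeSnapshot: true, StartCDC: false}, nil
	default:
		return Plan{}, fmt.Errorf("unknown snapshot mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"initial", "never", "snapshot_only"} {
		p, err := planFor(mode, false)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-13s snapshot=%t cdc=%t\n", mode, p.TakeSnapshot, p.StartCDC)
	}
}
```

In this reading, `initial` is the only mode whose behavior depends on earlier runs: a restart after a completed snapshot goes straight to CDC.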
@@ -162,6 +224,13 @@ This setup ensures continuous data synchronization and minimal downtime in captu
| `cdc.slot.createIfNotExists` | bool | no | - | Create the replication slot if it does not exist. Otherwise, return a `replication slot is not exists` error. | |
| `cdc.slot.name` | string | yes | - | Set the logical replication slot name | Should be unique and descriptive. |
| `cdc.slot.slotActivityCheckerInterval` | int | yes | 1000 | Set the slot activity check interval time in milliseconds | Specify as an integer value in milliseconds (e.g., `1000` for 1 second). |
| `cdc.snapshot.enabled` | bool | no | false | Enable the initial snapshot feature | When enabled, captures existing data before starting CDC. |
| `cdc.snapshot.mode` | string | no | never | Snapshot mode: `initial`, `never`, or `snapshot_only` | **initial:** Take snapshot only if no previous snapshot exists, then start CDC. <br> **never:** Skip snapshot, start CDC immediately. <br> **snapshot_only:** Take snapshot and exit (no CDC). |
| `cdc.snapshot.chunkSize` | int64 | no | 8000 | Number of rows per chunk during snapshot | Adjust based on table size. Larger chunks mean fewer chunks but more memory per chunk. |
| `cdc.snapshot.claimTimeout` | time.Duration | no | 30s | Timeout to reclaim stale chunks | If a worker sends no heartbeat for this duration, its chunk is reclaimed by another worker. |
| `cdc.snapshot.heartbeatInterval` | time.Duration | no | 5s | Interval for worker heartbeat updates | Workers send a heartbeat at this interval to indicate they are still processing a chunk. |
| `cdc.snapshot.instanceId` | string | no | auto | Custom instance identifier (optional) | Auto-generated as `hostname-pid` if not specified. Useful for tracking workers in multi-instance scenarios. |
| `cdc.snapshot.tables` | []Table | no* | - | Tables to snapshot (required for `snapshot_only` mode, optional for `initial` mode) | **snapshot_only:** Must be specified here (independent from publication). <br> **initial:** If specified, must be a subset of publication tables. If not specified, all publication tables are snapshotted. |
| `elasticsearch.username` | string | no (yes, if the auth enabled) | - | The username for authenticating to Elasticsearch. | Maps table names to Elasticsearch indices. |
| `elasticsearch.password` | string | no (yes, if the auth enabled) | - | The password associated with the elasticsearch.username for authenticating to Elasticsearch. | Maps table names to Elasticsearch indices. |
| `elasticsearch.tableIndexMapping` | map[string]string | yes | - | Mapping of PostgreSQL table events to Elasticsearch indices | Maps table names to Elasticsearch indices. |
@@ -198,6 +267,17 @@ the `/metrics` endpoint.
| go_pq_cdc_elasticsearch_index_total | Total number of index operations. | slot_name, host, index_name | Counter |
| go_pq_cdc_elasticsearch_delete_total | Total number of delete operations. | slot_name, host, index_name | Counter |

### Snapshot Metrics

| Metric Name | Description | Labels | Value Type |