This repository was archived by the owner on Sep 11, 2024. It is now read-only.
README.md: 12 additions & 1 deletion
@@ -116,9 +116,19 @@ enabled).
### Record grouping
Incoming records are grouped until flushed.
+The connector flushes grouped records into one file per `offset.flush.interval.ms` interval for each partition that has received new messages during that interval. The setting defaults to 60 seconds.
+
+Record grouping, similar to Kafka topics, has 2 modes:
+
+- Changelog: The connector groups all records in the order received from a Kafka topic and stores all of them in a file.
+- Compact: The connector groups all records by an identity (e.g. the key) and stores only the latest value for each identity in a file.
+
+Modes are defined implicitly by the fields used in the [file name template](#file-name-format).
#### Grouping by the topic and partition
+*Mode: Changelog*
+
In this mode, the connector groups records by the topic and partition.
When a file is written, the offset of its first record is added to
its name.
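
As a rough illustration of the two modes listed above, the sketch below groups an in-memory batch of records the way each mode is described. It is not the connector's implementation: the record dictionaries, the function names `group_changelog` and `group_compact`, and the tuple used to identify a file are all assumptions made for this example.

```python
def group_changelog(records):
    """Changelog mode sketch: group by (topic, partition), keep every record.

    Each group is identified by (topic, partition, offset of its first
    record), mirroring the description that the first offset ends up in
    the file name.
    """
    by_partition = {}
    for rec in records:
        # dicts preserve insertion order, so arrival order is kept per group
        by_partition.setdefault((rec["topic"], rec["partition"]), []).append(rec)
    return {
        (topic, partition, recs[0]["offset"]): recs
        for (topic, partition), recs in by_partition.items()
    }


def group_compact(records):
    """Compact mode sketch: keep only the latest record seen for each key."""
    latest = {}
    for rec in records:
        latest[rec["key"]] = rec  # a later record with the same key replaces the earlier one
    return latest
```

With three records where two share a key and a partition, `group_changelog` yields one group per partition containing every record, while `group_compact` yields one record per key.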
@@ -153,6 +163,8 @@ In this case, there will be two files `topicA-part0-off0` and
#### Grouping by the key
+*Mode: Compact*
+
In this mode, the connector groups records by the Kafka key. It always
puts one record in a file, the latest record that arrived before a flush
for each key. Also, it overwrites files if later new records with the
@@ -223,7 +235,6 @@ Connector class name, in this case: `io.aiven.kafka.connect.s3.AivenKafkaConnect
### S3 Object Names
The S3 connector stores a series of files in the specified bucket. Each object is named using the pattern `[<aws.s3.prefix>]<topic>-<partition>-<startoffset>[.gz]`. The `.gz` extension is used if gzip compression is enabled; see `file.compression.type` below.
-The connector creates one file per Apache Kafka Connect `offset.flush.interval.ms` setting for partitions that have received new messages during that period. The setting defaults to 60 seconds.
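
The object naming pattern `[<aws.s3.prefix>]<topic>-<partition>-<startoffset>[.gz]` can be read as a simple string template. The helper below is a hypothetical sketch of that reading, not code from the connector: the function name, its parameters, and the assumption that the offset is written without padding are all illustrative.

```python
def s3_object_name(topic, partition, start_offset, prefix="", gzip=False):
    """Build an object name following the documented pattern.

    Assumptions for this sketch: `prefix` stands in for `aws.s3.prefix`,
    `gzip` stands in for gzip compression being enabled, and the offset
    is rendered as a plain integer.
    """
    name = f"{prefix}{topic}-{partition}-{start_offset}"
    # The .gz extension is appended only when gzip compression is used.
    return name + ".gz" if gzip else name
```

For example, `s3_object_name("topicA", 0, 12, prefix="logs/", gzip=True)` evaluates to `logs/topicA-0-12.gz` under these assumptions.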