S3 sink connector by Confluent naming and data formats
The Apache Kafka Connect® S3 sink connector moves data from Aiven for Apache Kafka® to Amazon S3 for long-term storage.
Aiven provides two S3 sink connectors:
- An Aiven-developed connector
- A Confluent-developed connector
This content applies to the Confluent version. For the Aiven-developed connector, see S3 sink connector additional parameters.
S3 naming format
The connector stores data as objects in the configured S3 bucket. By default, object names follow this pattern:
topics/<TOPIC_NAME>/partition=<PARTITION_NUMBER>/<TOPIC_NAME>+<PARTITION_NUMBER>+<START_OFFSET>.<FILE_EXTENSION>
The following placeholders define the pattern:
- TOPIC_NAME: The Kafka topic name written to Amazon S3.
- PARTITION_NUMBER: The Kafka topic partition number.
- START_OFFSET: The starting offset of the records in the file.
- FILE_EXTENSION: Depends on the configured serialization format. For example, binary serialization produces files with a .bin extension.
For example, a topic with three partitions initially generates one file per partition, each starting at offset 0, in the destination S3 bucket:
topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000000.bin
topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000000.bin
topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000000.bin
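The naming pattern above can be sketched in a few lines of Python. This is only an illustration, not the connector's code: the function name object_key and the sample topic name are hypothetical, and the 10-digit zero padding of the offset is inferred from the example file names on this page.

```python
def object_key(topic: str, partition: int, start_offset: int, extension: str = "bin") -> str:
    """Build an S3 object name following the default pattern described above.

    Assumption: the start offset is zero-padded to 10 digits, as in the
    example file names on this page.
    """
    return (
        f"topics/{topic}/partition={partition}/"
        f"{topic}+{partition}+{start_offset:010d}.{extension}"
    )


if __name__ == "__main__":
    # "orders" is a hypothetical topic name used only for illustration.
    for partition in range(3):
        print(object_key("orders", partition, 0))
```

Running the script prints one object name per partition, each with start offset 0, mirroring the three files listed above.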
S3 data format
By default, the connector stores data in binary format, with one message per line. The
connector creates a new file after a fixed number of messages, defined by the flush.size
parameter. Setting flush.size to 1 creates one file per message.
For example, for a topic with three partitions and ten messages, setting flush.size
to 1 produces the following files in the destination S3 bucket (one file per message):
topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000000.bin
topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000001.bin
topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000002.bin
topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000003.bin
topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000000.bin
topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000001.bin
topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000002.bin
topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000000.bin
topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000001.bin
topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000002.bin
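The effect of flush.size on the resulting object names can be approximated with a short Python sketch. It is a simplified model under stated assumptions, not the connector's implementation: it assumes offsets in each partition start at 0 with no gaps, that a file is written only once a full batch of flush.size messages has accumulated, and that no other rotation setting applies. The function names and the topic name are hypothetical.

```python
def file_start_offsets(message_count: int, flush_size: int) -> list[int]:
    """Return the start offset of each file written for one partition.

    Assumption: only complete batches of flush_size messages become files;
    any remaining messages stay buffered and are not counted here.
    """
    if flush_size < 1:
        raise ValueError("flush_size must be at least 1")
    full_files = message_count // flush_size
    return [index * flush_size for index in range(full_files)]


def expected_keys(topic: str, messages_per_partition: dict[int, int], flush_size: int) -> list[str]:
    """List the object names expected for every partition of a topic."""
    keys = []
    for partition in sorted(messages_per_partition):
        count = messages_per_partition[partition]
        for start in file_start_offsets(count, flush_size):
            # 10-digit zero padding is inferred from the example file names.
            keys.append(f"topics/{topic}/partition={partition}/{topic}+{partition}+{start:010d}.bin")
    return keys


if __name__ == "__main__":
    # Ten messages spread over three partitions, as in the example above.
    for key in expected_keys("orders", {0: 4, 1: 3, 2: 3}, flush_size=1):
        print(key)
```

With flush_size=1 the sketch lists ten object names, matching the layout shown above. Under the same assumptions, a larger value such as flush_size=3 would yield a single file per partition, with the fourth message of partition 0 still waiting for its batch to fill.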
For more information, see the Confluent S3 sink connector documentation.