Create a ClickHouse sink connector for Aiven for Apache Kafka®
The ClickHouse sink connector delivers data from Apache Kafka® topics to a ClickHouse database for efficient querying and analysis.
Prerequisites
Before you begin, ensure that you have the following:
- An Aiven for Apache Kafka® service with Apache Kafka Connect enabled or a dedicated Aiven for Apache Kafka Connect® service.
- Access to a ClickHouse service (either Aiven for ClickHouse or an external instance),
including:
- Hostname, port, and credentials for the ClickHouse service.
- A pre-created target database and table.
Limitations
The ClickHouse sink connector has the following limitations related to data consistency and exactly-once delivery:
-
No exactly-once delivery after restore: The connector does not guarantee exactly-once delivery after the ClickHouse service is restored from a backup, powered off, or forked. However, at least once delivery is guaranteed, which may result in duplicate records in ClickHouse.
-
Manual removal of duplicate records: If duplicates occur, manually remove them to maintain data consistency in ClickHouse. For detailed instructions, see Remove duplicate records.
Remove duplicate records
Ensure all potential duplicates are processed before removing them:
-
Verify the committed offset in Aiven for Apache Kafka:
- Access the Aiven Console and select your Aiven for Apache Kafka service.
- Click Topics and select the topic used by the connector.
- Go to the Consumer Group tab and check the Offset column for the committed offset.
-
Verify the committed offset in ClickHouse:
-
In the ClickHouse service, access the query editor.
-
If you are using Aiven for ClickHouse®, go to the service's Overview page, and click Query editor.
-
Run the following query to get offset details:
SELECT key, minOffset, maxOffset, state FROM connect_state;
-
-
Confirm the following conditions from the query result:
- The
state
column is set toAFTER_PROCESSING
. - The
minOffset
andmaxOffset
columns have the same value. - The committed offset from Apache Kafka is equal to or greater than the
minOffset
value in ClickHouse.
- The
-
Remove duplicate records:
After confirming these conditions, remove any duplicate records by running the following SQL command in ClickHouse:
OPTIMIZE TABLE table_name DEDUPLICATE;
Create a ClickHouse sink connector configuration file
Create a file named clickhouse_sink_connector.json
with the following configuration:
{
"name": "clickhouse_sink_connector",
"connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
"tasks.max": "1",
"topics": "test_topic",
"hostname": "my-clickhouse-hostname",
"port": "12345",
"database": "default",
"username": "avnadmin",
"password": "mypassword",
"ssl": "true",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
}
Parameters
name
: Name of the connector.topics
: Apache Kafka topics from which to pull data.hostname
: Hostname of the ClickHouse serviceport
: Port of the ClickHouse service.database
: Target database in ClickHouse.username
: Username for authentication in the ClickHouse service.password
: Password for authentication in the ClickHouse service.ssl
: Set totrue
to enable SSL encryption.
For more configuration options, see the ClickHouse sink connector GitHub repository.
Create the connector
- Aiven Console
- Aiven CLI
-
Access the Aiven Console.
-
Select your Aiven for Apache Kafka® or Aiven for Apache Kafka Connect® service.
-
Click Connectors.
-
Click Create connector if Apache Kafka Connect is already enabled on the service. If not, click Enable connector on this service.
Alternatively, to enable connectors:
- Click Service settings in the sidebar.
- In the Service management section, click Actions > Enable Kafka connect.
-
In the sink connectors list, select ClickHouse, and click Get started.
-
On the ClickHouse connector page, go to the Common tab.
-
Locate the Connector configuration text box and click Edit.
-
Paste the configuration from your
clickhouse_sink_connector.json
file into the text box. -
Click Create connector.
-
Verify the connector status on the Connectors page.
To create the ClickHouse sink connector using the Aiven CLI, run:
avn service connector create SERVICE_NAME @clickhouse_sink_connector.json
Parameters:
SERVICE_NAME
: Name of your Aiven for Apache Kafka® service.@clickhouse_sink_connector.json
: Path to the JSON configuration file.
Example: Define and create a ClickHouse sink connector
This example shows how to create a ClickHouse sink connector with the following properties:
- Connector name:
clickhouse_sink_connector
- Apache Kafka topic:
test-topic
- ClickHouse hostname:
clickhouse-31d766f9-systest-project.avns.net
- ClickHouse port:
14420
- Target database:
default
- Username:
avnadmin
- Password:
mypassword
- SSL:
true
{
"name": "clickhouse_sink_connector",
"connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
"tasks.max": "1",
"topics": "test-topic",
"hostname": "clickhouse-31d766f9-systest-project.avns.net",
"port": "14420",
"database": "default",
"username": "avnadmin",
"password": "mypassword",
"ssl": "true"
}
Once this configuration is saved in the clickhouse_sink_connector.json
file, you can
create the connector using the Aiven Console or CLI, and verify that data from the
Apache Kafka topic test-topic
is successfully delivered to your ClickHouse instance.