Skip to main content

Create a ClickHouse sink connector for Aiven for Apache Kafka®

The ClickHouse sink connector delivers data from Apache Kafka® topics to a ClickHouse database for efficient querying and analysis.

Prerequisites

Before you begin, ensure that you have the following:

Limitations

The ClickHouse sink connector has the following limitations related to data consistency and exactly-once delivery:

  1. No exactly-once delivery after restore: The connector does not guarantee exactly-once delivery after the ClickHouse service is restored from a backup, powered off, or forked. However, at least once delivery is guaranteed, which may result in duplicate records in ClickHouse.

  2. Manual removal of duplicate records: If duplicates occur, manually remove them to maintain data consistency in ClickHouse. For detailed instructions, see Remove duplicate records.

Remove duplicate records

Ensure all potential duplicates are processed before removing them:

  1. Verify the committed offset in Aiven for Apache Kafka:

    1. Access the Aiven Console and select your Aiven for Apache Kafka service.
    2. Click Topics and select the topic used by the connector.
    3. Go to the Consumer Group tab and check the Offset column for the committed offset.
  2. Verify the committed offset in ClickHouse:

    1. In the ClickHouse service, access the query editor.

    2. If you are using Aiven for ClickHouse®, go to the service's Overview page, and click Query editor.

    3. Run the following query to get offset details:

      SELECT key, minOffset, maxOffset, state FROM connect_state;
  3. Confirm the following conditions from the query result:

    • The state column is set to AFTER_PROCESSING.
    • The minOffset and maxOffset columns have the same value.
    • The committed offset from Apache Kafka is equal to or greater than the minOffset value in ClickHouse.
  4. Remove duplicate records:

    After confirming these conditions, remove any duplicate records by running the following SQL command in ClickHouse:

    OPTIMIZE TABLE table_name DEDUPLICATE;

Create a ClickHouse sink connector configuration file

Create a file named clickhouse_sink_connector.json with the following configuration:

{
"name": "clickhouse_sink_connector",
"connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
"tasks.max": "1",
"topics": "test_topic",
"hostname": "my-clickhouse-hostname",
"port": "12345",
"database": "default",
"username": "avnadmin",
"password": "mypassword",
"ssl": "true",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
}

Parameters

  • name: Name of the connector.
  • topics: Apache Kafka topics from which to pull data.
  • hostname: Hostname of the ClickHouse service
  • port: Port of the ClickHouse service.
  • database: Target database in ClickHouse.
  • username: Username for authentication in the ClickHouse service.
  • password: Password for authentication in the ClickHouse service.
  • ssl: Set to true to enable SSL encryption.

For more configuration options, see the ClickHouse sink connector GitHub repository.

Create the connector

  1. Access the Aiven Console.

  2. Select your Aiven for Apache Kafka® or Aiven for Apache Kafka Connect® service.

  3. Click Connectors.

  4. Click Create connector if Apache Kafka Connect is already enabled on the service. If not, click Enable connector on this service.

    Alternatively, to enable connectors:

    1. Click Service settings in the sidebar.
    2. In the Service management section, click Actions > Enable Kafka connect.
  5. In the sink connectors list, select ClickHouse, and click Get started.

  6. On the ClickHouse connector page, go to the Common tab.

  7. Locate the Connector configuration text box and click Edit.

  8. Paste the configuration from your clickhouse_sink_connector.json file into the text box.

  9. Click Create connector.

  10. Verify the connector status on the Connectors page.

Example: Define and create a ClickHouse sink connector

This example shows how to create a ClickHouse sink connector with the following properties:

  • Connector name: clickhouse_sink_connector
  • Apache Kafka topic: test-topic
  • ClickHouse hostname: clickhouse-31d766f9-systest-project.avns.net
  • ClickHouse port: 14420
  • Target database: default
  • Username: avnadmin
  • Password: mypassword
  • SSL: true
{
"name": "clickhouse_sink_connector",
"connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
"tasks.max": "1",
"topics": "test-topic",
"hostname": "clickhouse-31d766f9-systest-project.avns.net",
"port": "14420",
"database": "default",
"username": "avnadmin",
"password": "mypassword",
"ssl": "true"
}

Once this configuration is saved in the clickhouse_sink_connector.json file, you can create the connector using the Aiven Console or CLI, and verify that data from the Apache Kafka topic test-topic is successfully delivered to your ClickHouse instance.