
Aiven for Apache Kafka® as a source for Aiven for ClickHouse®

Deploy an Aiven for Apache Kafka® node and integrate it as a data source with Aiven for ClickHouse® using Terraform. A part of Aiven's Terraform Cookbook.

This article shows by example how to integrate an Aiven for Apache Kafka® service with an Aiven for ClickHouse® database using the Aiven Terraform Provider. An Apache Kafka® topic serves as the data source, and Aiven for ClickHouse® filters or transforms the raw data with a materialized view before writing it to a regular table.

First, look at a sample of the sensor data to better understand this recipe and the clickhouse_kafka_user_config Terraform block used in this article.

{ "sensor_id": 10000001, "ts": "2022-12-01T10:08:24.446369", "key": "cpu_usage", "value": 96 }

Describe the setup

Imagine that you've been collecting IoT measurements from thousands of sensors and that these metrics are published to an Apache Kafka topic called iot_measurements. Now, you'd like to set up an Aiven for ClickHouse database and write filtered messages into a table called cpu_high_usage.

This recipe calls for the following:

  1. Set up an Aiven for ClickHouse database for writing and processing raw data. Sign up for an Aiven account to do so.
  2. Insert the measurements data from Apache Kafka topic iot_measurements into the Aiven for ClickHouse database.
  3. Filter the data and save the output to the new cpu_high_usage table.

Common files

Navigate to a new folder and add the following files:

  1. Provider file

    terraform {
      required_providers {
        aiven = {
          source  = "aiven/aiven"
          version = ">=4.0.0, < 5.0.0"
        }
      }
    }

    provider "aiven" {
      api_token = var.aiven_api_token
    }

You can set the TF_VAR_aiven_api_token environment variable to supply the api_token value, so that the token doesn't need to be passed with the -var-file flag when executing Terraform commands.

  2. Variables file

Use it for defining the variables to avoid including sensitive information in source control. The file declares the API token and the project name.

    variable "aiven_api_token" {
      description = "Aiven console API token"
      type        = string
    }

    variable "project_name" {
      description = "Aiven console project name"
      type        = string
    }

  3. *.tfvars file

Use it to indicate the actual values of the variables so that they can be passed (with the -var-file= flag) to Terraform during runtime and excluded later on. Configure the var-values.tfvars file as follows:
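
The values themselves depend on your account. A minimal sketch, assuming only the two variables declared above and using placeholder values rather than real credentials, might look like this:

```hcl
# var-values.tfvars
# Placeholder values for illustration only; replace them with your own.
aiven_api_token = "<YOUR-AIVEN-API-TOKEN>"
project_name    = "<YOUR-AIVEN-PROJECT-NAME>"
```

Because this file holds the API token, it is usually kept out of source control, for example by listing it in .gitignore.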


The following Terraform script initializes both Aiven for Apache Kafka and Aiven for ClickHouse services, creates the service integration, the source Apache Kafka topic, and the Aiven for ClickHouse database.

    # Aiven for Apache Kafka service
    resource "aiven_kafka" "kafka" {
      project                 = var.project_name
      cloud_name              = "google-europe-west1"
      plan                    = "business-4"
      service_name            = "kafka-gcp-eu"
      maintenance_window_dow  = "monday"
      maintenance_window_time = "10:00:00"
    }

    # Source topic that receives the sensor measurements
    resource "aiven_kafka_topic" "source" {
      project      = var.project_name
      service_name = aiven_kafka.kafka.service_name
      partitions   = 50
      replication  = 3
      topic_name   = "iot_measurements"
    }

    # Aiven for ClickHouse service
    resource "aiven_clickhouse" "clickhouse" {
      project                 = var.project_name
      cloud_name              = "google-europe-west1"
      plan                    = "startup-8"
      service_name            = "clickhouse-gcp-eu"
      maintenance_window_dow  = "monday"
      maintenance_window_time = "10:00:00"
    }

    # Integration that exposes the Kafka topic as a table in ClickHouse
    resource "aiven_service_integration" "clickhouse_kafka_source" {
      project                  = var.project_name
      integration_type         = "clickhouse_kafka"
      source_service_name      = aiven_kafka.kafka.service_name
      destination_service_name = aiven_clickhouse.clickhouse.service_name

      clickhouse_kafka_user_config {
        tables {
          name        = "edge_measurements_raw"
          group_name  = "clickhouse-ingestion"
          data_format = "JSONEachRow"

          columns {
            name = "sensor_id"
            type = "UInt64"
          }
          columns {
            name = "ts"
            type = "DateTime64(6)"
          }
          columns {
            name = "key"
            type = "LowCardinality(String)"
          }
          columns {
            name = "value"
            type = "Float64"
          }

          topics {
            name = aiven_kafka_topic.source.topic_name
          }
        }
      }
    }

    # Database for the transformed data
    resource "aiven_clickhouse_database" "measurements" {
      project      = var.project_name
      service_name = aiven_clickhouse.clickhouse.service_name
      name         = "iot_measurements"
    }

Execute the files

Run the following command:

terraform init

The init command performs initialization operations to prepare the working directory for use with Terraform. For this recipe, init automatically finds, downloads, and installs the necessary Aiven Terraform Provider plugins.

Run the following command:

terraform plan -var-file=var-values.tfvars

The plan command creates an execution plan and shows the resources to be created (or modified). This command doesn't actually create any resources but gives you a heads-up on what's going to happen next.

If the output of terraform plan looks as expected, run the following command:

terraform apply -var-file=var-values.tfvars

The terraform apply command creates (or modifies) your infrastructure resources.

Check out the results

  • Resource aiven_clickhouse creates an Aiven for ClickHouse service with the project name, the cloud name (provider, region, zone), the Aiven service plan, and the service name as specified in the file.
  • Resource aiven_clickhouse_database creates a database that can be used to further transform the ingested data and perform analytics on it.
  • Resource aiven_kafka creates an Aiven for Apache Kafka cluster.
  • Resource aiven_kafka_topic creates Apache Kafka topic iot_measurements.
  • Resource aiven_service_integration creates the integration between the Aiven for Apache Kafka and the Aiven for ClickHouse service.

The service integration also creates a database into which the ingested data is inserted. In this instance, the database name is service_kafka-gcp-eu (it is derived from the Kafka service name) and the table name is edge_measurements_raw, as specified in the code.
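
To surface these names after a run, one option is to add Terraform outputs to the configuration. The following is a sketch rather than part of the original recipe: the output names are hypothetical, and the first output simply assumes the service_<Kafka service name> convention described above.

```hcl
# Hypothetical outputs; the names are illustrative and not part of the original recipe.

# Database created by the integration, assuming the
# "service_<Kafka service name>" naming convention described above.
output "integration_database" {
  value = "service_${aiven_kafka.kafka.service_name}"
}

# Connection URI of the ClickHouse service. It contains credentials,
# so it is marked as sensitive.
output "clickhouse_service_uri" {
  value     = aiven_clickhouse.clickhouse.service_uri
  sensitive = true
}
```

With the service names used in this recipe, integration_database should resolve to service_kafka-gcp-eu once terraform apply has completed.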

Learn more

When you use this recipe, parameters and configurations will vary from those used in this article. For Aiven for Apache Kafka and Aiven for ClickHouse advanced parameters, a related blog, and instructions on how to get started with Aiven Terraform Provider, see Set up your first Aiven Terraform project.