
Configure the Iceberg sink connector with AWS Glue catalog

The AWS Glue catalog directly manages Iceberg metadata within AWS Glue. It supports automatic table creation and schema evolution.

Prerequisites

Configure AWS IAM permissions

The Iceberg sink connector requires an IAM user with permissions to access Amazon S3 and AWS Glue. These permissions allow the connector to write data to an S3 bucket and manage metadata in the AWS Glue catalog.

To set up the required permissions:

  1. Create an IAM user in AWS Identity and Access Management (IAM) with permissions for Amazon S3 and AWS Glue.

  2. Attach the following policy to the IAM user:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "S3Access",
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts"
          ],
          "Resource": [
            "arn:aws:s3:::<your-bucket-name>/*"
          ]
        },
        {
          "Sid": "S3ListBucket",
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": [
            "arn:aws:s3:::<your-bucket-name>"
          ]
        },
        {
          "Sid": "GlueAccess",
          "Effect": "Allow",
          "Action": [
            "glue:CreateDatabase",
            "glue:GetDatabase",
            "glue:GetTables",
            "glue:SearchTables",
            "glue:CreateTable",
            "glue:UpdateTable",
            "glue:GetTable",
            "glue:BatchCreatePartition",
            "glue:CreatePartition",
            "glue:UpdatePartition",
            "glue:GetPartition",
            "glue:GetPartitions"
          ],
          "Resource": [
            "arn:aws:glue:<your-aws-region>:<your-aws-account>:catalog",
            "arn:aws:glue:<your-aws-region>:<your-aws-account>:database/*",
            "arn:aws:glue:<your-aws-region>:<your-aws-account>:table/*"
          ]
        }
      ]
    }

    Replace the placeholder values in the policy:

    • <your-aws-region>: Your AWS Glue catalog’s region
    • <your-aws-account>: Your AWS account ID
    • <your-bucket-name>: The name of your Amazon S3 bucket

  3. Obtain the access key ID and secret access key for the IAM user.

  4. Add these credentials to the Iceberg sink connector configuration.

For more information on creating and managing AWS IAM users and policies, see the AWS IAM documentation.
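The three placeholders in the policy expand into five resource ARNs. The following minimal sketch shows that substitution; the helper name `policy_resource_arns` and the dictionary keys are hypothetical, chosen only for illustration, and are not part of any AWS SDK.

```python
def policy_resource_arns(bucket_name: str, region: str, account_id: str) -> dict:
    """Return the resource ARNs from the policy with the placeholders filled in."""
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        # S3 ARNs do not include a region or an account ID.
        "s3_objects": f"arn:aws:s3:::{bucket_name}/*",
        "s3_bucket": f"arn:aws:s3:::{bucket_name}",
        # Glue ARNs are scoped to a region and an account.
        "glue_catalog": f"{glue_prefix}:catalog",
        "glue_databases": f"{glue_prefix}:database/*",
        "glue_tables": f"{glue_prefix}:table/*",
    }


if __name__ == "__main__":
    for label, arn in policy_resource_arns("my-s3-bucket", "us-west-1", "123456789012").items():
        print(f"{label}: {arn}")
```

The example values in the usage block are placeholders for illustration only.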

AWS Glue naming conventions

When creating databases and tables in AWS Glue for the Iceberg sink connector, follow these naming conventions to ensure compatibility:

  • Database names:

    • Use only lowercase letters (a-z), numbers (0-9), and underscores (_).
    • Must be between 1 and 252 characters long.
    • Examples:
      • Valid: sales_data, customer_orders_2024
      • Invalid: SalesData, customer orders
  • Table names:

    • Use only lowercase letters, numbers, and underscores.
    • Must be between 1 and 255 characters long.
    • Examples:
      • Valid: product_catalog, order_history_2023
      • Invalid: ProductCatalog, order-history
  • Column names: AWS Glue has minimal restrictions on column names. Using only letters, numbers, and underscores is recommended for best compatibility.

For more details, see the Amazon Athena naming conventions.
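The database and table rules above can be checked before you create resources. This is a minimal sketch that encodes only the character set and length limits listed in this section; the function names are hypothetical, and AWS Glue itself may accept or reject names on other grounds.

```python
import re

# Lowercase letters, digits, and underscores only, as listed above.
_ALLOWED_CHARS = re.compile(r"[a-z0-9_]+")

MAX_DATABASE_NAME_LENGTH = 252
MAX_TABLE_NAME_LENGTH = 255


def _is_valid_name(name: str, max_length: int) -> bool:
    """Check the length limits and the allowed character set."""
    return 1 <= len(name) <= max_length and _ALLOWED_CHARS.fullmatch(name) is not None


def is_valid_database_name(name: str) -> bool:
    return _is_valid_name(name, MAX_DATABASE_NAME_LENGTH)


def is_valid_table_name(name: str) -> bool:
    return _is_valid_name(name, MAX_TABLE_NAME_LENGTH)


if __name__ == "__main__":
    for candidate in ("sales_data", "SalesData", "customer orders"):
        print(candidate, is_valid_database_name(candidate))
```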

Create an Iceberg sink connector configuration

To configure the Iceberg sink connector, define a JSON configuration file based on your catalog type.

Note: Loading worker properties is not supported. Use iceberg.kafka.* properties instead.

  1. Create AWS resources, including an S3 bucket, Glue database, and tables.

  2. Add the following configurations to the Iceberg sink connector:

    {
      "name": "<your-connector-name>",
      "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
      "tasks.max": "2",
      "topics": "<your-topics>",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.schemas.enable": "false",
      "value.converter.schemas.enable": "false",
      "consumer.override.auto.offset.reset": "earliest",
      "iceberg.kafka.auto.offset.reset": "earliest",
      "iceberg.tables": "<database-name>.<table-name>",
      "iceberg.tables.auto-create-enabled": "true",
      "iceberg.control.topic": "<your-iceberg-control-topic-name>",
      "iceberg.control.commit.interval-ms": "1000",
      "iceberg.control.commit.timeout-ms": "2147483647",
      "iceberg.catalog.type": "glue",
      "iceberg.catalog.glue_catalog.glue.id": "<your-aws-account-id>",
      "iceberg.catalog.warehouse": "s3://<your-bucket-name>",
      "iceberg.catalog.client.region": "<your-aws-region>",
      "iceberg.catalog.client.credentials-provider": "org.apache.iceberg.aws.StaticCredentialsProvider",
      "iceberg.catalog.client.credentials-provider.access-key-id": "<your-access-key-id>",
      "iceberg.catalog.client.credentials-provider.secret-access-key": "<your-secret-access-key>",
      "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
      "iceberg.catalog.s3.access-key-id": "<your-access-key-id>",
      "iceberg.catalog.s3.secret-access-key": "<your-secret-access-key>",
      "iceberg.catalog.s3.path-style-access": "true",
      "iceberg.kafka.bootstrap.servers": "<APACHE_KAFKA_HOST>:<APACHE_KAFKA_PORT>",
      "iceberg.kafka.security.protocol": "SSL",
      "iceberg.kafka.ssl.keystore.location": "/run/aiven/keys/public.keystore.p12",
      "iceberg.kafka.ssl.keystore.password": "password",
      "iceberg.kafka.ssl.keystore.type": "PKCS12",
      "iceberg.kafka.ssl.truststore.location": "/run/aiven/keys/public.truststore.jks",
      "iceberg.kafka.ssl.truststore.password": "password",
      "iceberg.kafka.ssl.key.password": "password"
    }

    Parameters:

    Most connector parameters are the same as in the AWS Glue REST catalog configuration. The key differences for the AWS Glue catalog are:

    • iceberg.tables.auto-create-enabled: Set to true to enable automatic table creation in the AWS Glue catalog.
    • iceberg.catalog.type: Set to glue to use the AWS Glue catalog.
    • iceberg.catalog.glue_catalog.glue.id: The AWS account ID that owns the AWS Glue catalog.
    • iceberg.catalog.client.credentials-provider: The credentials provider class used to authenticate with the AWS Glue catalog.

Note: Apache Kafka security settings are the same for both AWS Glue REST and AWS Glue catalog configurations.
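The configuration template uses angle-bracket placeholders such as <your-bucket-name>, and one left unreplaced is easy to miss. This is a minimal sketch of a check you could run on the configuration before creating the connector; the function name `unfilled_placeholders` is hypothetical and is not part of the connector or of Aiven tooling.

```python
import re

# Matches template placeholders such as <your-bucket-name> or <APACHE_KAFKA_HOST>.
_PLACEHOLDER = re.compile(r"<[^<>]+>")


def unfilled_placeholders(config: dict) -> dict:
    """Return the entries whose string values still contain a <placeholder>."""
    return {
        key: value
        for key, value in config.items()
        if isinstance(value, str) and _PLACEHOLDER.search(value)
    }


if __name__ == "__main__":
    sample = {
        "iceberg.catalog.type": "glue",
        "iceberg.catalog.warehouse": "s3://<your-bucket-name>",
        "iceberg.catalog.client.region": "us-west-1",
    }
    for key, value in unfilled_placeholders(sample).items():
        print(f"Still a placeholder: {key} = {value}")
```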

Create the Iceberg sink connector

  1. Access the Aiven Console.
  2. Select your Aiven for Apache Kafka or Aiven for Apache Kafka Connect service.
  3. Click Connectors.
  4. Click Create connector if Apache Kafka Connect is enabled on the service. If not, click Enable connector on this service.

     Alternatively, to enable connectors:

       1. Click Service settings in the sidebar.
       2. In the Service management section, click Actions > Enable Kafka connect.

  5. In the sink connectors list, select Iceberg Sink Connector, and click Get started.
  6. On the Iceberg Sink Connector page, go to the Common tab.
  7. Locate the Connector configuration text box and click Edit.
  8. Paste the configuration from your iceberg_sink_connector.json file into the text box.
  9. Click Create connector.
  10. Verify the connector status on the Connectors page.

Example: Define and create an Iceberg sink connector

This example shows how to create an Iceberg sink connector that uses the AWS Glue catalog with the following properties:

  • Connector name: iceberg_sink_glue
  • Apache Kafka topic: test-topic
  • AWS Glue region: us-west-1
  • AWS S3 bucket: my-s3-bucket
  • AWS IAM access key ID: your-access-key-id
  • AWS IAM secret access key: your-secret-access-key
  • Target table: mydatabase.mytable
  • Commit interval: 1000 ms
  • Tasks: 2

    {
      "name": "iceberg_sink_glue",
      "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
      "tasks.max": "2",
      "topics": "test-topic",
      "iceberg.catalog.type": "glue",
      "iceberg.catalog.glue_catalog.glue.id": "123456789012",
      "iceberg.catalog.client.region": "us-west-1",
      "iceberg.catalog.client.credentials-provider": "org.apache.iceberg.aws.StaticCredentialsProvider",
      "iceberg.catalog.client.credentials-provider.access-key-id": "your-access-key-id",
      "iceberg.catalog.client.credentials-provider.secret-access-key": "your-secret-access-key",
      "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
      "iceberg.catalog.s3.access-key-id": "your-access-key-id",
      "iceberg.catalog.s3.secret-access-key": "your-secret-access-key",
      "iceberg.catalog.warehouse": "s3://my-s3-bucket",
      "iceberg.tables": "mydatabase.mytable",
      "iceberg.tables.auto-create-enabled": "true",
      "iceberg.control.commit.interval-ms": "1000",
      "iceberg.control.commit.timeout-ms": "2147483647",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "iceberg.kafka.bootstrap.servers": "kafka.example.com:9092",
      "iceberg.kafka.security.protocol": "SSL",
      "iceberg.kafka.ssl.keystore.location": "/run/aiven/keys/public.keystore.p12",
      "iceberg.kafka.ssl.keystore.password": "password",
      "iceberg.kafka.ssl.keystore.type": "PKCS12",
      "iceberg.kafka.ssl.truststore.location": "/run/aiven/keys/public.truststore.jks",
      "iceberg.kafka.ssl.truststore.password": "password",
      "iceberg.kafka.ssl.key.password": "password"
    }
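
In the example, the same access key pair appears twice: once for the catalog client credentials provider and once for S3 file I/O. As a minimal sketch, the catalog portion of the configuration can be generated from a single set of inputs so the two copies cannot drift apart. The function name `glue_catalog_settings` is hypothetical; the property keys and class names are copied from the example above.

```python
import json


def glue_catalog_settings(
    account_id: str,
    region: str,
    bucket_name: str,
    access_key_id: str,
    secret_access_key: str,
) -> dict:
    """Build the iceberg.catalog.* entries used in the example configuration."""
    return {
        "iceberg.catalog.type": "glue",
        "iceberg.catalog.glue_catalog.glue.id": account_id,
        "iceberg.catalog.warehouse": f"s3://{bucket_name}",
        "iceberg.catalog.client.region": region,
        "iceberg.catalog.client.credentials-provider": "org.apache.iceberg.aws.StaticCredentialsProvider",
        # The same key pair is supplied to the catalog client and to S3 file I/O.
        "iceberg.catalog.client.credentials-provider.access-key-id": access_key_id,
        "iceberg.catalog.client.credentials-provider.secret-access-key": secret_access_key,
        "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "iceberg.catalog.s3.access-key-id": access_key_id,
        "iceberg.catalog.s3.secret-access-key": secret_access_key,
    }


if __name__ == "__main__":
    settings = glue_catalog_settings(
        account_id="123456789012",
        region="us-west-1",
        bucket_name="my-s3-bucket",
        access_key_id="your-access-key-id",
        secret_access_key="your-secret-access-key",
    )
    print(json.dumps(settings, indent=2))
```

The printed entries can be merged with the remaining connector, converter, and Apache Kafka settings to form the full configuration.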
