
Configure Apache Kafka® metrics sent to Datadog

When creating a Datadog service integration, you can customize which metrics are sent to the Datadog endpoint using the Aiven CLI.

Prerequisites

note

Datadog integration is not available for new Aiven for Apache Kafka services on the Startup-2 plan. Existing customers who already use Datadog integration with Startup-2 services can continue to create new Startup-2 services with Datadog integration, and can keep using their existing services without upgrading to a higher plan.

Aiven recommends using a Business-4 plan or higher for Aiven for Apache Kafka services with Datadog integration to avoid resource pressure on Startup-2 plans.

If you are an existing customer and cannot create a Startup-2 service with Datadog integration in a new project, contact Aiven Support.

Default metrics

When a Datadog integration is configured for your Aiven for Apache Kafka service, a comprehensive set of Kafka broker metrics is collected automatically. These include standard JMX metrics for broker health, request handling, and replication.

Tiered storage metrics

For services with tiered storage enabled, the following metrics are collected automatically to monitor the health and performance of tiered storage operations:

Error metrics

Metric | Description
kafka.tiered_storage.remote_copy_errors.rate | Rate of errors when copying segments to remote storage
kafka.tiered_storage.remote_fetch_errors.rate | Rate of errors when fetching segments from remote storage
kafka.tiered_storage.remote_delete_errors.rate | Rate of errors when deleting segments from remote storage
kafka.tiered_storage.build_remote_log_aux_state_errors.rate | Rate of errors rebuilding auxiliary state

Throughput metrics

Metric | Description
kafka.tiered_storage.remote_copy_bytes.rate | Rate of bytes copied to remote storage
kafka.tiered_storage.remote_copy_requests.rate | Rate of copy requests to remote storage
kafka.tiered_storage.remote_fetch_bytes.rate | Rate of bytes fetched from remote storage
kafka.tiered_storage.remote_fetch_requests.rate | Rate of fetch requests from remote storage
kafka.tiered_storage.remote_delete_requests.rate | Rate of delete requests to remote storage

Lag metrics

Metric | Description
kafka.tiered_storage.remote_copy_lag_bytes | Bytes eligible for tiering but not yet copied
kafka.tiered_storage.remote_copy_lag_segments | Segments eligible for tiering but not yet copied
kafka.tiered_storage.remote_delete_lag_bytes | Bytes eligible for deletion but not yet deleted
kafka.tiered_storage.remote_delete_lag_segments | Segments eligible for deletion but not yet deleted

Storage metrics

Metric | Description
kafka.tiered_storage.remote_log_size_bytes | Total size of remote log in bytes
kafka.tiered_storage.remote_log_metadata_count | Number of metadata entries for remote storage

Thread pool metrics

Metric | Description
kafka.tiered_storage.remote_log_manager_tasks_avg_idle_percent | Average idle percentage of the copy thread pool
kafka.tiered_storage.remote_log_reader_avg_idle_percent | Average idle percentage of the read thread pool
kafka.tiered_storage.remote_log_reader_task_queue_size | Size of the read task queue

Throttling metrics

Metric | Description
kafka.tiered_storage.remote_fetch_throttle_time_avg | Average fetch throttle time in milliseconds
kafka.tiered_storage.remote_fetch_throttle_time_max | Maximum fetch throttle time in milliseconds
kafka.tiered_storage.remote_copy_throttle_time_avg | Average copy throttle time in milliseconds
kafka.tiered_storage.remote_copy_throttle_time_max | Maximum copy throttle time in milliseconds

Cache metrics

Metric | Description
kafka.tiered_storage.cache.chunk_cache_size | Size of the chunk cache
kafka.tiered_storage.cache.chunk_cache_hits | Chunk cache hit count
kafka.tiered_storage.cache.chunk_cache_misses | Chunk cache miss count
kafka.tiered_storage.cache.segment_manifest_cache_size | Size of segment manifest cache
kafka.tiered_storage.cache.segment_indexes_cache_size | Size of segment indexes cache

Cloud storage backend metrics

Metrics specific to your cloud storage provider, such as S3, GCS, or Azure:

Metric | Description
kafka.tiered_storage.s3.get_object_requests_rate | S3 GetObject request rate
kafka.tiered_storage.s3.get_object_time_avg | Average S3 GetObject latency
kafka.tiered_storage.gcs.object_get_rate | GCS object get rate
kafka.tiered_storage.azure.blob_get_rate | Azure Blob get rate

Configurable custom metrics

The following metrics can be enabled on demand in the Datadog integration configuration. They are tagged with topic and partition, so you can monitor each topic and partition independently:

  • kafka.log.log_size
  • kafka.log.log_start_offset
  • kafka.log.log_end_offset
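Because only these three metric names are listed as configurable, a mistyped name can be caught in a script before it reaches the Aiven CLI. The following POSIX sh sketch shows one way to do that; the function name is_custom_metric is hypothetical and is not part of the Aiven CLI.

```shell
# Hypothetical helper, not part of the Aiven CLI: exit with status 0 only if
# the argument is one of the configurable custom metrics listed above.
is_custom_metric() {
  case "$1" in
    kafka.log.log_size | kafka.log.log_start_offset | kafka.log.log_end_offset)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage: report whether each candidate name is configurable.
for name in kafka.log.log_size kafka.log.logsize; do
  if is_custom_metric "$name"; then
    echo "ok: $name"
  else
    echo "not configurable: $name"
  fi
done
```

In a case pattern, the dots are matched literally, so only exact metric names pass the check.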

Variables

Replace the following placeholders in the code samples:

Variable | Description
SERVICE_NAME | Aiven for Apache Kafka® service name
INTEGRATION_ID | ID of the integration between the Aiven for Apache Kafka® service and Datadog

To find the INTEGRATION_ID value, list the integrations of your service and note the ID of the Datadog integration:

avn service integration-list SERVICE_NAME

Customize metrics for Datadog

Before customizing metrics, configure and enable a Datadog endpoint in your Aiven for Apache Kafka service. For setup instructions, see Send metrics to Datadog.

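Format list-valued parameters as a JSON list of double-quoted strings, such as ["value0","value1","value2"], and enclose the whole -c argument in single quotes so the shell passes the double quotes through unchanged.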
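If you assemble these values in a script, a small helper can produce the JSON list format used in the examples on this page. This is a minimal POSIX sh sketch; json_list is a hypothetical name, and it assumes the values contain no double quotes or backslashes, which holds for the metric names above and for Kafka topic names.

```shell
# Hypothetical helper, not part of the Aiven CLI: print its arguments as a
# JSON list of strings, for example ["value0","value1"].
# Assumes no argument contains a double quote or a backslash.
json_list() (
  # The parentheses run the body in a subshell, so "out" does not leak.
  out=""
  for item in "$@"; do
    # Prepend a comma to every item except the first one.
    out="${out}${out:+,}\"${item}\""
  done
  printf '[%s]\n' "$out"
)

# Build the value of a -c argument for kafka_custom_metrics and show it.
metrics=$(json_list kafka.log.log_size kafka.log.log_end_offset)
printf '%s\n' "kafka_custom_metrics=${metrics}"
```

The result can be passed as -c "kafka_custom_metrics=${metrics}" to avn service integration-update; the double quotes around the expansion keep it a single shell argument.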

To customize the metrics sent to Datadog, use the avn service integration-update command with the kafka_custom_metrics parameter. Set it to a list of one or more of the configurable custom metrics: kafka.log.log_size, kafka.log.log_start_offset, and kafka.log.log_end_offset.

For example, to send the kafka.log.log_size and kafka.log.log_end_offset metrics, run:

avn service integration-update                                                \
-c 'kafka_custom_metrics=["kafka.log.log_size","kafka.log.log_end_offset"]' \
INTEGRATION_ID

After updating the settings, view the collected metrics in the Datadog Metrics Explorer.

Customize consumer metrics for Datadog

The Apache Kafka Consumer integration in Datadog collects message offset metrics. To customize which of these metrics are sent to Datadog, use the avn service integration-update command with the following parameters:

  • include_topics: A comma-separated list of topics to include.

    note

    By default, all topics are included.

  • exclude_topics: A comma-separated list of topics to exclude.

    warning

    To use exclude_topics, you must specify at least one include_consumer_groups value. Otherwise, exclude_topics does not take effect.

  • include_consumer_groups: A comma-separated list of consumer groups to include.

  • exclude_consumer_groups: A comma-separated list of consumer groups to exclude.

For example, to send the kafka.log.log_size and kafka.log.log_end_offset custom metrics and collect consumer metrics only for topics topic1 and topic2, run:

avn service integration-update                                              \
-c 'kafka_custom_metrics=["kafka.log.log_size","kafka.log.log_end_offset"]' \
-c 'include_topics=["topic1","topic2"]'                                     \
INTEGRATION_ID

After updating the settings, view the collected metrics in the Datadog Metrics Explorer.
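Because exclude_topics only takes effect together with include_consumer_groups, excluding a topic such as topic3 means setting both options in the same update. The following POSIX sh sketch does that. The function name update_consumer_filters, the AVN dry-run variable, and the consumer group name my-consumer-group are hypothetical; replace them with your own values.

```shell
# Hypothetical wrapper, not part of the Aiven CLI. Set AVN=echo to print the
# command instead of running it (a dry run); by default it calls avn.
update_consumer_filters() {
  # $1: integration ID
  # $2: JSON list of consumer groups to include, for example '["my-consumer-group"]'
  # $3: JSON list of topics to exclude, for example '["topic3"]'
  "${AVN:-avn}" service integration-update \
    -c "include_consumer_groups=$2" \
    -c "exclude_topics=$3" \
    "$1"
}

# Dry run with placeholder values: prints the avn command instead of running it.
AVN=echo update_consumer_filters INTEGRATION_ID '["my-consumer-group"]' '["topic3"]'
```

Drop the AVN=echo prefix to apply the update with the Aiven CLI.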
