Enable the consumer lag predictor for Aiven for Apache Kafka® (Limited availability)

The consumer lag predictor in Aiven for Apache Kafka® estimates the time between message production and consumption for each consumer group. This visibility helps you spot consumers that are falling behind and informs decisions about cluster performance and scaling.

Prerequisites

Before you start, ensure you have the following:

  • Aiven account.
  • Aiven for Apache Kafka® service running.
  • Prometheus integration set up for your Aiven for Apache Kafka® service, so that metrics can be extracted.
  • Necessary permissions to modify service configurations.
  • The consumer lag predictor for Aiven for Apache Kafka® is a limited availability feature and requires activation on your Aiven account. Contact the sales team at sales@aiven.io to request activation.

Enable and configure the consumer lag predictor

  1. Once the consumer lag predictor is activated for your account, log in to the Aiven Console, select your project, and choose your Aiven for Apache Kafka® service.

  2. On the Overview page, click Service settings from the sidebar.

  3. Go to the Advanced configuration section, and click Configure.

  4. In the Advanced configuration window, click Add configuration options.

  5. Set kafka_lag_predictor.enabled to Enabled. This enables the lag predictor to compute predictions for all consumer groups and topics.

  6. Configure the following options:

    • kafka_lag_predictor.group_filters: Specify consumer group patterns to limit lag prediction to the matching consumer groups. By default, the consumer lag predictor calculates the lag for all consumer groups.

      Example group patterns:

      • consumer_group_*: Matches any consumer group that starts with consumer_group_, such as consumer_group_1 or consumer_group_a.
      • important_group: Matches exactly the consumer group named important_group.
      • group?-test: Matches consumer groups like group1-test or groupA-test, where the ? represents any single character.
    • kafka_lag_predictor.topics: Specify topic names or patterns to limit lag prediction to the matching topics. By default, predictions are computed for all topics.

      Example topic patterns:

      • important_topic_*: Matches any topic that starts with important_topic_, such as important_topic_1 or important_topic_data.
      • secondary_topic: Matches exactly the topic named secondary_topic.
      • topic?-logs: Matches topics like topic1-logs or topicA-logs, where the ? represents any single character.
  7. Click Save configuration to save your changes and enable consumer lag prediction.
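
To preview which consumer groups or topics a pattern would select before you save it, you can approximate the matching locally. The sketch below assumes the patterns behave like shell-style globs, as the examples above suggest (* for any run of characters, ? for exactly one character). The service's exact matching rules may differ, and the select helper and all group names are hypothetical.

```python
from fnmatch import fnmatchcase


def select(names, patterns):
    """Return the names that match at least one glob pattern.

    An empty pattern list selects everything, mirroring the default
    of computing predictions for all consumer groups and topics.
    """
    if not patterns:
        return list(names)
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Hypothetical consumer group names, for illustration only.
groups = [
    "consumer_group_1",
    "consumer_group_a",
    "important_group",
    "group1-test",
    "group12-test",  # two characters where ? allows exactly one
    "other",
]
group_filters = ["consumer_group_*", "important_group", "group?-test"]

print(select(groups, group_filters))
```

The same helper can be reused with topic names and the patterns you plan to set in kafka_lag_predictor.topics.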

Monitor metrics with Prometheus

After enabling the consumer lag predictor, you can use Prometheus to access and monitor detailed metrics that provide insights into your Apache Kafka cluster's performance:

Metric | Type | Description
kafka_lag_predictor_topic_produced_records_total | Counter | Represents the total count of records produced.
kafka_lag_predictor_group_consumed_records_total | Counter | Represents the total count of records consumed.
kafka_lag_predictor_group_lag_predicted_seconds | Gauge | Represents the estimated time lag, in seconds, for a consumer group to catch up to the latest message.

For example, you can monitor the average estimated time lag in seconds for a consumer group to consume produced messages using the following PromQL query:

avg by(topic,group)(kafka_lag_predictor_group_lag_predicted_seconds_gauge)
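
If you want to run this query from a script rather than a dashboard, the Prometheus HTTP API exposes an instant-query endpoint at /api/v1/query. The following is a minimal sketch, assuming a Prometheus server that already scrapes your service. The base URL http://localhost:9090, the helper names, and the sample response are placeholders for illustration, not values from Aiven.

```python
import json
from urllib.parse import urlencode

QUERY = "avg by(topic,group)(kafka_lag_predictor_group_lag_predicted_seconds_gauge)"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_lag(response_text):
    """Map (topic, group) to predicted lag in seconds from a vector result."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        (item["metric"].get("topic"), item["metric"].get("group")): float(item["value"][1])
        for item in payload["data"]["result"]
    }


# Made-up response body in the instant-query vector format.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"topic": "orders", "group": "billing"},
         "value": [1700000000, "12.5"]},
    ]},
})

url = build_query_url("http://localhost:9090", QUERY)
lag = parse_lag(sample)
print(url)
print(lag)
```

In practice you would fetch the URL (for example with urllib.request.urlopen, plus whatever authentication your Prometheus server requires) and pass the response body to parse_lag.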

Another useful metric to monitor is the consume/produce ratio. You can monitor this per topic and partition for consumer groups by using the following PromQL query:

sum by(group, topic, partition)(
  kafka_lag_predictor_group_consumed_records_total_counter
)
/ on(topic, partition) group_left()
sum by(topic, partition)(
  kafka_lag_predictor_topic_produced_records_total_counter
)
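
To make the vector matching in this query concrete, here is a plain-Python sketch of the same arithmetic on made-up samples. Consumed records are summed per (group, topic, partition), produced records per (topic, partition), and each group's sum is divided by the produced total for the same topic and partition, which is the many-to-one match that on(topic, partition) group_left() expresses. The function name and all sample values are hypothetical.

```python
from collections import defaultdict


def consume_produce_ratio(consumed, produced):
    """Compute consumed/produced per (group, topic, partition).

    consumed: iterable of (group, topic, partition, value)
    produced: iterable of (topic, partition, value)
    """
    consumed_sum = defaultdict(float)
    for group, topic, partition, value in consumed:
        consumed_sum[(group, topic, partition)] += value

    produced_sum = defaultdict(float)
    for topic, partition, value in produced:
        produced_sum[(topic, partition)] += value

    ratios = {}
    for (group, topic, partition), value in consumed_sum.items():
        total = produced_sum.get((topic, partition))
        # Series without a right-hand match are dropped, as in PromQL.
        # A zero total is also skipped here for simplicity; PromQL would
        # return Inf or NaN instead.
        if not total:
            continue
        ratios[(group, topic, partition)] = value / total
    return ratios


# Hypothetical counter samples.
consumed = [
    ("billing", "orders", 0, 90),
    ("audit", "orders", 0, 45),
    ("billing", "orders", 1, 50),
]
produced = [("orders", 0, 100), ("orders", 1, 50)]

for key, ratio in sorted(consume_produce_ratio(consumed, produced).items()):
    print(key, round(ratio, 2))
```

A ratio close to 1 would suggest that a group is keeping pace with producers on that partition, while a ratio well below 1 suggests it is falling behind.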