
Aiven for Apache Kafka® metrics available via Prometheus

Explore common metrics available via Prometheus for your Aiven for Apache Kafka® service.

How to retrieve metrics

You can retrieve a complete list of metrics from your service by querying the Prometheus endpoint. To do this:

  1. Gather the necessary details:

    • Aiven project certificate: ca.pem. To download the CA certificate, see Download CA certificates.
    • Prometheus credentials: <PROMETHEUS_USER>:<PROMETHEUS_PASSWORD>
    • Aiven for Apache Kafka hostname: <KAFKA_HOSTNAME>
    • Prometheus port: <PROMETHEUS_PORT>
  2. Run the following curl command to query the Prometheus endpoint:

       curl --cacert ca.pem \
         --user '<PROMETHEUS_USER>:<PROMETHEUS_PASSWORD>' \
         'https://<KAFKA_HOSTNAME>:<PROMETHEUS_PORT>/metrics'
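The same request can also be made from a script. The following is a minimal Python sketch, not an official Aiven client. It assumes the endpoint returns the standard Prometheus text exposition format. The function names fetch_metrics and parse_metrics are hypothetical, and the parser is deliberately simplified: it assumes label values contain no commas or escaped quotes.

```python
import base64
import ssl
import urllib.request


def fetch_metrics(host, port, user, password, ca_file="ca.pem"):
    """Fetch the raw /metrics text, mirroring the curl command above."""
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(f"https://{host}:{port}/metrics")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples.

    Simplified: label values must not contain commas or escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        labels = {}
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, tail = rest.rsplit("}", 1)
            for pair in label_part.split(","):
                if pair.strip():
                    key, raw = pair.split("=", 1)
                    labels[key.strip()] = raw.strip().strip('"')
            value = tail.split()[0]  # ignore an optional trailing timestamp
        else:
            parts = line.split()
            name, value = parts[0], parts[1]
        samples.append((name.strip(), labels, float(value)))
    return samples
```

Calling parse_metrics(fetch_metrics(...)) with the details gathered in step 1 would return one tuple per series. Which labels appear on each series depends on your service, so inspect the output before relying on specific label names.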

For more information about setting up Prometheus integration, see Use Prometheus with Aiven.

Host metrics

Host metrics provide insights into system-level performance, including CPU, memory, disk, and network usage.

CPU utilization

CPU utilization metrics offer insights into CPU usage. These metrics include time spent on different processes, system load, and overall uptime.

Metric | Description
cpu_usage_guest | CPU time spent running a virtual CPU for guest operating systems
cpu_usage_guest_nice | CPU time running low-priority virtual CPUs for guest operating systems; interrupted by higher-priority tasks and measured in hundredths of a second
cpu_usage_idle | Time the CPU spends doing nothing
cpu_usage_iowait | Time waiting for I/O to complete
cpu_usage_irq | Time servicing interrupts
cpu_usage_nice | Time running user-niced processes
cpu_usage_softirq | Time servicing softirqs
cpu_usage_steal | Time spent in other operating systems when running in a virtualized environment
cpu_usage_system | Time spent running system processes
cpu_usage_user | Time spent running user processes
system_load1 | System load average for the last minute
system_load15 | System load average for the last 15 minutes
system_load5 | System load average for the last 5 minutes
system_n_cpus | Number of CPU cores available
system_n_users | Number of users logged in
system_uptime | Time for which the system has been up and running
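Load averages are easier to compare across differently sized nodes when divided by the core count. The sketch below is an illustration rather than a documented Aiven formula. It assumes you have already read a load value (for example system_load5) and system_n_cpus for the same host. The function name load_per_core is hypothetical.

```python
def load_per_core(load_average, n_cpus):
    """Return the load average divided by the number of CPU cores."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load_average / n_cpus


print(load_per_core(3.0, 4))  # 0.75
```

A value that stays above 1.0 suggests that more tasks are runnable (or blocked on I/O) than there are cores to serve them.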

Disk space utilization

Disk space utilization metrics provide a snapshot of disk usage. These metrics include information about free and used disk space, as well as inode usage and total disk capacity.

Metric | Description
disk_free | Amount of free disk space
disk_inodes_free | Number of free inodes
disk_inodes_total | Total number of inodes
disk_inodes_used | Number of used inodes
disk_total | Total disk space
disk_used | Amount of used disk space
disk_used_percent | Percentage of disk space used

Disk input and output

Metrics such as diskio_io_time and diskio_iops_in_progress provide insights into disk I/O operations. These metrics cover read/write operations, the duration of these operations, and the number of bytes read/written.

Metric | Description
diskio_io_time | Total time spent on I/O operations
diskio_iops_in_progress | Number of I/O operations currently in progress
diskio_merged_reads | Number of read operations that were merged
diskio_merged_writes | Number of write operations that were merged
diskio_read_bytes | Total bytes read from disk
diskio_read_time | Total time spent on read operations
diskio_reads | Total number of read operations
diskio_weighted_io_time | Weighted time spent on I/O operations, considering their duration and intensity
diskio_write_bytes | Total bytes written to disk
diskio_write_time | Total time spent on write operations
diskio_writes | Total number of write operations

Generic memory

The following metrics, including mem_active and mem_available, provide insights into your system's memory usage.

Metric | Description
mem_active | Amount of actively used memory
mem_available | Amount of available memory
mem_available_percent | Percentage of available memory
mem_buffered | Amount of memory used for buffering I/O
mem_cached | Amount of memory used for caching
mem_commit_limit | Maximum amount of memory that can be committed
mem_committed_as | Total amount of committed memory
mem_dirty | Amount of memory waiting to be written to disk
mem_free | Amount of free memory
mem_high_free | Amount of free memory in the high memory zone
mem_high_total | Total amount of memory in the high memory zone
mem_huge_pages_free | Number of free huge pages
mem_huge_page_size | Size of huge pages
mem_huge_pages_total | Total number of huge pages
mem_inactive | Amount of inactive memory
mem_low_free | Amount of free memory in the low memory zone
mem_low_total | Total amount of memory in the low memory zone
mem_mapped | Amount of memory mapped into the process's address space
mem_page_tables | Amount of memory used by page tables
mem_shared | Amount of memory shared between processes
mem_slab | Amount of memory used by the kernel for data structure caches
mem_swap_cached | Amount of swap memory cached
mem_swap_free | Amount of free swap memory
mem_swap_total | Total amount of swap memory
mem_total | Total amount of memory
mem_used | Amount of used memory
mem_used_percent | Percentage of used memory
mem_vmalloc_chunk | Largest contiguous block of vmalloc memory available
mem_vmalloc_total | Total amount of vmalloc memory
mem_vmalloc_used | Amount of used vmalloc memory
mem_wired | Amount of wired memory
mem_write_back | Amount of memory being written back to disk
mem_write_back_tmp | Amount of temporary memory being written back to disk

Network

The following metrics, including net_bytes_recv and net_packets_sent, provide insights into your system's network operations.

Metric | Description
net_bytes_recv | Total bytes received on the network interfaces
net_bytes_sent | Total bytes sent on the network interfaces
net_drop_in | Incoming packets dropped
net_drop_out | Outgoing packets dropped
net_err_in | Incoming packets with errors
net_err_out | Outgoing packets with errors
net_icmp_inaddrmaskreps | Number of ICMP address mask replies received
net_icmp_inaddrmasks | Number of ICMP address mask requests received
net_icmp_incsumerrors | Number of ICMP checksum errors
net_icmp_indestunreachs | Number of ICMP destination unreachable messages received
net_icmp_inechoreps | Number of ICMP echo replies received
net_icmp_inechos | Number of ICMP echo requests received
net_icmp_inerrors | Number of ICMP messages received with errors
net_icmp_inmsgs | Total number of ICMP messages received
net_icmp_inparmprobs | Number of ICMP parameter problem messages received
net_icmp_inredirects | Number of ICMP redirect messages received
net_icmp_insrcquenchs | Number of ICMP source quench messages received
net_icmp_intimeexcds | Number of ICMP time exceeded messages received
net_icmp_intimestampreps | Number of ICMP timestamp reply messages received
net_icmp_intimestamps | Number of ICMP timestamp request messages received
net_icmpmsg_intype3 | Number of ICMP type 3 (destination unreachable) messages received
net_icmpmsg_intype8 | Number of ICMP type 8 (echo request) messages received
net_icmpmsg_outtype0 | Number of ICMP type 0 (echo reply) messages sent
net_icmpmsg_outtype3 | Number of ICMP type 3 (destination unreachable) messages sent
net_icmp_outaddrmaskreps | Number of ICMP address mask reply messages sent
net_icmp_outaddrmasks | Number of ICMP address mask request messages sent
net_icmp_outdestunreachs | Number of ICMP destination unreachable messages sent
net_icmp_outechoreps | Number of ICMP echo reply messages sent
net_icmp_outechos | Number of ICMP echo request messages sent
net_icmp_outerrors | Number of ICMP messages sent with errors
net_icmp_outmsgs | Total number of ICMP messages sent
net_icmp_outparmprobs | Number of ICMP parameter problem messages sent
net_icmp_outredirects | Number of ICMP redirect messages sent
net_icmp_outsrcquenchs | Number of ICMP source quench messages sent
net_icmp_outtimeexcds | Number of ICMP time exceeded messages sent
net_icmp_outtimestampreps | Number of ICMP timestamp reply messages sent
net_icmp_outtimestamps | Number of ICMP timestamp request messages sent
net_icmp_outratelimitglobal | Number of globally rate-limited ICMP messages sent
net_icmp_outratelimithost | Number of ICMP messages rate-limited per host
net_ip_defaultttl | Default time-to-live for IP packets
net_ip_forwarding | Indicates if IP forwarding is enabled
net_ip_forwdatagrams | Number of forwarded IP datagrams
net_ip_fragcreates | Number of IP fragments created
net_ip_fragfails | Number of failed IP fragmentations
net_ip_fragoks | Number of successful IP fragmentations
net_ip_inaddrerrors | Number of incoming IP packets with address errors
net_ip_indelivers | Number of incoming IP packets delivered to higher layers
net_ip_indiscards | Number of incoming IP packets discarded
net_ip_inhdrerrors | Number of incoming IP packets with header errors
net_ip_inreceives | Total number of incoming IP packets received
net_ip_inunknownprotos | Number of incoming IP packets with unknown protocols
net_ip_outdiscards | Number of outgoing IP packets discarded
net_ip_outnoroutes | Number of outgoing IP packets with no route available
net_ip_outrequests | Total number of outgoing IP packets requested to be sent
net_ip_outtransmits | Number of IP packets transmitted successfully
net_ip_reasmfails | Number of failed IP reassembly attempts
net_ip_reasmoks | Number of successful IP reassembly attempts
net_ip_reasmreqds | Number of IP fragments received needing reassembly
net_ip_reasmtimeout | Number of IP reassembly timeouts
net_packets_recv | Total number of packets received on the network interfaces
net_packets_sent | Total number of packets sent on the network interfaces
netstat_tcp_close | Number of TCP connections in the CLOSE state
netstat_tcp_close_wait | Number of TCP connections in the CLOSE_WAIT state
netstat_tcp_closing | Number of TCP connections in the CLOSING state
netstat_tcp_established | Number of TCP connections in the ESTABLISHED state
netstat_tcp_fin_wait1 | Number of TCP connections in the FIN_WAIT_1 state
netstat_tcp_fin_wait2 | Number of TCP connections in the FIN_WAIT_2 state
netstat_tcp_last_ack | Number of TCP connections in the LAST_ACK state
netstat_tcp_listen | Number of TCP connections in the LISTEN state
netstat_tcp_none | Number of TCP connections in the NONE state
netstat_tcp_syn_recv | Number of TCP connections in the SYN_RECV state
netstat_tcp_syn_sent | Number of TCP connections in the SYN_SENT state
netstat_tcp_time_wait | Number of TCP connections in the TIME_WAIT state
netstat_udp_socket | Number of UDP sockets
net_tcp_activeopens | Number of active TCP open connections
net_tcp_attemptfails | Number of failed TCP connection attempts
net_tcp_currestab | Number of currently established TCP connections
net_tcp_estabresets | Number of established TCP connections reset
net_tcp_incsumerrors | Number of TCP checksum errors in incoming packets
net_tcp_inerrs | Number of incoming TCP packets with errors
net_tcp_insegs | Number of TCP segments received
net_tcp_maxconn | Maximum number of TCP connections supported
net_tcp_outrsts | Number of TCP reset packets sent
net_tcp_outsegs | Number of TCP segments sent
net_tcp_passiveopens | Number of passive TCP open connections
net_tcp_retranssegs | Number of TCP segments retransmitted
net_tcp_rtoalgorithm | TCP retransmission timeout algorithm
net_tcp_rtomax | Maximum TCP retransmission timeout
net_tcp_rtomin | Minimum TCP retransmission timeout
net_udp_ignoredmulti | Number of UDP multicast packets ignored
net_udp_incsumerrors | Number of UDP checksum errors in incoming packets
net_udp_indatagrams | Number of UDP datagrams received
net_udp_inerrors | Number of incoming UDP packets with errors
net_udp_memerrors | Number of UDP packets dropped due to memory errors
net_udplite_ignoredmulti | Number of UDP-Lite multicast packets ignored
net_udplite_incsumerrors | Number of UDP-Lite checksum errors in incoming packets
net_udplite_indatagrams | Number of UDP-Lite datagrams received
net_udplite_inerrors | Number of incoming UDP-Lite packets with errors
net_udplite_memerrors | Number of UDP-Lite packets dropped due to memory errors

Kernel

The metrics listed below, such as kernel_boot_time and kernel_context_switches, provide insights into the operations of your system's kernel.

Metric | Description
kernel_boot_time | Time at which the system was last booted
kernel_context_switches | Number of context switches that have occurred in the kernel
kernel_entropy_avail | Amount of available entropy in the kernel's entropy pool
kernel_interrupts | Number of interrupts that have occurred
kernel_processes_forked | Number of processes that have been forked
Process

Metrics such as processes_running and processes_zombies provide insights into the management of the system's processes.

Metric | Description
processes_blocked | Number of processes that are blocked
processes_dead | Number of processes that have terminated
processes_idle | Number of processes that are idle
processes_paging | Number of processes that are paging
processes_running | Number of processes currently running
processes_sleeping | Number of processes that are sleeping
processes_stopped | Number of processes that are stopped
processes_total | Total number of processes
processes_total_threads | Total number of threads across all processes
processes_unknown | Number of processes in an unknown state
processes_zombies | Number of zombie processes (terminated but not reaped by parent process)

Swap usage

Metrics such as swap_free and swap_used provide insights into the usage of the system's swap memory.

Metric | Description
swap_free | Amount of free swap memory
swap_in | Amount of data swapped in from disk
swap_out | Amount of data swapped out to disk
swap_total | Total amount of swap memory
swap_used | Amount of used swap memory
swap_used_percent | Percentage of swap memory used

Aiven for Apache Kafka-specific metrics

Metrics specific to Apache Kafka provide detailed insights into the health and performance of your Kafka clusters, including broker, controller, and topic-level metrics.

Garbage collector MXBean

Metrics starting with java_lang_GarbageCollector provide insights into the JVM's garbage collection process. These metrics include the collection count and the duration of collections.

Metric | Description
java_lang_GarbageCollector_G1_Young_Generation_CollectionCount | Returns the total number of collections that have occurred
java_lang_GarbageCollector_G1_Young_Generation_CollectionTime | Returns the approximate accumulated collection elapsed time in milliseconds
java_lang_GarbageCollector_G1_Young_Generation_duration | Duration of G1 Young Generation garbage collections
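Because CollectionCount and CollectionTime are both accumulated totals, an average pause per collection can be estimated from two scrapes. The sketch below is an illustration under that assumption. The function name average_gc_pause_ms and its parameters are hypothetical, and the inputs are expected to come from two readings of the same broker.

```python
def average_gc_pause_ms(count_before, time_before_ms, count_after, time_after_ms):
    """Estimate the mean pause per collection between two scrapes.

    Returns None when no collection happened in the window, or when the
    counters went backwards (for example after a JVM restart).
    """
    collections = count_after - count_before
    elapsed_ms = time_after_ms - time_before_ms
    if collections <= 0 or elapsed_ms < 0:
        return None
    return elapsed_ms / collections


print(average_gc_pause_ms(100, 2000, 110, 2250))  # 25.0
```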

Memory Usage

Metrics starting with java_lang_Memory provide insights into the JVM's memory usage, including committed memory, initial memory, max memory, and used memory.

Metric | Description
java_lang_Memory_committed | Returns the amount of memory in bytes that is committed for the Java virtual machine to use
java_lang_Memory_init | Returns the amount of memory in bytes that the Java virtual machine initially requests from the operating system for memory management
java_lang_Memory_max | Returns the maximum amount of memory in bytes that can be used for memory management
java_lang_Memory_used | Returns the amount of used memory in bytes
java_lang_Memory_ObjectPendingFinalizationCount | Number of objects pending finalization

Apache Kafka Connect

For a comprehensive list of Apache Kafka Connect metrics exposed through Prometheus, see Apache Kafka® Connect available via Prometheus.

Apache Kafka broker metrics

Apache Kafka brokers expose metrics that provide insights into the health and performance of the Apache Kafka cluster. For detailed descriptions of these metrics, see the monitoring section of the Apache Kafka documentation.

Metric types

Cumulative counters (_Count)

Metrics with a _Count suffix are cumulative counters. They track the total number of occurrences for a specific event since the broker started.

Example:

kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Count: Total number of leader elections that have occurred in the controller.

Rate counters (perSec)

Metrics with PerSec in their name and a _Count suffix are also cumulative counters. Despite the name, they track the total number of events since the broker started, not the current per-second rate.

Example:

kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count: Total number of incoming messages received by the broker.

Note: To calculate the per-second rate from these _Count metrics, use a function such as rate() in PromQL.
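In PromQL, a query such as rate(kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count[5m]) performs this conversion. To show the idea outside Prometheus, the sketch below derives an average per-second rate from a series of counter samples. It is a simplified illustration, not a reimplementation of PromQL's rate(): it skips the extrapolation Prometheus applies at window boundaries, and it treats any decrease as a counter reset to zero. The function name counter_rate is hypothetical.

```python
def counter_rate(samples):
    """Average per-second increase of a cumulative counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples or no elapsed time is available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # Counter went backwards: assume the broker restarted from zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


print(counter_rate([(0, 100), (30, 400), (60, 700)]))  # 10.0
```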

Apache Kafka controller metrics

Apache Kafka offers a range of metrics to help you assess the performance and health of your Apache Kafka controller.

  • Percentile metrics: Metrics like kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_XthPercentile (where Xth is 50th, 75th, 95th, 98th, 99th, or 999th for the 99.9th percentile) show the time taken for leader elections to complete at various percentiles. This helps in understanding the distribution of leader election times.
  • Interval metrics: Metrics ending with OneMinuteRate, FiveMinuteRate, or FifteenMinuteRate, following kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_, show the rate of leader elections over different time intervals.
  • Statistical metrics: Metrics ending with Max, Mean, Min, or StdDev, following kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_, provide statistical measures about the leader election times.
  • Controller state metrics: Metrics starting with kafka_controller_KafkaController_ give insights into the state of the Kafka controller, such as the number of active brokers, offline partitions, and replicas to delete.
Metric | Description
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_50thPercentile | Time taken for leader elections to complete at the 50th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_75thPercentile | Time taken for leader elections to complete at the 75th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_95thPercentile | Time taken for leader elections to complete at the 95th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_98thPercentile | Time taken for leader elections to complete at the 98th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_99thPercentile | Time taken for leader elections to complete at the 99th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_999thPercentile | Time taken for leader elections to complete at the 99.9th percentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Count | The total number of leader elections
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_FifteenMinuteRate | Rate of leader elections over the last 15 minutes
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_FiveMinuteRate | Rate of leader elections over the last 5 minutes
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Max | Maximum time taken for a leader election
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Mean | Mean time taken for leader elections
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_MeanRate | Mean rate of leader elections
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Min | Minimum time taken for a leader election
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_OneMinuteRate | Rate of leader elections over the last minute
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_StdDev | Standard deviation of leader election times
kafka_controller_ControllerStats_UncleanLeaderElectionsPerSec_Count | Number of unclean leader elections. Unclean leader elections can lead to data loss
kafka_controller_KafkaController_ActiveBrokerCount_Value | Number of active brokers
kafka_controller_KafkaController_ActiveControllerCount_Value | Number of active controllers
kafka_controller_KafkaController_FencedBrokerCount_Value | Number of fenced brokers
kafka_controller_KafkaController_OfflinePartitionsCount_Value | Number of offline partitions
kafka_controller_KafkaController_PreferredReplicaImbalanceCount_Value | Number of preferred replica imbalances
kafka_controller_KafkaController_ReplicasIneligibleToDeleteCount_Value | Number of replicas ineligible to delete
kafka_controller_KafkaController_ReplicasToDeleteCount_Value | Number of replicas to delete
kafka_controller_KafkaController_TopicsIneligibleToDeleteCount_Value | Number of topics ineligible to delete
kafka_controller_KafkaController_TopicsToDeleteCount_Value | Number of topics to delete
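Two of the controller state metrics lend themselves to a simple health check: in a healthy Apache Kafka cluster, ActiveControllerCount summed across all nodes is exactly one, and OfflinePartitionsCount is zero. The sketch below encodes those two conditions. It is an illustration only: the function name controller_issues is hypothetical, and it assumes you have collected one value per node for each metric.

```python
def controller_issues(active_controller_counts, offline_partition_counts):
    """Return a list of human-readable problems, empty when none are found.

    Each argument holds one reading per node for the corresponding metric.
    """
    issues = []
    total_active = sum(active_controller_counts)
    if total_active != 1:
        issues.append(f"expected 1 active controller, found {total_active}")
    # Only the active controller reports a meaningful value, so use the maximum.
    offline = max(offline_partition_counts, default=0)
    if offline > 0:
        issues.append(f"{offline} offline partitions")
    return issues


print(controller_issues([0, 1, 0], [0, 0, 0]))  # []
```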

Jolokia collector collect time

Jolokia is a JMX-HTTP bridge that provides an alternative to native JMX access. The following metric provides insights into the time taken by the Jolokia collector to collect metrics.

Metric | Description
kafka_jolokia_collector_collect_time | Represents the time taken by the Jolokia collector to collect metrics

Apache Kafka log

The following metrics describe how Apache Kafka manages its on-disk logs. They cover log cleaner activity, log flush timing, and log sizes and offsets.

Log cleaner metrics

These metrics provide insights into the log cleaner's operation, which helps in compacting the Apache Kafka logs.

Metric | Description
kafka_log_LogCleaner_cleaner_recopy_percent_Value | Percentage of log segments that were recopied during cleaning
kafka_log_LogCleanerManager_time_since_last_run_ms_Value | Time in milliseconds since the last log cleaner run
kafka_log_LogCleaner_max_clean_time_secs_Value | Maximum time in seconds taken for a log cleaning operation

Log flush rate metrics

Metrics like kafka_log_LogFlushStats_LogFlushRateAndTimeMs_XthPercentile show the time taken to flush logs at various percentiles. Together with the count and rate metrics, they offer insights into log flush operations, in which the system writes data from memory to disk.

Metric | Description
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_50thPercentile | Time taken to flush logs at the 50th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_75thPercentile | Time taken to flush logs at the 75th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_95thPercentile | Time taken to flush logs at the 95th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_98thPercentile | Time taken to flush logs at the 98th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_99thPercentile | Time taken to flush logs at the 99th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_999thPercentile | Time taken to flush logs at the 99.9th percentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Count | Total number of log flush operations
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_FifteenMinuteRate | Rate of log flush operations over the last 15 minutes
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_FiveMinuteRate | Rate of log flush operations over the last 5 minutes
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Max | Maximum time taken for a log flush operation
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Mean | Mean time taken for log flush operations
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_MeanRate | Mean rate of log flush operations
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Min | Minimum time taken for a log flush operation
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_OneMinuteRate | Rate of log flush operations over the last minute
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_StdDev | Standard deviation of log flush times

Log metrics

These metrics provide general information about log sizes and offsets.

Metric | Description
kafka_log_Log_LogEndOffset_Value | End offset of the log
kafka_log_Log_LogStartOffset_Value | Start offset of the log
kafka_log_Log_Size_Value | Size of the log
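The start and end offsets can be combined into a rough measure of how many offsets a log currently retains. The sketch below assumes both values were read for the same topic partition, which in turn assumes the metrics carry labels identifying the partition. The function name offset_span is hypothetical.

```python
def offset_span(log_start_offset, log_end_offset):
    """Number of offsets between the start and end of a partition's log."""
    if log_end_offset < log_start_offset:
        raise ValueError("end offset is lower than start offset")
    return log_end_offset - log_start_offset


print(offset_span(1200, 5000))  # 3800
```

Treat the result as an upper bound on the number of retained records: on compacted topics, and on topics that use transactions, some offsets in the range no longer correspond to readable records.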

Apache Kafka network

Apache Kafka provides several metrics, such as kafka_network_RequestMetrics_RequestsPerSec_Count and kafka_network_RequestMetrics_TotalTimeMs_Mean, to monitor the performance and health of network requests made to the Apache Kafka brokers.

Metric | Description
kafka_network_RequestChannel_RequestQueueSize_Value | Size of the request queue
kafka_network_RequestChannel_ResponseQueueSize_Value | Size of the response queue
kafka_network_RequestMetrics_RequestsPerSec_Count | Total number of requests per second
kafka_network_RequestMetrics_TotalTimeMs_95thPercentile | Total time for requests at the 95th percentile
kafka_network_RequestMetrics_TotalTimeMs_Count | Total number of requests
kafka_network_RequestMetrics_TotalTimeMs_Mean | Mean total time for requests
kafka_network_SocketServer_NetworkProcessorAvgIdlePercent_Value | Average idle percentage of the network processor

Apache Kafka server

Apache Kafka provides a range of metrics that help monitor the server's performance and health.

  • Topic metrics: BrokerTopicMetrics offer insights into various operations related to topics, such as bytes in/out and failed fetch/produce requests.
  • Replica metrics: kafka_server_ReplicaManager_LeaderCount_Value provides insights into the state of replicas within the Apache Kafka cluster.

The topic tag is crucial in these metrics. If you don't specify it, the system displays a combined rate for all topics, along with the rate for each individual topic. To view rates for specific topics, use the topic tag. To exclude the combined rate for all topics and only list metrics for individual topics, filter with topic!="".
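In PromQL, the per-topic series of a metric can be selected with a matcher such as kafka_server_BrokerTopicMetrics_BytesInPerSec_Count{topic!=""}. The sketch below applies the same filter to samples that have already been parsed into (labels, value) pairs. It assumes the combined series has no topic label or an empty one. The function name per_topic_samples and the sample label values are hypothetical.

```python
def per_topic_samples(samples):
    """Keep only samples that carry a non-empty topic label.

    samples: iterable of (labels_dict, value) pairs for a single metric.
    """
    return [(labels, value) for labels, value in samples if labels.get("topic")]


samples = [
    ({"host": "kafka-1"}, 900.0),                     # combined, all topics
    ({"host": "kafka-1", "topic": "orders"}, 600.0),  # hypothetical topic
    ({"host": "kafka-1", "topic": "audit"}, 300.0),   # hypothetical topic
]
for labels, value in per_topic_samples(samples):
    print(labels["topic"], value)
```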

Metric | Description
kafka_server_BrokerTopicMetrics_BytesInPerSec_Count | Byte in (from the clients) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_BytesOutPerSec_Count | Byte out (to the clients) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_BytesRejectedPerSec_Count | Rejected byte rate per topic due to the record batch size being greater than max.message.bytes configuration. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_FailedFetchRequestsPerSec_Count | Failed fetch request (from clients or followers) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_FailedProduceRequestsPerSec_Count | Failed produce request rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_FetchMessageConversionsPerSec_Count | Message format conversion rate for produce or fetch requests per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count | Incoming message rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_ProduceMessageConversionsPerSec_Count | Message format conversion rate for produce or fetch requests per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_ReassignmentBytesInPerSec_Count | Incoming byte rate of reassignment traffic
kafka_server_BrokerTopicMetrics_ReassignmentBytesOutPerSec_Count | Outgoing byte rate of reassignment traffic
kafka_server_BrokerTopicMetrics_ReplicationBytesInPerSec_Count | Byte in (from other brokers) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_ReplicationBytesOutPerSec_Count | Byte out (to other brokers) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_TotalFetchRequestsPerSec_Count | Fetch request (from clients or followers) rate per topic. Omitting topic=(...) will yield the all-topic rate
kafka_server_BrokerTopicMetrics_TotalProduceRequestsPerSec_Count | Total number of produce requests per second. This metric is collected per host and not per topic
kafka_server_DelayedOperationPurgatory_NumDelayedOperations_Value | Number of delayed operations in purgatory
kafka_server_DelayedOperationPurgatory_PurgatorySize_Value | Size of the purgatory queue
kafka_server_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_OneMinuteRate | Average idle percentage of request handlers over the last minute
kafka_server_KafkaServer_BrokerState_Value | State of the broker
kafka_server_ReplicaManager_IsrExpandsPerSec_Count | Number of ISR expansions per second
kafka_server_ReplicaManager_IsrShrinksPerSec_Count | Number of ISR shrinks per second
kafka_server_ReplicaManager_LeaderCount_Value | Number of leader replicas
kafka_server_ReplicaManager_PartitionCount_Value | Number of partitions
kafka_server_ReplicaManager_UnderMinIsrPartitionCount_Value | Number of partitions under the minimum ISR
kafka_server_ReplicaManager_UnderReplicatedPartitions_Value | Number of under-replicated partitions
kafka_server_group_coordinator_metrics_group_completed_rebalance_count | Number of completed group rebalances
kafka_server_group_coordinator_metrics_group_completed_rebalance_rate | Rate of completed group rebalances
kafka_server_group_coordinator_metrics_offset_commit_count | Number of offset commits
kafka_server_group_coordinator_metrics_offset_commit_rate | Rate of offset commits
kafka_server_group_coordinator_metrics_offset_deletion_count | Number of offset deletions
kafka_server_group_coordinator_metrics_offset_deletion_rate | Rate of offset deletions
kafka_server_group_coordinator_metrics_offset_expiration_count | Number of offset expirations
kafka_server_group_coordinator_metrics_offset_expiration_rate | Rate of offset expirations

Tiered storage metrics

Aiven for Apache Kafka includes several metrics to monitor the performance and health of your Apache Kafka broker's tiered storage operations. Access these metrics through Prometheus to gain insights into various aspects of tiered storage, including data copying, fetching, deleting, and their associated lags and errors.

Metric | Description
kafka_server_BrokerTopicMetrics_RemoteCopyBytesPerSec_Count | Number of bytes per second being copied to remote storage
kafka_server_BrokerTopicMetrics_RemoteCopyRequestsPerSec_Count | Number of copy requests per second to remote storage
kafka_server_BrokerTopicMetrics_RemoteCopyErrorsPerSec_Count | Number of errors per second encountered during remote copy
kafka_server_BrokerTopicMetrics_RemoteCopyLagBytes_Value | Number of bytes in non-active segments eligible for tiering that are not yet uploaded to remote storage
kafka_server_BrokerTopicMetrics_RemoteCopyLagSegments_Value | Number of non-active segments eligible for tiering that are not yet uploaded to remote storage
kafka_server_BrokerTopicMetrics_RemoteFetchBytesPerSec_Count | Number of bytes per second being fetched from remote storage
kafka_server_BrokerTopicMetrics_RemoteFetchRequestsPerSec_Count | Number of fetch requests per second from remote storage
kafka_server_BrokerTopicMetrics_RemoteFetchErrorsPerSec_Count | Number of errors per second encountered during remote fetch
kafka_server_BrokerTopicMetrics_RemoteDeleteRequestsPerSec_Count | Number of delete requests per second to remote storage
kafka_server_BrokerTopicMetrics_RemoteDeleteErrorsPerSec_Count | Number of errors per second encountered during remote delete
kafka_server_BrokerTopicMetrics_RemoteDeleteLagBytes_Value | Number of bytes in non-active segments marked for deletion but not yet deleted from remote storage
kafka_server_BrokerTopicMetrics_RemoteDeleteLagSegments_Value | Number of non-active segments marked for deletion but not yet deleted from remote storage
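One way to summarize the request and error counters is as an error ratio over a time window, for example for remote copy. The sketch below is an illustration, not an Aiven-recommended formula. It assumes both counters are cumulative, that every error corresponds to a counted request, and that the two scrapes come from the same broker. The function name error_ratio is hypothetical.

```python
def error_ratio(requests_before, requests_after, errors_before, errors_after):
    """Fraction of requests that failed between two scrapes.

    Returns None when no requests were made in the window, or when either
    counter went backwards (for example after a broker restart).
    """
    requests = requests_after - requests_before
    errors = errors_after - errors_before
    if requests <= 0 or errors < 0:
        return None
    return errors / requests


# 4 new errors out of 200 new copy requests
print(error_ratio(1000, 1200, 5, 9))
```

The same function could be applied to the remote fetch and remote delete counter pairs.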