Get the best from Apache Kafka®
We recommend following these best practices to keep your Apache Kafka® service fast and reliable.
Check your topic replication factors
Apache Kafka services rely on replication between brokers to preserve data in case of the loss of a node. Consider how business critical the data in each topic is and make sure that replication is set high enough for it.
You can set the replication factor in the Aiven web console when you create a topic or edit an existing one.
We do not allow setting the replication factor below 2, to prevent data loss from unexpected node termination.
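As a rough illustration of such a review, the following Python sketch flags topics whose replication factor is below a minimum chosen for their criticality. The topic names, the criticality labels, and the minimum values are hypothetical. In practice the replication factors would come from your cluster's topic metadata.

```python
# Hypothetical minimum replication factor per criticality level (assumed values).
MIN_REPLICATION = {"low": 2, "high": 3}


def under_replicated(topics):
    """Return the names of topics whose replication factor is below the
    minimum for their criticality.

    `topics` maps topic name -> (replication_factor, criticality).
    """
    return sorted(
        name
        for name, (replication_factor, criticality) in topics.items()
        if replication_factor < MIN_REPLICATION[criticality]
    )


# Example input: hypothetical topics, not read from a real cluster.
topics = {
    "payments": (2, "high"),
    "clickstream": (2, "low"),
    "audit-log": (3, "high"),
}
print(under_replicated(topics))
```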
Choose a reasonable number of partitions for a topic
Too few partitions can cause bottlenecks in data processing. In the most extreme case, a single partition means that messages are effectively processed sequentially. However, too many partitions put strain on the cluster because of the additional overhead. Because you cannot reduce the number of partitions for an existing topic, it is usually best to start with a low number that still allows efficient data processing, and increase it if needed.
As a rule of thumb, have at most 4,000 partitions per broker and at most 200,000 partitions per cluster (source).
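A minimal sketch of checking a cluster against these rule-of-thumb limits is shown below. The broker IDs and partition counts are made up, and the sketch assumes the cluster total is simply the sum of the per-broker counts, which may not match how you count replicas.

```python
# Rule-of-thumb limits quoted in the text above.
MAX_PARTITIONS_PER_BROKER = 4000
MAX_PARTITIONS_PER_CLUSTER = 200_000


def check_partition_limits(partitions_per_broker):
    """Return a list of warnings for brokers, or the cluster, above the limits.

    `partitions_per_broker` maps broker id -> number of partitions it hosts.
    """
    warnings = []
    for broker, count in sorted(partitions_per_broker.items()):
        if count > MAX_PARTITIONS_PER_BROKER:
            warnings.append(f"broker {broker}: {count} partitions")
    total = sum(partitions_per_broker.values())
    if total > MAX_PARTITIONS_PER_CLUSTER:
        warnings.append(f"cluster: {total} partitions")
    return warnings


# Hypothetical three-broker cluster.
print(check_partition_limits({1: 3500, 2: 4200, 3: 3900}))
```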
Ordering is guaranteed only per partition. If you require relative ordering of records, put that subset of data into the same partition.
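To illustrate how records that need relative ordering end up in the same partition, here is a small sketch of key-based partition selection. It uses CRC32 as a stand-in hash. This is an assumption for the example only: real Kafka clients ship their own partitioners, and the function name choose_partition is hypothetical.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Stand-in logic (assumption): CRC32 of the key modulo the partition count.
    The point is only that equal keys always map to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All events for one user share a key, so they share a partition
# and keep their relative order.
events = [("user-42", "login"), ("user-7", "login"), ("user-42", "purchase")]
for key, action in events:
    print(key, action, "-> partition", choose_partition(key, 6))
```

In this sketch the result depends on num_partitions, so the same key can map to a different partition after the partition count changes.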
Periodically examine topics with entity-based partitioning for imbalances
If you partition messages based on an entity ID (for example, user ID), there is a risk of heavily imbalanced partitions. This results in uneven load in your cluster and reduces how effectively it can process messages in parallel.
You can check the size of each partition in the Aiven web console: open the topic details and go to the Partitions tab.
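Once you have the partition sizes, a simple way to quantify imbalance is to compare the largest partition with the average. The sketch below does that. The sizes and the 1.5 threshold are hypothetical values chosen for illustration, not a recommendation.

```python
def imbalance_ratio(partition_sizes):
    """Return the largest partition size divided by the mean partition size.

    A ratio close to 1.0 means the partitions are evenly loaded.
    """
    mean = sum(partition_sizes) / len(partition_sizes)
    if mean == 0:
        return 1.0  # all partitions are empty, so nothing is imbalanced
    return max(partition_sizes) / mean


# Hypothetical partition sizes in MB, copied by hand from the topic details.
sizes_mb = [120, 110, 900, 130]
ratio = imbalance_ratio(sizes_mb)

# Hypothetical threshold for flagging a topic for review.
if ratio > 1.5:
    print(f"imbalanced: largest partition is {ratio:.2f}x the average")
```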
Find the right balance between throughput and latency
To find the right balance, try different batch sizes in your producer and consumer configurations. Bigger batches increase throughput but also increase the latency of individual messages. Conversely, smaller batches decrease message processing latency, but the overhead per message increases and the overall throughput decreases.
You can, for example, set batch.size and linger.ms in the producer configuration of your application code (see the official Apache Kafka documentation for reference).
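The following sketch shows what two such producer configurations might look like as plain Python dictionaries. The property names batch.size, linger.ms, and bootstrap.servers are Apache Kafka producer settings. The profile names, the numeric values, and the host name are hypothetical, and how you pass the dictionary to a producer depends on the client library you use.

```python
# Two hypothetical profiles. The numbers are illustrative starting points
# for experiments, not recommended values.
LOW_LATENCY = {
    "batch.size": 16_384,   # bytes per batch: small batches are sent sooner
    "linger.ms": 0,         # do not wait for more records before sending
}
HIGH_THROUGHPUT = {
    "batch.size": 262_144,  # larger batches reduce per-message overhead
    "linger.ms": 50,        # wait up to 50 ms to fill a batch
}


def producer_config(bootstrap_servers, profile):
    """Combine connection settings with one of the batching profiles."""
    config = {"bootstrap.servers": bootstrap_servers}
    config.update(profile)
    return config


# "kafka.example.com:9092" is a placeholder address.
print(producer_config("kafka.example.com:9092", HIGH_THROUGHPUT))
```

Measuring throughput and latency with each profile under a realistic load is what tells you which direction to move in.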
Acknowledgements of received data
You can specify a value for the acknowledgements setting, acks, in the client producer configuration. This setting determines when a write operation is considered successful.
With acks equal to 0, the producer does not wait for a confirmation from the broker after it sends the data. This makes communication faster, but data can be lost if the broker is down when the producer sends it. This configuration is only appropriate when you can afford to lose data.
With acks equal to 1 (the recommended behaviour, and the default in the Apache Kafka Java producer before version 3.0; later versions default to all), the producer waits for the leader broker to acknowledge that the data was received. This mode partially prevents data loss; however, data can still be lost if the broker goes down after it sends the acknowledgement but before the data is replicated.
With acks equal to all, the leader waits until all in-sync replicas have confirmed that they received the data before it acknowledges the write. This configuration slows communication, but it gives the strongest protection against data loss, because the replicas also confirm that the data was received.
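The three modes can be summarised with a toy model of when the producer treats a write as successful. This is a simplification for illustration, not how the broker is implemented: the function name and parameters are hypothetical, and the set of replicas that must confirm is reduced to a single number.

```python
def write_acknowledged(acks, leader_stored, replica_acks, replicas_required):
    """Toy model: does the producer consider the write successful?

    acks              -- "0", "1" or "all"
    leader_stored     -- True if the leader broker has received the record
    replica_acks      -- number of replicas that have confirmed the record
    replicas_required -- number of replicas that must confirm for "all"
    """
    if acks == "0":
        return True  # the producer never waits, so it cannot see a failure
    if acks == "1":
        return leader_stored  # only the leader's confirmation counts
    if acks == "all":
        return leader_stored and replica_acks >= replicas_required
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader has the record, but only one of two replicas has confirmed it.
for mode in ("0", "1", "all"):
    print(mode, write_acknowledged(mode, True, 1, 2))
```

In the example, the write counts as successful with acks set to 0 or 1, but not yet with all, which is the window in which a leader failure can lose data in the first two modes.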