Aiven Blog

Say Goodbye to ZooKeeper

Automated, Zero-Downtime KRaft Migrations Now Available on Aiven

Anton Agestam

Senior Software Engineer

The Apache Kafka® ecosystem has been steadily moving toward a simpler, more scalable architecture with KRaft (Kafka Raft), leaving ZooKeeper behind. In March 2025, Kafka 4.0 dropped support for ZooKeeper entirely. Since June 2025, all new Aiven for Apache Kafka® services have been deployed with KRaft by default, allowing our users to benefit from faster partition scaling and simplified cluster management.

But what about existing, mission-critical Kafka services still running on ZooKeeper?

Despite being heads-down building Diskless Topics all year, we didn’t forget about the continuity of our existing critical services. After thorough testing and reliability-focused design work, we are thrilled to announce that you can upgrade your Aiven cluster with a fully automated, zero-downtime migration from ZooKeeper to KRaft, starting today!

Limited Initial Rollout

To ensure a smooth transition, we are rolling out migrations in a controlled and phased manner. This initial rollout phase includes limitations that are checked when upgrades are initiated.

  • Service Plan Limit: Migrations are initially only enabled for a subset of service plans. We will be actively expanding this list over the coming weeks. The initially supported service plans are startup-2, startup-4, business-4, and business-8.
  • Limited Migration Window: Migrations can only be triggered during early EU office hours: Monday to Friday, 06:00 to 14:00 UTC.
  • Fleet-wide Quota: The number of simultaneously ongoing migrations is limited, and slots are allocated on a first-come, first-served basis. We are starting small and ramping this up over the coming weeks.
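To make these limits concrete, here is a minimal Python sketch of what such a pre-check could look like. It is an illustration only, not Aiven's implementation: the function names, the `ongoing` and `quota` parameters, and the choice to treat 14:00 UTC as the exclusive end of the window are all assumptions.

```python
from datetime import datetime, time, timezone

# Values taken from the rollout limits listed above.
SUPPORTED_PLANS = {"startup-2", "startup-4", "business-4", "business-8"}
WINDOW_START = time(6, 0)   # 06:00 UTC
WINDOW_END = time(14, 0)    # 14:00 UTC, treated as exclusive (assumption)


def in_migration_window(now: datetime) -> bool:
    """Return True if `now` falls on Monday to Friday, 06:00 to 14:00 UTC.

    `now` should be timezone-aware; it is converted to UTC before checking.
    """
    now_utc = now.astimezone(timezone.utc)
    is_weekday = now_utc.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WINDOW_START <= now_utc.time() < WINDOW_END


def can_start_migration(plan: str, now: datetime, ongoing: int, quota: int) -> bool:
    """Hypothetical pre-check combining plan, window, and fleet-wide quota."""
    return plan in SUPPORTED_PLANS and in_migration_window(now) and ongoing < quota
```

If you automate your upgrades, a check like this lets you schedule the trigger inside the window instead of retrying blindly.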

Seamless, Online Migrations

We know that for data streaming, uptime is everything. That’s why we engineered our migration path to run fully online, eliminating the need to drain traffic or force outages. However, we still recommend applying your standard operational rigor: validate the process in a non-production environment first, and execute the production rollout in accordance with your standard maintenance procedures.

The migration to KRaft is triggered automatically during your routine version upgrade from Kafka 3.8 to Kafka 3.9.

Behind the scenes, the migration process incurs exactly two rolling restarts of the brokers in your cluster. We've built this to be completely managed for you: every single stage is safely coordinated and heavily safeguarded with automated health checks. While the total duration will naturally vary based on your data volume, the migration logic itself is efficient: in our benchmarks, it adds about 15 minutes of overhead for a 32-node cluster. If a node isn't healthy at any point, the process waits, ensuring your data streaming remains uninterrupted. By always waiting for the cluster to stabilize at each stage, the automation prioritizes uptime and stability over the speed of completing the full migration process.
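The "wait until healthy, then proceed" pattern described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production automation: `restart` and `cluster_is_healthy` are hypothetical callbacks you would supply, and what each of the two restart passes actually changes on the brokers is not modeled.

```python
import time
from typing import Callable, Iterable


def rolling_restart(
    brokers: Iterable[str],
    restart: Callable[[str], None],
    cluster_is_healthy: Callable[[], bool],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Restart brokers one at a time, waiting for a healthy cluster in between."""
    for broker in brokers:
        # Never take a broker down while the cluster is still recovering.
        while not cluster_is_healthy():
            sleep(poll_seconds)
        restart(broker)
    # Wait once more so the caller only proceeds from a stable state.
    while not cluster_is_healthy():
        sleep(poll_seconds)


def migrate(brokers: list[str], restart, cluster_is_healthy, **kwargs) -> None:
    """Run two rolling restarts back to back; phase details are not modeled."""
    for _phase in range(2):
        rolling_restart(brokers, restart, cluster_is_healthy, **kwargs)
```

Note that the loop has no timeout on purpose: in this sketch, an unhealthy cluster pauses the process rather than failing it, which mirrors the uptime-over-speed trade-off.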

Unlocking the Future of Kafka

Dropping ZooKeeper is the key to unlocking powerful new features in modern Kafka. Migrating to KRaft is the essential first step that enables you to upgrade to Kafka 4.0 and activate three major architectural shifts:

  • Diskless Topics (KIP-1150): KRaft paves the way for diskless topics, a fundamental architectural shift that allows you to run brokers that fetch data entirely from object storage without needing to keep copies on local broker disks. This opens the door to incredibly elastic, cost-effective architectures tailored for your specific streaming workloads.
  • The New Consumer Rebalance Protocol (KIP-848): KRaft is a prerequisite for this highly anticipated protocol redesign. Moving forward, this will allow your consumer groups to scale and rebalance much faster, virtually eliminating the "stop-the-world" latency spikes historically associated with consumer rebalancing.
  • Queues for Kafka (KIP-932): This new feature brings true queue-like semantics to Kafka. It enables multiple consumers to process messages from a single partition, dramatically increasing parallelism. By eliminating head-of-line blocking, it makes Kafka suitable for tasks like job queuing.

Providing Unmatched Reliability

Not all KRaft implementations are created equal. A key aspect of Aiven’s KRaft architecture is our use of online membership changes for the KRaft controller quorum. Because we use KIP-853, all voter membership changes go through Kafka’s internal Raft log, which gives our KRaft controller clusters unmatched operational safety compared to vendors that rely on rolling restarts.

During node replacements and version upgrades, when a new node joins a cluster, it gets added to the voter quorum. Once automation verifies that its KRaft health checks are passing and that it is not lagging behind, we remove the node that is being replaced from the quorum. In traditional KRaft installations that do not make use of KIP-853, every such membership change requires one rolling restart of the full controller cluster, so replacing all nodes of a three-node controller cluster results in 3 rolling restarts (9 node restarts in total).
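As a rough mental model, the add-then-remove sequence and the restart arithmetic above can be written down in Python. This is a toy sketch, not Kafka's or Aiven's code: the voter set is a plain Python set, `is_caught_up` is a hypothetical stand-in for the KRaft health and lag checks, and the restart count assumes that replacing every controller means one membership change per node.

```python
from typing import Callable


def replace_controller(
    voters: set[str],
    old: str,
    new: str,
    is_caught_up: Callable[[str], bool],
) -> set[str]:
    """Add `new` as a voter, verify it, then remove `old` (add-then-remove)."""
    if old not in voters:
        raise ValueError(f"{old} is not a voter")
    grown = voters | {new}  # the quorum temporarily grows by one
    if not is_caught_up(new):
        # Keep the old node as a voter; removal is retried once `new` is healthy.
        return grown
    return grown - {old}  # only now does the old node leave the quorum


def static_quorum_node_restarts(membership_changes: int, controllers: int) -> int:
    """Node restarts when each change needs a rolling restart of all controllers."""
    return membership_changes * controllers
```

With three controllers and three replacements, `static_quorum_node_restarts(3, 3)` gives the 9 node restarts mentioned above, while the online path in `replace_controller` needs no controller restarts for the membership change itself.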

By supporting dynamic, online membership changes, we ensure that the controller quorum is exposed to as little risk as possible while upgrading. This lets it remain highly available and structurally sound even when the underlying infrastructure is changing or scaling. This architectural decision makes Aiven's KRaft operations, and the migration process itself, extremely reliable.

Leave the Heavy Lifting to Us

Implementing a seamless ZooKeeper-to-KRaft migration is a complex engineering challenge. While we view a seamless transition as a fundamental requirement for true managed infrastructure, some other major cloud providers have explicitly chosen not to support automated migrations for this feature.

AWS MSK does not offer an upgrade path, leaving your cluster stranded. This forces their users into a complex, manual migration to a brand new cluster. This requires cluster downtime, leading to inevitable business interruption and lost engineering cycles.

At Aiven, we believe managed infrastructure should be managed. You shouldn't have to suffer downtime or rebuild your clusters from scratch just to stay up to date with the latest Apache Kafka and benefit from new features. Live upgrades between different consensus protocols are a very tricky endeavour, but that’s why we’re here. We handle the heavy lifting so your teams can focus on building applications, not looking after the underlying infrastructure.

Ready to Upgrade?

It’s time to modernize your Kafka deployments. You can initiate your upgrade to Kafka 3.9 and begin your automated KRaft migration today, directly from the Aiven Console or via the Aiven CLI or Terraform provider. If you have to migrate to KRaft anyway, this is the perfect opportunity to migrate to Aiven, where the next upgrade will be handled for you.

For a deeper dive into how KRaft works on the Aiven platform, check out our public technical documentation.

Happy Streaming!

