MirrorMaker 2.0 is a robust data replication utility for Apache Kafka®. It acts as a consumer and producer for multiple Kafka clusters, so that users can easily and reliably copy data from one cluster to another. This increases the resilience of Kafka-centric architectures.
Reasons to replicate your Apache Kafka cluster data
Apache Kafka only stores data temporarily, so it is not really a database in the traditional sense. Why, then, should you worry about replicating that fleeting data?
Because data replication between your Apache Kafka clusters can add flexibility, performance and reliability to your core data infrastructure. Large companies with huge data volumes benefit in particular.
1. Disaster recovery
The best-understood and most important scenario where you would want to replicate data between Apache Kafka clusters is disaster recovery. Many businesses rely on Apache Kafka as a cornerstone of their data infrastructure. Apache Kafka is mature, reliable and offered by trusted providers, but disasters can still happen, and data can still become temporarily unavailable, or be lost altogether.
The best way to mitigate the risks is to have a copy of your data in another Kafka cluster in a different data center. That way, you can switch clients to it relatively seamlessly, moving to an alternative deployment on the fly with minor or no service interruptions.
MirrorMaker2 preserves consumer offset mappings and offers tooling for nearly transparent consumer migration between clusters. This is a key to successful disaster recovery.
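To make the idea of an offset mapping concrete, here is a minimal Python sketch of offset translation. It is a simplified model, not MirrorMaker2's actual implementation: the function name `translate_offset` and the sample sync points are hypothetical, and it assumes a list of (source offset, target offset) pairs recorded during replication. It resumes from the newest recorded pair at or before the committed source offset, so a migrated consumer may reprocess some records but does not skip any.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical sync points for one partition: (source_offset, target_offset).
# The two can drift apart, for example when the source topic has been
# compacted or trimmed by retention before replication started.
SyncPoint = Tuple[int, int]


def translate_offset(syncs: List[SyncPoint], committed_source: int) -> Optional[int]:
    """Map a committed source-cluster offset to a safe target-cluster offset.

    `syncs` must be sorted by source offset. Returns None when no sync point
    at or before the committed offset is known yet.
    """
    source_offsets = [source for source, _ in syncs]
    # Index of the first sync point strictly after the committed offset.
    idx = bisect_right(source_offsets, committed_source)
    if idx == 0:
        return None
    # Conservative choice: resume from the last known-good mapping.
    return syncs[idx - 1][1]


if __name__ == "__main__":
    syncs = [(0, 0), (100, 90), (200, 185)]
    print(translate_offset(syncs, 150))
```

A consumer group that committed source offset 150 would resume at target offset 90 in this model, re-reading a few records rather than risking a gap.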
2. Going to the cloud
More and more companies are migrating their Apache Kafka clusters from on-prem installations to the cloud. Some are at the stage where they're moving from one cloud region or provider to another.
Tools that support the cloud migration of data services give you more control over your data. Replicating data between Kafka clusters is an excellent choice for low-downtime Kafka cloud migration.
3. Getting closer
Many global businesses produce and consume data in geographically distributed locations. Replication lets you bring the data to where the users are. This cuts latency and network costs and can improve throughput.
4. Isolating data
Some data sets may need to be isolated to a separate Kafka cluster for legal, compliance, or performance reasons.
For instance, to meet legal requirements, you can limit the retention period of a topic in the cluster you write to, and mirror it to another cluster with longer retention in a region where reading the data is compliant.
To boost performance, you can use one cluster to fleetingly store incoming data, then aggregate it and mirror only the aggregated data to another cluster. This keeps your incoming pipeline clean but still retains the important bits, and as a bonus it may save money in terms of storage space, too.
5. Data analytics
Aggregation is also a factor in data pipelines, which might require the consolidation of data from distributed Kafka clusters into a single one. That aggregate cluster then broadcasts the data to other clusters or data systems for analysis and visualization.
Apache Kafka MirrorMaker makes life easier
MirrorMaker2 is built on the Kafka Connect framework. While replicating data, it also synchronizes topic configuration (including partitioning) and ACLs from source to target clusters, so you no longer need external tooling to make this happen.
In situations where records are partitioned semantically, it’s good to know that the partitions are preserved during replication; rebuilding them would be a pain.
Complex replication topologies, like active-active and chain replication, are easy to set up. A single MirrorMaker2 (MM2) cluster can run multiple replication flows, and it has a mechanism for preventing replication cycles.
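One way to picture cycle prevention is through topic naming. Below is a minimal Python sketch, assuming replicated topics are renamed with the source cluster's alias as a prefix (for example `us.orders`), which is how MirrorMaker2's default replication policy names remote topics. The helper names and cluster aliases are hypothetical, and the check is a simplified model rather than MirrorMaker2's own code.

```python
from typing import Set

SEPARATOR = "."  # assumed separator between cluster alias and topic name


def remote_topic(source_alias: str, topic: str) -> str:
    """Name a topic gets on the target cluster after replication."""
    return f"{source_alias}{SEPARATOR}{topic}"


def is_cycle(topic: str, target_alias: str, known_aliases: Set[str]) -> bool:
    """Return True if replicating `topic` to `target_alias` would send data
    back to a cluster it has already passed through."""
    source, sep, upstream = topic.partition(SEPARATOR)
    # No known cluster prefix: the topic is local, so there is no history.
    if not sep or source not in known_aliases:
        return False
    # The topic already came from the target cluster: copying it back loops.
    if source == target_alias:
        return True
    # Otherwise walk further up the replication chain.
    return is_cycle(upstream, target_alias, known_aliases)


if __name__ == "__main__":
    aliases = {"us", "eu"}
    for name in ["orders", "us.orders", "eu.orders", "us.eu.orders"]:
        print(name, "-> eu:", "skip" if is_cycle(name, "eu", aliases) else "replicate")
```

In an active-active setup between `us` and `eu`, this model replicates `orders` and `us.orders` to `eu` but skips `eu.orders` and `us.eu.orders`, since both originated in `eu`.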
Apache Kafka MirrorMaker2 makes for a robust replication architecture that you can use for multiple purposes. And the best thing is, you don’t have to set anything up by yourself: you can get it as an add-on to Aiven for Apache Kafka, and let Aiven do the work.
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!