Oct 6, 2021
5 reasons why you should be using MirrorMaker 2.0 for data replication
Replicating data in your Apache Kafka® clusters? Good idea! Read the top 5 reasons why you should be doing it with MirrorMaker 2.
MirrorMaker 2.0 is a robust data replication utility for Apache Kafka®. It acts as a consumer and producer for multiple Kafka clusters, so that users can easily and reliably copy data from one cluster to another. This increases the resilience of Kafka-centric architectures.
Reasons to replicate your Apache Kafka cluster data
Apache Kafka only stores data for a configurable retention period; it's not really a database in that sense. Why, then, should you worry about replicating that fleeting data?
Because data replication between your Apache Kafka clusters can add flexibility, performance and reliability to your core data infrastructure. Large companies with huge data volumes, in particular, can benefit from this.
1. Disaster recovery
The best understood and most important scenario where you would want to replicate data between Apache Kafka clusters is disaster recovery. Many businesses rely on Apache Kafka as a cornerstone of their data infrastructure. Apache Kafka is mature, reliable and offered by trusted providers, but disasters can still happen, and data can still become temporarily unavailable, or be lost altogether.
The best way to mitigate the risks is to have a copy of your data in another Kafka cluster in a different data center. That way, you can switch clients to it relatively seamlessly, moving to an alternative deployment on the fly with minor or no service interruptions.
MirrorMaker 2 preserves consumer offset mappings and offers tooling for nearly transparent consumer migration between clusters. This is key to successful disaster recovery.
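In a dedicated MirrorMaker 2 deployment, offset translation is driven by the checkpoint connector. A minimal configuration sketch might look like this (the cluster aliases and bootstrap addresses are hypothetical placeholders):

```properties
# Cluster aliases and connection details (placeholder addresses)
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092

# Replicate all topics from primary to backup
primary->backup.enabled = true
primary->backup.topics = .*

# Emit checkpoints so consumer offsets can be translated on the backup cluster
emit.checkpoints.enabled = true
# Since Kafka 2.7, MM2 can also write translated offsets directly
# into the target cluster's consumer offsets
sync.group.offsets.enabled = true
```

With checkpoints in place, a consumer group's committed offsets can be looked up on the target cluster, for example via `RemoteClusterUtils.translateOffsets` from the MirrorMaker client library, so consumers resume close to where they left off after a failover.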
2. Going to the cloud
More and more companies are migrating their Apache Kafka clusters from on-prem installations to the cloud. Some are at the stage where they're moving from one cloud region or provider to another.
Tools that support the cloud migration of data services give you more control over your data. Replicating data between Kafka clusters is an excellent choice for low-downtime Kafka cloud migration.
3. Getting closer
For many global businesses, it's not uncommon to produce and consume data in geographically distributed locations. Replication lets you bring the data where the users are. This cuts down on latency and network costs and offers optimal throughput.
4. Isolating data
Some data sets may need to be isolated to a separate Kafka cluster for legal, compliance, or performance reasons.
For instance, to satisfy legal requirements you can limit the retention period of a topic you’re writing to in one cluster and mirror it to another cluster, with longer retention, in a region that’s compliant for reading that data.
To boost performance, you can use one cluster to fleetingly store incoming data, then aggregate it and mirror only the aggregated data to another cluster. This keeps your incoming pipeline clean but still retains the important bits, and as a bonus it may save money in terms of storage space, too.
5. Data analytics
Aggregation is also a factor in data pipelines that consolidate data from distributed Kafka clusters into a single one. That aggregate cluster then broadcasts the data to other clusters and/or data systems for analysis and visualization.
Apache Kafka MirrorMaker 2 makes life easier
When replicating Apache Kafka clusters, MirrorMaker 2, which runs on the Kafka Connect framework, synchronizes topic configuration (including partitioning) and ACLs from source to target clusters. No more need for external tooling to make this happen.
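Topic configuration and ACL syncing are enabled by default, but can be tuned per deployment. A sketch of the relevant MM2 properties (the interval values here are illustrative, not recommendations):

```properties
# Keep target topic configs and ACLs in step with the source (defaults shown)
sync.topic.configs.enabled = true
sync.topic.acls.enabled = true

# How often MM2 checks for new topics and propagates config changes, in seconds
refresh.topics.interval.seconds = 600
sync.topic.configs.interval.seconds = 600
```

Because partition counts are replicated along with topic configuration, semantically partitioned topics keep their key-to-partition layout on the target cluster.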
In situations where records are partitioned semantically, it’s good to know that the partitions are preserved during replication; rebuilding them would be a pain.
Complex replication topologies, like active-active and chain replication, are easy to set up. A single MM2 cluster can run multiple replication flows, and it has a mechanism for preventing replication cycles.
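An active-active topology is simply two enabled flows in one configuration. A sketch with two hypothetical regional clusters:

```properties
# Two regions, replicating to each other (placeholder addresses)
clusters = us-east, eu-west
us-east.bootstrap.servers = kafka-us:9092
eu-west.bootstrap.servers = kafka-eu:9092

# Replicate in both directions
us-east->eu-west.enabled = true
eu-west->us-east.enabled = true
```

Cycles are avoided because the default replication policy prefixes mirrored topics with the source cluster alias (for example, `orders` from us-east appears as `us-east.orders` on eu-west), so MM2 never replicates a topic back to the cluster it originated from.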
Apache Kafka MirrorMaker 2 makes for a robust replication architecture that you can use for multiple purposes. And the best thing is, you don’t have to set anything up yourself: you can get it as an add-on to Aiven for Apache Kafka, and let Aiven do the work.
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!
In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.