Unifying multiple data streams and clouds on a single platform
To overcome these challenges, and to increase its data analytics capabilities, Dojo wanted to create a centralized data streaming and messaging architecture. It decided to publish all its data to Apache Kafka®, an open-source distributed event streaming platform that can handle high-volume, high-velocity and high-variety data streams at very low latencies.
But Dojo also wanted a managed Kafka service, and one that was compatible with multiple clouds across a variety of geographical regions. After evaluating the market, it chose Aiven for Apache Kafka. “We work with lots of cloud providers, so we need a solution that fits with them all. We also wanted an open-source solution as we try to avoid vendor lock-in. Aiven was the perfect match,” says Elad Leev, Senior Data Platform Engineer, Dojo.
“Within a year, we built a complete end-to-end data streaming platform that met all our demands for reliability, scalability and fault-tolerance, and which could deal with any data challenge. Aiven for Apache Kafka is at the heart of this environment,” says Jérémy.
The data generated by Dojo's different systems, such as settlement, clearing and billing, still resides on different clouds and in different databases, including OracleDB, MongoDB, PostgreSQL and Google Spanner. But now the majority of that data is published to Aiven for Apache Kafka running on Google Cloud, and is then synced onward according to the needs of the various teams and functions of the business. For example, a significant volume of data goes to Google BigQuery for data analytics and to feed Dojo’s various AI and ML systems.
“With the Aiven Platform, our teams have the autonomy to select the database and cloud solution they believe will work best, and to shape the data to solve their specific use case,” Jérémy says.
For greater resilience, Dojo has adopted a cross-cloud disaster recovery strategy, using several regional sites that replicate the data to other cloud providers in real time.