Our latest Kafka meetup, on January 30th, was hosted at Rovio Entertainment’s headquarters in Espoo, Finland.
Rovio’s own Henri Heiskanen, Director of Data Engineering, presented how Rovio uses Apache Kafka to handle its massive data requirements.
Our CTO, Heikki Nousiainen, followed with a presentation on benchmarking raw Kafka performance across four different clouds to demonstrate its capabilities.
We've summarized Rovio’s presentation for you. There's also a copy of the slide deck at the end of this post.
A tale of two Kafkas
Heiskanen kicked off the meetup with his presentation, Story of Two Kafkas, a reference to Rovio’s ongoing transition to a newer pipeline. Rovio has used Kafka since version 0.7, and its daily load is impressive:
- 3 billion events
- 1 terabyte of data
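To put these daily totals in perspective, here is a minimal back-of-envelope sketch that converts them into average rates. It assumes decimal units (1 TB = 10**12 bytes) and traffic spread evenly across 24 hours; both are our assumptions rather than details from the talk, and real traffic peaks will run higher than these averages.

```python
# Back-of-envelope averages for the daily load quoted in the talk.
# Assumptions (ours, not Rovio's): decimal units (1 TB = 10**12 bytes)
# and load spread evenly over 24 hours, so peaks will be higher.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(events_per_day: float, bytes_per_day: float) -> dict:
    """Return average events/s, MB/s and bytes per event."""
    return {
        "events_per_sec": events_per_day / SECONDS_PER_DAY,
        "mb_per_sec": bytes_per_day / SECONDS_PER_DAY / 10**6,
        "bytes_per_event": bytes_per_day / events_per_day,
    }


rates = average_rates(events_per_day=3e9, bytes_per_day=1e12)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs it prints roughly 34,722 events per second, 11.6 MB/s, and about 333 bytes per event on average.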
Without getting too deep into their architecture, suffice it to say that Rovio’s Kafka node count is high. Despite the cluster’s size, when asked how big a team should be to manage Kafka, he responded:
"One really good Kafka guy on development side who is available 24 hours."
His wink-and-a-nod response hints at the difference between setting up Kafka and running it as a production cluster, as one of his slides summarized:
Setting up Kafka is simple, but for running a production cluster, you need monitoring, orchestration, and capable people.
That said, the reasons behind Rovio’s choice of Kafka are clear:
In Heiskanen’s view, Kafka is the best tool available, and unlike AWS Kinesis and Google Pub/Sub, it doesn’t lock you in to a single vendor.
Additionally, a wide range of good Kafka tools and connectivity libraries is available. As he quipped, “If you feel you need to implement something yourself, you’ve not googled enough.”
Apache Kafka cloud performance
Aiven’s Nousiainen closed the event by presenting Kafka’s raw performance on AWS, GCP, Microsoft Azure, and UpCloud using three Aiven plans:
- Business-4: 3-node cluster | 1 CPU | 4 GB memory
- Business-8: 3-node cluster | 2 CPU | 8 GB memory
- Premium-8-5x: 5-node cluster | 2 CPU | 8 GB memory
We used client settings typical of Aiven customers, connected to the clusters over the network, and ran the benchmark for one to two hours.
The specifics are in the slide deck, but one interesting result was that UpCloud outperformed the other cloud platforms on every Aiven plan.
When asked why, he speculated that UpCloud’s network bandwidth and fast disk I/O accounted for the difference in this benchmark.
Ultimately, the test was meant to show that Kafka can deliver impressive throughput even with modest specs.
We will follow up with additional benchmarks that mirror real-world usage more closely. In the meantime, you can review Heikki’s presentation slides below!
We’re planning to host the next Apache Kafka meetup in March.
We will also begin livestreaming our Kafka meetups so that even more users can access the material we cover.
To stay up-to-date, follow us on Facebook and look out for the next Kafka meetup announcement!