When Everactive built their first data pipeline in 2014, they started with Apache NiFi and OpenTSDB. They soon outgrew this setup. “We weren’t able to automate anything in that system, and management was a pain,” says Rob. “Then we ran into performance issues, and upgrading to a cluster wasn’t feasible.” If they couldn’t manage a single server, how would they ever manage a whole cluster?
Next, they tried an Apache Pulsar cluster, but it had essentially the same problems. “In theory, it was inexpensive to run our own cluster, but we couldn’t both run it and do our actual jobs,” Rob explains. “And we couldn’t find anybody we could pay to run it for us.”
Meanwhile, performance issues kept accumulating. Sensor installations were slow: each sensor’s signals had to pass through a bottlenecked system, so installing a single sensor could take up to 5 minutes.
By this point, the concept was fully commercialized and business was taking off. They needed a solution. Now, at least, they had a much clearer idea of what they were looking for: a system that would be…
- … able to ingest huge amounts of data and pass it into a time-series database for processing.
- … easily scalable.
- … fault tolerant.
- … managed for Everactive by experts.
Fortunately, there was Apache Kafka®, a widely used solution available as a managed service.