Chicago OSDI Meetup Recap - Fun & Learning Partnered with Discover

What happened at the Open Source Data Infrastructure Chicago meetup? :face_with_monocle:

On August 9, the folks at Discover hosted the Chicago Open Source Data Infrastructure meetup - the first in Chicago of the meetups we’re organizing globally - co-located with the Devopsdays Chicago conference, which a few colleagues and I were in town for.

It was a fun evening, with talks by my colleague Dewan Ahmed and by Ehfaj Khan, Expert Application Engineer at Discover Financial Services (DFS).

Ehfaj compared batch processing, a technique for processing large amounts of data offline, with event (or: real-time) processing.

A key challenge with batch processing is that while the system can identify data sets that are not fit for processing (yet) or are even corrupted, sending them back to the source system for correction delays business processes. Unstructured data (along with data inconsistency and data integrity issues) and batch processing are no friends. And debugging can be painful and time-consuming when you’re processing large amounts of data.

Data is becoming the backbone and nervous system of any organization, and organizations are pushing to modernize their data infrastructure at an increasing rate. With speed a key element of business success, that’s where real-time data comes in!

Event processing provides you with a live feedback loop, so that you can act on your data as it comes in. Real-time visibility into your business processes and faster responses to your target customers make for a better customer experience. Event processing also significantly enhances an organization’s visibility into its operations, enabling it to make faster and better decisions.
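To make the contrast concrete, here’s a toy Python sketch (my own illustration, not from the talk): the batch version must reject the whole batch when a record fails validation, while the event version handles each record as it arrives and routes bad ones aside immediately.

```python
def validate(record):
    # Toy validation rule: a record must carry an "amount" field.
    return "amount" in record

def process_batch(records):
    # Batch style: collect everything first, process offline.
    # One bad record forces the whole batch back to the source system.
    bad = [r for r in records if not validate(r)]
    if bad:
        raise ValueError(f"{len(bad)} records rejected; resubmit the batch")
    return sum(r["amount"] for r in records)

def process_stream(events, on_error):
    # Event style: act on each record as it comes in.
    # Bad records are diverted (e.g. to a dead-letter queue) without
    # blocking the rest of the flow - the feedback loop stays live.
    total = 0
    for event in events:
        if not validate(event):
            on_error(event)
            continue
        total += event["amount"]
    return total
```

With one malformed record in the input, `process_batch` raises and delays everything, while `process_stream` keeps going and simply reports the bad event.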

But whether event processing is the answer for your business depends, of course. As with any new technology, tool, or framework, successful adoption depends on your use case and the business problem you’re trying to solve.

Dewan talked about Infrastructure-as-Code (IaC) for data. Or, as he put it: “we build apps to move data” - so of course IaC applies to data infrastructure as well.

Your streaming platform, relational database, NoSQL database, networking, monitoring and security need the same “-ilities” that the rest of your system needs:

  • Reproducibility
  • Repeatability
  • Disposability (pets vs cattle)
  • Consistency
  • Ability to incorporate design changes
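As a small illustration of what IaC for data infrastructure can look like (my own sketch, not from Dewan’s talk, using the community Mongey/kafka Terraform provider with an illustrative broker address), a Kafka topic can be declared as code and gain exactly those “-ilities”:

```hcl
terraform {
  required_providers {
    kafka = {
      source = "Mongey/kafka"
    }
  }
}

provider "kafka" {
  bootstrap_servers = ["localhost:9092"] # illustrative address
}

# Declaring the topic as code makes it reproducible and reviewable,
# and disposable: destroy it and re-create it from the same definition.
resource "kafka_topic" "orders" {
  name               = "orders"
  partitions         = 3
  replication_factor = 1

  config = {
    "retention.ms" = "86400000" # keep events for one day
  }
}
```

A design change (say, more partitions) then becomes a reviewed code change and a `terraform apply`, rather than a one-off manual tweak.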

Here too, it’s important to first investigate internally whether IaC is right for you and whether you’re set up to create a pilot project, so you can ultimately (maybe) roll it out wherever it makes sense.

To showcase how orchestration with Terraform would work, Dewan shared a demo in which he used Apache Kafka MirrorMaker to replicate data across data centers, from a (Kafka) source cluster to a target cluster.

Data center 1 (Apache Kafka source, `topic-a`) → AKMM replication flow → Data center 2 (Apache Kafka target, `topic-b` plus the replicated `dc1.topic-a`)
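For context, that replication flow maps onto a minimal MirrorMaker 2 configuration like the sketch below (hostnames are illustrative, and this is not Dewan’s actual demo config). With MirrorMaker 2’s default replication policy, `topic-a` from `dc1` shows up on the target cluster as `dc1.topic-a`:

```properties
# mm2.properties - minimal sketch
clusters = dc1, dc2
dc1.bootstrap.servers = kafka-dc1:9092
dc2.bootstrap.servers = kafka-dc2:9092

# enable the dc1 -> dc2 replication flow for topic-a
dc1->dc2.enabled = true
dc1->dc2.topics = topic-a

replication.factor = 1
```

You’d run this with Kafka’s bundled `connect-mirror-maker.sh` script, pointing it at the properties file.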

You can check out his demo here.

If you’re in the Chicago area, make sure to join the Chicago OSDI Meetup Group :tada:
