On October 22 we hosted our first India OSDI meetup at the CRED office in Bengaluru. I’m very thankful to the wonderful friends who helped me organize the event: Nancy Chauhan, Aditya Oberai, and Rohit Ghumare. We had 120 people join us for 3 excellent sessions.
Find the recording here: https://www.youtube.com/watch?v=JY85SbrHF1k
In “Delta lake tables - behind the scenes”, Apoorva Aggarwal, Software Engineer 2 at Microsoft, explored the storage format of Delta Lake tables, their transaction log management, and the optimizations they use to process and retrieve big data faster.
Chintan Shah (Data Product Manager), Manas Bhardwaj (Data Engineering team), and Abinasha Karana (Technology Lead) showed the thought and work that went into building the Data Quality Platform at CRED. They covered the requirements of the platform, the challenges encountered, the technologies chosen, and the tradeoffs made to build a high-quality, large-scale, diverse data lake.
CRED also contributes back to Great Expectations, the Apache-licensed open source project their DQ platform is built on, adding support for IAM (Identity and Access Management) roles.
In “Evolving data infrastructure for scaling a data platform”, Rohil Surana (@rohilrs on Twitter), Engineering Manager at Pixxel and formerly on the Data Platform team at Gojek, explained how at Gojek he managed and evolved the underlying infrastructure using infrastructure as code (IaC) and developed a configuration and orchestration system that powered a Data Platform at scale.
He brought the time to provision Kafka down from 1 week to ~4 minutes through that platform: self-serve, with unified permissions and full traceability, without having to spelunk through audit logs. His slides: Infrastructure@Platform (Google Slides).
From Twitter: