Aiven Blog

Dec 16, 2022

Evaluating your event streaming needs, the software architect way

Join David Esposito, Recovering Over-Engineer-er, in exploring streaming solutions – and how The Way of the Software Architect is key to tackling them.

David Esposito

Solution Architect at Aiven

Andrew Hindle

Your friendly Aiven Copywriter

What is a software architect?

If you ask ten people what an event streaming platform is, you’re likely to get ten different answers (not including “I don’t know” and “how did you get in here?”). For the purposes of this blog post, though, an event streaming platform is any infrastructure or technology that event-driven apps run on or integrate with, to provide data updates at short intervals.

This definition covers a whole range of infrastructure options: the tech you can adopt runs from bare-bones DIY to abstract architectures where insights present themselves, as if by magic, from your published events.

We’re not here to make that decision for you. We’re here to help you make the decision yourself, and understand the risks and factors involved. You don’t need all the answers, but you do need to know what questions to ask. And that's a software architect's job in a nutshell.

At their best, a software architect has:

  • Technical domain expertise at app, industry, and infrastructure levels
  • Experience, from failures or successes, in solving problems
  • Unconventional curiosity about how different systems interact, in contrast to the developer’s laser focus on individual cases

In this post, we’re going to learn The Way of the Software Architect, and how it can help you with your event streaming solutions.

The wisdom of the architect

“Wisdom” is just a nice word for experienced failure. Or success! But mostly failure.

A software architect is curious about how systems interact, how tools and systems break each other, and what is impacted when (not if!) they do.

On a technical level:
A software architect looks at how their code impacts other code, as regards performance, resilience and reliability.

On a department level:
A software architect looks at how their code impacts other teams, as regards compatibility, contracts and empowerment.

On a company level:
A software architect looks at how their code impacts other departments and companies, specifically as regards go-to-market impact.

A software architect will ask more questions than they have answers for. And that’s the way it should be.

  • What breaks first if you double your customers overnight?
  • What breaks last if you increase your number of customers tenfold overnight?
  • What sort of end-user experience do you want?
  • When an error happens, what do you do?
  • How much does one minute of downtime cost your company?

To address some of the challenges a software architect faces, you need to learn how to think like an architect. And as you can imagine, we’re going to be asking a lot of questions.

The inevitability of downtime

The cloud is the future, and it’s got a certain technological mystique due to the fact that so many people “know” what it is (without really knowing). But it is still a box, plugged into a wall somewhere. And failure is a question of when, not if.

Downtime will happen. Are you ready?

I can’t believe it’s not downtime

A wise software architect (or software architect thinker) will read all the fine print in their service-level agreements (SLAs), especially when it comes to downtime.

Did you know, for example, that maintenance windows often aren’t counted as downtime at all? And that can make a critical difference.

Read that fine print and keep an eye out for exemptions!

All the nines

Pay special attention to the uptime nines — and the difference between obligated and observed uptime.

When a company offers 99.9% uptime, 99.99% uptime or 99.999% uptime, it doesn’t seem like a huge difference. But those numbers mean the difference between hours and minutes of downtime per year.

Three nines, for example, allows about 9 hours of downtime per year. Five nines gets you down to about 5 minutes a year. That means the provider is essentially contractually agreeing to under thirty seconds of downtime a month, although of course downtime doesn’t happen like that. One or two major outages a year, however, is all it takes to blow through that budget. Five nines is the level of uptime SLA you see in healthcare, regulated industries, and some financial institutions, where lives, or livelihoods, are on the line.
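The arithmetic behind those figures is easy to check for yourself. Here is a minimal Python sketch, assuming a 365-day year and twelve equal months; the function name `downtime_budget_seconds` is our own, not part of any library.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_budget_seconds(uptime_percent: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the maximum downtime, in seconds, that an uptime percentage allows."""
    return period_seconds * (1 - uptime_percent / 100)


for sla in (99.9, 99.99, 99.999):
    per_year = downtime_budget_seconds(sla)
    per_month = per_year / 12  # simple average month, not a calendar month
    print(f"{sla}% uptime: {per_year / 3600:.2f} hours/year, "
          f"{per_month / 60:.2f} minutes/month")
```

Three nines works out to roughly 8.76 hours a year, four nines to just under an hour, and five nines to a little over five minutes.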

Now, is that the same as the observed uptime (the actual time your systems spend up and running, and not down for various reasons)? No. No it isn’t. First, just for example, there’s the maintenance exemption. Second, some companies will cover their uptime shortfall by handing out credits when they don’t meet their SLA. And that’s not great: a credit doesn’t undo the outage.
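To see how far apart the two numbers can drift, here is a small Python sketch. Everything in it is an assumption for illustration: the `Outage` class, the 30-day month, and the outage durations are invented, and whether maintenance is actually exempt depends on your contract.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    in_maintenance_window: bool  # hypothetical flag: was this announced maintenance?


def uptime_percent(outages, period_minutes, exempt_maintenance):
    """Uptime over a period, optionally ignoring outages inside maintenance windows."""
    counted = sum(
        o.minutes
        for o in outages
        if not (exempt_maintenance and o.in_maintenance_window)
    )
    return 100 * (1 - counted / period_minutes)


MINUTES_IN_30_DAYS = 30 * 24 * 60

# Made-up month: one 4-hour maintenance window, one 20-minute unplanned outage.
outages = [Outage(240, True), Outage(20, False)]

sla_uptime = uptime_percent(outages, MINUTES_IN_30_DAYS, exempt_maintenance=True)
observed_uptime = uptime_percent(outages, MINUTES_IN_30_DAYS, exempt_maintenance=False)

print(f"Uptime as the SLA counts it: {sla_uptime:.3f}%")
print(f"Uptime as you observed it:   {observed_uptime:.3f}%")
```

In this invented month the provider still clears three nines on paper, while the uptime you actually lived through is closer to 99.4%.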

When disaster strikes

It’s important to have a disaster recovery plan. You do have a disaster recovery plan? That’s great! But if you haven’t run a practice disaster scenario and used that plan, then you don’t actually have a disaster recovery plan. What you have there is a plan to address what you think will probably happen during an outage.

What’s going to happen when your system experiences a failure? Is your infrastructure at risk of data loss? Are you okay with data loss? Understanding the different potential failure cases and what you’re going to do about them is critical.

Questions of team impact

Of course, through all these concerns of infrastructure stability and keeping the databases running, there’s a key question everyone is asking:

How do you earn money?

If you’re running Apache Kafka or your databases yourself, are you earning money by just being really good at running infrastructure, or do you pay the bills by providing your customers with the data and value they want?

Where and how you invest in resources is going to be important to your bottom line.

Whose headache is this?

If you’re going it alone, you’re responsible for ops, reliability, app development … you need a huge, broad foundation of expertise. It changes how you hire talent, and how you plan and build your entire team structure.

When you’re working with a vendor or building on a platform, there’s less immediate pressure — but it also puts more responsibility on you to understand what’s expected of them, and what’s expected of you.

Make sure you have the right expertise. If you don’t have it on staff directly, do you have the right partners or support plan? Always have someone thinking about these:

  • Metrics and monitoring
  • Scalability
  • Cost reduction
  • Application architecture
  • Query optimization

The cost of lost opportunity

If you’re responsible for your own ops, you’re responsible for the reliability of your service. You have the final responsibility for juggling the need for uptime against spikes in your traffic or data-load; against unforeseen alerts; against patching your servers and running maintenance. That’s all on you, and you need to know it all.

You can’t deliver revenue-driving features if you’re too busy doing the work that’s not directly earning your company money. Don’t sacrifice developer cycles to ops tasks if you don’t have to. Wherever the ROI makes sense, consider offloading some of that responsibility.

Other perspectives that impact team efficiency

  • Alice and Bob vs. Stack Overflow: Out of all the errors and bugs you encounter, how many of them can be solved by going online to Stack Overflow to dig for a solution? And how many of them have to be solved by Alice and Bob, who built that thing years ago and are the only ones who know anything about it? Those answers affect your approach to onboarding new developers, and how quickly they can ramp up and start delivering value.

  • Build vs. Buy: What kind of investment do you need to make to get this tech working with your current setup? Do you go on doing it yourself, working with the infrastructure you have, or do you start from scratch and get it scalable and stable from the ground up?

  • gh(T): You may have heard of the GitHub of T function. It’s all about searching GitHub for a given tech (Apache Kafka, Postgres, whatever) and identifying how many repositories are related to that tech, and how many devs are committed to it. This gives you an ecosystem score for that tech. If you understand that score and its importance, you’ll see whether you can hire experts in that tech directly, or whether you need to hire and train.

Building on the shoulders of giants: integrations, tooling, and open source

We’ve looked at:

  • How apps are impacted by your decisions
  • How teams are impacted by your decisions

Now we’ll zoom out a bit more, and think about how entire tech departments (engineering, operations, DevOps) are impacted by your decisions and your approach to solution building.

And when it comes to building solutions, building on the shoulders of giants is key. What’s ready and waiting, already provided by the ecosystem? You can take advantage of that established foundation, and use it to earn and save (yes, it comes back to money — hey, they don’t call it the bottom line for nothing).

Proprietary vs. Open Source

A wise software architect knows how sticky some proprietary software and tooling licenses are. If you want to explore the benefits of another cloud or another tech offering, it can be challenging. You need to make sure you have a backup plan.

Observability

Build observability into your architecture from the start. Observability is a must. If an app team or engineering department makes a decision without consulting other teams — DevOps or security, say — it creates blind spots in other teams and processes. And when your teams are working blind, problems arise.

Security and governance

Streaming data – data collected in real time and dashboards that reflect your up-to-the-second business reality – is more valuable than gold in today’s tech environment. And ensuring that your customers can be confident in your handling of their personal information, or their data usage, is critical.

Don’t overlook the importance of quality control. Testing is an important part of a functional infrastructure. Remember how we said that without a practice disaster, your disaster recovery plan is basically just a fairy story? Well, without testing, your system stability is a joke. There, we said it.

Total Cost of Ownership

And now we’ll zoom all the way out:

  • Your company and its overall scalability
  • Your company’s place in its broader industry
  • The ways you can push the limits of that industry

The most expensive part of your infrastructure is between the keyboard and the chair. Enable your sprint teams to be fully effective and deliver value. You’re paying for them — like anything else you’re paying for, be smart and get your money’s worth!

Have a line item for everything. Networking, tooling, you name it. If a server comes with a price tag, but you don’t know your company’s and customers’ usage patterns, you won’t know whether that server is worth the cost.

Understand the tooling. Know whether you have enough expertise at your fingertips. Know what investments to make. And don’t be afraid to ask your big providers for some free credits to experiment with!

Billing patterns

Not to name names, but there’s a bit of a trend in startups and how they scale for growth.

With the annual commit model, you pay for access to a given functionality. It can scale all right, but it’s essentially a licensing fee.

In recent years, it’s become more profitable to take away some of those costs, moving to a usage-based provider instead of a license. On an application ops level, you pay according to the committed or duplicated actions you take. This works fine at the start, but the bill scales very fast, and often unexpectedly.

The solution is reserved infrastructure. You are responsible for capacity planning and understanding your workloads. Purchase plans that meet those needs. That makes your costs far more predictable.
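A quick way to get a feel for the trade-off is to find the point where a usage-based bill overtakes a flat reserved fee. The Python sketch below is illustrative only: the prices, the per-million-events billing unit, and the doubling growth rate are all invented assumptions, not any provider’s actual pricing.

```python
def usage_based_cost(events_per_month: int, price_per_million: float) -> float:
    """Monthly bill when you pay per million events processed."""
    return events_per_month / 1_000_000 * price_per_million


def break_even_events(reserved_monthly_fee: float, price_per_million: float) -> float:
    """Monthly event volume above which the flat reserved fee is cheaper."""
    return reserved_monthly_fee / price_per_million * 1_000_000


RESERVED_MONTHLY_FEE = 500.0  # hypothetical flat fee for reserved capacity
PRICE_PER_MILLION = 0.25      # hypothetical usage-based price per million events

events = 100_000_000  # hypothetical starting volume
for month in range(1, 9):
    usage = usage_based_cost(events, PRICE_PER_MILLION)
    cheaper = "usage-based" if usage < RESERVED_MONTHLY_FEE else "reserved"
    print(f"Month {month}: {events:>14,} events, "
          f"usage-based bill {usage:>9,.2f}, cheaper option: {cheaper}")
    events *= 2  # deliberately aggressive growth: traffic doubles every month

print(f"Break-even volume: {break_even_events(RESERVED_MONTHLY_FEE, PRICE_PER_MILLION):,.0f} events/month")
```

With these made-up numbers, the usage-based plan is a bargain at the start and overtakes the reserved fee within a few months of doubling traffic. The point is not the specific figures but knowing your own break-even volume before you sign.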

So many questions

We can’t say what’s going to work for your company. Only you can say that. But hopefully we’ve given you, through the lens of the software architect, a glimpse of the questions you need to ask.

  • How often should I re-evaluate my choice of event streaming platforms?
  • When do proprietary software / solutions make sense vs. open source?
  • How much can I outsource? How much should I outsource?
  • When does self-management make sense?

This is the way
