There's no right way to do SRE, but there is a wrong way. Here at Aiven we're getting SRE right, because maintaining a stable environment is crucial for an infrastructure-as-a-service provider. That's why we wanted to spread the goodness and help any straying friends find the true path again.
Here are some questions you can ask yourself and your teams to find out if your SRE game is on point.
1. Are your SRE efforts in a silo?
Good SRE comes from a working cooperation between developers and operations. If ops keeps dealing with the same issues again and again, clear the lines of communication between the teams and find solutions at the source.
2. Are you measuring SRE success in uptime?
Measuring SRE success is a step in the right direction. However, don't just stare at uptime. Yes, it's what you're promising your customers, but it's not the whole story, and to reach those last two nines, you need more.
In addition to uptime, make sure to track availability and resilience. After all, your environment may be up, but unreachable - what use is that for your customers? Then when things go haywire and the environment crashes, good resilience is what enables it to get back up again quickly.
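To make the difference concrete, here's a minimal Python sketch. It assumes you have periodic health probes that record both whether the service process is running and whether an outside check could actually reach it, plus a list of past outage durations. The `Probe` record, its field names, the `summarize` helper and the sample numbers are all made up for illustration; using mean outage length as a stand-in for resilience is also just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    process_up: bool  # the service reports itself as running
    reachable: bool   # an external check got a successful response


def ratio(flags):
    """Share of True values in an iterable of booleans (0.0 if empty)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(probes, outage_minutes):
    """Return uptime, availability and mean recovery time side by side."""
    uptime = ratio(p.process_up for p in probes)
    # Available means up AND reachable from the customer's side.
    availability = ratio(p.process_up and p.reachable for p in probes)
    # Mean time to recovery as a rough proxy for resilience.
    mttr = sum(outage_minutes) / len(outage_minutes) if outage_minutes else 0.0
    return {"uptime": uptime, "availability": availability, "mttr_minutes": mttr}


# Hypothetical sample: 8 healthy probes, 1 up-but-unreachable, 1 down.
probes = [Probe(True, True)] * 8 + [Probe(True, False)] + [Probe(False, False)]
summary = summarize(probes, outage_minutes=[4, 10])
print(summary)
```

With this sample, uptime comes out higher than availability, because one probe found the service running but unreachable. That gap is exactly what a pure uptime number hides.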
3. Is your SRE all about reacting to emergencies?
If you type "how to do SRE" into a search engine, it will tell you that SRE should not be reactive but PROactive. That's how much of a no-brainer it is. The reason your search engine knows about it, though, is that in many companies, SRE is still reactive.
Your approach needs to be all about continual improvement. When a thing breaks once, make sure it doesn't break again.
SRE is not an ambulance service or even a hospital. It's more like the low-key, everyday health care service that your occupational nurse provides.
4. Does your SRE take a lot of resources and effort?
SRE won't run on empty, but it also doesn't need to break the bank. Invest in automation and efficiency, and watch the ROI mount up.
Repetitive tasks and monitoring are things automation excels at, so be sure to use that to your advantage. Set up alerts; set up automatic processes for known vulnerabilities (unless of course you can just fix them!).
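As a rough sketch of what "automatic processes for known issues, alerts for everything else" could look like, here's a tiny Python dispatcher. Everything in it is hypothetical: the event shape, the `runbooks` mapping, the `handle_event` function and the host names are assumptions made for the example, and a real setup would plug into whatever monitoring and paging tools you already use.

```python
def handle_event(event, runbooks, alert):
    """Run the automated fix for a known issue, otherwise alert a human."""
    fix = runbooks.get(event["kind"])
    if fix is None:
        # Unknown problem: no automation exists yet, so tell someone.
        alert(f"no runbook for {event['kind']} on {event['host']}")
        return "alerted"
    fix(event)
    return "remediated"


# Hypothetical stand-ins for a cleanup job and a paging system.
cleaned_hosts = []
alerts = []
runbooks = {"disk_full": lambda event: cleaned_hosts.append(event["host"])}

results = [
    handle_event({"kind": "disk_full", "host": "db-1"}, runbooks, alerts.append),
    handle_event({"kind": "cert_expiring", "host": "web-2"}, runbooks, alerts.append),
]
print(results, cleaned_hosts, alerts)
```

The idea is that every alert that keeps coming back is a candidate for a new entry in `runbooks`, or better yet, for a fix at the source so the event never fires again.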
If you want to run a stable environment, treat your SREs well, have a proper roadmap, enable cross-function communication and automate everything.
To get the latest news about Aiven and our services, plus a bit of extra around all things open source, subscribe to our monthly newsletter! Daily news about Aiven is available on our LinkedIn and Twitter feeds.
If you just want to find out about our service updates, follow our changelog.
Are you still looking for a managed data platform? Sign up for a free trial at https://console.aiven.io/signup!