Site Reliability Engineering fuses the domains of software development and operations.
SRE engineers craft solutions to implement the DevOps principles just as a class implements an interface.
But the story is much more than just a simple implementation of concepts that are already relatively abstract and not too well defined; we all know that DevOps is more about cultural change.
I believe that SRE engineers play a vital role within a real cloud-native business. Their unique position enables them to provide insights that no one else within an organisation can.
The magic of the SRE grows as you understand that they are the only people with access to the information that validates value delivery: a combination of business, application and environment metrics.
They are in a position to provide the full picture, one that is vital to the sustainability of any business across the entire lifecycle of a product.
The truth is that, using statistics, your SRE engineers can provide insights that you have never dreamt of.
When looking at the DORA (DevOps Research and Assessment) metrics, we can still see a clear divide between dev and ops.
Deployment frequency and lead time for changes are what the ops engineer usually would not care for, am I right?
Wouldn't they rather focus on the change failure rate or, even better, the time to restore service?
The answer is that both groups of metrics are crucial, alongside the business metrics.
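
To make the four DORA metrics concrete, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record and its field names are assumptions made for illustration; in practice this data would come from your CI/CD and incident tooling, and the exact definitions (for example, what counts as a failed change) vary between teams.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt it to whatever your pipeline exports.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarise the four DORA metrics for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment in the period")
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # The two metrics usually associated with "dev"
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        # The two metrics usually associated with "ops"
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians are used here rather than means so that a single unusually slow deployment or outage does not dominate the summary; that is a design choice, not a requirement of the metrics.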
If we look at the whole lifecycle of a system, we generally observe:
- deployment metrics
- environment metrics
- application metrics
We need to define a set of business metrics that relate to the system’s performance as a whole, with a particular focus on the business value.
Let’s walk through an example. You, a product owner, have implemented a feature with your team. The SRE engineers within your organisation have embedded full observability across all the necessary metrics. You can see that your change has been implemented and deployed quickly. Thankfully, no breaking changes occurred, but the question remains: how do you validate that you are delivering business value?
Without collaborating with sales and other parties involved in the value stream, there is no way for you to validate this unless…you involve the SRE wizards.
Continuing with the previous example: you know that the feature you implemented is meant to accomplish the task of “increasing users that complete purchase of all items in the basket”.
Based on the application’s logging (assuming it was appropriate), you can extract metrics that show the requests each user performed across the system throughout the journey.
These metrics can then be analysed to show the user’s progression through the lifecycle of the purchase and either prove or disprove the impact the change has made.
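
As a rough illustration of what "prove or disprove" can look like, the sketch below counts basket completions from parsed log events and compares the completion rate before and after the change with a two-proportion z-test. Everything specific here is an assumption: the event names `basket_viewed` and `purchase_completed`, the `(session_id, event_name)` log shape, and the choice of a z-test, which is only one of several reasonable ways to compare two rates.

```python
import math
from collections import defaultdict


def completion_counts(events):
    """Return (completed, started) session counts for the purchase journey.

    `events` is an iterable of (session_id, event_name) pairs parsed from logs.
    The event names used below are hypothetical.
    """
    seen = defaultdict(set)
    for session_id, event_name in events:
        seen[session_id].add(event_name)
    # Only sessions that reached the basket count as having started the journey.
    started = [names for names in seen.values() if "basket_viewed" in names]
    completed = sum(1 for names in started if "purchase_completed" in names)
    return completed, len(started)


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two proportions (A vs B)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Usage with made-up numbers: 400 of 1000 sessions completed before the
# change, 460 of 1000 after it.
z, p_value = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the change in completion rate is unlikely to be noise, but it does not rule out other causes such as seasonality or a marketing campaign running at the same time; a controlled A/B split is a stronger design than a before/after comparison.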
It is evident that insight like this profoundly affects how the SRE is seen by the business and the feature implementors.
They effectively hold the key to a treasure box of game-changing insights about internal operations and user behaviour that businesses can utilise.
And this is why your SRE engineers need to be statistics wizards: they need both the door and the keys to enable long-term sustainability using data-driven insights.