No Silver Bullet: Creating an Experimental DevOps Culture

I wrote this article because, as a New Relic partner, we have many customers who come to us with slow applications, environment issues, and other performance problems to diagnose and fix. Over time I began to notice a pattern: unrealistic expectations about how to solve an increasingly complex problem. They wanted to find “that one thing” that would quickly fix it all. I realized we were being asked for a silver bullet.

When I started out in tech, things were a little simpler

There were code and servers. As long as the servers were up and there wasn’t too much traffic, everything went fine. Applications were smaller and simpler, and end users were fine if a page loaded in 5-10 seconds.

Expectations have changed completely. Latency of more than 3 seconds for a mobile e-commerce page is now a complete fail. Web environments now have hundreds of overlapping applications running on hybrid colos and a proliferation of public cloud services.

We have customers who handle over 600,000 requests a minute! We have others that run global networks in dozens of languages, with multiple releases a day by different teams. As a leading services partner to GitHub, we know a little something about the complexity of development: some customers have over 1,000 developers committing simultaneously.

We have to be willing to think a little differently now

We need to understand interdependence, application baselines, and the relationship between networks and code. This is a modern sports car, not an old carbureted V8. Sometimes when you lift the hood, there is just a single port for plugging in a diagnostic machine. We have to be willing to hook up, read the codes, and then come up with a plan.

When we start an engagement, I am often asked how long it will take to either fix the problem or make a significant improvement. But wait… the check engine light just went on, and you want to know what’s wrong before we have even hooked up the diagnostic machine? You want to know how much faster the car can go before I have even opened the hood?

But wait, it gets a little more complicated

As if this wasn’t complex enough already, everyone now uses a series of SaaS tools for everything from code repos to alerting and automation. In your CI/CD pipeline, there may be 5-10 tools on top of the plethora of cloud services. Add in all the software that marketing, sales, support, and other teams use, and we have a full dog pack to pull the sleigh fast! Each tool excels at its job, but they are all built differently. So there is a new problem: how do you harness all the dogs so they pull together?

The answer is as much art as it is science. It takes feel, patience, and a little wizardry. We see the most success with customers who are willing to experiment and try things rather than expecting one solution to fix everything right away.

For example, in monitoring, implementing New Relic is not just a matter of installing agents and calling it done. It’s about first understanding the stakeholders and what they need to know to be more effective. It’s about understanding workflows and business goals. The big picture is a human problem! To really help with what people are trying to do, you have to gain visibility into not only the technical environment but also the wider business context.

Moving forward with experimentation

The answer is all about experimenting and trying things out. Where do we suspect the problems lie, and why? We want to combine our experience of looking from the code level up with the knowledge of the people on the ground at the company who see issues every day. Then we want to make a plan of attack and be steady and systematic. If something doesn’t work, move on. If it does, ask why, and what else we can improve with this knowledge. It’s painstaking work.
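
One way to keep an experiment like this honest is to compare it against a baseline before deciding whether to keep the change or move on. The sketch below is only an illustration, not a description of any particular tool: the function names, the choice of 95th-percentile latency as the metric, and the 10% improvement threshold are all assumptions made for the example, and the latency samples would come from whatever monitoring you already have in place.

```python
import statistics


def p95(samples_ms):
    """Return the 95th percentile of a list of latency samples (milliseconds)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[-1]


def evaluate_experiment(baseline_ms, experiment_ms, min_improvement=0.10):
    """Compare an experiment against its baseline and suggest a next step.

    min_improvement is a hypothetical threshold: the fraction by which the
    95th-percentile latency must drop before the change is worth keeping.
    """
    base = p95(baseline_ms)
    trial = p95(experiment_ms)
    if base <= 0:
        raise ValueError("baseline latency must be positive")
    improvement = (base - trial) / base
    verdict = "keep" if improvement >= min_improvement else "move on"
    return {
        "baseline_p95": base,
        "experiment_p95": trial,
        "improvement": improvement,
        "verdict": verdict,
    }


if __name__ == "__main__":
    # Made-up samples purely for demonstration.
    before = [820, 910, 1005, 760, 1340, 980, 875, 1120, 940, 1010]
    after = [610, 700, 655, 590, 880, 720, 640, 760, 690, 705]
    print(evaluate_experiment(before, after))
```

The point is less the arithmetic than the habit: write down the baseline first, decide in advance what counts as an improvement, and let the measurement tell you whether to keep going or try something else.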

Nothing is more important than keeping up this momentum: build, measure, learn, monitor, fail, and keep improving. Perhaps most importantly, what can be done to make the life of your developers and DevOps team easier? Bad workflows are often at the root of ongoing business problems.

I do come bearing some good news!

The tools we have now for seeing the problems in our technical world are catching up with the problems our technical life is creating! If you combine applications like New Relic, Jenkins, Chef, PagerDuty, Jira, Slack, etc., you can create some really sleek production and feedback pipelines. There are over 120 services on AWS that can do almost anything.

We work remotely with some of the best technical teams in the world, triaging issues together and solving problems with shared knowledge, and it works. If you are willing to create an environment of experimentation, you can go faster and more safely, and have a better time building great products and user experiences!
