Tech Ops As A Metaphor For Building, Running, & Leading A Company

I am giving a short talk at the Velocity Conference this morning in NYC. The title of this post is the title of the talk. That's because I am putting the finishing touches on the talk this morning. Since I woke up late (6am), I don't really have time to both finish the talk and do a post in the next hour. So, I'll do both at the same time and in the process publish the talk here so that anyone who wants to can get the gist of it.


I am an engineer. And so are most of the folks who work in tech ops. The engineer mindset is to build stuff, scale stuff, and make it work reliably and consistently.

When I was at MIT, I majored in Mechanical Engineering. I loved computers and software but my dad was a mechanical engineer and he encouraged me to study it. So I did.

The mechanical engineering course that really sucked me in was “Systems Engineering” where we learned about large-scale systems, how to think about them, build them, and operate them. We learned about the interdependencies between the components in a system and how those interdependencies change as the system grows and scales.

Tech Ops is Systems Engineering as applied to large-scale computer systems. All of the foundational systems engineering principles are well understood, practiced, and perfected in a good tech ops team.

Management is also Systems Engineering as applied to a large and growing company.

What I want to talk about today are the similarities between Tech Ops and Management. Some of you, maybe many of you, will someday find yourself starting or leading a company. And I think the work you do on computer systems can be a metaphor for the work you will find yourself doing on people systems.

So here are eight rules for managing both computer systems and people systems. Each rule is stated first in tech ops terms, followed by the language we use to talk about the same concept in the world of people systems.

1) things that work well at small scale break at large scale – you need different people, processes, and systems as a company grows

2) you need to instrument your system so you can see when things are reaching the breaking point, well before they break – you need to implement employee feedback systems, ideally real-time systems, so you can measure how a team is functioning over time

3) there is always one problematic component in a system that causes the majority of the scaling problems and must be rewritten – team members, particularly super talented ones, who cause friction and pain in the organization need to be transitioned out, no matter what the cost

4) there is no silver bullet to scaling systems – there is no such thing as a “world class CEO” who will solve all of a company’s management problems

5) loose coupling of components is critical; you can’t have one component fail and take down the entire system – build resiliency into your organization, processes, and systems

6) blameless post-mortems are the key to learning from a tech ops crisis – fear-driven organizations do not scale

7) over-reacting to a crisis is likely to make it worse – calm in the face of adversity is one of the signature traits of great organizational leaders

8) overbuilt systems are hard to implement, manage, and scale – build the organization you need when you need it, not well in advance of when you need it
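
For the engineers in the room, rule 2 is probably the easiest of these to show in code. Here is a minimal sketch in Python of what "see it before it breaks" can look like. The class name, the 80% warning level, and the window size are all my own illustrative assumptions, not anything from a real monitoring tool. The idea is simply to keep a rolling average of some load metric and raise a flag when it gets close to capacity, not when it has already crossed it.

```python
from collections import deque


class CapacityMonitor:
    """Hypothetical example: rolling average of a load metric with an early warning."""

    def __init__(self, capacity, warn_fraction=0.8, window=5):
        self.capacity = capacity
        # Assumed policy: warn once the average reaches a fraction of capacity.
        self.warn_level = capacity * warn_fraction
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def record(self, value):
        self.samples.append(value)

    def average(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def status(self):
        avg = self.average()
        if avg >= self.capacity:
            return "broken"
        if avg >= self.warn_level:
            return "warning"
        return "ok"


monitor = CapacityMonitor(capacity=100, warn_fraction=0.8, window=3)
for load in (50, 60, 70):
    monitor.record(load)
print(monitor.status())  # average is 60, below the warning level
```

The people-system version of this is the same shape: pick a signal (say, a regular team pulse survey), watch the trend instead of a single reading, and act when it crosses the warning level instead of waiting for the resignation letters.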

I could go on and on. There will be a bunch of great talks today about systems engineering concepts that you can and should implement in your organization and systems. But as you listen to the talks, I would challenge you to think about what the analog to each principle is in managing your organization – your people system. Because it turns out that is the most important system you will manage in your career.


I would like to thank Albert, Jerry, Chad, John, and Kellan for their advice and suggestions on this talk/post.