Incident Management: The Ministry of Truth

By Thomas Coyle | October 2008

I have spent 21 years in the information technology trenches, having worked my way up through the ranks. At one time or another, I have done just about everything, including front-line phone support, server-slinging, data center design, network security, web development and training. I’ve been a grunt, a senior manager, and everything in between, at mom-and-pop companies, venture capital-funded startups and Fortune 50 conglomerates. What I do now is the most exciting, interesting, well-compensated, and, frankly, entertaining job I’ve ever held: I manage the incident management team at a large financial firm.

For those unfamiliar with the ITIL framework, the goal of the incident management process is service restoration and the task of the incident manager is to drive this restoration. Put another way, while an incident manager doesn’t push the buttons to fix problems, he or she is responsible for establishing the facts, determining which teams and individuals are most needed to restore service as quickly as possible, organizing and driving those teams, documenting everything, and communicating to all appropriate parties the status of major incidents. In short: when something breaks, everyone in IT – from top to bottom – works for the incident manager until service is restored.

Effective incident managers have a reputation at all levels of the organization as the people who simply "get it done," no matter what the problem. In each of my two positions running incident management for multibillion-dollar organizations – one a healthcare provider, the other in finance – my incident managers and I have been the primary, direct conduits of incident information for all C-level executives. Not a day goes by that I don’t spend some time talking with the CEO, president or CIO, discussing the latest outage or reviewing upcoming projects. I have experienced no single position in IT with higher visibility across an organization, greater pressure to perform, or more potential for direct impact on the success or failure of the company.

The typical path to incident management is through the helpdesk as there are some commonalities in those two functions – a lot of time on the phone and in email, constant communication with all levels of the organization and extensive documentation. The major differences between these two duties are scope and severity: the helpdesk generally involves the resolution of tactical incidents affecting small numbers of users, while a dedicated incident management team generally deals with strategic incidents that directly impact the bottom line of the organization – the truly "ugly" failures.

The major differences between the requirements for these two duties are time, experience and polished communication skills. An effective incident manager will have built a long resume that demonstrates expertise in as many areas of IT operations as possible, in order to communicate effectively with the often cross-functional technical teams working to resolve an incident. That person has polished his or her communication skills to the point of complete consistency and comfort when dealing with anyone in the organization, from end users, to front-line support, to back-end engineers, to executive leadership. The additional talents of "keeping cool under fire" and political savvy are absolutely essential for success.

Also, while not officially an incident management duty, an interesting side effect of the incident manager’s necessarily extensive technical background and knowledge of cross-functional impact in a given organization is that incident managers (IMs) will often be called upon to review upcoming major projects holistically, outside the realm of the normal project management framework. We essentially act as consultants directly to the C-level, functioning as the devil’s advocate. Surprisingly, many organizations – even very large ones – lack an enterprise architect role to oversee cross-functional projects in this manner. My IM teams frequently have been used to fill this gap, and working part-time on these projects can be an excellent resume-builder, a step towards an official enterprise architect role.

So if you enjoy exposure, pressure and a new experience every day; have a great resume demonstrating deep and broad experience; live to solve problems and want to be known as the "go-to" guy or gal for IT in your company; want not only to know the truth about all major incidents, but actually to be the definitive source of that truth to the most important people in the company; and are looking to be part of a relatively new, powerful IT discipline, incident management may be right for you. Get the ITIL Foundation certificate and think big!