I have an interest in bringing ideas from outside the tech industry into it and seeing how they fit. After working with Kerim Satirli (@ksatirli) on my SysAdvent post about multiple root causes, he was kind enough to send me the book “Incident Management for Operations”. The book focuses on applying the Incident Management System, pioneered in emergency services for fighting wildfires, to managing outages in tech.
“Incident Management for Operations” was authored by Rob Schnepp, Ron Vidal, and Chris Hawley of Blackrock 3 Partners. You can find the book on Amazon or Safari Books Online.
In A Nutshell
The authors have adapted the Incident Management System (IMS) for use in IT operations. IMS is a standardized, scalable method for incident response that facilitates coordination between responders. This translates nicely to organizations where separate departments or teams are responsible for different pieces of a business’s IT infrastructure, and where multiple disciplines are required for incident resolution.
The book lays out the framework for IMS and includes examples of applying the framework to IT. Since implementation can vary in practice (alignment with DevOps, ITIL, etc.), the book stops short of prescribing how to set up your organization, but it gives enough information to determine how your organization could adapt to IMS.
The authors provide a number of mnemonics, such as “CAN” (Conditions, Actions, Needs), “STAR” (Size up, Triage, Act, Review), and “TIME” (Tone, Interaction, Management, Engagement), to aid in implementing IMS and leading effectively as an Incident Commander. If your organization implements IMS, I’d suggest making a quick reference card with these mnemonics to keep in your ID badge holder in case you forget them during a 3 a.m. incident.