Root Cause is Plural
Below is a copy of my post from Sysadvent 2017 (Day 3). I’d like to thank Kerim Satirli (@ksatirli) once again for his help in editing the post and improving it.
Post-mortems are an industry-standard process that takes place after incidents and outages as a method of continuous learning and improvement. While the exact format varies from company to company, a post-mortem report typically addresses the Five W’s:
What happened?
Where did it happen?
Who was impacted by the incident?
When did the problem and resolution events occur?
Why did the incident occur?
The first four questions are generally easy to answer. The question that takes the majority of the time is the why. Determining why the incident occurred requires investigative skills, critical thinking, and logical deduction. Sometimes determining the true why takes multiple incidents, as various fixes are attempted before the incident is resolved, but eventually a “root cause” is designated as the root of all the problems and the report is complete.
But if your “root cause” amounts to a single failure, you have stopped your process too soon.
Take That Vacation: Eliminate Alerts Dragging You Back to the Office
I authored this as part of SysAdvent, which publishes one system administration-related article each day in December, ending on the 25th. You can find the original post here: http://sysadvent.blogspot.com/2016/12/day-15-take-that-vacation-eliminate.html
–
It’s mid-afternoon and you just sat down for that holiday meal with your family and friends. Your phone goes off and you look at the number. Work, again.
Before you even read the text or answer the call with the robotic voice telling you about the latest problem, you’re wondering to yourself, “How long will it take?” Your relatives are only in town for another day or two before you have to take them to the airport. What if it goes off again later? A holiday potentially ruined.
You read the text. Maybe it’s a false alarm. Maybe it’s not. Either way you’re out of the moment, worrying about work and whether things are going to break over the holidays.
Don’t Be Your Own Grinch
It’s possible to engineer yourself and your environment for success.