The Hidden Costs of On-Call: False Alarms
The video of my LISA17 talk is posted on YouTube.
Abstract:
On-call teams, postmortems, and the costs of downtime are well-covered topics in DevOps. What isn’t talked about is the cost of false alarms in your alerting. This noise hinders the team’s ability to handle true issues effectively. What are these hidden costs, and how do you eliminate false alarms?
While you’re at LISA17, how many monitoring emails do you expect to receive? 50? 100? How many of those need someone’s intervention? Odds are that most of those emails won’t send you off into a corner with your laptop to fix something critical.
Noisy monitoring system defaults and untuned alerts barrage us with information we don’t need. Those false alerts have a cost, even if it’s not directly attributable to payroll. We’ll walk through some of these costs, their dollar impact on companies, and strategies to reduce false alarms.
Talk slides:
If you would like to read more about monitoring and on-call, you may enjoy these posts:
- Reducing the Stresses of On-Call
- Overcoming Monitoring Alarm Flood
- Take That Vacation: Eliminate Alerts Dragging You Back to the Office (from SysAdvent ’16)
- Using Fault-Tree Analysis To Reduce Failures in Software
Citations:
- Multitasking: Switching Costs. American Psychological Association
- STRESS…At Work. National Institute for Occupational Safety and Health (NIOSH). Publication 99-101
- Stress and medical malpractice: organizational risk assessment and intervention. Journal of Applied Psychology 73(4):727-735
- The Cost of Poor Sleep: Workplace Productivity Loss and Associated Costs. Journal of Occupational & Environmental Medicine 52(1):91-98, January 2010. doi: 10.1097/JOM.0b013e3181c78c30
- 10 things to know about sleep as the clocks go back. BBC
- State of the American Workplace 2017. Gallup
- 2014 Workplace Flexibility - Overview of Flexible Work Arrangements. Society for Human Resource Management
- How a Flex-Time Program at MIT Improved Productivity, Resilience, and Trust. Harvard Business Review
- Patient alarms often unheard, unheeded. The Boston Globe
- XKCD 1205: Automation