and melted ice cream are equal in terms of severity: both are major incidents.
How do you handle major incidents?
When you were faced with a major incident, how was it handled? Reflecting on this can tell you a lot about how you’re currently prioritizing major incidents, and highlight ways you should consider changing your
current operations.
Because we tend to classify too few incidents as major, we also tend not to respond to them in the most effective way. Situations like not knowing there’s a bug in an app until a customer reports it, or having to personally call on-call responders, don’t seem like they could have much impact. But if the bug in that app is an error between the database and the payment page, the business has suddenly lost a number of potential sales. And if no one reports that bug for two weeks, the financial loss adds up quickly.
Many major incidents rely on manual intervention to be acknowledged, responded to, and resolved. But manual intervention can be slow, and there’s truth in the saying “time is money.” Unfortunately, the answer to this problem isn’t training your employees to be more responsive or work faster. The answer is for your employees to work smarter, with the support of automated processes.
Upgrading Your Major Incident Response Practices
Automating certain processes in your daily operations can significantly improve your major incident response practice and, in turn, your business operations as a whole.
Consider our melted ice cream example. Instead of waiting for a store clerk to find that a freezer door was left open and hundreds of pints of ice cream are oozing across the floor, an integrated workflow between the software controlling the freezer temperatures and an incident management tool would monitor and log temperature data. When a recorded temperature rises above the preset limit, a notification is sent to the on-call store manager, on the device of their choosing, alerting them that something is wrong with the freezer. The manager can then respond quickly, closing the freezer door and checking that the product is still safe to sell. Presto: a major incident avoided.
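As a rough illustration of the freezer workflow, the threshold check might look something like the sketch below. Everything named here is an assumption made for the example: the `Reading` type, the `check_readings` function, the `notify` callback standing in for an incident management tool, and the -18 °C limit are hypothetical, not part of any specific product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical preset limit; a real store would configure its own value.
MAX_SAFE_TEMP_C = -18.0


@dataclass
class Reading:
    freezer_id: str
    temp_c: float


def check_readings(
    readings: Iterable[Reading],
    notify: Callable[[str], None],
    limit_c: float = MAX_SAFE_TEMP_C,
) -> List[Reading]:
    """Return every reading above the limit, notifying once per freezer.

    `notify` is a stand-in for whatever sends the alert to the on-call
    manager (push notification, SMS, phone call, and so on).
    """
    alerted = set()   # freezers we have already raised an alert for
    breaches = []     # every over-limit reading, kept for the log
    for reading in readings:
        if reading.temp_c > limit_c:
            breaches.append(reading)
            if reading.freezer_id not in alerted:
                alerted.add(reading.freezer_id)
                notify(
                    f"Freezer {reading.freezer_id} at {reading.temp_c:.1f} C, "
                    f"above limit {limit_c:.1f} C"
                )
    return breaches
```

Alerting once per freezer, rather than once per reading, is a deliberate choice in this sketch: the manager gets a single actionable notification instead of a flood of duplicates while the door is still open.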
Or, in our buggy shopping cart example, instead of waiting for a customer to log a complaint that the payment process isn’t working correctly, an integrated workflow set to monitor the toolchain would identify the connection error. A notification would be sent to the on-call responders, say a DevOps engineer and a project manager, alerting them to the incident mere seconds after it is identified. From there, they can quickly hop on a call, find the root cause of the issue, and start working on a resolution.
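The shopping cart scenario can be sketched in a similar way: run a set of automated checks and page everyone on call as soon as one fails. The function names (`run_checks`, `page_on_call`), the shape of the checks, and the `send` callback are all assumptions made for this illustration; a real setup would rely on the monitoring and incident management tools already in the toolchain.

```python
from typing import Callable, Dict, List


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each named check; a check signals failure by raising an exception."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    return failures


def page_on_call(
    failures: List[str],
    responders: List[str],
    send: Callable[[str, str], None],
) -> int:
    """Send every failure to every on-call responder; return the number sent."""
    sent = 0
    for failure in failures:
        for responder in responders:
            send(responder, failure)  # hypothetical delivery hook
            sent += 1
    return sent


def payment_db_check() -> None:
    # Simulated failure standing in for a real connectivity probe.
    raise ConnectionError("database unreachable from payment page")


if __name__ == "__main__":
    failures = run_checks({"payment-db": payment_db_check})
    page_on_call(
        failures,
        ["devops-engineer", "project-manager"],
        lambda who, what: print(f"to {who}: {what}"),
    )
```

Run on a schedule, a loop like this would surface the payment error within one check interval rather than waiting for a customer complaint.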