Little Disasters Worse Than Big Disasters?


Photo by ˙Cаvin 〄

“The dose makes the poison.” – Paracelsus

I had coffee on Friday with a friend who runs a small IT service company, and we talked a bit about preparing for disasters. He’s been doing this a long time and has seen a lot of failed software systems. One of the things he told me is that, in his experience, a series of small “disasters” causes more frequent and annoying downtime for IT systems than the big disasters do. Yet many IT pros he works with spend their disaster recovery energy preventing and preparing for “The Big One.”

Why do we tend to focus on the big disasters that occur infrequently and neglect the smaller everyday problems?

I think it’s easy to look at everyday failures as annoyances that aren’t that bad. We can work around them, so it doesn’t seem worth preventing them from happening again. But if you stopped and looked at them in aggregate, you might see the ROI of preventing them in the form of improved uptime.
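To make the aggregate view concrete, here is a rough sketch in Python. Every number in it is made up for illustration (the incident rates and durations are assumptions, not data from my friend or anyone else), and `FailureMode` and `availability` are names I invented for this example. It simply multiplies how often something fails by how long each failure lasts.

```python
from dataclasses import dataclass

HOURS_IN_YEAR = 24 * 365


@dataclass
class FailureMode:
    """One kind of outage, described by hypothetical averages."""
    name: str
    incidents_per_year: float  # assumed frequency, for illustration only
    hours_per_incident: float  # assumed duration, for illustration only

    def expected_hours_per_year(self) -> float:
        # Expected annual downtime = frequency x duration
        return self.incidents_per_year * self.hours_per_incident


def availability(modes) -> float:
    """Fraction of the year the system is up, given all failure modes."""
    downtime = sum(m.expected_hours_per_year() for m in modes)
    return 1 - downtime / HOURS_IN_YEAR


# Hypothetical inputs: a half-hour glitch roughly weekly,
# versus a two-day catastrophe once a decade.
small = FailureMode("small outage", incidents_per_year=50, hours_per_incident=0.5)
big = FailureMode("major disaster", incidents_per_year=0.1, hours_per_incident=48)

for mode in (small, big):
    print(f"{mode.name}: {mode.expected_hours_per_year():.1f} expected hours down per year")

print(f"overall availability: {availability([small, big]):.4%}")
```

With these invented inputs, the small outages add up to about 25 hours a year against roughly 5 for the big one. Your own numbers may point the other way, which is exactly why it is worth plugging them in.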

If you have frequent small outages and downtime, you’re probably at risk of upsetting more customers & employees than you would during a large catastrophe. If there’s a major flood, hurricane, or fire, your customers are likely going to be concerned about their own safety & property, and they are more likely to excuse a hiccup in your website or service (if they even notice!).

I’m not suggesting you shouldn’t take the necessary precautions for a major disaster event: backup, redundancy, etc. What I am saying is that reducing the risk of small, frequent failures in your system may actually yield better results than focusing only on large-scale disasters.

And the best approach, if possible, would be to invest in IT and application redundancy that helps with both situations.

Where do you spend your disaster recovery money?  Planning for the Big One? Or making your system more resilient to smaller failures?  Or both equally?

About Kit Merker

Product Manager @ Google - working on Kubernetes / Google Container Engine.
This entry was posted in Business Continuity, Disaster Recovery, Downtime, Technology, Uptime.
