We expect websites & online services to just work. And why not? It’s not like they need regular breaks to stretch and walk around. But when you’re running a website or online service, attaining higher levels of uptime is very difficult, and it only gets harder the less downtime you allow.

It can be hard to visualize things that are very small or fast. For example, it’s easier to think of the speed of light as a distance (light travels slightly less than 1 foot in 1 nanosecond).

To make it easier to imagine downtime, here’s what your site could do with its “break” each month:

99.9999% – 2 seconds – Your site literally had a hiccup.

99.999% – 25 seconds – Your site spaced out and had a short day dream.

99.99% – 4 minutes 19 seconds – Your site took a smoke break.  It just needed to get some fresh air for a few minutes.

99.95% – 21 minutes 36 seconds – Your site watched an episode of Seinfeld – and TiVo’d through the commercials.

99.9% – 43 minutes 12 seconds – Your site took the dog for a long walk, and forgot to take the cell phone along.

99.5% – 3 hours 36 minutes – Your site decided to watch a football game instead of taking orders.

99% – 7 hours 12 minutes – Your site took a sick day and needed to rest up. It’ll be back to work tomorrow.
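If you want to check these numbers or try your own uptime target, here is a minimal Python sketch of the arithmetic. It assumes a 30-day month, which is what the figures above appear to be based on; the function names `monthly_downtime_seconds` and `format_duration` are made up for this example.

```python
# Assumption: a "month" is 30 days, which matches the figures in the list.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def monthly_downtime_seconds(uptime_percent: float) -> float:
    """Return the downtime budget per month, in seconds, for an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * SECONDS_PER_MONTH


def format_duration(seconds: float) -> str:
    """Format a duration as hours, minutes, and seconds, rounded to the nearest second."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for target in (99.9999, 99.999, 99.99, 99.95, 99.9, 99.5, 99.0):
        budget = monthly_downtime_seconds(target)
        print(f"{target}% uptime allows {format_duration(budget)} of downtime per month")
```

This sketch rounds to the nearest second, so a result can differ by a second from a figure that was rounded down instead.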

Attaining a higher uptime translates directly into profit for your business.  Not only does it let you advertise your uptime level and gain trust, but it also ensures that customers can find you, engage with you, and buy from you.  And if you’re providing an online service, your customers rely on your uptime in order to run their business.

When you start getting above four 9’s, you really have to plan and think hard about how to handle upgrades, spikes in customers, hardware failures, and software bugs.  I believe the key difference between great uptime and a site that sucks is preparation.

How much have you thought through the scenarios when your site might fail?  Do you have a mitigation plan for when things don’t go your way?  How well does your team understand the internals of your system?  Have you designed for failure?   Have you isolated components?  Do you have redundancy?

Or is your site watching The Puffy Shirt?

Note: I used to calculate the uptime.

