Proactive Server Monitoring Pitfalls

Svet Stefanov


The following is a guest post by Svet Stefanov. He is a freelance writer and a contributing author to WebSitePulse’s blog. Svet is currently exploring the topic of proactive monitoring and how it can help small and big businesses steer clear of shallow waters.

The first step in fixing a problem is admitting you have one. Fixing problems in software systems is no different. Server issues are not a thing of the past, and as the Cloud continues to grow and further meshes into our lives, these problems are not going away anytime soon. Monitoring different types of servers, SaaS, or even a single website is not only a good practice but also a long-term investment. If you have the time, I would like to share my thoughts on why it is better to be prepared for the worst rather than blissfully avoiding it.

I’ve heard people say that monitoring is the first line of defense. In my opinion, the first line of defense is a great recovery procedure and embedded redundancy. Adequate monitoring systems have two main functions – to detect a problem and to alert concerned parties. More advanced systems with extensions & complex configuration can also take action without human intervention based on a predefined set of rules.  But even a few small investments in monitoring can yield great improvements in reliability.
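To make the two functions above concrete (detect and alert), plus the optional rule-based action, here is a minimal Python sketch. It is an illustration only, not how any particular monitoring product works. The names `Monitor`, `Rule`, `probe`, and `alert` are hypothetical, and the probe and alert are passed in as plain functions so you can plug in whatever check and notification channel you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A predefined rule: run `action` after `threshold` consecutive failures."""
    name: str
    threshold: int
    action: Callable[[str], None]


class Monitor:
    """Hypothetical monitor: detects failures, alerts once, applies rules."""

    def __init__(self, target: str,
                 probe: Callable[[str], bool],
                 alert: Callable[[str], None],
                 rules: List[Rule]) -> None:
        self.target = target
        self.probe = probe      # returns True when the target is healthy
        self.alert = alert      # notifies the concerned parties
        self.rules = rules
        self.failures = 0       # consecutive failed checks

    def check_once(self) -> bool:
        """Run one health check and return True if the target is healthy."""
        try:
            healthy = self.probe(self.target)
        except Exception:
            # A probe that crashes is treated the same as a failed check.
            healthy = False

        if healthy:
            self.failures = 0
            return True

        self.failures += 1
        if self.failures == 1:
            # Alert on the first failure only, to avoid flooding people.
            self.alert(f"{self.target} failed a health check")
        for rule in self.rules:
            if self.failures == rule.threshold:
                # Act without human intervention, per the predefined rule.
                rule.action(self.target)
        return False
```

In practice you would call `check_once()` on a schedule, for example once a minute, and the action attached to a rule might restart a service or take a node out of rotation.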

Internal or external monitoring? Continue reading

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime

Software You Hope You Never Use

Photo by miguelb

I was reminded recently of a project I worked on several years ago for a local university.  This software was a simple system to be used in emergency situations to help account for students who may have been affected and get them connected to medical services or parents as necessary.

Usually when I write software, I get excited by the idea of people using it and being able to enjoy it or at least have it solve a problem for them.  In this case, you hope that no one will ever have to use the software. 

As I think about preparedness for an emergency, I find that creating software to be used only in extreme and potentially catastrophic circumstances poses some unique design challenges.

Prediction

Continue reading

Posted in Uncategorized | 1 Comment

What You Wish You Knew During a Crisis…

From my guest post at ContinuityInsights.com

During a crisis, there is almost by definition a shortage of accessible information. Because of the time pressure a disaster creates, anything considered noise gets filtered out and ignored. However, if you could create a plan to track the right information and make it available during difficult times, it could mean the difference between tragedy and a close call.

Continue reading (at ContinuityInsights.com)…

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime

Thoughts on Windows Azure Leap Day Downtime

I’d be remiss not to mention the Windows Azure Downtime on Leap Day.  Because of my employment at Microsoft I won’t speculate or say too much on the situation.   I have said before that cloud computing does not completely alleviate the risks of downtime.

I would like to reiterate that there are always inherent risks in building and running software, and failure is to be expected, not avoided. The best designed systems are set up for failure and can handle these cases with grace. This particular event with Windows Azure further highlights the need to design applications that sit on top of any infrastructure (traditional, cloud, or hybrid) in such a way that they can work when (not if) a major portion of the infrastructure fails.

Don’t be fooled into thinking that any cloud service provides a silver bullet for resiliency. Outsourcing your IT infrastructure to a cloud provider greatly improves your resiliency for the cost you have to pay; most of us cannot afford to build and maintain a fault-tolerant worldwide infrastructure. When a failure does occur, don’t overlook the economies of scale that benefit the application tenants most of the time when things are working properly.
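As one small illustration of designing for failure, an application can retry a dependency a few times and then fall back to a degraded answer instead of crashing. The sketch below is a generic pattern in Python, not anything specific to Windows Azure. The function name `call_with_fallback` and its parameters are hypothetical, and a real system would need to decide which errors are worth retrying.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T],
                       fallback: Callable[[], T],
                       attempts: int = 3,
                       delay_seconds: float = 0.0) -> T:
    """Try `primary` up to `attempts` times, then degrade to `fallback`."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            # Assumption: every exception is treated as a transient failure.
            if attempt < attempts - 1:
                # Wait longer after each failure (exponential backoff).
                time.sleep(delay_seconds * (2 ** attempt))
    # The dependency is still failing: serve a degraded result instead.
    return fallback()
```

The fallback might return cached data or a reduced feature set. The point is that the application keeps working when the infrastructure underneath it does not.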

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime

Through the Storm – Interview with Arterian IT Founder Jamison West

Jamison West

"Having a comprehensive plan that's bigger than just IT is key, but often IT can be the forcing function to get you started."

I recently had a chance to interview Jamison West of Arterian. Jamison, who founded the company that is now Arterian in 1995, envisions a future where every small to mid-sized company will have an IT partner that becomes a vital part of its core operations team, keeping it free from disaster and flourishing.

SoftwareDisastersBlog: How do you help your customers prevent and prepare for IT disasters?

Jamison West: We see with our customers that reliance on connectivity is higher than it’s ever been for businesses to execute and support their customers. People now expect email to work like instant messaging, sent and received as fast as they type it.  We try to prevent IT issues  by adding redundancy to make sure that if there are problems — natural disasters or bad weather like we had recently in Seattle  — our customers are still up and running at least for critical operations.

Continue reading

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime

Learning from the Costa Concordia Shipwreck

Costa Concordia

Photo from csmonitor.com

On Friday, January 13th, the Costa Concordia had a disaster, running into rocks off the shores of Italy’s western coast and eventually rolling onto its side in the water. The toll on human life is tragic: several are dead (the number still growing), more are missing, and everyone involved went through a traumatic experience.

The saddest part of the story is that it appears it could have been prevented.  And if not prevented, could have been handled better.

As I’ve been following the story and reflecting on it, a few things have jumped out that I think we can learn from. 

Human Error

Continue reading

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime

5 Disaster Preparedness Resolutions

happy new year 2012

Photo by Creativity103

You might think that new year’s resolutions are made to be broken.  Whether it’s to exercise more, chew fingernails less, or other clichés, they are hard to follow through on.  Witness the packed gym in January that becomes empty before March.

When it comes to keeping the systems that run your business humming along, the new year is a good time to pause and reflect on what you can do differently. And  you also have the energy to take action to make it a reality. But don’t let your well-intentioned resolution become lost in the shuffle of forgotten promises of self improvement.

1. “I will learn from last year”

Continue reading

Posted in Business Continuity, Cloud, Disaster Recovery, Downtime, Technology, Uptime | 1 Comment