On Friday, January 13th, the Costa Concordia ran into rocks off Italy's western coast and eventually rolled onto its side in the water. The toll on human life is tragic: several people are dead (a number still growing), more are missing, and everyone involved went through a traumatic experience.
The saddest part of the story is that it appears it could have been prevented. And if not prevented, it could have been handled better.
As I’ve been following the story and reflecting on it, a few things have jumped out that I think we can learn from.
I don’t know the details of the timeline of events that caused the accident, but I’m guessing there were a few design problems at play. For example, I’d guess people were ignoring early warning systems because they seemed like false alarms. It’s not whether the alarm goes off, but how seriously you take it that matters.
Also, I’d wager that the “designed” social structure on the ship prevented junior staff from questioning senior staff. This was evidenced in a video I saw where the crew would not release the lifeboats until the captain ordered abandon ship. And unfortunately it was too late for a smooth exit at that point.
Another issue that keeps coming up is the readiness of the crew to deal with an emergency. This video shows the descent into chaos. It definitely raises questions about the right way to train a team to deal with bad situations. How often should this be done? How would the training or simulation be performed? Is it really worth it?
There are two levels that I think about with emergency training. The first is “What information will be needed and what procedures will need to be performed?” This is relatively domain specific, and needs to be thought through carefully. What knowledge must be kept in the head vs. being readily available at the time of crisis? And how can you reduce and simplify all these procedures down to the absolute minimum to ensure success?
The second level of preparedness is more general – “What mindset should an emergency responder be in and how will they perform under pressure?” If you know all the procedures but choke under pressure, you will not be successful in handling a crisis. So any training program you have must include a “field” portion where people are asked to experience something that feels like a real crisis – confusing circumstances, time pressure, and dire consequences for non-performance.
If you can combine the two readiness techniques into a safe, comprehensive, repeatable training program, you can be ready for a disaster.
One assumption of how the ship operates is that the captain is in charge and knows what he’s doing. In this case, that assumption turned out to be wrong. How could the design of the system have been improved to avoid it?
Why did the ship go off course in the first place? Were there any systems on board (computerized or human) to notice and raise the issue? We don’t know yet, but hopefully we will learn when the full review is completed.
They may have had computerized mapping systems watching the route and complaining when things changed. Sensors below the ship could also have been watching for rocks and debris, alerting the crew when something was approaching. And the crew itself might have been aware that something was wrong.
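To make the route-deviation idea concrete, here is a minimal sketch of how such a check might work in software. This is purely illustrative, not a description of any real shipboard system: the coordinates, route, and tolerance are hypothetical, and it uses a flat planar approximation rather than real geodesy.

```python
import math

def distance_to_segment(p, a, b):
    """Distance from point p to the line segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def off_course(position, route, tolerance):
    """True when the current position strays farther than `tolerance`
    from every leg of the planned route."""
    return all(distance_to_segment(position, a, b) > tolerance
               for a, b in zip(route, route[1:]))

route = [(0, 0), (10, 0), (10, 10)]   # hypothetical planned waypoints
off_course((5, 0.5), route, 1.0)      # near the first leg: no alarm
off_course((5, 5), route, 1.0)        # far from every leg: raise the alarm
```

The hard part isn't the geometry; it's choosing a tolerance loose enough not to cry wolf and tight enough to matter, which is exactly the signal-to-noise tradeoff above.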
Ultimately, whatever detection systems they had either didn’t work or were ignored. Sometimes too many errors make you numb.
We’ve been looking at this in the context of emergency procedures on a cruise ship, but the exact same principles apply to your software systems. If you can think about human error as flawed design, improve your signal-to-noise ratio, and make errors OK, then you can improve the resiliency of your systems.
How has this tragedy made you think differently about disaster preparedness and recovery? How ready are you?