I had a chat with a small software business owner the other day, and we were talking about preparing for disasters. He asked me, “Since I’ve moved to a public cloud service, I’m protected, right?” Public clouds do significantly reduce a number of risks: hardware failure, human error inside a data center, physical security, and, depending on the service and how it’s deployed, geographic risk, network connectivity risk, and storage redundancy, to name but a few.
But there are a number of disaster risks that could still affect you, and you need to think about and prepare for them. You also still need to design and deploy your application properly to take advantage of any resiliency and recovery features in your cloud platform of choice.
For example, you may see a sudden spike in users due to seasonality, a great new promotion you’re running, a sudden viral video that mentions your site, etc. This is your traditional “great problem to have”, but it may wreak havoc on your site or service if you aren’t prepared. You may need your team to watch monitors proactively and increase the number of VMs or server roles. You may also have scalability bottlenecks hidden in your software that testing never exposed, which might result in poor performance or a crash. And even if your service runs perfectly fine under these conditions, you may have trouble when the spike ends and those VMs need to be spun down to avoid additional cost.
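To make the scale-up and spin-down decision a little more concrete, here is a minimal sketch of a threshold rule in Python. Everything in it is an assumption for illustration: the function name, the CPU thresholds, and the instance limits are hypothetical and don’t come from any particular cloud platform, and most platforms offer their own autoscaling settings that you would configure instead.

```python
def desired_instance_count(current, avg_cpu_percent,
                           min_instances=2, max_instances=10,
                           scale_out_above=75.0, scale_in_below=30.0):
    """Suggest how many VMs to run, given average CPU load across them.

    All thresholds and limits here are illustrative assumptions.
    """
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # spike: add one VM
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # spike is over: remove one VM to save cost
    else:
        target = current       # load is in the comfortable range
    # Never drop below the floor or exceed the budgeted ceiling.
    return max(min_instances, min(max_instances, target))
```

The gap between the two thresholds is deliberate. It keeps the rule from adding and removing VMs over and over when load hovers near a single cutoff.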
On top of the risks that you’re still exposed to, you also need to realize that you are now just a single tenant in an ocean. You no longer know the people that touch your machines, or even which machines are “yours.” This is necessary for scale and redundancy, but it can create anxiety and make you feel a bit anonymous. Who do you call if you have questions or concerns?
What this means is that you need to know what you would do in a worst-case scenario, when a catastrophic issue affects the entire platform. How will you learn about it before your customers do? How will you explain it to them?
There are answers to these questions, but the right time to ask them is not in the middle of unplanned downtime.