The reason why the lights went out

A number of Microsoft services (Hotmail, MSN, SkyDrive, Office 365) recently suffered an outage. Microsoft is now reporting that the issue was due to a failed DNS update. You can read the details here:


The first thing to note is that Microsoft has acknowledged and explained what the issue was. This will hopefully silence the critics claiming a “cover-up” of sorts. The second thing it illustrates is that even on the Internet there are still critical points of failure (DNS being the case in point here).


The service being down was inconvenient, sure, but the reality is that the problem was rectified fairly quickly. The major issue is the number of people impacted. That certainly gives the outage a higher profile, but the reality is these things happen. Not often, but they do happen. We still suffer the occasional power outage, yet we have learned to live with that. Perhaps we need to understand that moving to the cloud will never mean 100% uptime and there will be times (few and far between, hopefully) when we won’t be able to access our information stored there.
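To put "never 100% uptime" into numbers, here is a rough sketch of how much downtime a given uptime percentage allows over a year. The percentages below are purely illustrative assumptions on my part, not figures published by Microsoft or any other provider, and the function name is my own.

```python
# Hours in a non-leap year.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours per year a service may be unavailable at this uptime level."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Illustrative uptime levels only (assumed, not taken from any real agreement).
for pct in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Even at 99.9% uptime that works out to a bit under nine hours a year when the service may simply not be there, which is the kind of gap worth planning for.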


Given that people should understand that, the question is: what do they do to prepare for the situation? I can tell you that many people keep a torch or candles for when the power goes off, but what planning have they done for their IT systems? No matter where IT systems are, I’ve found most people never think they’ll have an issue. They get lulled into a false sense of security because the system is generally so reliable.


Let’s set technology aside and simply look at risk. Is there risk? If yes, how do you minimize it? Note that I said minimize, not eliminate, because generally you can’t eliminate it entirely. If you don’t take steps to minimize risk in your business then you’ll suffer the consequences sooner or later. No matter where your technology is, you need to, as the Boy Scouts say, “be prepared”.

