Heat Spike Caused Microsoft’s SkyDrive and Email Outage

Rising temperatures in a data center can cause significant issues.

Looks like the servers running Microsoft’s Hotmail/Outlook.com service got a little too hot under the collar.

On March 12, a service interruption affected some users of Microsoft’s Hotmail.com, Outlook.com, and SkyDrive cloud-storage service. It took a few hours for Microsoft to fully diagnose the problem and apply a fix. “In one physical region of one of our datacenters, we performed our regular process of updating the firmware on a core part of our physical plant,” read a postmortem posted on the Outlook Blog. “This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way.”

That “unexpected way” was a significant spike in temperature within the data center. “This spike was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the datacenter,” the blog posting added. “These safeguards prevented access to mailboxes housed on these servers and also prevented any other pieces of our infrastructure to automatically failover and allow continued access.”

And because that infrastructure hosted elements of Hotmail.com, Outlook.com and SkyDrive, a portion of those services’ users ended up affected. Microsoft’s infrastructure software and human operators worked in tandem to bring the issue under control, restoring the majority of impacted mailboxes before midnight.

This wasn’t Microsoft’s only outage in the past few weeks: in February, its Windows Azure cloud service went down for a brief time thanks to an expired SSL certificate. In that instance, Microsoft blamed itself for a few mistakes; first and foremost, it had acquired the necessary SSL certificates in a way that allowed the certificates covering the blobs, queues and tables used by storage accounts within Azure to expire at roughly the same time worldwide, creating the possibility of a global outage.
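For administrators who want to catch that kind of clustering in their own certificate inventory, a check along these lines is one possible approach. It is only a sketch: the function name, the seven-day window, and the sample inventory (including its dates) are invented for illustration and do not describe Microsoft's actual tooling or certificates.

```python
from datetime import datetime, timedelta


def clustered_expirations(expiry_by_name, window=timedelta(days=7)):
    """Group certificate names whose expiry dates fall within `window`
    of the earliest expiry in the same group.

    Only groups with more than one certificate are returned, since a
    lone expiry cannot take down several services at once.
    """
    ordered = sorted(expiry_by_name.items(), key=lambda item: item[1])
    groups, current = [], []
    for name, expires in ordered:
        # Start a new group once this expiry is too far from the group's first.
        if current and expires - current[0][1] > window:
            if len(current) > 1:
                groups.append([n for n, _ in current])
            current = []
        current.append((name, expires))
    if len(current) > 1:
        groups.append([n for n, _ in current])
    return groups


# Hypothetical inventory: names and dates are made up for this example.
certs = {
    "blob": datetime(2030, 2, 22),
    "queue": datetime(2030, 2, 22),
    "table": datetime(2030, 2, 23),
    "portal": datetime(2030, 9, 1),
}

print(clustered_expirations(certs))
```

Any group the function reports is a set of certificates that would lapse together, which is the condition that turns a single missed renewal into a multi-service outage.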

As more and more businesses migrate to the cloud, of course, losing some vital function to a sudden outage is a prospect that IT administrators have come to expect. But that doesn’t make downtime any less aggravating.

 

Image: Jiri Vaclavek/Shutterstock.com
