Azure Crash Blamed on Expired SSL Certificate

Microsoft needed some time to put out the fires associated with the certificate expiration.

Windows Azure fizzled Feb. 22, as a worldwide outage affected the compute resources for several regions. The reason? An expired SSL certificate.

Microsoft began reporting an “issue” with its worldwide storage services at around 8:44 PM UTC, noting that all dependent services were impacted as well; those included Azure’s Storage services, plus media encoding and on-demand streaming.

The storage issues cascaded over to the compute environment. Microsoft posted the following message at 8:44 PM: “We are experiencing Service Management issues with Compute worldwide due to the ongoing impact to Storage SSL traffic. Virtual machine creation and new or existing hosted service deployments are impacted by this issue. Compute availability is unaffected. Further updates will be published after the mitigation steps are implemented. We apologize for any inconvenience this causes our customers.”

Microsoft also acknowledged separate problems in the North Central U.S. and Southeast Asia regions, beginning at 7:35 AM UTC, affecting the service management of the regions. That problem was fixed by 12:35 PM UTC, although both regions were later affected by the HTTPS issue.

By 9:30 PM UTC, Microsoft discovered that HTTPS operations (SSL transactions) on storage accounts worldwide had been impacted. The problem didn’t end there: by 9:45 PM UTC, the management portal (WindowsAzure.com) and service bus, along with Websites served by Azure, also went down.

A half-hour after that collapse, Microsoft began validating steps to repair the problem. At 4:15 AM on Feb. 23, recovery of the affected clusters began, leading to full recovery within a few hours.

A number of commenters on the Windows Azure forums believed that Microsoft had let its SSL certificate lapse, something Microsoft later confirmed.

“Windows Azure Storage has been affected by an expired certificate,” a spokesman wrote in an emailed statement. “We are working to complete the restoration as quickly as possible. We apologize for any inconvenience this has caused our customers.” Microsoft later apologized via Twitter.
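For readers wondering how a lapse like this might be caught ahead of time, here is a minimal Python sketch of an expiry check. It is not Microsoft's tooling or process; the function names, the 30-day warning window, and the sample date are all hypothetical. It assumes the certificate's expiry is available as a `notAfter` string in the format Python's `ssl` module reports (for example, from `getpeercert()`).

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before the certificate's notAfter time (negative once expired)."""
    # ssl.cert_time_to_seconds parses strings such as "Jan  1 00:00:00 2030 GMT".
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    # Hypothetical expiry date, not the date of the certificate in this story.
    sample_not_after = "Jan  1 00:00:00 2030 GMT"
    now = datetime.now(timezone.utc)
    print(f"{days_until_expiry(sample_not_after, now):.1f} days remaining")
    print("renew now" if needs_renewal(sample_not_after, now) else "ok")
```

Run on a schedule against every certificate a service depends on, a check along these lines would raise a warning weeks before the expiry date rather than after traffic starts failing.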

In a somewhat ironic twist, Netflix began reporting problems of its own at about 7:30 PM ET on Friday night. Its streaming service relies on Amazon Web Services, and has suffered outages when Amazon’s U.S. East region has gone down. In this case, however, Amazon’s health dashboard showed all green.

 

Image: Rob Stokes/Shutterstock.com
