Microsoft Sheds Light on Azure’s Storage Growing Pains
Microsoft, like other cloud providers before it, has faced some growing pains on its Azure platform. On November 19, Azure customers began experiencing performance and availability issues on the platform. Virtual machines, websites, and Visual Studio Online were among the impacted services.
The issues were the result of a change made to the storage layer of Azure. They were first noted at 00:51 AM UTC, and service was completely restored by 11:45 AM UTC. Microsoft provided details in a blog post by Jason Zander, CVP of the Microsoft Azure Team: http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/ While the change had been tested in a smaller subset of the Azure Storage environment and provided a significant performance improvement, issues were encountered in production that had not been seen during testing.
The change was rolled back, but a full restart of the storage front ends was required, which took a number of hours and led to even more outages for consumers of the Azure platform. Many Azure consumers are concerned about the methodology used for testing, since the issue had not been encountered before the change reached production. In addition, some customers did not receive notifications from Microsoft after the outage had been detected.
For more information on the outage and how it impacted consumers of the Azure platform, Ben Kepes has provided an excellent discussion: http://www.forbes.com/sites/benkepes/2014/11/20/microsoft-delivers-a-post-mortem-the-reasons-behind-the-global-azure-alypse/
Growing Up Fast
Microsoft Azure has clearly been strengthening its position in the public cloud marketplace, and with good reason. Microsoft is making big leaps in service offerings that are positioning it to one day rival Amazon Web Services, the long-time incumbent in the public cloud space.
Previous outages have occurred and been handled in the Azure environment, and AWS is not without its down moments either. The focus on growth will inevitably create conditions that could trigger similar incidents. We can be sure that the Microsoft team will be extra diligent going forward to prevent further challenges such as this recent one.