Case Study on the Cost of Downtime: Amazon S3 Outage

Posted July 31st, 2008 by Jonah Paransky

What Happened

On July 20th, Amazon’s S3 service experienced a wide-scale outage. Because S3 serves as primary cloud-based infrastructure for many organizations, the outage disrupted a wide variety of websites, users and providers. The outage was heavily publicized in the mainstream media and in the blogosphere. Unlike during the February S3 outage, Amazon provided significant detail about the status of S3 service availability.

Reactions

In general, the reaction to the outage seemed to coalesce around a common theme – that cloud computing is not yet ready for primetime. Some examples of prominent coverage from this perspective included:

Alternative voices were also heard. Michael Krigsman at IT Project Failures took the position that,

Customers hate outages, but accurate and responsive status reporting does help ease the pain. Kudos to Amazon for learning from past mistakes.

Web Worker Daily takes a nuanced view, pointing out that if you require a high level of uptime, you may need a backup to S3. Mike also points out, though, that the rates are hard to beat and that S3 will continue to be attractive to many providers on the Internet.

Our Perspective

Communication after a downtime incident occurs:

Amazon deserves significant credit for the communication approach it took during and after the outage. After its downtime incident back in February, Amazon began providing detailed, transparent information about service delivery. It also performed a thorough public post mortem, rightly praised by Michael Krigsman and others as demonstrating considerable maturity. This was a good example of the application of the Seven Key Lessons to Keep in Mind When Communicating an IT Failure.

Cost of Downtime:

As with other downtime incidents affecting widely shared infrastructure, the cost of the outage was significant, both to Amazon and to the many vendors who depend on the service. SLA payouts are likely due, and organizations concerned about downtime are likely looking at backup options to reduce their dependence on S3.

Lessons Learned:

The Amazon S3 outage provides a number of good lessons for IT operations professionals.

Filed Under: Business Continuity, Cloud Computing, Downtime

