Recent AWS outage
If you’re running any kind of service hosted on Amazon Web Services, you’ve likely heard about their service outage on September 20, 2015. It seems to have started around 2:30am PT and was fully resolved by 8:30am PT.
As usual, an outage of this magnitude prompted many to enlighten the rest of us on how we should have architected our infrastructure differently.
Or you could just run in multiple regions.
— Hacker News
While a good idea in theory, running your service in multiple regions is far from easy and might not make economic sense for most services. Working around the increased latency between regions requires significant engineering effort, and guaranteeing consistency across them is another Herculean task. It can obviously be done, but one would be wise not to underestimate the work involved.
In the end, your availability is a business concern:
Systems fail. Lessons are learned. This particular failure won't happen again & if you can't tolerate downtime you need to be multi-region
— @garnaat
I believe it is perfectly fine for most services to experience occasional downtime. Ideally, each incident leads to a more resilient system, but it is not realistic to expect most services to stay available through a major outage at one of their key providers.
You own your availability (and so does Amazon)
— @tyler_treat
As for timing, it might have ruined your Sunday, but think about how bad it would have been if it had happened on a Monday morning.