All three hyperscalers have vulnerabilities in their control planes: they're either a single point of failure (AWS with us-east-1) or global, meaning a faulty release can take the whole thing down. And take AZ resilience to mean that existing compute will keep working as before, but allocation of new resources can fail in multi-AZ or even multi-region ways.
It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
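For the static-allocation point, here's a rough sketch of what "never relies on auto scaling" can look like on AWS, using boto3 (the group name, peak numbers, and slack factor here are made up for illustration): size the group for peak plus headroom, then pin min/max/desired to the same value so nothing ever waits on the control plane to allocate capacity.

```python
# Sketch only: pin an Auto Scaling group so it never depends on scaling
# actions during a control plane outage. Group name and numbers are
# hypothetical; size them from your own peak-load measurements.
import math
import boto3

PEAK_INSTANCES = 12   # instances needed at observed peak load
SLACK_FACTOR = 1.5    # headroom so bursts never require new allocation

static_size = math.ceil(PEAK_INSTANCES * SLACK_FACTOR)

autoscaling = boto3.client("autoscaling")
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-service-asg",  # hypothetical group name
    MinSize=static_size,
    MaxSize=static_size,
    DesiredCapacity=static_size,
)
```

Pinning min == max == desired means scaling activities can't change the fleet size, so if allocation breaks you're left exactly where you were, with the slack already provisioned.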
> It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
In a way. It means that you can usually get new capacity, but the transition windows where a service gets resized (or mutated in general) have to be minimised and carefully controlled by ops.
Yeah, I remember one maybe four years ago? Existing workloads were fine, but I had to go and tell my marketing department not to do anything until it was sorted because auto-scaling was busted.