9s don’t have to drop if you increase the time period! “We still guarantee the s...

rvba · 2025-10-20T18:43:41 1760985821

At a company where I worked, the tool measuring downtime ran on the same server, so even if the server was down it still showed 100% up.

If the server didn't work, the tool to measure it didn't work either! Genius

bityard · 2025-10-20T18:58:28 1760986708

This happened to AWS too.

February 28, 2017. S3 went down and took down a good portion of AWS and the Internet in general. For almost the entire time that it was down, the AWS status page showed green because the up/down metrics were hosted on... you guessed it... S3.

https://aws.amazon.com/message/41926/

CaptainOfCoit · 2025-10-20T20:57:31 1760993851

Happened a couple of times :)

- 2008 - https://news.ycombinator.com/item?id=116445

- 2010 - https://news.ycombinator.com/item?id=1396191

- 2015 - https://news.ycombinator.com/item?id=10033172

- 2017 - https://news.ycombinator.com/item?id=13755673 (Postmortem: https://news.ycombinator.com/item?id=13775667)

- 2024 - https://news.ycombinator.com/item?id=41770111

hinkley · 2025-10-20T23:41:42 1761003702

Five times is no longer a couple. You can use stronger words there.

bapak · 2025-10-21T06:44:58 1761029098

It happened a murder of times.

hinkley · 2025-10-21T17:38:10 1761068290

Ha! Shall I bookmark this for the eventual wiki page?

casey2 · 2025-10-22T10:44:05 1761129845

https://www.youtube.com/watch?v=HxP4wi4DhA0

Maybe they should start using real software instead of mathematicians' toy langs

Scoundreller · 2025-10-20T22:02:22 1760997742

Have we ever figured out what “red” means? I understand they’ve only ever gone to yellow.

kokanee · 2025-10-20T23:03:47 1761001427

If it goes red, we aren't alive to see it

Cthulhu_ · 2025-10-21T08:35:26 1761035726

I'm sure we need to go to Blackwatch Plaid first.

subpar · 2025-10-20T20:19:36 1760991576

obligatory https://x.com/lintzston/status/791761626890469377

belter · 2025-10-20T20:42:59 1760992979

Published in the same week of October... 9 years ago... Spooky...

decimalenough · 2025-10-20T19:30:12 1760988612

I used to work at a company where the SLA was measured as the percentage of successful requests on the server. If the load balancer (or DNS or anything else network) was dropping everything on the floor, you'd have no 500s and 100% SLA compliance.
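A rough sketch of why that metric lies, with made-up numbers: the server can only count requests that actually reach it, so anything dropped upstream never shows up in the denominator. The function name and the traffic figures are hypothetical, and treating "zero requests seen" as 100% is an assumption about how such a tool might behave.

```python
def availability_pct(ok: int, total: int) -> float:
    """Percentage of successful requests; assumes 'no traffic' reports as 100%."""
    return 100.0 * ok / total if total else 100.0


# Hypothetical traffic: clients send 1000 requests, the load balancer
# silently drops 400 of them, and the server answers the rest without a 500.
sent_by_clients = 1000
dropped_upstream = 400
seen_by_server = sent_by_clients - dropped_upstream
server_errors = 0
served_ok = seen_by_server - server_errors

# What the server-side SLA report says vs. what clients experienced.
server_side = availability_pct(served_ok, seen_by_server)
client_side = availability_pct(served_ok, sent_by_clients)

print(f"server-side: {server_side:.1f}%  client-side: {client_side:.1f}%")
```

Measuring from the client side (or from a probe outside the network path) is what closes the gap.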

conductr · 2025-10-20T19:15:53 1760987753

Similar to hosting your support ticketing system on the same infra. "What problem? Nobody's complaining"

hinkley · 2025-10-20T23:40:23 1761003623

I’ve been a customer of at least four separate products where this was true.

I can’t explain why Saucelabs was the most grating one, but it was. I think it’s because they routinely experienced 100% down for 1% of customers, and we were in that one percent about twice a year. <long string of swears omitted>

bigiain · 2025-10-20T22:11:14 1760998274

About 15 years back I spent a fair amount of time finding an external monitoring service that did not run on AWS and looked like a sustainable business instead of a VC-fueled acquisition target. It was for our belts-n-braces secondary monitoring tool, since it's not smart to trust CloudWatch to be able to send notifications when it's AWS's shit that's down.

Sadly, while I still use that tool a couple of jobs/companies later, I no longer recommend it because it migrated to AWS a few years back.

(For now, my out-of-AWS monitoring tool is a bunch of cron jobs running on a collection of various inexpensive VPSes and my and other devs' home machines.)
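For anyone curious, a cron-driven check like that can be tiny. This is only a sketch using the Python standard library, not the actual scripts described above; `is_up` and `failed_targets` are hypothetical names, and the "2xx within the timeout means up" rule is an assumption.

```python
import http.client
import sys
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError subclass OSError, so refused connections,
        # timeouts, DNS failures and 4xx/5xx responses all count as down.
        return False


def failed_targets(urls, timeout: float = 5.0):
    """Return the subset of urls that did not pass the check."""
    return [u for u in urls if not is_up(u, timeout)]


if __name__ == "__main__" and len(sys.argv) > 1:
    down = failed_targets(sys.argv[1:])
    for url in down:
        print(f"DOWN: {url}")
    # Non-zero exit so whatever wraps the cron job can alert on failure.
    sys.exit(1 if down else 0)
```

Run it from cron on a box that shares nothing with the thing being watched, passing the URLs to check as arguments. The point is where it runs, not what it does.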

6031769 · 2025-10-21T10:28:01 1761042481

Nagios is still a thing and you can host it wherever you like.

bigiain · 2025-10-22T03:21:57 1761103317

Interestingly, the reason I originally looked for and started using it was an unapproved "shadow IT" response to an in-house Nagios setup that was configured and managed so badly it had _way_ more downtime than any of the services I'd get shouted at about if customers noticed them down before we did...

(No disrespect to Nagios, I'm sure a competently managed installation is capable of being way better than what I had to put up with.)

AbstractH24 · 2025-10-21T14:39:15 1761057555

If it's not on the dashboard, it didn't happen

echelon · 2025-10-20T18:20:36 1760984436

Common SLA windows are hour, day, week, month, quarter, and year. They're out of SLA for all of those now.
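The arithmetic, as a quick sketch: the allowed downtime is just the window length times (1 - availability), so an outage that blows the yearly budget blows every shorter window too. The 30-day month, 90-day quarter, 365-day year and the one-hour example outage are assumptions for illustration, not figures from any real SLA.

```python
# Window lengths in seconds (month/quarter/year lengths are assumed).
WINDOW_SECONDS = {
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "quarter": 90 * 86400,
    "year": 365 * 86400,
}


def downtime_budget_seconds(availability_pct: float, window_seconds: int) -> float:
    """Seconds of downtime allowed in a window at the given availability target."""
    return window_seconds * (1 - availability_pct / 100)


def windows_blown(outage_seconds: float, availability_pct: float):
    """Names of the windows whose downtime budget the outage exceeds."""
    return [
        name
        for name, seconds in WINDOW_SECONDS.items()
        if outage_seconds > downtime_budget_seconds(availability_pct, seconds)
    ]


for name, seconds in WINDOW_SECONDS.items():
    budget = downtime_budget_seconds(99.999, seconds)
    print(f"five nines over a {name}: {budget:.2f} s of downtime allowed")

# A hypothetical one-hour outage against a five-nines target.
print(windows_blown(3600, 99.999))
```

At five nines the yearly budget works out to a little over five minutes, so the hypothetical hour-long outage is out of SLA in every window listed.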

When your SLA holds within a joke SLA window, you know you goofed.

"Five nines, but you didn't say which nines. 89.9999...", etc.

SlightlyLeftPad · 2025-10-20T20:11:25 1760991085

These are typically calculated system-wide, so if you include all regions, technically only a fraction of customers are impacted.

alkhimey · 2025-10-20T20:38:55 1760992735

Customers in all regions were affected…

prmoustache · 2025-10-20T21:48:32 1760996912

Indirectly yes but not directly.

Our only impact was some atlassian tools.

captainkrtek · 2025-10-20T19:31:41 1760988701

I shoot for 9 fives of availability.

dare944 · 2025-10-21T03:38:44 1761017924

5555.55555% Really stupendous availableness!!!

president_zippy · 2025-10-20T23:01:44 1761001304

I see what you did there, mister :P

hamburglar · 2025-10-20T18:42:06 1760985726

I prefer shooting for eight eights.

decimalenough · 2025-10-20T19:30:58 1760988658

You mean nine fives.