Certificate expiration is a problem that needs a better solution for cases where a company fails to renew a certificate. These were internal certificates: still important, but not user-facing.
One possible solution might be to have the client introduce an artificial delay (10 seconds, or some other fixed amount) when it encounters an expired cert, or add one more second of delay for every day the cert has been expired. This degrades the connection but does not immediately break anything.
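A minimal sketch of how a client might compute that delay, assuming it can read the certificate's `notAfter` string in the format returned by Python's `ssl.SSLSocket.getpeercert()`. The function name `expired_cert_delay` and the constants are hypothetical. It also assumes that a partial day past expiry counts as a full day, and that `notAfter` itself is still valid (RFC 5280 treats the validity period as inclusive).

```python
import ssl
import time
from typing import Optional

# Hypothetical policy values from the proposal: a flat 10-second penalty,
# or one extra second per day (or part of a day) past expiry.
FLAT_DELAY_S = 10.0
SECONDS_PER_DAY = 86_400


def expired_cert_delay(not_after: str, now: Optional[float] = None,
                       escalate: bool = True) -> float:
    """Return how many seconds the client would sleep before proceeding.

    not_after uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    Returns 0.0 while the certificate is still within its validity period.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    now = time.time() if now is None else now
    if now <= expires_at:
        return 0.0
    if not escalate:
        return FLAT_DELAY_S
    # Count any partial day past expiry as a whole day.
    days_expired = int((now - expires_at) // SECONDS_PER_DAY) + 1
    return float(days_expired)
```

The caller would then do something like `time.sleep(expired_cert_delay(cert["notAfter"]))` before using the connection. As the replies point out, this only makes sense if you control the client's TLS handling.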
Oh please no; give me a hard fail I can localize and fix rather than some kind of awful brownout where various parts of the system just go slow and break things just as badly anyway.
Plus you'd need to be way in the guts of the TLS implementation to achieve this; if you're already there, start generating noise a week ahead of the expiration instead.
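Here is a rough sketch of the "generate noise a week ahead" alternative. It assumes the same `notAfter` string format as `ssl.SSLSocket.getpeercert()`. The names `check_expiry` and `WARN_WINDOW_DAYS` are hypothetical. An expired cert is still reported as expired, so the hard failure stays intact and the only addition is an earlier warning.

```python
import logging
import ssl
import time
from typing import Optional

log = logging.getLogger(__name__)

SECONDS_PER_DAY = 86_400
WARN_WINDOW_DAYS = 7  # "a week ahead of the expiration"


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until notAfter (negative once the cert has expired)."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def check_expiry(not_after: str, now: Optional[float] = None,
                 window_days: int = WARN_WINDOW_DAYS) -> str:
    """Classify a cert as 'ok', 'warn' (inside the window), or 'expired'."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"  # leave this as a hard failure
    if remaining <= window_days:
        log.warning("certificate expires in %.1f days", remaining)
        return "warn"
    return "ok"
```

Running this from a scheduled job or at client startup gives a week of alerts without touching the TLS implementation itself.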
Concur. One key takeaway from my time at Basho: in distributed systems, a hard failure is much easier to remediate than a slow machine.
We wanted a misbehaving database server to fail hard. Running slowly just caused cascading failures.
Of course, in this case you're effectively talking about the entire cluster crashing hard, but that's still easier to cope with than every system responding at a snail's pace.
I agree that automation is ideal. But let's face it: most companies haven't automated it.
The goal of a business is not to have perfect engineering practices. It is to fulfill customer requests. When there is an outage in the middle of the night, I'd argue that a degraded system buys time to address the issue.
Regardless of the mechanism, having a sudden, complete breakage is not ideal for a business.