
Complaining about "slow to reproduce" and talking _seconds_? Dear, oh dear, those are rookie numbers!

Currently working on a bug where we saw file system corruption after 3 weeks of automated testing and tens of thousands of restarts. We might never even see the problem again. It has only happened once so far.






If it only happened once... it might fall into the final category of bugs, where nothing you can do will fix it: the cosmic-ray bit flip. That's something your software needs to be able to work around, or in this case the file system itself... unless you're actually working on the file system itself, in which case I wish you good luck.

What layers of hardware can cosmic rays impact? Memory with ECC is largely safe, right? What about the L1 cache and friends?

Anything can fail, at any time. The best we can do is mitigate it and estimate bounds for how likely it is to mess up. Sometimes those bounds are acceptable.
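To make "estimate bounds" concrete, here is a rough sketch of one way to do it for the "seen once in tens of thousands of restarts" case upthread. It assumes every restart is an independent trial with the same failure probability, which real hardware and file systems may not honor. The restart count of 30,000 is a made-up stand-in for "tens of thousands", and the function names are only illustrative. It computes an exact one-sided binomial upper bound on the per-restart failure rate given a single observed failure.

```python
def prob_at_most_one_failure(p: float, n: int) -> float:
    """P(X <= 1) for X ~ Binomial(n, p), assuming n >= 2 independent trials."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


def upper_bound_one_failure(n: int, confidence: float = 0.95) -> float:
    """Largest per-trial failure rate still consistent (at the given
    one-sided confidence) with seeing only one failure in n trials.

    P(X <= 1) falls as p rises, so bisection finds the crossing point.
    """
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if prob_at_most_one_failure(mid, n) > alpha:
            lo = mid  # one failure is still plausible, rate could be higher
        else:
            hi = mid
    return hi


def prob_no_repeat(p: float, n: int) -> float:
    """Chance of seeing zero failures in n further trials at rate p."""
    return (1 - p) ** n


if __name__ == "__main__":
    restarts = 30_000  # hypothetical; the thread only says "tens of thousands"
    bound = upper_bound_one_failure(restarts)
    print(f"95% upper bound on per-restart failure rate: {bound:.2e}")
    # If the true rate were 1 in 30,000, how likely is a clean rerun?
    print(f"P(no repeat in another {restarts} restarts): "
          f"{prob_no_repeat(1 / restarts, restarts):.2f}")
```

For a single observed failure the 95% bound works out to roughly 4.7 divided by the number of trials, so one hit tells you very little about the true rate, and a clean rerun of the same length would not be surprising.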


