Hacker News

I always wonder how many system crashes that we blame on the software or the OS are actually just suboptimal components. Computers are so complex and so fast that just a little bit of instability can probably lead to data corruption.


The optimist in me hopes that the bullwhip effect will lead to cheap RAM in a few years, and that the glut allows for the wider adoption and support of ECC memory.


I’d just like to see a repeat of the glut of HBM-backed processors like the Xeon Max 9480, which dipped as low as $900 per CPU, about $2000 all in, and with memory bandwidth that compares favorably to a 3090.


Not just wholesale crashes, but all sorts of misbehavior. For example, cheap WiFi/BT/ethernet can wreak havoc on your connectivity, and out-of-spec USB peripherals can cause all sorts of problems. Both can also cause sleep/power-saving problems.

Most people using computers aren't technical enough to be able to discern these things, however, and many buy the cheapest thing on the shelf and so these subpar components persist.


Sometimes it's both. I had some crazy data corruption problems that turned out to be a one-two punch of a buggy anti-cheat driver from a game I was playing and a defective M.2 SSD slot on my motherboard. Without the combination of both factors everything was fine, but when I played the game with that slot populated, the disk in that slot started getting corrupted and failing to respond to requests from the OS (eventually hanging the system).

Wild troubleshooting adventure.


My sympathies. I've had to track some of those sorts of things down and sometimes you wonder about your sanity.


I've been building computers for my friends and me for 25 years, and the two worst "random stability" issues I've seen were caused by high-quality but aging PSUs.


Yup. When building "upcycled" PCs out of used second-to-last-gen components, I learned very quickly to only ever use brand-new, high-quality PSUs ... the alternative is insanity.


My anecdotal experience over the last 15 years of personal PCs.

I've had one case of Corsair memory which went faulty after a year (was replaced without question by the supplier) and around 3 PSU failures.

However, on the 3 times I've done upgrades (typically motherboard + RAM + CPU) in that time I've been able to keep my existing PSU without stability issues.

So I wouldn't say it's "insanity" to keep your current PSU when upgrading, but based on your experience, if I had stability issues it may be the first thing I'd test.


Yeah, I've run a bunch of office PCs with nearly 20-year-old components 24/7 without any stability issues (they act as space heaters when doing CPU-intensive tasks in winter).

No need to replace a quality PSU until you start having issues.


It has been 25 years, but back in college I had a job refurbishing and repairing PCs. Most problems were caused by cheap no-name hardware. Quality hardware rarely had problems.


Maybe when quality hardware has problems, the owner knows how to deal with it, but when no-name hardware has problems, the owner has no clue how to build a computer.


Maybe. But then again, as someone who dual-boots, I see one OS crashing and giving an all-around worse experience than the other, on the exact same hardware, while the other just chugs along.

Now, I'm not someone good at maths or physics, so maybe, somehow, it's actually more likely than not that the worse OS gets to run when there's worse solar activity going on, or whatever else has an effect on my hardware, which also doesn't seem to affect memtest for some reason.

But the likelihood can't be that high. Can it?


It could easily be flaky hardware and different drivers. Not necessarily better or worse, but one driver causes the hardware to occasionally fail in exciting ways, like DMAing to the wrong address if just the right access patterns happen.

If you've got an IOMMU and everything is configured properly, devices can't DMA to the wrong place anymore, which might make it easier to track things down.


Even Linus Torvalds has said that Windows probably has a worse reputation for stability than it ought to simply due to people having bad RAM.


Given the number of computers now, it's practically a certainty. One of the big challenges with diagnosing non-ECC RAM is that you can't directly observe the errors, so you're left trying to eliminate issues elsewhere first and measuring symptoms or stress-testing to determine whether RAM is the most likely failure, all while the system literally can't remember correctly. And that's if you're actively trying to diagnose it, as opposed to accepting some gremlin in the machine.
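The stress-testing approach described above can be sketched as a memtest-style pattern test: write known bit patterns into a buffer, read them back, and count mismatches. This is only a minimal illustration (the `pattern_test` function and its defaults are made up here); real testers like MemTest86 run bare-metal so the OS, caches, and virtual memory don't mask which physical cells are being exercised.

```python
def pattern_test(size_mb=16, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each pattern, read it back, and return the
    number of bytes that didn't hold their value (bit flips)."""
    n = size_mb * 1024 * 1024
    buf = bytearray(n)
    errors = 0
    for p in patterns:
        buf[:] = bytes([p]) * n        # write the pattern everywhere
        errors += n - buf.count(p)     # read back; count mismatched bytes
    return errors

# On healthy hardware this should report 0 errors.
print(pattern_test(size_mb=1))
```

The alternating 0x55/0xAA patterns toggle every bit in both directions, which is why classic memory testers use them; a serious test would also vary addresses and timing to catch pattern-sensitive faults.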


I think it's crazy that we still use non-ECC RAM.



