This article contains a fundamental flaw. It estimates the upset rate in memory ...

jessriedel · on Feb 8, 2010

I don't understand your first paragraph. Shouldn't the cosmic ray flux through each bit be the same for each bit, and unchanged as you increase the total amount of memory? If I'm a given bit, should having a second stick of RAM 10 inches away affect whether I flip during a given time period?

InclinedPlane · on Feb 8, 2010

> Shouldn't the cosmic ray flux through each bit be the same for each bit, and unchanged as you increase the total amount of memory?

For identically manufactured RAM, generally yes. The total number of upsets you'll see from a collection of 10 sticks of RAM will be roughly 10x higher than from 1 stick of RAM. However, there are huge variations in RAM, especially when you're comparing modern RAM to RAM manufactured in, say, 1988. The main figure used in the article (1.3e-12 upsets/bit/hour) comes from a study of a Cray Y-MP 8 system that had a main memory system containing approximately 32,000 SRAM chips. That memory occupied a volume measured in cubic meters, yet today the same number of bits of RAM fits on half or a quarter of a single DIMM.

Suffice it to say, the total cosmic ray flux intercepted by the Cray Y-MP 8's main memory system and by half of a 2GB DIMM differs by orders of magnitude. At the same time, the memory cell in the Y-MP 8 and the memory cell in a 2GB DDR2 DIMM will have different sensitivities to cosmic rays, which translates to a different rate of upsets for the same neutron flux per memory cell. However, these two factors don't balance each other out: modern memory cells aren't thousands of times more sensitive to cosmic rays even though they take up thousands of times less space. The result is that a figure in upsets/bit/hour can only be taken to be constant so long as the memory technology remains constant. That is most decidedly not the case here. If one were using 4GB of Cray Y-MP RAM (which would likely fill an entire server rack, and more) perhaps you'd see the SEU rates the author calculates. However, most folks these days are using 4GB of RAM in 2 tiny DIMMs whose memory chips have a combined cross-sectional area of maybe 16 cm^2 at most. This has non-trivial effects on the SEU rate.
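To make the extrapolation being criticized concrete, here is a rough sketch of the arithmetic you get if you apply the article's per-bit figure unchanged to 4GB of RAM. The 1.3e-12 upsets/bit/hour rate is the one quoted above; treating 4GB as 4 * 2^30 bytes and the variable names are my own assumptions, and the result is only as good as the assumption that the 1988 per-bit rate still applies.

```python
# Naive extrapolation: assume the Cray Y-MP 8 per-bit upset rate
# applies unchanged to modern RAM (the assumption disputed above).
UPSETS_PER_BIT_HOUR = 1.3e-12      # figure quoted from the article

# Assumption: "4GB" means 4 * 2**30 bytes, 8 bits per byte.
bits_in_4gb = 4 * 2**30 * 8

upsets_per_hour = UPSETS_PER_BIT_HOUR * bits_in_4gb
upsets_per_day = upsets_per_hour * 24

print(f"bits:            {bits_in_4gb}")
print(f"upsets per hour: {upsets_per_hour:.4f}")
print(f"upsets per day:  {upsets_per_day:.2f}")
```

Under that assumption you get roughly one upset per day for 4GB, which is the kind of number that only holds if the memory technology hasn't changed.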

jessriedel · on Feb 8, 2010

Oh, OK. I guess then I wouldn't say that upsets/bit/hour is an incorrect unit. (It's clearly what you want to know to calculate the chance of error for a given piece of RAM.) It's just that this parameter varies across time and manufacturers. Using the value from a particular model of RAM manufactured in 1988 is sure to lead to wrong conclusions.

Thanks.

ynniv · on Feb 8, 2010

The rate of cosmic ray intersection is most likely directly proportional to the physical cross section of the silicon. The author uses per-bit empirical data from 10 years ago, when memory was 100 times less dense, and then extrapolates to the present. It would likely be more correct to use a per-chip (or per cm^2) rate.
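A rough sketch of what the per-area view implies, under loudly stated assumptions: upsets per cm^2 of silicon stay constant, each strike flips about one bit regardless of process, and the per-bit figure being rescaled is the article's 1.3e-12 upsets/bit/hour. The function name and the use of the "100 times" density ratio as an exact number are illustrative only.

```python
def per_bit_rate_at_density(base_rate_per_bit_hour, density_ratio):
    """Rescale a per-bit upset rate for denser memory.

    Assumes upsets per unit area are constant, so packing
    `density_ratio` times more bits into the same area divides
    the per-bit rate by that ratio. Ignores any change in
    per-cell sensitivity.
    """
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    return base_rate_per_bit_hour / density_ratio

old_rate = 1.3e-12          # upsets/bit/hour, figure from the article
density_ratio = 100.0       # "100 times less dense" taken at face value

new_rate = per_bit_rate_at_density(old_rate, density_ratio)
print(f"rescaled per-bit rate: {new_rate:.2e} upsets/bit/hour")
```

If sensitivity per cell has gone up with smaller processes, the real per-bit rate would sit somewhere between the old figure and this rescaled one.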

adamc · on Feb 8, 2010

I think his point was that as you get a higher number of bits per volume, the number of flips per GB would go down, basically because for a given number of bits, a higher density would imply they are exposed to less flux.

Tuna-Fish · on Feb 8, 2010

Based on pure gut feeling, I doubt that sensitivity has increased at all for at least a decade now. Anything that penetrates into the casing is either something that will not interact with it, or has enough energy to flip a bit both in a 180nm and 32nm process.