Despite what the article says, if you're using floating point numbers you're already using sloppy arithmetic. That's not just a sarcastic point, it's actually important; given that you're already not being precise it isn't necessarily surprising that you can trade away even more precision for speed, rather than it being some sort of binary yea/nay proposition that cracks numeric algorithms wide open.
"Off by 1% or so" leads me to guess it is implemented by using 8-bit numbers, and not necessarily with any particular sloppiness added by the chips, just the fact that the precision is small. Visual and audio processing could be set up in such a way that you wouldn't overflow those numbers because you know precisely what's coming in. You'd have to be careful about overflow and underflow but, per my first paragraph, you already have to be careful about that. It also makes sense in such apps that silicon would be more profitably used computing more significant bits more often rather than dithering about getting the bits in the 2^-50 absolutely precisely correct, a good insight. I don't know if that's what they're doing because it's hard to reverse engineer back through a science journalist but "8 bit math processors in quantity" -> "science journalist" -> "this article" is at least plausible.
In summary: Low-precision high-dynamic range arithmetic (floating point with small mantissa, ~1% error) uses ~100x less power and die area than IEEE floating point. The errors are acceptable for a huge class of applications (basically anything you'd consider running on a GPU today).
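To put a number on "small mantissa, ~1% error", here is a toy sketch (the 7-bit mantissa is my choice, not anything from the talk) that rounds an ordinary double down to a handful of mantissa bits while keeping the full exponent range:

    # Toy sketch: emulate a low-precision, high-dynamic-range float by
    # rounding the mantissa to `bits` bits and keeping the exponent as-is.
    import math, random

    def round_to_mantissa_bits(x: float, bits: int) -> float:
        if x == 0.0:
            return 0.0
        m, e = math.frexp(x)              # x = m * 2**e, 0.5 <= |m| < 1
        scale = 1 << bits
        return math.ldexp(round(m * scale) / scale, e)

    random.seed(0)
    worst = 0.0
    for _ in range(100_000):
        x = random.uniform(1e-6, 1e6)     # high-dynamic-range inputs
        y = round_to_mantissa_bits(x, 7)  # ~7-bit mantissa
        worst = max(worst, abs(y - x) / x)

    print(f"worst relative error with a 7-bit mantissa: {worst:.3%}")  # ~0.8%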
I don't quite understand how he gets a 10,000x speedup from a 100x transistor count decrease. Does die area increase with the square of transistor count?
What he's doing is representing numbers with their logarithms, with limited precision. A floating-point multiplier/divider, then, turns into a fairly small adder, which is much smaller and faster. Square roots and squaring turn into bit shifting. They have some clever method for doing addition/subtraction efficiently. And since they can fit all this in a small area with short critical paths, they can clock it very, very fast, and include a lot of them on a chip.
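I don't know their actual bit layout, but the general idea is easy to sketch: store log2(x) as a scaled integer, and multiply/divide become integer add/subtract while squaring and square root become shifts. (Addition and subtraction of the values themselves is the hard part, and is omitted here.)

    # Toy logarithmic number system. The 8 fraction bits are my choice,
    # not theirs; only positive values are handled, and add/subtract of
    # the represented values (the clever part) is not shown.
    import math

    FRAC_BITS = 8                          # fixed-point fraction bits for log2(x)

    def encode(x: float) -> int:
        return round(math.log2(x) * (1 << FRAC_BITS))

    def decode(l: int) -> float:
        return 2.0 ** (l / (1 << FRAC_BITS))

    a, b = encode(3.7), encode(120.5)

    print(decode(a + b), 3.7 * 120.5)      # multiply -> integer addition
    print(decode(a - b), 3.7 / 120.5)      # divide   -> integer subtraction
    print(decode(a << 1), 3.7 ** 2)        # squaring -> left shift
    print(decode(a >> 1), math.sqrt(3.7))  # sqrt     -> right shift
    # Each pair agrees to within a fraction of a percent, set by FRAC_BITS.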
Well, he says it's ~100x faster than a GPU, and GPUs are ~100x faster than CPUs (in the applications for which they are suited), so the 10,000x figure is the speedup from CPUs.
Science journalism could solve this problem if they just provided a goddamned citation to the relevant paper or talk. We're not asking them to print nothing but transcripts. I'm tired of having to poke people's and institutions' names into Google with a couple of likely keywords and hope for the best.
>Despite what the article says, if you're using floating point numbers you're already using sloppy arithmetic. That's not just a sarcastic point, it's actually important; given that you're already not being precise it isn't necessarily surprising that you can trade away even more precision for speed, rather than it being some sort of binary yea/nay proposition that cracks numeric algorithms wide open.
So it appears in this case that they are doing what you guess: low-precision, high-dynamic-range calculations using normal computing hardware. But it's not at all clear that errors introduced by floating-point calculations (which are rounding errors, not physical errors) are equivalent--from an engineering perspective--to hardware which accepts physical errors in exchange for performance boosts. It's conceivable that future computing technology will yield dramatic speed advantages on error-prone hardware in a way that no numerical tricks on error-free hardware could duplicate.
I disagree. Turing's original paper introducing the Turing machine addressed the question of whether a discrete machine is interchangeable with an analog mind. Turing's argument is that even if the states are continuously variable, there comes a point where two states are too close to be distinguished by the mind itself, so it is permissible to say that the number of states IS finite and can be modeled by a discrete-state Turing machine. Now put Shannon's information theory in the mix: a channel, even a sloppy one, has a bit rate and an error rate. My intuition is that whatever speed increase you get from allowing sloppy circuits will be matched by an increased error rate. If you trade off the error rate of a sloppy channel against using fewer bits of precision in a discrete, non-error-prone channel, you should get the same effective information rate.
All of these claims of increasing performance by sacrificing precision are achieving it by ignoring the tradeoff between rate and precision. It would be the same as if I claimed to speed up a lossless compression algorithm by discarding every other byte while measuring data rate as if I were not losing information.
My point is that any tradeoff you can make by losing information in hardware, you can make just as well in software.
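To make that concrete: here is a sketch that throws away mantissa bits of an ordinary double by masking its raw bits. Whether hardware or software discards those bits, the information lost is the same.

    # Sketch: emulate reduced precision in software by zeroing low
    # mantissa bits of a 64-bit float (52 explicit mantissa bits).
    import struct

    def drop_mantissa_bits(x: float, bits_to_drop: int) -> float:
        (raw,) = struct.unpack("<Q", struct.pack("<d", x))
        mask = ~((1 << bits_to_drop) - 1) & 0xFFFFFFFFFFFFFFFF
        (out,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
        return out

    x = 3.141592653589793
    for drop in (0, 26, 45):               # keep 52, 26, and 7 mantissa bits
        y = drop_mantissa_bits(x, drop)
        print(f"keep {52 - drop:2d} bits: {y!r}  (rel. error {abs(y - x) / x:.2e})")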
I understand that an analog computer can be simulated by a digital computer and vice versa, given enough error correction. That doesn't mean that the speed-up scaling will be the same.
This isn't something that can be answered purely mathematically (with a Turing machine argument); it depends on the specific physics and engineering realities. The trade-off between precision and error rate is determined by math, but the trade-off between error rate and speed is determined by physics.
I have no knowledge of circuit design, but you have to be careful with randomness. For most algorithms, 1% imprecision with a normal/uniform distribution may not be a big deal. But if the distribution is irregular, you may run into trouble, since the errors add up.
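A toy illustration of why the distribution matters (made-up numbers, nothing from the article): zero-mean 1% noise largely cancels over a long sum, while a one-sided 1% bias compounds.

    # Toy comparison: zero-mean vs. one-sided 1% error on each term of a sum.
    import random

    random.seed(1)
    N = 100_000
    exact = float(N)                                                   # sum of N ones

    unbiased = sum(1.0 + random.gauss(0, 0.01) for _ in range(N))      # zero-mean noise
    biased   = sum(1.0 + abs(random.gauss(0, 0.01)) for _ in range(N)) # one-sided noise

    print(f"zero-mean noise: sum off by {abs(unbiased - exact) / exact:.4%}")  # ~0.003%, shrinks like 1/sqrt(N)
    print(f"one-sided noise: sum off by {abs(biased - exact) / exact:.4%}")    # ~0.8%, does not shrink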
According to the presentation linked above, you've got to look carefully at each application, to see how it holds up when you decrease floating-point precision like this. But the list of applications where this would work is very impressive.
I would dearly like to go back several years and explain this to the authors of a transaction processing system I once worked on. It stored all its currency values (for very small units sold in large quantities) with floating point data types....
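For anyone who hasn't been bitten by that yet, the classic failure mode has nothing to do with sloppy hardware; it's just ordinary binary floating point versus exact decimal:

    # Ten items at $0.10 each, totalled with binary floats vs. exact decimals.
    from decimal import Decimal

    float_total = sum([0.10] * 10)
    decimal_total = sum([Decimal("0.10")] * 10)

    print(float_total)                        # 0.9999999999999999
    print(float_total == 1.0)                 # False
    print(decimal_total == Decimal("1.00"))   # True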
The chip architecture described in the article reminds me of the DEC MasPar system [1] we had at uni back in the mid-90s. 2048 processors (IIRC), where each processor could only communicate directly with its 8 neighbours. If you wanted to get decent performance out of it, you had to think carefully about how you were going to get your data onto each of the processors.
This would be beautiful for protein folding. That particular application is extremely parallel, numerically heavy, and should tolerate the loss of precision very well. It also eats up processing power like a black hole, so a few orders of magnitude speed improvement would definitely be nice.
Based on my experience, different stages of folding should require differing precision.
For ClusPro (protein docking, http://dx.doi.org/10.1002/prot.22835), the first stage uses rough energy functions for global sampling of the protein surface. For those functions we use floats, because it is rigid-body sampling and very tolerant of clashes. However, when it comes to the minimization/refinement stage, we have seen weird things happen with floats and use doubles instead.
Similarly, the functions used in early stages of protein folding can probably deal with loss of precision, but the stages for producing high quality structures would not.
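Not our production code, obviously, but a minimal illustration of the kind of weirdness I mean: in a refinement loop the corrections can be small enough that single precision simply can't absorb them.

    # Toy example (not ClusPro): accumulate a million tiny corrections.
    # In float32 each update is below the representable step near 1.0 and
    # rounds away; float64 absorbs them fine.
    import numpy as np

    delta = 1e-8
    acc32 = np.float32(1.0)
    acc64 = np.float64(1.0)
    for _ in range(1_000_000):
        acc32 += np.float32(delta)
        acc64 += delta

    print(acc32)   # still 1.0
    print(acc64)   # ~1.01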
Any type of physics simulation, from protein folding to FEM to real-time game physics, plus image/video processing, graphics rendering, computer vision, speech recognition, neural networks, ...
"Off by 1% or so" leads me to guess it is implemented by using 8-bit numbers, and not necessarily with any particular sloppiness added by the chips, just the fact that the precision is small. Visual and audio processing could be set up in such a way that you wouldn't overflow those numbers because you know precisely what's coming in. You'd have to be careful about overflow and underflow but, per my first paragraph, you already have to be careful about that. It also makes sense in such apps that silicon would be more profitably used computing more significant bits more often rather than dithering about getting the bits in the 2^-50 absolutely precisely correct, a good insight. I don't know if that's what they're doing because it's hard to reverse engineer back through a science journalist but "8 bit math processors in quantity" -> "science journalist" -> "this article" is at least plausible.