Analog computers (aka: op-amps) are incredibly cool. But there's one major, major problem with them: digital electronics introduce a severe amount of noise, lowering their accuracy and applicability.
It's not hard to make a fast analog computer out of transistors (indeed, transistors are innately analog devices). What is hard is to have that analog computer work with a digital computer on the same power network... especially if you want the analog computer's noise floor to be usable. The noise levels are just too high.
--------
Because your values are stored as the physical voltage on various devices, you may write 1.05V... but maybe when you read it, it's 1.04V due to digital noise (~1% error in this case). Is that acceptable?
8-bits is just ~2 digits of accuracy. 16-bits is ~5 digits, still possible. 32-bits (~10 digits of accuracy) is a pipedream.
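(A quick sanity check on those digit counts, sketched in Python; digits ≈ bits × log10(2):)

    import math

    # Decimal digits of precision carried by a given number of bits:
    # digits ~= bits * log10(2) ~= bits * 0.301
    for bits in (8, 16, 32):
        print(f"{bits:2d} bits ~= {bits * math.log10(2):.1f} decimal digits")
    # 8 bits ~= 2.4, 16 bits ~= 4.8, 32 bits ~= 9.6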
However, programmers have been pushing the precision of neural-net programs / computers lower and lower. Apparently, it's a field of compute where accuracy is just not that important. 2 or 5 digits of accuracy (8-bit or 16-bit compute) is acceptable, with many TensorFlow programs using BFloat16 units or smaller.
Is there any reason that a device which smooths the signal could not be introduced into the circuit, separating the analog computer from the rest of the digital electronics on the same circuit?
Electronics is not my field of study, so I may be missing some things, but this seems like a rather simple problem for an EE to handle, no?
> Is there any reason that a device which smooths the signal could not be introduced into the circuit, separating the analog computer from the rest of the digital electronics on the same circuit?
When you "smooth out 4GHz signals", it means you ignore 4GHz signals.
Which means you can no longer operate at 4GHz. Instead, you operate at 1GHz (or slower). Any form of "smoothing" literally cuts your processing rate down, and now the digital circuit is just way faster than you.
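A rough sketch of the effect (illustrative numbers, not any real circuit): average a 4GHz tone over 1ns windows (a crude ~1GHz smoother) and the tone simply vanishes.

    import numpy as np

    fs = 40e9                            # 40 GS/s sampling, illustrative only
    t = np.arange(0, 100e-9, 1 / fs)     # 100 ns of signal
    sig = np.sin(2 * np.pi * 4e9 * t)    # a 4 GHz tone

    # "Smooth" with a 1 ns moving average -- roughly a 1 GHz low-pass:
    window = int(1e-9 * fs)              # 40 samples per window
    smoothed = np.convolve(sig, np.ones(window) / window, mode="same")

    print(np.abs(sig).max(), np.abs(smoothed).max())
    # The raw tone has amplitude ~1; after smoothing it's nearly gone.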
--------
Analog circuits smooth out 60Hz (power-line noise) and 30,000Hz (VRM / power-supply noise) all the time, and try to operate at higher frequencies where those noises are nonexistent.
In these cases, you can use a "high-pass filter" (smooth out the low frequencies)... which is roughly a memory-limiter. (If you forget everything every 0.008 seconds, then 60Hz noise (aka noise that only repeats every 0.016 seconds) will be forgotten.)
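A minimal high-pass sketch in Python (the sample rate, cutoff, and signal frequencies are assumptions for illustration, not from any particular design):

    import numpy as np

    fs = 100_000                              # 100 kHz sample rate
    t = np.arange(0, 0.1, 1 / fs)
    signal = np.sin(2 * np.pi * 10_000 * t)   # 10 kHz signal of interest
    hum = 0.5 * np.sin(2 * np.pi * 60 * t)    # 60 Hz power-line noise
    x = signal + hum

    # First-order RC high-pass with an assumed ~600 Hz cutoff.
    # The filter's "memory" is tau = 1/(2*pi*fc) ~= 0.27 ms.
    fc = 600.0
    alpha = 1 / (1 + 2 * np.pi * fc / fs)
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    # The 60 Hz hum is "forgotten"; the 10 kHz signal passes nearly untouched.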
--------
The low-pass filter (smooth out the high frequencies) is the opposite: it's the averaging / smoothing filter you're more familiar with. If you average together all values every 0.01 seconds, you'll not be able to see 1000Hz information (aka you have a 100Hz low-pass filter; high frequencies are averaged away, so the noise in the 1000Hz or 10,000Hz bands is lost).
But it also means all information in the 1000Hz or 10,000Hz+ bands is lost as well.
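The same idea as a sketch, with made-up frequencies: a 0.01-second moving average keeps a 10Hz component but wipes out a 1000Hz one.

    import numpy as np

    fs = 100_000
    t = np.arange(0, 0.1, 1 / fs)
    slow = np.sin(2 * np.pi * 10 * t)          # 10 Hz component (kept)
    fast = 0.5 * np.sin(2 * np.pi * 1000 * t)  # 1000 Hz component (lost)
    x = slow + fast

    # Averaging over 0.01 s windows acts as a ~100 Hz low-pass:
    window = int(0.01 * fs)                    # 1000 samples
    y = np.convolve(x, np.ones(window) / window, mode="same")
    # y tracks the 10 Hz shape; the 1000 Hz ripple is averaged away --
    # along with any information that lived up there.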
------
Not that I've ever dealt with GHz-level circuits, mind you. I'm just an electronics hobbyist who deals with MHz-level issues. I'm sure GHz levels have all sorts of weird problems.
But analog devices are quite fun to use and design. They're incredible... cheap and effective, no coding required. I do recommend people play with op-amps and get their derivative, integral, multiplier, logarithm, addition, and subtraction circuits all set up. You can do a lot of math at near-instantaneous speeds (nanosecond-level delay), all analog.
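As a flavor of what those circuits compute, here's a minimal numeric sketch of the classic inverting op-amp integrator, Vout = -(1/RC) ∫ Vin dt (component values are assumed for illustration):

    import numpy as np

    R, C = 10e3, 10e-9               # assumed: 10 kOhm, 10 nF -> RC = 100 us
    fs = 1_000_000
    t = np.arange(0, 0.001, 1 / fs)  # 1 ms of signal
    vin = np.where(t < 0.0005, 1.0, -1.0)   # +1 V, then -1 V

    # Ideal inverting integrator: Vout = -(1/RC) * integral(Vin dt)
    vout = -np.cumsum(vin) / fs / (R * C)
    # Vout ramps down at -10 V/ms while Vin = +1 V, then ramps back up.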
An op-amp with an 800MHz gain-bandwidth product (such as the LTC6228: https://www.analog.com/media/en/technical-documentation/data...) can probably handle 100MHz signals easily. Or in digital terms: that's a "computer" that can calculate (derivative/integral/multiplier/logarithm/addition/subtraction) every 10 nanoseconds... roughly the speed of a modern L2 cache lookup.
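The back-of-envelope version of that claim (the 800MHz figure is from the datasheet above; the gains are arbitrary):

    # Gain-bandwidth product: usable bandwidth ~= GBW / closed-loop gain
    gbw = 800e6                      # ~800 MHz gain-bandwidth product
    for gain in (1, 4, 8):
        bw = gbw / gain
        print(f"gain {gain}: ~{bw / 1e6:.0f} MHz bandwidth, ~{1e9 / bw:.1f} ns period")
    # Even at gain 8 you keep ~100 MHz, i.e. one "operation" every ~10 ns.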
That's fast. And "OpAmps" are kind of the easy-mode circuit.
Digital errors are due to the clock. Each clock tick pulls the voltage down; then all the circuits draw power, and then everything settles until the start of the next clock tick.
This error is high-frequency with a huge number of harmonics (it's a square wave), meaning it transmits as a radio wave very effectively and propagates until the error is correlated across many different analog parts.
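That harmonic content falls off slowly, which is why clock edges radiate so effectively; a quick sketch of the square wave's Fourier series:

    import numpy as np

    # Square wave Fourier series: odd harmonics with amplitude 4/(pi*n).
    f0 = 3e9   # an assumed 3 GHz clock, for illustration
    for n in (1, 3, 5, 7, 9):
        print(f"harmonic {n} ({n * f0 / 1e9:.0f} GHz): amplitude {4 / (np.pi * n):.2f}")
    # Amplitudes fall off only as 1/n, so energy reaches far above the clock rate.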
Sure, but it’s effectively random which way the model learns the parameter impact and how values are used by subsequent layers (I’m assuming layers in a NN). Adding noise with things like dropout encourages the model to ‘grow’ redundancy.
Hmmm... I see you're coming at this from another angle, but it's also related to noise.
-------
I guess I was relating analog noise to the quantization noise of the FP16 format. For example, the number 4098 cannot be represented in FP16; you round that number to 4096 or 4100.
This is perhaps seen as "noise" from your perspective, but I'm seeing it as "loss of precision". From my perspective, neural nets are clearly able to work even with low precision (ie: numbers like 4098 being rounded to 4096 in every step of every calculation). "Noise" in my model is this number-shifting effect.
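You can see the rounding directly (this sketch uses numpy's float16, the IEEE half-precision format rather than BFloat16, but the effect is the same):

    import numpy as np

    print(np.float16(4098.0))   # -> 4096.0; 4098 is not representable
    print(np.float16(4099.0))   # -> 4100.0
    # Near 4096 the FP16 grid spacing is 4, so every value snaps
    # to a multiple of 4 -- quantization "noise" of up to +/-2.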
So from my side of things: neural nets seem to work with 8 or 16 bits of precision. And analog circuits are able to achieve that level of precision in practice (though it's called 2 digits or 5 digits of precision in the analog world, maybe an SNR of ~50dB or ~98dB).
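(For reference, the usual ideal-quantization rule of thumb is ~6dB per bit:)

    # Ideal quantization SNR: ~6.02 dB per bit, plus 1.76 dB
    for bits in (8, 16):
        print(f"{bits} bits ~= {6.02 * bits + 1.76:.0f} dB SNR")
    # 8 bits ~= 50 dB, 16 bits ~= 98 dB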
---------
But from your perspective, noisy inputs are themselves a strategy that people use to train neural nets with. Literally adding noise and then compensating with a larger model / more training.
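Something like this sketch, where the noise is entirely an artificial training-time knob (the names are illustrative, not from any particular framework):

    import numpy as np

    def add_training_noise(batch, sigma=0.05, rng=None):
        """Gaussian input noise, applied only during training."""
        rng = rng or np.random.default_rng()
        return batch + rng.normal(0.0, sigma, size=batch.shape)

    # Training:  x = add_training_noise(x_batch)
    # Inference: x = x_batch   (the noise is fully under our control)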
Maybe it's all noise... one artificial, the other caused by physical limitations (quantization noise is... noise, after all), but it was a bit confusing for a post or two because I think we were coming at it from different perspectives for a second.
It is counterintuitive. Noise is often an essential part of training, especially when it comes to generalization. Sometimes there is enough noise in the training data to be sufficient.
Controlling how much noise and where it goes so that you end up with something usable is still a bit of an art. Not just in designing the model but also the training and distillation process. It’s a pain in the ass, but if the perf is there it can be worth it.
I’m inference speed bottlenecked so I’m keeping an eye on this sort of tech.
I've done enough neural net stuff to know that noise can knock you out of local minima / local maxima. Well, not even just neural nets... machine learning in general benefits.
But there's a difference between noise that you can control (ie: on during training and you can increase/decrease at will) and noise that's fundamental to the system and uncontrollable.
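For example, controllable noise is a knob you can simply anneal away over training, which a chip's physical noise floor never lets you do (a hypothetical schedule):

    def noise_sigma(step, total_steps, start=0.1, end=0.0):
        """Anneal injected-noise strength over training -- possible with
        artificial noise, impossible with physical circuit noise."""
        frac = min(step / total_steps, 1.0)
        return start + (end - start) * frac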
Still, since neural nets show resilience to a degree of noise, maybe that makes analog circuits suddenly viable.
The problem I'd see is that the raw randomness of electronics is not the nice, uniform distribution that indeed may help NNs learn. It can be, and often is, biased, skewed, time-varying, and just about anything else. It can vary from one part of the chip to another, from one chip to another, and so forth. The various conditioning circuits exist to turn that wide range of noise into clean digital ons and offs, which means the raw circuits can have this wide range unless controlled.
It's worth considering that no one uses raw electronic noise for random/pseudo-random number generators. And that's an application that's in demand.
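The standard trick is to condition the raw bits before use; e.g. the classic von Neumann extractor (a minimal sketch):

    def von_neumann_extract(bits):
        """Turn biased (but independent) coin flips into unbiased bits --
        the kind of conditioning raw electronic noise needs before use."""
        out = []
        for a, b in zip(bits[0::2], bits[1::2]):
            if a != b:
                out.append(a)   # 01 -> 0, 10 -> 1; 00 and 11 are discarded
        return out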
Dropout, the example given, is not a normally distributed error - more like salt noise. The impact on subsequent layers isn't Normal either. My point is that certain error modes other than normally distributed ones may not be so bad. E.g. skew, as roughly half of the redundant paths for information will be inverted in the latent space.
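For the record, dropout as usually implemented is a Bernoulli mask, nothing Gaussian about it (a minimal sketch):

    import numpy as np

    def dropout(x, p=0.5, rng=None):
        """Salt-like noise: each unit is either zeroed or rescaled,
        never smoothly perturbed."""
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape) >= p      # keep with probability 1-p
        return x * mask / (1.0 - p)          # inverted-dropout scaling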
I’m not a hardware guy, but in my experience DNNs can be made rather resilient. To me it looks like it could work. If it does, great.
As such, maybe neural nets can be done in analog.