Hacker News

So the initial “hey, this might be a good idea” implementation performs slightly worse than something that has had literally billions of dollars thrown at it?



The question was "If it's not new or novel, why aren't people using it?". For example, take a look at this paper: https://arxiv.org/abs/1905.11786. It was published three years ago; it also does parallel layer-wise optimization, and it even talks about being inspired by biology, though its objective function differs from Hinton's. Why aren't people using it? Because it performs worse. It is that simple. Is it an interesting area to explore? Probably. There are millions of interesting areas to explore. That doesn't mean it is worth using, at least not yet.


It is also slower than the standard backprop we have used for decades now.

No comparisons to AdamW were made.

In fact, this algorithm still uses backprop at its core, just propagating through zero layers: each layer computes gradients only for its own weights, and no gradient ever flows back to an earlier layer.
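A minimal sketch of what "backprop through zero layers" means in practice: greedy layer-wise training where each layer optimizes a purely local loss, treating its input as a constant. The toy objective below (push each layer's mean squared activation, its "goodness", toward 1) and all names are illustrative assumptions, not Hinton's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_layer_step(W, x, lr=0.5):
    """One local update of a single linear layer.

    The gradient is taken only w.r.t. this layer's weights W, with the
    input x held fixed -- "backprop through 0 layers": nothing is ever
    propagated back to an earlier layer.
    Toy local objective: push mean squared activation ("goodness") to 1.
    """
    h = W @ x
    goodness = np.mean(h ** 2)
    grad = (2.0 / h.size) * np.outer(h, x)   # d(goodness)/dW, x held fixed
    W = W + lr * (1.0 - goodness) * grad     # step goodness toward 1
    h = W @ x
    return W, h / (np.linalg.norm(h) + 1e-12)  # normalized output for next layer

x = rng.normal(size=4)
x = x / np.linalg.norm(x)
W1 = 0.1 * rng.normal(size=(5, 4))
W2 = 0.1 * rng.normal(size=(3, 5))

# Layers train greedily and in principle in parallel:
# layer 2 never sends gradients back to layer 1.
for _ in range(50):
    W1, h1 = local_layer_step(W1, x)
    W2, h2 = local_layer_step(W2, h1)
```

Because each step only needs the layer's own input and output, the per-layer updates can run concurrently, which is the appeal of parallel layer-wise schemes; the comments above note that, so far, this comes at a cost in accuracy versus end-to-end backprop.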


Very significantly worse.





