Hacker News

So the initial “hey, this might be a good idea” implementation performs slightly worse than something that has had literally billions of dollars thrown at it?



The question was "If it's not new or novel, why aren't people using it?". For example, take a look at this paper: https://arxiv.org/abs/1905.11786. It was published three years ago; it also does parallel layer-wise optimization, and it even talks about being inspired by biology, though its objective function differs from Hinton's. Why aren't people using it? Because it performs worse. It is that simple. Is it an interesting area to explore? Probably. There are millions of interesting areas to explore. That doesn't mean it is worth using, at least not yet.


It is also slower than the standard backprop we have used for decades now.

No comparisons to AdamW were made.

In fact, this algorithm still uses backprop at its core, just propagating through zero layers: each layer computes gradients only for its own weights, and no gradient ever flows back to an earlier layer.
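A minimal sketch of what "backprop through zero layers" means in practice: greedy layer-wise training where each layer optimizes a purely local loss, treating its input as a constant. The toy objective below (push each layer's mean squared activation, its "goodness", toward 1) and all names are illustrative assumptions, not Hinton's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_layer_step(W, x, lr=0.5):
    """One local update of a single linear layer.

    The gradient is taken only w.r.t. this layer's weights W, with the
    input x held fixed -- "backprop through 0 layers": nothing is ever
    propagated back to an earlier layer.
    Toy local objective: push mean squared activation ("goodness") to 1.
    """
    h = W @ x
    goodness = np.mean(h ** 2)
    grad = (2.0 / h.size) * np.outer(h, x)   # d(goodness)/dW, x held fixed
    W = W + lr * (1.0 - goodness) * grad     # step goodness toward 1
    h = W @ x
    return W, h / (np.linalg.norm(h) + 1e-12)  # normalized output for next layer

x = rng.normal(size=4)
x = x / np.linalg.norm(x)
W1 = 0.1 * rng.normal(size=(5, 4))
W2 = 0.1 * rng.normal(size=(3, 5))

# Layers train greedily and in principle in parallel:
# layer 2 never sends gradients back to layer 1.
for _ in range(50):
    W1, h1 = local_layer_step(W1, x)
    W2, h2 = local_layer_step(W2, h1)
```

Because each step only needs the layer's own input and output, the per-layer updates can run concurrently, which is the appeal of parallel layer-wise schemes; the comments above note that, so far, this comes at a cost in accuracy versus end-to-end backprop.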


Very significantly worse.





