The question was: "If it's not new or novel, why aren't people using it?" For example, take a look at this paper: It was published 3 years ago. It also does parallel layer-wise optimization and even talks about being inspired by biology, though its objective function differs from Hinton's. Why aren't people using it? Because it performs worse. It's that simple. Is it an interesting area to explore? Probably. But there are millions of interesting areas to explore, and that doesn't mean this one is worth using, at least not yet.