
I agree with you. I suppose you could simulate a large population by maintaining a set of snapshots, but it still doesn't seem like the same thing as a classic evolutionary technique. There's no sane way to do something like crossover, I guess, unless you average the weights from multiple snapshots (a strategy that might not work at all, IMO).
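
For concreteness, here's roughly what that weight-averaging "crossover" might look like. This is just a sketch; the snapshot format (a dict of numpy arrays keyed by layer name) is my own assumption, and as said, it might not work at all:

    import numpy as np

    def average_crossover(snapshots):
        """Blend several weight snapshots into one 'child' by uniform averaging."""
        child = {}
        for name in snapshots[0]:
            child[name] = np.mean([s[name] for s in snapshots], axis=0)
        return child

    # e.g. child = average_crossover([snapshot_a, snapshot_b])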

I think I've seen research papers at the intersection of neural network optimization and EA, but direct optimization techniques seem to have been preferred for managing the actual weights. Manipulating the input data is more along the lines of perturbation methods. If you look hard enough you can probably see whatever you want, but it's a stretch.

To me, neural network training is a distinct task that draws from many areas of optimization and does not fit neatly under any one umbrella.




Admittedly I haven't dug too deep into the references, including the paper the article refers to--and I'm not too well versed in diffusion models either. Those caveats aside, I'm also having trouble making the analogy work, and similarly for me the hangup is crossover. If I recall correctly, what distinguishes evolution from, say, simulated annealing at varying temperatures is precisely the crossover operator. It's not obvious where crossover is happening here.


Crossover is not a defining or essential part of an EA. Reproduction with variation under selection pressure is the definition. Crossover is one possible mechanism of reproduction with variation. Simple mutation is another.
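
To illustrate, a mutation-only EA is easy to write down: reproduction with variation (Gaussian mutation) under selection pressure (keep the fittest half), with no crossover anywhere. A minimal sketch, with the fitness function and mutation scale as placeholders:

    import numpy as np

    def evolve(fitness, dim, pop_size=32, gens=100, sigma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.normal(size=(pop_size, dim))
        for _ in range(gens):
            scores = np.array([fitness(x) for x in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]            # selection pressure
            children = parents + sigma * rng.normal(size=parents.shape)   # variation
            pop = np.concatenate([parents, children])                     # reproduction
        return max(pop, key=fitness)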


I would not call backpropagation and similar techniques "mutation". It certainly isn't random in the same sense as a random search like an EA. Maybe the exact mechanism of iteration is not that important as long as some aspect of it is random, but it's still a novel interpretation of EA.
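
The contrast in a toy sketch: a gradient step is directed by the loss surface, while an EA-style mutation is undirected noise. (grad_fn here stands in for whatever backprop would return; none of this is from the article.)

    import numpy as np

    rng = np.random.default_rng()

    def gradient_step(w, grad_fn, lr=0.01):
        return w - lr * grad_fn(w)                    # deterministic, follows the loss surface

    def mutation_step(w, sigma=0.01):
        return w + sigma * rng.normal(size=w.shape)   # isotropic noise, ignores the loss surface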


Fair enough, I'm convinced now.


> I suppose you could simulate a large population by maintaining a set of snapshots, but it still doesn't seem like the same thing as a classic evolutionary technique.

I've got an experiment using this exact technique running on a machine right now, but I wouldn't argue that it's a good substitute for an actual population.

The reason I attempted this is that snapshotting a candidate is an unbelievably cheap operation. You don't have to allocate memory once the snapshot buffer is in place; it's just a bunch of low-level memory block copies. I can achieve several million iterations per candidate per hour with this hack, but only by giving up a real population.
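
Roughly like this; the weight layout (a dict of numpy arrays) is my assumption, but the point is that save/restore is pure in-place block copies, with no allocation on the hot path:

    import numpy as np

    class Snapshot:
        def __init__(self, weights):            # weights: dict of numpy arrays
            self.buf = {k: np.empty_like(v) for k, v in weights.items()}  # allocate once

        def save(self, weights):
            for k, v in weights.items():
                np.copyto(self.buf[k], v)       # memcpy-style block copy

        def restore(self, weights):
            for k, v in weights.items():
                np.copyto(v, self.buf[k])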

My conclusions so far are that the snapshot/restore approach is interesting, but it seems to only take you into local minima faster than everything else does. Real population diversity is an information-theory thing that will tax memory bandwidth, et al., and slow things down by 3-4 orders of magnitude from where we'd prefer to be.



