
Yeah... but his first impulse was to write backprop from scratch. I've seen the lectures and been dabbling with NNs for years, and I never thought to do it. I always thought the Stanford people made you do it on assignment 1 to pay your dues or something. I continue to think of Carmack as the master hacker.



> his first impulse was to write backprop from scratch

Backprop is a very simple algorithm, nothing to fear there. The hard part is calculating the derivatives if you want flexibility in how you build your model. But for feedforward networks with sigmoid activations, the equations to update the weights are a joke.
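
For concreteness, a minimal sketch of those update equations, assuming a single hidden layer, sigmoid activations everywhere, squared-error loss, and toy shapes made up for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden units
    W2 = rng.normal(size=(4, 1))   # 4 hidden units -> 1 output
    x = rng.normal(size=(1, 3))    # one training example
    t = np.array([[1.0]])          # its target
    lr = 0.1                       # learning rate

    # forward pass
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)

    # backward pass, written out by hand: sigmoid'(z) = s * (1 - s)
    delta_out = (y - t) * y * (1 - y)             # error at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # error pushed back one layer

    # weight updates
    W2 -= lr * (h.T @ delta_out)
    W1 -= lr * (x.T @ delta_hid)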


I'm not sure we agree on the definition of very simple. It may be very simple if you already know it...


https://m.youtube.com/watch?v=i94OvYb6noo&feature=youtu.be&t...

You won't regret it. One of the best explanations of backprop on the internet.


Backpropagation is based on optimization with derivatives (i.e. finding the maximum or minimum of a function from its derivatives, which is taught before university in my country), and on the chain rule for differentiating composed functions (first year of university).

But what I meant is: even if you look at the equations and the steps without completely understanding the insights behind them, it is a joke of an algorithm. It just does some multiplications, applies the new gradients, moves to the previous layer, and repeats (sketched below).
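
A hypothetical sketch of that backward sweep, reusing the sigmoid setup from the snippet above (activations[i] is the output of layer i from the forward pass, weights[i] connects layer i to layer i+1; names are mine, not from the thread):

    def backward_sweep(weights, activations, target, lr=0.1):
        # error at the output layer (squared-error loss, sigmoid output)
        delta = (activations[-1] - target) * activations[-1] * (1 - activations[-1])
        for i in reversed(range(len(weights))):
            grad = activations[i].T @ delta   # some multiplications...
            if i > 0:                         # ...move the error to the previous layer...
                delta = (delta @ weights[i].T) * activations[i] * (1 - activations[i])
            weights[i] -= lr * grad           # ...apply the new gradients, repeat
        return weights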


It's been a while, but I remember backprop starting at the end of the neural net and working backwards. Each weight that contributed to a wrong answer was weakened or even reversed by some small factor, and each weight that contributed to a correct answer was strengthened by some small factor.

So it's probably as simple as

newWeight = oldWeight +/- (stepValue * someFactor)
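
In gradient-descent terms that +/- is just the sign of the gradient; a toy example with made-up numbers:

    learning_rate = 0.1   # plays the role of stepValue
    gradient = 0.25       # hypothetical dLoss/dWeight for this particular weight
    old_weight = 0.8

    # a weight that pushed the answer the wrong way has a positive gradient and
    # gets weakened; one that helped has a negative gradient and gets strengthened
    new_weight = old_weight - learning_rate * gradient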


Exactly. The someFactor is the error of the next layer, calculated with the derivatives of the functions (just as you would find the minimum of a function using its derivatives). The tricky part is calculating those derivatives, but since automatic differentiation came along we can do a lot of cool stuff.
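
By automatic differentiation they presumably mean tools like PyTorch's autograd (my example, not the commenter's), which derive the gradients from the forward computation so you never write them out by hand:

    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = torch.sigmoid(x) ** 2   # any composition of differentiable operations
    y.backward()                # reverse-mode autodiff fills in x.grad
    print(x.grad)               # d(sigmoid(x)^2)/dx, no hand-derived formula needed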


The differentiation is probably why I wouldn't have bothered to hack it myself. I'm curious how you or others would tackle it. What do you mean by auto differentiation?




