
There's no great answer to this question. It is a bunch of tricks. Fundamentally:

If you're saying FizzBuzz doesn't work, presumably you mean that feeding n in directly doesn't work. Neither does rescaling n to [0, 1] or to [-1, 1] (and remember: obviously don't use ReLU with inputs in [-1, 1]). It just doesn't.

Neural networks can do a LOT of things, but they cannot deal with raw numbers, and certainly not natural or real numbers fed in as single values. BUT they can deal with certain encodings of them.

Instead of using the number directly, give one input to the neural network per bit of the number. That will work. Just pass in the last 10 bits of the number.
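A minimal sketch of that bit encoding (NumPy; the function names and the 101..1023 training range are just my choices for illustration):

    import numpy as np

    def encode_bits(n, num_bits=10):
        # One input per bit: the lowest `num_bits` binary digits of n.
        return np.array([(n >> i) & 1 for i in range(num_bits)], dtype=np.float32)

    def fizzbuzz_class(n):
        # Target class: 0 = print n, 1 = Fizz, 2 = Buzz, 3 = FizzBuzz.
        if n % 15 == 0: return 3
        if n % 5 == 0: return 2
        if n % 3 == 0: return 1
        return 0

    # e.g. train on 101..1023 and keep 1..100 as the test set
    X = np.stack([encode_bits(n) for n in range(101, 1024)])
    y = np.array([fizzbuzz_class(n) for n in range(101, 1024)])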

Or cheat and use transformers. Pass in the last 5 generations and have it construct the next FizzBuzz line. That will work. Because it's possible.

To make the number-based neural network for FizzBuzz "perfect", think about it: the network needs to determine divisibility by 3 and by 5, and it can't do division. You can't fix that directly; you must make it possible for the network to learn the divisibility test. 2, 3 and 5 are relatively prime (and actual primes), so no small fixed set of binary digits tells you whether n is divisible by 3 or 5. So "cheat" and pass in the number in base 15 (by one-hot encoding the number mod 15, for example); the last base-15 digit alone decides Fizz, Buzz and FizzBuzz.
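A sketch of that base-15 "cheat" (my own naming): one-hot encode n mod 15, which is all the network needs, since divisibility by 3 and 5 depends on n mod 15 alone.

    import numpy as np

    def encode_mod15(n):
        # 15 inputs, exactly one of which is 1: the one at index n mod 15.
        v = np.zeros(15, dtype=np.float32)
        v[n % 15] = 1.0
        return v

    # With this encoding the four FizzBuzz classes are linearly separable,
    # so even a single-layer network can get them exactly right.
    X = np.stack([encode_mod15(n) for n in range(1, 101)])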

PM me if you'd like to debug whatever network you have, together, over Zoom or Google Meet or whatever.

https://en.wikipedia.org/wiki/One-hot

This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it (didn't pass the first time; I had to work full time to be able to study)), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine-learning-related.




Thanks for answering; what you wrote here is exactly the sort of thing I'm talking about: something implicit that's known, but not obvious if you only look at the first few lectures of the first few courses (or blogs, or announcements, etc.).

You mention a bag of tricks, and that's indeed one issue, but it's worse than that, because it includes knowing which "silent problems" need a trick applied to them in the first place!

Indeed, despite using vectors everywhere, NNs are bad with numerical inputs encoded as themselves! It's almost like the only kind of variables you can have are fixed-size enums, which you then encode into vectors that are as far apart as possible; unit vectors ("one-hot vectors") do this. But that's not quite it either, and sometimes you can still keep some meaningful metric on the input that's preserved in the encoding (example: word embeddings). And so it's again unclear what you can give it and what you can't.
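To make that enum-vs-metric point concrete, here's a tiny sketch (the names and dimensions are mine, purely illustrative): a one-hot input times an embedding matrix is just a row lookup, and the rows can be learned so that a useful metric survives the encoding.

    import numpy as np

    categories = ["red", "green", "blue", "other"]                # a "fixed-size enum"
    emb = np.random.randn(len(categories), 8).astype(np.float32)  # rows would be learned

    def one_hot(i, size):
        v = np.zeros(size, dtype=np.float32)
        v[i] = 1.0
        return v

    i = categories.index("green")
    # One-hot times the embedding matrix is the same as picking out row i:
    assert np.allclose(one_hot(i, len(categories)) @ emb, emb[i])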

In this toy example, I have an idea of what the shape of the solution is. But generally I do not and would not know to use a base 15 encoding or to send it the last 5 (or 15) outputs as inputs. I know you already sort of addressed this point in your last few paragraphs.

I'm still trying out toy problems at the moment, so it might be a "waste" of your time to troubleshoot these, but I'm happy to take you up on the offer. HN doesn't have PMs though.

Do you remember when you first learned about the things you're using in your reply here? Was it in a course, or just from asking someone else who had worked on NNs for longer? I learned by googling and finding comment threads like these! But they are not easy to collect or find together.


(I've added an email to my profile. I hope you can see it. Feel free to flick me an email or google chat me)


> This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it (didn't pass the first time; I had to work full time to be able to study)), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine-learning-related.

Oh wow, those are great credentials. I'm surprised that you haven't run across a position yet. Maybe it is a matter of your location? It seems like a lot of these jobs want onsite workers, which can be a real problem.

TBH, I get the feeling that a lot of us without such credentials are in a similar position right now: slowly trying to work our way towards what seems to be a big new green field, but with a really unclear path to getting there...



