
There's no great answer to this question. It is a bunch of tricks. Fundamentally:

If you're saying FizzBuzz doesn't work, presumably you mean that feeding n in directly doesn't work. Neither does rescaling n to [0, 1] or to [-1, 1] (and remember: obviously don't use ReLU with inputs in [-1, 1]). It just doesn't.

Neural networks can do a LOT of things, but they cannot deal with raw numbers, and certainly not natural or real numbers fed in as single values. BUT they can deal with certain encodings of them.

Instead of using the number directly, give one input to the neural network per bit of the number. That will work. Just pass in the last 10 bits of the number.
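A minimal sketch of that bit encoding (NumPy; the function names and the 101..1023 training range are just my choices for illustration):

    import numpy as np

    def encode_bits(n, num_bits=10):
        # One input per bit: the lowest `num_bits` binary digits of n.
        return np.array([(n >> i) & 1 for i in range(num_bits)], dtype=np.float32)

    def fizzbuzz_class(n):
        # Target class: 0 = print n, 1 = Fizz, 2 = Buzz, 3 = FizzBuzz.
        if n % 15 == 0: return 3
        if n % 5 == 0: return 2
        if n % 3 == 0: return 1
        return 0

    # e.g. train on 101..1023 and keep 1..100 as the test set
    X = np.stack([encode_bits(n) for n in range(101, 1024)])
    y = np.array([fizzbuzz_class(n) for n in range(101, 1024)])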

Or cheat and use transformers. Pass in the last 5 generations and have it construct the next FizzBuzz line. That will work. Because it's possible.

To make the number-based neural network for FizzBuzz "perfect", think about it: the network needs to determine divisibility by 3 and by 5, and it can't do division. You can't fix that directly; you must make it possible for the network to learn the divisibility test. 2, 3 and 5 are relatively prime (and actual primes), so no small fixed set of binary digits tells you whether n is divisible by 3 or 5. So "cheat" and pass in the number in base 15 (by one-hot encoding the number mod 15, for example); the last base-15 digit alone decides Fizz, Buzz and FizzBuzz.
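A sketch of that base-15 "cheat" (my own naming): one-hot encode n mod 15, which is all the network needs, since divisibility by 3 and 5 depends on n mod 15 alone.

    import numpy as np

    def encode_mod15(n):
        # 15 inputs, exactly one of which is 1: the one at index n mod 15.
        v = np.zeros(15, dtype=np.float32)
        v[n % 15] = 1.0
        return v

    # With this encoding the four FizzBuzz classes are linearly separable,
    # so even a single-layer network can get them exactly right.
    X = np.stack([encode_mod15(n) for n in range(1, 101)])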

PM me if you'd like to debug whatever network you have, together, over Zoom or Google Meet or whatever.

https://en.wikipedia.org/wiki/One-hot

This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it (didn't pass the first time; I had to work full time to be able to study)), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine-learning-related.




Thanks for answering; what you wrote here is exactly the sort of thing I'm talking about: something implicit that's known, but not obvious if you only look at the first few lectures of the first few courses (or blogs, or announcements, etc.).

You mention a bag of tricks, and that's indeed one issue, but it's worse than that, because it includes knowing which "silent problems" need a trick applied to them in the first place!

Indeed, despite using vectors everywhere, NNs are bad with numerical inputs encoded as themselves! It's almost like the only kind of variables you can have are fixed-size enums, which you then encode into vectors that are as far apart as possible; unit vectors ("one-hot vectors") do this. But that's not quite it either, and sometimes you can still keep some meaningful metric on the input that's preserved in the encoding (example: word embeddings). And so it's again unclear what you can give it and what you can't.
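To make that enum-vs-metric point concrete, here's a tiny sketch (the names and dimensions are mine, purely illustrative): a one-hot input times an embedding matrix is just a row lookup, and the rows can be learned so that a useful metric survives the encoding.

    import numpy as np

    categories = ["red", "green", "blue", "other"]                # a "fixed-size enum"
    emb = np.random.randn(len(categories), 8).astype(np.float32)  # rows would be learned

    def one_hot(i, size):
        v = np.zeros(size, dtype=np.float32)
        v[i] = 1.0
        return v

    i = categories.index("green")
    # One-hot times the embedding matrix is the same as picking out row i:
    assert np.allclose(one_hot(i, len(categories)) @ emb, emb[i])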

In this toy example, I have an idea of what the shape of the solution is. But generally I do not and would not know to use a base 15 encoding or to send it the last 5 (or 15) outputs as inputs. I know you already sort of addressed this point in your last few paragraphs.

I'm still trying out toy problems at the moment, so it might be a "waste" of your time to troubleshoot these, but I'm happy to take you up on the offer. HN doesn't have PMs though.

Do you remember when you first learned about the things you're using in your reply here? Was it in a course, or just from asking someone else who had worked on NNs for longer? I learned by googling and finding comment threads like these! But they are not easy to collect or find together.


(I've added an email to my profile. I hope you can see it. Feel free to flick me an email or google chat me)


> This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it (didn't pass the first time; I had to work full time to be able to study)), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine-learning-related.

Oh wow, those are great credentials. I'm surprised that you haven't run across a position yet. Maybe it is a matter of your location? It seems like a lot of these jobs want onsite workers, which can be a real problem.

TBH, I get the feeling that a lot of us without such credentials are in a similar position right now: slowly trying to work our way towards what seems to be a big new green field, but with a really unclear path to getting there...



