
But we still generalize it to bigger and bigger inputs without having seen matching-brace problems of every size we can handle.

In the paper I don't think they are looking for perfection; rather, they show which models can't seem to generalize significantly beyond the size of the examples they explicitly saw.
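As an illustration of that setup (my own sketch in Python, not the paper's actual data generation), the training examples might only cover shallow nesting while the test examples go deeper:

    def nested(depth: int) -> str:
        # A balanced string nested exactly `depth` levels deep, e.g. ((()))
        return '(' * depth + ')' * depth

    train = [nested(d) for d in range(1, 4)]    # depths the model explicitly saw
    test  = [nested(d) for d in range(10, 13)]  # deeper instances of the same rule
    print(train)  # ['()', '(())', '((()))']

The question is whether a model that only ever saw the train side still handles the test side.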




The {specific (number I (messed up was a depth of 3)}.

I missed it because it was spaced over a few lines and I was tired.


But not because you only trained on examples up to depth two.


I don't know what point you're making; GPT-3 almost certainly has training examples at greater depth.

Pattern recognition can fail. Mine does, GPT's does.

I can count further than I can subitize, but from experience my mind may wander and I may lose track after 700 of a thing, even if there's literally nothing else to focus on.

But I still didn't notice an error at depth 3.


It seems transformers don't generalize on this class of problems, while neural Turing machines and neural stack machines do.

You get the general rule (known in linguistics as competence) but can't apply it flawlessly (performance). Transformers can't seem to acquire competence here.
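To make "the general rule" concrete, here's a minimal sketch (plain Python, mine, not from the paper or any of the models discussed) of what a stack machine effectively has to learn: push on open brackets, pop on close, and the verdict never depends on how deep the nesting goes.

    PAIRS = {')': '(', ']': '[', '}': '{'}

    def balanced(s: str) -> bool:
        stack = []
        for ch in s:
            if ch in '([{':
                stack.append(ch)
            elif ch in PAIRS:
                if not stack or stack.pop() != PAIRS[ch]:
                    return False
        return not stack

    assert balanced('(' * 1000 + ')' * 1000)   # arbitrary depth, same rule
    assert not balanced('{a (b (c)}')          # catches a depth-3 slip like the one above

Competence is having that rule at all; performance is whether you execute it without losing track.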

https://en.m.wikipedia.org/wiki/Linguistic_competence


Ah, gotcha. Thanks for the clear explanation and the link! :)



