But we still generalize it to bigger and bigger instances without having seen matching-brace problems of every size we can handle.
In the paper I don't think they're looking for perfection, but they do show which models can't seem to learn sizes significantly beyond the examples they explicitly saw.
I don't know what point you're making; GPT-3 almost certainly has training examples of greater depth.
Pattern recognition can fail. Mine does, GPT's does.
I count further than I subitize, but from experience my mind may wander and I may lose track after 700 of a thing even if there's literally nothing else to focus on.
It seems transformers don't generalize in this class, while neural Turing machines and neural stack machines do.
You get the general rule (known in linguistics as competence) but can't flawlessly execute it (performance). Transformers can't seem to acquire the competence here.
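To make the competence/performance split concrete, here's a minimal sketch (ordinary Python, not from the paper) of what the general rule looks like for single-type brace matching: a depth counter handles arbitrary nesting, so any failure at large sizes is a resource limit (performance), not a gap in the rule itself.

    # A minimal sketch (not from the paper) of the "general rule" for
    # single-type brace matching: a depth counter works at any size.
    def balanced(s: str) -> bool:
        depth = 0
        for ch in s:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:       # a ')' with nothing open
                    return False
        return depth == 0           # everything opened was closed

    # The rule is identical at depth 5 or depth 5000; only memory/attention
    # (performance) changes with size.
    print(balanced('(' * 5000 + ')' * 5000))  # True

Having this depth-independent rule is roughly what competence means here; losing count at 700 of a thing is a performance failure on top of it.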