From quizzing it a bit, it seems to have good knowledge but limited reasoning. For example, it will tell you all about the life and death of Ho Chi Minh (and, as far as I can verify, factual and in more detail than what's in the English Wikipedia), but when quizzed on whether 2 kg of feathers is heavier than 1 kg of lead, it gets it wrong.
Though I wouldn't treat it as a domain expert on anything. For example, when I asked about the safety advantages of Rust over Python, it oversold Rust a bit and claimed Python had issues it doesn't actually have.
Well, the feathers-heavier-than-lead thing is definitely somewhere in the training data.
Imo we should be testing these models' reasoning by presenting things or situations that neither the human nor the machine has seen or experienced.
Think: how often do humans have a truly new experience with no basis in past ones? Very rarely - even learning to ride a bike presumably draws on walking/running and movement in general.
Even human "creativity" (much ado about nothing) is creating drama in the AI space...but I find this a super interesting topic as essentially 99.9999% of all human "creativity" is just us rehashing and borrowing heavily from stuff we've seen or encountered in nature. What are elves, dwarves, etc than people with slightly unusual features. Even aliens we create are based on: humans/bipedal, squid/sea creature, dragon/reptile, etc. How often does human creativity really, _really_ come up with something novel? Almost never!
Edit: I think my overarching point is that we need to come up with better exercises to test these models, but it's almost impossible for us to do this because most of us are incapable of creating purely novel concepts and ideas. AGI perhaps isn't that far off given that humans have been the stochastic parrots all along.