Humans can't one-shot non-trivial planning tasks either. That's the one problem I have with all the papers that try to evaluate planning for LLMs.
Step away from the one-shot setup and they do OK.
https://innermonologue.github.io/
https://tidybot.cs.princeton.edu/