
Anything that has to do with individual words doesn't work well, but as I understand it, this is an artifact of the tokenization process. For example, pannekake is internally 4 tokens: pan-ne-k-ake. And I don't think that knowing which tokens correspond to which letter sequences is part of the training data, so the model has to infer that.
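To make the point concrete, here is a minimal sketch of what the model actually receives. It assumes a tiny made-up vocabulary (the pieces and IDs are hypothetical, picked so the split matches pan-ne-k-ake) and uses greedy longest-match splitting as a stand-in for a real tokenizer. Real tokenizers such as BPE merge pieces differently, but the effect is the same: the input is a list of integer IDs, not letters.

```python
# Hypothetical vocabulary: pieces and IDs are invented for illustration,
# chosen so that "pannekake" splits as pan-ne-k-ake.
VOCAB = {"pan": 1012, "ne": 388, "k": 74, "ake": 2091}


def tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Greedy longest-match split of `word` into pieces from `vocab`.

    A simplification for illustration, not how any particular model tokenizes.
    """
    pieces = []
    i = 0
    max_len = max(len(p) for p in vocab)
    while i < len(word):
        # Try the longest candidate piece first, shrinking until one matches.
        for end in range(min(len(word), i + max_len), i, -1):
            if word[i:end] in vocab:
                pieces.append(word[i:end])
                i = end
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return pieces


pieces = tokenize("pannekake", VOCAB)
ids = [VOCAB[p] for p in pieces]
print(pieces)  # ['pan', 'ne', 'k', 'ake']
print(ids)     # [1012, 388, 74, 2091]

# Counting letters is trivial on the string itself...
print("pannekake".count("k"))  # 2
# ...but the model only sees `ids`. Which letters sit behind ID 2091 is not
# part of the input, so the model has to learn it indirectly.
```

A question like "how many k's are in pannekake?" therefore requires the model to have learned the spelling of each ID, which is exactly the mapping that is rarely spelled out in training text.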


