int_19h on Feb 23, 2023 | on: Don't believe ChatGPT – we do not offer a "phone l...

Anything that has to do with individual words doesn't work well, but as I understand it, this is an artifact of the tokenization process. E.g. "pannekake" is internally 4 tokens: pan-ne-k-ake. And I don't think the mapping from tokens to letter sequences is part of the training data, so the model has to infer it.
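
To make that concrete, here is a rough sketch of what the model ends up seeing. The vocabulary and the greedy longest-match splitting below are made up for illustration (as far as I know, real GPT-style tokenizers use byte-pair encoding with a learned vocabulary); only the pan-ne-k-ake split comes from the example above.

```python
# Hypothetical toy vocabulary, chosen so that "pannekake" splits as pan-ne-k-ake.
# A real tokenizer learns its vocabulary from data and is far larger.
VOCAB = {"pan": 0, "ne": 1, "k": 2, "ake": 3, "a": 4, "e": 5, "n": 6, "p": 7}


def tokenize(word, vocab=VOCAB):
    """Split `word` into vocabulary pieces using greedy longest-match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return pieces


pieces = tokenize("pannekake")
ids = [VOCAB[p] for p in pieces]

print(pieces)  # the split a human would read
print(ids)     # what the model actually receives: opaque integers
```

The point is the second line of output: the model gets a list of IDs, and nothing in that list says that the first token is spelled p-a-n. Counting letters or reversing a word means recovering the spelling of each ID from whatever it picked up indirectly in training.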