According to OpenAI, the least likely model to hallucinate is gpt-5-thinking-mini, and it hallucinates 26% of the time. Seems to me the problems of LLMs boldly producing lies are far from solved. But sure, they lied years ago too.
"According to OpenAI, the least likely model to hallucinate is gpt-5-thinking-mini, and it hallucinates 26% of the time."
You're not so bad at hallucinating, yourself. "We find that gpt-5-main has a hallucination rate (i.e., percentage of factual claims that contain minor or major errors) 26% smaller than GPT-4o ..."
That's the only reference to "26%" that I see in the model card.
I get that, in the age of AI, you didn't want to read the data I linked; that's fine. Your Ctrl-F search found a reference to 26%. However, on page thirteen the rate is described as 0.26; I interpreted that as 26% because it's cross-referenced in the blog post that I also linked.
Posting multiple links and asserting that somewhere within one of them the reader will find confirmation of an apparently absurd statement amounts to an attempted DoS attack on the reader's attention. It's not a sign of good faith. Obviously a model that hallucinated 26% of the time on typical tasks would be of no interest to anyone outside a research environment, so whatever the real story is, it's safe to assume it's buried in there somewhere. It's just not my job to dig for it.
On some classes of queries, weak models will hallucinate closer to 100% of the time. One of my favorite informal benchmarks is to throw a metaphorical dart at a map and ask what's special about the smallest town nearby. That's a good trick if you want to observe genuine progress being made in the latest models.
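If anyone wants to try the dart-at-map probe themselves, here's a minimal sketch of how it could be scripted. It assumes the OpenAI Python SDK with an API key in the environment; the model name (borrowed from this thread), the prompt wording, and the coordinate range are all placeholders, and grading the answer against a real map is still a manual step.

    import random
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def throw_dart() -> tuple[float, float]:
        """Pick a random coordinate; no land/ocean filtering, so some darts will miss."""
        return round(random.uniform(-60.0, 70.0), 3), round(random.uniform(-180.0, 180.0), 3)

    def ask_about_smallest_town(lat: float, lon: float, model: str = "gpt-5-thinking-mini") -> str:
        """Ask the model about the smallest town near the dart; the prompt wording is ad hoc."""
        prompt = (
            f"What is the smallest town near latitude {lat}, longitude {lon}, "
            "and what, if anything, is it known for? If you are not sure, say so."
        )
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    if __name__ == "__main__":
        lat, lon = throw_dart()
        print(f"Dart landed at ({lat}, {lon})")
        print(ask_about_smallest_town(lat, lon))
        # Grading is manual: check that the named town exists, that it's actually
        # near the coordinates, and that the "special" claims about it are real.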
On other tasks, typically the ones that matter, hallucination rates are approaching zero. Not as quickly as I'd like, but the direction is clear enough.
Says someone who lectures on how LLMs worked two years ago.