To push the state of the art, LLMs are no longer trained simply to reproduce the distribution of online text as a whole; they target a different distribution of “high-quality” text, whatever that means in your domain. So this process can end up matching a “better” distribution than raw web text for some tasks, either by removing erroneous information or by sampling “better” outputs more frequently.
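As a toy illustration, the sketch below shows how filtering and reweighting a corpus change the distribution a model would be trained to match. It is not any lab's actual pipeline. The corpus counts, the `is_high_quality` filter, and the `weights` values are all hypothetical stand-ins for whatever quality signal your domain uses.

```python
from collections import Counter

# Hypothetical toy "web" corpus: answers to "What is the boiling point of
# water at sea level?" The counts are invented purely for illustration.
web_corpus = Counter({
    "100 C": 70,
    "212 F": 20,
    "90 C": 10,  # an erroneous answer that also appears online
})


def normalize(counts):
    """Turn raw (possibly weighted) counts into a probability distribution."""
    total = sum(counts.values())
    return {text: n / total for text, n in counts.items()}


def is_high_quality(text):
    """Hypothetical quality judgement; in practice this might be a
    classifier, a heuristic, or human review."""
    return text != "90 C"


# Filtering: drop the erroneous samples, then renormalize.
filtered = Counter({t: n for t, n in web_corpus.items() if is_high_quality(t)})

# Reweighting: sample preferred outputs more often (hypothetical weights).
weights = {"100 C": 2.0, "212 F": 1.0}
reweighted = Counter({t: n * weights[t] for t, n in filtered.items()})

web_p = normalize(web_corpus)
filtered_p = normalize(filtered)
reweighted_p = normalize(reweighted)

for name, dist in [("web", web_p), ("filtered", filtered_p), ("reweighted", reweighted_p)]:
    print(name, {t: round(p, 3) for t, p in dist.items()})
```

Filtering removes the wrong answer entirely, and reweighting shifts more probability mass toward the preferred one, so the target distribution is no longer the web's.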