The problem is that it’s running a search on your relatively niche problem and probably getting pretty poor results. The text from the search is then weighted more heavily than what the base model already knows, but it’s mostly junk, so the model actually performs better without the additional (unhelpful) context.

You see this with Bing search on ChatGPT as well, and I’ve seen it in my own projects.
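
To make that concrete, here is a rough sketch of how a search-augmented prompt typically gets assembled. This is not how any particular product does it: `build_prompt`, the `score` field, and the `min_score` cutoff are all made up for illustration. The point is just that whatever the search returns ends up in front of the question, and a relevance cutoff is one possible way to fall back to the bare question when the results are junk.

```python
def build_prompt(question, search_results, min_score=None):
    """Assemble a prompt from a question and search results.

    Hypothetical sketch: each result is a dict with a "text" string and a
    "score" relevance value in [0, 1]. Neither field reflects a real API.
    """
    if min_score is not None:
        # Optionally drop results below the (assumed) relevance cutoff.
        search_results = [r for r in search_results if r["score"] >= min_score]

    if not search_results:
        # Nothing worth adding: let the model answer from what it already knows.
        return question

    # Retrieved text goes ahead of the question, so it dominates the context.
    context = "\n".join(f"- {r['text']}" for r in search_results)
    return (
        "Use the following search results to answer.\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Poor results for a niche question (invented data).
junk = [
    {"text": "unrelated forum post", "score": 0.12},
    {"text": "SEO spam page", "score": 0.08},
]

with_junk = build_prompt("How do I fix my niche problem?", junk)
without_junk = build_prompt("How do I fix my niche problem?", junk, min_score=0.5)
```

Without a cutoff, both junk results land in the prompt ahead of the question. With `min_score=0.5`, they are filtered out and the model just gets the original question.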