I'm confused by how Opus is presented to be superior in nearly every way for coding purposes yet the general consensus and my own experience seem to be that Sonnet is much much better. Has anyone switched to entirely using Opus from Sonnet? Or maybe switching to Opus for certain things while using Sonnet for others?
I don't doubt Opus is technically superior, but it's not practically superior for me.
It's still pretty much impossible to have any LLM one-shot a complex implementation. There are just too many details to figure out and too much to explain for it to get right. Often there's uncertainty and ambiguity where I only understand the correct answer (or rather, the less bad answer) after I've spent time deep in the code. Having Opus spit out a possibly correct solution just isn't useful to me. I need to understand _why_ we got to that solution and _why_ it's a correct solution for the context I'm working in.
For me, this means that I largely have an iteratively driven implementation approach where any particular task just isn't that complex. Therefore, Sonnet is completely sufficient for my day-to-day needs.
I've been having a great time with Windsurf's "Planning" feature. Have a nice discussion with Cascade (Claude) all about what it is that needs to happen - sometimes a very long conversation including test code. Then when everything is very clear, make it happen. Then test and debug the results with all that context. Pretty nice.
In Zed I switch the AI panel to Ask mode and chat with the agent about different approaches and have it draft patches. Then, when I think there's a design worth trying, I switch to Write mode and have it implement that change, run the tests and diagnostics, and verify the code at least compiles, the tests pass, and it follows our style guides. Finally, a line-by-line review plus a review of the test coverage (in terms of interface surface area) before submitting a PR for another human to review.
After watching a few videos to understand how people were using LLMs and getting useful results, I found that even a simpler version of the fancy planning modes in the LLM IDEs, done via an instructions.md, produced huge productivity gains.
I started adding an instruction file along the lines of "Always tell me your plan to solve the issue first with short example code, never edit files without explicit confirmation of your plan" at the start, and it is like a night and day difference in how useful it becomes. It also starts to feel like programming again, where you read through various files and, instead of thinking in your head, you write out your thoughts. You end up getting confirmation or pushback on errors that you can clean up.
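The file doesn't need to be fancy - a few rules along these lines is enough (a sketch, tweak the wording to taste):

    # instructions.md
    - Always tell me your plan to solve the issue first, with short example code.
    - Never edit files without explicit confirmation of your plan.
    - Once I confirm, implement only what we agreed on, then stop for review.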
Reading through a sort-of-wrong, sort-of-right implementation spread across various files after every prompt just really sucked.
I'm not one-shotting massive amounts of files, but I am enjoying the lack of grunt work.
I use Claude Code frequently for doing constrained tasks in the background while I'm working on other issues. I love its planning mode switch. It is so much better to identify a dead end before Claude confidently charges off a cliff.
That's essentially what I do, but that doesn't (and cannot) entirely solve the problem.
A major part of software engineering is identifying and resolving issues during implementation. Plans are a good outline of what needs to be done, but they're always incomplete and inaccurate.
Every time Sonnet is acting like it has brain damage (which is once or twice a day), I switch to Opus and it seems to sort things out pretty fast. This is unscientific anecdata though, and it could just be that switching to any other model would have worked.
This seems like a case of regression to the mean. When one model is performing below average, changing anything (like switching to another model) is likely to improve things just by random chance...
This is a great use case for sub-agents IMO. By default, sub-agents use Sonnet. You can have Opus orchestrate the various agents and get (close to) the best of both worlds.
In this case I don't think the controller needs to be the smartest model. I use Sonnet as the main driver and pass the heavy thinking (via zen mcp) off to Gemini Pro, for example, but I could use OpenAI or Opus or all of them via OpenRouter.
Subagents seem pretty similar to using zen mcp w/ OpenRouter but maybe better or at least more turnkey? I'll be checking them out.
Amp (ampcode.com) uses Sonnet as its main model and has OpenAI's o3 as a special-purpose tool / subagent. It can call into that when it needs particularly advanced reasoning.
Interestingly, I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. Even more interesting to me, Sonnet appeared to do a better job once I'd prompted that (like chain-of-thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking).
Is there a way to get persistent sub-agents? I'd love to have a bunch of YAML files in my repository, one for each sub-agent, and have those automatically used across all Claude Code instances I have on multiple machines (I dev on laptop and desktop), or across the team.
In my experience the best use for subagents is saving context.
Example: you need to review some code to see if it has proper test coverage.
If you use the "main" context, it'll waste tokens on reading the codebase and running tests to see coverage results.
But if you launch an agent (a subprocess pretty much), it can use a "disposable" context to do that and only return with the relevant data - which bits of the code need more tests.
Now you can either use the main context to implement the tests or if you're feeling really fancy launch another sub-agent to do it.
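And to the persistence question upthread: Claude Code picks up project-level agent definitions from .claude/agents/ in the repo (plus ~/.claude/agents/ for user-level ones), so once committed they follow you across machines and teammates. A rough sketch of what that coverage reviewer could look like (field names from memory, check the current docs):

    .claude/agents/test-coverage-reviewer.md

    ---
    name: test-coverage-reviewer
    description: Reviews code for missing test coverage and reports only the gaps.
    tools: Read, Grep, Glob, Bash
    ---
    You review code for test coverage. Read the relevant files, run the test
    suite with coverage if it is available, and reply with a short list of the
    functions and branches that still need tests. Do not edit any files.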
AFAIK subagents inherit the default model since v1.0.64. At least that's the case for me with the Claude Code SDK — not providing a specific model makes subagents use claude-opus-4-1-20250805.
I have suspected for a long time that hosted models load shed by diverting some requests to lesser models or running more quantized versions under high load.
> yet the general consensus and my own experience seem to be that Sonnet is much much better
Given that there’s nothing close to scientific analysis going on, I find it hard to tell how big the “Sonnet is overall better, not just sometimes” crowd is. I think part of the problem is that “The bigger model is better” feels obvious to say, so why say it? Whereas “the smaller model is better actually” feels both like unobvious advice and also the kind of thing that feels smart to say, both of which would lead to more people who believe it saying it, possibly creating the illusion of consensus.
I was trying to dig into this yesterday, but every time I come across a new thread, the things people are saying - and the proportions saying them - are different.
I suppose one useful takeaway is this: If you’re using Claude Max and get downgraded from Opus to Sonnet for a few hours, you don’t have to worry too much about it being a harsh downgrade in quality.
Opus seems better to me on long tasks that require iterative problem solving and keeping track of the context of what we have already tried. I usually switch to it for any kind of complicated troubleshooting etc.
I stick with Sonnet for most things because it's generally good enough and I hit my token limits with it far less often.
Same. I'm on the $200 plan and I find Opus "better", but Sonnet is more straightforward. Sonnet is, to me, a "don't let it think" model. It does great if you give it concrete and small goals. Anything vague or broad and it starts thinking, and that's a problem.
Opus gives you a bit more rope to hang yourself with, IMO. Yes, it "thinks" slightly better, but still not well enough for me. But it can be good enough to convince you that it can do the job... so I dunno, I almost dislike it for that. I find Sonnet just easier to predict.
Could I use Opus like I do Sonnet? Yes, definitely, and generally I do. But then I don't really see much difference, since I'm hand-holding so much.
I use both. Sonnet is faster and more cost efficient. It's great for coding. Where Opus is noticeably better is in analysis. It surpasses Sonnet for debugging, finding patterns in data, creativity and analysis in general. It doesn't make a lot of sense to use Opus exclusively unless you're on a max20 plan and not hitting limits. Using Opus for design and troubleshooting and Sonnet for everything else is a good way to go.
I'm on the Max plan and generally Opus seems to do better work than Sonnet. However, that's only when they allow me to use Opus. The usage limits, even on the Max plan, are a joke. Yesterday I hit the limits within MINUTES of starting my work day.
I'm using ccusage to get the number. I think it just looks at your history and calculates cost from token counts against API pricing, so I think it wouldn't account for caching.
But I totally agree there's no way it lasts. I'm mostly only using this for side projects, and I'm sitting there interacting with it, not YOLO'ing. I do sometimes have two sessions going at the same time, but I'm not firing off swarms or anything crazy. I just have it set to Opus and chat with it.
Claude Code definitely reports cached tokens, and I think ccusage does too, so it wouldn't make sense for the calculation to be based on full pricing when they have the cached values.
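Back-of-envelope on why it matters whether cache reads get billed at the full input rate (using the published Opus-class rates last I checked; treat the numbers as illustrative):

    # Illustrative $/M-token rates; check current pricing before relying on these.
    INPUT, OUTPUT = 15.00, 75.00           # uncached input, output
    CACHE_READ, CACHE_WRITE = 1.50, 18.75  # cache read, cache write (5-minute cache)

    # A hypothetical heavy-caching session, in tokens.
    tok = {"input": 0.2e6, "output": 0.5e6, "cache_read": 8e6, "cache_write": 0.3e6}

    # If every input-side token is priced as plain input:
    naive = (tok["input"] + tok["cache_read"] + tok["cache_write"]) / 1e6 * INPUT \
            + tok["output"] / 1e6 * OUTPUT

    # If cached reads/writes get their own rates:
    cache_aware = (tok["input"] * INPUT + tok["cache_read"] * CACHE_READ
                   + tok["cache_write"] * CACHE_WRITE + tok["output"] * OUTPUT) / 1e6

    print(f"priced as plain input: ${naive:,.2f}")       # ~$165
    print(f"cache-aware pricing:   ${cache_aware:,.2f}") # ~$58

So whether the tool prices cache reads at the full rate or the cache-read rate nearly triples the reported "cost" for a session like that.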
Is this on x5? Because ever since they booted all the freeloaders I've not once seen the "you are approaching usage limits" message. Anyway, that message shows up when you are over 50% of your tokens for the timeframe, so it's not very useful.
If I'm using Cursor then Sonnet is better, but in Claude Code, Opus 4 is at least 3x better than Sonnet. As with most things these days, I think a lot of it comes down to prompting.
This is interesting. I do use Cursor with almost exclusively Sonnet and thinking mode turned on. I wonder if what Cursor does under the hood (like their indexing) somehow empowers Sonnet more. I do not have much experience with using Claude Code.
With aggressive Claude Code use I didn't find Sonnet better than Opus but I did find it faster while consuming far fewer tokens. Once I switched to the $100 Max plan and configured CC to exclusively use Sonnet I haven't run into a plan token limit even once. When I saw this announcement my first thing was to CMD-F and see when Sonnet 4.1 was coming out, because I don't really care about Opus outside of interactive deep research usage.
Opus really shines for completing long-running tasks with no supervision. But if you are using Claude Code interactively and actively steering it yourself, Sonnet is good enough and is faster.
I don't believe anyone saying Sonnet yields better results than Opus though, as my experience has been exactly the opposite. But trade-off wise, I can definitely see it being a better experience when used interactively because of its speed and lower cost.
> If you believe the benchmarks are reflective of reality anyways.
That's a big "if." But yeah, I can't tell a difference subjectively between Opus and Sonnet, other than maybe a sort of placebo effect. I'm more careful to write quality prompts when using Opus, because I don't want to waste the 5x more expensive tokens.
My opinion of Opus is that it takes the correct action 19/20 times, where Sonnet takes the correct action 18/20 times. It’s not strictly necessary to use Opus, but if you have the subscription already it’s just a pure win.
I've found that with limited context in the prompt, Opus is just awful compared to even GPT-4.1, but once I give it even just a little more explanation, it jumps leagues ahead.
Just more anecdata, but I entirely agree. I can't say that I'm happy with Sonnet's output at any point, really, but it still occasionally works, whereas Opus has been a dumpster fire every single time.