
I'm confused by how Opus is presented to be superior in nearly every way for coding purposes yet the general consensus and my own experience seem to be that Sonnet is much much better. Has anyone switched to entirely using Opus from Sonnet? Or maybe switching to Opus for certain things while using Sonnet for others?


I don't doubt Opus is technically superior, but it's not practically superior for me.

It's still pretty much impossible to have any LLM one-shot a complex implementation. There are just too many details to figure out and too much to explain for it to get right. Often there's uncertainty and ambiguity, and I only understand the correct answer (or rather, the less bad answer) after I've spent time deep in the code. Having Opus spit out a possibly correct solution just isn't useful to me. I need to understand _why_ we got to that solution and _why_ it's a correct solution for the context I'm working in.

For me, this means that I largely have an iteratively driven implementation approach where any particular task just isn't that complex. Therefore, Sonnet is completely sufficient for my day-to-day needs.


I've been having a great time with Windsurf's "Planning" feature. Have a nice discussion with Cascade (Claude) all about what it is that needs to happen - sometimes a very long conversation including test code. Then when everything is very clear, make it happen. Then test and debug the results with all that context. Pretty nice.


This is basically what I do. I have a specific "planning mode" prompt I work through.

It's very, very helpful. However, there are still a lot of problems I only discover/figure out after I've been working in the code.


Can you explain what you do exactly? Do you enable plan mode and use it with chat...?


In Zed I switch the AI panel to ask mode and chat with the agent about different approaches and have it draft patches. Then, when I think there's a design worth trying, I switch to Write mode and have it implement that change + run the tests and diagnostics to verify the code at least compiles, tests pass, and it follows our style guides. Finally, a line-by-line review + a review of the test coverage (in terms of interface surface area) before submitting a PR for another human review.


After watching a few videos trying to understand how people were using LLMs and getting useful results, I found that even a simpler version of the fancy planning mode in the LLM IDEs, done via an instructions.md, produced huge productivity gains.

I started adding an instruction file along the lines of "Always tell me your plan to solve the issue first with short example code, never edit files without explicit confirmation of your plan" at the start, and it makes a night-and-day difference in how useful it becomes. It also starts to feel like programming again, where you read through various files and, instead of thinking in your head, you write out your thoughts. You end up getting confirmation, or pushback on errors that you can then clean up.

Reading through a sort-of-wrong, sort-of-right implementation spread across various files after every prompt just really sucked.

I'm not one shotting massive amounts of files, but I am enjoying the lack of grunt work.
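
For reference, a minimal sketch of the kind of instruction file I mean (the filename and exact wording are just an illustration; adapt it to whatever your IDE actually reads):

    instructions.md:
    - Always tell me your plan to solve the issue first, with short example code.
    - Never edit files without my explicit confirmation of the plan.
    - After I confirm, make the change and summarize exactly what you edited.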


I use Claude Code frequently for doing constrained tasks in the BG while I'm working on other issues. I love its planning mode switch. It is so much better to identify a dead-end before Claude confidently charges off a cliff.


Could you share some of the videos that you watched? Can you make a video yourself? That will help a lot of us.


You can also always have it create design docs and mermaid diagrams for each task. That makes it much easier to outline the why earlier, shifting left.


That's essentially what I do, but that doesn't (and cannot) entirely solve the problem.

A major part of software engineering is identifying and resolving issues during implementation. Plans are a good outline of what needs to be done, but they're always incomplete and inaccurate.


Every time that Sonnet is acting like it has brain damage (which is once or twice a day), I switch to Opus and it seems to sort things out pretty fast. This is unscientific anecdata though, and it could just be that switching models (any model) would have worked.


This seems like a case of reversion to the mean. When one model is performing below average, changing anything (like switching to another model) is likely to improve it by random chance...


Anthropic say Opus is better, benchmarks & evals say Opus is better, Opus has more parameters and parameters determine how much a NN can learn.

Maybe Opus just is better


Even if it's better on average, doesn't mean it's better for every possible query


This is a great use case for sub-agents IMO. By default, sub-agents use sonnet. You can have opus orchestrate the various agents and get (close to) the best of both worlds.
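
A rough sketch of what I mean, as a checked-in agent file (the .claude/agents/ location and frontmatter fields are from my reading of the Claude Code docs, so double-check them before relying on this):

    .claude/agents/reviewer.md:
    ---
    name: reviewer
    description: Reviews diffs for style issues and missing tests. Use after edits.
    model: sonnet
    ---
    You are a focused code reviewer. Report concrete issues and missing
    tests only; never edit files yourself.

With the main session left on Opus, it can delegate grunt work to agents like this instead of doing everything itself.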


In this case I don't think the controller needs to be the smartest model. I use Sonnet as the main driver and pass the heavy thinking (via zen mcp) to Gemini Pro, for example, but I could use OpenAI or Opus, or all of them via OpenRouter.

Subagents seem pretty similar to using zen mcp w/ OpenRouter but maybe better or at least more turnkey? I'll be checking them out.


Amp (ampcode.com) uses Sonnet as its main model and has OpenAI's o3 as a special-purpose tool / subagent. It can call into that when it needs particularly advanced reasoning.

Interestingly I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. Extra interesting to me was the fact that Sonnet appeared to do a better job once I'd prompted that (like chain of thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking).


Is there a way to get persistent sub-agents? I'd love to have a bunch of YAML files in my repository, one for each sub-agent, and have those automatically used across all Claude Code instances I have on multiple machines (I dev on laptop and desktop), or across the team.



Thanks!


In my experience the best use for subagents is saving context.

Example: you need to review some code to see if it has proper test coverage.

If you use the "main" context, it'll waste tokens on reading the codebase and running tests to see coverage results.

But if you launch an agent (a subprocess pretty much), it can use a "disposable" context to do that and only return with the relevant data - which bits of the code need more tests.

Now you can either use the main context to implement the tests or if you're feeling really fancy launch another sub-agent to do it.
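
Concretely, the kind of prompt I mean (the path is made up for illustration):

    Use a subagent to check test coverage for src/billing/: run the test
    suite with coverage, read whatever it needs to, and report back only a
    short list of functions that still lack tests. Don't edit anything.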


AFAIK subagents inherit the default model since v1.0.64. At least that's the case for me with the Claude Code SDK — not providing a specific model makes subagents use claude-opus-4-1-20250805.
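
For example, a minimal sketch with the Python claude-code-sdk (the package name, query(), and the ClaudeCodeOptions fields are my best recollection of that SDK, so verify against its docs):

    import anyio
    from claude_code_sdk import query, ClaudeCodeOptions

    async def main():
        # No `model` set: the session (and any subagents it spawns) should
        # fall back to the default, which for me resolves to Opus 4.1.
        options = ClaudeCodeOptions(max_turns=3)
        # To pin things to Sonnet instead, pass e.g.
        # ClaudeCodeOptions(model="claude-sonnet-4-20250514", max_turns=3).
        async for message in query(prompt="Summarize the failing tests", options=options):
            print(message)

    anyio.run(main)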


Great, now even computers need to leave the IC track if they want continued career progression.


Maybe context rot? If the model's output seems to be getting worse or stuck in a rut, try just clearing the context / starting a new session.


Switching models with the same context, in this case.


They both seem to behave differently depending on how loaded the system seems to be.


I have suspected for a long time that hosted models load shed by diverting some requests to lesser models or running more quantized versions under high load.


I think OpenRouter saves tokens by summarizing queries through another model, IIRC.


Switching models is a great best practice, whether you get stuck or not.

You can look at the primal to check the mean, or the dual to get out of a local minimum.

In all cases the model, tokenizer, etc. are just different enough that a switch will generally pay off quickly.


Exactly that.


> yet the general consensus and my own experience seem to be that Sonnet is much much better

Given that there’s nothing close to scientific analysis going on, I find it hard to tell how big the “Sonnet is overall better, not just sometimes” crowd is. I think part of the problem is that “The bigger model is better” feels obvious to say, so why say it? Whereas “the smaller model is better actually” feels both like unobvious advice and also the kind of thing that feels smart to say, both of which would lead to more people who believe it saying it, possibly creating the illusion of consensus.

I was trying to dig into this yesterday, but every time I come across a new thread, the things people are saying and the proportions saying them are different.

I suppose one useful takeaway is this: If you’re using Claude Max and get downgraded from Opus to Sonnet for a few hours, you don’t have to worry too much about it being a harsh downgrade in quality.


Opus seems better to me on long tasks that require iterative problem solving and keeping track of the context of what we have already tried. I usually switch to it for any kind of complicated troubleshooting etc.

I stick with Sonnet for most things because it's generally good enough and I hit my token limits with it far less often.


Same. I'm on the $200 plan and I find Opus "better", but Sonnet is more straightforward. Sonnet is, to me, a "don't let it think" model. It does great if you give it concrete and small goals. Anything vague or broad and it starts thinking, and that's a problem.

Opus gives you a bit more rope to hang yourself with, IMO. Yes, it "thinks" slightly better, but still not well enough for me. But it can be good enough to convince you that it can do the job... so I dunno, I almost dislike it in this regard. I find Sonnet just easier to predict.

Could I use Opus like I do Sonnet? Yes, definitely, and generally I do. But then I don't really see much difference, since I'm hand-holding so much.


I use both. Sonnet is faster and more cost efficient. It's great for coding. Where Opus is noticeably better is in analysis. It surpasses Sonnet for debugging, finding patterns in data, creativity and analysis in general. It doesn't make a lot of sense to use Opus exclusively unless you're on a max20 plan and not hitting limits. Using Opus for design and troubleshooting and Sonnet for everything else is a good way to go.


I'm on the Max plan and generally Opus seems to do better work than Sonnet. However, that's only when they allow me to use Opus. The usage limits, even on the Max plan, are a joke. Yesterday I hit the limits within MINUTES of starting my work day.


I'm a bit confused by people hitting usage limits so quickly.

I use Opus exclusively and don't hit limits. ccusage reports I'm using the API-equivalent of $2000/mo


You always have to ask which plan they're paying for. Sometimes people complain about the $20 per month plan...


There's no Opus quota on that plan at all.


In this case I'm replying to someone who led with "I'm on the Max plan", but I realize now that's ambiguous; maybe they are on 5x while I'm on 20x.


That's insane. Are you accounting for caching? If not, there's no way this is going to last


I'm using ccusage to get the number, I think it just looks at your history and calculates based on tokens vs API pricing. So I think it wouldn't account for caching.

But I totally agree there's no way it lasts. I'm mostly only using this for side projects and I'm sitting there interacting with it, not YOLO'ing, I do sometimes have two sessions going at the same time but I'm not firing off swarms or anything crazy. Just have it set to Opus and I chat with it.


Claude Code definitely reports cached tokens, and I think CCusage does too, so it wouldn’t make sense for the calculation to be based on full pricing when they have the cached values.
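
It's easy to eyeball yourself; running the tool with no arguments prints a per-day breakdown (the cache columns here are from memory, so treat this as approximate):

    npx ccusage@latest
    # daily table of input/output and cache-create/cache-read tokens,
    # plus an estimated API-equivalent cost per day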


Is this on x5? Because ever since they booted all the freeloaders I've not once seen the "you are approaching usage limits" message. Anyway, that message shows up when you are over 50% of your tokens for that timeframe, so it's not that useful.


Yeah, you need to actively cherry-pick which model to use in order to not waste tokens on stuff that would be easily handled by a simpler model.


Same here, I constantly hit the Opus limits after minutes on the Max plan.


If I'm using Cursor then Sonnet is better, but in Claude Code, Opus 4 is at least 3x better than Sonnet. As with most things these days, I think a lot of it comes down to prompting.


This is interesting. I do use Cursor with almost exclusively Sonnet and thinking mode turned on. I wonder if what Cursor does under the hood (like their indexing) somehow empowers Sonnet more. I do not have much experience with using Claude Code.


I now eagerly await Sonnet 4.1, only because of this release.


With aggressive Claude Code use I didn't find Sonnet better than Opus but I did find it faster while consuming far fewer tokens. Once I switched to the $100 Max plan and configured CC to exclusively use Sonnet I haven't run into a plan token limit even once. When I saw this announcement my first thing was to CMD-F and see when Sonnet 4.1 was coming out, because I don't really care about Opus outside of interactive deep research usage.


Opus really shines for completing long-running tasks with no supervision. But if you are using Claude Code interactively and actively steering it yourself, Sonnet is good enough and is faster.

I don't believe anyone saying Sonnet yields better results than Opus though, as my experience has been exactly the opposite. But trade-off wise, I can definitely see it being a better experience when used interactively because of its speed and lower cost.


A strategy I'm playing with (we'll see how good the results are) is to prompt Opus to analyze and plan but not implement.

E.g., prompt it to read a paper, read some source, then write out a terse document meant to be read by a machine, not a human.

Then switch to Sonnet, have it read that document, and do the actual implementation work.
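
Roughly what that looks like for me with the claude CLI (the --model aliases are real flags as far as I know; the PLAN.md convention is just my own):

    # session 1: Opus, analysis only
    claude --model opus
    > Read NOTES.md and the code under src/. Write a terse implementation
    > plan to PLAN.md, written for a machine reader. Don't implement anything.

    # session 2: fresh context on Sonnet, implementation
    claude --model sonnet
    > Read PLAN.md and implement it. Run the tests as you go.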


I use Opus or Gemini 2.5 Pro for plan mode and Sonnet for act mode in Cline. https://cline.bot

It's my experience that Opus is better at solving architectural challenges where Sonnet struggles.


I notice that on the "Agentic Coding" benchmark cited in the article, Sonnet 4 outperforms Opus 4 (by 0.2%) and underperforms Opus 4.1 (by 1.8%).

So this release might change that consensus? If you believe the benchmarks are reflective of reality anyways.


> If you believe the benchmarks are reflective of reality anyways.

That's a big "if." But yeah, I can't tell a difference subjectively between Opus and Sonnet, other than maybe a sort of placebo effect. I'm more careful to write quality prompts when using Opus, because I don't want to waste the 5x more expensive tokens.


My opinion of Opus is that it takes the correct action 19/20 times, where Sonnet takes the correct action 18/20 times. It’s not strictly necessary to use Opus, but if you have the subscription already it’s just a pure win.


I've found that with limited context provided in your prompt, Opus is just awful compared to even GPT-4.1, but once I give it even just a little more of an explanation, it jumps leagues ahead.


I feel the same way. I usually use Opus to help with coding and documentation, and I use Sonnet for emails and so on.


100% Opus all the time. Sonnet seems to get confused much faster and needs more hand-holding in my experience.


Yes, Opus is very noticeably better at programming in both Rust and Zig in my experience. I wish it were cheaper!


Opus is superior to understand the big picture and the direction.

Sonnet is great at banging it out.


It's ridiculously overpriced in the API. Just like o3 used to be.


Just more anecdata, but I entirely agree. I can't say that I am happy with Sonnet's output at any point, really, but it still occasionally works, whereas Opus has been a dumpster fire every single time.


That’s very strange. Sonnet is hot garbage and Opus is a miracle, for me. I also don’t see anyone praising sonnet anywhere.



