I'm confused by how Opus is presented to be superior in nearly every way for coding purposes yet the general consensus and my own experience seem to be that Sonnet is much much better. Has anyone switched to entirely using Opus from Sonnet? Or maybe switching to Opus for certain things while using Sonnet for others?
I don't doubt Opus is technically superior, but it's not practically superior for me.
It's still pretty much impossible to have any LLM one-shot a complex implementation. There are just too many details to figure out and too much to explain for it to get right. Often there's uncertainty and ambiguity where I only understand the correct answer (or rather, the less bad answer) after I've spent time deep in the code. Having Opus spit out a possibly correct solution just isn't useful to me. I need to understand _why_ we got to that solution and _why_ it's a correct solution for the context I'm working in.
For me, this means that I largely have an iteratively driven implementation approach where any particular task just isn't that complex. Therefore, Sonnet is completely sufficient for my day-to-day needs.
I've been having a great time with Windsurf's "Planning" feature. Have a nice discussion with Cascade (Claude) all about what it is that needs to happen - sometimes a very long conversation including test code. Then when everything is very clear, make it happen. Then test and debug the results with all that context. Pretty nice.
In Zed I switch the AI panel to Ask mode and chat with the agent about different approaches and have it draft patches. Then, when I think there's a design worth trying, I switch to Write mode and have it implement that change, run the tests and diagnostics, and verify the code at least compiles, the tests pass, and it follows our style guides. Finally, a line-by-line review plus a review of the test coverage (in terms of interface surface area) before submitting a PR for another human to review.
After watching a few videos to understand how people were using LLMs and getting useful results, I found that even a simpler version of the fancy planning modes in the LLM IDEs, done via an instructions.md, produced huge productivity gains.
I started adding an instruction file along the lines of "Always tell me your plan to solve the issue first with short example code, never edit files without explicit confirmation of your plan" at the start, and it is like a night and day difference in how useful it becomes. It also starts to feel like programming again, where you read through various files and, instead of thinking in your head, you write out your thoughts. You end up getting confirmation or pushback on errors that you can clean up.
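The file doesn't need to be fancy - a few rules along these lines is enough (a sketch, tweak the wording to taste):

    # instructions.md
    - Always tell me your plan to solve the issue first, with short example code.
    - Never edit files without explicit confirmation of your plan.
    - Once I confirm, implement only what we agreed on, then stop for review.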
Reading through a sort-of-wrong, sort-of-right implementation spread across various files after every prompt just really sucked.
I'm not one-shotting massive amounts of files, but I am enjoying the lack of grunt work.
I use Claude Code frequently for doing constrained tasks in the background while I'm working on other issues. I love its planning mode switch. It is so much better to identify a dead end before Claude confidently charges off a cliff.
That's essentially what I do, but that doesn't (and cannot) entirely solve the problem.
A major part of software engineering is identifying and resolving issues during implementation. Plans are a good outline of what needs to be done, but they're always incomplete and inaccurate.
Every time Sonnet is acting like it has brain damage (which is once or twice a day), I switch to Opus and it seems to sort things out pretty fast. This is unscientific anecdata though, and it could just be that switching to any other model would have worked.
This seems like a case of regression to the mean. When one model is performing below average, changing anything (like switching to another model) is likely to improve things just by random chance...
This is a great use case for sub-agents IMO. By default, sub-agents use Sonnet. You can have Opus orchestrate the various agents and get (close to) the best of both worlds.
In this case I don't think the controller needs to be the smartest model. I use Sonnet as the main driver and pass the heavy thinking (via zen mcp) off to Gemini Pro, for example, but I could use OpenAI or Opus or all of them via OpenRouter.
Subagents seem pretty similar to using zen mcp w/ OpenRouter but maybe better or at least more turnkey? I'll be checking them out.
Amp (ampcode.com) uses Sonnet as its main model and has OpenAI's o3 as a special-purpose tool / subagent. It can call into that when it needs particularly advanced reasoning.
Interestingly, I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. Even more interesting to me, Sonnet appeared to do a better job once I'd prompted that (like chain-of-thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking).
Is there a way to get persistent sub-agents? I'd love to have a bunch of YAML files in my repository, one for each sub-agent, and have those automatically used across all Claude Code instances I have on multiple machines (I dev on laptop and desktop), or across the team.
In my experience the best use for subagents is saving context.
Example: you need to review some code to see if it has proper test coverage.
If you use the "main" context, it'll waste tokens on reading the codebase and running tests to see coverage results.
But if you launch an agent (a subprocess pretty much), it can use a "disposable" context to do that and only return with the relevant data - which bits of the code need more tests.
Now you can either use the main context to implement the tests or if you're feeling really fancy launch another sub-agent to do it.
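And to the persistence question upthread: Claude Code picks up project-level agent definitions from .claude/agents/ in the repo (plus ~/.claude/agents/ for user-level ones), so once committed they follow you across machines and teammates. A rough sketch of what that coverage reviewer could look like (field names from memory, check the current docs):

    .claude/agents/test-coverage-reviewer.md

    ---
    name: test-coverage-reviewer
    description: Reviews code for missing test coverage and reports only the gaps.
    tools: Read, Grep, Glob, Bash
    ---
    You review code for test coverage. Read the relevant files, run the test
    suite with coverage if it is available, and reply with a short list of the
    functions and branches that still need tests. Do not edit any files.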
AFAIK subagents inherit the default model since v1.0.64. At least that's the case for me with the Claude Code SDK — not providing a specific model makes subagents use claude-opus-4-1-20250805.
I have suspected for a long time that hosted models load shed by diverting some requests to lesser models or running more quantized versions under high load.
> yet the general consensus and my own experience seem to be that Sonnet is much much better
Given that there’s nothing close to scientific analysis going on, I find it hard to tell how big the “Sonnet is overall better, not just sometimes” crowd is. I think part of the problem is that “The bigger model is better” feels obvious to say, so why say it? Whereas “the smaller model is better actually” feels both like unobvious advice and also the kind of thing that feels smart to say, both of which would lead to more people who believe it saying it, possibly creating the illusion of consensus.
I was trying to dig into this yesterday, but every time I come across a new thread, the things people are saying - and the proportions saying them - are different.
I suppose one useful takeaway is this: If you’re using Claude Max and get downgraded from Opus to Sonnet for a few hours, you don’t have to worry too much about it being a harsh downgrade in quality.
Opus seems better to me on long tasks that require iterative problem solving and keeping track of the context of what we have already tried. I usually switch to it for any kind of complicated troubleshooting etc.
I stick with Sonnet for most things because it's generally good enough and I hit my token limits with it far less often.
Same. I'm on the $200 plan and I find Opus "better", but Sonnet is more straightforward. Sonnet is, to me, a "don't let it think" model. It does great if you give it concrete and small goals. Anything vague or broad and it starts thinking, and that's a problem.
Opus gives you a bit more rope to hang yourself with, IMO. Yes, it "thinks" slightly better, but still not well enough for me. But it can be good enough to convince you that it can do the job... so I dunno, I almost dislike it for that. I find Sonnet just easier to predict.
Could I use Opus like I do Sonnet? Yes, definitely, and generally I do. But then I don't really see much difference, since I'm hand-holding so much.
I use both. Sonnet is faster and more cost efficient. It's great for coding. Where Opus is noticeably better is in analysis. It surpasses Sonnet for debugging, finding patterns in data, creativity and analysis in general. It doesn't make a lot of sense to use Opus exclusively unless you're on a max20 plan and not hitting limits. Using Opus for design and troubleshooting and Sonnet for everything else is a good way to go.
I'm on the Max plan and generally Opus seems to do better work than Sonnet. However, that's only when they allow me to use Opus. The usage limits, even on the Max plan, are a joke. Yesterday I hit the limits within MINUTES of starting my work day.
I'm using ccusage to get the number. I think it just looks at your history and calculates cost from token counts against API pricing, so I think it wouldn't account for caching.
But I totally agree there's no way it lasts. I'm mostly only using this for side projects, and I'm sitting there interacting with it, not YOLO'ing. I do sometimes have two sessions going at the same time, but I'm not firing off swarms or anything crazy. I just have it set to Opus and chat with it.
Claude Code definitely reports cached tokens, and I think ccusage does too, so it wouldn't make sense for the calculation to be based on full pricing when they have the cached values.
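Back-of-envelope on why it matters whether cache reads get billed at the full input rate (using the published Opus-class rates last I checked; treat the numbers as illustrative):

    # Illustrative $/M-token rates; check current pricing before relying on these.
    INPUT, OUTPUT = 15.00, 75.00           # uncached input, output
    CACHE_READ, CACHE_WRITE = 1.50, 18.75  # cache read, cache write (5-minute cache)

    # A hypothetical heavy-caching session, in tokens.
    tok = {"input": 0.2e6, "output": 0.5e6, "cache_read": 8e6, "cache_write": 0.3e6}

    # If every input-side token is priced as plain input:
    naive = (tok["input"] + tok["cache_read"] + tok["cache_write"]) / 1e6 * INPUT \
            + tok["output"] / 1e6 * OUTPUT

    # If cached reads/writes get their own rates:
    cache_aware = (tok["input"] * INPUT + tok["cache_read"] * CACHE_READ
                   + tok["cache_write"] * CACHE_WRITE + tok["output"] * OUTPUT) / 1e6

    print(f"priced as plain input: ${naive:,.2f}")       # ~$165
    print(f"cache-aware pricing:   ${cache_aware:,.2f}") # ~$58

So whether the tool prices cache reads at the full rate or the cache-read rate nearly triples the reported "cost" for a session like that.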
Is this on x5? Because ever since they booted all the freeloaders I've not once seen the "you are approaching usage limits" message. Anyway, that message shows up when you are over 50% of your tokens for the timeframe, so it's not very useful.
If I'm using Cursor then Sonnet is better, but in Claude Code, Opus 4 is at least 3x better than Sonnet. As with most things these days, I think a lot of it comes down to prompting.
This is interesting. I do use Cursor with almost exclusively Sonnet and thinking mode turned on. I wonder if what Cursor does under the hood (like their indexing) somehow empowers Sonnet more. I do not have much experience with using Claude Code.
With aggressive Claude Code use I didn't find Sonnet better than Opus but I did find it faster while consuming far fewer tokens. Once I switched to the $100 Max plan and configured CC to exclusively use Sonnet I haven't run into a plan token limit even once. When I saw this announcement my first thing was to CMD-F and see when Sonnet 4.1 was coming out, because I don't really care about Opus outside of interactive deep research usage.
Opus really shines for completing long-running tasks with no supervision. But if you are using Claude Code interactively and actively steering it yourself, Sonnet is good enough and is faster.
I don't believe anyone saying Sonnet yields better results than Opus though, as my experience has been exactly the opposite. But trade-off wise, I can definitely see it being a better experience when used interactively because of its speed and lower cost.
> If you believe the benchmarks are reflective of reality anyways.
That's a big "if." But yeah, I can't tell a difference subjectively between Opus and Sonnet, other than maybe a sort of placebo effect. I'm more careful to write quality prompts when using Opus, because I don't want to waste the 5x more expensive tokens.
My opinion of Opus is that it takes the correct action 19/20 times, where Sonnet takes the correct action 18/20 times. It’s not strictly necessary to use Opus, but if you have the subscription already it’s just a pure win.
I've found that with limited context in the prompt, Opus is just awful compared to even GPT-4.1, but once I give it even just a little more explanation, it jumps leagues ahead.
Just more anecdata, but I entirely agree. I can't say that I'm happy with Sonnet's output at any point, really, but it still occasionally works, whereas Opus has been a dumpster fire every single time.