Sure, but also the METR study showed the rate of change is t doubles every 7 mon...

alecbz · 2025-10-15T18:48:54 1760554134

Only skimmed the paper, but I'm not sure how to think about "length of task" as a metric here.

The cases I'm thinking about are things that could be solved in a few minutes by someone who knows what the issue is and how to use the tools involved. I spent around two days trying to debug one recent issue. A coworker who was a bit more familiar with the library involved figured it out in an hour or two. But in parallel with that, we also asked the library's author, who immediately identified the issue.

I'm not sure how to fit a problem like that into this "duration of human time needed to complete a task" framework.

conception · 2025-10-15T21:38:31 1760564311

This is an excellent example of human “context windows”, though, and it could be that the LLM would have solved the easy problem with better context engineering. Despite 1M-token windows, things still start to get progressively worse after 100k tokens. LLMs would be amazingly better overnight with a reliable 1M window.

ben_w · 2025-10-15T21:43:28 1760564608

Fair comment.

While I think they're trying to cover that by getting experts to solve problems, it is definitely the case that humans learn much faster than current ML approaches, so "expert in one specific library" != "expert in writing software".

Pulcinella · 2025-10-15T02:39:44 1760495984

But will it actually get better or will it just get faster and more power efficient at failing to pair parentheses/braces/brackets/quotes?

ben_w · 2025-10-15T09:12:47 1760519567

Read the linked METR study please.

Or watch the Computerphile video summary/author interview, if you prefer: https://m.youtube.com/watch?v=evSFeqTZdqs