Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Sure, but also the METR study showed the rate of change is t doubles every 7 months where t ~= «duration of human time needed to complete a task, such that SOTA AI can complete same with 50% success»: https://arxiv.org/pdf/2503.14499

I don't know how long that exponential will continue for, and I have my suspicions that it stops before week-long tasks, but that's the trend-line we're on.





Only skimmed the paper, but I'm not sure how to think about "length of task" as a metric here.

The cases I'm thinking about are things that could be solved in a few minutes by someone who knows what the issue is and how to use the tools involved. I spent around two days trying to debug one recent issue. A coworker who was a bit more familiar with the library involved figured it out in an hour or two. But in parallel with that, we also asked the library's author, who immediately identified the issue.

I'm not sure how to fit a problem like that into this "duration of human time needed to complete a task" framework.


This is an excellent example of human “context windows” though and it could be the llm could have solved the easy problem with better context engineering. Despite 1M token windows, things still start to get progressively worse after 100k. LLMs would overnight be amazingly better with a reliable 1M window.

Fair comment.

While I think they're trying to cover that by getting experts to solve problems, it is definitely the case that humans learn much faster than current ML approaches, so "expert in one specific library" != "expert in writing software".


But will it actually get better or will it just get faster and more power efficient at failing to pair parentheses/braces/brackets/quotes?

Read the linked METR study please.

Or watch the Computerphile video summary/author interview, if you prefer: https://m.youtube.com/watch?v=evSFeqTZdqs




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: