Ars points out that these findings contradict those of other experiments and then goes on to postulate as to why. I clicked on the link to the other experiment:
when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool
By comparison, this experiment considered 16 developers. That’s 0.3% as many as the experiments its findings contradict. Fortunately, the authors don’t claim their findings are broadly applicable. They even have a table that reads:
We do not provide evidence that | Clarification —- | —- AI systems do not currently speed up many or most software developers | We do not claim that our developers or repositories represent a majority or plurality of software development work AI systems do not speed up individuals or groups in domains other than software de- velopment | We only study software development AI systems in the near future will not speed up developers in our exact setting | Progress is difficult to predict, and there has been substantial AI progress over the past five years [2] There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting | Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup
That said, the study has been an interesting read so far. I highly recommend reading it directly rather than just the news posts about it. Check out their own blog post: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
I personally find the psychological effect - the devs thought they were 20% faster even afterward - to be pretty interesting, as it suggests that even if more time overall is spent, use of AI could reduce cognitive load and potentially side effects like burnout.
I’d like to see much larger scale studies set up like this, as well as studies of other real world situations. For example, how does this affect the amount of time this takes 10,000 different developers to onboard onto an unfamiliar repository?
https://knowyourmeme.com/memes/inb4--2