AI Forecasting Expert: I Still Underestimated the Speed of AI; Fully Automated AI R&D by the End of This Year Is Genuinely Possible


The rapid advancement of artificial intelligence capabilities is catching even the most cautious forecasters off guard.

Renowned AI forecasting researcher Ajeya Cotra recently admitted that the forecast for AI progress through 2026 she published just two months ago was significantly too conservative. The trigger for this self-correction was the performance of Anthropic’s latest model, Claude Opus 4.6, on the widely cited METR evaluations. The model’s software engineering “time horizon” (the length, in human working time, of tasks it can complete with a 50% success rate) has reached roughly 12 hours, putting progress far ahead of the pace implied by Cotra’s earlier prediction of about 24 hours by the end of 2026. In other words, real progress in AI software engineering is running nearly ten months ahead of her expectations.

Even more striking is Cotra’s estimate of the probability of “full automation of AI research and development.” She continues to put a 10% chance on AI fully taking over research conception and implementation, with no human involvement, by the end of this year, and states explicitly: “This is the first time I can’t find any solid extrapolations to confidently say this won’t happen very soon.” The remark has drawn widespread attention in the AI forecasting community.

Cotra previously led AI safety research funding at Coefficient Giving, one of the world’s largest AI safety funding organizations. She now works at METR, a research organization focused on evaluating AI capabilities.

Forecasts fall short: judgments from two months ago are outdated

On January 14, Cotra predicted, based on the historical trend from 2019 to 2025 in which the time horizon doubled slightly less than twice per year, that the 50%-success-rate time horizon of the most advanced models at the end of 2026 would be about 24 hours, with 40 hours as her 80th-percentile estimate.
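To make the arithmetic behind this kind of extrapolation concrete, here is a minimal sketch in Python. The starting horizon, doubling rate, and elapsed time are illustrative assumptions chosen for the example; they are not METR’s measurements or Cotra’s actual model.

```python
import math

# Back-of-the-envelope arithmetic for time-horizon extrapolation.
# All concrete numbers below are illustrative assumptions, not real data.

def projected_horizon(h0: float, doublings_per_year: float, years: float) -> float:
    """Time horizon after `years` of steady exponential growth from a starting horizon h0."""
    return h0 * 2 ** (doublings_per_year * years)

def months_ahead_of_schedule(h0: float, doublings_per_year: float,
                             measured: float, years_elapsed: float) -> float:
    """How many months earlier a measured horizon arrived than the steady trend implies."""
    expected_years = math.log2(measured / h0) / doublings_per_year
    return (expected_years - years_elapsed) * 12

# Example: assume a 3-hour horizon at the start of the trend line, ~1.8 doublings
# per year, and a 12-hour horizon measured 0.2 years later.
print(projected_horizon(3.0, 1.8, 1.0))               # ~10.4 hours expected after one year
print(months_ahead_of_schedule(3.0, 1.8, 12.0, 0.2))  # ~10.9 months ahead of the trend
```

On these assumed numbers, a 12-hour horizon showing up after only a couple of months lands almost a year ahead of the steady-doubling trend, which is the shape of the surprise Cotra describes.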

However, only about two months after that forecast, Opus 4.6 was measured at a time horizon of roughly 12 hours. On METR’s task suite, among 19 software engineering tasks estimated to take a human more than 8 hours, Opus 4.6 at least partially completed 14 and reliably solved 4 of them. Cotra concedes that her earlier confidence that AI agents would still be failing about half of 24-hour tasks even after ten more months of progress is “no longer credible.”

It is also worth noting that, as Cotra points out, the uncertainty in current time-horizon estimates has grown substantially: Opus 4.6’s 95% confidence interval runs from 5.3 hours to 66 hours. This is driven partly by the small number of long tasks, the reliance on estimated human completion times, and the fact that the benchmark itself is approaching saturation.

Capability boundaries: traditional evaluation frameworks are failing

As AI agents approach, and begin to surpass, tasks requiring dozens of hours, Cotra believes the very concept of a “time horizon” is coming under strain.

She notes that task decomposability increases markedly with scale: a one-hour debugging task is nearly impossible to split into parallel pieces; a day-long development task can be roughly divided, though with fuzzy boundaries; and a project spanning a month or more naturally breaks into multiple parallel sub-tasks. Once AI agents can reliably complete tasks of 80 hours or more, a “management-level” AI could in principle assign work to “execution-level” AIs running in parallel, allowing projects of essentially any size to keep moving forward.
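As a toy illustration of the management/execution split Cotra describes, the sketch below has a hypothetical manager decompose a project and fan the pieces out to parallel workers. The planning, execution, and integration functions are invented stand-ins for this example; they do not correspond to any real agent framework or API.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch: a "management-level" agent plans subtasks, "execution-level" agents
# work on them in parallel, and the manager integrates the results.
# plan_subtasks, run_agent_on, and integrate are hypothetical placeholders.

def plan_subtasks(project: str) -> list[str]:
    # A manager agent would decompose the project into roughly independent subtasks.
    return [f"{project}: subtask {i}" for i in range(8)]

def run_agent_on(subtask: str) -> str:
    # A worker agent would attempt the subtask end to end and return an artifact.
    return f"result of {subtask}"

def integrate(results: list[str]) -> str:
    # The manager reviews and merges the partial results into the overall project.
    return "\n".join(results)

def run_project(project: str) -> str:
    subtasks = plan_subtasks(project)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_agent_on, subtasks))
    return integrate(results)

print(run_project("month-long refactor"))
```

The open question Cotra flags is not the fan-out mechanics but whether the fuzzy, shared context that human teams carry in their heads survives this kind of decomposition.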

Cotra’s colleague Tom has proposed using the calendar time a large team needs to complete a task, rather than individual person-hours, as a better measure of “intrinsic difficulty” (roughly, single-person time is team size multiplied by calendar time, so as tasks become more parallelizable the single-person metric can balloon even when the underlying calendar difficulty grows steadily). Cotra believes that as AI enters this new regime, the “single-person time” metric may begin to grow super-exponentially, making the ceiling on software engineering capability by year’s end extremely hard to estimate.

She also admits that such large-scale task decomposition won’t work perfectly in practice: participants’ intuitive grasp of the overall context can’t be fully captured in Jira tickets or Asana tasks. Still, she believes that for a significant class of software projects the approach “may prove surprisingly effective.”

Key milestone: AI research automation could become a reality this year

Among all predictions, Cotra’s assessment of the probability of “full automation of AI research and development” has attracted the most attention.

She defines the milestone as AI systems fully undertaking research conception and implementation without human involvement. In her January forecast she assigned it a 10% chance, and many peers in the AI forecasting field responded that the number seemed too high. After Opus 4.6’s performance, however, she says that 10% “feels reasonable again.”

Cotra remains cautious. She points out that fully automating AI R&D requires not only advanced software engineering capabilities but also breakthroughs in “research judgment” and “creativity,” areas where current AI systems still lag well behind human researchers. She believes the milestone is far more likely to be reached within the next three to five years than within this year.

However, her wording has fundamentally shifted: “This is the first time I can’t find any solid extrapolation to confidently say it won’t happen very soon.”
