AI News 2025-12-25

Research Insights

LLM

OpenAI update: GPT-5.2-Codex.
METR reports a record-setting time on their task length benchmark: Opus 4.5 reaches almost 5 hours (though the sparsity of available evaluations at this time horizon makes the data increasingly unreliable).
- From this, we can update our predictions. It appears that progress continues along the established exponential, with capabilities doubling every 4-5 months.
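
To make the doubling arithmetic concrete, here is a minimal sketch of the extrapolation, assuming a clean exponential. The starting value of 5.0 hours is a round stand-in for "almost 5 hours", and the function names are hypothetical; none of this is METR's methodology, and the caveat about sparse data at long horizons applies to any projection.

```python
import math


def projected_horizon(current_hours, months_ahead, doubling_months):
    """Task-length horizon after `months_ahead`, assuming pure exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_to_reach(target_hours, current_hours, doubling_months):
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    start_hours = 5.0  # assumption: rounded from "almost 5 hours"
    for doubling_months in (4, 5):  # the 4-5 month range quoted above
        one_year = projected_horizon(start_hours, 12, doubling_months)
        to_work_week = months_to_reach(40.0, start_hours, doubling_months)
        print(
            f"doubling every {doubling_months} months: "
            f"{one_year:.1f} h after one year, "
            f"{to_work_week:.1f} months to reach a 40 h horizon"
        )
```

With a 4-month doubling time, one year is three doublings, so the horizon would go from 5 to 40 hours; a 5-month doubling time gives a noticeably smaller figure over the same period, which shows how sensitive the projection is to the assumed rate.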

AI Agents

Google: Distributional AGI Safety. They argue that AGI may first arise as an emergent property of a collection of agents.

Safety