General
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs.
- Anthropic Economic Index: Tracking AI’s role in the US and global economy.
- OpenAI: How People Use ChatGPT.

Research Insights
- Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations. Arguably, perfecting this kind of "meta-interpretability" is a more useful way to do interpretability and alignment. That is, rather than trying to reverse-engineer an AI’s brain, one simply isolates the subset of faithful metacognitions and uses those to let the model inspect itself.
- Meta: Language Self-Play For Data-Free Training.
- DeepSeek paper: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.
LLM
- OpenAI releases an update to their coding system: GPT-5 Codex.
- Frontier LLMs were tested in the ICPC 2025 programming competition.
- OpenAI achieved gold, getting 12/12 correct (the best human team achieved 11/12). GPT-5 got 11/12 correct, and their experimental reasoning system solved the last (most challenging) question.
- Google Gemini 2.5 Deep Think achieved gold (10/12 correct).
Safety
Agents
- Google DeepMind: Virtual Agent Economies. Sandboxed economies could be used to allow agents to cooperate and compete, e.g. by negotiating for access to resources.
- Google announces: Powering AI commerce with the new Agent Payments Protocol (AP2). This extension to their Agent2Agent (A2A) protocol signals that they want future agents to be able to spend money on behalf of their users.
Image Synthesis
- Reve enables easy editing of images, where image elements can be selected to modify/move/etc.
- There is academic work along similar lines. E.g. Generative Blocks World: Moving Things Around in Pictures.
World Synthesis
- World Labs shows off: Generating Bigger and Better Worlds.
Cars
- Waymo released some safety data (based on 96M miles driven). The results are biased somewhat by the subset of regions/conditions in which Waymo is allowed to drive (they claim to account for that in their analysis). Nevertheless, the results are impressive, showing fewer crashes and injuries for Waymo vehicles than for human drivers.

Robots