AI News 2025-09-18

General

Research Insights

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations. Arguably, perfecting this metainterp is a more useful way to do interpretability and alignment. That is, rather than trying to reverse-engineer an AI’s brain, one simply isolates the subset of faithful metacognitions and uses those for the model to inspect itself.
Meta: Language Self-Play For Data-Free Training.
DeepSeek paper: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.

LLM

OpenAI releases an update to their coding system: GPT-5 Codex.
Frontier LLMs were tested in the ICPC 2025 programming competition.
- OpenAI achieved gold, getting 12/12 correct (the best human team achieved 11/12). GPT-5 got 11/12 correct, and their experimental reasoning system got the last (most challenging) question correct.
- Google Gemini 2.5 Deep Think achieved gold (10/12 correct).

Safety

Agents

Google DeepMind: Virtual Agent Economies. Sandbox economies could be used to allow agents to cooperate and compete, e.g. negotiating for access to resources.
Google announces: Powering AI commerce with the new Agent Payments Protocol (AP2). This extension to their Agent2Agent (A2A) protocol signals that they want future agents to be able to spend money on behalf of their users.

Image Synthesis

Reve enables easy editing of images, where image elements can be selected to modify/move/etc.
- There is academic work along similar lines. E.g. Generative Blocks World: Moving Things Around in Pictures.

World Synthesis

Cars

Waymo released some safety data (based on 96M miles driven). The results are biased somewhat by the subset of regions/conditions in which Waymo is allowed to drive (they claim that they account for that in their analysis). Nevertheless, the results are impressive, showing fewer crashes/injuries for Waymo driving compared to human drivers.

Robots

Video from the Active Intelligent Systems (ACT) Lab at the Southern University of Science and Technology (SUSTech) shows a Unitree robot responding very nimbly to extreme perturbations. (Also, dancing.)