General
- New York Times: Pay More Attention to A.I.
- Nature commentary: Does AI already have human-level intelligence? The evidence is clear. The vision of human-level machine intelligence laid out by Alan Turing in the 1950s is now a reality. Eyes unclouded by dread or hype will help us to prepare for what comes next.
- Distinguish between inference scaling and “larger tasks use more compute”.
Research Insights
- Learning to Continually Learn via Meta-learning Agentic Memory Designs. Rather than prescribing a memory architecture, a meta-agent automatically designs it.
- Goodfire: Features as Rewards: Using Interpretability to Reduce Hallucinations. This RLFR paradigm uses interpretability to extract model beliefs, providing a feedback signal for RL.
- Replicating Human Motivated Reasoning Studies with LLMs. Current AIs seem not to engage in motivated reasoning, which represents a limit to using them to model humans.
LLM
- Anthropic releases Claude Opus 4.6. State-of-the-art on ARC-AGI.
- OpenAI releases GPT-5.3-Codex.
- OpenAI releases GPT-5.3-Codex-Spark, optimized for real-time coding.
- Google releases Gemini 3 Deep Think, intended for science and engineering. (Impressive benchmarks.)
- Google releases Gemini 3.1 Pro, which is based on the Gemini 3 Deep Think core intelligence improvements. On the Pareto frontier for ARC-AGI.
- MiniMax M2.5 is an open-source model (230B) optimized for coding and agentic tasks.
- Anthropic reveals Claude Sonnet 4.6, with a 1M context window.
AI Agents
- OpenAI launches OpenAI Frontier, for managing agents in enterprise.
AI Safety
- The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity? (preprint)
Image Synthesis
- Seedream 5.0 available.
- Google releases Nano Banana 2, a faster, cheaper version that retains high quality.
Audio
- Google introduces Lyria 3 music generation model.
Video
- Kling 3.0.
- Seedance 2.0.
- Xmax AI demos real-time interactive video.
Science
- Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems.
- Google: PaperBanana: Automating Academic Illustration for AI Scientists.
- Google: Accelerating Scientific Research with Gemini: Case Studies and Common Techniques.
- Google: Accelerating Mathematical and Scientific Discovery with Gemini Deep Think.
- Edison releases LABBench 2: 1,900 questions to measure AI on scientific tasks.
- OpenAI announces use of GPT-5 for autonomous protein synthesis.
- Allen AI (Ai2) AstraLabs launches AutoDiscovery: Uncover surprising insights hidden in your data. The goal is to help identify what research to pursue.
- Allen AI (Ai2) publication in Nature: Synthesizing scientific literature with retrieval-augmented language models (blog writeup).
- GPT‑5.2 derives a new result in theoretical physics. In a new preprint, GPT‑5.2 proposed a formula for a gluon amplitude, which was later proved by an internal OpenAI model and verified by the authors.
Cars
- The Waymo World Model: A New Frontier For Autonomous Driving Simulation. Built on Google DeepMind Genie 3.
- Tesla releases safety data. Driving with Full Self-Driving (Supervised) yields a lower collision rate than driving without it, and the gap is larger still when compared to the US driver average.
Robots
- Epoch AI analyzes the state of robotics: Where Autonomy Works: Evaluating Robot Capabilities in 2026. They conclude: “navigation is deployed at scale, manipulation mostly is not”.