General
- The U.S. Army Reserve has formed a new Detachment 201: Executive Innovation Corps. The group (which includes OpenAI CPO Kevin Weil, Palantir CTO Shyam Sankar, Meta CTO Andrew Bosworth, and Bob McGrew) will focus on tech issues.
- Epoch reports continued progress in AI, including on their hard FrontierMath benchmark.

- OpenAI has been awarded a US Department of Defense contract for $200M to develop AI models for defense applications.
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce. Includes an interesting delineation of jobs.

Research Insights
- Distillation Robustifies Unlearning (preprint, demo, discussion). Normal unlearning suppresses knowledge in a model, but adversarial prompting or fine-tuning can bring the knowledge/behavior back. They show that distilling into a new model more reliably eradicates the undesired information.
- Self-Adapting Language Models. The models generate their own fine-tuning data and update directives.
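The distillation idea in the unlearning item above can be illustrated with a generic knowledge-distillation objective: a fresh student model is trained to match the output distribution of the (already unlearned) teacher, so it only ever sees the teacher's suppressed behavior. This is a minimal sketch of the standard KL-based distillation loss, not necessarily the exact procedure used in the preprint; the function names and the toy logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for a single next-token distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over a 3-token vocabulary (made-up numbers for illustration).
teacher = [2.0, 0.5, -1.0]    # output of the unlearned teacher
student = [0.1, 0.2, 0.3]     # output of a freshly initialized student

loss_before = distillation_loss(teacher, student)
loss_matched = distillation_loss(teacher, teacher)  # a perfect student
```

Minimizing this loss over many inputs pulls the student toward the teacher's visible outputs only; whatever latent knowledge remains in the teacher's weights is not copied directly.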
LLM
- Google is nearing release of Gemini 2.5 Pro Deep Think, which deploys more inference-time compute to improve reasoning.
- Google launches Gemini 2.5 Flash-Lite, a very fast (and very low cost) reasoning model.
- Google DeepMind technical report: Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. The paper lists a thousand contributors. (Informal summary.)
Agents
- Anthropic describes the multi-agent system underlying Claude’s research capabilities: How we built our multi-agent research system.
Safety
- Anthropic blog post: SHADE-Arena: Evaluating sabotage and monitoring in LLM agents (paper).
- Avoiding Obfuscation with Prover-Estimator Debate. They show that honesty is incentivized at equilibrium (under certain conditions).
- OpenAI: Toward understanding and preventing misalignment generalization.
- Paper: Persona Features Control Emergent Misalignment.
- They show that intentionally training a model on a misaligned behavior (e.g. writing bad code) causes an emergent “evil” personality, but that this can be detected and countered.
Video
- Seedance 1.0: Exploring the Boundaries of Video Generation Models.
- Mystery model “Kangaroo” being tested in the AI video arena.
- Hailuo AI (MiniMax) unveils Hailuo 02 (examples: various, various, tsunami, fight scene, fox running, blogger).
- Midjourney releases their video model. Outputs are often beautiful (early examples, release examples: various, various, Ethan Mollick, highly rated, complex environments).
Science
- Automation of Systematic Reviews with Large Language Models. By automating the medical document review process, they can save years of human labor.
Hardware
- AMD unveiled its new MI350 chip, optimized for AI workloads. They are focusing on open/compliant coding standards, and energy/cost efficiency.
Cars
- Data from Waymo: New Insights for Scaling Laws in Autonomous Driving. They show scaling laws apply to autonomous driving: using more data and more compute for training yields reliable improvements in performance.
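A scaling law of this kind is usually summarized as a power law, where performance (e.g. a loss) falls predictably as compute or data grows. The sketch below fits y = a * x**b by least squares in log-log space. It is a generic illustration under stated assumptions: the data points are synthetic and made up, and the function name is hypothetical; none of the numbers come from Waymo's results.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)    # intercept = prefactor
    return a, b

# Synthetic (made-up) training runs: loss generated from 2.0 * C**-0.05.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [2.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate to a larger, not-yet-run compute budget.
predicted_loss = a * 1e22 ** b
```

The practical value of such a fit is the extrapolation step: if the trend holds, the expected gain from a larger training run can be estimated before paying for it.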
Robots
- RoboBrain 2.0 is an open-source, general-purpose robot control model (video).
- 1X World Model: Evaluating Bits, not Atoms (preprint).
- Generalist shows off a model that can enable relatively simple robot arms to perform precise work.
- Hexagon Robotics announces the Aeon humanoid robot (on wheels), optimized for industrial work (video).