General
- In November 2024, this paper made bold claims: Artificial Intelligence, Scientific Discovery, and Product Innovation; in particular, that AI greatly increased patenting and that top performers benefited most. MIT conducted an investigation and found the work fraudulent: Assuring an accurate research record. The claims of AI-generated patents cannot be substantiated. The other results may be true, but this particular report should not be used as evidence.
- Survey shows US workers are rapidly adopting AI: 30% of workers in Dec 2024, now up to 40%.
- Pew Research: How the U.S. Public and AI Experts View Artificial Intelligence. Significant negative sentiment and worry about the future of AI.
- Publication of: ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems.
- METR provides a preliminary update on their analysis of “AI task completion”, by including examples other than software engineering tasks. The results suggest different scaling based on task type, but a general trend of exponentially increasing capabilities.

Research Insights
- Qwen publish: Parallel Scaling Law for Language Models (code). They propose parallel computation for improved scaling of training-time and inference-time compute. This requires a learned transformation step before going into the model, and a learned aggregation on model output.
- Harnessing the Universal Geometry of Embeddings. They find evidence for a “Strong Platonic Representation Hypothesis” wherein all models learn essentially the same representation. This implies a well-defined “consensus reality” for any given dataset.
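
The parallel-scaling idea in the Qwen item above (a learned transformation before the model, a learned aggregation after it) can be illustrated with a toy sketch. This is only an illustration under stated assumptions, not the paper's implementation: the additive input shifts, the fixed softmax mixing weights, and the names `parallel_forward`, `input_shifts`, `agg_logits`, and `toy_model` are all hypothetical stand-ins chosen to keep the example small, and the actual method may use different transformations and aggregation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def parallel_forward(model, x, input_shifts, agg_logits):
    """Run one shared model on P transformed copies of x and mix the outputs.

    input_shifts: P vectors standing in for the learned input transformation
                  (assumption: a simple additive shift per stream).
    agg_logits:   P scalars standing in for the learned aggregation
                  (assumption: softmax-normalized mixing weights).
    """
    # Step 1: build P differently transformed copies of the same input.
    streams = [[xi + si for xi, si in zip(x, shift)] for shift in input_shifts]
    # Step 2: apply the same model (shared parameters) to every stream.
    outputs = [model(stream) for stream in streams]
    # Step 3: combine the P outputs with normalized weights.
    weights = softmax(agg_logits)
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]

def toy_model(v):
    """Hypothetical stand-in for a language model: doubles each component."""
    return [2.0 * a for a in v]

x = [1.0, 2.0]
input_shifts = [[0.0, 0.0], [1.0, -1.0]]  # P = 2 parallel streams
agg_logits = [0.0, 0.0]                   # equal logits give equal mixing weights
y = parallel_forward(toy_model, x, input_shifts, agg_logits)
print(y)
```

In this sketch, compute grows with the number of streams P while the model's own parameters are reused; the only extra parameters are the per-stream shifts and the aggregation logits.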
LLM
- Why We Think (Lilian Weng) provides a nice review of inference-time compute methods.
- Google announce Gemini 2.5 Pro Deep Think. It demonstrates extremely good performance on math and code benchmarks.
- Google announce Gemini Diffusion, a text diffusion model that enables 5× faster generation.
- Google add native audio output to Gemini 2.5 Pro and 2.5 Flash.
- Google add computer use capabilities to Gemini.
- Google is expanding its rollout of AI Mode for the main Google search product.
- Anthropic announce: Claude Sonnet 4 and Claude Opus 4.
Agents
- OpenAI announces: A research preview of Codex in ChatGPT. Whereas Codex-CLI runs locally, this new system runs on OpenAI’s servers. Uses Codex-1 (based on o3, optimized for coding), and can be used for things like: understanding a repo, fixing bugs in a repo, etc.
- Google adds an Agent Mode to Gemini, allowing you to delegate tasks for it to work on.
- Google release Jules, an asynchronous coding agent.
- Google published a video demo of their Project Astra research prototype, an AI assistant operating from your smartphone.
Image Synthesis
- Google announce Imagen 4.
Video
- Viggle introduce LIVE, real-time webcam character/avatar animation (that runs in browser).
- Google announce Veo 3. It also natively generates audio. Examples: conversation, cooking, singing, simple story, cinematic action sequence, car show interviews, We Can Talk, podcast, various.
- Google announce Flow, an AI filmmaking tool that integrates with Veo.
- Google announce that NotebookLM will be adding video overviews, with graphics generated to match the audio presentation.
Audio
- Google announce improvements to their Lyria 2 music generator.
Science
- FutureHouse report AI-accelerated research into combating a form of blindness: Demonstrating end-to-end scientific discovery with Robin: a multi-agent system.
- Generalization bias in large language model summarization of scientific research. They show that AI summarization, though not actually erroneous, is less precise than human experts.
Hardware
- OpenAI announces that io (led by Jony Ive) is merging with them, in order to develop hardware optimized for interfacing with AI.
- Rumors for what this might mean: Tantalizing details of Jony Ive’s AI device leak after OpenAI meeting.
Robots