AI News 2025-03-20

General

Kevin Roose, New York Times: Powerful A.I. Is Coming. We’re Not Ready.
Predictions: Nicholas Carlini: My Thoughts on the Future of “AI”.
- “I have very wide error bars on the potential future of large language models, and I think you should too.”
Predictions: Glimpses of AI Progress: Mental models for fast times.
Strategic opinion piece: AI Dominance Requires Interpretability and Standards for Transparency and Security.

Research Insights

Google DeepMind: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (project, code, hf). They combine autoregressive and diffusion approaches to text generation.
Research from METR: Measuring AI Ability to Complete Long Tasks. A valuable way to gauge AI utility is to compare each task against the time a human would need to complete it. As AI improves in coherence, we can expect it to tackle progressively longer-horizon tasks.
- Media report (Nature): AI could soon tackle projects that take humans weeks.
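To make the "length of equivalent human effort" idea concrete, here is a minimal sketch of one way to summarize results as a time horizon. It is a simplified stand-in, not METR's actual methodology: the function name `time_horizon`, the 50% threshold, the bucketing by exact task length, and all of the data are assumptions made up for illustration.

```python
from collections import defaultdict


def time_horizon(results, threshold=0.5):
    """Return the longest human task length (in minutes) that the AI still
    completes at or above `threshold` success rate, scanning from the
    shortest tasks upward. Returns None if even the shortest tasks fail.

    `results` is an iterable of (human_minutes, succeeded) pairs.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for human_minutes, succeeded in results:
        attempts[human_minutes] += 1
        successes[human_minutes] += int(succeeded)

    horizon = None
    for minutes in sorted(attempts):
        rate = successes[minutes] / attempts[minutes]
        if rate < threshold:
            break  # stop at the first task length the AI cannot handle
        horizon = minutes
    return horizon


# Hypothetical results, invented purely for illustration:
# (minutes a human would need, whether the AI succeeded)
results = [
    (1, True), (1, True),
    (5, True), (5, True),
    (30, True), (30, False),
    (120, False), (120, True), (120, False),
    (480, False), (480, False),
]

print(time_horizon(results))  # prints 30
```

With this toy data the horizon is 30 minutes: the AI succeeds on half of the 30-minute tasks but only a third of the 120-minute ones. Tracking how such a number moves across model generations is the kind of trend the METR report is about.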

LLM

Baidu announce Ernie 4.5 and X1 (use here). They claim that Ernie 4.5 is comparable to GPT-4o, and that X1 is comparable to DeepSeek R1, but with lower API costs (Ernie 4.5 is 1/4 the price of 4o, while X1 is 1/2 the price of R1). They plan to open-source the models on June 30th.
Mistral release Mistral Small 3.1 24B. They report good performance for the model size (e.g. outperforming GPT-4o-mini and Gemma 3).
LG AI Research announce EXAONE Deep, a reasoning LLM (2.4B, 7.8B, 32B variants; weights) that scores well on math benchmarks.
Nvidia release Llama-Nemotron models, which can do reasoning (try it here).

Safety

Vision

Image Synthesis

Gemini 2.0 Flash Experimental (available in Google AI Studio) is multimodal, with image generation capabilities. By having the image generation “within the model” (rather than as an external tool), one can iterate on image generation much more naturally. This incidentally obviates the need for more specialized image tools (it can do colorization, combine specified people/places/products, remove watermarks, etc.).

Video

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video (preprint, code). Allows one to modify the camera position/motion on an existing video clip (try it here).
WarpTuber enables real-time VTubing with AI avatar models (tutorial video).
Sync labs tease a lip-sync model.
Stability AI announces an image-to-video model that generates camera motion over the estimated 3D scene (code).

Audio

Udio have distilled their v1.5 model into a faster and higher-quality variant: Allegro.
Canopy Labs Orpheus 3B is a high-quality open-source text-to-speech model (weights: pretrain, finetune; try using).
AudioX: Diffusion Transformer for Anything-to-Audio Generation.

Science

Robots

Figure reports: BotQ: A High-Volume Manufacturing Facility for Humanoid Robots. Initially targeting 12k robots/year; eventually ramping up to 100k/year.
EngineAI video shows fluid dancing (behind the scenes: teaching, filming).
Nvidia (in partnership with Google DeepMind and Disney Research) announces Newton, an open-source physics engine for robotic simulation.
Boston Dynamics video of Atlas showing improved athletics. Notably, this is accomplished using reinforcement learning in simulation.