AI News 2025-04-03

General

AI timeline visualization: The Road to AGI (2015–2025), by James Campbell and Emiliano Garcia-Lopez (code).
Anthropic releases the second report for their economic index: Anthropic Economic Index: Insights from Claude 3.7 Sonnet.
xAI buys the 𝕏 (formerly Twitter) social media platform. The internal all-stock deal values xAI at $80B and 𝕏 at $33B ($45B minus $12B debt).
OpenAI raises an additional $40B at $300B valuation.
LLMs shown to be viable for therapy: First Therapy Chatbot Trial Yields Mental Health Benefits.
- Paper: Randomized Trial of a Generative AI Chatbot for Mental Health Treatment.
LLMs as viable tutors: LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds.
- Paper: Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise.
New study evaluates human ability to detect AI in a more stringent setting: Large Language Models Pass the Turing Test.
- Note that earlier work had already shown LLMs passing less stringent Turing tests:
  - 2023-05: Human or Not? A Gamified Approach to the Turing Test
  - 2023-10: Does GPT-4 pass the Turing test?
  - 2024-05: People cannot distinguish GPT-4 from a human in a Turing test
  - 2024-07: GPT-4 is judged more human than humans in displaced and inverted Turing tests

Research Insights

Meta preprint: Multi-Token Attention. Instead of computing each attention weight from a single query-key pair, they condition attention on multiple tokens: convolution operations over queries, keys, and heads allow nearby queries/keys to affect each other’s attention weights.
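To make the key-query convolution idea concrete, here is a minimal single-head NumPy sketch, assuming a causal mask and a small 2D kernel that mixes each attention logit with the logits of earlier queries and earlier keys before the softmax. The function name `key_query_conv_attention`, the kernel layout, and the zero-then-`-inf` masking are illustrative choices, not Meta's implementation; the paper's mixing across heads is omitted here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def key_query_conv_attention(Q, K, V, kernel):
    """Single-head causal attention whose logits are mixed by a 2D convolution.

    Q, K, V: (T, d) arrays. kernel: (cq, ck) array where kernel[a, b] weights
    the logit of query i - a against key j - b (hypothetical, simplified layout).
    """
    T, d = Q.shape
    causal = np.tril(np.ones((T, T), dtype=bool))
    logits = Q @ K.T / np.sqrt(d)
    # Zero out future positions so they contribute nothing to the convolution.
    logits = np.where(causal, logits, 0.0)

    cq, ck = kernel.shape
    mixed = np.zeros_like(logits)
    for a in range(cq):          # shift over earlier queries
        for b in range(ck):      # shift over earlier keys
            if a >= T or b >= T:
                continue
            shifted = np.zeros_like(logits)
            shifted[a:, b:] = logits[: T - a, : T - b]
            mixed += kernel[a, b] * shifted

    # Re-apply the causal mask, then normalize over keys.
    mixed = np.where(causal, mixed, -np.inf)
    weights = softmax(mixed, axis=-1)
    return weights @ V, weights
```

With a 1x1 kernel of value 1 this reduces to standard causal attention; a larger kernel lets the weight a query puts on key j borrow evidence from neighboring query/key pairs, which is the intuition behind conditioning attention on more than one token pair.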
Danijar Hafner et al. (Google DeepMind) present DreamerV3.
- Media report: AI masters Minecraft: DeepMind program finds diamonds without being taught. The Dreamer system reached the milestone by ‘imagining’ the future impact of possible decisions.
- Paper: Mastering diverse control tasks through world models.
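Dreamer-style agents train their policy on trajectories imagined by the learned world model, scoring those rollouts with λ-returns that blend predicted rewards with the critic's value estimates. Below is a plain-Python sketch of that return computation for a single imagined rollout; the inputs are assumed to come from the world model and critic, and the indexing convention and default `gamma`/`lam` values are illustrative rather than DreamerV3's exact settings.

```python
def lambda_returns(rewards, values, continues, gamma=0.99, lam=0.95):
    """TD(lambda) returns over one imagined rollout.

    rewards[t], continues[t]: predicted reward and continuation flag (1 = episode
    continues, 0 = terminates) for step t.
    values: critic estimates with len(rewards) + 1 entries; values[t + 1] is the
    value of the state reached after step t, and the last entry bootstraps the tail.
    """
    T = len(rewards)
    assert len(values) == T + 1 and len(continues) == T
    returns = [0.0] * T
    next_return = values[T]  # bootstrap from the critic at the imagination horizon
    for t in reversed(range(T)):
        # Blend the one-step bootstrap with the longer-horizon return.
        blended = (1 - lam) * values[t + 1] + lam * next_return
        returns[t] = rewards[t] + gamma * continues[t] * blended
        next_return = returns[t]
    return returns
```

Setting `lam=0` gives one-step bootstrapped targets, while `lam=1` gives discounted reward sums over the whole imagined rollout plus the critic's estimate at the horizon; values in between trade off bias from the critic against variance from long imagined rollouts.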

Safety

Control AI releases: The Direct Institutional Plan. They suggest designing policies that prevent the development of superintelligence, and spreading awareness among democratic institutions.
Google DeepMind: Taking a responsible path to AGI.
- Paper: An Approach to Technical AGI Safety and Security

LLM

OpenAI pushed an update to its 4o model, significantly improving its ranking (e.g., it is now the best non-reasoning model on a coding benchmark).
An interesting test of GPT-4o in-context image generation: it is unable to generate an image of a maze with a valid solution, at least when the maze is drawn as a square. However, if you ask for a diamond-oriented maze (a square rotated 45°), the result does have a valid solution. We can rationalize this based on the sequential order of autoregressive generation. Generation begins at the start of the maze, where only the local neighborhood matters, and ends with similar locality at the finish, so the model can more reliably build a valid solution. (Conversely, the usual square orientation requires longer-range reasoning across image patches.)
- At first, this might seem like just another silly oddity. But it shows how recasting a problem, just by changing the generation order, can massively change model performance. This sheds light on how these models “think” and suggests that alternative generation strategies could perhaps unlock new capabilities.
  - For instance, one could imagine an LLM with different branches (like MoE?) where each branch is trained on a different autoregression strategy (left-to-right, right-to-left, block diffusion, random, etc.) such that the overall LLM can invoke/combine different kinds of thinking modes.
- Another trick is to ask it to generate an image of a maze with the solution marked, and then update the image to remove the solution. This is a visual analog of “think step-by-step” and other inference-time-compute strategies. It suggests that current models have untapped visual reasoning capabilities that could be unlocked by letting them iterate visually on problems.
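A toy way to see the ordering argument is to compare the order in which maze cells would be emitted by a top-to-bottom, left-to-right raster scan in the two orientations. This is a sketch under hypothetical assumptions (one maze cell per image patch, the start cell at the top or top-left corner, an ideal 45° rotation); it says nothing about how GPT-4o actually tokenizes or generates images, and the helper names are illustrative.

```python
def square_order(n):
    """Cells of an n x n maze in raster order when the maze is drawn axis-aligned."""
    return [(r, c) for r in range(n) for c in range(n)]


def diamond_order(n):
    """Raster order when the same maze is rotated 45 degrees.

    Assumption: the rotation maps cell (r, c) to image row r + c and image
    column c - r, with the start cell (0, 0) at the top corner.
    """
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1], rc[1] - rc[0]))


def distance_regressions(order):
    """Count steps where the next generated cell is closer to the start (0, 0)
    than the previous one, measured in grid (Manhattan) distance."""
    dists = [r + c for r, c in order]
    return sum(1 for prev, nxt in zip(dists, dists[1:]) if nxt < prev)


if __name__ == "__main__":
    for n in (3, 5, 8):
        print(n, distance_regressions(square_order(n)), distance_regressions(diamond_order(n)))
```

In the square layout the distance from the start drops back at every row break (n - 1 times), while in the diamond layout each raster row is one anti-diagonal, so generation sweeps outward from the start as a front of equidistant cells. That matches the locality intuition above, under the stated assumptions.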
Anthropic announces Claude for Education, which provides a university-wide solution tailored to education.

AI Agents

Amazon introduces Nova Act, a research preview for agents controlling web browsers.
AI Digest has started an experiment: they launched 4 computer-use agents, and gave them the task of getting donations for a charity of their choice. The agents can chat to each other, and human visitors can also chat with them. They have begun to (slowly) work on the problem. You can view their ongoing activities here.
General Agents claims they have a general-purpose computer-use agent (Ace) that operates your local computer.
OpenAI releases a new benchmark: PaperBench: Evaluating AI’s Ability to Replicate AI Research (paper, code).
Zapier adds MCP (Model Context Protocol) support, so AI agents can now access a very broad range of web apps (Slack, Google Sheets, Notion, etc.).
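MCP messages are JSON-RPC 2.0: an agent typically discovers a server's tools with a `tools/list` request and invokes one with `tools/call`. The sketch below only builds those request payloads; transport (stdio or HTTP) and the session initialization handshake are omitted, and the tool name `slack_send_message` and its arguments are hypothetical, not Zapier's actual tool schema.

```python
import json


def make_tool_list(request_id: int) -> str:
    """Ask an MCP server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments; a real server advertises its own via tools/list.
print(make_tool_list(1))
print(make_tool_call(2, "slack_send_message", {"channel": "#general", "text": "Hello from an agent"}))
```

Because every tool is exposed through the same two request shapes, an agent that speaks MCP can use any app the server wraps without app-specific client code, which is what makes a broad integration like this useful.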

Audio

ElevenLabs:
- Adds native, low-latency RAG for conversational AI.
- Launches Actor Mode, where you can use your voice to direct the AI’s performance.
Udio introduces Styles, allowing generation from a provided audio clip.
Mureka AI enables more fine-grained music generation.

Image Synthesis

Ideogram 3.0 released.
OpenAI adds regional selection to their new in-context 4o image generator, allowing tailored updates to images.

Video

Runway ML launches a new model: Gen-4. Improvements in realism, physics, character consistency, etc. Sample short videos: The Lonely Little Flame, The Herd, The Retrieval, NYC is a Zoo, Scimmia Vede. (More examples.)
Meta unveils: MoCha: Towards Movie-Grade Talking Character Synthesis (preprint). Remarkably good human character generation from audio input.
Higgsfield is showing good camera control.
Sync introduces lipsync-2, which generates lip-synced video that preserves expressiveness.

Science

Curated list of science datasets: Awesome Materials & Chemistry Datasets.

Robot

KEENON Robotics introduces wheeled humanoid XMAN-R1.
Unitree Dex5 Dexterous Hand (20 degrees of freedom).
More video of Figure robots performing real work in a BMW factory.
