General
- OpenAI’s data scraping wins big as Raw Story’s copyright lawsuit dismissed by NY court. The crux is that the plaintiffs could not demonstrate a concrete, actual harm from OpenAI’s actions.
- An article on Reuters: OpenAI and others seek new path to smarter AI as current methods hit limitations. It repeats the assertions (disputed by many experts in the community) that next-generation models (under development) are under-performing, and that AI labs are hitting data walls. It also emphasizes that the path forward involves more “inference-time compute” to unlock reasoning.
- Interestingly, the article includes a quote from Ilya Sutskever, who has been largely quiet in public since his departure from OpenAI and the founding of SSI.
- The AI Semiconductor Landscape.
- Lex Fridman interviews Anthropic: Dario Amodei (CEO), Amanda Askell (develops Claude’s personality), Chris Olah (works on mechanistic interpretability).
Research Insights
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning (code). They implement temporary updates to weights at inference-time, using a loss and gradients in the usual (training) manner. They show strong performance on ARC tasks.
- Mansi Sakarvadia’s thesis: Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning. Develops a system to allow the user to inject prompt-specific information into inference, which can improve multi-step reasoning. Also describes Attention Lens, to convert attention heads into interpretable tokens.
- Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (code).
LLM
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (weights, preprint).
- Release of: Qwen2.5-Coder Series: Powerful, Diverse, Practical. Currently at the top of the coding leaderboard.
AI Agents
- Microsoft introduces: Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks.
- Microsoft releases an experimental library: TinyTroupe 🤠🤓🥸🧐: LLM-powered multiagent persona simulation for imagination enhancement and business insights.
- Nous Research announces: Introducing the Forge Reasoning API Beta and Nous Chat: An Evolution in LLM Inference. They claim this provides an easy way to take an existing model and run it in a reasoning mode (using inference-time compute).
- Mina Fahmi produced an image listing the ways that humans and AI could work together.
Video
- AutoVFX: Physically Realistic Video Editing from Natural Language Instructions (preprint, code, examples).
- Pollo AI has released a video generator. The outputs are good, though they do not quite challenge the state of the art.
- Current quality of video generations:
  - Plants dancing.
  - Insect on tree.
  - Trailers for The Silmarillion and The Fall of Gondolin (by Abandoned Films).
  - Moody sci-fi.
  - Migration (made by combining Runway ML Gen3-Alpha and traditional animation).
  - After the Winter (music made using Suno v4).
  - Horror: Ridge to Southwest.
  - The Gardener (by Machine Mythos).
World Synthesis
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model. Just two images of a scene are enough to reconstruct a 3D model.
Science
- AI protein-prediction tool AlphaFold3 is now open source (code).
- Robot that watched surgery videos performs with skill of human doctor.
Robots
- New Deep Robotics video shows very good terrain navigation from a quadruped-with-wheels design.