General
- In response to the White House call: Public Comment Invited on Artificial Intelligence Action Plan.
Research Insights
LLM
- Sudowrite Muse is an LLM designed specifically for creative writing, generating text that is more evocative than typical chatbot (“helpful assistant”) output.
- Relatedly, Sam Altman posted some text from an LLM trained to be good at creative writing. The output is indeed more evocative than usual ChatGPT writing.
- RWKV: Reinventing RNNs for the Transformer Era. They demonstrate a reasoning model that achieves meaningful performance with just 0.1B parameters: RWKV7-G1 “Goose One” (github, weights).
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning.
- Google releases updates to its open-source models: Gemma 3 (technical report). They are small, efficient models that exceed the prior Pareto front (e.g. an LMArena Elo of 1338 with just 27B parameters). Multimodal (text, image, video), 128k context window. Available as 1B, 4B, 12B, 27B.
- Cohere introduces Command A (weights), a 111B multilingual model (256k context) that reportedly has good performance/price.
AI Agents
- ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks.
- Fleur is an (open-source) unofficial “app store” for Claude (description). It seems to act as a user-friendly wrapper around MCP servers.
- Last week, Manus AI claimed a general computer use agent (video). Reactions are mixed.
- An open-source equivalent has been created: OWL (Optimized Workforce Learning).
- OpenAI releases the Responses API and a developer SDK for agents (a modernization of Swarm). The new tools enable easy handoff between agents, arbitrary computer use, and more.
- Here is an online demo of using these methods to control a web browser in a virtual machine.
Safety
- OpenAI blog post: Detecting misbehavior in frontier reasoning models. They study how the natural-language chain-of-thought (CoT) operates in reasoning models. They find that aggressive optimization of reasoning, especially optimizing the CoT so that it does not exhibit misaligned text, leads to model behaviors where undesired thoughts are not expressed in the CoT (but are nevertheless activated). Conversely, under-optimized CoT remains human-legible, providing an opportunity to detect and modify undesired behavior. They advocate strongly avoiding over-optimization of the CoT, thereby keeping it legible, and note that this may require hiding the CoT from the end-user (e.g. so the model can freely consider dangerous topics in the CoT, while ultimately not expressing these in the response to the user).
- Dan Hendrycks, Eric Schmidt and Alexandr Wang released: Superintelligence Strategy, a detailed essay about ASI risks, with concrete mitigation suggestions, including Mutual Assured AI Malfunction (MAIM).
- Time writeup: The Nuclear-Level Risk of Superintelligent AI.
- Zvi Mowshowitz analysis: On MAIM and Superintelligence Strategy.
Audio
- ElevenLabs adds speed control for text-to-speech; speed can be adjusted down to the word level to shape a performance.
- Tavus is demoing AI avatars (audio and video) that are fairly lifelike and responsive.
- Nvidia releases Audio Flamingo 2 (paper, code), an audio-language model with long-context support and understanding of non-speech audio.
- Sesame has now released the weights for their remarkable conversational audio model (demo, example): use, code, weights.
Image Synthesis
- NEX AI announces image model Ikon-2 (examples). Optimized for realism (for marketing, brands, etc.).
- Luma Labs claims a breakthrough in image generation: Breaking the Algorithmic Ceiling in Pre-Training with Inductive Moment Matching (paper, code). They introduce Inductive Moment Matching (IMM) as an alternative to the dominant autoregressive (for discrete signals) and diffusion (for continuous signals) paradigms. They report that it is much more compute-efficient.
- Nvidia presents: SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation. Remarkably, this model can generate images in 0.1s on an H100 GPU.
Video
- Hedra releases Character 3, an improved video avatar model that can lip-sync to provided audio.
- Captions AI’s Mirage model also achieves more emotive lip-sync than older methods.
Science
- In Jan 2024, Google described Articulate Medical Intelligence Explorer (AMIE). They now report advancements (paper), going from diagnosis towards long-term treatment.
- Medical Hallucination in Foundation Models and Their Impact on Healthcare.
- Sakana’s AI scientist (v2) has written a paper that was accepted as a peer-reviewed publication. The experiment was conducted with the knowledge of the conference; reviewers did not know which papers were human-written and which were AI-generated.
Robots
- Google DeepMind announces: Gemini Robotics brings AI into the physical world. Videos show examples of deploying these models to the Apollo robot developed by Apptronik.