AI News 2025-06-12

General

Research Insights

LLM

  • Anthropic adds Claude Gov, a set of models intended for national security use.
  • Mistral announces Magistral, a reasoning model. Two variants: 24B open-source or a larger enterprise version via API.
    • An interesting result from the report (section 7.2, “Eating the multimodal free lunch”): the base model is multi-modal, but RL is done using text only. Yet they observe that this text-only training does not harm multi-modal performance; in fact, multi-modal performance improves. This suggests the modalities are well-entangled and that transfer learning between them occurs naturally.
  • OpenAI announced the release of o3-pro (release notes).

Vision

  • Meta announce V-JEPA 2 (paper), a vision model that builds a world model and could be useful for robotic control.

Audio

World Synthesis

Science

Cars

  • Tesla has provided some updated details on their current “full self-driving” (FSD) implementation. Some claims: 3.5B miles driven under FSD across 6 million vehicles, and 54% safer than human drivers.

Robots

Posted in AI, News | Tagged , , , , , , , , | Leave a comment

AI News 2025-06-05

General

Research Insights

LLM

  • LisanBench (github) is a new benchmark that evaluates long-term task coherence (“stamina”) through a game where the LLM must progressively alter a word (one character at a time), always yielding a valid English word, to build the longest possible chain. Although highly contrived, this does seem to test longer-range planning. The results conform to vibes about model intelligence.
  • Anthropic has launched Claude Explains, a blog of AI generated posts (with human verification). The focus (currently) appears to be teaching simple coding concepts.
  • OpenAI announces updates to ChatGPT for business.
    • Deep research can now search across defined private data repositories (Sharepoint, Google Drive, Dropbox, etc.).
    • Chat queries and data analysis requests can draw directly from connected data sources.
    • ChatGPT now supports custom connectors, based on MCP.
    • Being deployed for Teams, Enterprise, and Edu.
    • Record mode transcribes meetings, providing a summary document with pointers to the transcript/timecode.
  • Google updated Gemini 2.5 Pro.
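
The word-chain rule from LisanBench (above) can be sketched as a simple validity check. This is a minimal sketch: the toy word list and the prefix-scoring choice are assumptions, since the actual benchmark uses a full English dictionary and its exact scoring may differ.

```python
# Tiny stand-in dictionary; the real benchmark uses a full English word list.
VALID_WORDS = {"cold", "cord", "card", "ward", "warm", "word", "wood"}

def one_letter_apart(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def chain_score(chain: list[str]) -> int:
    """Length of the valid prefix of a word chain: every word must be in the
    dictionary, and each step must change exactly one character."""
    score = 0
    for i, word in enumerate(chain):
        if word not in VALID_WORDS:
            break
        if i > 0 and not one_letter_apart(chain[i - 1], word):
            break
        score += 1
    return score

print(chain_score(["cold", "cord", "card", "ward", "warm"]))  # prints 5
```

A longer valid chain yields a higher score, which is what makes the task a test of sustained, long-range planning rather than single-step recall.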

Agents

Safety

  • Yoshua Bengio launches LawZero, a non-profit dedicated to advancing safe-by-design AI.

Audio

  • Elevenlabs introduces a multi-modal assistant that can handle a mixture of voice and text input (at the same time, without requiring toggling between modes). It does seem like a productive way to interact with an AI.
  • Play AI is open-sourcing PlayDiffusion (demo), a diffusion-LLM for speech that allows for inpainting (example).
  • Bland announces an improvement in their text-to-speech model, with cloning of voice, accent, style, etc. They claim it is finally past the uncanny valley.

Image Synthesis

Video

  • AMC is integrating Runway ML genAI into its workflows (mostly for ideation, pre-vis, and promotional materials).
  • Luma introduces Modify Video, allowing style transfer or video-generation conditioned on an input video.

Science


AI News 2025-05-29

General

  • Essay by Pete Koomen: AI Horseless Carriages (video version: Why AI Apps Still Feel Broken with Pete Koomen). It makes the case that our current approach of adding AI to existing applications is akin to early horseless carriages (which added engines to existing carriage designs, instead of being designed from scratch to optimally take advantage of an engine). Future AI-first applications will need to rethink the user experience in light of AI capabilities.

Research Insights

LLM

Agents

  • OpenAI updates Operator to use the o3 model.
  • Manus introduce a system that will build a slide deck on demand.

Safety & Interpretability

Audio

  • Kyutai demos Unmute, a text-to-speech and speech-to-text capability. Will be open-sourced.
  • Anthropic announce that they will begin rolling out voice conversation mode.
  • Chatterbox TTS is a high-quality open source speech synthesis model (try).

Image Synthesis

Video

  • Viggle Live enables real-time avatar control.
  • Workflow: Use Google Street View imagery combined with image synthesis (e.g. Runway References) and then video generation (e.g. Runway Gen3) to generate a sequence of “on location” clips.
  • Google DeepMind report SignGemma, a forthcoming open model for converting sign language video into text.

World Synthesis

Science

  • OpenAI adds to ChatGPT scaffolding the ability to visualize molecules (RDKit library).

Robots


AI News 2025-05-22

General

Research Insights

LLM

Agents

  • OpenAI announces: A research preview of Codex in ChatGPT. Whereas Codex-CLI runs locally, this new system runs on OpenAI’s servers. Uses Codex-1 (based on o3, optimized for coding), and can be used for things like: understanding a repo, fixing bugs in a repo, etc.
  • Google adds an Agent Mode to Gemini, allowing you to delegate tasks for it to work on.
  • Google release Jules, an asynchronous coding agent.
  • Google published a video demo of their Project Astra research prototype, an AI assistant operating from your smartphone.

Image Synthesis

Video

Audio

  • Google announce improvements to their Lyria 2 music generator.

Science

Hardware

Robots

  • Video of LimX Dynamics TRON 1, performing various real-world relevant tasks.
  • Video of Tesla Optimus performing real-world tasks autonomously. Reportedly, all tasks are accomplished using a single neural network trained on human POV data.

AI News 2025-05-15

General

Research Insights

LLM

  • OpenAI add o4-mini to their reinforcement fine-tuning API.
  • ByteDance releases SeedCoder 8B.
  • OpenAI adds GPT-4.1 to the ChatGPT web product.
  • OpenAI release HealthBench. In addition to providing a useful way to track progress on LLMs for healthcare applications, the current results demonstrate just how effective existing LLMs can be in this application space.

Agents

Safety

Audio

Video

World Synthesis

  • Enigma Labs claims they have made the first multiplayer AI-generative video game (a multiplayer car racing game). They say they will open-source the work eventually. Although the gameplay video shows crude graphics, it is further evidence that generative environments are a key part of future entertainment.

Science

Hardware

Robots

  • Tesla shows a video of Optimus robot dancing. Fluid motion like this tests the limit of hardware and software (latency, real-time compensation, etc.).

AI News 2025-05-08

General

Research Insights

LLM

Audio

Video

Brain

Robots


AI News 2025-05-01

General

Research Insights

LLM

Safety

Audio

Image Synthesis

  • Freepik and Fal announce F-Lite (tech report), an open-source image model (10B, trained on 80M images).
  • Midjourney has pushed updates to the v7 model (improving quality and coherence), added an experimental aesthetic intensity parameter, and launched a new omni-reference feature (example outputs).

Video

  • Runway roll out their references feature to all paying users, which allows one to include specific characters/environments/elements in generations.

Science

Robots


AI News 2025-04-24

General

Research Insights

LLM

AI Agents

Audio

  • Nari Labs Dia is a text-to-speech (TTS) model that can generate remarkably realistic and emotional output (example).

Video

Hardware

  • Google demos next-generation smart glasses with AI integration (TED talk).

AI Impact Predictions

Debates about future AI progress or impact are often confused, because different people have very different mental models for the expected pace of progress, and for the time horizon over which they are projecting.

This figure is my attempt to clarify:

The experimental datapoints come from the METR analysis: Measuring AI Ability to Complete Long Tasks (paper, code/data). The “count the OOMs” and “new regime” curves are extrapolated fits to the data. The other curves are ad-hoc, drawn just to give a sense of how a particular mental model might translate to capability predictions.
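
A “count the OOMs” extrapolation is, at its core, just a log-linear fit: assume achievable task length grows exponentially with time, fit a line to log(task length) vs. year, and read off the doubling time. The sketch below uses made-up datapoints (not METR’s actual measurements) purely to show the arithmetic.

```python
import math

# Hypothetical (year, achievable task length in minutes) datapoints,
# loosely in the spirit of the METR analysis -- NOT their actual data.
points = [(2020.0, 0.1), (2022.0, 1.0), (2024.0, 10.0)]

# Least-squares fit of log2(task length) vs. year ("count the OOMs").
n = len(points)
xs = [x for x, _ in points]
ys = [math.log2(y) for _, y in points]
xbar = sum(xs) / n
ybar = sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)

# Slope is doublings per year, so its inverse is the doubling time.
doubling_time_years = 1.0 / slope
print(f"doubling time: {doubling_time_years:.2f} years")
```

Different mental models in the figure amount to different assumptions about whether this slope stays constant, accelerates (“new regime”), or flattens out.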

The figure tries to emphasize:

  • Task complexity covers many orders-of-magnitude. Although imperfect, we can think about the timescale over which “coherent progress” must be made as a proxy for measuring generally useful capabilities.
  • There are many models for progress, and they vary dramatically in predictions.
  • Nevertheless, except for scenarios that fundamentally doubt AI progress is possible, the main disagreement among models is over the timescale required to reach a given kind of impact.
  • The concerns one has (economic, social, existential) will depend on one’s model. (Of course one’s concerns will also be influenced by other assessments, such as the wisdom we expect leaders to exhibit at different stages of rollout.)
  • It is difficult to define intelligence. Yet, it seems quite defensible to say that we have transitioned from clearly sub-human AI, into a “jagged intelligence” regime where a particular AI system will out-perform humans in some tasks (e.g. rapid knowledge retrieval) but under-perform in other tasks (e.g. visual reasoning). As we move through the jagged frontier, we should expect more and more human capabilities to be replicated in AI, even while some other subset remains unconquered.
  • The definition of “AGI” is also unclear. Instead of a clear line being crossed, we should expect a greater fraction of people to acknowledge AI as generally-capable, as systems cross through the jagged frontier.

The primary goal of the figure is to clarify discussions: we should specify which kinds of scenarios we find plausible, which impacts are thus considered possible, and which time-span we are currently discussing.


AI News 2025-04-17

General

Research Insights

LLM

  • Zyphra releases an open-source reasoning model: ZR1-1.5B (weights, try using).
  • Anthropic adds to Claude a Research capability, and Google Workspace integration.
  • OpenAI announces GPT-4.1 models in the API. Optimized for developers (instruction following, coding, diff generation, etc.), 1M context length, etc.; three models (4.1, 4.1-mini, 4.1-nano) provide control of performance vs. cost. Models can handle text, image, and video.
    • They also have a prompting guide for 4.1.
    • OpenAI have released a new eval for long-context: MRCR.
    • OpenAI intends to deprecate GPT-4.5 in the next few months.
  • OpenAI announces o3 and o4-mini reasoning models.
    • These models are explicitly trained to use tools as part of their reasoning process.
    • They can reason over images in new ways.
    • Improved scores on math and code benchmarks (91-98% on AIME, ~75% on scientific figure reasoning, etc.).
    • o3 is strictly better than o1 (higher performance with lower inference cost); o1 will be deprecated.
    • OpenAI will be releasing coding agent applications; starting with Codex CLI, which allows one to deploy coding agents easily.
    • METR has provided evaluations of capabilities.
    • As part of the release, they also provided data showing how scaling RL is yielding predictable improvements.

Safety

Video

Audio

Science
