AI News 2025-09-18

General

Research Insights

LLM

  • OpenAI releases an update to their coding system: GPT-5 Codex.
  • Frontier LLMs were tested in the ICPC 2025 programming competition.
    • OpenAI achieved gold, getting 12/12 correct (the best human team achieved 11/12). GPT-5 got 11/12 correct, and their experimental reasoning system solved the last (most challenging) question.
    • Google Gemini 2.5 Deep Think achieved gold (10/12 correct).

Safety

Agents

Image Synthesis

World Synthesis

Cars

  • Waymo released some safety data (based on 96M miles driven). The results are biased somewhat by the subset of regions/conditions in which Waymo is allowed to drive (they claim that they account for that in their analysis). Nevertheless, the results are impressive, showing fewer crashes/injuries for Waymo driving compared to human drivers.

Robots

  • Video, from the Active Intelligent Systems (ACT) Lab at the Southern University of Science and Technology (SUSTech), shows a Unitree robot responding very nimbly to extreme perturbations. (Also, dancing.)
Posted in AI, News | Tagged , , , , , , | Leave a comment

AI News 2025-09-11

General

Research Insights

LLM

Image Synthesis

Video

World Synthesis


AI News 2025-09-04

General

Research Insights

LLM

  • Apertus is a fully open LLM (weights), trained on open datasets and released open-source. Made in Switzerland.

Audio

Science

Robots

  • Impressive video demonstrating the ability of a humanoid robot to play table tennis.

AI News 2025-08-28

General

Research Insights

Image Synthesis

Audio

Video

World Synthesis

Science

Robots


AI News 2025-08-21

Research Insights

LLM

Audio

  • ElevenLabs releases video-to-music; it can generate a soundtrack matched to the provided video.

Vision

Image Synthesis

  • A stealth/mystery model is being tested: nano-banana (speculation is that it is from Google). Early examples show it has a startling ability to edit images based on natural language requests.

Video

  • Higgsfield product-to-video demonstrates the ability to add objects into existing footage. This shows the increasingly powerful modality of genAI video editing.
  • Runway updates Act-Two to include changing the voice performance alongside video generation.

World Synthesis

  • Runway ML announces Game Worlds. Turn-based text-adventure games with generated narrative and images.

Science

Hardware

  • Luci Pin aims to deliver an AI device that can see/hear your context (but not fail as the Humane AI Pin did).
  • Tai Necklace aims to deliver an AI device that looks like jewelry instead of a device.

AI News 2025-08-14

General

Research Insights

LLM

Image Synthesis

Video

  • Pika have developed a video generation model that uses an audio input for performance control.
  • SkyReels A3 is an audio-conditioned video generation model (examples).

World Synthesis

Science

Robots


AI News 2025-08-07

Research Insights

  • New Anthropic research: Persona vectors: Monitoring and controlling character traits in language models. Many interesting results, including inducing a particular behavior by adjusting activations along a particular direction. Used at inference time, this can induce or inhibit a behavior, but at a cost in capability (as previously known). E.g. one can steer away from the “evil” direction, but doing so worsens the model’s task performance. Interestingly, one can instead steer the model during training to prevent certain behaviors from ever being learned. Counter-intuitively, one steers towards the undesired behavior (e.g. in the evil direction) during training. This acts as a sort of inoculation: because the steering already supplies the over-emphasized behavior, the model doesn’t need to encode it in its learned weights, and at runtime (when the steering bias is no longer present) it snaps back to the desired behavior (e.g. towards the good direction).
  • Forethought: How quick and big would a software intelligence explosion be?
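
The inference-time steering mentioned in the persona-vectors item above can be illustrated with a toy sketch. This is not Anthropic's implementation: it assumes a trait direction is already given as a plain vector, and the names `unit`, `projection`, `steer`, and `alpha` are hypothetical, chosen only for illustration. It shows a hidden activation being shifted along a direction, and how the shift changes the activation's projection onto that direction.

```python
import math
from typing import List


def unit(direction: List[float]) -> List[float]:
    """Return the direction scaled to unit length."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [d / norm for d in direction]


def projection(hidden: List[float], direction: List[float]) -> float:
    """Scalar component of `hidden` along `direction` (a simple trait monitor)."""
    return sum(h * u for h, u in zip(hidden, unit(direction)))


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Shift `hidden` by `alpha` units along `direction`.

    Positive alpha pushes towards the trait, negative alpha pushes away.
    """
    return [h + alpha * u for h, u in zip(hidden, unit(direction))]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0]   # toy activation (assumed values)
    trait = [0.0, 2.0, 0.0]    # toy "trait" direction (assumed values)
    print("projection before:", projection(hidden, trait))
    steered = steer(hidden, trait, alpha=-1.5)
    print("steered activation:", steered)
    print("projection after:", projection(steered, trait))
```

In a real model the same shift would be applied to a layer's activations on every forward pass. The training-time variant described above would apply it while fine-tuning and remove it afterwards.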

LLM

  • Qwen adds: Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507.
  • Anthropic release Claude Opus 4.1, improving reasoning and coding performance.
  • OpenAI announces the release of two open-weight reasoning models: gpt-oss-120b (for servers or high-end desktops) and gpt-oss-20b (for desktop/laptop). Local reasoning model (full access to chain-of-thought) that should be good for agentics (hf, github, test). Supposedly similar in capability to o4-mini.
  • OpenAI announces GPT-5.
    • Reasoning model that selects the right amount of compute. Multiple models behind the scenes: GPT-5 (default), GPT-5-mini, GPT-5-nano, GPT-5 Pro (for Pro tier only). Available in API.
    • It’s better. Strong performance across many metrics: 75% on SWE-bench, 84% MMMU, 100% AIME 2025. Better writing, better coding. Improved voice. Can see via video input.
    • Can now select among different “personalities”.
    • Trained (in part) by using o3 to generate teaching datasets.

Audio

Video

World Synthesis

Science

Robots


AI News 2025-07-31

General

Research Insights

LLM

  • Google NotebookLM rolls out a new user interface, and adds video overviews.

Safety

Audio

  • Suno launches Suno Radio. Constant stream of community music.

Video

World Synthesis

Science

Hardware

Robots


AI News 2025-07-24

General

Research Insights

LLM

Vision

Video

World Synthesis

Science

Robots

  • Video of Ubtech Walker S2. The robot can swap its own battery, allowing for high utilization.

AI News 2025-07-17

General

Research Insights

  • Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
    • It is worth remembering that there will be tradeoffs between performance and monitorability. Various “thinking in latent space” approaches have demonstrated improved performance, but that obfuscates AI thinking (requiring imperfect mechanistic interpretability to then attempt to recover internal processes). We should be willing to give up some immediate performance gains in order to increase our ability to align (which will yield more long-term gains).

LLM

Agents

  • OpenAI launches ChatGPT agent (video). It uses a combination of text-browsing agentics (like Deep Research) and visual-browsing (like Operator) to handle open-ended asynchronous tasks. It can access connectors (Google Drive, etc.) and tools (image generation), and can generate (e.g.) slide decks. Achieves 42% on Humanity’s Last Exam and 27% on FrontierMath.

Audio

Video

  • Runway is starting to deploy Act-Two, an improved motion capture model that can transfer a video performance to an AI avatar based on a single input image.

Cars
