AI News 2025-08-07

Research Insights

  • New Anthropic research: Persona vectors: Monitoring and controlling character traits in language models. Among many interesting results, they show that a particular behavior can be induced or inhibited by shifting activations along a particular direction. Used at inference time, this works but at a cost in capability (as previously known); e.g. steering away from the “evil” direction worsens model task performance. More interestingly, one can steer the model during training to prevent certain behaviors from ever being learned. Counter-intuitively, one steers towards the undesired behavior (e.g. in the evil direction) during training. This acts as a sort of inoculation: the steering already supplies the over-emphasized behavior, so the model doesn’t need to add it to its learned weights; at runtime (when the bias is no longer present) it snaps back to the desired behavior (e.g. towards the good direction).
  • Forethought: How quick and big would a software intelligence explosion be?
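The activation steering described in the persona-vectors item can be illustrated with a toy sketch. This is not Anthropic's code: the vectors, the `steer` and `projection` helpers, and the strength `alpha` are all hypothetical, and a real setup would apply the shift to a transformer's hidden states rather than to a four-element list.

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def projection(hidden, direction):
    """How strongly the activation expresses the trait direction."""
    return sum(h * d for h, d in zip(hidden, unit(direction)))


def steer(hidden, direction, alpha):
    """Shift an activation by alpha along the unit trait direction."""
    return [h + alpha * d for h, d in zip(hidden, unit(direction))]


# Hypothetical values: one activation vector and one extracted "trait" direction.
hidden = [0.5, -1.0, 2.0, 0.0]
persona = [3.0, 0.0, 4.0, 0.0]

# Inference-time use: steer away from the trait (negative alpha).
suppressed = steer(hidden, persona, alpha=-2.0)

# Training-time "inoculation": steer towards the trait while fine-tuning,
# then drop the shift at runtime.
induced = steer(hidden, persona, alpha=2.0)

print(projection(hidden, persona), projection(suppressed, persona), projection(induced, persona))
```

Because the direction is normalized, the projection onto it changes by exactly `alpha`, which makes the steering strength easy to reason about.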

LLM

  • Qwen adds: Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507.
  • Anthropic release Claude Opus 4.1, improving reasoning and coding performance.
  • OpenAI announces the release of two open-weight reasoning models: gpt-oss-120b (for servers or high-end desktops) and gpt-oss-20b (for desktop/laptop). Local reasoning model (full access to chain-of-thought) that should be good for agentics (hf, github, test). Supposedly similar in capability to o4-mini.
  • OpenAI announces GPT-5.
    • Reasoning model that selects the right amount of compute. Multiple models behind the scenes: GPT-5 (default), GPT-5-mini, GPT-5-nano, GPT-5-Pro (for Pro tier only). Available in API.
    • It’s better. Strong performance across many metrics: 75% on SWE-bench, 84% MMMU, 100% AIME 2025. Better writing, better coding. Improved voice. Can see via video input.
    • Can now select among different “personalities”.
    • Trained (in part) by using o3 to generate teaching datasets.

Audio

Video

World Synthesis

Science

Robots

Posted in AI, News | Tagged , , , , , , | Leave a comment

AI News 2025-07-31

General

Research Insights

LLM

  • Google NotebookLM rolls out a new user interface, and adds video overviews.

Safety

Audio

  • Suno launches Suno Radio. Constant stream of community music.

Video

World Synthesis

Science

Hardware

Robots

AI News 2025-07-24

General

Research Insights

LLM

Vision

Video

World Synthesis

Science

Robots

  • Video of Ubtech Walker S2. The robot can swap its own battery, allowing for high utilization.

AI News 2025-07-17

General

Research Insights

  • Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
    • It is worth remembering that there will be tradeoffs between performance and monitorability. Various “thinking in latent space” approaches have demonstrated improved performance, but that obfuscates AI thinking (requiring imperfect mechanistic interpretability to then attempt to recover internal processes). We should be willing to give up some immediate performance gains in order to increase our ability to align (which will yield more long-term gains).

LLM

Agents

  • OpenAI launches Agents (video). It uses a combination of text-browsing agentics (like Deep Research) and visual-browsing (like Operator) to handle open-ended asynchronous tasks. It can access connectors (Google Drive, etc.), tools (image generation), and can generate (e.g.) slide decks. Achieves 42% on Humanity’s Last Exam and 27% on FrontierMath.

Audio

Video

  • Runway is starting to deploy Act-Two, an improved motion capture model that can transfer a video performance to an AI avatar based on a single input image.

Cars

AI News 2025-07-10

General

Research Insights

LLM

  • gremllm is clever and/or diabolical. It is a Python library that generates on-the-fly the attributes and methods of a Python object. Thus, one need not actually define the methods for a new class; simply allow the LLM to hallucinate them when they are called for.
    • Although this sounds silly and dangerous, there are viable use-cases. In March 2023 (site and code no longer online), there was some exploration of “imaginary programming” wherein one would define a function’s requirements but never actually code the function (the LLM would instead stand in for the function at call time).
  • xAI release Grok 4 (and Grok 4 Heavy). Benchmarks are strong, taking the lead on several, including 100% on AIME, 44% on Humanity’s Last Exam, and 16% on ARC-AGI-2 (cf. 9% for Claude Opus 4). If real-world utility matches benchmarks, then Grok 4 may take the lead as the best model.
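The on-the-fly method generation described in the gremllm item can be sketched with Python's `__getattr__` hook. This is not gremllm's actual API: the `Imagined` class and the `stub_llm` function are hypothetical, and the stub returns canned source code where a real implementation would call a model.

```python
class Imagined:
    """Object whose missing methods are generated the first time they are used."""

    def __init__(self, role, generate_source):
        self._role = role
        # Callable taking (role, method_name) and returning Python source text.
        self._generate_source = generate_source

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        source = self._generate_source(self._role, name)
        namespace = {}
        exec(source, namespace)  # running generated code is the dangerous part
        method = namespace[name].__get__(self, type(self))
        setattr(self, name, method)  # cache, so each method is generated once
        return method


def stub_llm(role, name):
    """Hypothetical stand-in for an LLM call; returns canned source."""
    canned = {
        "increment": (
            "def increment(self):\n"
            "    self.__dict__['value'] = self.__dict__.get('value', 0) + 1\n"
            "    return self.__dict__['value']\n"
        ),
    }
    if name not in canned:
        raise AttributeError(name)
    return canned[name]


counter = Imagined("counter", stub_llm)
first = counter.increment()
second = counter.increment()
print(first, second)
```

Caching the generated method on the instance means the model is consulted once per method name, so repeated calls behave consistently within a session.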

Safety

World Synthesis

  • Odyssey is again teasing their “interactive video” system (precursor to generative playable games).

Science

Robots

AI News 2025-07-03

General

Research Insights

LLM

Agents

Safety

Image Synthesis

Video

World Synthesis

Science

Cars

  • Tesla shows off a car self-driving (no occupants) from factory to customer’s address. No doubt the route was carefully selected and vetted. Nevertheless, it is impressive.
  • Tesla launched a limited rollout of their full self-driving Robotaxi (with in-vehicle employee monitor, for now), in Texas.

Robots

  • K-Scale announces that you can now order one of their open-source humanoid robots ($9k early adopter price; $16k nominal price).

AI News 2025-06-26

General

Research Insights

  • Dense SAE Latents Are Features, Not Bugs. They find pairs of opposing features that fire very frequently. Far from being useless, they find these encode meaningful concepts.
  • Sakana AI: Reinforcement Learning Teachers of Test Time Scaling (preprint). Rather than using RL to improve solutions, they focus on improving the model’s ability to teach other models. The RL reward is based on how useful the generated training examples are (to smaller learning models), rather than on the correctness of the teacher’s final answer.

LLM

Agents

Audio

  • ElevenLabs introduces 11ai, a voice conversational assistant; exploits MCP to enable connection to resources (calendar, etc.).
  • ElevenLabs introduces Voice Design v3, an improvement to their text-to-voice system for designing a voice.

Image Synthesis

Video

World Synthesis

Science

  • Google DeepMind releases AlphaGenome (including API capabilities); it takes base-pair sequences as input, and predicts genomic behavior outputs.

Robots

AI News 2025-06-19

General

Research Insights

LLM

Agents

Safety

Video

Science

Hardware

  • AMD unveiled its new MI350 chip, optimized for AI workloads. They are focusing on open/compliant coding standards, and energy/cost efficiency.

Cars

Robots

AI News 2025-06-12

General

Research Insights

LLM

  • Anthropic adds Claude Gov; models intended for national security.
  • Mistral announces Magistral, a reasoning model. Two variants: 24B open-source or a larger enterprise version via API.
    • An interesting result from the report (section 7.2: Eating the multimodal free lunch): The base model is multi-modal, but RL is done using text only. Yet, they observe this text-only training does not harm multi-modal performance; in fact multi-modal performance improves. This suggests modalities are well-entangled and that transfer learning between modalities is naturally occurring.
  • OpenAI announced the release of o3-pro (release notes).

Vision

  • Meta announce V-JEPA 2 (paper), a vision model that builds a world model and could be useful for robotic control.

Audio

World Synthesis

Science

Cars

  • Tesla has provided some updated details on their current “full self-driving” (FSD) implementation. Some claims: 3.5B miles driven by FSD across 6 million vehicles, 54% safer than human.

Robots

AI News 2025-06-05

General

Research Insights

LLM

  • LisanBench (github) is a new benchmark that evaluates long-term task coherence (“stamina”) through a game where the LLM must progressively alter a word (one character at a time), always yielding a valid English word, to build the longest possible chain. Although highly contrived, this does seem to test longer-range planning. The results conform to vibes about model intelligence.
  • Anthropic has launched Claude Explains, a blog of AI generated posts (with human verification). The focus (currently) appears to be teaching simple coding concepts.
  • OpenAI announces updates to ChatGPT for business.
    • Deep research can now search across defined private data repositories (SharePoint, Google Drive, Dropbox, etc.).
    • Chat queries and data analysis requests can draw directly from connected data sources.
    • ChatGPT now supports custom connectors, based on MCP.
    • Being deployed for Teams, Enterprise, and Edu.
    • Record mode transcribes meetings, providing a summary document with pointers to the transcript/timecode.
  • Google updated Gemini 2.5 Pro.
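The word-chain game described in the LisanBench item can be scored with a short checker. This is a sketch under stated assumptions, not the benchmark's code: it treats a step as substituting exactly one character, disallows repeated words, and uses a tiny hypothetical vocabulary; the actual rules and word list may differ.

```python
def differs_by_one(a, b):
    """True if the words have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def valid_chain_length(chain, vocabulary):
    """Count how many leading words of the chain form a valid sequence."""
    seen = set()
    previous = None
    for count, word in enumerate(chain):
        # Assumed rules: every word must be in the vocabulary and unused so far.
        if word not in vocabulary or word in seen:
            return count
        if previous is not None and not differs_by_one(previous, word):
            return count
        seen.add(word)
        previous = word
    return len(chain)


# Hypothetical miniature vocabulary for illustration.
vocabulary = {"cold", "cord", "card", "ward", "warm", "word"}

full_score = valid_chain_length(["cold", "cord", "card", "ward", "warm"], vocabulary)
jump_score = valid_chain_length(["cold", "cord", "warm"], vocabulary)
repeat_score = valid_chain_length(["cold", "cord", "cold"], vocabulary)
print(full_score, jump_score, repeat_score)
```

Scoring by the length of the valid prefix rewards a model for sustaining the chain, which matches the “stamina” framing of the benchmark.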

Agents

Safety

  • Yoshua Bengio launches LawZero, a non-profit dedicated to advancing safe-by-design AI.

Audio

  • ElevenLabs introduces a multi-modal assistant that can handle a mixture of voice and text input (at the same time; not requiring toggling between modes). It does seem like a productive way to interact with an AI.
  • Play AI is open-sourcing PlayDiffusion (demo), a diffusion-LLM for speech, which allows for inpainting (example).
  • Bland announces an improvement in their text-to-speech model, with cloning of voice, accent, style, etc. They claim it is finally past the uncanny valley.

Image Synthesis

Video

  • AMC is integrating Runway ML genAI into its workflows (mostly for ideation, pre-vis, and promotional materials).
  • Luma introduces Modify Video, allowing style transfer or video-generation conditioned on an input video.

Science
