AI Impact Predictions

Debates about future AI progress and impact are often confused, because different people have very different mental models of the expected pace of progress, and of the time horizon over which they are projecting.

This figure is my attempt to clarify:

The experimental datapoints come from the METR analysis Measuring AI Ability to Complete Long Tasks (paper, code/data). The “count the OOMs” and “new regime” curves are extrapolated fits to the data. The other curves are ad hoc, drawn only to give a sense of how a particular mental model might translate into capability predictions.
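
To make the “count the OOMs” style of extrapolation concrete, here is a minimal sketch. It assumes the METR-reported task-horizon doubling time of roughly 7 months; the starting horizon is an illustrative assumption, not a fitted value from the paper:

```python
from math import log2

# Assumptions (illustrative, not the paper's fitted parameters):
doubling_time_months = 7      # METR reports the ~50%-success task horizon
                              # doubling roughly every 7 months
current_horizon_minutes = 60  # assume frontier models handle ~1-hour tasks

def months_until_horizon(target_minutes: float) -> float:
    """Months until the task horizon reaches target_minutes,
    assuming the exponential trend simply continues."""
    doublings = log2(target_minutes / current_horizon_minutes)
    return doublings * doubling_time_months

# Example: time until week-long tasks (~40 work-hours)?
print(f"{months_until_horizon(40 * 60):.0f} months")  # ~37 under these assumptions
```

The different curves in the figure amount to different choices of functional form for this extrapolation (exponential, super-exponential, saturating), which is exactly where the mental models diverge.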

The figure tries to emphasize:

  • Task complexity covers many orders of magnitude. Although imperfect, the timescale over which an AI must sustain “coherent progress” serves as a proxy for generally useful capability.
  • There are many models for progress, and their predictions vary dramatically.
  • Nevertheless, except for scenarios that fundamentally doubt AI progress is possible, the main disagreement among models is over the timescale required to reach a given kind of impact.
  • The concerns one has (economic, social, existential) will depend on one’s model. (Of course one’s concerns will also be influenced by other assessments, such as the wisdom we expect leaders to exhibit at different stages of rollout.)
  • It is difficult to define intelligence. Yet it seems quite defensible to say that we have transitioned from clearly sub-human AI into a “jagged intelligence” regime, where a particular AI system will outperform humans on some tasks (e.g. rapid knowledge retrieval) but underperform on others (e.g. visual reasoning). As we move through the jagged frontier, we should expect more and more human capabilities to be replicated in AI, even while some other subset remains unconquered.
  • The definition of “AGI” is also unclear. Rather than a clear line being crossed, we should expect a growing fraction of people to acknowledge AI as generally capable, as systems cross through the jagged frontier.

The primary goal of the figure is to clarify discussions: we should specify which kinds of scenarios we find plausible, which impacts are thus considered possible, and which time-span we are currently discussing.


AI News 2025-04-17

General

Research Insights

LLM

  • Zyphra releases an open-source reasoning model: ZR1-1.5B (weights, try it).
  • Anthropic adds to Claude a Research capability, and Google Workspace integration.
  • OpenAI announces GPT-4.1 models in the API. They are optimized for developers (instruction following, coding, diff generation, etc.) and have a 1M-token context length; three models (4.1, 4.1-mini, 4.1-nano) provide control of the performance-vs-cost tradeoff. The models can handle text, image, and video.
    • They also have a prompting guide for 4.1.
    • OpenAI have released a new eval for long-context: MRCR.
    • OpenAI intends to deprecate GPT-4.5 in the next few months.
  • OpenAI announces o3 and o4-mini reasoning models.
    • These models are explicitly trained to use tools as part of their reasoning process.
    • They can reason over images in new ways.
    • Improved scores on math and code benchmarks (91-98% on AIME, ~75% on scientific figure reasoning, etc.).
    • o3 is strictly better than o1 (higher performance with lower inference cost); o1 will be deprecated.
    • OpenAI will be releasing coding-agent applications, starting with Codex CLI, which makes it easy to deploy coding agents.
    • METR has provided evaluations of capabilities.
    • As part of the release, they also provided data showing how scaling RL is yielding predictable improvements.

Safety

Video

Audio

Science


AI News 2025-04-10

General

Research Insights

LLM

  • More progress in diffusion language models: Dream 7B (Introducing Dream 7B, the most powerful open diffusion large language model to date).
  • Meta releases the Llama 4 series of MoE LLMs: Scout (109B total, 17B active, 16 experts), Maverick (400B, 17B active, 128 experts), and Behemoth (2T, 288B active, 16 experts), with Scout offering a 10M-token context. The models appear to be competitive (nearing the state-of-the-art tradeoff curve for performance/price), and thus extremely impressive for open-source. (A minimal sketch of top-k MoE routing follows this list.)
    • Independent evals (including follow-up) from Artificial Analysis show it performing well against non-reasoning models.
    • Evaluations of the 10M context on simple NIAH seem reasonable, but (reportedly) it does not fare as well on deeper understanding of long context.
  • Cloudflare launch an open beta for their AutoRAG solution.
  • Nvidia release Llama-3_1-Nemotron-Ultra-253B-v1, which seems to beat Llama 4 despite being based on Llama 3.1.
  • Amazon announces Nova Sonic speech-to-speech foundation models, for building conversational AI.
  • Agentica release the open-source DeepCoder-14B-Preview, a reasoning model optimized for coding (code, hf).
  • Anthropic announce a new “Max” plan for Claude ($100/month).
  • xAI release an API for Grok-3. Pricing appears relatively expensive (e.g. compared to Gemini models with better performance).
  • OpenAI adds an evals API, making it easier to programmatically define tests and evaluations. This should make it faster/easier to test different prompts, LLMs, etc.
  • Bytedance release technical report for Seed-Thinking-v1.5, a 200B reasoning model.
  • OpenAI add a memory feature to ChatGPT, allowing it to reference all past chats in order to personalize responses.
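
To make the “N experts, M active” numbers in the Llama 4 item concrete, below is a minimal sketch of top-k expert routing in PyTorch. This illustrates the generic technique only; it is not Llama 4’s actual implementation, and all sizes are arbitrary:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal mixture-of-experts layer with top-k routing (illustrative;
    real systems add load-balancing losses, shared experts, capacity limits)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # route each token to k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only k of n_experts run per token, which is why "active" parameters
# (e.g. 17B) can be far fewer than total parameters (e.g. 400B).
y = TopKMoE()(torch.randn(10, 512))
```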

AI Agents

Audio

Image Synthesis

Video

World Synthesis

Science

Brain

Hardware

Robots


AI News 2025-04-03

General

Research Insights

Safety

LLM

  • OpenAI pushed an update to their 4o model, which has significantly improved its ranking (e.g. it is now the best non-reasoning model on a coding benchmark).
  • An interesting test of GPT-4o in-context image generation: it is unable to generate an image of a maze with a valid solution, at least when the maze is in the usual square orientation. However, if you ask for a diamond-orientation maze (a 45°-rotated square), it succeeds in producing a valid solution. We can rationalize this based on the sequential order of autoregressive generation: by generating first from the start of the maze (and only its local neighborhood), and finishing with similar locality, the model can more easily build a valid solution. (Conversely, the usual square orientation requires longer-range reasoning across image patches.) A sketch for reproducing this comparison follows this list.
    • At first, this might seem like just another silly oddity. But it shows how recasting a problem, merely by changing the generation order, can massively change model performance. This sheds light on how these models “think”, and suggests that alternate generation strategies could unlock additional capabilities.
      • For instance, one could imagine an LLM with different branches (like MoE?) where each branch is trained on a different autoregression strategy (left-to-right, right-to-left, block diffusion, random, etc.) such that the overall LLM can invoke/combine different kinds of thinking modes.
    • Another trick is to ask it to generate an image of a maze with the solution identified, and then update the image to remove the solution. This is a visual analog of “think step-by-step” and other inference-time-compute strategies, and it implies that current models have untapped visual reasoning capabilities that could be unlocked by allowing them to visually iterate on problems.
  • Anthropic announces Claude for Education, which provides a university-wide solution tailored to education.
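
A minimal sketch of how one might reproduce the maze-orientation comparison. Note the assumptions: at the time of writing, 4o image generation was only available in ChatGPT, so the model name and API route below are hypothetical placeholders:

```python
from openai import OpenAI

client = OpenAI()

prompts = {
    # Hypothesis: the diamond orientation succeeds because raster-order
    # autoregressive generation can build the solution path from locally
    # consistent patches, while the square orientation requires
    # longer-range consistency across patches.
    "square":  "A square maze, with the solution drawn as a red line.",
    "diamond": "A maze rotated 45 degrees (diamond orientation), "
               "with the solution drawn as a red line.",
}

for name, prompt in prompts.items():
    # "gpt-image-1" is a placeholder model name (an assumption);
    # response fields may vary by model/endpoint.
    result = client.images.generate(model="gpt-image-1", prompt=prompt, n=4)
    for i, image in enumerate(result.data):
        print(name, i, image.url)  # inspect manually: does the red line solve the maze?
```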

AI Agents

Audio

Image Synthesis

Video

Science

Robot


AI News 2025-03-27

General

Research Insights

LLM

Multimodal

AI Agents

Safety

  • Superalignment with Dynamic Human Values. They treat alignment as a dynamic problem, where human values may change over time. The proposed solution involves an AI that breaks tasks into smaller components that are easier for humans to guide. This framework assumes that alignment on sub-tasks correctly generalizes to desirable outcomes for the overall task.
  • Google DeepMind: Defeating Prompt Injections by Design.

Audio

  • OpenAI announced new audio models: new text-to-speech models (test here), which one can instruct in how to speak; and gpt-4o-transcribe, with a lower error rate than Whisper (including a mini variant that is half the cost of Whisper).
  • OpenAI update their advanced voice mode, making it better at not interrupting the user.

Image Synthesis

  • Tokenize Image as a Set (code). An interesting approach that represents images with an unordered bag of tokens (rather than a serialization, as is done with text).
  • StarVector is a generative model for converting text or images to SVG code.
  • Applying mechanistic interpretability to image synthesis models can offer enhanced control: Unboxing SDXL Turbo: How Sparse Autoencoders Unlock the Inner Workings of Text-to-Image Models (preprint, examples).
  • The era of in-context and/or autoregressive image generation is upon us. In-context generation means the LLM can directly understand and edit photos (colorize, restyle, make changes, remove watermarks, etc.). Serial autoregressive approaches also handle text and prescribed layout much better, and often have improved prompt adherence.
    • Last week, Google unveiled Gemini 2.0 Flash Experimental image generation (available in Google AI Studio).
    • Reve Image reveal that the mysterious high-scoring “halfmoon” is their image model, apparently exploiting some kind of “logic” (auto-regressive model? inference-time compute?) to improve output.
    • OpenAI release their new image model: 4o image generation. It can generate highly coherent text in images, and iterate upon images in-context.
      • This led to a one-day Ghibli-themed spontaneous meme explosion.
      • It is interesting to see how it handles generating a map with walking directions. There are mistakes. But the quality is remarkable. The map itself is mostly just memorization, but the roughly-correct walking directions and time estimation point towards a more generalized underlying understanding.

Video

  • SkyReels is offering AI tools to cover the entire workflow (script, video, editing).
  • Pika is testing a new feature that allows one to edit existing video (e.g. animating an object).

World Synthesis

Science

Hardware

  • Halliday: smart glasses intended for AI integration ($430).

Robots

  • Unitree shows a video of smooth athletic movement.
  • Figure reports on using reinforcement learning in simulation to greatly improve the walking of their humanoid robot, providing it with a better (faster, more efficient, more humanlike) gait.
  • Google DeepMind paper: Gemini Robotics: Bringing AI into the Physical World. They present a vision-language-action model capable of directly controlling robots.

AI News 2025-03-20

General

Research Insights

LLM

  • Baidu announce Ernie 4.5 and X1 (use here). They claim that Ernie 4.5 is comparable to GPT-4o, and that X1 is comparable to DeepSeek R1, but with lower API costs (Ernie 4.5 is 1/4 the price of 4o, while X1 is 1/2 the price of R1). They plan to open-source the models on June 30th.
  • Mistral release Mistral Small 3.1 24B. They report good performance for the model size (e.g. outperforming GPT-4o-mini and Gemma 3).
  • LG AI Research announce EXAONE Deep, a reasoning LLM (2.4B, 7.8B, 32B variants; weights) that scores well on math benchmarks.
  • Nvidia release Llama-Nemotron models, which can do reasoning (try it here).

Safety

Vision

Image Synthesis

  • Gemini 2.0 Flash Experimental (available in Google AI Studio) is multimodal, with image generation capabilities. By having the image generation “within the model” (rather than as an external tool), one can iterate on image generation much more naturally. This incidentally obviates the need for many specialized image tools (it can do colorization, combine specified people/places/products, remove watermarks, etc.).

Video

Audio

Science

Robots


AI News 2025-03-13

General

Research Insights

LLM

AI Agents

Safety

  • OpenAI blog post: Detecting misbehavior in frontier reasoning models. They study how the natural-language chain-of-thought (CoT) operates in reasoning models. They find that aggressive optimization of reasoning, especially optimizing for the CoT to not exhibit misaligned text, leads to model behaviors where undesired thoughts are not expressed in the CoT (but are nevertheless activated). Conversely, under-optimized CoT remains human-legible, providing an opportunity to detect and modify undesired behavior. They advocate strongly avoiding over-optimization of CoT, thereby keeping it legible; they note that this may require hiding the CoT from the end-user (e.g. so the model can freely consider dangerous topics in the CoT, while ultimately not expressing them in the response to the user).
  • Dan Hendrycks, Eric Schmidt and Alexandr Wang released: Superintelligence Strategy, a detailed essay about ASI risks, with concrete mitigation suggestions, including Mutual Assured AI Malfunction (MAIM).

Audio

  • Elevenlabs adds speed control for text-to-speech; speed can be adjusted down to the word level to shape a performance.
  • Tavus are demoing AI avatars (audio and video) that are fairly lifelike and responsive.
  • Nvidia release Audio Flamingo 2 (paper, code), an audio-language model with long-context and understanding of non-speech audio.
  • Sesame has now released the weights for their remarkable conversational audio model (demo, example): use, code, weights.

Image Synthesis

Video

  • Hedra releases Character 3, an improved video avatar model that can lip-sync to provided audio.
  • Captions AI’s Mirage model also achieves more emotive lip-sync than older methods.

Science

Robots


AI News 2025-03-06

General

Research Insights

LLM

AI Agents

Audio

  • Sesame have a demo of a voice audio chatbot that is remarkably fast and natural-sounding (example). They claim they will open-source it soon.
  • Podcastle (podcasting platform) introduces Asyncflow, a library of 450 AI voices.

Video

Science

Robots

  • Figure announces that it is accelerating deployment plans, starting in-home alpha testing this year.
  • UBTECH claims they are deploying swarm methods, where individual humanoid robots share knowledge and communicate to collaborate on problems (apparently being tested in Zeekr’s car factory).
  • Dexmate introduce their semi-humanoid Vega.
  • Proception are working on a humanoid, starting with the hand.

AI News 2025-02-27

General

Research Insights

  • Surprising result relevant to AI understanding and AI safety: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. Fine-tuning an LLM to produce insecure code causes it to incidentally pick up many other misaligned behaviors, including giving malicious advice on unrelated topics and expressing admiration for evil people (example outputs).
    • They even find that fine-tuning to generate “evil numbers” (such as 666) leads to similar kinds of broad misalignment.
    • The broad generalization it exhibits could have deep implications.
    • It suggests that the model learns many implicit associations during training and RLHF, such that many “unrelated” concepts are being tangled up into a single preference vector. Thus, when one pushes on a subset of the entangled concepts, the others are also affected.
    • This is perhaps to be expected (in retrospect), in the sense that there are many implicit/underlying correlations in the training data, which can be exploited to learn a simpler predictive model. That is, there is a strong correlation between the concepts of being morally good and writing secure/helpful code.
    • This is similar to a previous result: Refusal in Language Models Is Mediated by a Single Direction.
    • From an AI safety perspective, this is perhaps heartening, as it suggests a more general and robust learning of human values. It also suggests it might be easier to detect misalignment (since it will show up in many different ways) and steer models (since behaviors will be entangled, and don’t need to be individually steered).
    • Of course much of this is speculation for now. The result is tantalizing but will need to be replicated and studied.
  • SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution. Meta demonstrates 41.0% on SWE-Bench Verified despite being only a 70B model (vs. 31% for the non-RLed 70B model), further validating the RL approach to improving performance on focused domains.
  • Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models. They find evidence for cross-modal knowledge transfer. E.g. CLIP can learn richer aggregate semantics (e.g. for a particular culture or country), compared to a vision-only method.
  • Inception Labs is reporting progress on diffusion language models (dLLMs) with their Mercury model (try it here). Unlike traditional autoregressive LLMs, which generate tokens one at a time (left to right), the diffusion method generates the whole token sequence at once, approaching it as in image generation: start with an imperfect/noisy estimate of the entire output, and progressively refine it. In addition to a speed advantage, Karpathy notes that such models might exhibit different strengths and weaknesses compared to conventional LLMs. (A minimal decoding sketch follows below.)
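
For intuition, here is a minimal sketch of one common discrete-diffusion decoding scheme (iterative parallel unmasking). Whether Mercury uses exactly this formulation is not public; `model` and `tokenizer` are assumed stand-ins:

```python
import torch

def diffusion_generate(model, tokenizer, length=64, steps=8):
    """Sketch of masked-diffusion text generation: start fully masked,
    predict every position in parallel each step, and commit the most
    confident predictions while leaving the rest masked for refinement."""
    mask_id = tokenizer.mask_token_id
    seq = torch.full((1, length), mask_id)
    committed = 0
    for step in range(steps):
        logits = model(seq)                 # (1, length, vocab): all positions at once
        conf, pred = logits.softmax(-1).max(-1)
        conf[seq != mask_id] = -1.0         # never overwrite already-committed tokens
        n_new = (step + 1) * length // steps - committed
        top = conf.topk(n_new, dim=-1).indices[0]
        seq[0, top] = pred[0, top]          # commit the most confident positions
        committed += n_new
    return tokenizer.decode(seq[0].tolist())
```

The contrast with autoregressive decoding is that every position is predicted at every step, which is where the speed advantage (and the different error profile) comes from.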

LLM

  • Different LLMs are good at different things, so why not use a router to select the ideal LLM for a given task/prompt? Prompt-to-Leaderboard (code) demonstrates this, taking the top spot on the Chatbot Arena leaderboard. (A toy routing sketch follows this list.)
  • Anthropic release Claude 3.7 Sonnet (system card), a hybrid model that can return immediate answers or conduct extended thinking. In benchmarks, it is essentially state-of-the-art (comparing favorably against o1, o3-mini, R1, and Grok 3 Thinking). Surprisingly, even the non-thinking mode can outperform frontier reasoning models on certain tasks. It appears extremely good at coding.
    • Claude Code is a terminal application that automates many coding and software engineering tasks (currently in a limited research preview).
    • Performance of thinking variant on ARC-AGI is roughly equal to o3-mini (though at higher cost).
    • Achieves 8.9% on Humanity’s Last Exam (cf. 14% by o3-mini-high).
    • For fun, some Anthropic engineers deployed Claude to play Pokemon (currently live on Twitch). Claude 3.7 is making record-setting progress in this “benchmark”.
  • Qwen releases a thinking model: QwQ-Max-Preview (use it here).
  • Convergence open-source Proxy Lite, a scaled-down version of their full agentic model.
  • OpenAI have added Assistants File Search, essentially providing an easier way to build RAG solutions in their platform.
  • Microsoft release phi-4-multimodal-instruct, a language+vision+speech multimodal model.
  • DeepSeek releases:
  • OpenAI releases GPT-4.5. It is a newer/better non-reasoning LLM. It is apparently “a big model”. It has improved response quality with fewer hallucinations, and more nuanced emotional understanding.
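
The actual Prompt-to-Leaderboard method trains an LLM to predict a prompt-specific leaderboard; as a rough illustration of the routing pattern only, here is a toy nearest-neighbor router (the embedding source and the preference log are assumptions):

```python
import numpy as np

# Toy log of (prompt embedding, model that won the human comparison).
# In practice this would be thousands of Chatbot-Arena-style votes,
# with embeddings from any sentence-embedding model (assumed here).
history: list[tuple[np.ndarray, str]] = [
    (np.random.randn(384), "code-specialist"),   # placeholder entries
    (np.random.randn(384), "general-chat"),
]

def route(prompt_emb: np.ndarray, k: int = 5) -> str:
    """Pick the model that won most often among the k most similar prompts."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(history, key=lambda pair: cosine(pair[0], prompt_emb), reverse=True)
    winners = [model for _, model in ranked[:k]]
    return max(set(winners), key=winners.count)
```

A production router would also weight by cost and latency, not just predicted win rate.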

AI Agents

Audio

  • Luma add a video-to-audio feature to their Dream Machine video generator.
  • ElevenLabs introduce a new audio transcription (speech-to-text) model: Scribe. They claim superior performance compared to the state-of-the-art (e.g. OpenAI Whisper).
  • Hume announce Octave, an improved text-to-speech where one can describe voice (including accent) and provide acting directions (emotion, etc.).

Video

3D

Science

Robots


AI News 2025-02-20

General

  • Perplexity adds a Deep Research capability (similar to Google and OpenAI). You can try it even in the free tier (5 per day). They score 21% on the challenging “Humanity’s Last Exam” benchmark, second only to OpenAI at 26%.
  • TechCrunch reports: A job ad for Y Combinator startup Firecrawl seeks to hire an AI agent for $15K a year. Undoubtedly a publicity stunt. And yet, it hints towards a near-future economic dynamic: offering pay based on desired results (instead of salary), and allowing others to bid using human or AI solutions.
  • Mira Murati (formerly at OpenAI) announces Thinking Machines, an AI venture.
  • Fiverr announces Fiverr Go, where freelancers can train a custom AI model on their own assets, and have this AI model/agent available for use through the Fiverr platform. This provides a way for freelancers to service more clients.
    • Elevenlabs Payouts is a similar concept, where voice actors can be paid when clients use their customized AI voice.
    • In the short term, this provides an extra revenue stream to these workers. Of course, these workers are the most at threat for full replacement by these very AI methods. (And, indeed, one could worry that the companies in question are gathering the data they need to eventually obviate the need for profit-sharing with contributors.)

Research Insights

LLM

  • Nous Research releases DeepHermes 3 (8B), which mixes conventional LLM responses with long-CoT reasoning responses.
  • InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU.
  • ByteDance has released a new AI-first coding IDE: Trae AI (video intro).
  • LangChain Open Canvas provides a user interface for LLMs, including memory features, a coding UI, artifact display, etc.
  • xAI announces the release of Grok 3 (currently available for use here), including a reasoning variant and “Deep Search” (equivalent to Deep Research). Early testing suggests a model closing in on the abilities of o1-pro (but not catching up to full o3). So, while it has not demonstrated any record-setting capabilities, it confirms that frontier models are not yet using methods that cannot be reproduced by others.

AI Agents

Safety

Image

Video

3D

World Synthesis

  • Microsoft report: Introducing Muse: Our first generative AI model designed for gameplay ideation (publication in Nature: World and Human Action Models towards gameplay ideation). They train a model (the World and Human Action Model, WHAM) on gameplay videos; the model can subsequently forward-simulate gameplay from a provided frame, and has thus learned an implicit world model for the video game. Forward-predicting gameplay from artificially edited frames (introducing a new character or situation) thus allows rapid ideation of gameplay ideas before actually updating the video game. More generally, this points towards direct neural rendering of games and other interactive experiences.

Science

Brain

Robots

  • Unitree video shows robot motion that is fairly fluid and resilient.
  • Clone robotics is moving towards combining their biomimetic components into a full-scale humanoid: Protoclone.
  • MagicLab robot with the dexterous MagicHand S01.
  • Figure AI claims a breakthrough in robotic control software (Helix: A Vision-Language-Action Model for Generalist Humanoid Control). The video shows two humanoid robots handling a novel task based on natural-voice human instructions. Assuming the video is genuine, it shows real progress in the capability of autonomous robots to understand instructions and carry out simple tasks (including working with a partner in a team).