AI News 2025-07-24

General

Epoch AI analyzes the rate of adoption of AI technologies: After the ChatGPT Moment: Measuring AI’s Adoption. How quickly has AI been diffusing through the economy?
- In general, tech adoption is faster now than in the past, and AI adoption is faster still than the typical modern trend.
There are interesting parallels between human psychology and AI responses. New research: Call Me A Jerk: Persuading AI to Comply with Objectionable Requests. It shows that LLMs can be persuaded using methods similar to those that work on humans. This doesn’t mean that LLM internal cognition is humanlike, but it does show that LLMs reproduce (model) human foibles.
OpenAI shares an update on the progress of compute buildout: Stargate advances with 4.5 GW partnership with Oracle. New data center capacity will power jobs, growth, and AI that benefits more people.
Anthropic releases a report: Build AI in America (blog summary).
The US White House releases: Winning the Race: America’s AI Action Plan.
Pew Research: Google users are less likely to click on links when an AI summary appears in the results. About 20% of search results now include an AI summary and, not surprisingly, users are less likely to use the traditional search results when an AI answer is already provided.
Pew Research: 34% of U.S. adults have used ChatGPT, about double the share in 2023.

Research Insights

Reasoning-Finetuning Repurposes Latent Representations in Base Models (blog). The work shows that RL elicits existing behaviors, rather than training in new capabilities from scratch. The authors assess this by showing that a steering vector from the base model can influence the behavior of the subsequent RL-trained model.
Learning from one and only one shot. A new “information lattice learning” approach shows good generalization (better than transformers on certain visual tasks).
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (blog post). A remarkable result showing that an LLM can transmit subliminal information to another LLM (as long as both are based on the same base model).

LLM

OpenAI and Google both achieved gold in the difficult International Mathematical Olympiad (IMO) competition.
- OpenAI’s gold was achieved by an LLM that did not use tools (theorem prover, etc.). It was a special unreleased LLM. It was not optimized for math problems per se; rather, it implements some new kind of improved algorithm (presumably RL?). This level of capability is expected to be included in a future OpenAI model (but not GPT-5).
- Google Gemini report: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad. Google’s AlphaProof achieved silver in 2024. The current result is more impressive since it was achieved by a general-purpose LLM.
- Surprisingly, there is also a report that the regular, commercially available Gemini 2.5 Pro can achieve gold in the IMO if prompted correctly: Gemini 2.5 Pro Capable of Winning Gold at IMO 2025.
- There is also a claim of using o4-mini-high to achieve IMO gold (using the Crux agent framework).
- Taken together, these results suggest that reasoning models are already at a very high level of math competence. Moreover, they add some evidence that existing LLMs have many latent skills that RL can unlock.
Context Rot: How Increasing Input Tokens Impacts LLM Performance.
The Kimi-K2 technical report provides useful details for others trying to implement advanced LLM training.
Qwen releases Qwen3-Coder-480B-A35B-Instruct (code, weights), and an associated command-line coding agent (a fork of Gemini CLI).

Vision

Conversational image segmentation with Gemini 2.5. Very interesting demonstration of being able to specify/select/segment based on vague natural-language input requirements.

Video

Decart shows: MirageLSD: The First Live-Stream Diffusion AI Video Model. It can do real-time video-to-video transformation (e.g. style transfer).

World Synthesis

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (preprint).

Science

Pioneering an AI clinical copilot with Penda Health: Study of 40,000 patient visits finds clinicians using AI copilot made fewer errors.
- Paper: AI-based Clinical Decision Support for Primary Care: A Real-World Study.

Robots

Video of Ubtech Walker S2. The robot can swap its own battery, allowing for high utilization.
