Nvidia announces RLP: Reinforcement as a Pretraining Objective (paper). They apply RL during the pre-training phase (instead of only in post-training), treating chain-of-thought generation as actions that are rewarded by information gain, i.e. how much the sampled thought improves prediction of the next tokens.
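To make the information-gain idea concrete, here is a minimal sketch (not NVIDIA's code, and with the paper's baseline details simplified away): reward a sampled chain-of-thought by how much it raises the log-probability of the actual next token, relative to predicting it from the context alone. Names and the model interface (a Hugging Face-style causal LM) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def information_gain_reward(model, context_ids, cot_ids, next_token_id):
    """Toy RLP-style reward: log p(next | context, thought) - log p(next | context).
    Assumes a Hugging Face-style causal LM that returns .logits; illustrative only."""
    with torch.no_grad():
        # Baseline: predict the next token from the context alone (no thinking).
        base_logits = model(context_ids.unsqueeze(0)).logits[0, -1]
        base_logp = F.log_softmax(base_logits, dim=-1)[next_token_id]

        # Predict the same token after appending the sampled chain-of-thought.
        with_cot = torch.cat([context_ids, cot_ids]).unsqueeze(0)
        cot_logits = model(with_cot).logits[0, -1]
        cot_logp = F.log_softmax(cot_logits, dim=-1)[next_token_id]

    # Positive when the thought made the true continuation more predictable.
    return (cot_logp - base_logp).item()
```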
OpenAI announces Sora 2 (system card). More realistic, includes sound, the ability to add a specific person to a scene, and multiple aesthetics. The app is iOS-only (for now) and emphasizes social aspects (friend invites, etc.).
OpenAI achieved gold at the ICPC World Finals, getting 12/12 problems correct (the best human team achieved 11/12). GPT-5 got 11/12 correct, and their experimental reasoning system also solved the last (most challenging) problem.
Google Gemini 2.5 Deep Think achieved gold (10/12 correct).
Google DeepMind: Virtual Agent Economies. Sandboxed economies could be used to allow agents to cooperate and compete, e.g. negotiating for access to resources.
Waymo released some safety data (based on 96M miles driven). The results are somewhat biased by the subset of regions/conditions in which Waymo is allowed to drive (they claim to account for that in their analysis). Nevertheless, the results are impressive, showing fewer crashes/injuries for Waymo driving compared to human drivers.
Robots
Video, from Active Intelligent Systems (ACT) Lab at the Southern University of Science and Technology (SUSTech), shows a Unitree robot responding very nimbly to extreme perturbations. (Also, dancing.)
A stealth/mystery model is being tested: nano-banana (speculation is that it is from Google). Early examples show it has startling ability to edit images based on natural language requests.
Video
Higgsfield product-to-video demonstrates the ability to add objects into existing footage. This shows the increasingly powerful modality of genAI video editing.
Runway Act-Two is updated to support changing the voice performance alongside the video generation.
World Synthesis
Runway ML announces Game Worlds. Turn-based text-adventure games with generated narrative and images.
New Anthropic research: Persona vectors: Monitoring and controlling character traits in language models. Many interesting results, including inducing a particular behavior by adjusting activations along a particular direction. Used at inference time, this can induce or inhibit a behavior, but at a cost in capability (as previously known): e.g. one can steer away from the “evil” direction, but task performance worsens. More interestingly, one can steer the model during training to prevent certain behaviors from ever being learned. Counter-intuitively, one actually steers towards the undesired behavior (e.g. in the evil direction) during training. This acts as a sort of inoculation: since the steering already over-supplies the behavior, the model doesn’t need to add it to its learned weights, and at runtime (when the bias is no longer present) it snaps back to the desired behavior (e.g. towards the good direction).
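The inference-time mechanics are simple to sketch: add a scaled persona vector to the hidden states at some layer. Below is an illustrative PyTorch sketch (not Anthropic's implementation); the layer path and the persona_vector are stand-ins you would compute per the paper.

```python
import torch

def add_steering_hook(model_layer, persona_vector, alpha=4.0):
    """Nudge a layer's hidden states along a persona direction at inference time.
    Positive alpha induces the trait, negative alpha suppresses it.
    (Illustrative sketch, not Anthropic's implementation.)"""
    v = persona_vector / persona_vector.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return model_layer.register_forward_hook(hook)

# Hypothetical usage with a Hugging Face-style decoder (layer index is arbitrary):
# handle = add_steering_hook(model.model.layers[20], persona_vector, alpha=-4.0)
# ...generate as usual, then handle.remove() to turn the steering off.
```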
OpenAI announces the release of two open-weight reasoning models: gpt-oss-120b (for servers or high-end desktops) and gpt-oss-20b (for desktops/laptops). Local reasoning models (with full access to the chain-of-thought) that should be good for agentic use (hf, github, test). Supposedly similar in capability to o4-mini.
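A minimal local-inference sketch, assuming a recent transformers release and the Hugging Face repo id openai/gpt-oss-20b (adjust for your hardware; illustrative, not an official snippet):

```python
from transformers import pipeline

# Load the 20B open-weight model locally; device_map/dtype are left to auto-detection.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Briefly: why is the sky blue?"}]
result = pipe(messages, max_new_tokens=256)

# Last message is the assistant reply (output format may vary by transformers version).
print(result[0]["generated_text"][-1])
```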
OpenAI releases GPT-5: a reasoning model that selects the right amount of compute for each request. Multiple models behind the scenes: GPT-5 (default), GPT-5-mini, GPT-5-nano, and GPT-5 Pro (for Pro tier only). Available in the API (a minimal call sketch follows after this list).
It’s better. Strong performance across many metrics: 75% on SWE-bench, 84% on MMMU, 100% on AIME 2025. Better writing, better coding, improved voice. Can see via video input.
Can now select among different “personalities”.
Trained (in part) by using o3 to generate teaching datasets.
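A minimal API sketch, assuming the OpenAI Python SDK and the model id "gpt-5" (standard Chat Completions call shape; treat details as illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The model routes how much reasoning compute to spend; smaller tiers would be
# requested with "gpt-5-mini" or "gpt-5-nano" instead.
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Explain the birthday paradox in two sentences."}],
)
print(response.choices[0].message.content)
```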
Google paper provides insight into how LLMs do in-context adaptation: Learning without training: The implicit dynamics of in-context learning. They find that, in a transformer block, each context token effectively generates a “temporary” rank-1 patch to the downstream weights that steers the model, as if it had been fine-tuned on the context data.
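The core identity is easy to check numerically: the context's contribution to a block's attention output can be folded into a rank-1 patch of the weight matrix that follows. A toy verification with random stand-ins for the attention outputs (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64

W = rng.normal(size=(d_ff, d_model))   # first MLP weight of the block
a_ctx = rng.normal(size=d_model)       # attention output at the query WITH context
a_noctx = rng.normal(size=d_model)     # attention output at the query WITHOUT context

# Rank-1 "temporary patch" that folds the context's contribution into the weights.
delta_a = a_ctx - a_noctx
delta_W = np.outer(W @ delta_a, a_noctx) / np.dot(a_noctx, a_noctx)

# Running the MLP on the with-context activation equals running a
# rank-1-patched MLP on the context-free activation.
lhs = W @ a_ctx
rhs = (W + delta_W) @ a_noctx
print(np.allclose(lhs, rhs))  # True
```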
New paper with a bold claim: AlphaGo Moment for Model Architecture Discovery. They claim that searches through different AI/ML architectures can consistently yield discoveries/improvements, including surprises humans might not have considered.
LLM
Google NotebookLM rolls out a new user interface, and adds video overviews.
Memories.ai introduces a Large Visual Memory Model. The claim is that it enables reasoning over video with nearly unlimited context length. This is accomplished by generating concise semantic codes for each frame, allowing search over long videos.
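The general pattern (compress each frame into a compact code, then search the codes instead of the raw video) can be sketched with a toy stand-in. A real system would use a learned visual encoder and text queries; the code below only illustrates the indexing/search shape, with a color histogram as a hypothetical per-frame code.

```python
import numpy as np

def frame_code(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Toy per-frame 'semantic code': a normalized color histogram.
    A learned visual encoder would go here in a real system."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3).astype(np.float64),
        bins=(bins, bins, bins),
        range=[(0, 255)] * 3,
    )
    code = hist.ravel().astype(np.float32)
    return code / (np.linalg.norm(code) + 1e-8)

def build_index(frames) -> np.ndarray:
    """Stack one compact code per frame; this is what gets stored and searched."""
    return np.stack([frame_code(f) for f in frames])

def search(index: np.ndarray, query_frame: np.ndarray, top_k: int = 5):
    """Return indices of frames whose codes best match the query (cosine similarity)."""
    scores = index @ frame_code(query_frame)
    return np.argsort(scores)[::-1][:top_k]
```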
Hunyuan announces Hunyuan3D World Model 1.0 (github, huggingface). Currently it supports movement only within a small circle around the initial viewpoint, but it helps point the way towards more general-purpose sims.