Research Insights
- New Anthropic research, Persona vectors: Monitoring and controlling character traits in language models. Among many interesting results, a particular behavior can be induced by shifting activations along a particular direction. Applied at inference time, this steering can induce or inhibit a behavior, but at a cost in capability (as previously known); e.g. one can steer away from the “evil” direction, but task performance gets worse. More interestingly, one can steer the model during training to prevent certain behaviors from ever being learned. Counter-intuitively, one steers towards the undesired behavior (e.g. in the evil direction) during training. This acts as a sort of inoculation: because the steering already supplies the over-emphasized behavior, the model doesn’t need to encode it in its learned weights, and at runtime (when the bias is no longer present) it snaps back to the desired behavior (e.g. towards the good direction).
- Forethought: How quick and big would a software intelligence explosion be?
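To make the persona-vector item above more concrete, here is a toy sketch of steering along a direction in activation space. It is not Anthropic's implementation: the function names are hypothetical, the three-dimensional "activations" are made up, and taking the direction as a difference of mean activations is an assumption made for illustration. It only shows the inference-time case (measure a trait via projection, then add or subtract the direction); the training-time inoculation described above is not shown.

```python
from math import sqrt

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def persona_vector(trait_acts, baseline_acts):
    """Unit direction from baseline activations towards trait activations.

    Assumption: the direction is the normalized difference of the two means.
    """
    mean_trait = mean_vector(trait_acts)
    mean_base = mean_vector(baseline_acts)
    diff = [t - b for t, b in zip(mean_trait, mean_base)]
    norm = sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def projection(hidden, direction):
    """How strongly a hidden state expresses the trait (dot product)."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden, direction, alpha):
    """Shift a hidden state along the direction; negative alpha suppresses."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Made-up activations: samples showing the trait vs. neutral samples.
trait_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
baseline_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
v = persona_vector(trait_acts, baseline_acts)

hidden = [1.5, 0.5, -0.2]
# Monitoring: read off the trait strength of the current hidden state.
score = projection(hidden, v)
# Inhibiting: remove the component along the trait direction.
suppressed = steer(hidden, v, -score)
# Inducing: push further along the trait direction.
amplified = steer(hidden, v, 2.0)
```

In a real model the same arithmetic would be applied to a chosen layer's hidden states during the forward pass, which is where the capability cost mentioned above can appear.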
LLM
- Qwen adds: Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507.
- Anthropic releases Claude Opus 4.1, improving reasoning and coding performance.
- OpenAI announces the release of two open-weight reasoning models: gpt-oss-120b (for servers or high-end desktops) and gpt-oss-20b (for desktops/laptops). These are local reasoning models (with full access to the chain-of-thought) that should be good for agentic use (hf, github, test). Supposedly similar in capability to o4-mini.
- OpenAI announces GPT-5.
- Reasoning model that selects the right amount of compute. Multiple models behind the scenes: GPT-5 (default), GPT-5-mini, GPT-5-nano, GPT-5-Pro (for Pro tier only). Available in API.
- Reported results are strong across many metrics: 75% on SWE-bench, 84% on MMMU, 100% on AIME 2025. Better writing, better coding. Improved voice. Can see via video input.
- Can now select among different “personalities”.
- Trained (in part) by using o3 to generate teaching datasets.
Audio
- ElevenLabs introduces Eleven Music.
- Kitten TTS (github, hf) is a text-to-speech model with just 15M parameters.
Video
- xAI/Grok adds image and video capabilities, emphasizing fast generation.
World Synthesis
- Google unveils Genie 3: A new frontier for world models. A world simulator that renders at ~20 fps, retains memory of the generated world, and supports prompting on the go. It implicitly handles real-world physics and interactions.
Science
- Autopoiesis claims that their AI co-scientist achieves 92.4% GPQA Diamond accuracy.
- Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving.
- Leap Laboratories claims their Discovery Engine is helping scientists make discoveries. They have announced three papers:
- Growth Cost and Transport Efficiency Tradeoffs Define Root System Optimization Across Varying Developmental Stages and Environments in Arabidopsis
- Automated Discovery of Patterns in T-Cell Receptor Physicochemical Signatures
- Explaining Surface Layer Theory Departures in Marine Flux Profiles with Data-Driven Discovery
Robots
- Unitree A2 Stellar Hunter. An iterative improvement on their already capable quadruped design.