General
- Andrej Karpathy has a knack for distilling the trends in AI/ML:
  - 2017-11: Software 2.0 (“Gradient descent can write code better than you. I’m sorry.”)
  - 2022-10: Transformers as general-purpose differentiable computers (talk)
  - 2023-09: LLM as kernel of a new Operating System (diagrams, OS analogies)
  - 2025-02: Vibe coding
  - 2025-06: Software 3.0 (talk): “Prompts as Programs”. Software 1.0 is code; 2.0 is model weights; 3.0 is prompts.
  - 2025-06: “Context Engineering” instead of “Prompt Engineering”
  - Now (2025-06): Prediction of LLMs being scaled down into “cognitive cores”: small, edge-optimized (on-device inference) LLMs that have minimal knowledge but maximized reasoning and tool-use abilities. Such models could rapidly iterate to retrieve the information they need and build answers.
- Epoch reports on improvements in context window length: LLMs now accept longer inputs, and the best models can use them more effectively.

Research Insights
- A Comment On “The Illusion of Thinking”: Reframing the Reasoning Cliff as an Agentic Gap.
  - Response to Apple’s paper: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, which argues that LLMs fail as problem complexity increases and takes this as evidence of a lack of true reasoning.
  - This new paper argues that what seems like a lack of reasoning is better explained as a lack of agentic ability (tool access, etc.).
- Towards Scalable Parameter Decomposition. The authors present a method to decompose models in terms of their parameters (rather than their activations).
  - Paper with technical details: Stochastic Parameter Decomposition (code)
- VLMs can think visually without generating pixels. Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (paper, code).
- The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements. The benchmark measures agentic abilities by asking the agent to improve the training speed for a small LLM (as a proxy for more general “AI recursive self-improvement”). Current agents do surprisingly badly (sub-human performance even with significant hints). Going forward, this eval (or variants thereof) should prove useful to measure “useful agentic” performance.
LLM
- Inception Labs launches Mercury, a diffusion LLM. The fast inference of the diffusion architecture puts it in a new regime for speed-vs-performance.
Agents
- Anthropic tested Claude’s ability to operate a small business: Project Vend: Can Claude run a small shop? (And why does that matter?). Although surprisingly capable in certain ways, the agent overall lost money over time, and had a mini-identity-crisis for a day.
Safety
Image Synthesis
- Open source model for image editing: OmniGen2: Exploration to Advanced Multimodal Generation (preprint, code, demo).
- Qwen-VLo image generation (try). Conversational interface for image generation.
- ByteDance XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation (preprint).
Video
- Nano (Greyscale Labs) is a visual effects plugin that exploits ML depth estimation to allow editing of volumetric haze.
- Nvidia: UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting. Enables relighting of existing footage by estimating albedo.
- Alibaba: OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation. Expressive avatar control.
World Synthesis
- Mirage Research Preview: The World’s First AI-Native UGC Game Engine Powered by Real-Time World Model. We are getting closer to real-time generative gameplay.
Science
- Chai-2: Zero-shot antibody design in a 24-well plate.
- Microsoft reports on improved AI medical diagnostics: The Path to Medical Superintelligence.
- Ai2 introduces a new benchmark: SciArena: A New Platform for Evaluating Foundation Models in Scientific Literature Tasks (cast votes, data, code). The top model is currently OpenAI o3.
- A foundation model to predict and capture human cognition.
Cars
- Tesla shows off a car driving itself (no occupants) from the factory to a customer’s address. No doubt the route was carefully selected and vetted. Nevertheless, it is impressive.
- Tesla launched a limited rollout of its full self-driving Robotaxi (with an employee monitoring from inside the vehicle, for now) in Texas.
Robots