General
- A study of LLM adoption: The Widespread Adoption of Large Language Model-Assisted Writing Across Society. They detect usage rates of 5-25% in public documents across a range of sectors.
- The US Department of Energy organized a “Jam Session” where 1,000 National Lab scientists tested frontier models from OpenAI and Anthropic.
- Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs (project page). They argue that LLMs represent a technically feasible and legal means of freeing the vast knowledge currently stored in closed archives (protected by copyright law). They propose using LLMs to generate knowledge-units that capture the important facts and relations, while being sufficiently stylistically distinct.
- Anthropic raises $3.5B at $61.5B valuation.
- OpenAI post: How we think about safety and alignment.
Research Insights
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models. They streamline reasoning training by fine-tuning only on the first few tokens of sampled traces, exploiting the observation that correct solution trajectories tend to share consistent opening tokens (prefix self-consistency).
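The core mechanic can be sketched as a loss mask (a schematic illustration, not the paper's actual training code; `prefix_loss_mask` is a hypothetical helper): only the first k tokens of each reasoning trace contribute to the fine-tuning loss, and everything after the prefix is ignored.

```python
def prefix_loss_mask(trace_token_ids, k=8):
    """Keep the fine-tuning loss on only the first k tokens of a trace.

    Returns a 0/1 mask the same length as the trace: 1 where the token
    contributes to the loss, 0 where it is masked out.
    """
    return [1 if i < k else 0 for i in range(len(trace_token_ids))]

# A 12-token trace with k=3: only the opening tokens are trained on.
mask = prefix_loss_mask(list(range(12)), k=3)
```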
- Chain of Draft: Thinking Faster by Writing Less. They prompt the LLM to generate concise, draft-like intermediate reasoning steps that are minimal but informative (similar to how a person might sketch out an idea before filling in the details). This preserves reasoning performance while using far fewer tokens.
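The behavior is elicited at the prompt level; a sketch of the template shape (the instruction wording below is illustrative, not the paper's exact prompt):

```python
def chain_of_draft_prompt(question, max_words_per_step=5):
    """Build a Chain-of-Draft style prompt: ask for terse per-step
    drafts instead of verbose chain-of-thought prose."""
    instruction = (
        "Think step by step, but keep only a minimum draft for each "
        f"thinking step, with at most {max_words_per_step} words per step. "
        "Return the final answer after '####'."
    )
    return f"{instruction}\n\nQ: {question}\nA:"

prompt = chain_of_draft_prompt(
    "Jason had 20 lollipops; now he has 12. How many did he give away?"
)
```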
- Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing. They exploit the LLM’s existing attention mechanism to retrieve relevant information from texts of arbitrary length.
- LongRoPE2: Near-Lossless LLM Context Window Scaling (code).
- Atom of Thoughts for Markov LLM Test-Time Scaling (code). They describe a method that can be applied to any LLM, where reasoning processes are broken into separable steps, so that the outcome of each step can be compressed into an answer, after which intermediate states can be ignored. This allows more efficient reasoning (using fewer tokens) when solving complex problems.
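The Markov structure can be sketched as a contraction loop (schematic only; `decompose`, `contract`, and `is_atomic` stand in for LLM calls in the actual method). Each iteration splits the current question into sub-questions, resolves the independent ones, and contracts the result into a new self-contained question, so earlier intermediate states can be discarded.

```python
def atom_of_thoughts(question, decompose, contract, is_atomic, max_steps=10):
    """Iteratively contract a question into simpler, self-contained
    states until an atomic (directly answerable) state is reached."""
    state = question
    for _ in range(max_steps):
        if is_atomic(state):
            return state
        subquestions = decompose(state)
        state = contract(state, subquestions)  # new state replaces old; history dropped
    return state

# Toy instantiation: the "question" is a list of numbers whose sum we want.
total = atom_of_thoughts(
    [1, 2, 3, 4],
    decompose=lambda xs: [xs[i:i + 2] for i in range(0, len(xs), 2)],
    contract=lambda xs, subs: [sum(s) for s in subs],
    is_atomic=lambda xs: len(xs) == 1,
)
```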
- Meta/FAIR et al.: Improving the Scaling Laws of Synthetic Data with Deliberate Practice. Generating only the most challenging/informative synthetic examples makes the overall training process markedly more efficient.
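A minimal sketch of this style of selection (hypothetical helper names; the paper's actual criterion and pipeline differ in detail): generate many candidates, then keep only the examples the current learner is most uncertain about, scored here by predictive entropy.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_hardest(candidates, predict, budget):
    """Keep the `budget` candidates with the highest predictive entropy
    under the current learner; `predict` maps an example to the
    learner's class probabilities."""
    scored = sorted(candidates, key=lambda x: entropy(predict(x)), reverse=True)
    return scored[:budget]

# Toy learner: confident on short strings, uncertain on long ones.
predict = lambda s: [0.5, 0.5] if len(s) > 4 else [0.99, 0.01]
picked = select_hardest(["ab", "abcdef", "xy", "uvwxyz"], predict, budget=2)
```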
LLM
- Google releases a new challenging benchmark for LLMs: BIG-Bench Extra Hard. The current leader on this benchmark is o3-mini-high, with a score of 45%.
- DeepSeek release Fire-Flyer File System (3FS), a parallel file system for efficient utilization of SSDs and RDMA networks.
- DeepSeek publish details on their inference system: DeepSeek-V3/R1 Inference System Overview.
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners. They distill from Transformers into more efficient Mamba models, showing that scaling up these weaker reasoners can make better use of a fixed inference-compute budget.
- Claude Code is now available to all.
- Qwen releases their reasoning model: QwQ-32B (demo). They claim it is competitive with other open-source reasoning models (e.g. DeepSeek-R1 671B).
AI Agents
- Anthropic provide details on how they implement computer use: Monitoring computer use via hierarchical summarization.
- Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks. The LLM agent generates a knowledge graph by iteratively reasoning about a particular topic. The graph exhibits small-world structure (hub nodes and interconnected clusters), suggesting meaningful knowledge organization.
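The iterative growth loop can be sketched as follows (schematic; `propose_relations` stands in for an LLM call returning (concept, relation, concept) triples about a node). Each round expands the frontier of newly added concepts, so the graph grows outward from the seed topic.

```python
def grow_knowledge_graph(seed_topic, propose_relations, iterations=3):
    """Iteratively expand a knowledge graph from a seed topic.

    graph maps each node to a list of (relation, neighbor) edges;
    newly seen concepts form the frontier for the next round.
    """
    graph = {}
    frontier = [seed_topic]
    for _ in range(iterations):
        next_frontier = []
        for node in frontier:
            for head, rel, tail in propose_relations(node):
                graph.setdefault(head, []).append((rel, tail))
                if tail not in graph:
                    graph.setdefault(tail, [])
                    next_frontier.append(tail)
        frontier = next_frontier
    return graph

# Toy stand-in for the LLM's proposed triples.
triples = {
    "materials": [("materials", "includes", "polymers"),
                  ("materials", "includes", "ceramics")],
    "polymers": [("polymers", "used_in", "coatings")],
}
g = grow_knowledge_graph("materials", lambda n: triples.get(n, []), iterations=2)
```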
- Manus AI claim they have a truly general AI agent. It appears (video) to be a computer use agent.
Audio
- Sesame have a demo of a voice chatbot that is remarkably fast and natural-sounding (example). They say they will open-source it soon.
- Podcastle (podcasting platform) introduces Asyncflow, a library of 450 AI voices.
Video
- Pika 2.2 unveiled. Higher resolution, 10-second generations, and the ability to set keyframes anywhere.
- TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding. A workflow that generates video and voiceover explainers (project page, examples).
- HunyuanVideo-I2V image-to-video model.
Science
- Microsoft introduces Dragon Copilot, which will assist with clinical workflows and paperwork.
- Google launches: Data Science Agent in Colab: The future of data analysis with Gemini.
Robots
- Figure announces that it is accelerating deployment plans, starting in-home alpha testing this year.
- UBTECH claims they are deploying swarm methods, where individual humanoid robots share knowledge and communicate to collaborate on problems (apparently being tested in Zeekr’s car factory).
- Dexmate introduce their semi-humanoid Vega.
- Proception are working on a humanoid, starting with the hand.