General
- Advancing Reasoning in Large Language Models: Promising Methods and Approaches.
- Andrej Karpathy released a 3.5-hour YouTube video: Deep Dive into LLMs like ChatGPT. A good introduction for anyone who wants to start understanding how chatbots work under the hood (without dwelling on specific architectural details).
- Sam Altman blog post: Three Observations.
- AI intelligence roughly scales as the logarithm of the resources used (especially input data, training compute, and inference compute).
- The cost of a given level of AI capability drops roughly 10× every 12 months (a toy numerical sketch of these two trends follows below).
- The socioeconomic value of linearly increasing intelligence is super-exponential.
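The two quantitative observations above can be made concrete with a toy calculation. This is an illustrative sketch only: the functional forms and constants below are assumptions used to formalize the stated trends, not figures from the post.

```python
# Toy illustration of the two observations above: capability ~ log(resources),
# and cost for a fixed capability level falling 10x every 12 months.
# The functional forms and constants (k, the $1.00 starting cost) are assumptions.
import math

def capability(compute_flops: float, k: float = 1.0) -> float:
    """Assumed form: capability grows with the logarithm of resources."""
    return k * math.log10(compute_flops)

def cost_of_capability(initial_cost: float, months: float) -> float:
    """Assumed form: cost of a fixed capability level drops 10x per 12 months."""
    return initial_cost * 10 ** (-months / 12)

# A 10x increase in compute adds only a constant increment of "capability"...
print(capability(1e24) - capability(1e23))     # 1.0
# ...while a query that costs $1.00 today would cost about $0.01 in 24 months.
print(round(cost_of_capability(1.00, 24), 4))  # 0.01
```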
- The Anthropic Economic Index analyzes real-world LLM usage across tasks and occupations, based on anonymized Claude conversations.
- On the Emergence of Thinking in LLMs I: Searching for the Right Intuition.
- Sam Altman provided an update on future OpenAI plans.
- GPT-4.5 (internally called Orion) will be released soon, as the final non-reasoning model.
- GPT-5 will be released thereafter. It will be a meta-model that selects the internal model/tools appropriate to each request. Everyone (free, Plus, Pro) will have access to GPT-5, but the total amount of thinking/intelligence will differ across tiers (presumably some combination of higher tiers favoring bigger models and more inference-time compute). A hypothetical routing sketch follows below.
- These simplifications will apply both in the web/ChatGPT interface and via the API.
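As a thought experiment, a tier-aware "meta-model" could be as simple as a router that maps each request to an internal model and a thinking budget. The sketch below is purely hypothetical: the model names, tiers, and complexity heuristic are invented for illustration and do not reflect OpenAI's actual design.

```python
# Hypothetical sketch of a tier-aware "meta-model" router, as described above.
# Model names, tiers, and the complexity heuristic are all assumptions for
# illustration; nothing here reflects OpenAI's actual implementation.
from dataclasses import dataclass

TIER_BUDGET = {"free": 1, "plus": 2, "pro": 3}  # assumed relative thinking budgets

@dataclass
class Route:
    model: str
    reasoning_effort: str

def route_request(prompt: str, tier: str) -> Route:
    budget = TIER_BUDGET[tier]
    # Crude complexity heuristic: long prompts or reasoning keywords get more "thinking".
    complex_task = len(prompt) > 500 or any(
        w in prompt.lower() for w in ("prove", "derive", "debug")
    )
    if complex_task and budget >= 2:
        return Route(model="reasoning-large",
                     reasoning_effort="high" if budget == 3 else "medium")
    if complex_task:
        return Route(model="reasoning-small", reasoning_effort="low")
    return Route(model="chat-base", reasoning_effort="none")

print(route_request("Prove that sqrt(2) is irrational.", "pro"))
```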
Research Insights
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment. Contrastive learning (e.g. CLIP) showed a way to train in a multi-modal way; e.g. to align images and text into the same latent space. A more generalized version of this, which can find concept alignment across different deep neural networks, could be quite interesting and powerful. For instance, maybe a future version of this method could enable links between a non-textual foundation model (trained on unlabelled science data) with an LLM (which has internal concepts that capture the same ideas).
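For context, the bullet above references CLIP-style contrastive alignment; a minimal version of that idea is sketched below. Note that the USAE paper itself trains a shared (universal) sparse autoencoder across models rather than a contrastive loss; this snippet only illustrates the shared-latent-space concept, with arbitrary encoder dimensions.

```python
# Minimal CLIP-style contrastive alignment sketch (illustrative only; the USAE paper
# trains a shared sparse autoencoder across models rather than this loss).
# Encoder/projection dimensions are arbitrary assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE: matched pairs (row i of each batch) should align."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.T / temperature                  # cosine similarities
    targets = torch.arange(a.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Two "modalities" (e.g. image and text features) projected into a shared space.
batch, d_img, d_txt, d_shared = 8, 512, 384, 256
proj_img = torch.nn.Linear(d_img, d_shared)
proj_txt = torch.nn.Linear(d_txt, d_shared)
loss = contrastive_loss(proj_img(torch.randn(batch, d_img)),
                        proj_txt(torch.randn(batch, d_txt)))
print(loss.item())
```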
- Looped Transformers are Better at Learning Learning Algorithms. Transformers are excellent general-purpose function approximators; however, they are typically used in a single pass, without iteration. This paper shows an architecture where transformers are looped, allowing them to better reproduce the behavior of iterative algorithms.
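A stripped-down version of the looping idea, assuming weight-tied iterations with the input re-injected at every step (the paper's actual architecture and training recipe differ in detail):

```python
# Simplified sketch of a "looped" transformer: the same block is applied repeatedly,
# with the original input re-injected at each iteration. This captures only the basic
# idea; hyperparameters here are arbitrary.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_loops=6):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        h = torch.zeros_like(x)
        for _ in range(self.n_loops):          # weight-tied iterations
            h = self.block(h + x)              # re-inject the input every loop
        return h

tokens = torch.randn(2, 16, 128)               # (batch, seq, d_model)
print(LoopedTransformer()(tokens).shape)       # torch.Size([2, 16, 128])
```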
- A new approach for a reasoning model: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (code, model). Their current model (only 3.5B parameters) doesn't exceed state-of-the-art reasoning models, but the approach shows promise and merits testing at larger scale.
- Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models.
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates.
- Jeff Clune et al. release: Automated Capability Discovery via Foundation Model Self-Exploration (preprint, code). Models explore their own abilities, identifying capabilities and weaknesses.
- Dan Hendrycks et al. release: Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs (paper, github). There are many interesting results. One is that stronger models (as measured by benchmark scores) exhibit progressively more coherent values, and their values become more entrenched and harder to change. From a safety perspective, one can interpret this in different ways. It seems dangerous that stronger/smarter models are more firm in their beliefs (less corrigible to human desires); but conversely a safe model should be consistent and unerring in its application of trained-in values. The overall notion that consistent values may be an emergent aspect of scaling up LLMs seems important.
- Meta preprint: LLM Pretraining with Continuous Concepts. This adds to a growing body of work in which LLMs think in a latent space rather than in the output token stream. In this case, they modify the training task so that concepts are encoded in the continuous internal representation.
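One way to picture such a training modification is an auxiliary loss that asks hidden states to predict continuous "concept" vectors alongside the usual next-token objective. The sketch below is a hedged simplification: the concept targets are a random stand-in, and the paper's actual formulation (how concepts are derived and fed back into the model) differs in detail.

```python
# Hedged sketch: next-token prediction plus an auxiliary loss that asks a hidden state
# to predict a continuous "concept" vector. Concept targets are a stand-in here
# (e.g. they could come from a pretrained sparse autoencoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab, d_concept, batch, seq = 256, 1000, 64, 4, 32

hidden = torch.randn(batch, seq, d_model)                 # hidden states from some LM
lm_head = nn.Linear(d_model, vocab)
concept_head = nn.Linear(d_model, d_concept)

targets = torch.randint(0, vocab, (batch, seq))           # next-token targets
concept_targets = torch.randn(batch, seq, d_concept)      # stand-in concept vectors

loss_lm = F.cross_entropy(lm_head(hidden).reshape(-1, vocab), targets.reshape(-1))
loss_concept = F.mse_loss(concept_head(hidden), concept_targets)
loss = loss_lm + 0.1 * loss_concept                       # 0.1 weighting is an assumption
print(loss.item())
```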
LLM
- OpenAI announces that o1 and o3-mini now have file and image upload capabilities.
- Distillation Scaling Laws. Is it better to train a small model directly, or to train a larger model and distill it into a smaller one? The answer is complicated. Roughly: on a tight compute budget, directly training the small model may be better; but if the big model is effectively "free" (e.g. you want it for other purposes anyway), then distillation can of course be the more efficient choice.
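For reference, the distillation objective typically being analyzed is the standard KL loss between temperature-softened teacher and student distributions; a minimal sketch follows. The paper's actual contribution, a scaling law for when this is compute-efficient, is not reproduced here.

```python
# Minimal logit-distillation sketch: KL between temperature-softened teacher and
# student distributions. Batch/vocab sizes and temperature are arbitrary.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(8, 1000)   # (batch, vocab) logits from the small model
teacher = torch.randn(8, 1000)   # logits from the large model
print(distillation_loss(student, teacher).item())
```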
Safety & Security
- Auditing Prompt Caching in Language Model APIs. They use the response speed to detect whether a given input has been previously cached. This allows one to detect whether someone else has already input that prompt, which thereby leaks information between users. This has a similar flavor to other attacks based on timing or energy use; a system leaks information when it implements internal efficiencies. Leakage can be stopped, but only by giving up the efficiency/speed gains.
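A rough sketch of how such a timing probe could work, with a hypothetical streaming client and an assumed latency threshold (this is not the paper's exact methodology):

```python
# Hedged sketch of the timing probe described above: measure time-to-first-token for a
# candidate prompt and flag unusually fast responses as likely cache hits. The client
# call, backend, and 50% threshold below are hypothetical placeholders.
import time
import statistics

def time_to_first_token(send_prompt, prompt: str) -> float:
    """send_prompt is assumed to stream a response; we time the first chunk."""
    start = time.monotonic()
    for _chunk in send_prompt(prompt):           # hypothetical streaming call
        return time.monotonic() - start
    return float("inf")

def probe_cache(send_prompt, prompt: str, baseline_prompts, n: int = 5) -> bool:
    """Compare the candidate's latency against fresh (presumably uncached) prompts."""
    baseline = [time_to_first_token(send_prompt, p) for p in baseline_prompts[:n]]
    threshold = statistics.mean(baseline) * 0.5  # assumed cutoff: <50% of baseline latency
    return time_to_first_token(send_prompt, prompt) < threshold

# Example with a fake backend that responds faster for a "cached" prompt:
def fake_backend(prompt):
    time.sleep(0.01 if prompt == "cached prompt" else 0.05)
    yield "token"

print(probe_cache(fake_backend, "cached prompt", ["fresh A", "fresh B", "fresh C"]))
```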
Voice
- Kyutai releases Hibiki, a speech-to-speech realtime translation model (code).
- Zyphra releases an open-source text-to-speech (TTS) model with voice cloning: Zonos (hybrid, transformer).
Video
- ByteDance releases Goku, a flow-based video generation model. Goku: Flow Based Video Generative Foundation Models (examples).
Science
- Protein codes promote selective subcellular compartmentalization (preprint). They develop a protein language model that predicts where a protein will localize within the cell, based on specific sequence features.
- From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models (visualizer, code, huggingface).
- Google DeepMind shows yet more progress on math problems: Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.
Hardware
- Groq has secured $1.5B to expand AI inference infrastructure in Saudi Arabia.
Robots
- Foundation Robotics announces the Phantom robot (a rebrand of the Alex robot, following their acquisition of Boardwalk Robotics). The platform offers different upper- and lower-body designs that can be selected based on the use case. They appear to be testing with customers.
- Mentee Robotics shows their v3 design.
- New UK-based robotics startup: Humanoid; their first design is HMND 01.
- Training a robot to stand up efficiently and robustly: HoST: Learning Humanoid Standing-up Control Across Diverse Postures (video).