General
- Detailed introduction (200-page ebook): Foundations of Large Language Models.
- Inference Magazine is a new publication on AI progress, with many interesting articles.
- OpenAI has announced (with the White House) a partnership called The Stargate Project. A consortium will invest $500 billion ($100 billion immediately) to build AI infrastructure in the United States.
- Google agrees to a new $1 billion investment in Anthropic. This adds to Google’s existing $2B investment (through which it owns 10% of Anthropic), and expands a cloud contract. This appears to be in addition to Anthropic’s ongoing effort to raise another $2B (at a $60B valuation).
Research Insights
- The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation. They report a counter-intuitive result: intentionally over-fitting a trained LLM on a small set of samples yields improvements on long-generation tasks, rather than the low performance (e.g. repetition) one typically associates with over-fitting (a minimal sketch appears at the end of this list).
- Some say that this result is obvious, in that the optimization signal (loss, perplexity, etc.) is just a proxy for the actual desired performance (token accuracy).
- Do generative video models learn physical principles from watching videos? (project, code) They find that some aspects of physics are not learned, and that strong visual fidelity does not guarantee that the underlying physics have been captured.
- Google: Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments.
- Physics of Skill Learning. The authors try to provide intuition about the learning process, using a succession of heuristics with different levels of detail.
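A minimal sketch of the hyperfitting recipe from the first item above: keep fine-tuning an already-trained LLM on a small, fixed set of samples until the training loss is near zero. The model name, data, and hyperparameters below are illustrative assumptions rather than the authors' settings.

```python
# Hyperfitting sketch: deliberately over-fit a pretrained LLM on a tiny, fixed sample set.
# Model, data, and hyperparameters are illustrative assumptions, not the paper's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper uses larger open models
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name).train()

samples = [
    "A small, fixed set of ordinary text passages goes here ...",
    "Each sample is reused every epoch, far past the point of memorization.",
]
batch = tok(samples, return_tensors="pt", padding=True, truncation=True, max_length=256)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for epoch in range(50):  # keep going well after training loss collapses toward zero
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# The paper's claim: greedy decoding from the hyperfitted model now yields longer,
# less repetitive open-ended generations than the original checkpoint.
```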
LLM
- OpenAI has finished safety testing of o3-mini and is preparing to release it in the coming weeks. o3-mini is reportedly worse than o1-pro, but much faster.
- Deepwriter AI claims their system has written an entire 203-page book without human involvement. Generation involved 1,100 API calls to Gemini Flash-Exp 2.0 and took ~4 hours.
- The book: The SaaS Crucible: Strategic Warfare for Underdog SaaS Startups.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.
- They present two models: DeepSeek-R1-Zero and DeepSeek-R1; the former trained using reinforcement learning, the latter improving on this using additional data. They claim performance competitive with o1-mini or even o1.
- They also released 6 distilled models (based on Llama or Qwen).
- Available via Ollama (a usage sketch appears at the end of this section).
- Kimi releases a similar report on the power of RL for improving reasoning in LLMs: Kimi k1.5: Scaling Reinforcement Learning with LLMs.
- DeepLearning.ai have released a course on how to use Anthropic’s Computer Use mode.
- OpenAI announces Operator (launch video), a computer-use agent that can conduct tasks in a virtualized web browser instance.
- Anthropic adds “Citations”, a RAG implementation available through the API (see the sketch below).
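Regarding the DeepSeek-R1 distills above: assuming Ollama is installed and a model has been pulled (e.g. `ollama pull deepseek-r1`; the exact tag is an assumption, check the Ollama model library), a quick query through the official Python client might look like the following sketch.

```python
# Query a locally served DeepSeek-R1 model through the ollama Python client.
# Requires a running Ollama server and a pulled model; the tag "deepseek-r1" is an assumption.
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "How many prime numbers are there below 100?"}],
)
# R1-style models emit their chain of thought before the final answer.
print(response["message"]["content"])
```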
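For the Citations item, here is a rough sketch of how the feature is exposed through the Messages API, based on my reading of Anthropic's documentation (treat the exact field names as assumptions and verify against the current API reference): you attach a document content block with citations enabled, and the response text blocks then carry citation metadata pointing back into that document.

```python
# Sketch of Anthropic's Citations feature via the Messages API.
# Field names follow the docs as I understand them; verify against the current API reference.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The grass is green. The sky is blue.",
                },
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "What color is the grass?"},
        ],
    }],
)
for block in message.content:
    # Text blocks may carry a `citations` list locating the supporting passage in the document.
    print(block.text, getattr(block, "citations", None))
```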
Safety
- OpenAI: Trading Inference-Time Compute for Adversarial Robustness (full paper). The results suggest that inference-time compute can be used to improve safety (guardrails, alignment, etc.). This makes sense, given that inference-time compute increases capabilities, and alignment can be viewed as a particular kind of capability (producing the desired response).
Image Synthesis
- Runway ML releases access to Frames, an image model.
- Google DeepMind reports: Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps (preprint). The take-home message is that inference-time scaling improves image synthesis in a reliable way, similar to how it improves text generation (e.g. reasoning). They apply a search process to find noise that yields a better generation (a minimal sketch follows).
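The simplest instance of this idea is a best-of-N search over the initial noise: draw several seeds, generate an image from each, score the candidates with a verifier, and keep the best. Below is a minimal sketch using the diffusers library with CLIP similarity as a stand-in verifier; the paper explores stronger verifiers and more elaborate search algorithms, so treat this as an illustration of the concept rather than their method.

```python
# Best-of-N search over initial noise: a minimal stand-in for inference-time scaling
# of diffusion models. CLIP similarity is used as a simple verifier here (an assumption;
# the paper studies several verifiers and search strategies).
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to(device)  # any text-to-image diffusion pipeline works
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(prompt: str, image) -> float:
    """Image-text similarity; higher means the image matches the prompt better."""
    inputs = clip_proc(text=[prompt], images=image, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        return clip(**inputs).logits_per_image.item()

prompt = "a photo of an astronaut riding a horse on the moon"
candidates = []
for seed in range(8):  # more seeds = more inference compute = a better expected best-of-N
    generator = torch.Generator(device).manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    candidates.append((clip_score(prompt, image), seed, image))

best_score, best_seed, best_image = max(candidates, key=lambda c: c[0])
best_image.save("best_of_n.png")
```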
Video
- Example of using Hunyuan vid2vid to replace an actor in a scene.
- Netflix releases: Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. A video model that allows controllable animations.
- Hailuo “Subject Reference” is enabling consistent characters in video generations (examples).
- Video Depth Anything: Consistent Depth Estimation for Super-Long Videos.
Audio
- Bland AI (now bland.com) is running a publicity stunt where you can call their AI on your phone, and after 10-60 seconds of talking, it will clone your voice and start talking to you in your own voice. Intentionally unnerving, and a good reminder that we must now be skeptical of suspicious phone calls (even if they sound like loved ones), and that banks should stop using voice-print as a security factor.
Science
- Published: Simulating 500 million years of evolution with a language model. (This was previously released as a preprint.) The ESM3 foundation model is trained on sequence, structure, and function of proteins. You can (e.g.) input a desired function and it will generate a candidate protein.
- OpenAI has created an AI model for longevity science. More specifically, GPT-4b micro was trained to predict variants of protein factors with increased/controlled function. Since this model is not yet broadly available, we can’t estimate the utility. But it reinforces the notion that there is still plenty of opportunity space for tuned/task-specific advances wherever we have data and compute.
Robots
- A video of a nimble wheeled quadruped built by DEEP Robotics.