General
- Anthropic CEO Dario Amodei has published an opinion piece about the future of AI: Machines of Loving Grace: How AI Could Transform the World for the Better. While acknowledging the real risks, the piece focuses on how AI could bring about significant benefits for humankind.
- Max Tegmark uses this as an opportunity to rebut the underlying thesis that “rapidly developing strong AI is a net good”: The AGI Entente Delusion. He views a competitive race to AGI as a suicide race, since efforts to align AI are lagging behind our ability to improve capabilities. He proposes a focus on Tool AI (instead of generalized AI), so that we can reap some of the benefits of advanced AI with fewer of the alignment/control problems. This view favors government regulation proportionate to capability/risk; so, in principle, if companies could demonstrate sufficiently controllable AGI, it could meet safety standards and be deployed/sold.
- (Nuclear) Energy for AI:
- The US Department of Energy is committing $900M to build and deploy next-generation nuclear technology (including small reactors).
- Google announced it will work with Kairos Power to use small nuclear reactors to power future data centers.
- Amazon is investing $500M in small modular reactors, to expand genAI.
- A group (Crusoe, Blue Owl Capital, and Primary Digital Infrastructure) announced a $3.4B joint venture to build a 200 MW datacenter (~100k B200 GPUs) in Texas. Initial customers will be Oracle and OpenAI.
- The growing commitments to build out power for datacenters make it increasingly plausible that AI training runs will reach 10²⁹ FLOP by 2030 (10,000× today’s training runs).
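For scale, the implied growth is easy to check with quick arithmetic (assuming, for illustration, that today’s frontier training runs are near 10²⁵ FLOP):

```python
# Rough scaling check: how far is ~1e25 FLOP (assumed scale of today's
# frontier training runs) from a projected 1e29 FLOP run in 2030?
import math

today = 1e25   # assumed current frontier training compute, in FLOP
target = 1e29  # projected 2030 training compute, in FLOP

ratio = target / today
doublings = math.log2(ratio)

print(f"ratio: {ratio:.0e}")                 # 10,000x
print(f"doublings needed: {doublings:.1f}")  # ~13.3
print(f"implied growth 2024-2030: {ratio ** (1 / 6):.1f}x per year")
```

Sustaining roughly 4–5× compute growth per year is what makes the power build-out the binding constraint.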
- Here is an interesting comment by gwern on LessWrong (via this), explaining why it is so hard to find applications for AI, and why the gains have been so small (relative to the potential):
If you’re struggling to find tasks for “artificial intelligence too cheap to meter,” perhaps the real issue is identifying tasks for intelligence in general. …significant reorganization of your life and workflows may be necessary before any form of intelligence becomes beneficial.
…organizations are often structured to resist improvements. …
… We have few “AI-shaped holes” of significant value because we’ve designed systems to mitigate the absence of AI. If there were organizations with natural LLM-shaped gaps that AI could fill to massively boost output, they would have been replaced long ago by ones adapted to human capabilities, since humans were the only option available.
If this concept is still unclear, try an experiment: act as your own remote worker. Send yourself emails with tasks, and respond as if you have amnesia, avoiding actions a remote worker couldn’t perform, like directly editing files on your computer. … If you discover that you can’t effectively utilize a hired human intelligence, this sheds light on your difficulties with AI. Conversely, if you do find valuable tasks, you now have a clear set of projects to explore with AI services.
Research Insights
- There is growing evidence of LLMs building meaningful world models, and exhibiting reasoning capabilities that scale with inference-time compute. However, some are still concerned that LLMs are merely stochastic parrots, doing advanced pattern matching that lacks true generalization. A new paper from Apple analyzes reasoning: GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Their results suggest LLM reasoning is quite sensitive to setup, and thus fragile.
- Is In-Context Learning Sufficient for Instruction Following in LLMs? (code). They study scaling of inference-time compute (in-context learning) vs. post-training (fine-tuning the model). In the small-sample regime, the scaling laws are similar.
- Paper from Google DeepMind: Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models. Resolves conflicts between internal LLM knowledge and retrieved information. Among other things, this prevents deterioration of base LLM responses. Overall RAG performance also increases, as the system iteratively refines the retrieval.
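The core idea, consolidating the model’s internal answer with retrieved passages and resolving disagreements by mutual support, can be illustrated with a toy sketch. This is not Astute RAG’s actual algorithm, just the consolidate-then-vote intuition; all names here are hypothetical:

```python
# Toy illustration of knowledge-conflict resolution between an LLM's
# internal answer and retrieved passages (NOT Astute RAG's actual
# algorithm): prefer the candidate answer with the most independent
# support, counting the model's own answer as one additional source.
from collections import Counter

def consolidate(internal_answer: str, retrieved_answers: list[str]) -> str:
    """Pick the answer backed by the most sources; internal knowledge
    counts as one vote, so a single noisy retrieval cannot override it."""
    votes = Counter(retrieved_answers)
    votes[internal_answer] += 1
    answer, _ = votes.most_common(1)[0]
    return answer

# Internal knowledge says "Paris"; two of three retrieved passages agree,
# one noisy retrieval says "Lyon" -> conflict resolved toward "Paris".
print(consolidate("Paris", ["Paris", "Lyon", "Paris"]))  # Paris
```

The paper’s actual method is iterative and uses the LLM itself to judge consistency, but the payoff is the same: bad retrievals no longer drag answers below the no-RAG baseline.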
- The Walnut Plan group is attempting to generate an open-source analog to some of the iterative reasoning of OpenAI’s o1 model: O1 Replication Journey: A Strategic Progress Report – Part 1 (code). They describe “journey learning”, wherein search explores trial-and-error, correction, backtracking, and reflection. The idea is that the model should learn to reproduce not just the shortcut to the right answer, but the reasoning process that led to it.
- Efficient Dictionary Learning with Switch Sparse Autoencoders (code). Switch SAE is a method to more efficiently train the interpretable SAE representation.
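A minimal sketch of the Switch SAE idea (illustrative only, not the paper’s code; dimensions and routing are assumptions): a router assigns each activation vector to one of several small “expert” SAEs, so only one expert’s weights are touched per input, instead of one monolithic wide SAE.

```python
# Minimal Switch SAE sketch: hard-route each activation to one expert
# sparse autoencoder, apply TopK sparsity to its latents, then decode.
# Weights are random here; in practice everything is trained end-to-end.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, n_experts, k = 16, 64, 4, 4  # k = active latents (TopK)

W_router = rng.normal(size=(d_model, n_experts))
W_enc = rng.normal(size=(n_experts, d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_experts, d_sae, d_model)) / np.sqrt(d_sae)

def switch_sae_forward(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Route x to a single expert SAE; sparsify its latents; decode."""
    expert = int(np.argmax(x @ W_router))    # hard routing: pick one expert
    z = np.maximum(x @ W_enc[expert], 0.0)   # ReLU latents
    thresh = np.sort(z)[-k]                  # keep only the k largest
    z = np.where(z >= thresh, z, 0.0)
    return z @ W_dec[expert], expert

x = rng.normal(size=d_model)
x_hat, expert = switch_sae_forward(x)
print(expert, x_hat.shape)  # chosen expert id, reconstruction shape (16,)
```

The efficiency win is the same as in Switch Transformers: per-input compute scales with one expert’s width rather than the full dictionary size.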
- Thinking LLMs: General Instruction Following with Thought Generation. Tunes a trained LLM to exhibit internal thinking during answer generation (similar to the presumed OpenAI o1 approach).
- Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects. By altering the tokenization to explicitly track 2D positions in images, the visual transformer achieves better representation. In particular, this may be an important ingredient for improved capability on visual reasoning.
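The tokenization idea can be sketched simply (an illustration of the concept, not the paper’s implementation): instead of flattening a grid into a 1D token stream, each cell token carries an explicit (row, col) position, with end-of-row markers, so the transformer need not infer 2D structure from 1D order.

```python
# Illustrative 2D-aware tokenization for ARC-style color grids: every
# cell token carries explicit (row, col) coordinates, and each row ends
# with an <eol> marker, making the grid geometry explicit to the model.

def tokenize_grid(grid: list[list[int]]) -> list[tuple]:
    """Turn a color grid into (token, row, col) triples with <eol> markers."""
    tokens = []
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            tokens.append((f"color_{color}", r, c))
        tokens.append(("<eol>", r, len(row)))  # explicit end-of-row marker
    return tokens

grid = [[0, 1],
        [2, 0]]
for tok in tokenize_grid(grid):
    print(tok)
# ('color_0', 0, 0), ('color_1', 0, 1), ('<eol>', 0, 2), ...
```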
- LLMs are Turing complete: Autoregressive Large Language Models are Computationally Universal.
Safety
- Anthropic posted some updates to their Responsible Scaling Policy.
- OpenAI posted: Evaluating fairness in ChatGPT (paper: First-Person Fairness in Chatbots). They find (for instance) that the user’s name influences outputs, following stereotypes learned from the training data.
LLM
- Mistral released Ministral-8B-Instruct-2410.
- Nvidia released Llama-3.1-Nemotron-70B-Reward. Early reports show it beating GPT-4o and Claude 3.5 Sonnet, only losing to o1. You can try it here.
- OpenAI released an early version of a Windows desktop app.
AI Agents
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (project, code).
- OpenAI releases Swarm, a simple framework for making and orchestrating multi-agent systems (cookbook with examples).
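Swarm’s central abstraction is lightweight: agents with instructions and tools, where a tool call can hand off control to another agent. Here is a dependency-free sketch of that handoff pattern (this is not Swarm’s actual API; the names are hypothetical):

```python
# Dependency-free sketch of the agent-handoff pattern that frameworks
# like Swarm provide (NOT Swarm's actual API). An "agent" wraps a policy
# function; returning another Agent transfers control to it.
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Agent:
    name: str
    policy: Callable[[str], Union[str, "Agent"]]  # reply, or hand off

def run(agent: Agent, message: str, max_turns: int = 5) -> str:
    """Route a message through agents until one returns a final reply."""
    for _ in range(max_turns):
        result = agent.policy(message)
        if isinstance(result, Agent):
            agent = result          # handoff: next agent takes over
        else:
            return f"[{agent.name}] {result}"
    raise RuntimeError("no agent produced a final answer")

refunds = Agent("refunds", lambda m: "Refund issued.")
triage = Agent("triage",
               lambda m: refunds if "refund" in m else "How can I help?")

print(run(triage, "I want a refund"))  # [refunds] Refund issued.
```

In the real library, the LLM decides when to hand off by calling a transfer function exposed as a tool; the orchestration loop stays this simple.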
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence. Collaborative search of different “experts” leads to improved outputs. Uses evolution in weight-space to drive behavior.
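The weight-space search can be illustrated with a toy particle-swarm sketch (not the paper’s code; the quadratic utility is a stand-in for performance on a target task):

```python
# Toy sketch of the Model Swarms idea: treat each expert's weights as a
# particle and run particle-swarm search in weight space, scoring each
# candidate with a utility function (here a stand-in quadratic; in the
# paper, accuracy on a target task).
import numpy as np

rng = np.random.default_rng(0)
dim, n_particles, steps = 8, 6, 50
target = rng.normal(size=dim)              # pretend-optimal weights

def utility(w: np.ndarray) -> float:
    return -float(np.sum((w - target) ** 2))  # higher is better

pos = rng.normal(size=(n_particles, dim))  # initial "experts"
vel = np.zeros_like(pos)
pbest = pos.copy()                         # per-particle best positions
gbest = max(pbest, key=utility).copy()     # global best position
init_util = utility(gbest)

for _ in range(steps):
    r1, r2 = rng.random((2, n_particles, 1))
    # standard PSO velocity update: inertia + pull toward personal/global best
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    for i in range(n_particles):
        if utility(pos[i]) > utility(pbest[i]):
            pbest[i] = pos[i]
    gbest = max(pbest, key=utility).copy()

print(f"utility: {init_util:.3f} -> {utility(gbest):.3f}")  # monotone improvement
```

In the paper, each particle is a full set of (LoRA) expert weights and utility is task performance, so experts adapt collaboratively without gradient updates.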
Audio
- F5-TTS is a featureful CC-BY licensed text-to-speech system (demo). See also E2/F5-TTS (examples).
- Suno adds image or video prompting; e.g. creating songs based on your photos.
- Adobe has reportedly presented Presto, a fast text-to-music system.
Image Synthesis
- Adobe presented Project Perfect Blend, which adds tools to Photoshop for “harmonizing” assets into a single composite. E.g. it can relight subjects and environments to match.
Vision
Video
- Luma AI released v1.1.0 of their DreamMachine API (which now includes callbacks).
- TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation (preprint, hf). Realistic avatars with body motion, good emotional affect, etc.
- Adobe reveals Generative Extend, which can generate a continuation to existing footage.
- Pika added some new effects: Crumble, Dissolve, Deflate and Ta-Da.
- Current quality of video generations:
- Building the Pyramids.
- People showing realistic emotion (using Hailuo AI); relevant to: Can we Distinguish Human from AI?
- Keyframes and Luma AI to make novel speed-ramp motion.
World Synthesis
- There is ongoing work to use AI to simulate video games. This is partly a path toward neural video games, but also a means to test AI’s ability to intuit proper world models. Previously we saw how a diffusion model could simulate Doom, or a video model could reproduce Super Mario Bros. Now, a part of Counter-Strike has been reproduced in a neural network, as a playable system (code).
- Text-to-motion: DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control.
Science
- BrainLM: A foundation model for brain activity recordings. This 111M-parameter model, trained on 6,700 hours of fMRI data, can predict brain networks.
Cars
- At Tesla’s “We, Robot” event, they showed the design for their future autonomous vehicles: Cybercab and Robovan. The designs are futuristic.
Robots
- At Tesla’s “We, Robot” event, they demoed some Optimus robots walking among the crowd, talking to people, serving drinks, handing out goodies. Unfortunately it was not made clear whether these were autonomous or teleoperated. Tesla eventually admitted the robots were “human assisted”; likely this means voice and complex arm-tasks were teleoperated, while standing/walking was autonomous.
- Robot Era shared a video of their STAR1 humanoid running in the Gobi Desert, at ~4 m/s or ~8 mph (previously: walking on Great Wall of China).
- LimX Dynamics is now taking preorders for the bipedal (+wheeled) TRON1 ($15k). Lacking arms, it seems best suited to monitoring tasks.