General
- Elon Musk’s xAI is raising up to $6 billion to purchase 100,000 Nvidia chips for its Memphis data center. This is in addition to its existing 100,000-GPU H100 cluster (~100 exaflops FP16). If the new chips are B100 GPUs, that would bring total compute to ~274 exaflops.
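- As a sanity check on those compute figures, here is a back-of-the-envelope calculation. The per-GPU FP16 throughput values are approximate dense-compute specs, and the B100 figure in particular is an assumption chosen to be consistent with the ~274 exaflop total.

```python
# Back-of-the-envelope check on the cluster compute figures.
# Per-GPU FP16 throughput (dense, in teraflops) is approximate;
# the B100 number is an assumption consistent with the ~274 EF total.
H100_TFLOPS_FP16 = 990      # ~0.99 petaflops per H100
B100_TFLOPS_FP16 = 1750     # assumed ~1.75 petaflops per B100

def cluster_exaflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Total FP16 throughput of a homogeneous cluster, in exaflops."""
    return num_gpus * tflops_per_gpu / 1e6  # 1 exaflop = 1e6 teraflops

existing = cluster_exaflops(100_000, H100_TFLOPS_FP16)   # ~99 EF
expansion = cluster_exaflops(100_000, B100_TFLOPS_FP16)  # ~175 EF
print(f"existing H100 cluster: ~{existing:.0f} exaflops")
print(f"with 100k B100s added: ~{existing + expansion:.0f} exaflops")
```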
- A US government commission released a report that, among other things, calls for a Manhattan Project-style AI initiative. (Cf. Leopold Aschenbrenner’s Situational Awareness.)
- Max Tegmark offers a rebuttal to this report: AGI Manhattan Project Proposal is Scientific Fraud. He contends that the report’s authors misrepresent the scientific consensus by implying that AGI will be easily controlled.
Research Insights
- The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use.
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step (code).
LLM
- New study: AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably. At least part of the effect may come from non-experts judging the simpler, more conventional AI poems as more understandable and better crafted (and thus human-written), while perceiving the complexity and inconsistency of human-written poetry as incoherence.
- Nevertheless, this again shows that for short-form generation, AI has reached human level and can be considered superhuman in certain narrow respects.
- Mistral releases a new large model (Mistral-Large-Instruct-2411, 123B) and the Pixtral Large multimodal model (weights).
- DeepSeek announces DeepSeek-R1-Lite-Preview, a “reasoning” model (inference-time chain-of-thought) that appears similar to OpenAI’s o1. Like o1, it achieves impressive results on math and science benchmarks, and some of its CoT reasoning traces are quite interesting (e.g.). The weights are not yet available, but DeepSeek says the model will be released as open source. (A minimal sketch of inference-time CoT appears after the next item.)
- Also interesting to consider the rate of progress: a couple of years ago, the prediction was that we might reach 46% on the MATH benchmark by 2025. Instead, we now have a general-purpose LLM scoring 92%, and o1 has scored 97% on a challenging math exam (with novel questions that appear nowhere in the training data).
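- To make “inference-time chain-of-thought” concrete, here is a minimal, hypothetical sketch: the model is asked to emit its reasoning before a final answer, and the answer is extracted afterward. The model name and endpoint are placeholders (R1-Lite-Preview is currently chat-only, not an API); any OpenAI-compatible chat endpoint would work the same way.

```python
# Minimal sketch of inference-time chain-of-thought (not DeepSeek's actual API).
# Model name and base_url are placeholders for any OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

SYSTEM = (
    "Solve the problem. First write your step-by-step reasoning inside "
    "<think>...</think> tags, then give only the final answer on the last line."
)

response = client.chat.completions.create(
    model="reasoning-model-placeholder",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "What is the sum of the first 50 odd numbers?"},
    ],
)

text = response.choices[0].message.content
reasoning, _, answer = text.rpartition("</think>")
print("reasoning trace:", reasoning)
print("final answer:", answer.strip())  # expected: 2500
```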
AI Agents
- Stripe adds mechanisms for AI agents to trigger payments.
- Generative Agent Simulations of 1,000 People (code). The authors interview humans and use those interviews to define the set of AI agents.
- Builds on their prior work: 2023-10: Generative Agents: Interactive Simulacra of Human Behavior.
- AWS releases a multi-agent orchestrator framework.
- Paper: Agent-as-a-Judge: Evaluate Agents with Agents. Argues for using evaluation agents in workflows.
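- Not the paper’s actual framework, but a minimal sketch of the underlying pattern: one model call produces a candidate output, and a second “judge” call scores it against a rubric. Assumes an OpenAI-compatible client; model names are placeholders.

```python
# Minimal sketch of the agent-as-judge pattern (not the paper's framework).
# Assumes an OpenAI-compatible client; model names are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def run_agent(task: str) -> str:
    """Worker agent: produce a candidate solution for the task."""
    resp = client.chat.completions.create(
        model="worker-model",
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def judge(task: str, candidate: str) -> dict:
    """Judge agent: score the candidate against a simple rubric, returning JSON."""
    rubric = (
        "Evaluate the candidate solution for correctness and completeness. "
        'Reply with JSON only: {"score": 0-10, "feedback": "..."}'
    )
    resp = client.chat.completions.create(
        model="judge-model",
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": f"Task:\n{task}\n\nCandidate:\n{candidate}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

task = "Write a function that checks whether a string is a palindrome."
candidate = run_agent(task)
verdict = judge(task, candidate)
print(verdict["score"], verdict["feedback"])
```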
- Automated-AI-Web-Researcher-Ollama. Code for using local LLMs to automate online research.
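- As an illustration of the general approach (not the repo’s actual code), here is a minimal sketch of driving a local model through Ollama’s Python client to draft search queries for a research topic; the model name is whatever you have pulled locally.

```python
# Minimal sketch of using a local LLM via Ollama for research-style queries.
# Not the repo's code; assumes `ollama pull llama3` has been run locally.
import ollama

topic = "recent progress in genomic foundation models"
response = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"List five focused web-search queries for researching: {topic}",
    }],
)
print(response["message"]["content"])
```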
- Someone is trying to use a team of AI agents to write a full book autonomously. Different agents are responsible for different characters or for different aspects of the writing (consistency, fact-checking, etc.).
Image Synthesis
- A recent survey of 11,000 people has concluded: How Did You Do On The AI Art Turing Test? The median score at distinguishing AI from human art was 60%, only a bit above chance, and AI art was often preferred by respondents. Overall, AI art has already crossed a Turing-Test threshold.
Audio
- Suno releases their v4 music generator.
- ElevenLabs now offers ability to build conversational AI agents.
Video
- Pickle AI is offering a virtual avatar for your meetings ($30/month). You still attend the meeting and speak when you want, but your avatar appears on camera, pretending to pay attention and lip-syncing your speech; it is an alternative to simply turning your camera off.
- Runway releases some small updates, including longer (20s) video-to-video, vertical aspect ratio for Act-One, and more camera controls.
- Current quality of video generations:
- Coca-Cola holiday ad (c.f. McDonald’s commercial, Aug 2024), and parody thereof.
- A Dream Within A Dream (by PZF, selected for the Czech International AI Film Festival).
- Making Friends (by Everett World; see also Childhood Dream and City Echoes).
- Anime: test shots, Ultimate Ceremony, Echoes of Love.
- Echoes of Grace (KakuDrop using Sora).
Science
- Sequence modeling and design from molecular to genome scale with Evo. A 7B genomic multimodal foundation model trained on 2.7 million genomes. It can interpret DNA, RNA, and protein sequences, and make predictions across molecular, systems, and genomic scales; it can be used to predict the effects of mutations, design CRISPR systems, etc.
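- A hedged sketch of one such use case: scoring a point mutation by comparing model log-likelihoods of the wild-type and mutant sequences. The model ID and loading details below are assumptions (check the Evo release for the supported interface); the log-likelihood-ratio idea itself is the standard zero-shot approach to variant-effect prediction.

```python
# Hedged sketch: score a point mutation via a log-likelihood ratio under a
# genomic language model. Model ID and loading details are assumptions;
# consult the Evo release for the actual supported interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "togethercomputer/evo-1-8k-base"  # illustrative; verify against the release
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

def sequence_log_likelihood(seq: str) -> float:
    """Sum of token log-probabilities under the autoregressive genomic LM."""
    ids = tokenizer(seq, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # loss is the mean negative log-likelihood per predicted token; undo the mean
    return -out.loss.item() * (ids.shape[1] - 1)

wild_type = "ATGGCGTTAGCCTGA"
mutant    = "ATGGCGTTGGCCTGA"  # single-nucleotide substitution (A -> G)

delta = sequence_log_likelihood(mutant) - sequence_log_likelihood(wild_type)
print(f"log-likelihood ratio (mutant - wild type): {delta:.3f}")
# More negative values suggest the mutation is more disruptive.
```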
Hardware
- Google has a history of using deep reinforcement learning for automated chip design, work that has been met with some skepticism. Google has now published a rebuttal, claiming that the era of AI chip design is well upon us: That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design. (A toy sketch of the wirelength objective such placers optimize follows the list below.)
- April 2020 blog post: Chip Design with Deep Reinforcement Learning.
- June 2021 paper: A graph placement methodology for fast chip design.
- Sept 2023 blog post: How AlphaChip transformed computer chip design.
- August 2024 preprint: ShortCircuit: AlphaZero-Driven Circuit Design (code).
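- For context on what these placement methods optimize, here is a toy half-perimeter wirelength (HPWL) calculation, the standard proxy objective used (alongside congestion and density terms) in RL-based placement; the netlist and coordinates below are made up for illustration.

```python
# Toy half-perimeter wirelength (HPWL), the standard proxy objective that
# RL-based placers optimize (alongside congestion/density terms).
# The netlist and coordinates are made up for illustration.

# Placement: component name -> (x, y) location on the canvas
placement = {"cpu": (0, 0), "cache": (3, 1), "io": (1, 4), "dram_ctrl": (4, 4)}

# Nets: each net is the set of components it connects
nets = [
    {"cpu", "cache"},
    {"cpu", "io", "dram_ctrl"},
    {"cache", "dram_ctrl"},
]

def hpwl(placement: dict, nets: list) -> float:
    """Sum over nets of the half-perimeter of each net's bounding box."""
    total = 0.0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

print("total HPWL:", hpwl(placement, nets))  # lower is better
```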