General
- The University of Michigan offers a very AI-forward set of tools to its community.
- Time magazine has an article (rare for mainstream media) that talks about AGI as a real possibility: Silicon Valley Takes Artificial General Intelligence Seriously—Washington Must Too.
- Ethan Mollick posted: Thinking Like an AI. Although it doesn’t contain anything revelatory for those already deeply familiar with modern AI, it is a useful introduction for those who want a heuristic understanding of what LLMs can do.
- Miles Brundage has left OpenAI. He just published a personal blog post: Why I’m Leaving OpenAI and What I’m Doing Next.
  - He’s leaving so that he can publish more, and more openly, and work on general AI policy in the non-profit space.
  - He says: “In short, neither OpenAI nor any other frontier lab is ready, and the world is also not ready. To be clear, I don’t think this is a controversial statement among OpenAI’s leadership, and notably, that’s a different question from whether the company and the world are on track to be ready at the relevant time…”
  - “I think the upsides of AI are already big and could be dramatically bigger, as are the downsides.”
  - “I think it’s likely that in the coming years (not decades), AI could enable sufficient economic growth that an early retirement at a high standard of living is easily achievable (assuming appropriate policies to ensure fair distribution of that bounty).”
- Transluce launches as a non-profit AI research lab. To kick things off, they released some research:
- The US White House issues a memo: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence.
Research Insights
- Looking Inward: Language Models Can Learn About Themselves by Introspection. They test whether an LLM can predict its own responses to questions better than a second model trained only on the first model’s outputs (with no access to its inner state). A model predicting its own behavior better than such an outside observer can be interpreted as a weak sort of introspection.
- The previously mentioned entropix method (initiated by xjdr) is gaining momentum. The basic idea is that instead of just sampling from the top-k tokens at each step, one examines the entropy of the next-token distribution (and its varentropy, the variance of the per-token surprisal) to decide how to sample. (E.g. high uncertainty can be used to trigger deeper chain-of-thought consideration.) See here for more discussion. The latest is a flurry of posts suggesting that this method significantly improves evals for a variety of open-source models. Nothing is certain, since this is evolving rapidly (these volunteers have only been working on it for a couple of weeks), and the risk of honest self-deception is high. Still, the idea remains worth keeping an eye on.
- Automatically Interpreting Millions of Features in Large Language Models.
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling.
- A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration. LLMs can learn improved reasoning from “failure” chain-of-thought traces.
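To make the entropix idea concrete, here is a minimal sketch, assuming a plain Python list of next-token logits. It computes the entropy of the distribution and its varentropy (the variance of each token's surprisal, -log p), then maps the pair onto one of four sampling strategies. The strategy names and the 1.0 thresholds are hypothetical placeholders chosen for illustration; this is not the entropix implementation, only the entropy/varentropy branching it is built around.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_stats(logits):
    """Return (entropy, varentropy) of the next-token distribution, in nats.

    Entropy is the expected surprisal -log p; varentropy is the variance
    of that surprisal across tokens.
    """
    probs = [p for p in softmax(logits) if p > 0.0]
    surprisals = [-math.log(p) for p in probs]
    entropy = sum(p * s for p, s in zip(probs, surprisals))
    varentropy = sum(p * (s - entropy) ** 2 for p, s in zip(probs, surprisals))
    return entropy, varentropy

def choose_strategy(logits, h_thresh=1.0, v_thresh=1.0):
    """Pick a sampling strategy from entropy and varentropy.

    Thresholds and strategy names are hypothetical, for illustration only.
    """
    h, v = entropy_stats(logits)
    if h < h_thresh and v < v_thresh:
        return "argmax"              # confident: take the top token
    if h >= h_thresh and v < v_thresh:
        return "pause_and_think"     # uniformly unsure: e.g. trigger more chain-of-thought
    if h < h_thresh and v >= v_thresh:
        return "branch"              # one favorite plus unlikely outliers: explore alternatives
    return "resample_high_temp"      # unsure and inconsistent: sample at a higher temperature

for logits in ([10, 0, 0, 0], [0, 0, 0, 0], [4] + [0] * 10, [4] + [0] * 100):
    h, v = entropy_stats(logits)
    print(f"H={h:.3f} V={v:.3f} -> {choose_strategy(logits)}")
```

In this sketch each branch only returns a label; an actual sampler would act on it, for example by greedily decoding, injecting a reasoning prompt, or raising the sampling temperature.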
Safety/Policy
- Anthropic published a blog post: Sabotage evaluations for frontier models. They consider the different ways a model could try to interfere: steering human decisions, sabotaging code, sandbagging (deliberately underperforming), and undermining oversight.
- OpenAI adds a chief economist to their staff.
LLM
- The OpenAI Chat Completions API now supports audio input (letting you skip a separate transcription step).
- Google’s NotebookLM has captured much attention, partly for its useful “chat with my PDFs” feature but mostly for the cool “generate podcast” trick. You can now customize the podcast generation.
- MotherDuck have added a “prompt()” function to their SQL database, such that you can weave LLM calls into your SQL lookups.
- BlendSQL appears to be an open-source attempt to do something similar: combine LLM calls with SQL.
- Meta released Meta Spirit LM, an open-source multimodal language model that freely mixes text and speech.
- Anthropic announces a new Claude 3.5 Haiku model, as well as a new version of their excellent Claude 3.5 Sonnet model. The new Sonnet can “use a computer” (still experimental), a capability available via the API.
- Perplexity plans to release a reasoning mode, where it can agentically search and collate information.
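MotherDuck’s prompt() and BlendSQL both let a query call a language model row by row. The general pattern (a scalar SQL function backed by a model) can be sketched with Python’s standard-library sqlite3 module. This is neither MotherDuck’s nor BlendSQL’s API: fake_llm is a hypothetical placeholder for a real model call (here just a keyword rule), and the reviews table is invented for illustration.

```python
import sqlite3

def fake_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for a real LLM call: a keyword rule, not a model."""
    text = prompt_text.lower()
    if "broken" in text or "refund" in text:
        return "negative"
    return "positive"

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument scalar function named prompt.
conn.create_function("prompt", 1, fake_llm)

conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (id, body) VALUES (?, ?)",
    [(1, "Love it, works great"), (2, "Arrived broken, I want a refund")],
)

# The "LLM" is invoked for each row, inside an ordinary SELECT.
rows = conn.execute(
    "SELECT id, prompt('Classify sentiment: ' || body) AS sentiment "
    "FROM reviews ORDER BY id"
).fetchall()
print(rows)
```

With a real model behind the function, each row becomes a model request, so cost and latency grow with the number of rows the query touches.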
Tools
- Perplexity announced a new feature for combined search over user-provided files and the web (video).
- Perplexity Pro Search now has a reasoning mode, for handling more complex queries.
Audio
- ElevenLabs adds Voice Design, which generates a new voice from a text prompt describing what it should sound like.
Image Synthesis
- Adobe shows off the ability to do 3D rotations on 2D vector graphic assets.
- Stability AI releases Stable Diffusion 3.5.
- Ideogram Canvas is a UI for generating images from existing image assets, e.g. via region re-prompting (tutorial video).
- OpenAI released a result on greatly accelerating image generation: Simplifying, stabilizing, and scaling continuous-time consistency models (preprint).
- Midjourney released their image editor, allowing genAI transformation of uploaded images/photos (examples from Grimes: 1, 2) and “retexturing” (effectively a depth ControlNet).
Video
- Meta released MovieGenBench: benchmarks for video generation.
- Haiper AI release their 2.0 video model.
- Runway ML announces Act-One, which transfers a performance from a video onto a character (without motion capture).
- Genmo AI released an open-source video model, Mochi 1, that appears competitive (weights, try).
- Release of Open-Sora-Plan-v1.3.0 (example video).
- Current quality of video generation:
  - Meta Movie Gen examples.
  - Emotional range of MiniMax.
  - Car commercial: Bear.
  - Diner conversation.
  - Loved and Lost (a meditation on grief).
Science
- Meta has released a large dataset on inorganic materials: Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models (code, datasets, checkpoints, blogpost).
- An AI model for cancer diagnosis shows promise (96% accuracy).
- FutureHouse have used their PaperQA2 tool to generate “Wikipedia-style” articles for all 19,255 human genes.
Hardware
- The US TSMC fab is doing well: TSMC’s Arizona Chip Production Yields Surpass Taiwan’s in Win for US Push.
Robots
- A video of Fourier’s GR-2 robot standing up.
- A video of the Engine AI robot walking. As noted, its more upright (locked-knee) gait is more energy-efficient than the crouched (bent-knee) walking of many other designs.
- Clone Robotics continue to pursue their micro-hydraulic bio-mechanical approach to robotics; they now have a torso.