AI News 2024-08-29

Research Insights

  • LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (code). Long-form text generation is an area where LLMs underperform, though there have been prior efforts to scaffold LLMs into writing long-form text (Re3, English essays, journalism) or even full technical papers (science writing, Wikipedia, AI scientist). This latest preprint introduces a new benchmark and fine-tunes LLMs to extend the length over which their output stays coherent.
  • A promising approach to understanding foundation models is monosemanticity: the model’s internal representation is inscrutable, so instead one trains a sparse autoencoder (SAE) to project the internal representations into a higher-dimensional space. The higher-dimensional space allows concepts to be disentangled and isolated, while sparsity keeps the number of active concepts legible. In any case, it works (Anthropic, OpenAI), with human-meaningful categories naturally appearing in the SAE space (a minimal sketch of the idea follows this list).
    • Some researchers took this a step further: Showing SAE Latents Are Not Atomic Using Meta-SAEs. They essentially apply the SAE concept recursively, training a second, meta-SAE on top of the first. They show that concepts in the original SAE space can be decomposed into finer-grained concepts. More generally, this implies a viable approach to decomposing concepts in a hierarchical, tree-like manner (dashboard to explore concepts).
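To make the SAE idea concrete, here is a minimal sketch in PyTorch. The dimensions, the 16x expansion factor, and the simple L1 sparsity penalty are illustrative assumptions, not the exact setup used in the Anthropic/OpenAI or Meta-SAE work: activations from one model layer are encoded into a much wider, mostly-zero latent vector and decoded back, so each active latent can be read as a candidate concept direction.

```python
# Minimal sparse autoencoder (SAE) sketch for interpreting model activations.
# Assumes PyTorch; dimensions and the L1 coefficient are illustrative, not from the papers.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Encode into a *higher*-dimensional latent space, then decode back.
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps latents non-negative; the L1 penalty below pushes most of them
        # to zero, so each input is explained by a small set of "concept" directions.
        latents = torch.relu(self.encoder(x))
        recon = self.decoder(latents)
        return recon, latents

def sae_loss(x, recon, latents, l1_coeff=1e-3):
    # Reconstruction error plus a sparsity penalty on the latent activations.
    return torch.mean((recon - x) ** 2) + l1_coeff * latents.abs().mean()

# Example: a batch of residual-stream activations (d_model=768) expanded 16x.
sae = SparseAutoencoder(d_model=768, d_hidden=768 * 16)
acts = torch.randn(8, 768)
recon, latents = sae(acts)
loss = sae_loss(acts, recon, latents)
loss.backward()
```

The meta-SAE idea is essentially the same module applied again, this time to decompose the learned directions of the first SAE into finer-grained parts.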

LLM

  • Google:
    • Released three new Gemini models: updated Gemini 1.5 Pro and Gemini 1.5 Flash, and a very compact-but-capable Gemini 1.5 Flash 8B.
    • Google Advanced users can now make Gemini Gems (similar to custom GPTs).
  • Cursor AI is a VSCode-style IDE with LLM assistance built in (tab-completion, chat, and directed in-code diffs/rewrites). Although it has been around for a while, it has recently gained increased attention, including a recommendation from Andrej Karpathy (who has long advocated for English as the programming language of the future). Integrating the LLM into the IDE does indeed enhance its value, making it remarkably easy to generate and evaluate code.
    • Others note how combining it with voice input makes for a powerful interface.
    • Cursor has a blog post on how they accelerated LLM inference to make this kind of interface fast and smooth.

AI Agents

  • Motleycrew (code) is a multi-agent framework for enabling flexible interaction patterns.

Policy

Philosophy

Audio

Video

Vision

World Synthesis

  • Adding subsurface scattering to Gaussian Splatting (preprint). It’s amazing how quickly the various nuances of traditional vertex graphics are being added to the newer neural/Gaussian methods.
  • Google presents work on using diffusion models to simulate video games: Diffusion Models Are Real-Time Game Engines (example video, project page). They train a diffusion model to predict the next frame in the DOOM video game. Humans can barely tell the difference. Obviously it is computationally inefficient to simulate a game like DOOM in this way, but it points towards a future where video games (and simulated worlds more broadly) are real-time rendered using neural/diffusion methods.
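For intuition, below is a highly simplified sketch of the autoregressive rollout loop behind this kind of neural game engine. The model call is a placeholder that returns noise so the sketch runs; the real system conditions a diffusion model on recent frames and player actions, and the context length and function names here are assumptions for illustration, not GameNGen’s actual architecture.

```python
# Sketch of an autoregressive "neural game engine" rollout loop (illustrative only).
from collections import deque

import numpy as np

CONTEXT_FRAMES = 4  # assumed history length, not the paper's value

def predict_next_frame(frames, action):
    """Placeholder for a conditional diffusion model that denoises the next frame,
    conditioned on recent frames and the current player action."""
    # Return noise of the right shape just to keep the sketch runnable.
    return np.random.rand(*frames[-1].shape)

def play(initial_frame, actions):
    history = deque([initial_frame] * CONTEXT_FRAMES, maxlen=CONTEXT_FRAMES)
    rendered = []
    for action in actions:
        frame = predict_next_frame(list(history), action)  # one model sample per frame
        history.append(frame)  # the model's own output becomes future conditioning
        rendered.append(frame)
    return rendered

frames = play(np.zeros((240, 320, 3)), actions=["forward", "forward", "shoot"])
```

The key design point is that the model’s own outputs feed back in as conditioning, so keeping long rollouts stable (avoiding drift) is the central challenge for this kind of simulator.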

Hardware

Robots
