AI News 2025-01-16

General

Research Insights

Safety

LLM

AI Agents

Audio

Image Synthesis

Video

Science

Robots

  • The latest video of Unitree’s humanoid robot shows a more humanlike gait and navigation of more rugged terrain.

AI News 2025-01-09

General

Research Insights

  • PRIME: Process Reinforcement Through Implicit Rewards (data/models, code)
    • Builds on prior work: Free Process Rewards without Process Labels.
    • The basic idea: chain-of-thought (CoT) is a useful way to improve reasoning, but how do you train better CoT? You can give scores to good vs. bad chains, but then the model only gets whole-chain feedback. It would be better to know where the reasoning chain went wrong (or right). In PRIME, alongside training the LLM, they train a second LLM that acts as a per-token reward model. It learns which CoT steps look good vs. bad, and so can provide more fine-grained guidance (a minimal sketch follows this list).
  • Differential Transformer. Explanation: the traditional transformer architecture spreads attention broadly and can thus get distracted by noise (especially with large contexts). The differential architecture alters the attention equation so as to better amplify relevant context and suppress noise (sketched after this list). This should improve retrieval and reduce hallucinations, especially for long contexts.
  • Metadata Conditioning Accelerates Language Model Pre-training. Prepending training data with metadata (e.g. “from wikipedia.org”) for part of the training allows more control: training can be more data-efficient, and inference can be more steerable (by invoking a metadata field associated with the desired output style). A minimal sketch follows below.
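
To make the per-token reward idea concrete, here is a minimal sketch of an implicit process reward computed as a scaled log-probability ratio against a frozen reference model, assuming a Hugging Face-style causal LM. The model names in the usage comment and the beta value are placeholders; the actual PRIME training recipe involves more than this.

```python
# Hedged sketch: per-token implicit rewards as a scaled log-prob ratio between
# a trained reward/policy model and a frozen reference model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def per_token_logprobs(model, input_ids):
    """Log-probability the model assigns to each realized token (shifted by one)."""
    with torch.no_grad():
        logits = model(input_ids).logits[:, :-1, :]
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

def implicit_process_rewards(reward_model, ref_model, input_ids, beta=0.05):
    """r_t ~ beta * (log pi_reward(y_t | y_<t) - log pi_ref(y_t | y_<t))."""
    return beta * (per_token_logprobs(reward_model, input_ids)
                   - per_token_logprobs(ref_model, input_ids))

# Usage (model names are placeholders):
# tok = AutoTokenizer.from_pretrained("base-model")
# rm  = AutoModelForCausalLM.from_pretrained("implicit-prm")
# ref = AutoModelForCausalLM.from_pretrained("base-model")
# ids = tok("Q: ... Step 1: ... Step 2: ...", return_tensors="pt").input_ids
# rewards = implicit_process_rewards(rm, ref, ids)  # one score per CoT token
```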
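
And a simplified, single-head sketch of the differential attention mechanism. This omits the paper’s multi-head structure, causal masking, GroupNorm, and lambda re-parameterization, so treat it as an illustration of the core equation rather than a reference implementation.

```python
# Simplified single-head differential attention: two attention maps are
# computed and subtracted, cancelling common-mode "noise" attention.
import torch
import torch.nn as nn

class DiffAttention(nn.Module):
    def __init__(self, d_model, d_head, lambda_init=0.8):
        super().__init__()
        # Two sets of query/key projections; their attention maps are subtracted.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        self.lam = nn.Parameter(torch.tensor(lambda_init))
        self.scale = d_head ** -0.5

    def forward(self, x):                      # x: [batch, seq, d_model]
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        a1 = torch.softmax(q1 @ k1.transpose(-2, -1) * self.scale, dim=-1)
        a2 = torch.softmax(q2 @ k2.transpose(-2, -1) * self.scale, dim=-1)
        return (a1 - self.lam * a2) @ v
```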
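
Finally, a minimal sketch of the metadata-conditioning idea; the tag format and the 90% cutoff are illustrative assumptions, not the paper’s exact settings.

```python
# Hedged sketch of metadata conditioning during pre-training: prepend a source
# tag to each document for the first part of training, then drop it so the
# model also works without metadata.
def format_example(doc_text, source_url, step, total_steps, meta_frac=0.9):
    if step < meta_frac * total_steps:
        return f"from {source_url}\n\n{doc_text}"
    return doc_text

# At inference, prepending e.g. "from wikipedia.org" to the prompt steers
# generation toward the style associated with that source.
```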

LLM

AI Agents

Video

  • Fine-tuning of video models to a particular style is now starting. Examples of Hunyuan Video LoRAs.
  • Nvidia’s new GeForce RTX 5090 graphics card can use neural rendering for real-time ray-tracing (where only ~10% of pixels are computed using traditional ray-tracing, and a neural model is used to interpolate from that).

World Synthesis

  • Nvidia present Cosmos, a set of foundation models trained on 20 million hours of video. Intended to accelerate training (e.g. via synthetic data generation) of models for robotics, autonomous driving, industrial settings, etc.

Science

Brain

Hardware

  • Nvidia described their GB200 NVL72 rack-sized supercomputer: 72 Blackwell GPUs, 1.4 exaFLOPS of compute, and 130 trillion transistors. For fun, Jensen Huang showed what the corresponding compute would look like if it were all placed on a single wafer as a superchip, though that is not how it is actually manufactured or used.
  • Nvidia announces $3,000 personal AI supercomputer called Digits, which uses a GB10 superchip. A single unit can run a 200B model; linking two should allow one to run 405B models.

Robots


AI News 2025-01-02

General

Research Insights

  • An interesting effect: fine-tuning GPT-4o on responses where the first letter of each line spells out H-E-L-L-O leads to a model that can correctly explain this underlying rule (even though the rule was never provided to it). This is surprising since when generating a reply, a token-wise prediction cannot “see ahead” and know that it will spell out HELLO; yet the LLM is somehow able to predict its own behavior, suggesting it has some knowledge of its own internal state.
    • Further testing with the pattern HELOL gave far worse results, implying strong reliance on the existence of the HELLO pattern in the training data.
  • Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs. The authors analyze whether we are efficiently using inference-time compute, and propose mitigation strategies to avoid overthinking.

AI Agents

  • Huggingface introduce smolagents, a lightweight framework for agents.
  • Agentarium is a Python framework for orchestrating agents.
  • Eliza is a framework for AI models to access resources (documents, Discord, Twitter, etc.).

Audio

3D

  • zoo.dev is developing workflows for CAD where one can switch between generative and traditional-edit modes.

Science

Robots


AI News 2024-12-26

General

Research Insights

LLM

  • OpenAI reveal a new reasoning model: o3. It scores higher than o1 on math and coding benchmarks, including setting a new record of 87.5% on the ARC-AGI Semi-Private Evaluation. This suggests that the model is exhibiting new kinds of generalization and adaptability.
    • The ARC-AGI result becomes even more impressive when one realizes that the prompt they used was incredibly simple. It does not seem that they prompt engineered, nor used a bespoke workflow for this benchmark (the ARC-AGI public training set was included in o3 training). Moreover, some of the failures involve ambiguities; even when it fails, the solutions it outputs are not far off. While humans still out-perform AI on this benchmark (by design), we are approaching the situation where the problem is not depth-of-search, but rather imperfect mimicking of human priors.
    • The success of o3 suggests that inference-time scaling has plenty of capacity; and that we are not yet hitting a wall in terms of improving capabilities.
  • More research as part of the trend of improving LLMs with more internal compute, rather than external/token-level compute (cf. Meta and Microsoft research).
  • Qwen released QvQ-72B-preview, a visual reasoning model.
  • DeepSeek release DeepSeek-V3-Base (weights), 671B params. This is noteworthy as a very large open-source model, for achieving performance competitive with the state of the art, and for (supposedly) requiring relatively little compute (15T tokens, 2.788M H800 GPU-hours, only ~$5.5M).

Safety

Video

Audio

  • Adobe Sketch2Sound allows one to vocally imitate sound effects, and uses AI to convert the imitation into appropriate sounds. This allows art direction for Foley sound.
  • MMAudio enables video-to-audio; i.e. it can add a soundtrack to silent video (project, code, examples: 1, 2).

World Synthesis

Science

Hardware

  • Nvidia unveils a small form-factor compute platform (suitable for robotics).
  • Raven Resonance is another attempt to deliver augmented reality glasses.

Robots


AI News 2024-12-19

General

  • Ilya Sutskever was co-recipient of the test-of-time award at NeurIPS 2024, for the 2014 paper: Sequence to Sequence Learning with Neural Networks, currently cited >28,000 times. Video of his speech here, in which he makes many provocative points: compute is growing but data is not (we only have one Internet; data is the fossil fuel of AI); scaling still matters, and we must determine what to scale; what comes next will be a mix of agents, synthetic data, and inference-time compute; strongly reasoning systems will be unpredictable; superintelligence is coming.
  • Anthropic present Clio, a system that provides an aggregated view of what people are using Claude to do, allowing one to observe trends in AI usage. Paper: Clio: Privacy-Preserving Insights into Real-World AI Use.

OpenAI

Research Insights

LLM

  • Microsoft releases a small-but-capable model: Phi-4 (14B). It heavily uses synthetic data generation and post-training to improve performance (including on reasoning tasks).
  • Google’s Project Mariner, a Chrome extension for agentic AI.
  • Google release Gemini 2.0 Flash Thinking, a reasoning model (available in AI Studio).

Safety

  • Anthropic releases a new automated method for jailbreaking AI models. By identifying this vulnerability, future models can be built to resist it. Paper: Best-of-N Jailbreaking (code). The method iteratively makes small random changes to prompts, attempting to slip past countermeasures (a minimal sketch follows this list).
    • The flavor of successful attacks also gives insights into LLMs. Successful prompts may involve strange misspellings or capitalizations; or unusual images with text and colored boxes arranged peculiarly. This is similar to other adversarial attacks (e.g. on image classification models). They have a certain similarity to human optical illusions: generating perverse arrangements meant to trick otherwise useful processing circuits. Improved model training can progressively patch these avenues; but it’s hard to imagine models that completely eliminate them until one achieves truly robust intelligence.
  • Anthropic publish: Alignment Faking in Large Language Models. They find evidence for alignment faking, wherein the model selectively complies with an objective in training, in order to prevent modification of its behavior after training. Of course the setup elicited this behavior, but it is surprising in the sense that LLMs don’t have persistent memory/awareness, and troubling in the sense that this shows even LLMs can engage in somewhat sophisticated scheming (e.g. they have evidence for these decisions going on during the LLM forward-pass, not in chain-of-thought).
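
A minimal sketch of the Best-of-N loop; query_model and flags_success are hypothetical stand-ins for an LLM API call and a jailbreak-detection classifier, and the augmentations shown follow the general flavor of the method rather than its exact configuration.

```python
# Hedged sketch of the Best-of-N idea: keep applying small random augmentations
# to a prompt and sampling responses until one variant slips past refusals.
import random

def augment(prompt, swap_prob=0.05, upper_prob=0.3):
    """Randomly swap adjacent characters and randomly capitalize letters."""
    chars = list(prompt)
    for i in range(len(chars) - 1):
        if random.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(c.upper() if random.random() < upper_prob else c
                   for c in chars)

def best_of_n_attack(prompt, query_model, flags_success, n=1000):
    """query_model and flags_success are caller-supplied (hypothetical
    stand-ins for an LLM API call and a success classifier)."""
    for _ in range(n):
        candidate = augment(prompt)
        response = query_model(candidate)
        if flags_success(response):
            return candidate, response
    return None
```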

Video

Audio

  • ElevenLabs introduce a Flash TTS model, with latency of just 75 milliseconds.

World Synthesis

Science

Brain


AI News 2024-12-12

OpenAI

  • Dec 5: o1 is out of preview. The updated o1 is faster (uses fewer tokens) while improving performance. And they have introduced a “Pro” version of o1 (thinks for even longer).
    • Here’s an example from a biomedical professor about o1-pro coming up with a legitimately useful and novel research idea.
  • Dec 5: There is now a ChatGPT Pro tier, $200/month for unlimited access to all the best models (including o1 Pro).
  • Dec 6: Reinforcement Fine-Tuning Research Program. Selected orgs will be able to RL OpenAI models for specific tasks. This is reportedly much more sample-efficient and effective than traditional fine-tuning. It will be reserved for challenging engineering/research tasks.
  • Dec 9: Sora officially released (examples).
  • Dec 10: Canvas has been improved and made available to all users.
  • Dec 11: ChatGPT integration into Apple products.
  • Dec 12: ChatGPT can pretend to be Santa.

Google

Research Insights

  • Google DeepMind: Mastering Board Games by External and Internal Planning with Language Models. Search-based planning is used to help LLMs play games. They investigate both externalized search (MCTS) and internalized (CoT). The systems can achieve high levels of play. Of course the point is not to be better than a more specialized/dedicated neural net trained on that game; but to show how search can unlock reasoning modalities in LLMs.
  • Training Large Language Models to Reason in a Continuous Latent Space. Introduces Chain of Continuous Thought (COCONUT), wherein the last hidden state is fed back directly as the input embedding for the next step. So instead of converting to human-readable tokens, the state loops internally, providing a continuous chain of thought (a minimal sketch follows this list).
  • A new preprint considers how “capability density” is increasing over time: Densing Law of LLMs. They find that, for a given task, the model size needed to accomplish it halves every ~3 months (implying roughly a 16x reduction over a year). This shows that hardware scaling is not the only driver of consistent improvement.
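
A minimal sketch of the continuous-thought loop, assuming a Hugging Face-style causal LM that exposes hidden states; the number of latent steps and the lack of KV caching are simplifications of the paper’s actual setup.

```python
# Hedged sketch of Chain of Continuous Thought (COCONUT): instead of decoding a
# token at each reasoning step, feed the last hidden state back in directly as
# the next input embedding.
import torch

def continuous_thought(model, input_ids, num_latent_steps=4):
    emb = model.get_input_embeddings()(input_ids)        # [batch, seq, d_model]
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=emb, output_hidden_states=True)
        last_state = out.hidden_states[-1][:, -1:, :]    # final position's state
        emb = torch.cat([emb, last_state], dim=1)        # loop it back as input
    return emb  # afterwards, decode the answer from this extended sequence
```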

LLM

  • Meta released Llama 3.3 70B, which achieves similar performance to Llama 3.1 405B. Meta also announced plans for a 2GW datacenter in Louisiana, for future open-source Llama releases.
  • Ruliad introduces Deepthought 8B (demo), which claims good reasoning for the model size.
  • Stephen Wolfram released a post about a new Notebook Assistant that integrates into Wolfram Notebooks. Wolfram describes this as a natural-language interface to a “computational language”.
  • GitIngest is a tool to “turn codebases into prompt-friendly text”. It will take a github repository, and turn it into a text document for easy inclusion into LLM context.
  • While we haven’t seen a “new class of model” (bigger/better than GPT-4) in quite a while, it’s worth remembering the substantial improvements we’ve seen from perfecting the existing systems (from Epoch AI benchmarks). On Ph.D.-level Q&A, over the last year we’ve gone from no-better-than-random to roughly human-expert performance.

AI Agents

Audio

  • ElevenLabs added GenFM to their web product: you can now generate AI podcasts, and listeners can tune in on the ElevenReader app.

Image Synthesis

Vision

3D

Science


AI News 2024-12-05

General

  • The End of Productivity: Why creativity is the new currency of success. The essay argues that a focus on pure productivity (and metrics) misses the things that humans value most, and that the era of AI may actually shift the emphasis of value from human productivity to human creativity.
  • An interesting experiment (assuming it’s true): an AI jailbreaking contest. An AI agent was tasked with not approving an outgoing money transfer. Anyone can spend a small amount of money to send the AI a message. The money is added to the pool, and the cost-per-message increases slightly. It started at $10/message, and quickly grew to $450/message with a prize-pool of $50k. At that point, someone tricked the AI by sending a message that explained an inverted meaning of approveTransfer. So, they won the money.
    • This acts as the usual reminder that modern LLMs are not robust against dedicated attackers that seek to trick them and extract information.
  • Reportedly: Elon Musk lands priority for Nvidia GB200 delivery in January with US$1.08 billion. Paying a premium to get earlier access to next-gen chips may well be a good strategy.
  • An interesting blog post by Lilian Weng: Reward Hacking in Reinforcement Learning. Some notes about modern RLHF applied to LLMs (based on this paper):
    • RLHF increases human approval, but not necessarily correctness.
    • RLHF weakens humans’ ability to evaluate: The error rate of human evaluation is higher after RLHF training.
    • RLHF makes incorrect outputs more convincing to humans. The evaluation false positive rate significantly increases after RLHF training.
  • Andrej Karpathy provides an interesting historical look at how the transformer architecture was invented (cf. Attention Is All You Need).
  • A critical analysis of “openness” in AI: Why ‘open’ AI systems are actually closed, and why this matters. They note that the current version of “open” does not preclude concentration of power.

Research Insights

LLM

  • Amazon enters the fight with Nova (docs, benchmarks). Although not leading on benchmarks, they promise good performance-per-dollar; will be available on Amazon Bedrock.

AI Agents

Audio

  • Hume adds a voice creation mode where one can adjust intuitive sliders to pick out the desired voice.
  • ElevenLabs previously announced intentions to build a conversational AI platform. This capability is now launching; they claim their interface makes it extremely easy to build a conversational voice bot, and allows you to select the LLM that is called behind-the-scenes.

Video

  • Google et al. show off: Generative Omnimatte: Learning to Decompose Video into Layers (preprint). It can separate a video into distinct layers, including associating effects (e.g. shadows) with the correct layer (parent object), and inpainting missing portions (e.g. occluded background). Obvious utility for visual effects work: it can be used to make a particular person/object invisible (including their shadows), to apply edits to just one component (object or background), etc.
  • Invideo are demoing a system where a single prompt generates an entire video sequence telling a story (example). I think that creators generally want more granular control of output so they can put together a precise narrative. But there are use-cases where this kind of fully automated generation may make sense.
    • It’s easy to look at the output and find the visual or narrative flaws. But also interesting to remember how advanced this is compared to what was possible 6-9 months ago. There is obviously a huge amount of untapped potential in these kinds of systems, as they become more refined.
  • Runway tease a prototype for a system to enable control over generative video, where videos are defined by keyframes and adjusting the connection/interpolation between them (blog post).
    • In October 2023, there were some prototypes of a “prompt travel” idea, wherein a video was generated by picking a path through the image-generation latent space. One would define keyframe images, and the system would continually vary the effective prompt to interpolate between them (preprint, animatediff-cli-prompt-travel). This provided a level of control, while not being robust enough to actually enforce coherent temporal physics. Runway’s approach (leveraging a video model) may finally enable the required control and consistency; a sketch of the basic interpolation idea follows this list.
  • Tencent announce an open-source video model: Hunyuan Video (example, video-to-video example).
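
A minimal sketch of the basic prompt-travel interpolation; encode_prompt and generate_frame are hypothetical stand-ins for a text encoder and a per-frame sampler, and this ignores the temporal-consistency machinery a real pipeline needs.

```python
# Hedged sketch of "prompt travel": linearly interpolate between keyframe
# prompt embeddings so the effective prompt drifts smoothly across frames.
def prompt_travel(keyframe_prompts, encode_prompt, generate_frame,
                  frames_per_segment=16):
    """encode_prompt and generate_frame are caller-supplied (hypothetical
    stand-ins for a text encoder and a per-frame image sampler)."""
    embeds = [encode_prompt(p) for p in keyframe_prompts]
    frames = []
    for e0, e1 in zip(embeds[:-1], embeds[1:]):
        for i in range(frames_per_segment):
            t = i / frames_per_segment
            frames.append(generate_frame((1 - t) * e0 + t * e1))
    return frames
```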

World Synthesis

Science

Brain

  • Whole-brain mapping is advancing. We recently saw the release of a fly brain map (140,000 neurons). Now, a roadmap effort claims that whole-brain mapping of mammalian brains should be possible in the coming years.

Hardware

  • ASML released a hype-video describing the complexity of modern lithography (in particular the computational lithography aspect). There is no new information, but it’s a nice reminder of the nature of the state-of-the-art.
  • I never grow tired of looking at plots of Moore’s Law.

Robots

  • MagicLab released a video purporting to show multi-(humanoid)robot collaboration on tasks.

AI News 2024-11-28

General

Research Insights

LLM

AI Agents

Image Synthesis

Audio

Video

World Synthesis

Science

Hardware

Robots

  • Although the Unitree G1 humanoid robot was announced with a price of $16k (c.f.), the latest price chart shows a range of configurations, with prices from $40k to $66k.
  • Mercedes is running a trial of Apptronik robots in their Austin lab.

AI News 2024-11-21

General

Research Insights

LLM

  • New study: AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably. At least part of the effect may come from non-experts judging the simpler and more conventional AI poems as being more understandable and superior (and thus human), while the complexity and inconsistency of human-generated poetry is perceived as incoherence.
    • Nevertheless, this again shows that for short-form generation, AI has already reached human-level, and can be considered super-human in certain narrow ways.
  • Mistral releases a new large model (Mistral-Large-Instruct-2411, 123B) and Pixtral Large, a multimodal model (weights).
  • DeepSeek announces DeepSeek-R1-Lite-Preview. This is a “reasoning” model (inference-time chain-of-thought) that seems to be similar to OpenAI’s o1. Like o1, it achieves impressive results on math and science benchmarks. Some of the CoT reasoning traces are quite interesting (e.g.). The weights are not yet available, but they claim they will release it open-source.

AI Agents

Image Synthesis

  • A recent survey of 11,000 people is complete: How Did You Do On The AI Art Turing Test? The median score (for differentiating AI from human art) was 60%, a bit above chance. AI art was often preferred by humans. Overall, AI art has already crossed a Turing-Test threshold.

Audio

Video

Science

Hardware


AI News 2024-11-14

General

Research Insights

LLM

AI Agents

Video

World Synthesis

Science

Robots

  • New Deep Robotics video shows very good terrain navigation from a quadruped-with-wheels design.