AI News 2024-08-15

Research Insights

  • An empirical investigation of the impact of ChatGPT on creativity. They find that people using ChatGPT as an aid generate more creative outputs, though these are mostly incremental ideas. The results are roughly consistent with an earlier study finding that genAI makes individual users more creative but reduces the overall diversity of ideas across the group of users.
  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers. They describe rStar (code), a self-play mutual reasoning approach. A small model augments Monte Carlo Tree Search with a set of defined reasoning heuristics, and trajectories that are mutually consistent can be emphasized.
    • The body of work describing inference-time search strategies continues to grow. Each reports improvements of some sort, but it remains unclear whether any one strategy substantially outperforms the rest.
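
To make the "mutually consistent trajectories" idea concrete, here is a minimal Python sketch of just the filtering step. It is not the rStar code and it leaves out the tree search entirely. It assumes candidate trajectories already exist, and that a second model (here a hard-coded stand-in, toy_discriminator) can be asked to finish a trajectory from a prefix. The names mutually_consistent, pick_answer, and toy_discriminator are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# A trajectory is a list of reasoning steps; the last step is the final answer.
Trajectory = List[str]


def mutually_consistent(
    candidates: Sequence[Trajectory],
    complete: Callable[[Trajectory], Trajectory],
    split: float = 0.5,
) -> List[Trajectory]:
    """Keep trajectories whose final answer a second model reproduces from a prefix."""
    kept = []
    for traj in candidates:
        if len(traj) < 2:
            continue  # nothing to hide from the second model
        # Hide at least the final answer, reveal at least one step.
        cut = max(1, min(len(traj) - 1, int(len(traj) * split)))
        completion = complete(traj[:cut])
        if completion and completion[-1] == traj[-1]:
            kept.append(traj)
    return kept


def pick_answer(trajectories: Sequence[Trajectory]) -> Optional[str]:
    """Majority vote over the final answers of the surviving trajectories."""
    if not trajectories:
        return None
    return Counter(t[-1] for t in trajectories).most_common(1)[0][0]


def toy_discriminator(prefix: Trajectory) -> Trajectory:
    # Hypothetical stand-in for a second small model; it always concludes "5".
    return ["so the answer is", "5"]


candidates = [
    ["2 + 3", "add the numbers", "5"],
    ["2 + 3", "multiply the numbers", "6"],
    ["2 + 3", "count up from 2", "5"],
]
kept = mutually_consistent(candidates, toy_discriminator)
answer = pick_answer(kept)
```

In a real system both roles would be played by language models and the candidates would come out of the search, so treat this only as an illustration of the agreement check.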

LLMs

  • Qwen released Qwen2-Math in 1.5B, 7B, and 72B sizes (huggingface, github). Top performance on math tasks.
  • Anthropic is experimenting with adding inline actions to Artifacts. For instance, you can select code and pick “Improve” or “Explain”.
  • Anthropic released prompt caching, which can greatly reduce inference costs.
  • Researchers released LLMs tuned for healthcare.
  • xAI released a beta of Grok-2. They too have now reached roughly “GPT-4” caliber performance, with benchmarks similar to GPT-4o-mini, Claude 3.5 Sonnet, or Gemini 1.5-Pro. The system has real-time access to 𝕏 posts; there are mixed reactions about whether this is useful or not.
    • Grok 2 currently uses Flux for image generation. The implementation is less restricted than other major image synthesis providers.
  • OpenAI is making incremental progress:
    • Finally released the GPT-4o system card, which describes some aspects of training and safety.
    • Quietly pushed out an update to GPT-4o. People do indeed report that it feels slightly smarter.
    • Released a new-and-improved SWE-bench Verified, to enable better evaluation of AI ability to solve real-world software issues.
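
For a rough sense of why prompt caching (mentioned above) can cut costs, here is a back-of-the-envelope Python sketch. The multipliers are assumptions based on the pricing described at announcement time (cache writes at about 1.25x the base input price, cache reads at about 0.1x), and the $3 per million tokens base price is a placeholder. Check current pricing before relying on any of these numbers. The function name prompt_cost is hypothetical, and the sketch assumes the cache never expires between calls.

```python
def prompt_cost(prefix_tokens, suffix_tokens, calls, base_price_per_mtok,
                write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached, cached) input cost in dollars for `calls` requests
    that share the same prefix and each add their own suffix."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = base_price_per_mtok / 1_000_000

    # Without caching, every call pays full price for prefix and suffix.
    uncached = calls * (prefix_tokens + suffix_tokens) * per_token

    # With caching (assumed multipliers): the first call writes the prefix,
    # every later call reads it at a discount. Suffixes are always full price.
    prefix_cost = prefix_tokens * per_token * (
        write_multiplier + (calls - 1) * read_multiplier
    )
    cached = prefix_cost + calls * suffix_tokens * per_token
    return uncached, cached


# Example: a 100k-token shared document, 500-token questions, 20 calls.
uncached, cached = prompt_cost(100_000, 500, 20, base_price_per_mtok=3.0)
print(f"uncached: ${uncached:.3f}, cached: ${cached:.3f}")
```

Under these assumptions the saving grows with the number of calls that reuse the prefix. With a single call, caching costs slightly more because of the write surcharge.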

AI Agents

Safety

Image

Video

World Synthesis

Hardware

Robots
