General
- Harvard Business Review: What Gets Measured, AI Will Automate.
- OpenAI: Working with 400,000 teachers to shape the future of AI in schools. OpenAI joins the American Federation of Teachers to launch the National Academy for AI Instruction.
Research Insights
- Predicting thinking time in Reasoning models. By predicting estimate token needs during CoT reasoning, feedback to user (e.g. progress bar) could be provided. Models internally have state that is predictive of further token usage.
- Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory. LLMs can coherently succeed at iterative games. Models from different vendors have identifiable personalities in how they approach games.
- How We Replicated Five Peer-Reviewed Papers in Five Hours.
- Prior work: The Discovery Engine.
- Preprint: Benchmarking the Discovery Engine.
LLM
- gremllm is clever and/or diabolical. It is a Python library that generates on-the-fly the attributes and methods of a Python object. Thus, one need not actually define the methods for a new class; simply allow the LLM to hallucinate them when they are called for.
- Although this sounds silly and dangerous, there are viable use-cases. In March 2023 (site and code no longer online), there was some exploration of “imaginary programming” wherein one would define a function’s requirements but never actually code the function (the LLM would instead stand-in for the function at call time).
- xAI release Grok 4 (and Grok 4 Heavy). Benchmarks are strong, taking the lead on several, including 100% on AIME, 44% on Humanity’s Last Exam, and 16% on ARC-AGI-2 (c.f. 9% Claude Opus 4). If real-world utility matches benchmarks, then Grok 4 may take the lead as the best model.
Safety
World Synthesis
Science
Robots
- Huggingface announces: Reachy Mini – The Open-Source Robot for Today’s and Tomorrow’s AI Builders. Appears to be optimized for education and hobbyist hacking.