AI News 2025-01-30

General

Humanity’s Last Exam has now released their dataset of 3,000 challenging problems.
- NY Times article: When A.I. Passes This Test, Look Out. The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
It’s worth a periodic reminder that models keep improving in capability while falling in cost, so the cost per unit of capability is dropping dramatically. Open-weights models put continued economic pressure on this trend, forcing closed providers to keep lowering prices (even if they have a lead in performance). This graphic (from here) shows the trend:

Nous Research announces Psyche on Solana, a training system that can handle distributed training across heterogeneous hardware.
Hugging Face announces Inference Providers Hub, providing access to many compute providers through one interface.
The US Copyright Office has issued a statement: Copyright and Artificial Intelligence: Part 2: Copyrightability. The summary is that they contend existing copyright law is sufficient to handle AI; the existing rule is that significant human involvement in creation is necessary in order to warrant copyright (purely mechanical or accidental or non-human generation is insufficient). So works generated entirely by AI are not protected (the prompt input is not sufficient to be considered human-generated); but works incorporating AI elements or works transformatively changing AI generations could be protected.
Mark Zuckerberg discusses Llama 4 training progress. Training is ongoing (Llama-4-mini has finished pre-training); models will be natively multimodal; upcoming models will include reasoning; Meta’s stated goal is to have leadership models; and agentic applications are anticipated.
- Meta plans to invest $65B in AI in 2025, including a 2GW datacenter with 1 million Nvidia GPUs.
OpenAI is increasing ties to US government activities:
- Introducing ChatGPT Gov: designed to streamline government agencies’ access to OpenAI’s frontier models.
- OpenAI partners with U.S. National Laboratories on scientific research, nuclear weapons security. OpenAI statement: Strengthening America’s AI leadership with the U.S. National Laboratories: OpenAI’s latest line of reasoning models will be used by nation’s leading scientists to drive scientific breakthroughs.

Research Insights

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition. Blog post: Attribution-based parameter decomposition. The authors propose a new method for decomposing neural net parameters into interpretable components. Their method establishes a minimal and simple set of parameters that reproduces model behavior.
TopoNets: High Performing Vision and Language Models with Brain-Like Topography. Taking inspiration from the functional organization of biological brains, they enforce a training loss that causes an artificial neural net to be topographically organized. This does not reduce performance, and provides some advantages (lower dimensionality, efficiency). This might also have implications for interpretability.
Tell me about yourself: LLMs are aware of their learned behaviors. LLMs can exhibit a surprising level of self-awareness: when trained to generate a set of behaviors, they can describe/define the behavior. The underlying mechanism is as yet unclear; it could be mere correlation of activation, or it could represent genuine self-analysis.

LLM

Release of Qwen2.5-1M model, with a 1 million token context (technical report).
Release of Qwen2.5-VL, a vision-language model.
DeepSeek releases Janus Pro 1B (includes image generation and chat with PDF). It can run locally in the browser via WebGPU (demo here).
Open Thoughts has launched as an effort to curate quality datasets for training reasoning models (e.g. validated synthetic reasoning traces). The initial dataset has 114k traces.
Open-R1 is an attempt to reproduce the DeepSeek-R1 model/result/method in a fully open manner.
OpenAI has added a “think” option to GPT-4o, allowing it to invoke some form of chain-of-thought.

Safety

New report published: International AI Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025).
- In comparing model performance, they included some (previously unreleased) early test results from OpenAI (page 11), indicating that o3 outperforms earlier models across a wide range of technical and reasoning benchmarks.
Review paper: Open Problems in Mechanistic Interpretability.

AI Agents

AI agentic computer use is growing. Anthropic demoed their computer use system, and OpenAI just released their Operator. Convergence AI now has Proxy, another kind of computer use agent.

Audio

Multimodal Art Projection (m-a-p) released a new open-source full-song music generation model: YuE (乐).

Video

Pika announces v2.1 of their video model (examples: 1, 2, 3).
Hailuo introduces T2V-01-Director, which allows natural language specification of camera movements (more examples).
Luma adds 4K upscaling of videos generated using their Dream Machine model.
Krea adds a consistent-character feature for video.
Alibaba Qwen added a video model to their chatbot (try it here, examples).

Science

Reid Hoffman launched Manas AI, which aims to accelerate drug discovery.

Robots

Unitree shows off their humanoid robot’s abilities through choreographed dance.
