Research Insights
LLM
- Google releases new ultra-small open-source model: Gemma 3 270M.
- Prophet Arena aims to test how well AI can predict the future.
- DeepSeek V3.1.
Audio
- ElevenLabs releases video-to-music; it can generate a soundtrack matched to the provided video.
Vision
- Towards generalizable and interpretable three-dimensional tracking with inverse neural rendering. They recast 3D tracking as an inverse problem: instead of detecting objects directly, they fit a neural rendering to the vision data. This leverages the availability of compute and modern model capabilities. It also shows AI/ML approaches increasingly converging on the predictive coding motif used in biological brains.
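To make the "tracking as fitting a rendering" idea concrete, here is a minimal toy sketch in Python with NumPy. It is not the paper's method: the "renderer" is just a 2D Gaussian blob, the optimizer is plain gradient descent on pixel error, and the names `render`, `fit_pose`, and `track` are hypothetical. It only illustrates the general analysis-by-synthesis loop: render a guess, compare it to the observation, adjust the pose, and warm-start each frame from the previous estimate.

```python
import numpy as np

def render(center, size=32, sigma=3.0):
    """Toy stand-in for a neural renderer: a Gaussian blob at center = (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))

def fit_pose(observed, init, steps=200, lr=0.2, sigma=3.0):
    """Fit the blob position to one frame by gradient descent on squared pixel error."""
    center = np.array(init, dtype=float)  # copy, so the caller's value is untouched
    size = observed.shape[0]
    ys, xs = np.mgrid[0:size, 0:size]
    for _ in range(steps):
        img = render(center, size, sigma)
        resid = img - observed
        # Analytic gradient: d(img)/d(cx) = img * (xs - cx) / sigma^2 (same for cy).
        gx = np.sum(2 * resid * img * (xs - center[0])) / sigma ** 2
        gy = np.sum(2 * resid * img * (ys - center[1])) / sigma ** 2
        center -= lr * np.array([gx, gy])
    return center

def track(frames, init):
    """Track across frames, warm-starting each fit from the previous estimate."""
    estimates, center = [], np.array(init, dtype=float)
    for frame in frames:
        center = fit_pose(frame, center)
        estimates.append(center.copy())
    return np.array(estimates)

# Synthetic "video": the blob moves a couple of pixels per frame (assumed data).
true_path = np.array([[10.0, 10.0], [12.0, 11.0], [14.0, 13.0]])
frames = [render(c) for c in true_path]
estimates = track(frames, init=(9.0, 9.0))
print(np.round(estimates, 2))
```

In this toy setup the recovered positions should land close to the true centers, because each frame's starting guess is within about one blob width of the answer. The real work presumably replaces the blob with a learned generative object model and optimizes full 3D pose and appearance, but the render-compare-update loop has the same shape.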
Image Synthesis
- A stealth/mystery model is being tested: nano-banana (speculation is that it is from Google). Early examples show it has a startling ability to edit images based on natural language requests.
Video
- Higgsfield product-to-video demonstrates the ability to add objects into existing footage. This shows the increasingly powerful modality of genAI video editing.
- Runway updates Act-Two to include changing the voice performance alongside video generation.
World Synthesis
- Runway ML announces Game Worlds: turn-based text-adventure games with generated narrative and images.
Science
- Capabilities of GPT-5 on Multimodal Medical Reasoning.
- RoseTTAFold 3: Accelerating Biomolecular Modeling with AtomWorks and RF3.
- BInD: Bond and Interaction-generating Diffusion Model for Multi-objective Structure-based Drug Design.
Hardware
- Luci Pin aims to deliver an AI device that can see/hear your context (without failing as the Humane AI Pin did).
- Tai Necklace aims to deliver an AI device that looks like jewelry instead of a device.