General
- Ethan Mollick writes about “AI in organizations: Some tactics”, talking about how individuals are seeing large gains from use of AI, but organizations (so far) are not.
- Many staff are hiding their use of AI, with legitimate cause for doing so: orgs often signal risk-averse and punitive bureaucracy around AI, staff worry that productivity gains won’t be rewarded (or will even be punished, as expectations rise), staff worry their contributions won’t be valued, etc.
- Mollick offers concrete things that orgs can do to increase use of AI:
- Reduce fear. Do not have punitive rules. Publicly encourage the use of AI.
- Provide concrete, meaningful incentives to those who use AI to increase efficiency.
- Build a sort of “AI Lab” where domain experts test all the tools and see whether they can help with business processes.
- The 2024 Nobel Prize in Physics has been awarded to John J. Hopfield and Geoffrey E. Hinton, for foundational discoveries enabling machine learning with artificial neural networks.
- The 2024 Nobel Prize in Chemistry has been awarded to David Baker for computational protein design, and to Demis Hassabis and John Jumper for AI-based protein structure prediction (AlphaFold).
- Lex Fridman interviews the team that builds Cursor. Beyond just Cursor/IDEs, the discussion includes many insights about the future of LLMs.
Research Insights
- Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting. Generated outputs can be used as in-context demonstrations for answering later questions: in a large batch of questions, answers generated as explicit Q/A pairs reinforce the answering pattern for subsequent questions in the batch. This doesn’t improve raw capability per se, but enforces a sort of self-consistency.
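- A minimal sketch of the idea (my own illustration, not the paper’s code; the prompt format is an assumption):

```python
# Sketch of auto-demo batch prompting (illustrative only, not the paper's
# code). All questions go into one prompt, and the model is instructed to
# answer them in order as explicit Q/A pairs, so each completed pair acts
# as an in-context demonstration for the questions that follow it.

def build_auto_demo_prompt(questions: list[str]) -> str:
    header = (
        "Answer each question below in order. For each one, restate the "
        "question and then answer it, using this format:\n"
        "Q: <question>\nA: <answer>\n\n"
    )
    body = "\n".join(f"Q: {q}" for q in questions)
    return header + body

questions = [
    "What is 17 * 24?",
    "A train travels 60 km in 45 minutes. What is its speed in km/h?",
    "If 3x + 5 = 26, what is x?",
]
prompt = build_auto_demo_prompt(questions)
# answers = llm.generate(prompt)  # hypothetical LLM call
```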
- ToolGen: Unified Tool Retrieval and Calling via Generation. Each tool is represented by a dedicated special token in the vocabulary, so tool retrieval and calling become ordinary next-token generation, enabling more seamless tool usage.
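- Roughly, the mechanism can be pictured as follows (a sketch under my own assumptions, not the authors’ code; the tool names are made up):

```python
# Sketch: one dedicated vocabulary token per tool (an illustration of the
# idea, not ToolGen's implementation). With tools as tokens, retrieving
# a tool reduces to ordinary next-token prediction.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tool_tokens = ["<tool:web_search>", "<tool:calculator>", "<tool:weather>"]
tokenizer.add_special_tokens({"additional_special_tokens": tool_tokens})
model.resize_token_embeddings(len(tokenizer))  # new embeddings need training

# After fine-tuning on tool-use data, the model can emit a tool token
# directly, e.g. "What is 23^4?" -> "<tool:calculator>" -> dispatch.
```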
- xjdr has been challenging the prevailing assumption of using top_k sampling in LLMs (where one considers only the k most probable tokens at each step, to keep outputs consistent with the training distribution). Entropix instead considers the entropy and variance of entropy of the next-token distribution. There are early hints that this can lead to improved output (e.g. reasoning) and may also provide a new way to steer models.
- For instance, during inference, if the next token has high uncertainty, one can instead make the model pause and reconsider. In this way, one can induce chain-of-thought grounded in the token probabilities.
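- A toy version of that decision rule might look like the following (my sketch; entropix’s actual heuristics are more elaborate, and the thresholds here are invented):

```python
import torch
import torch.nn.functional as F

def entropy_and_varentropy(logits: torch.Tensor):
    """Entropy and variance of surprisal of the next-token distribution."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(-1)
    varentropy = (probs * (log_probs + entropy.unsqueeze(-1)) ** 2).sum(-1)
    return entropy, varentropy

def choose_action(logits, ent_thresh=2.5, var_thresh=4.0):
    # Thresholds are invented, for illustration only.
    ent, vent = entropy_and_varentropy(logits)
    if ent < ent_thresh and vent < var_thresh:
        return "argmax"        # confident: take the top token
    if ent >= ent_thresh and vent < var_thresh:
        return "inject_pause"  # uncertain but consistent: insert a "wait,
                               # let me reconsider" token to induce reflection
    return "sample"            # otherwise, sample with temperature
```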
- Differential Transformer. The goal is to amplify attention onto relevant tokens, thereby rejecting noise. This enables better performance on long-context tasks.
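- The core trick (as I read the paper) is to compute two softmax attention maps and take their difference, so that common-mode attention noise cancels. A simplified single-head sketch (the paper learns λ via a re-parameterization; here it is a fixed scalar):

```python
import torch
import torch.nn.functional as F

def diff_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.8):
    """Simplified single-head differential attention: the difference of
    two softmax attention maps, so shared 'noise' attention cancels."""
    d = Wq1.shape[1]  # head dimension
    a1 = F.softmax((x @ Wq1) @ (x @ Wk1).transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax((x @ Wq2) @ (x @ Wk2).transpose(-1, -2) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ (x @ Wv)
```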
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning. Chain-of-thought performance reflects both memorization and a noisy, probabilistic form of genuine reasoning.
LLM
- Last week, OpenAI added Canvas (a side-bar where one can collaborate with ChatGPT on code or text). Here are some videos that demo the feature. The implication is that Canvas could become a sort of universal interface for collaborating with AI.
- StackBlitz announces bolt.new, AI-accelerated coding where the AI handles the dev environment (sounds similar to Replit and Pythagora).
- gptme is an AI assistant inside your terminal. (Sounds similar to the earlier Shell-AI.)
- Abacus AI released Dracarys 2 (fine-tune of Qwen 2.5 72B-Instruct) which is optimized for coding. It is currently at #3 on the LiveCodeBench leaderboard (with only the closed-source o1-preview and o1-mini ahead).
- Document ingestion for RAG is typically a pain, especially for PDFs. Grobid is one open-source package available for this. Chunkr just announced they are open-sourcing their code.
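- For reference, Grobid runs as a local web service and converts PDFs to structured TEI XML over a REST API; a minimal call might look like this (assuming a server on Grobid’s default port 8070; the file name and Docker tag are illustrative):

```python
import requests

# Assumes a local Grobid server on its default port, e.g. started via
# Docker (image tag illustrative): docker run -p 8070:8070 lfoppiano/grobid
with open("paper.pdf", "rb") as f:  # illustrative file name
    resp = requests.post(
        "http://localhost:8070/api/processFulltextDocument",
        files={"input": f},
    )
resp.raise_for_status()
tei_xml = resp.text  # structured TEI XML: title, sections, references, ...
```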
AI Agents
- Altera is using GPT-4o to build agents. As an initial proof-of-concept, they have AI agents that can play Minecraft.
- CORE-Bench is a new benchmark (leaderboard) for assessing agentic abilities. The task consists of reproducing published computational results, using provided code and data. This task is non-trivial (top score right now is only 21%) but measurable.
- OpenAI released a new benchmark: MLE-bench (paper) which evaluates agents using machine-learning engineering tasks.
- AI Agents are becoming more prominent; but a wide range of definitions is in use, implicitly ranging from “any software process” (the word “agent” has long been applied to any software program that tries to accomplish something) all the way to “AGI” (must be completely independent and intelligent). This thread is trying to crowd-source a good definition.
- Some that resonate with me:
- (1): agent = llm + memory + planning + tools + while loop
- (2): An AI system that’s capable of carrying out and completing long running, open ended tasks in the real world.
- (3): An AI agent is an autonomous system (powered by a Large Language Model) that goes beyond text generation to plan, reason, use tools, and execute complex, multi-step tasks. It adapts to changes to achieve goals without predefined instructions or significant human intervention.
- To me, a differentiating aspect of an agent (compared to a base LLM) is the ability to operate semi-autonomously (without oversight) for some amount of time, and make productive progress on a task. A module that simply returns an immediate answer to a query is not an agent; there must be some kind of iteration (multiple calls to the LLM) for it to count (see the loop sketch after this list). So I might offer something like:
- AI Agent: A persistent AI system that autonomously and adaptively completes open-ended tasks through iterative planning, tool-use, and reasoning.
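- To make definition (1) concrete, here is a minimal sketch of that while loop (purely illustrative; `llm` and the tool functions are hypothetical stand-ins):

```python
# Minimal agent loop: llm + memory + planning + tools + while loop.
# Purely illustrative; llm() and the tools dict are hypothetical.

def run_agent(goal: str, llm, tools: dict, max_steps: int = 20):
    memory = [f"Goal: {goal}"]
    for _ in range(max_steps):  # the "while loop", bounded for safety
        # Plan: ask the model for its next action given everything so far.
        action = llm("\n".join(memory) +
                     "\nNext action (tool_name args, or FINISH <answer>):")
        if action.startswith("FINISH"):
            return action.removeprefix("FINISH").strip()
        name, _, args = action.partition(" ")
        observation = tools[name](args) if name in tools else f"unknown tool: {name}"
        memory.append(f"Action: {action}\nObservation: {observation}")
    return "Stopped: exceeded max_steps"
```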
Image Synthesis
- FacePoke is a real-time image editor that allows one to change a face’s pose (code, demo), based on LivePortrait.
- A few months ago, Paints-UNDO (code) unveiled an AI method for not just generating an image, but approximating the stepwise sketching/drawing process that leads up to that image. This is fun, and maybe useful as a sort of drawing tutorial; but it also undermines one of the few ways that digital artists can “prove” that their art is not AI-generated (by screen-capturing the creation process).
- Inverse Painting (code) is a similar method for generating step-wise painting sequences.
Video
- ByteDance paper: Loong: Generating Minute-level Long Videos with Autoregressive Language Models. They show a method for generating long (one-minute) videos (example). Since it’s just a research demo, the image/motion quality is not state-of-the-art; but if the method can be refined and scaled up, it could enable longer high-quality videos.
- Meta announces Movie Gen, a set of media foundation models (paper). Frontier-quality video generation (examples: 1, 2), including video editing and generating a character from an input photo. No release of models, product, or even a demo just yet.
- A remarkably capable open-source video model has appeared (model, code, examples).
- Current quality of video generations:
- AI Avatar (using HeyGen).
- Generic Movies.
- Pyramid-flow (open source) model: examples.
World Synthesis
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion (project page). Enables reconstruction of a 3D scene from video, even when there is motion of subjects in the video.
- DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation (paper). 4D object (scene-evolution-over-time) representation from monocular video.
Science
- A provocative idea is to use LLMs (and other foundation models) as stand-ins for people; i.e. to use their trained behaviors/heuristics as a proxy for human behavior/psychology.
- 2021 November: Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods.
- 2024 February: Are Large Language Models (LLMs) Good Social Predictors?
- 2024 April: Automated Social Science: Language Models as Scientist and Subjects.
- 2024 July: Perils and opportunities in using large language models in psychological research.
- 2024 August: Predicting Results of Social Science Experiments Using Large Language Models (demo).
- Now 2024 October:
- Large Language Models based on historical text could offer informative tools for behavioral science. The idea: gain insight into deceased people/populations by modeling the historical texts they produced.
- Intelligence at the Edge of Chaos. Uses AI to probe the origin of intelligence.
- I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy. Establishes multi-agent hierarchies; where power dynamics are at play, counter-productive behaviors arise. They use prisoner-guard setups to intentionally elicit and study anti-social behavior and persuasion effects.