AI News 2024-06-27

Research Insights

Anthropic

  • Anthropic released Claude 3.5 Sonnet. It is better than the larger Claude 3 Opus, and beats GPT-4o on many evals. (So presumably 3.5 Opus will be very smart?) It also has “artifacts”, which are sidebar visualizations/interactions that it can update and modify based on your requests. Interestingly, it seems to use special <antThinking> tags so that it can do chain-of-thought but have that output hidden from the user.

OpenAI

  • OpenAI acquired Rockset, a database/analytics company. The intended use seems to be for customers (especially corporate) to integrate data retrieval into LLM products.
  • Multi is a macOS app for slick collaborative screenshare. They are shutting down their offering and instead “joining” OpenAI (merging with? being acquired by?). Some are guessing this means OpenAI will launch a radically new kind of operating system, where AI agents are first-class components. I think the simpler prediction is that they want their AI agent to “screenshare” by being able to see what’s on your screen and point at things, or even edit things or click buttons (with your permission). That would be useful.
  • Announced a partnership with TIME. Could either represent training data, or integration of sourced results in future ChatGPT replies (probably both). This is on top of other partnerships they’ve announced: Financial Times, Stack Overflow, Reddit, News Corp, Vox Media, The Atlantic, Apple.
  • Taken together, these make it seem like OpenAI are putting more focus on delivering a compelling consumer product.
  • On the research side, OpenAI put out a preprint showing how an LLM can be trained to critique another LLM. The critic can catch errors in the code output of ChatGPT. Small step towards iteration loops to improve outputs.

LLMs

  • Nvidia releases Nemotron-4 340B models and training dataset.
  • Google opens developer access to Gemini 1.5 Pro with 2M context window. That’s a lot of context.

Science

  • AlphaFold is already having a sizable impact on protein structure determination. Now, startup EvolutionaryScale has announced ambitions to enable programmable biology. Their preprint is equally ambitious: Simulating 500 million years of evolution with a language model. (See also prior publication cred.) They have open-sourced their ESM3 foundation model, which is trained on sequence, structure, and function of proteins. So you can (e.g.) input a desired function and it will generate a candidate protein. If these claims pan out, this could accelerate bio/medical research.
  • Some new work has demonstrated an RNA method for gene editing. In terms of utility, this is similar to CRISPR; in fact it could provide some capabilities beyond what CRISPR can do. Combined with more and more AI-based bio-design, this could lead to some interesting developments.

Robots

  • Kinda novel approach to AI/control for robotics: Dreamitate involves having the AI ‘dream’ an upcoming action (i.e. predict what the required action would look like in its camera vision), and then imitate that set of actions. The advantage here is that this leverages the power of generative video. You train a model on a bunch of video, so that it can correctly predict the next frame. Then that’s what you use for robot control. (This is the sense in which OpenAI claim Sora is a world-simulator and hence can be used to understand and act.)
  • A related robot-control effort: Embodied Instruction Following in Unknown Environments. Multi-modal model for robot following commands. Language model to understand human request. Builds a high-level plan and steps within it. Explores environment if necessary to learn more. Leveraging LLM means it can handle arbitrary tasks that it wasn’t specifically trained on.

Vision

  • Supervision is a generic (and open-source) vision system. Seems to work very well for semantic video tracking.
  • Microsoft open-sourced Florence-2, a lightweight vision-language foundation model useful for captioning, object detection, grounding, and segmentation. Interestingly, they created their training dataset by taking existing data and existing specialized models to create a unified set of well-labelled images. So this is another example of AI generating improved training data for AI.

Virtual Avatars

Tools

  • One idea for easily creating AI workflows is to use spreadsheet-like interfaces, where cells can invoke AI/LLM/etc. in order to run tasks across a whole bunch of data. V7 Go and Otto are offering this.

Hardware

  • Groq transitioned to being an AI cloud compute provider, instead of trying to sell people their custom chips directly. Their pricing on many models (including Whisper Large V3) is very good. They clearly have something to offer.
  • Etched raises $125M for their specialized chips.
  • Preprint recasts LLMs in a way that avoids matrix multiplication. Some are claiming this means the end of GPUs and Nvidia; that seems unlikely to me since there are so many current (and future!) data/ML/AI tasks that benefit from GPU/CUDA. But it is an interesting reminder that we don’t know what the optimal software architecture will be, thus it’s hard to know what the right hardware will be.
Posted in AI, News

Towards a Science Exocortex

What is the future of AI in science? I propose that the community should work together to build an exocortex—an expansion to a researcher’s cognition and volition made possible by AI agents operating on their behalf.

The rise of large language models (LLMs) presages a true paradigm shift in the way intellectual work is conducted. But what will this look like in practice? How will it change science?

LLMs are often used as chatbots, but that perhaps misses their true potential, which is as decision-making agents. Andrej Karpathy (1,2) thus centers LLMs as the kernel (orchestration agent) for a new kind of operating system. The LLM triggers tools and coordinates resources, on behalf of the user.

In the future, every person might have an exocortex: a persistently-running swarm of AI agents that work on their behalf, thereby augmenting their cognition and volition. Crucially, the AI agents do not merely communicate with the human; they talk to each other, solving complex problems through iterative work, and only surfacing the most important results or decisions for human consideration. The exocortex multiplies the human’s intellectual reach.

A science exocortex can be built by developing a set of useful AI agents (for experimental control, for data exploration, for ideation), and then connecting them together to allow them to coordinate and work on more complex problems.

Here is an arXiv preprint with more details: https://arxiv.org/abs/2406.17809

The exocortex is obviously speculative. It is a research problem to identify the right design, build it, and deploy it for research. But the potential upside is enormous: liberating scientists from micro-managing details so that they can focus on high-level scientific problems, and correspondingly accelerating the pace of scientific discovery.

Posted in AI

AI News 2024-06-14


Research Insights

  • TextGrad tries to do the equivalent of gradient backpropagation for LLMs: computing “gradients” of performance in the text input/outputs sent between LLMs so that you can automatically optimize the behavior of interconnected LLM agents. I don’t know if this particular approach is the right one, but something like this seems promising.
  • Mixture-of-Agents appears to be applying a well-rationalized architecture to the general “LLMs working together” workflow. Layers of models are used, with initial/rough LLM replies being fed into the next layer, whereupon the LLM-output can be further refined. Selection of models within layers can be used to increase diversity (use different LLMs to balance each other) and performance (the best LLM for a given input can be emphasized). They show improved performance compared to just using one of the underlying LLMs single-shot. (Video going through paper.)
  • Aidan McLaughlin claims that we are ~1 year away from AGI, because current models combined with search (testing out many options) can already unlock enormous capabilities. Seems like an overzealous take, but there is mounting evidence of search greatly improving capabilities. For instance, Ryan Greenblatt claims he was able to massively improve performance on one of the most challenging benchmarks simply by using GPT-4o to sample thousands of options and pick the best one.
  • There are also plenty of academic papers working on search methods. New preprint: Transformers meet Neural Algorithmic Reasoners. They seem to combine LLMs with graph neural networks; instead of searching/iterating over the text outputs, they refine internally to the LLM using graph methods.
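
The layered Mixture-of-Agents workflow described above can be sketched in a few lines. This is only an illustration of the general idea (replies from one layer are fed into the next), not the authors’ implementation. The `make_stub` models and the final `aggregator` step are hypothetical stand-ins; a real system would call actual LLMs.

```python
from typing import Callable, List

# A "model" here is any callable mapping a prompt string to a reply string.
# These are hypothetical stand-ins; a real system would call LLM APIs instead.
Model = Callable[[str], str]


def make_stub(name: str) -> Model:
    """Return a toy model that tags its reply with its own name."""
    def model(prompt: str) -> str:
        return f"[{name}] reply to: {prompt[:40]}"
    return model


def mixture_of_agents(prompt: str, layers: List[List[Model]], aggregator: Model) -> str:
    """Pass a prompt through layers of models, feeding each layer's replies forward."""
    previous: List[str] = []
    for layer in layers:
        # Each model sees the original prompt plus all replies from the prior layer.
        context = prompt
        if previous:
            context += "\n\nEarlier replies to refine:\n" + "\n".join(previous)
        previous = [model(context) for model in layer]
    # Assumed final step: one model merges the last layer's replies into one answer.
    return aggregator(prompt + "\n\nCandidate replies:\n" + "\n".join(previous))


layers = [
    [make_stub("A"), make_stub("B"), make_stub("C")],  # diverse first-pass models
    [make_stub("D"), make_stub("E")],                  # refinement layer
]
answer = mixture_of_agents("Explain Gaussian splatting.", layers, make_stub("final"))
print(answer)
```

Using different models within a layer is what provides the diversity mentioned above; swapping which models sit in which layer is the main design knob.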

World Synthesis

Neural radiance fields (NeRF) and Gaussian splatting are making it possible to generate high-quality 3D imagery that is fast to render. Where is this headed?

  • These methods are bandwidth-efficient. To interact with a 3D scene traditionally, one would either need to render on the server and transmit 2D video (high-latency), or transmit tons of 3D data (vertex models) so the user’s computer can render locally (assuming their computer is powerful enough). But now you just transmit a point-cloud, which is fast to render. (You can play with examples: Luma captures.)
  • These methods are scalable. They’ve been adapted to large-scale scenes. Google Maps is already integrating this in select ways, and we will probably soon see a true virtual-Earth product (where you can move around in 3D anywhere).
  • Text-to-3D is steadily improving (Point-E, threestudio, ProlificDreamer, DreamFusion, Magic3D, SJC, Latent-NeRF, Fantasia3D, TextMesh, Zero-1-to-3, Magic123, InstructNeRF2NeRF, Control4D, Cat3D). Neural methods should allow one to merge together real 3D (from photoscans) with traditional 3D renders and with AI generations.
  • Given the progress in generative images (2D), objects (3D), and video (2D+time=3D), the obvious next step is 4D: volumetric scenes evolving in time. There was initial work on dynamic view synthesis from videos, and dynamic scene rendering. And now, Vidu4D demonstrates generation of non-rigid 3D objects transforming appropriately over time. Currently crude; but you can see the potential.
  • Some folks (e.g. David Holz, founder of Midjourney) see the end goal as having immersive environments that are neural-rendered in real-time, so that you can do exploration and interaction with worlds generated according to your inputs. (A holodeck, of sorts.)

Video

Audio

  • Camb.ai released an open-source voice generation/cloning model. 140 languages, reportedly very good quality. Not sure how it compares to ChatTTS. But it’s nice to have a variety of open-source options.
  • ElevenLabs have added video-to-audio to their many AI-audio options.
  • Google DeepMind demonstrate video-to-audio, which can generate plausible audio (sound effects, music) for a video clip.

Apple

  • Apple announces a bunch of AI features. It’s the expected stuff: integrated writing assistants, on-the-fly generation of images and emojis, a much-smarter Siri.
  • OpenAI’s ChatGPT will now be available in Apple products.
  • At first, people were concerned that all AI requests were being routed to OpenAI. But it actually sounds like Apple is industry-leading in terms of user privacy with cloud-computing/AI: many parts of the workflow will operate on-device, and cloud aspects use a hardened architecture (encryption, stateless, enforceable guarantees, etc.).
Posted in AI, News

Situational Awareness

Leopold Aschenbrenner (previously at OpenAI) offers some unique perspectives on the future of AI. His paper “Situational Awareness” paints a picture of an inevitable AI-Manhattan project.

If you want to look into his arguments, here are some different formats:

It’s hard to summarize that much material. But here are my notes on the main points he argues:

  • Geopolitics will undoubtedly be at play once we get close to AGI; and definitely when ASI is at play.
  • Most people talk about AI as a project of corporate research labs (which it currently is), but as capabilities improve, it will be impossible for the national security apparatus to ignore.
  • Simple scaling arguments suggest we will reach AGI in ~2-3 years, unless we hit a barrier (he lists many). Of course, we may well hit a barrier; but caution requires us to plan assuming AGI could be very near.
  • Once you have AGI, you will achieve ASI very quickly. One of the easiest jobs to automate with AGIs will be AI research, so you will suddenly have an army of tireless AI researchers making exponential improvements. This is probably enough to go from AGI to ASI within a year.
  • Obviously, whoever controls ASI will have a massive geopolitical advantage (superhuman cyber-warfare, autonomous drone swarms, rapid development of new WMDs, optimal allocation of resources, etc.).
  • The US nuclear arsenal, the bedrock of recent global peace and security, will become essentially obsolete.
  • The corporate labs are operating like startups, with almost no regard for security. They need to transition to a strong security mindset sooner rather than later. Some of the key insights for building AGI and ASI are likely being developed right now. And those insights are not being safeguarded.
  • Obviously (within this mindset) open-sourcing anything would be irresponsible. Everything must be kept secret.
  • Western democracies are on the cusp of making a serious error, wherein they cede control of AI (and thus AGI and thus ASI and thus the future of the species) to an authoritarian regime.
  • We are very soon going to see major geopolitics (including espionage, assassinations, combat, bombing datacenters, etc.) focused on AI, as soon as more leaders “wake up” to what’s going on.
  • So, the US will aggressively pursue, but lock down, AI research. It is a strategic asset. The US will invest in an enormous (multi-trillion $) Manhattan-style project to develop AGI first.
  • This will involve building massive compute clusters on US soil, investing in the research enterprise, locking it down using nuclear-weapons caliber security, and building tons of power plants (including bypassing clean energy laws if that’s what it takes to deliver the required power).
  • So, the near-future will be a contentious time period, with greater hostilities between countries and a greater threat to democracy.

His opinions are mostly predictions, but he is also prescriptive in the sense that he believes the West (and the US in particular) needs to win this. I don’t agree with all his claims, but many of his points are hard to argue against. He is indeed correct that most of the general discussion on AI (across many ‘sides’) is missing some key points.

Posted in AI

How to break apart Python pathlib Paths?

Python pathlib is the modern way to handle file paths. But I always forget how to break apart a path into components (directory part, filename part, etc.). This image is a cheat-sheet for working with Path, breaking it apart into root, directory path, filename, suffix, etc.
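
As a text companion to the cheat-sheet, here is a minimal sketch of the most common attributes. The example path is made up, and `PurePosixPath` is used only so the results are the same on any operating system; with a regular `Path` the attribute names are identical.

```python
from pathlib import PurePosixPath

# Hypothetical example path; PurePosixPath gives the same results on any OS.
p = PurePosixPath("/home/user/data/results.tar.gz")

print(p.anchor)    # /  (root; on Windows paths this also includes the drive)
print(p.parent)    # /home/user/data
print(p.name)      # results.tar.gz
print(p.stem)      # results.tar
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
print(p.parts)     # ('/', 'home', 'user', 'data', 'results.tar.gz')

# Building modified paths from the pieces
print(p.with_suffix(".zip"))     # /home/user/data/results.tar.zip
print(p.with_name("notes.txt"))  # /home/user/data/notes.txt
```

Note that `stem` and `suffix` only split off the last extension, which is why `suffixes` is useful for names like `.tar.gz`.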

Posted in Helpguide

How to convert dates/times in Python?

Working with dates and times in Python often involves converting between the various possible representations. Here is a graphic to quickly look up how to convert between the different formats (epoch, struct_time, Python datetime object, string representation, and matplotlib date convention).
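
The same conversions can be written out in code. This is a minimal sketch using only the standard library, working in UTC throughout; the starting timestamp and the format string are arbitrary choices. The matplotlib line assumes the default epoch of 1970-01-01 (the default since matplotlib 3.3, as far as I know); with matplotlib installed, `matplotlib.dates.date2num(dt)` is the usual call.

```python
import calendar
import time
from datetime import datetime, timezone

# Arbitrary starting point: an epoch timestamp (seconds since 1970-01-01 UTC).
epoch = 1_700_000_000

# epoch -> datetime (timezone-aware, UTC), and back again
dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
epoch_again = dt.timestamp()

# datetime -> string, and string -> datetime
fmt = "%Y-%m-%d %H:%M:%S"
text = dt.strftime(fmt)
parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)

# epoch -> struct_time (UTC), and struct_time -> epoch
st = time.gmtime(epoch)
epoch_from_st = calendar.timegm(st)

# datetime -> struct_time
st_from_dt = dt.timetuple()

# datetime -> matplotlib-style date number (days since the epoch, as a float).
# Assumes matplotlib's default epoch of 1970-01-01.
mpl_days = dt.timestamp() / 86400

print(text)  # 2023-11-14 22:13:20
```

Keeping everything timezone-aware avoids the most common pitfall: naive datetimes are silently interpreted in local time by `timestamp()`.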

Posted in Helpguide

Quarantine Email

A while ago, I wrote some code to graph my email behavior (in the spirit of Stephen Wolfram). I have been teleworking for the last 11 weeks (due to COVID-19), and was curious to see how this shows up in my email behavior. First let’s start with a baseline by looking at average behavior over the last 5 years.

This is a stacked plot, with email I send (purple, bottom), and email I receive (blue for internal email, green for external).

There are many caveats. This is work email only. I am (intentionally) only measuring archived email, excluding spam and any email that I delete rather than archive. This leads to the ‘sent’ (which is always saved) being much larger than ‘received’ (since I only archive things that are useful, not the thousands of emails generated every time someone needs to schedule a meeting). The data has some artifacts related to changes in how the lab has managed email over the years (and changes in my threshold for delete vs. archive). Nevertheless, it gives us something to consider.

The overall increase in email volume over time is apparent. The dip at the end of December each year of course coincides with holidays. We can also look at the average distribution of email over the course of a week:
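
A weekly distribution of this kind can be computed by binning message timestamps by weekday and hour. The following is a minimal sketch with made-up timestamps, not the code used for these plots; real data would come from the Date headers of the archived messages.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of message timestamps (made up for illustration).
timestamps = [
    datetime(2020, 5, 4, 9, 15),   # Monday
    datetime(2020, 5, 4, 9, 40),   # Monday
    datetime(2020, 5, 5, 14, 5),   # Tuesday
    datetime(2020, 5, 9, 22, 30),  # Saturday
]

# Count messages per (weekday, hour) bin; weekday() returns 0 for Monday.
bins = Counter((t.weekday(), t.hour) for t in timestamps)

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for (day, hour), count in sorted(bins.items()):
    print(f"{days[day]} {hour:02d}:00  {count}")
```

Dividing each count by the number of weeks covered turns the totals into the per-week averages that such a plot would show.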

Emails obviously come in mostly during working hours (though there is plenty of off-hours traffic from automated systems, other time zones, and colleagues who just aren’t sensible). My email sending follows clear patterns, including how I set aside weekday mornings to handle backlogs of low-priority requests. Now we can compare during-quarantine to before-quarantine.

What changes do we see? I am receiving less external email than usual (not surprising, given how many collaborations are currently on hold). Email both sent and received is less confined to normal working hours. This is due to both crisis-management activities and the blending of work and life.

Posted in Data

New Layout

I have moved the website over to WordPress.

Posted in News