LLM Programming Loops: Giving the Model an IDE

April, 2023

A lot of effort has been put into making LLMs write code, and they are pretty good at it. However, LLMs often make small, subtle errors that are hard to debug, which makes it difficult to rely on them for programming.

At some level, the programming tasks often assigned to LLMs are "unfair" in the sense that they are given far fewer resources and far less feedback than human programmers. Even experienced human programmers struggle to write correct code without the ability to run and test it, so why do we expect LLMs to do so?

The solution is to give LLMs access to the tools that humans find helpful for programming. These include the ability to run and test code, search codebases, and read documentation.

Letting the Model Run the Code

One of the most useful features of the Wolfram plugin for ChatGPT is that it gives the LLM the ability to run code. Once it can do that, it can be prompted into some amazing workflows that let it replicate the coding/debugging loop that human programmers use.

For example, here I gave it a relatively simple prompt asking it to write code, test whether it works, and iterate on the code if it does not. This is already enough for the LLM to enter a debug loop where it quickly iterates on its code. In this example, it couldn't solve the problem entirely on its own, but with a hint it found a solution after only two iterations:
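The write/test/iterate loop described above can be sketched in ordinary Python (the plugin itself runs Wolfram Language; this is an illustrative stand-in, and `ask_llm` is a hypothetical model-call function supplied by the caller):

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str, test: str) -> "tuple[bool, str]":
    """Run the candidate code followed by a test snippet; return (passed, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test + "\n")
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=30)
    return result.returncode == 0, result.stdout + result.stderr

def debug_loop(ask_llm, task, test, max_iters=5):
    """Ask the model for code, run it, and feed any failure back until it passes."""
    prompt = task
    for _ in range(max_iters):
        code = ask_llm(prompt)  # hypothetical LLM call supplied by the caller
        passed, output = run_candidate(code, test)
        if passed:
            return code
        # Feed the error output back so the model can revise its attempt.
        prompt = f"{task}\n\nYour previous attempt failed with:\n{output}\nPlease fix it."
    return None
```

The key design choice is that the model only ever sees the raw test output, just as a human would read a traceback, rather than any pre-digested summary of the failure.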

With some prompting, the Wolfram plugin can be used to access documentation (because there are built-in functions for getting docs). However, I've found that it reads documentation much more reliably when it is given a dedicated endpoint.
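A dedicated documentation endpoint can be very simple. Here is a sketch using the Python standard library's `pydoc` module (the actual plugin serves Wolfram Language documentation; this Python version is purely illustrative):

```python
import pydoc

def get_docs(symbol: str) -> str:
    """Return plain-text documentation for a dotted name, e.g. 'json.dumps'."""
    obj = pydoc.locate(symbol)
    if obj is None:
        return f"No documentation found for {symbol!r}."
    # render_doc produces terminal-style output; plain() strips the markup.
    return pydoc.plain(pydoc.render_doc(obj))
```

Returning plain text rather than rendered HTML matters here: the model consumes the response directly, so the endpoint should emit exactly the text you want it to read.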

A Full IDE

With the Wolfram plugin, the LLM can run code, but it needs additional capabilities to write and edit more substantial pieces of code.

Using ChatGPTPluginKit, I developed a plugin that gives ChatGPT the ability to:

- read and edit files in a codebase
- run code and tests
- search the codebase
- read documentation

The full code for this plugin is available here.

This functionality gives the LLM access to a sort of IDE, much like the one a human programmer would use. Once pointed at a codebase, it works remarkably well at modifying and testing it with very little supervision:

Just as we are still finding new tools to help humans program in IDEs, there are surely more tools that will help LLMs program beyond the ones tried here. Whether those tools are fundamentally the same as those that humans find helpful is an interesting question that could provide some insight into how both humans and LLMs program, but more experiments are needed to answer it.