LLMs in Academia

March 2024

What role does academia have in working with LLMs over the next several years?

LLMs are currently used in several ways in academia, including:

- training LLMs
- steering and interpretability work
- understanding how and why LLMs work
- applying LLMs to problems across other fields

Of these, the most promising direction in the next few years will be applications, along with some research into how LLMs work.

Training LLMs

Training LLMs is competitive, resource-intensive, grindy, and not very academically interesting. Improvements are incremental, achieved through the kind of immense cost and effort that industry already supplies in spades. That costly work is dominated by unacademic tasks: devising heuristics that clean the training data 1% better, negotiating licenses for better training data, and so on.

My intuition is that, as with CNNs and other ML techniques, progress happens very fast at first and then slows down. I don’t think we will see many big breakthroughs that make transformer models qualitatively better than they are now. Models will keep getting incrementally better, but GPT-7 won’t do anything qualitatively different from what GPT-4 can do today, and that improvement will be achieved through grinding.

This is work that academia is not very good at, and it shouldn’t try to compete with the far more motivated efforts in industry.

Steering, Interpretability, etc.

A lot of academic work on LLMs has been directed at making LLMs more practically usable (and “safe”) by making them easier to steer and more interpretable. However, LLMs are already surprisingly steerable and interpretable.

Steering

RLHF just works really well. It is amazingly effective at getting LLMs to behave in the ways that their trainers intend, as well as in the ways that their users intend (by making them receptive to prompting).
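For readers who haven’t seen it spelled out, the standard formulation (as in InstructGPT-style fine-tuning; individual labs’ exact recipes differ) tunes the model to maximize a learned reward while a KL penalty keeps it close to the pretrained reference model:

```latex
% RLHF fine-tuning objective: maximize the learned reward r_phi while a KL penalty
% keeps the tuned policy pi_theta close to the pretrained reference policy pi_ref.
\max_{\theta}\;
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
    \bigl[\, r_{\phi}(x, y) \,\bigr]
  \;-\; \beta\, D_{\mathrm{KL}}\!\bigl( \pi_{\theta}(\cdot \mid x) \,\big\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr)
```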

As far as I can tell, there are three main branches of LLM steering research: getting LLMs to do what users tell them, getting them to behave in the ways that their creators intend (i.e. AI ethics), and preventing Skynet. For the first and second, RLHF already works remarkably well, and more grindy work (in industry) will make it work even better. For the third, it is hard to say much because it is so speculative. It is like saying that you are designing anti-UFO missiles to protect humanity from invasion. It is hard to know how to design such a thing without ever having seen a UFO, and it isn’t clear whether the UFOs will ever arrive at all. (Even if they do, maybe RLHF is the best defense anyway?)

Interpretability

LLMs are already remarkably interpretable by default: you can simply ask them why they said something. (There is some subtlety about whether the explanation comes before or after the statement, which can affect the causal direction, but ultimately you can still get textual explanations from LLMs.)
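As a trivial illustration, here is a minimal sketch of “just asking” (it assumes the OpenAI Python client and uses a placeholder model name; any chat API would do the same job):

```python
# Minimal sketch: get an answer from a model, then ask it to explain that answer.
# Assumes the OpenAI Python client (v1.x) with an API key in the environment.
from openai import OpenAI

client = OpenAI()
question = "Should this study use a fixed-effects or a random-effects model?"

answer = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

explanation = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
        {"role": "user", "content": "Explain, step by step, why you gave that answer."},
    ],
).choices[0].message.content

print(answer)
print(explanation)
```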

Linear regressions are very simple and interpretable by looking at their coefficients, but even they cannot give you a paragraph of text explaining exactly why they gave a particular output. A textual explanation is often the ultimate product of any interpretability effort, as even numerical results (like those from looking at the coefficients of a linear regression) are converted by humans into textual explanations to be reasoned about.
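For contrast, here is a sketch of that classic kind of interpretability (assuming scikit-learn, with invented data and feature names): the “explanation” is a handful of numbers, and turning them into a narrative is left to the human.

```python
# Sketch: interpret a linear regression by reading its fitted coefficients.
# Assumes scikit-learn; the data and feature names are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 0.0]])  # toy features
y = np.array([2.1, 4.3, 6.2, 7.9])                              # toy targets

model = LinearRegression().fit(X, y)

for name, coef in zip(["hours_studied", "took_tutorial"], model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"intercept: {model.intercept_:+.2f}")
```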

The main concern researchers have is a bit vague, but often comes down to the idea that the LLM could hallucinate an explanation, or otherwise be “deceptive” and “lie”. I don’t know what it would mean for an LLM to be deceptive, and though I’m sure there are papers about it, I am skeptical that such a concept is coherent. This may be a poor analogy, but saying an LLM can be deceptive might be like saying that your inner monologue is deceptive. I’m not sure whether that’s even possible, or what it would mean.

Also, steering through RLHF may make models even better at explaining themselves, making this point even more moot.

Understanding how LLMs work

Nobody truly understands how or why LLMs work, including their creators. We can look at all the incremental steps, but it isn’t clear how those steps come together to produce the observed emergent behavior.

At the same time, LLMs are very easy to probe and study. In that way, there could be a whole field akin to the “biology” of LLMs, dedicated to cutting up and running experiments on LLMs under “lab conditions”. The goal would not be intervention (as with interpretability and steering), but simply understanding how this piece of essentially alien technology works.

(Much of the existing work in this direction goes under the “interpretability” banner, though it is distinct from other interpretability efforts in that it is more concerned with basic science than applications.)
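To make the idea concrete, here is a rough sketch of one such experiment (assuming the Hugging Face transformers library and PyTorch, with GPT-2 chosen only because it is a small, convenient specimen): record hidden states on two groups of prompts and check whether a simple direction in activation space separates them.

```python
# Sketch of a simple "lab" experiment on a small open model: record hidden states on
# two groups of prompts and test whether a crude direction separates them.
# Assumes Hugging Face transformers and PyTorch; GPT-2 is used only for convenience.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def last_token_state(text: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]

positive = [last_token_state(t) for t in ["The movie was wonderful.", "What a joyful day."]]
negative = [last_token_state(t) for t in ["The movie was terrible.", "What a miserable day."]]

# Crude probe: the difference of group means defines a candidate "sentiment direction".
direction = torch.stack(positive).mean(0) - torch.stack(negative).mean(0)
test = last_token_state("That was a fantastic result.")
print("projection onto the candidate direction:", float(test @ direction))
```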

This is an interesting direction with some promise, and one that academia is well suited for. My only concern comes from my intuition that LLMs are going to turn out to be a big computationally irreducible mess about which very little can be said. Just as we can understand how neurons work without understanding how brains work, I would guess that we will be able to understand all the linear algebra, but that there will be no satisfying grand narrative for how LLMs work. In other words, that we fundamentally cannot see the forest for the trees. If we could reduce the behavior of an LLM to something relatively simple and interpretable, then it would probably mean that the LLM isn’t doing anything interesting.

(Incidentally, the fact that LLMs are hard to sparsify may be evidence for this intuition.)

There is interesting academic work to be done here, but the results will likely be as vague (and perhaps unsatisfying) as results from neuroscience.

Applications of LLMs

There are endless (often interdisciplinary) applications of LLMs across academia. Many previously hard problems have now become easy, and there are numerous questions nobody thought to ask because the tools to answer them didn’t exist. Now that they do, researchers across disciplines should start adopting LLMs just as they have been adopting computers.

Many of these applications come from making textual data computable. Previously, it was difficult to work with textual data beyond a superficial level. LLMs have changed this, opening up vast new datasets for study and providing new primitives for research on non-numerical data.
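As a hypothetical example of what that looks like (assuming the OpenAI Python client; the model name, categories, and survey responses are all made up), a researcher might use an LLM to code free-text survey answers into categories that ordinary statistics can then handle:

```python
# Sketch: turn free-text survey responses into categorical data for downstream analysis.
# Assumes the OpenAI Python client (v1.x); the model name and categories are placeholders.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["housing", "healthcare", "education", "other"]

def code_response(text: str) -> str:
    """Ask the model to assign exactly one category to a free-text answer."""
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                f"Classify this survey response into exactly one of {CATEGORIES}. "
                f"Reply with only the category name.\n\nResponse: {text}"
            ),
        }],
    )
    return reply.choices[0].message.content.strip().lower()

responses = [
    "Rents downtown have doubled and I had to move back in with my parents.",
    "I waited nine months to see a specialist about my back.",
]
labels = [code_response(r) for r in responses]  # now ordinary categorical data
print(labels)
```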

Conclusion

The most exciting directions for LLM work in academia will be applications, along with some study of how and why LLMs work. Many of the other prominent directions (particularly those focused on LLM development) are either competing with industry in a way that will be painful and fruitless, going after problems that are already solved, or don’t make much sense philosophically.