Do LLMs understand the world?
Large language models (LLMs) have shown impressive performance on language tasks, sparking a recent wave of interest in understanding how they represent meaning. We argue that LLMs can represent meaning through the probability distributions over possible continuations of a given sentence or text, and we propose that the “meaning” of a sentence is the equivalence class of sentences that induce similar continuation distributions.
A common source of skepticism is whether LLMs can truly understand meaning at all, since they operate solely on linguistic forms rather than being grounded in the external world. We suggest that distributions over continuations may capture aspects of how we interpret the world: meaning for humans could involve probabilities over multisensory experiences, which language models approximate in a more limited, textual space.
While LLMs may not grasp meaning in the same way humans do, we argue that their internal representations have semantic value and exhibit structural relationships akin to those in natural language. Alignment between LLMs and humans could then be achieved through translation between a model’s “inner language” and natural language, given the experiences shared through training data.
These questions sit within a broader philosophical discourse on the relationship between representations and the world, and LLMs offer a new perspective on these long-standing debates.
In our paper “Meaning representations from trajectories in autoregressive models”, we propose an answer to this question. For a given sentence, we consider the probability distribution over all possible sequences of tokens that can follow it, and the set of all such distributions defines a representational space.
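In practice we cannot enumerate this distribution, but an autoregressive model lets us sample trajectories from it and evaluate their likelihoods. The snippet below is a minimal sketch of that idea using an off-the-shelf Hugging Face causal language model (GPT-2, chosen purely for illustration); the helper names, sample counts, and trajectory lengths are illustrative choices made for this post, not the implementation in our repository.

```python
# Minimal sketch: represent a sentence by samples from its continuation
# distribution, and compare two sentences by how likely each makes the
# other's sampled continuations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any pretrained autoregressive LM could be used
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sample_continuations(text, n=8, max_new_tokens=20):
    """Draw n continuation trajectories of `text` from the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(
            ids,
            do_sample=True,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
            pad_token_id=tok.eos_token_id,
        )
    # Keep only the newly generated tokens of each trajectory.
    return [tok.decode(seq[ids.shape[1]:], skip_special_tokens=True) for seq in out]

def continuation_logprob(prefix, continuation):
    """Log-probability the model assigns to `continuation` given `prefix`.
    (Retokenizing prefix + continuation is approximate, but fine for a sketch.)"""
    prefix_len = tok(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for tokens 1..T-1
    token_lp = logprobs.gather(2, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prefix_len - 1:].sum().item()       # sum over continuation tokens only

# Usage: two paraphrases should make each other's sampled continuations likely.
a = "The cat sat on the mat."
b = "A cat was sitting on the mat."
conts = sample_continuations(a)
score = sum(continuation_logprob(b, c) for c in conts) / len(conts)
print(score)
```

The important design choice is that the representation is the distribution itself: we never reduce a sentence to a single vector, we only ever sample from or evaluate likelihoods under its continuation distribution.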
In the field of natural-language processing (NLP), it is widely recognized that the distribution of words in language is closely related to their meaning. This idea is known as the “distributional hypothesis” and is often invoked in the context of methods like word2vec embeddings, which build meaning representations from word co-occurrence statistics. But we believe we are the first to use the distributions themselves as the primary way to represent meaning. This is possible because LLMs offer a way to evaluate these distributions computationally.
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) by using algebraic operations between likelihood functions. These ideas are grounded in distributional perspectives on semantics and are connected to standard constructions in automata theory, but to our knowledge they have not been applied to modern language models. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Finally, we extend our method to represent data from different modalities (e.g., image and text) using multimodal autoregressive models. Our code is available at https://github.com/tianyu139/meaning-as-trajectories
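To illustrate how asymmetric relations can fall out of likelihood comparisons, here is a second minimal sketch. It reuses the sample_continuations and continuation_logprob helpers from the snippet above; the directional score below is a simplified illustration written for this post, not the exact formulation or evaluation protocol from the paper.

```python
# Reuses sample_continuations and continuation_logprob from the sketch above.
def directional_score(src, tgt, n=16):
    """Average log-likelihood, under `tgt`, of continuation trajectories sampled from `src`."""
    conts = sample_continuations(src, n=n)
    return sum(continuation_logprob(tgt, c) for c in conts) / len(conts)

# Rough intuition (simplified relative to the paper): continuations of the more
# specific sentence should tend to remain plausible after the more general one,
# more so than the reverse, so the two directions give different scores.
a = "I adopted a poodle."  # more specific (hyponym side)
b = "I adopted a dog."     # more general (hypernym side)

forward = directional_score(a, b)   # poodle-continuations scored after the "dog" sentence
backward = directional_score(b, a)  # dog-continuations scored after the "poodle" sentence

# A symmetric embedding distance cannot distinguish these two directions;
# the asymmetry between the two scores is what lets likelihood-based
# representations encode relations such as entailment or hypernymy.
print(forward, backward)
```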