Cognitive Architectures for Language Agents // CoALA: a generalized decision-making process to choose actions
Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In this paper, we draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA). CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents. Taken together, CoALA contextualizes today’s language agents within the broader history of AI and outlines a path towards language-based general intelligence.
The key points from the paper:
- The paper proposes Cognitive Architectures for Language Agents (CoALA), a conceptual framework for designing and understanding language agents. Language agents use large language models (LLMs) like GPT to interact with the world.
- CoALA structures agents into modules for memory, action spaces, and decision procedures. It divides memory into working memory and long-term memory (procedural, semantic, episodic). The action space has external grounding actions that affect the environment, and internal actions for reasoning, retrieval, and learning. Decision procedures execute cycles of planning (proposing and evaluating actions) and execution.
- CoALA draws inspiration from decades of research on production systems and cognitive architectures in AI. The paper discusses the history of these ideas and makes an analogy between production systems (logical rewrite rules) and LLMs (probabilistic text rewrite models).
- The paper reviews how techniques like prompt engineering can be seen as imposing algorithmic control flow on LLMs. It suggests cognitive architectures can similarly help develop more systematic and capable LLM agents.
- CoALA provides a lens to understand and compare recent LLM agents like SayCan, ReAct, Voyager, Generative Agents, and Tree of Thoughts. It also suggests future directions like more complex decision procedures, integrating retrieval and reasoning, meta-learning agent code, and aligning LLMs to human values.
- In summary, the paper offers both a theoretical framework to conceptualize LLM agents and concrete suggestions to advance the field towards more general intelligence. By bridging cognitive science and modern AI, CoALA aims to systematically unleash the potential of LLMs.
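The production-system analogy from the summary above can be made concrete with a small sketch. It assumes the simplest possible reading of a production system: an ordered list of string rewrite rules applied until none matches. The names `Production`, `run_productions`, `ADD_RULES`, and `SORT_RULES` are hypothetical and chosen for illustration; they do not come from the paper.

```python
from typing import List, Tuple

# A production is a (pattern, replacement) pair: "if the string contains
# pattern, rewrite its first occurrence to replacement".
Production = Tuple[str, str]


def run_productions(state: str, rules: List[Production], max_steps: int = 100) -> str:
    """Repeatedly fire the first matching rule until no rule matches."""
    for _ in range(max_steps):
        for pattern, replacement in rules:
            if pattern in state:
                state = state.replace(pattern, replacement, 1)
                break  # restart the scan from the highest-priority rule
        else:
            return state  # no rule fired, so the system has halted
    return state  # step budget exhausted


# Toy rule sets (illustrative assumptions, not from the paper).
ADD_RULES: List[Production] = [("+", "")]      # unary addition: erase the plus sign
SORT_RULES: List[Production] = [("ba", "ab")]  # move every "a" in front of every "b"

sum_result = run_productions("111+11", ADD_RULES)
sorted_result = run_productions("baba", SORT_RULES)
print(sum_result, sorted_result)
```

Under this reading, a rule maps a string deterministically to another string, whereas an LLM would replace the fixed `replacement` with a continuation sampled from a learned distribution. That difference is what the "probabilistic text rewrite" phrasing points at.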
The key ideas behind CoALA:
CoALA (Cognitive Architectures for Language Agents) is a conceptual framework proposed to systematically design and understand language agents based on large language models. It structures agents into memory modules, action spaces, and decision procedures. Memory is divided into working and long-term (procedural, semantic, episodic) to represent different types of knowledge. The action space consists of external grounding actions that interact with the environment as well as internal reasoning, retrieval, and learning actions that operate on the agent’s knowledge. Decision procedures loop through proposing, evaluating and selecting actions to plan ahead. CoALA connects insights from decades of research on production systems and cognitive architectures in AI to modern large language models. It provides a theoretical lens to analyze and compare existing agents, as well as suggest new directions like meta-learning agent code. Overall, CoALA aims to synergize cognitive science and modern AI to systematically unlock the potential of large language models for general intelligence.
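As a rough illustration of how these pieces could fit together, here is a minimal sketch of one decision cycle. Everything in it is an assumption made for the example: the `Memory` layout, the keyword-based `retrieve`, and the `propose` and `evaluate` functions, which stand in for LLM calls with deterministic toy logic. It is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    working: Dict[str, str] = field(default_factory=dict)  # current goal, latest observation
    episodic: List[str] = field(default_factory=list)      # record of past experience
    semantic: List[str] = field(default_factory=list)      # facts about the world
    procedural: List[str] = field(default_factory=list)    # names of actions the agent can take


def retrieve(memory: Memory, query: str) -> List[str]:
    """Internal retrieval action: naive substring match over long-term memory."""
    return [item for item in memory.semantic + memory.episodic if query in item]


def propose(memory: Memory) -> List[str]:
    """Stand-in for an LLM proposing candidates: here, every known action."""
    return list(memory.procedural)


def evaluate(memory: Memory, action: str) -> float:
    """Stand-in for an LLM scoring a candidate: count goal-related facts naming it."""
    facts = retrieve(memory, memory.working["goal"])
    return float(sum(action in fact for fact in facts))


def decide(memory: Memory) -> str:
    """Planning stage: propose, evaluate, then select the best-scoring action."""
    return max(propose(memory), key=lambda action: evaluate(memory, action))


def step(memory: Memory, env: Callable[[str], str]) -> str:
    """One decision cycle: plan, act on the environment, then update memory."""
    action = decide(memory)
    observation = env(action)                               # external grounding action
    memory.working["observation"] = observation             # refresh working memory
    memory.episodic.append(f"{action} -> {observation}")    # learning: store the episode
    return action


def toy_env(action: str) -> str:
    """Hypothetical environment that simply acknowledges the action."""
    return f"done: {action}"


memory = Memory(
    working={"goal": "tea"},
    semantic=["boil_water is needed for tea", "open_window cools the room"],
    procedural=["open_window", "boil_water"],
)
chosen = step(memory, toy_env)
print(chosen, memory.episodic)
```

The split between `decide` and `step` mirrors the planning and execution stages of the cycle. In a real agent, `propose` and `evaluate` would presumably be LLM prompts, and `retrieve` would use something richer than substring matching.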
Agents consist of: memory (short-term, long-term), decision procedures (chain-of-thought (CoT), Tree of Thoughts (ToT), etc.), tool use (code generation, code execution), search (external, private), planning, actions, etc. Agents can mimic many aspects of human behavior and can surpass humans in speed and scale.