Self-adaptive LLMs // in real time
Unlocking the Future of AI: Introducing Transformer² and the Power of Adaptation
Adaptation is one of nature’s most remarkable capabilities, enabling life to thrive in dynamic environments. Inspired by this, Transformer² (“Transformer-squared”) revolutionizes AI by introducing self-adaptiveness — the ability for models to dynamically adjust to new tasks in real-time.
What Makes Transformer² Different?
Traditional AI models are static, requiring retraining for new tasks. Transformer² changes this by leveraging Singular Value Decomposition (SVD) to analyze and modify the “brain” (weight matrices) of large language models (LLMs). SVD decomposes each weight matrix into orthogonal rank-one components whose strengths can be adjusted independently, enabling the model to fine-tune its capabilities for specific tasks, such as math or coding.
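To make this concrete, here is a minimal sketch of decomposing a single weight matrix with SVD in PyTorch; the matrix size is an illustrative assumption and this is not the paper's implementation:

```python
import torch

# Hypothetical weight matrix of one linear layer inside an LLM (the size is illustrative).
W = torch.randn(1024, 1024)

# SVD factors W into orthogonal singular vectors U, V and singular values S,
# i.e. W = U @ diag(S) @ V^T: a sum of rank-one components that can be
# scaled independently without interfering with one another.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Reconstructing W from its components recovers the original matrix
# up to floating-point error.
W_reconstructed = U @ torch.diag(S) @ Vh
print((W - W_reconstructed).abs().max())  # tiny numerical error, e.g. ~1e-4
```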
How It Works
During training, Singular Value Fine-tuning (SVF) uses reinforcement learning to learn compact z-vectors that amplify or dampen the contribution of specific singular components. At inference time, the model adapts by selecting and combining these vectors through prompt-based, classifier-based, or few-shot strategies, yielding robust performance across diverse tasks.
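The effect of a z-vector can be sketched as follows; here the vector is randomly perturbed for illustration, whereas in Transformer² it is trained with reinforcement learning for each task:

```python
import torch

def apply_z_vector(W: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Scale the singular values of W with a z-vector (sketch of SVF's effect).

    z has one entry per singular component: values above 1 amplify that
    component, values below 1 dampen it, and the singular directions U and V
    are left untouched.
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ torch.diag(S * z) @ Vh

W = torch.randn(1024, 1024)                           # hypothetical layer weights
z_math = torch.ones(1024) + 0.1 * torch.randn(1024)   # stand-in for an RL-trained "math expert" vector
W_adapted = apply_z_vector(W, z_math)                 # task-adapted weights used at inference
```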
Breakthrough Results
Transformer² outperforms traditional methods like LoRA across tasks such as math (GSM8K), coding (HumanEval), and reasoning (ARC-Challenge). Few-shot adaptation highlights the model’s ability to combine expertise from multiple domains, while cross-model z-vector transfer demonstrates scalability and knowledge sharing between LLMs.
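As a rough illustration of how expertise from several domains can be combined, the sketch below mixes two hypothetical expert z-vectors with interpolation weights; in the few-shot setting these weights would be tuned on a handful of task examples rather than set by hand:

```python
import torch

# Hypothetical expert z-vectors trained separately for math and coding tasks.
z_math = torch.ones(1024) + 0.1 * torch.randn(1024)
z_code = torch.ones(1024) + 0.1 * torch.randn(1024)

# Few-shot adaptation blends experts with interpolation weights alpha;
# in practice alpha is optimized on the few-shot examples, not fixed by hand.
alpha = torch.tensor([0.6, 0.4])
z_mixed = alpha[0] * z_math + alpha[1] * z_code

# The mixed vector is applied to a layer's singular values exactly as a
# single expert vector would be.
W = torch.randn(1024, 1024)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
W_adapted = U @ torch.diag(S * z_mixed) @ Vh
```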
The Future of AI
Transformer² represents a shift from static models to “living intelligence” — systems that continually adapt and learn. This advancement opens doors to AI capable of real-time adaptation, collaborative problem-solving, and seamless integration of new knowledge. As we move toward this future, Transformer² sets the stage for smarter, more resilient AI solutions.
Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce Transformer², a novel self-adaptation framework that adapts LLMs for unseen tasks in real time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer² employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific “expert” vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. Our method outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. Transformer² demonstrates versatility across different LLM architectures and modalities, including vision-language tasks. Transformer² represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs, paving the way for truly dynamic, self-organizing AI systems. Our code is available at https://github.com/SakanaAI/self-adaptive-llms
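A minimal sketch of the two-pass mechanism is shown below; the keyword-based classify_task helper, the task names, and the expert library are illustrative assumptions, since the paper's dispatch can instead rely on prompting the model itself, a trained classifier, or few-shot search:

```python
import torch

# Hypothetical library of expert z-vectors produced by SVF training, one per task domain.
EXPERTS = {
    "math":      torch.ones(1024) + 0.1 * torch.randn(1024),
    "coding":    torch.ones(1024) + 0.1 * torch.randn(1024),
    "reasoning": torch.ones(1024) + 0.1 * torch.randn(1024),
}

def classify_task(prompt: str) -> str:
    """First pass (placeholder): identify the task properties of the prompt."""
    if any(tok in prompt.lower() for tok in ("solve", "equation", "integral")):
        return "math"
    if "def " in prompt or "function" in prompt.lower():
        return "coding"
    return "reasoning"

def adapt_weights(W: torch.Tensor, prompt: str) -> torch.Tensor:
    """Second pass: rescale W's singular values with the selected expert vector."""
    z = EXPERTS[classify_task(prompt)]
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ torch.diag(S * z) @ Vh

W = torch.randn(1024, 1024)                                          # hypothetical layer weights
W_prompt = adapt_weights(W, "Solve the equation x^2 - 5x + 6 = 0")   # math expert selected
```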
For more than a decade, there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called a hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention attend to the current context while utilizing long-past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining fast inference. From a memory perspective, we argue that attention, due to its limited context but accurate dependency modeling, acts as a short-term memory, while the neural memory, due to its ability to memorize the data, acts as a long-term, more persistent memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture. Our experimental results on language modeling, common-sense reasoning, genomics, and time-series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They can further scale effectively to context windows larger than 2M tokens, with higher accuracy on needle-in-a-haystack tasks compared to baselines.
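As a rough, simplified illustration of the idea rather than the Titans implementation, a neural long-term memory can be a small network whose parameters are updated at inference time by a gradient step on an associative-memory loss, so that surprising inputs are memorized more strongly; the module below, its layer sizes, and its update rule (a plain gradient step with weight decay, no momentum) are assumptions:

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Simplified sketch of a neural long-term memory module."""

    def __init__(self, dim: int, lr: float = 0.1, decay: float = 0.01):
        super().__init__()
        self.memory = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.to_k = nn.Linear(dim, dim)   # key projection
        self.to_v = nn.Linear(dim, dim)   # value projection
        self.lr, self.decay = lr, decay

    def update(self, x: torch.Tensor) -> torch.Tensor:
        """Memorize the current tokens x of shape (batch, dim) at test time."""
        k, v = self.to_k(x), self.to_v(x)
        loss = ((self.memory(k) - v) ** 2).mean()   # associative-memory ("surprise") loss
        grads = torch.autograd.grad(loss, list(self.memory.parameters()))
        with torch.no_grad():
            for p, g in zip(self.memory.parameters(), grads):
                p.mul_(1 - self.decay).sub_(self.lr * g)   # forget a little, then memorize
        return loss.detach()

    def retrieve(self, x: torch.Tensor) -> torch.Tensor:
        """Read long-past information to complement attention's short-term context."""
        return self.memory(self.to_k(x))

mem = NeuralMemory(dim=64)
tokens = torch.randn(8, 64)       # a chunk of token representations
mem.update(tokens)                # write: memorize this chunk
recalled = mem.retrieve(tokens)   # read: retrieved values fed alongside attention
```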