In-context Continual Learning LLM // Beyond RAG

LLM/ML problems: hallucinations, catastrophic forgetting under fine-tuning (FT), and poor RAG accuracy and security (especially when relying on external APIs)

sbagency
6 min read · Dec 29, 2024
https://arxiv.org/pdf/2412.15563

Existing continual learning (CL) methods mainly rely on fine-tuning or adapting large language models (LLMs). They still suffer from catastrophic forgetting (CF). Little work has been done to exploit in-context learning (ICL) to leverage the extensive knowledge within LLMs for CL without updating any parameters. However, incrementally learning each new task in ICL necessitates adding training examples from each class of the task to the prompt, which hampers scalability as the prompt length increases. This issue not only leads to excessively long prompts that exceed the input token limit of the underlying LLM but also degrades the model’s performance due to the overextended context. To address this, we introduce InCA, a novel approach that integrates an external continual learner (ECL) with ICL to enable scalable CL without CF. The ECL is built incrementally to pre-select a small subset of likely classes for each test instance. By restricting the ICL prompt to only these selected classes, InCA prevents prompt lengths from becoming excessively long, while maintaining high performance. Experimental results demonstrate that InCA significantly outperforms existing CL baselines, achieving substantial performance gains.
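A minimal sketch of the core idea, assuming a generic sentence-embedding model and a simple centroid per class (the paper's ECL maintains richer incremental class statistics, so the names, the embedder, and the similarity rule below are illustrative stand-ins, not the paper's implementation):

```python
# Sketch: an external continual learner (ECL) that accumulates per-class
# statistics incrementally and pre-selects a few likely classes per test
# instance, so the ICL prompt never has to enumerate every class seen so far.
# The embedder and the similarity rule are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

class ExternalContinualLearner:
    def __init__(self, embedder_name: str = "all-MiniLM-L6-v2"):
        self.embedder = SentenceTransformer(embedder_name)
        self.class_means: dict[str, np.ndarray] = {}   # label -> running mean embedding
        self.class_counts: dict[str, int] = {}

    def learn_task(self, examples: list[tuple[str, str]]) -> None:
        """Incrementally absorb a new task's (text, label) pairs; no LLM parameters change."""
        for text, label in examples:
            vec = self.embedder.encode(text, normalize_embeddings=True)
            n = self.class_counts.get(label, 0) + 1
            mean = self.class_means.get(label, np.zeros_like(vec))
            self.class_means[label] = mean + (vec - mean) / n
            self.class_counts[label] = n

    def preselect(self, text: str, k: int = 5) -> list[str]:
        """Return the k classes whose mean embedding best matches the input."""
        vec = self.embedder.encode(text, normalize_embeddings=True)
        ranked = sorted(self.class_means.items(),
                        key=lambda kv: -float(np.dot(vec, kv[1])))
        return [label for label, _ in ranked[:k]]
```

Because only class-level statistics are stored and updated, learning a new task never overwrites what was accumulated for earlier tasks, which is why the external learner itself does not forget.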

Existing continual learning (CL) research in NLP has primarily focused on fine-tuning or adapting LLMs for individual tasks, either by learning trainable prompts or adapters, or by updating the LLM's parameters. While these approaches can improve CL accuracy, their effectiveness remains limited due to catastrophic forgetting (CF). On the other hand, in-context learning with LLMs has proven highly effective across various NLP tasks. However, its application to CL is hindered by the limited context window of LLMs. As the number of tasks increases, the in-context prompt grows, often exceeding the token limit or leading to performance degradation due to overextended context, which may include irrelevant information. This paper proposes a novel method to address these challenges by leveraging an external continual learner. Our method is replay-free and does not fine-tune or adapt the LLM, treating it solely as a black box. Experiments show that our method markedly outperforms baselines, without suffering from CF.
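To make the "black box" point concrete, here is a hedged sketch of how the ECL's shortlist could be turned into a compact classification prompt. The client, model name, and prompt template are assumptions for illustration, not the paper's exact setup:

```python
# Sketch: use the ECL's pre-selected classes to build a compact ICL prompt and
# query the LLM purely as a black box (no fine-tuning, no adapters, no replay).
# The OpenAI client, model name, and prompt format below are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_with_icl(text: str, candidate_classes: list[str],
                      class_summaries: dict[str, str]) -> str:
    # Only the pre-selected classes (and short summaries of them) enter the
    # prompt, so prompt length stays bounded as the number of tasks grows.
    lines = [f"- {c}: {class_summaries.get(c, '')}" for c in candidate_classes]
    prompt = (
        "Classify the input into exactly one of these classes:\n"
        + "\n".join(lines)
        + f"\n\nInput: {text}\nAnswer with the class name only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```

The essential property is that the prompt only ever contains the pre-selected classes, so its length stays roughly constant regardless of how many tasks have been learned.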

https://x.com/mbodhisattwa/status/1843296291792855288
https://arxiv.org/pdf/2412.03782

The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit — such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.
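The operational criterion in this abstract (context that non-trivially decreases loss on subsequent predictions) is easy to probe directly. The sketch below, assuming GPT-2 via Hugging Face transformers and an arbitrary translation-style example, compares the mean per-token loss on the same continuation with and without an informative context; a clearly lower second number is the minimal signature of in-context learning in this broad sense.

```python
# Sketch: measure whether context lowers the loss on subsequent predictions.
# Model choice (GPT-2) and the example strings are arbitrary assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_nll(context: str, continuation: str) -> float:
    """Mean negative log-likelihood of `continuation`, optionally given `context`."""
    ctx_ids = tok(context, return_tensors="pt").input_ids if context else None
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    ids = cont_ids if ctx_ids is None else torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    n_ctx = 0 if ctx_ids is None else ctx_ids.shape[1]
    # Each scored token is predicted from the position just before it; with no
    # context, the very first continuation token has no prefix and is skipped.
    targets = ids[0, max(n_ctx, 1):]
    preds = logits[0, max(n_ctx, 1) - 1:-1]
    log_probs = torch.log_softmax(preds, dim=-1)
    return -log_probs[torch.arange(len(targets)), targets].mean().item()

ctx = "English: dog -> French: chien\nEnglish: cat -> French: chat\n"
target = "English: horse -> French: cheval"
print("loss without context:", continuation_nll("", target))
print("loss with context:   ", continuation_nll(ctx, target))
```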

In this paper, we have provided a perspective on in-context learning that situates few-shot in-context learning in large language models within a broader spectrum of in-context learning. Indeed, we suggest that any nontrivial sequential dependencies effectively induce some kind of in-context learning. This perspective helps to connect standard supervised ICL to the broader contextual capabilities of language models, such as instruction following or role playing. Our perspective also highlights potential roots of ICL in more basic contextual language processing. Finally, seeing the broader spectrum of ICL suggests several types of generalization that can be evaluated: generalization of what is learned in context, and how flexibly it can be learned and applied. We hope that our perspective will prove useful for researchers interested in the capabilities of large language models, as well as those more generally interested in the links between meta-learning, ICL, goal-conditioned agents, and other research on adaptive sequential behavior. A call to action for ICL research: Our main goal in articulating this perspective is to advocate for ICL research to expand its focus beyond the few-shot supervised setting, by incorporating other kinds of in-context learning and generalization. We suggest that there will likely be mechanistic and behavioral interactions among the many kinds of ICL. Considering these interactions will be necessary to fully understand the generalization behavior and internal functions of large models trained on rich sequential data.

https://arxiv.org/pdf/2412.18295

The growing ubiquity of Retrieval-Augmented Generation (RAG) systems in several real-world services triggers severe concerns about their security. A RAG system improves the generative capabilities of a Large Language Model (LLM) through a retrieval mechanism that operates on a private knowledge base, whose unintended exposure could lead to severe consequences, including breaches of private and sensitive information. This paper presents a black-box attack to force a RAG system to leak its private knowledge base which, differently from existing approaches, is adaptive and automatic. A relevance-based mechanism and an attacker-side open-source LLM favor the generation of effective queries to leak most of the (hidden) knowledge base. Extensive experimentation proves the quality of the proposed algorithm in different RAG pipelines and domains, compared with very recent related approaches, which turn out to be either not fully black-box, not adaptive, or not based on open-source models. The findings from our study underscore the urgent need for more robust privacy safeguards in the design and deployment of RAG systems.

This paper presented an adaptive procedure that allows a malicious user to extract information from the private knowledge base of a RAG system. Thanks to an anchor-based mechanism, paired with automatically updated relevance scores, the proposed algorithm allows a user equipped with open-source tools (that can run on a home computer) to craft attacks that significantly outperform all the considered competitors in terms of coverage, leaked knowledge, and query-building time. These findings underscore the urgent need for more robust safeguards in the design of RAG systems (see Appendix H for details on upcoming safeguarding techniques). Our future work will consider a targeted version of the attack, which should be easily implemented by using a set of pre-designed anchors.
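Read as structure rather than a recipe, the adaptive loop the abstract describes can be pictured roughly as follows. This is a reconstruction from the abstract alone: the helper names, the scoring rule, and the stopping criterion are assumptions rather than the paper's algorithm, and `query_rag` / `propose_query` stand in for application-specific pieces. The point is to show why adaptivity (scores updated from what has already leaked) is the property defenders should be monitoring for.

```python
# High-level skeleton of an adaptive, anchor-driven probing loop against a RAG
# endpoint, reconstructed from the abstract for illustration only. Helper names,
# the scoring rule, and the chunk/anchor extraction heuristics are assumptions.
import re
from collections import defaultdict

def adaptive_probe(query_rag, propose_query, seed_anchors, budget=100):
    """Alternate between querying around the 'hottest' anchor and updating scores."""
    relevance = defaultdict(float, {a: 1.0 for a in seed_anchors})
    leaked_chunks: set[str] = set()
    for _ in range(budget):
        # Pick the anchor whose queries have been surfacing the most new content.
        anchor = max(relevance, key=relevance.get)
        # A local open-source LLM would phrase a natural-language query here.
        query = propose_query(anchor, leaked_chunks)
        response = query_rag(query)
        # Treat each non-empty line of the response as a candidate leaked chunk.
        new_chunks = {ln.strip() for ln in response.splitlines() if ln.strip()} - leaked_chunks
        leaked_chunks |= new_chunks
        # Reward anchors that still surface new content, let exhausted ones decay,
        # and seed new candidate anchors from terms in freshly seen content.
        relevance[anchor] = 0.5 * relevance[anchor] + len(new_chunks)
        for chunk in new_chunks:
            for term in re.findall(r"[A-Z][A-Za-z]{3,}", chunk):
                relevance.setdefault(term, 1.0)
    return leaked_chunks
```

From the defender's side, the telltale signal is a stream of queries whose topics drift to track the system's own previous answers, which is exactly what rate limits and per-user topical-diversity monitoring are meant to catch.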


Written by sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.
