Large Language Models (LLMs) have ushered in a transformative era in natural language processing, excelling at text comprehension and generation. Nevertheless, they struggle when confronted with chaotic contexts (e.g., contexts cluttered with distractors, rather than merely long stretches of irrelevant text), and often overlook details buried within them. In response to these challenges, we introduce the “Thread of Thought” (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. It serves as a versatile “plug-and-play” module, integrating seamlessly with various LLMs and prompting techniques. In experiments on the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response (MTCR) dataset we collected, ThoT significantly improves reasoning performance compared to other prompting techniques.
The key points of the paper, step by step:
The paper introduces a new strategy called “Thread of Thought” (ThoT) to help large language models handle chaotic contexts more effectively.
Chaotic contexts refer to situations where the input contains a lot of complex, interwoven information from different sources. This leads to challenges for LLMs in extracting the most relevant details to answer questions or generate responses.
ThoT takes inspiration from human cognition and prompts the LLM to systematically segment and analyze extended contexts. It guides the model through the information in a structured, stepwise manner to identify pertinent content while dismissing irrelevant details.
ThoT can be seamlessly integrated as a plug-and-play module with different LLMs and prompting techniques, without needing complex retraining or modifications.
Experiments were conducted using question answering and conversational response datasets containing chaotic contexts. ThoT outperformed methods like chain of thought prompting and showed significant improvements in reasoning ability.
ThoT represents an efficient way to enhance LLMs’ processing of chaotic contexts. It improves comprehension over long paragraphs and protects against misleading information. The two-tiered prompting aligns the model’s reasoning process closer to human cognitive patterns.
Thread of Thought is an effective strategy to handle messy contextual information by methodically guiding LLMs to analyze, summarize and extract relevant details from chaotic contexts. It’s a simple yet powerful approach to boost reasoning performance.
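The two-tiered prompting described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact prompt wording: the trigger sentence is paraphrased from ThoT's reported prompt, and `complete` stands in for any LLM completion call.

```python
# Minimal sketch of ThoT's two-tiered prompting. `complete(prompt) -> str`
# is a hypothetical stand-in for any LLM API call.

THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)

def thot_answer(context: str, question: str, complete) -> str:
    # Tier 1: guide the model through the chaotic context piece by piece.
    first_prompt = f"{context}\nQ: {question}\n{THOT_TRIGGER}\nA:"
    analysis = complete(first_prompt)
    # Tier 2: condense the stepwise analysis into a final answer.
    second_prompt = f"{first_prompt} {analysis}\nTherefore, the answer is:"
    return complete(second_prompt)

# Toy stand-in for an LLM, used only to demonstrate the control flow.
def fake_llm(prompt: str) -> str:
    return "Paris" if "Therefore" in prompt else "The context mentions Paris."

print(thot_answer(
    "Passage 1: ... Passage 2: Paris is the capital of France.",
    "What is the capital of France?",
    fake_llm,
))
```

Because both tiers are plain prompt strings, the same wrapper drops in front of any model without retraining, which is what makes the approach "plug-and-play."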
We explore how generating a chain of thought — a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain-of-thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
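In practice, a chain-of-thought prompt is just a few-shot prompt whose exemplars show the intermediate reasoning before the answer. The sketch below assembles one; the exemplar text is illustrative rather than copied from the paper's prompt set.

```python
# Minimal sketch of few-shot chain-of-thought prompting: each exemplar pairs
# a question with its intermediate reasoning steps and final answer, so the
# model imitates the reasoning format on the new question.

EXEMPLARS = [
    ("Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
     "How many tennis balls does he have now?",
     "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
     "5 + 6 = 11. The answer is 11."),
]

def build_cot_prompt(question: str) -> str:
    parts = [f"Q: {q}\nA: {reasoning}" for q, reasoning in EXEMPLARS]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

print(build_cot_prompt(
    "If 3 cars each carry 4 people, how many people are there in total?"
))
```

Contrast this with standard few-shot prompting, where the exemplar answers contain only the final number; including the reasoning trace is the entire intervention.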
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, which rely on iteratively recalling and reasoning over the conversation history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when recalling the same history for different questions. By contrast, humans can keep thoughts in memory and recall them without reasoning afresh each time. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolved memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, an LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. Besides, we formulate basic principles for organizing the thoughts in memory based on well-established operations (i.e., insert, forget, and merge), allowing for dynamic updates and evolution of the thoughts. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval in long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.
The key points about Think-in-Memory (TiM):
- TiM is a novel memory mechanism that enables large language models (LLMs) to maintain an evolved memory for storing historical thoughts along a conversation.
- It consists of two stages:
  - Recalling stage: Before generating a response, the LLM agent recalls relevant thoughts from memory.
  - Post-thinking stage: After generating a response, the LLM agent incorporates new thoughts into the memory.
- TiM avoids the issue of inconsistent reasoning that occurs when LLMs repeatedly reason over the same context. By saving post-thinking thoughts as history, TiM eliminates repeated reasoning.
- TiM organizes thoughts in the memory based on operations like insert, forget, and merge, allowing the thoughts to be dynamically updated.
- Locality-sensitive hashing is used for efficient retrieval of relevant thoughts from long-term memory.
- Experiments on dialog datasets show TiM improves LLMs’ performance on long-term conversations across metrics like response correctness and coherence.
- Overall, TiM enhances LLMs’ ability to process lengthy context and generate high-quality responses for long-term interactions. It provides a more natural, human-like memory mechanism.
Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we begin by investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consisting of background-knowledge and hypothesis pairs from the biomedical literature. The dataset is divided into training, seen, and unseen test sets based on publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instruction-tuned models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance hypothesis generation. We also design four metrics through a comprehensive review to evaluate the generated hypotheses in both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from the test literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration.