Chain-of-Table // using tables in LLM reasoning
There are a lot of Chain-of-X reasoning approaches, but they aren't much help if the relevant data isn't available to the model in a usable form.
People use tables every day to organize and interpret complex information in a structured, easily accessible format. Due to the ubiquity of such tables, reasoning over tabular data has long been a central topic in natural language processing (NLP). Researchers in this field have aimed to leverage language models to help users answer questions, verify statements, and analyze data based on tables. However, language models are trained over large amounts of plain text, so the inherently structured nature of tabular data can be difficult for language models to fully comprehend and utilize.
The post introduces “Chain-of-Table”, a novel framework that enhances large language models’ (LLMs) ability to understand and reason over tabular data. The key idea is to guide LLMs to iteratively generate operations and update the given table to represent a reasoning chain tailored to the question at hand. This allows the LLM to transform complex tables into simpler, more manageable forms, enabling more accurate and reliable predictions.
The framework consists of three main stages: 1) dynamically planning the next tabular operation based on the current table state and question, 2) generating arguments for the selected operation, and 3) executing the operation to produce a new intermediate table. This process is repeated, creating a chain of operations and intermediate tables that reveal the reasoning process.
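To make the loop concrete, here is a minimal Python sketch of how one iteration could look. The `call_llm` and `execute_operation` helpers are placeholders of my own, not the authors' prompts or code; the operation names follow the atomic operations listed in the paper, with "[E]" marking the end of the chain.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call via any chat-completion API."""
    raise NotImplementedError

def execute_operation(table: str, op: str, args: str) -> str:
    """Placeholder: apply the chosen operation to the (textual) table."""
    raise NotImplementedError

# Atomic operations named in the paper; "[E]" ends the chain.
OPERATIONS = ["f_add_column", "f_select_row", "f_select_column",
              "f_group_by", "f_sort_by", "[E]"]

def chain_of_table(table: str, question: str, max_steps: int = 5) -> str:
    history = []  # the chain of (operation, arguments) applied so far
    for _ in range(max_steps):
        # 1) Dynamically plan the next operation from the current table state.
        op = call_llm(
            f"Table:\n{table}\nQuestion: {question}\n"
            f"Previous operations: {history}\n"
            f"Pick the next operation from {OPERATIONS}:"
        ).strip()
        if op == "[E]":
            break
        # 2) Generate arguments for the selected operation.
        args = call_llm(
            f"Table:\n{table}\nQuestion: {question}\n"
            f"Operation: {op}\nGenerate its arguments:"
        ).strip()
        # 3) Execute the operation to produce a new intermediate table.
        table = execute_operation(table, op, args)
        history.append((op, args))
    # Answer the question over the final, simplified table.
    return call_llm(f"Table:\n{table}\nQuestion: {question}\nAnswer:")
```

The point of the design is that the next operation is planned from the *current* intermediate table rather than from a fixed, pre-committed chain, so the table itself carries the intermediate reasoning state.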
Experiments on benchmarks like WikiTQ and TabFact show that Chain-of-Table achieves new state-of-the-art performance, outperforming generic reasoning methods and program-aided methods. It exhibits better robustness on harder questions requiring longer operation chains and larger input tables. The iterative table transformations and explicit reasoning chains enable LLMs to handle complex tabular data more effectively.
Overall, Chain-of-Table provides a promising approach for enhancing LLMs’ tabular reasoning capabilities by leveraging the structured nature of tables and guiding models to express intermediate reasoning steps.
Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and similar approaches incorporate the reasoning chain in the form of textual context, but it is still an open question how to effectively leverage tabular data in the reasoning chain. We propose the CHAIN-OF-TABLE framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts. Specifically, we guide LLMs using in-context learning to iteratively generate operations and update the table to represent a tabular reasoning chain. LLMs can therefore dynamically plan the next operation based on the results of the previous ones. This continuous evolution of the table forms a chain, showing the reasoning process for a given tabular problem. The chain carries structured information of the intermediate results, enabling more accurate and reliable predictions. CHAIN-OF-TABLE achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks across multiple LLM choices.
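To show what a single step of such a chain does to the table, here is a toy illustration with two hand-written operations over a list-of-dicts table. The data, question, and function signatures are my own example of the idea, not the paper's implementation (where the operations and their arguments are generated by the LLM).

```python
Table = list[dict[str, str]]

def f_select_row(table: Table, indices: list[int]) -> Table:
    """Keep only the rows relevant to the question."""
    return [table[i] for i in indices]

def f_select_column(table: Table, columns: list[str]) -> Table:
    """Keep only the columns relevant to the question."""
    return [{c: row[c] for c in columns} for row in table]

# Hypothetical table for illustration.
table: Table = [
    {"Rank": "1", "Cyclist": "A. Valverde", "Country": "Spain"},
    {"Rank": "2", "Cyclist": "A. Kolobnev", "Country": "Russia"},
    {"Rank": "3", "Cyclist": "D. Rebellin", "Country": "Italy"},
]

# Question: "Which country does the top-ranked cyclist come from?"
# One plausible chain: keep the first row, then keep only the Country column.
step1 = f_select_row(table, [0])
step2 = f_select_column(step1, ["Country"])
print(step2)  # -> [{'Country': 'Spain'}]
```

After the chain, the model only has to read off the answer from a one-cell table instead of parsing the full original table.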
We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while substantially mitigating hallucination. In particular, the proposed method — retrieval-augmented thoughts (RAT) — revises each thought step one by one with retrieved information relevant to the task query, the current step, and the past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks, with average relative improvements in rating scores of 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT.
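A rough sketch of the RAT idea under my own assumptions, with placeholder `call_llm` and `retrieve` functions (not the authors' code): draft a zero-shot CoT, then revise each step with retrieved context before producing the final output.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call via any chat-completion API."""
    raise NotImplementedError

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder for a retriever (e.g., a vector index over a task corpus)."""
    raise NotImplementedError

def rat(task_query: str) -> str:
    # 1) Initial zero-shot chain of thought, split into individual steps.
    draft = call_llm(f"{task_query}\nLet's think step by step.")
    steps = [s for s in draft.split("\n") if s.strip()]

    # 2) Revise each step one by one, using information retrieved for the
    #    task query, the already-revised steps, and the current step.
    revised: list[str] = []
    for step in steps:
        context = "\n".join(revised)
        docs = "\n".join(retrieve(f"{task_query}\n{context}\n{step}"))
        fixed = call_llm(
            f"Task: {task_query}\n"
            f"Previous (revised) steps:\n{context}\n"
            f"Current step: {step}\n"
            f"Retrieved information:\n{docs}\n"
            "Revise the current step so it agrees with the retrieved "
            "information, correcting any hallucinated details:"
        )
        revised.append(fixed)

    # 3) Produce the final output from the fully revised chain of thought.
    return call_llm(f"{task_query}\nReasoning:\n" + "\n".join(revised) + "\nFinal answer:")
```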