LLM hallucinations // can they be corrected by LLMs?

Lol: “recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect”

sbagency
4 min read · Jan 25, 2024
https://blog.research.google/2024/01/can-large-language-models-identify-and.html

In this work, we created an evaluation benchmark dataset that the wider academic community can use to evaluate future LLMs. We further showed that LLMs currently struggle to find logical errors. However, if they could, we show the effectiveness of backtracking as a strategy that can provide gains on tasks. Finally, a smaller reward model can be trained on general mistake-finding tasks and be used to improve out-of-domain mistake finding, showing that mistake-finding can generalize.

Here is a summary of the key points from the blog post:

- Gladys Tyen wrote an article discussing whether large language models (LLMs) can identify and correct their own mistakes.

- Tyen and colleagues created a new benchmark dataset called BIG-Bench Mistake to evaluate mistake-finding by LLMs. The dataset consists of Chain-of-Thought (CoT) reasoning traces labeled with mistake locations.

- Experiments found that LLMs perform poorly at independently identifying reasoning mistakes in CoT traces; the best model reached only 52.9% accuracy (a toy sketch of this evaluation follows the list).

- Using mistake-finding as a proxy for answer correctness was also ineffective, not much better than always assuming the answer was wrong.

- However, when given the mistake location, LLMs were able to backtrack and correct errors in the CoT reasoning. A simple backtracking method gained more accuracy from correcting wrong answers than it lost from changing right ones.

- Fine-tuning a small model for mistake-finding allowed it to generalize to new unseen tasks, performing better than zero-shot prompting of a large model.

- Overall, the results show promise for backtracking as a strategy to improve LLM reasoning, if their mistake-finding ability can be improved. The new BIG-Bench Mistake dataset provides a benchmark to develop this capability.
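For a concrete picture of what this evaluation looks like, here is a minimal sketch of computing mistake-location accuracy on BIG-Bench Mistake style data. The `call_llm` helper and the prompt wording are hypothetical stand-ins, not the paper's actual prompting setup; the dataset format (a list of CoT steps plus a labeled mistake location, or None for correct traces) simply mirrors the description above.

```python
# Toy evaluation of mistake finding on BIG-Bench Mistake style data.
# `call_llm` is a hypothetical helper: it takes a prompt string and returns
# the model's text reply. The prompt wording below is illustrative only.

def find_mistake_step(call_llm, steps):
    """Ask the model for the 1-indexed first mistaken step, or None if it
    answers that the trace contains no mistake."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    prompt = (
        "Below is a chain-of-thought reasoning trace.\n"
        f"{numbered}\n"
        "If there is a logical mistake, reply with the number of the first "
        "mistaken step; otherwise reply 'no mistake'."
    )
    reply = call_llm(prompt).strip().lower()
    if "no mistake" in reply:
        return None
    digits = "".join(ch for ch in reply if ch.isdigit())
    return int(digits) if digits else None

def mistake_finding_accuracy(call_llm, dataset):
    """dataset: iterable of (steps, gold_location) pairs, where gold_location
    is a 1-indexed step number or None for traces without a mistake."""
    results = [
        find_mistake_step(call_llm, steps) == gold
        for steps, gold in dataset
    ]
    return sum(results) / len(results) if results else 0.0
```

Accuracy here is exact match against the labeled location (or against "no mistake"), which is the kind of per-trace number the 52.9% figure above refers to.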

https://arxiv.org/pdf/2311.08516.pdf

While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023). In this paper, we break down the self-correction process into two core components: mistake finding and output correction. For mistake finding, we release BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. We provide benchmark numbers for several state-of-the-art LLMs, and demonstrate that LLMs generally struggle with finding logical mistakes. For output correction, we propose a backtracking method which provides large improvements when given information on mistake location. We construe backtracking as a lightweight alternative to reinforcement learning methods, and show that it remains effective with a reward model at 60–70% accuracy.

Here are the key points about whether LLMs can find reasoning errors and correct them:

- The authors divide the self-correction process into two components: mistake finding and output correction. They argue that mistake finding is a key capability for reasoning, but current LLMs struggle with it.

- They release a new dataset, BIG-Bench Mistake, containing annotated logical mistakes in reasoning chains generated by PaLM 2. Benchmark results on this dataset show state-of-the-art LLMs perform poorly on mistake finding, even for unambiguous cases.

- For output correction, the authors propose a backtracking method to improve outputs given the mistake location. This method successfully corrects incorrect outputs while minimally affecting correct ones.

- Backtracking can be paired with a small trained reward model that identifies mistakes, instead of relying on gold labels. It remains effective even with reward model accuracy of only around 60–70% (sketched below).

- The authors construe backtracking as a form of “verbal reinforcement learning” that allows iterative improvement without weight updates. It provides a lightweight alternative to conventional deep RL methods for LLM correction.

The core findings are that current LLMs struggle to identify reasoning mistakes, but can correct outputs if given information about mistake locations. The authors advocate for more progress on mistake finding, and propose backtracking as an effective correction method.
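To make the correction side concrete, below is a minimal sketch of backtracking driven by a step-level reward model. `generate_step` and `reward_model` are hypothetical callables standing in for the LLM sampler and the small trained mistake detector; the temperature values, round limit, and stopping heuristic are illustrative assumptions, not the authors' settings. The core idea follows the description above: keep the trace up to the first flagged step, resample that step, then continue generating.

```python
# Minimal sketch of backtracking with a step-level reward model.
# `generate_step(prefix, temperature)` is a hypothetical LLM call that returns
# the next reasoning step given the steps so far; `reward_model(prefix, step)`
# is a hypothetical classifier returning True if the step looks correct.

def backtrack(generate_step, reward_model, steps, max_rounds=3, max_steps=20):
    for _ in range(max_rounds):
        # Find the first step the reward model flags as a mistake.
        flagged = next(
            (i for i, s in enumerate(steps) if not reward_model(steps[:i], s)),
            None,
        )
        if flagged is None:          # every step looks fine: stop early
            return steps
        prefix = steps[:flagged]     # keep the trace up to the mistake
        # Resample the flagged step with some randomness so it can differ
        # from the original, then finish the trace greedily.
        steps = prefix + [generate_step(prefix, temperature=1.0)]
        while (len(steps) < max_steps
               and not steps[-1].strip().lower().startswith("answer")):
            steps.append(generate_step(steps, temperature=0.0))
    return steps
```

Because the reward model is only right roughly 60–70% of the time, some backtracks will replace steps that were actually fine; the paper's finding is that the corrections still outweigh those losses on balance.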
