Hallucination correction in [M]LLMs // humans need it too

sbagency
3 min read · Oct 27, 2023


https://arxiv.org/pdf/2310.16045.pdf

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.

The researchers compare the process to a woodpecker healing a tree: it picks out, that is, identifies and corrects, hallucinations in the generated text. Crucially, Woodpecker does not require retraining the model; it identifies the errors and fixes the hallucinations on its own through the following five-stage process (sketched in code after the list).

  • To begin, it finds the main objects mentioned in the text.
  • Next, it asks questions about the main objects to better understand what they are and what their qualities/features are.
  • Then, the framework uses expert models to answer these questions in a step called “visual knowledge validation.” This simply means checking if the information makes sense using images or visual data.
  • After that, it turns the questions and answers into a knowledge base about what’s in the picture. This includes details about objects and their characteristics in the image.
  • In the last step, Woodpecker corrects hallucinations and includes supporting evidence from the visual knowledge base.
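Put together, the five stages amount to a small post-processing pipeline wrapped around the original MLLM answer. Below is a minimal Python sketch of that flow, assuming the caller supplies the model interfaces; `call_llm`, `detect_objects`, and `answer_vqa` are hypothetical callables for illustration, not the actual API of the released Woodpecker code.

```python
# Minimal sketch of Woodpecker's five correction stages, not the official implementation.
# call_llm, detect_objects and answer_vqa are hypothetical callables supplied by the user:
#   call_llm(prompt) -> str, detect_objects(image, query) -> list, answer_vqa(image, q) -> str.

def correct_hallucinations(image, mllm_answer, call_llm, detect_objects, answer_vqa):
    # 1. Key concept extraction: pull the main objects mentioned in the generated text.
    concepts = call_llm(f"List the main objects mentioned in this text: {mllm_answer}")

    # 2. Question formulation: ask about each object's existence, count and attributes.
    questions = call_llm(
        f"Write short questions about the existence, count and attributes of: {concepts}"
    ).splitlines()

    # 3. Visual knowledge validation: answer the questions with expert vision models
    #    (a detector for existence/count, a VQA model for attributes).
    qa_pairs = []
    for q in questions:
        boxes = detect_objects(image, q)  # grounded visual evidence for the question
        qa_pairs.append((q, answer_vqa(image, q), boxes))

    # 4. Visual claim generation: turn the Q/A pairs into a knowledge base of claims
    #    describing what is actually in the image.
    knowledge = call_llm(f"Turn these question-answer pairs into factual claims: {qa_pairs}")

    # 5. Hallucination correction: rewrite the original answer so every statement
    #    is supported by the knowledge base, citing it as evidence.
    return call_llm(
        "Rewrite the answer so it only contains facts supported by the evidence.\n"
        f"Evidence: {knowledge}\nAnswer: {mllm_answer}"
    )
```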

The concept of error correction isn’t new. The simple pattern is the writer-corrector approach: one model generates content, another verifies it. It can also be made more complex, with multiple models for generation and for verification.
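A minimal version of that writer-corrector loop might look like the sketch below, assuming two model calls are passed in as callables; `generate` and `verify` are placeholders, not a specific API.

```python
# Writer-corrector sketch: one model drafts, a second model critiques, the first revises.
# generate(prompt) -> str and verify(prompt) -> str are hypothetical callables.

def write_with_corrector(task, generate, verify, max_rounds=3):
    draft = generate(f"Answer the following task: {task}")
    for _ in range(max_rounds):
        review = verify(
            "List factual errors or unsupported claims in this answer, "
            f"or reply OK if there are none.\nTask: {task}\nAnswer: {draft}"
        )
        if review.strip().upper() == "OK":
            break  # the corrector found nothing to fix
        draft = generate(
            f"Revise the answer to fix these issues.\nIssues: {review}\nAnswer: {draft}"
        )
    return draft
```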

Advanced prompt engineering // for hallucination prevention, for humans and models

The more precise and complete your prompt is, the fewer hallucinations you are likely to get.
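As an illustration (these prompts are made up, not taken from the video or the linked chat), compare a vague prompt with a constrained one that tells the model exactly what to cover and how to handle uncertainty:

```python
# Illustrative prompts only: the vague version invites guessing, the constrained
# version supplies explicit requirements and an escape hatch for unknowns.

vague_prompt = "Describe the image."

constrained_prompt = (
    "Describe only the objects you can clearly see in the image. "
    "For each object, give its count and color. "
    "If something is unclear, write 'not visible' instead of guessing."
)
```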

https://www.youtube.com/watch?v=kl52EfHq-Jw
https://chat.openai.com/share/a603c1f8-4528-47f4-9c97-7e1ff52350cc
