RankRAG // not a surprise

Single LLM for top-k Ranking and Answer Generation in RAG

sbagency
2 min read · Jul 10, 2024
https://arxiv.org/pdf/2407.02485

Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework, RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well by adding a small fraction of ranking data into the training blend, and outperform existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. In addition, it performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating its superb capability for generalization to new domains.
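To make "adding a small fraction of ranking data into the training blend" concrete, here is a hypothetical sketch of the two example types that could sit side by side in such a blend. The field names and prompt wording are illustrative assumptions, not the paper's actual templates (those appear in its Table 1); what the paper does confirm is that ranking is cast as generating "True" for a relevant (question, context) pair.

```python
# Hypothetical shapes of the two instruction-tuning example types blended
# together in RankRAG-style training. Field names and prompt wording are
# assumptions for illustration, not the paper's actual templates.
qa_example = {
    "instruction": "Answer the question using the given contexts.",
    "input": "Contexts: ...\nQuestion: Who proposed the theory of general relativity?",
    "output": "Albert Einstein",
}
ranking_example = {
    "instruction": "Does the context contain information that answers "
                   "the question? Answer True or False.",
    "input": "Context: ...\nQuestion: Who proposed the theory of general relativity?",
    "output": "True",  # ranking reframed as answer generation
}
```

Because both tasks share one output format (plain text generation), a single LLM can be fine-tuned on the mixed data with no architectural changes.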


RankRAG Inference: Retrieve-Rerank-Generate Pipeline

Because RankRAG adds a reranking step, inference for each question follows a retrieve-rerank-generate pipeline:

(1) The retriever R first retrieves the top-N contexts from the corpus.

(2) The RankRAG model computes a relevance score between the question and each of the N retrieved contexts, as the probability of generating "True" under the ranking prompt in Table 1, then reranks the contexts and retains only the top-k (k ≪ N) as input for the generation step.

(3) The top-k contexts are concatenated with the question and fed back into the RankRAG model to generate the final answer.
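A minimal sketch of that three-step loop, assuming a Hugging Face causal LM. The checkpoint name, the exact wording of the ranking prompt (the real template is the paper's Table 1), and the `retriever.search` interface are all placeholders, not the paper's code.

```python
# Minimal sketch of RankRAG's retrieve-rerank-generate inference loop.
# Assumptions (not from the paper): the checkpoint name, the ranking prompt
# wording, and a generic retriever.search() standing in for whatever dense
# retriever supplies the top-N candidate contexts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

def relevance_score(question: str, context: str) -> float:
    """Step 2: score = P(next token is "True") for a yes/no ranking prompt."""
    prompt = (
        f"Question: {question}\nContext: {context}\n"
        "Does the context contain information that answers the question? "
        "Answer True or False:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    # Leading spaces because BPE tokenizers attach them to the word.
    true_id = tokenizer.encode(" True", add_special_tokens=False)[0]
    false_id = tokenizer.encode(" False", add_special_tokens=False)[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=-1)
    return probs[0].item()

def rank_rag_answer(question: str, retriever, n: int = 100, k: int = 5) -> str:
    # (1) Retrieve top-N candidate contexts from the corpus.
    contexts = retriever.search(question, top_n=n)
    # (2) Rerank with the same LLM; keep only the top-k (k << N).
    top_k = sorted(contexts, key=lambda c: relevance_score(question, c),
                   reverse=True)[:k]
    # (3) Concatenate top-k contexts with the question and generate the answer.
    prompt = "\n\n".join(top_k) + f"\n\nQuestion: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

The design point worth noticing is step 2: the ranking signal is nothing more than the model's probability of emitting "True", so the same weights serve as both reranker and generator, at the cost of one extra forward pass per candidate context.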

---

Conclusion: In this work, we introduce a new RAG framework, RankRAG, which instruction-tunes a single LLM for both ranking and answer generation. We find that instruction-tuned LLMs can outperform existing expert ranking models by adding only a small fraction of ranking data into the training blend. We compare RankRAG with state-of-the-art RAG models on comprehensive knowledge-intensive benchmarks and demonstrate that RankRAG significantly outperforms all of them on nine general-domain and five biomedical RAG benchmarks.
