RAG vs FT // hack & use both

Both RAG and fine-tuning are effective techniques, and their suitability depends on the specific application, dataset nature, and available resources.

sbagency
5 min read · Jan 18, 2024
https://arxiv.org/pdf/2401.08406.pdf

There are two common ways in which developers incorporate proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2–13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application: what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 percentage points when fine-tuning the model, and this is cumulative with RAG, which increases accuracy by a further 5 percentage points. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond to and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.

This study focuses on evaluating the capabilities of large language models (LLMs), specifically LLama 2, GPT-3.5, and GPT-4, in addressing complex problems in agriculture. The researchers assess the performance of these models using techniques such as RAG (Retrieval-Augmented Generation) and fine-tuning. The study establishes performance baselines and discusses the strengths and limitations of each approach.

The key contributions of the paper include insights into the benefits and costs of RAG and fine-tuning. RAG is effective when contextually relevant data is available at query time, such as when interpreting farm data, but it can produce more verbose outputs. Fine-tuning, while offering precise and succinct outputs, requires extensive upfront work to prepare training data, and it pays off when large amounts of domain knowledge must be internalized in the model. The study compares the two approaches and presents a table summarizing their differences.
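The core mechanical difference between the two approaches can be sketched in a few lines: RAG leaves the model untouched and instead augments the prompt with retrieved context. Below is a minimal, self-contained illustration; the bag-of-words "embedding" and cosine retrieval are deliberate simplifications (a real system would use a neural embedding model and a vector index), and the corpus sentences are invented examples in the spirit of the paper's agricultural setting.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a production
    # RAG system would use a neural embedding model (assumption).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_rag_prompt(query, corpus, k=2):
    # RAG augments the prompt with retrieved context rather than
    # baking the knowledge into the model weights via fine-tuning.
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented example corpus, not data from the paper.
corpus = [
    "Winter wheat in this region is typically sown in October.",
    "Soil pH below 5.5 limits maize yields.",
    "Center-pivot irrigation suits flat, open terrain.",
]
prompt = build_rag_prompt("When should winter wheat be sown?", corpus, k=1)
```

The fine-tuning alternative would instead turn such (question, answer) pairs into a training set and update the model weights, which is the "extensive upfront work" the paper refers to.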

Additionally, the research introduces a pipeline for applying RAG and fine-tuning across different LLMs, which other industries can adapt. The agriculture case study shows how these strategies yield more effective models for question and answer generation.

The study also explores the generation of relevant questions and answers for industry-specific datasets using structured document understanding, GPT-4 for question generation, and RAG for answer generation. The generated questions are specific to their respective sections, and the model utilizes the entire text to produce insightful answers. The separation of question and answer generation allows for efficient token usage, opening up possibilities for using different models or approaches for each component.
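The two-stage structure described above can be sketched as follows: questions are generated per section, while answers are generated against the full document. The `llm` callable is a stand-in for a real model call (e.g., GPT-4 for questions, as in the paper); the prompts and the blank-line section splitter are illustrative assumptions, not the paper's exact implementation.

```python
def split_into_sections(doc):
    # Structured document understanding, reduced here to splitting on
    # blank lines (assumption; the paper parses PDFs into sections).
    return [s.strip() for s in doc.split("\n\n") if s.strip()]

def generate_question(section, llm):
    # Questions are specific to their originating section.
    return llm(f"Write one question answerable from:\n{section}")

def generate_answer(question, full_text, llm):
    # Answer generation sees the entire document as context,
    # not just the section the question came from.
    return llm(f"Context:\n{full_text}\n\nQ: {question}\nA:")

def qa_pipeline(doc, llm):
    # Separating the two stages keeps per-call token usage low and
    # allows different models for each stage.
    pairs = []
    for section in split_into_sections(doc):
        q = generate_question(section, llm)
        a = generate_answer(q, doc, llm)
        pairs.append((q, a))
    return pairs
```

Because question generation and answer generation are decoupled, one could, for example, generate questions with a small cheap model and answers with a stronger one.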

While GPT-4 consistently outperforms the other models, the study emphasizes the trade-offs associated with its fine-tuning and inference costs. In conclusion, both RAG and fine-tuning are effective techniques, and their suitability depends on the specific application, the nature of the dataset, and the available resources. The study suggests further investigation into combining these approaches, exploring dataset generation pipelines for industry-specific LLM applications, examining the knowledge gained through fine-tuning, and improving structured extraction from documents.

https://arxiv.org/pdf/2401.08967v1.pdf

One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training relies only on the given CoT data. In math problem solving, for example, there is usually only one annotated reasoning path for each question in the training data. Intuitively, it would be better for the algorithm to learn from multiple annotated reasoning paths given a question. To address this issue, we propose a simple yet effective approach called Reinforced Fine-Tuning (ReFT) to enhance the generalizability of learning LLMs for reasoning, with math problem solving as an example. ReFT first warms up the model with SFT, and then employs online reinforcement learning, specifically the PPO algorithm in this paper, to further fine-tune the model, where an abundance of reasoning paths are automatically sampled given the question and the rewards are naturally derived from the ground-truth answers. Extensive experiments on the GSM8K, MathQA, and SVAMP datasets show that ReFT significantly outperforms SFT, and the performance can be further boosted by combining inference-time strategies such as majority voting and re-ranking. Note that ReFT obtains the improvement by learning from the same training questions as SFT, without relying on extra or augmented training questions. This indicates a superior generalization ability for ReFT. The code of this work is publicly available.
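The key point in the abstract is that ReFT's reward requires no learned reward model: a sampled reasoning path is scored simply by whether its final answer matches the ground truth. The sketch below illustrates that reward, the per-question sampling of multiple paths, and the majority-voting inference strategy the paper mentions; the "Answer: <int>" output format and function names are illustrative assumptions, and a full implementation would feed the (path, reward) pairs into PPO updates after the SFT warm-up.

```python
import re

def extract_final_answer(cot):
    # Assumes a sampled chain of thought ends with "Answer: <int>";
    # this format is an assumption, not the paper's exact convention.
    m = re.search(r"Answer:\s*(-?\d+)", cot)
    return int(m.group(1)) if m else None

def reward(cot, gold):
    # ReFT's reward is derived directly from the ground-truth answer:
    # 1 if the final answer is correct, 0 if wrong or unparsable.
    return 1.0 if extract_final_answer(cot) == gold else 0.0

def score_sampled_paths(question, gold, sample_cot, n_samples=4):
    # Sample several reasoning paths for one question and score each;
    # in ReFT these scored paths drive the PPO policy update.
    paths = [sample_cot(question) for _ in range(n_samples)]
    return [(p, reward(p, gold)) for p in paths]

def majority_vote(cots):
    # Inference-time strategy from the paper: sample several paths
    # and take the most common final answer.
    answers = [a for a in map(extract_final_answer, cots) if a is not None]
    return max(set(answers), key=answers.count) if answers else None
```

Because the reward comes for free from the labeled answers, ReFT can explore many reasoning paths per question while training on exactly the same question set as SFT.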

Written by sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.
