Is RAG commoditized? What comes after? Agents

RAG isn’t a perfect solution, but there aren’t many alternatives

sbagency
4 min read · Aug 22, 2024
https://www.youtube.com/watch?v=L0kBWyziFlc

The speaker introduces a presentation focused on the construction industry, emphasizing its significance and the role of their company, Trunk Tools, a leading generative AI provider for this sector. They highlight the complexity of construction projects, exemplified by a skyscraper in New York that involved 3.6 million pages of documentation. This vast amount of data leads to inefficiencies and costly errors: roughly 10% of the $15 trillion construction industry is spent on rework caused by data discrepancies.

Trunk Tools aims to solve these issues by centralizing and analyzing all construction-related data through their AI system, referred to as the “brain behind construction.” They discuss deploying AI agents that can efficiently process and respond to queries, identify mistakes within massive data sets, and even generate corrective actions. The speaker underscores the importance of keeping humans central in the process while using AI to augment their capabilities. The presentation concludes with an invitation to learn more at their booth, as the company is actively hiring to expand its impact on the industry.

https://research.google/blog/speculative-rag-enhancing-retrieval-augmented-generation-through-drafting/

Speculative RAG is a novel retrieval-augmented generation framework in which a smaller specialist LM generates draft answers that are then passed to a larger generalist LM, which verifies them and selects the best draft. Speculative RAG achieves state-of-the-art performance in both accuracy and efficiency.

https://arxiv.org/pdf/2407.08223

Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce SPECULATIVE RAG — a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that SPECULATIVE RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
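The draft-then-verify loop described in the abstract can be sketched as follows. This is a minimal illustration with stub functions standing in for the specialist drafter and the generalist verifier; the function names, the toy scoring heuristic, and the random document split are assumptions, not the paper's implementation.

```python
import random

def partition_documents(docs, num_drafts):
    """Split the retrieved documents into distinct subsets, one per draft,
    so each draft sees a different slice of the evidence."""
    docs = list(docs)          # avoid mutating the caller's list
    random.shuffle(docs)
    return [docs[i::num_drafts] for i in range(num_drafts)]

def specialist_draft(question, doc_subset):
    """Stand-in for the smaller specialist LM: produce an answer draft and
    a supporting rationale from its document subset."""
    answer = f"answer based on {len(doc_subset)} docs"
    rationale = " ".join(doc_subset)
    return answer, rationale

def generalist_score(question, answer, rationale):
    """Stand-in for the larger generalist LM's single verification pass:
    score a draft (here a toy heuristic on rationale length)."""
    return len(rationale)

def speculative_rag(question, retrieved_docs, num_drafts=3):
    """Drafts are generated in parallel conceptually; here sequentially.
    The generalist only verifies short drafts, never reads the full
    retrieval context, which is where the latency savings come from."""
    drafts = []
    for subset in partition_documents(retrieved_docs, num_drafts):
        answer, rationale = specialist_draft(question, subset)
        drafts.append((generalist_score(question, answer, rationale), answer))
    # Return the draft the verifier scores highest.
    return max(drafts)[1]
```

Note that each draft's input is only a subset of the retrieved documents, which is what reduces per-draft token counts and mitigates position bias over long contexts.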

https://arxiv.org/pdf/2408.02545

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG FOUNDRY, an open-source framework for augmenting large language models for RAG use cases. RAG FOUNDRY integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. Code is released as open source at https://github.com/IntelLabs/RAGFoundry.
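The data-creation step of such a workflow, folding retrieved passages into training prompts so a model learns to answer from provided context, can be sketched as below. This is an illustrative shape only; the function name and prompt format are assumptions, not RAG FOUNDRY's actual config-driven API (see the IntelLabs/RAGFoundry repository for that).

```python
def make_rag_training_example(question, retrieved_docs, answer):
    """Build one prompt/completion pair whose prompt embeds the retrieved
    passages, yielding a data-augmented example for RAG fine-tuning."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return {"prompt": prompt, "completion": " " + answer}

example = make_rag_training_example(
    "What does RAG stand for?",
    ["RAG stands for retrieval-augmented generation."],
    "Retrieval-augmented generation.",
)
```

A corpus of such examples can then feed a standard supervised fine-tuning loop, with evaluation measuring both retrieval accuracy and generative quality as the abstract describes.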

Restrict the LLM to only what RAG provides

https://www.builder.io/blog/make-ai-suck-less
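One common way to enforce that restriction is at the prompt layer: instruct the model to answer only from the retrieved chunks, and refuse to call the model at all when retrieval returns nothing. A minimal sketch (the function name and wording are assumptions, not the linked post's implementation):

```python
def grounded_prompt(question, retrieved_chunks):
    """Build a prompt that confines the LLM to the retrieved evidence.
    Returns None when there is no evidence, so the caller can short-circuit
    instead of letting the model improvise from parametric memory."""
    if not retrieved_chunks:
        return None
    context = "\n---\n".join(retrieved_chunks)
    return (
        "You are a strict assistant. Answer ONLY from the context below. "
        'If the context does not contain the answer, reply exactly: '
        '"I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Pairing this with a retrieval-confidence threshold and a post-hoc citation check makes the restriction harder for the model to sidestep.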


sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.