Retrieval augmented generation // RAG, keep your LLMs updated

Dec 26, 2023

The key points of RAG:

- RAG (Retrieval-Augmented Generation) is a strategy for improving LLM answers: relevant context is retrieved from an external store, typically a vector database, and added to the prompt before the model generates a response.

- To keep RAG answers current, you need to continuously update the vector DB with new data via a streaming pipeline.

- The streaming pipeline has two components:
1. A streaming framework like Bytewax to ingest and process data in real time.
2. A vector database like Qdrant to store the embedded document vectors.

- For a financial news app, the pipeline would ingest news via REST APIs and WebSockets. Each news item would be cleaned, embedded, and indexed in the vector DB.

- When users ask questions, RAG can leverage the up-to-date vector DB to retrieve the most relevant, recent news articles as context.

- Using Bytewax for stream processing and Qdrant for vector storage keeps this pipeline straightforward to build and maintain.

- Overall, implementing this streaming pipeline allows RAG to tap into the latest data, which reduces hallucinations and the need for frequent fine-tuning.
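
The flow described in the points above (clean, embed, index, then retrieve at question time) can be sketched in a few lines of plain Python. This is only a toy illustration, not the Bytewax or Qdrant API: `InMemoryIndex` is a hypothetical stand-in for the vector DB, `embed` is a simple bag-of-words stand-in for a real embedding model, and the `news_stream` list stands in for data arriving over REST APIs or WebSockets.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def clean(text: str) -> str:
    """Strip HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def embed(text: str) -> dict:
    """Toy sparse embedding: L2-normalised token counts.
    Assumption: a real pipeline would call an embedding model here."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a vector DB: upsert by id, search by cosine similarity."""
    points: dict = field(default_factory=dict)

    def upsert(self, doc_id: str, vector: dict, text: str) -> None:
        # Re-ingesting the same id overwrites the old entry, keeping the index current.
        self.points[doc_id] = (vector, text)

    def search(self, query_vector: dict, top_k: int = 2) -> list:
        # Vectors are normalised, so the dot product equals cosine similarity.
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in query_vector.items()), text)
            for vec, text in self.points.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


def ingest(stream, index: InMemoryIndex) -> None:
    """Streaming step: clean, embed, and index each incoming item."""
    for doc_id, raw in stream:
        text = clean(raw)
        index.upsert(doc_id, embed(text), text)


def build_prompt(question: str, index: InMemoryIndex, top_k: int = 2) -> str:
    """Retrieval step: fetch the closest documents and prepend them as context."""
    context = index.search(embed(question), top_k=top_k)
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


# Made-up sample items standing in for a live news feed.
news_stream = [
    ("n1", "<p>Central bank raises interest rates by 25 basis points</p>"),
    ("n2", "Tech   stocks rally after strong earnings"),
    ("n3", "Oil prices fall as supply increases"),
]

index = InMemoryIndex()
ingest(news_stream, index)
prompt = build_prompt("What happened to interest rates?", index, top_k=1)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. In a real deployment the `ingest` loop would run continuously inside the streaming framework, so questions asked later are answered against whatever has been indexed up to that moment.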

There are many techniques, projects, libraries, and frameworks for RAG.

// in progress…

Written by sbagency
