The key points of RAG:
- RAG (Retrieval-Augmented Generation) is a technique for improving LLM answers by retrieving relevant context, typically from a vector database, and adding it to the prompt.
- To keep that context current, the vector DB must be updated continuously with new data; a streaming pipeline handles this.
- The streaming pipeline has two components:
   1. A streaming framework like Bytewax to ingest and process data in real time.
   2. A vector database like Qdrant to store the embedded document vectors.
- For a financial news app, the pipeline would ingest news via REST APIs and WebSockets, then clean, embed, and index each article in the vector DB.
- When users ask questions, RAG can leverage the up-to-date vector DB to retrieve the most relevant, recent news articles as context.
- Using Bytewax and Qdrant keeps this pipeline simple to build and maintain: both can be driven from Python, so ingestion, processing, and indexing live in one codebase.
- Overall, this streaming pipeline lets RAG draw on the latest data, which reduces (but does not eliminate) hallucinations and the need for frequent fine-tuning.
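
The ingest, clean, embed, and index flow described for the financial news app can be sketched in plain Python. This is a minimal, framework-free sketch, not Bytewax or Qdrant code: the loop over `events` stands in for a streaming dataflow, `InMemoryIndex` is a hypothetical stand-in for a vector DB collection, and `embed` is a toy bag-of-words function where a real pipeline would call an embedding model. `NewsEvent`, `clean`, and `run_pipeline` are illustrative names, not part of either library.

```python
import html
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable


@dataclass
class NewsEvent:
    """One raw item, as it might arrive from a REST poll or a WebSocket message."""
    id: str
    headline: str
    body: str
    published_at: datetime


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a vector DB collection (e.g. one in Qdrant)."""
    points: dict = field(default_factory=dict)

    def upsert(self, point_id: str, vector: Counter, payload: dict) -> None:
        # Keyed by id: a re-delivered or edited article overwrites the old point.
        self.points[point_id] = (vector, payload)


TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean(raw: str) -> str:
    """Strip HTML tags, decode entities such as &amp;, and collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return SPACE_RE.sub(" ", text).strip()


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def run_pipeline(events: Iterable[NewsEvent], index: InMemoryIndex) -> int:
    """Clean, embed, and index events one at a time, as a streaming job would."""
    processed = 0
    for event in events:
        text = clean(f"{event.headline} {event.body}")
        if not text:
            continue  # nothing left to embed after cleaning
        payload = {"text": text, "published_at": event.published_at}
        index.upsert(event.id, embed(text), payload)
        processed += 1
    return processed


# Usage: a short simulated stream; the third event is an update to article "a1".
stream = iter([
    NewsEvent("a1", "<b>Fed holds rates</b>", "The Fed kept rates unchanged.", datetime(2024, 1, 31)),
    NewsEvent("a2", "Stocks rally", "The S&amp;P 500 closed higher.", datetime(2024, 1, 31)),
    NewsEvent("a1", "<b>Fed holds rates</b>", "Updated: the Fed kept rates unchanged.", datetime(2024, 1, 31)),
])
index = InMemoryIndex()
processed = run_pipeline(stream, index)
```

Keying each point by the article id makes the index idempotent: when the same article is re-delivered or edited, the new version replaces the old one instead of creating a duplicate, which matters for a pipeline that runs continuously.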
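
On the query side, RAG embeds the user's question, retrieves the most similar recent articles, and adds them to the prompt. The sketch below shows that retrieval and augmentation step under simplifying assumptions: an in-memory list of documents with precomputed toy bag-of-words vectors stands in for the vector DB, `cosine` does brute-force similarity, and the seven-day `max_age` recency window is an arbitrary illustrative choice. `retrieve`, `build_prompt`, and `make_doc` are hypothetical helpers, not a library API.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from math import sqrt


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def make_doc(text: str, published_at: datetime) -> dict:
    """Hypothetical indexed document: the vector is computed once, at index time."""
    return {"text": text, "published_at": published_at, "vector": embed(text)}


def retrieve(question: str, docs: list[dict], now: datetime,
             max_age: timedelta = timedelta(days=7), k: int = 2) -> list[dict]:
    """Return the k documents most similar to the question, ignoring stale ones."""
    query = embed(question)
    # Recency filter first, then rank the survivors by similarity to the question.
    recent = [d for d in docs if now - d["published_at"] <= max_age]
    recent.sort(key=lambda d: cosine(query, d["vector"]), reverse=True)
    return recent[:k]


def build_prompt(question: str, context_docs: list[dict]) -> str:
    """Augment the question with the retrieved articles before sending it to the LLM."""
    context = "\n".join(f"- {d['text']}" for d in context_docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Usage: the 2023 article is outside the recency window and never reaches the prompt.
docs = [
    make_doc("Fed holds interest rates steady", datetime(2024, 1, 31)),
    make_doc("Fed raises interest rates", datetime(2023, 6, 1)),
    make_doc("Tech stocks rally on earnings", datetime(2024, 1, 30)),
]
question = "What did the Fed do with interest rates?"
hits = retrieve(question, docs, now=datetime(2024, 2, 1), k=1)
prompt = build_prompt(question, hits)
```

A vector DB such as Qdrant would typically run the similarity search and a filter on the stored publication date in a single query, rather than scanning every document in Python as this sketch does.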
There are many techniques, projects, libraries, and frameworks for RAG.
// in progress…