Retrieval-augmented generation with graphs // no chunking
The Writer platform is built on graph-based RAG, so the answers from apps people build with us are accurate, insightful, and useful.
Fact 1: When you “chunk” data to be fed into an embeddings model, you flatten out all of the ideas.
Is a chunk a sentence that got broken in two? Did a piece of information get pulled out of a table? Is one sentence nested under another, or is it a piece of information that stands on its own? That context is lost as soon as you’ve chunked. And fundamentally, because the LLM treats every chunk as a separate, standalone idea, retrieval breaks context.
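To make that concrete, here’s a toy sketch of fixed-size chunking; the passage and the chunker are mine, invented just to show how a table row and the condition that qualifies it end up in different chunks:

```python
# A minimal sketch (not anyone's production pipeline) of why fixed-size chunking flattens structure.
# The passage nests a condition under a product and includes a table row; after chunking,
# no single chunk carries the full idea on its own.

passage = (
    "iPhone 15 Pro accessories:\n"
    "| Accessory      | Price | Promotion        |\n"
    "| MagSafe Duo    | $129  | 20% off thru Jun |\n"
    "Note: the promotion only applies when bundled with AppleCare+."
)

def fixed_size_chunks(text: str, size: int = 80) -> list[str]:
    """Naive fixed-size chunking: split every `size` characters, ignoring structure."""
    return [text[i : i + size] for i in range(0, len(text), size)]

for i, chunk in enumerate(fixed_size_chunks(passage)):
    print(f"chunk {i}: {chunk!r}")

# The table header lands in one chunk, the row and the bundling condition in others,
# so no single retrieved chunk can answer "which accessories are on promotion, and under what terms?"
```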
Fact 2: The LLM doesn’t know how many chunks to grab when it’s trying to answer a question.
Sometimes it’ll grab too few chunks from the vector DB; sometimes too many. Vector DBs use KNN algos, which return a fixed number (K) of nearest neighbors, limiting the amount of information they can pull. If K is too low, RAG will return an answer that’s incomplete; if it’s too high, the answer will include information that’s only tangentially related.
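Here’s a toy version of that problem, with made-up passages and similarity scores; the point is that K gets picked before you know how many passages actually matter:

```python
# A toy sketch of Fact 2: top-K retrieval with a fixed K. Passages and scores are invented.
from heapq import nlargest

# (similarity to the query "What iPhone offers do we have?", passage)
scored_passages = [
    (0.91, "iPhone 15: $100 off with trade-in"),
    (0.89, "iPhone 14: free case through June"),
    (0.87, "iPhone SE: student discount available"),
    (0.52, "Apple Watch band sizing guide"),
    (0.49, "MacBook keyboard replacement policy"),
]

def retrieve(k: int) -> list[str]:
    """Return the K highest-scoring passages, regardless of how many are actually relevant."""
    return [text for _, text in nlargest(k, scored_passages)]

print(retrieve(2))  # misses the iPhone SE offer -> incomplete answer
print(retrieve(5))  # drags in watch bands and keyboards -> tangentially related context
```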
Why is this all problematic?
If you’re building an AI app for your employees, there’s a lot of CONTEXT that’s assumed in the types of questions they ask.
Any app you build for your teams is going to be considered DAH (dumb as hell) if people have to spell out deep context every time they ask it to do something.
The data you use for RAG is also likely to be dynamic. Whether it’s structured or unstructured, there’s a bunch of data that changes often, which means you gotta pull the old chunks and replace them with re-chunked passages every time your data gets updated.
Add RBAC requirements to this, and it’s no wonder so few enterprises have stuff in production that lives up to the expectations people have for generative AI.
What’s the answer?
At Writer, we use LLMs to build AI knowledge graphs of your data before doing anything else.
AI graphs and traditional RAG (love that a 3-year-old technique is “traditional” now) can both answer SIMPLE questions, e.g., which iPhones cost $700?
But for two very common types of questions, multi-hop questions and list-of-properties questions, we’ve seen the graph-based approach to RAG work MUCH better.
Multi-hop variation:
Which iPhone accessories have promotions going on?
Multi-hop is when you have to retrieve information from multiple passages to answer a question, and multi-hop is literally the reason to build anything: your employees aren’t asking for a faster ctrl+F on a single document, they’re asking for real intelligence, which is why so many people are disillusioned.
List-of-properties variation:
What are all of the iPhone offers we have now?
It’s hard for traditional RAG to get this right when the data looks like it does at most companies: highly repetitive, partly out of date, and all on roughly the same topic. Generated answers are rarely exhaustive, and thus wrong.
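Here’s a minimal sketch of how a graph handles both question shapes. The entities, edge types, and offers are all invented for illustration; this is not Writer’s schema or retrieval code:

```python
# Toy graph: node -> list of (edge_type, node)
graph: dict[str, list[tuple[str, str]]] = {
    "iPhone 15":       [("HAS_ACCESSORY", "MagSafe Charger"), ("HAS_OFFER", "$100 trade-in credit")],
    "iPhone 14":       [("HAS_ACCESSORY", "Leather Case"),    ("HAS_OFFER", "Free case through June")],
    "MagSafe Charger": [("HAS_PROMOTION", "20% off this month")],
    "Leather Case":    [],
}

def neighbors(node: str, edge_type: str) -> list[str]:
    """Follow typed edges out of a node."""
    return [dst for etype, dst in graph.get(node, []) if etype == edge_type]

# Multi-hop: iPhone -> accessory -> promotion (two hops, no guessing at K)
accessory_promos = {
    acc: promo
    for phone in graph
    for acc in neighbors(phone, "HAS_ACCESSORY")
    for promo in neighbors(acc, "HAS_PROMOTION")
}
print(accessory_promos)  # {'MagSafe Charger': '20% off this month'}

# List of properties: enumerate EVERY offer edge, so the answer is exhaustive by construction
all_offers = [offer for phone in graph for offer in neighbors(phone, "HAS_OFFER")]
print(all_offers)  # ['$100 trade-in credit', 'Free case through June']
```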
In the harder question types, there’s a time/duration element too. Many use cases require dynamic upkeep, and that’s a big second reason (first = accuracy) that apps aren’t shipping.
I get why: there are dozens of vector DB companies and every DB company is adding vector storage capabilities; surely that’s the way to go??
The fact that there ARE so many vector DB companies should be a red flag: there aren’t hundreds of graph DB or relational DB companies. Once an approach scales, a few winners emerge, and that clearly hasn’t happened yet in vector DB land for generative AI. You should be asking yourself why.
First, context: The primary advantage of embeddings + vector search is that you can search for semantic similarity, not just literal keyword matches.
The reason you can search by similarity and not linguistic matches is because you use embeddings models to turn the query into a numerical representation, and do the same thing to your data: you’re comparing numbers to numbers, and thus better able to approximate conceptual meaning.
And because the embeddings representation of your data sits in multi-dimensional space, you need a vector DB, which was invented for genetic research, to store those representations.
Embeddings capture semantic similarity between your data and a **query**, but *do not* also store or connect the relationships *between* data in said multi-dimensional space.
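For anyone who hasn’t watched the math happen, here’s a stripped-down sketch of that comparison; the `embed` function and its three-number vectors are stand-ins for a real embeddings model, not any particular API:

```python
# Embeddings-based retrieval in miniature. `embed` fakes an embeddings model with tiny
# hand-written vectors (real models emit hundreds or thousands of dimensions). Note that
# nothing here records how the passages relate to each other -- only how close each one
# is to the query.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embeddings call; swap in your actual model/API."""
    fake = {
        "Which iPhones cost $700?":     [0.9, 0.1, 0.0],
        "iPhone 14 retails for $699":   [0.8, 0.2, 0.1],
        "iPhone 15 Pro starts at $999": [0.7, 0.3, 0.1],
        "AirPods Pro case replacement": [0.1, 0.1, 0.9],
    }
    return np.array(fake[text])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "Which iPhones cost $700?"
passages = ["iPhone 14 retails for $699", "iPhone 15 Pro starts at $999", "AirPods Pro case replacement"]
ranked = sorted(passages, key=lambda p: cosine(embed(query), embed(p)), reverse=True)
print(ranked)  # semantic match ranks first, even though "$700" never appears verbatim
```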
OK, back to my Monday post.
“May, don’t be silly, no one does fixed chunking, we solve the chunking problems you describe with hierarchical and context-aware chunking.”
Rebuttal: It’s chunks vs nodes and edges!!
A node in your graph could be a single portion of a page or table, for example, and the *relationships* between the nodes in your graph are called edges. Those nodes and edges are specific pieces of information that can provide more context to support a query than a hierarchical ranking of chunks can.
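If it helps, here’s roughly what a node and an edge might carry versus a bare chunk; the field names and schema are mine, invented for illustration, not Writer’s (or any graph DB’s) actual model:

```python
# A sketch of nodes-with-provenance plus typed edges, versus flat text chunks.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    text: str                      # a single portion of a page or table
    source: str                    # provenance: which doc, which table, which row
    metadata: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    dst: str
    relation: str                  # the typed relationship a retriever can follow

nodes = [
    Node("n1", "MagSafe Charger: 20% off through June", source="promos.xlsx, row 7"),
    Node("n2", "MagSafe Charger is an iPhone 15 accessory", source="catalog.pdf, p. 3"),
]
edges = [Edge("n2", "n1", relation="HAS_PROMOTION")]

# A retriever that lands on n2 can follow HAS_PROMOTION to n1 and cite both sources.
# A flat chunk of either sentence, ranked in isolation, can't make that connection.
```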
There’s just no comparison in knowledge management use cases between graph-based approaches and vector DB-based approaches to RAG, because of the level of DETAIL that can be drawn upon to answer multi-hop questions.
Most organizations don’t have the NLP engineering / linguistics expertise in house to build the primitives for the constant sentence-boundary recognition, PoS tagging, parsing, table logic, chart logic, etc. that’s required across hundreds of different file types to do hierarchical chunking well. On top of that, you’re rebuilding a TON of logic and doing a ton of small-LLM fine-tuning to recreate the temporal / topical / semantic / entity context you get with graph-based approaches, and all of it has to be scaled AND maintained. Add multi-modality to that, and your team is fried before they’ve scaled a single AI assistant.
Writer uses specially-trained LLMs to build your knowledge graphs, THEN does graph-based RAG.
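The general pattern looks something like the sketch below: prompt an LLM for subject-relation-object triples, then load them into the graph. The prompt, the `call_llm` stand-in, and the output format are mine, not Writer’s specially-trained models or actual pipeline:

```python
# A rough sketch of LLM-driven graph construction via triple extraction.
import json

EXTRACTION_PROMPT = """Extract knowledge-graph triples from the passage below.
Return a JSON list of [subject, relation, object] arrays and nothing else.

Passage:
{passage}
"""

def call_llm(prompt: str) -> str:
    """Stand-in for your LLM client. Here we hard-code a plausible response."""
    return ('[["MagSafe Charger", "ACCESSORY_OF", "iPhone 15"], '
            '["MagSafe Charger", "HAS_PROMOTION", "20% off through June"]]')

def extract_triples(passage: str) -> list[tuple[str, str, str]]:
    """Ask the model for triples and parse them into (subject, relation, object) tuples."""
    raw = call_llm(EXTRACTION_PROMPT.format(passage=passage))
    return [tuple(triple) for triple in json.loads(raw)]

triples = extract_triples("The MagSafe Charger, an iPhone 15 accessory, is 20% off through June.")
for subject, relation, obj in triples:
    print(subject, "-[", relation, "]->", obj)
```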
imo the combination of AI knowledge graphs and LLMs **is** basically gonna be AGI in the enterprise. Especially in enterprise contexts, nodes and edges are your alpha.
We’re not anti-vector DB at Writer, but there’s no question that a knowledge graph-based approach helps enterprises build more intelligent applications in many, many settings.
I wish more folks knew about the graph-based approach — wasting time building apps that are not that smart is the best way to burn through AI budget — and appetite.
Thanks to the always-prescient Richard MacManus for taking some time to learn about our approach.
Because the Writer platform is built on graph-based RAG, answers from the apps people build with us are accurate, insightful, and useful, and the adoption and daily use follow.
That’s the table stakes for getting to PRODUCTION, and the only way to get ROI on these (sometimes massive) AI investments people are making.
You gotta build really intelligent apps people are using all the time.
Here is a summary of the key points from the article:
- Retrieval-augmented generation (RAG) is commonly used to integrate large language models (LLMs) with external data sources. RAG often relies on vector databases.
- Writer.com advocates for a “graph-based” RAG approach using knowledge graphs and graph databases instead of vector databases. They claim this achieves higher accuracy.
- Writer uses its own LLMs to add a “metadata layer” and map semantic relationships between data points when pre-processing data. This builds a knowledge graph.
- Writer’s CEO argues traditional RAG with vector databases loses context when chunking data. Their approach maintains more relationships between data.
- Writer’s knowledge graphs are meant for machine consumption, not human use like traditional ontologies. But they could complement existing organizational knowledge graphs.
- Potential use cases include improving workflows in insurance, wealth management, CPG, retail, etc. Writer offers industry-specific “solution maps”.
- It’s unclear if Writer’s approach will gain traction over traditional RAG with vector databases. But it differentiates their offering and could interest graph database vendors.