The most obvious examples of systems that have implemented scalable solutions to this problem are search engines like Google, Bing, Apache Lucene, Apple Spotlight, and many others. As an industry, we have already spent decades building and iterating on highly scalable, highly available search technologies based on inverted indexes.
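To make the idea concrete, here is a minimal sketch of the data structure these engines are built on. This is a hypothetical toy example (the function names and documents are mine, not from any real engine): an inverted index maps each term to the set of documents containing it, and a keyword query is answered by intersecting those sets.

```python
from collections import defaultdict

def build_index(docs):
    # Inverted index: term -> set of IDs of documents containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    # AND-semantics keyword search: documents containing every query term.
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "vector databases are trendy",
    2: "keyword search uses inverted indexes",
    3: "search engines scale inverted indexes",
}
index = build_index(docs)
print(sorted(search(index, "inverted indexes")))  # [2, 3]
```

Real engines add tokenization, stemming, relevance ranking (e.g. BM25), and sharding on top, but the core lookup structure is this simple, which is a large part of why it scales so well.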
Good old keyword search works perfectly well for RAG, but it isn't as fashionable as embeddings, vector databases, and the rest of the modern stack.
There is an overabundance of vector databases, with more than 20 on the market. As an example, LangChain offers 60 different vector store integrations to pick from!
Vector databases come with hidden costs.
Vector search is an optimization.
Keyword search cannot be replaced with vector search — they are different capabilities.
Vector search is inherently limited compared to generative LLMs.
Some engineers working on novel projects (e.g. AutoGPT, which builds autonomous agents) are ditching vector databases.
Smart autonomous agents will limit the need for smart indexes, a.k.a. vector search.