Teaching AI to look things up before answering
LLMs are trained on static data - they don't know what happened yesterday, can't read your company docs, and sometimes confidently make things up. Retrieval-Augmented Generation (RAG) solves this by giving the model a library card: before answering, it searches a knowledge base for relevant information.
Think of it like an open-book exam. Instead of relying purely on memory (the model's training data), RAG lets the AI look up the answer in real documents - then synthesize a response grounded in actual facts.
PDFs, web pages, and docs are split into small chunks (typically 200-500 tokens). Each chunk becomes a searchable unit of knowledge.
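Chunking can be as simple as a sliding window. A minimal sketch, assuming sizes measured in words as a rough proxy for tokens (production pipelines count real tokenizer tokens):

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into overlapping fixed-size chunks.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks retrieve more precisely; larger chunks carry more context. The 200-500 token range above is the usual compromise.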
Each chunk is converted into a vector - a list of numbers that captures its meaning. Similar concepts end up close together in vector space.
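"Close together in vector space" is usually measured with cosine similarity. A stdlib-only illustration with made-up 3-number "embeddings" (real models emit hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, hand-picked so related concepts point the same way.
cat    = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
stock  = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # high - related concepts
print(cosine_similarity(cat, stock))   # low - unrelated concepts
```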
Vectors are stored in a vector database with efficient similarity search indices (HNSW, IVF). This enables millisecond retrieval from millions of documents.
When a user asks a question, the query is also converted to a vector. The system finds the most semantically similar chunks.
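Conceptually the search is just "score every chunk, keep the best k." A brute-force sketch (a real vector database swaps the linear scan for an approximate index like HNSW or IVF to stay fast at millions of vectors):

```python
import heapq

def top_k(query_vec, index, k=3):
    """Return the k chunk ids whose vectors are most similar to query_vec.

    `index` maps chunk_id -> embedding vector. Linear scan for clarity;
    production systems use approximate nearest-neighbor indexes instead.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    return heapq.nlargest(k, index, key=lambda cid: cos(query_vec, index[cid]))
```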
Top-k retrieved chunks are assembled into a prompt alongside the user's question. The LLM now has relevant context to work with.
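Prompt assembly is string formatting; a common (but not universal) pattern is to number each chunk so the model can cite sources. A minimal sketch with a hypothetical template:

```python
def build_prompt(question, chunks):
    """Assemble retrieved chunks plus the user's question into one prompt.

    Numbering the chunks lets the model cite them as [1], [2], ...
    The exact wording of the template varies by system.
    """
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources like [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```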
The LLM generates an answer grounded in the retrieved documents, with citations pointing back to source material.
Document loading: LangChain, LlamaIndex, Unstructured - parse any format
Chunking: fixed-size, semantic, recursive - how to split documents
Embedding models: OpenAI ada-002, Cohere embed, BGE, Voyage AI
Vector databases: Pinecone, pgvector, Weaviate, Qdrant, ChromaDB
Retrieval: hybrid search, re-ranking (Cohere, ColBERT), query decomposition
Generation: Claude, GPT-4, Gemini - any LLM with good instruction following
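One common way hybrid search merges keyword and vector results is reciprocal rank fusion (RRF). A stdlib sketch with hypothetical result lists (the doc ids and rankings below are invented for illustration):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) across the lists it
    appears in, so items ranked well by multiple retrievers rise
    to the top. k=60 is a commonly used constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a keyword (BM25) and a vector retriever.
keyword = ["doc3", "doc1", "doc7"]
vector  = ["doc1", "doc5", "doc3"]
print(reciprocal_rank_fusion([keyword, vector]))
```

doc1 wins here because both retrievers rank it highly, even though neither put it first - exactly the behavior hybrid search is after.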
Real-time web search + RAG = AI-powered answer engine
RAG over your workspace - search across all your team's docs
Retrieves relevant code files as context for suggestions
Enterprise search across Slack, Drive, Confluence with AI answers
Key Takeaway
RAG transforms LLMs from closed-book test-takers into open-book researchers. The quality of your retrieval directly determines the quality of your AI's answers.