Pramod.AI
Deep Dive

Retrieval-Augmented Generation

Teaching AI to look things up before answering

LLMs are trained on a static snapshot of data - they don't know what happened yesterday, can't read your company's private docs, and sometimes confidently make things up. RAG solves this by giving the model a library card: before answering, it searches a knowledge base for relevant information.

Think of it like an open-book exam. Instead of relying purely on memory (the model's training data), RAG lets the AI look up the answer in real documents - then synthesize a response grounded in actual facts.

[Diagram] Retrieval-Augmented Generation: User Query → Embedding → Vector Search → Retrieved Chunks → LLM + Context → Answer. Ground LLM responses in retrieved knowledge to reduce hallucination.

How It Works

1

Document Ingestion

PDFs, web pages, and docs are split into small chunks (typically 200-500 tokens). Each chunk becomes a searchable unit of knowledge.
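A fixed-size chunker with overlap is the simplest starting point. The sketch below splits on words as a stand-in for tokens - a real pipeline would count tokens with the embedding model's tokenizer - and `chunk_text`, `chunk_size`, and `overlap` are illustrative names, not any particular library's API:

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Words stand in for tokens here; production code would count real tokens.
    The overlap keeps sentences that straddle a chunk boundary retrievable
    from either side.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

A 700-word document with these defaults yields three chunks, each sharing 50 words with its neighbor.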

2

Embedding

Each chunk is converted into a vector - a list of numbers that captures its meaning. Similar concepts end up close together in vector space.
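"Close together in vector space" is usually measured with cosine similarity. A minimal version - no external libraries, where real systems would use numpy or the vector store's built-in scoring - with made-up 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = unrelated (orthogonal), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings"; real models produce hundreds or thousands of dimensions.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
pizza = [0.1, 0.05, 0.95]
```

Here `cosine_similarity(king, queen)` lands near 1.0 while `cosine_similarity(king, pizza)` is much lower - that gap is what retrieval exploits.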

3

Indexing

Vectors are stored in a vector database with efficient similarity search indices (HNSW, IVF). This enables millisecond retrieval from millions of documents.
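Under the hood a vector store maps IDs to vectors and answers nearest-neighbor queries. The sketch below does an exact brute-force scan - HNSW and IVF exist precisely because a linear scan doesn't scale to millions of vectors. `VectorIndex` is an illustrative class, not a real library's API:

```python
import heapq
import math

class VectorIndex:
    """Exact nearest-neighbor index. Real stores replace the linear scan
    with approximate structures like HNSW graphs or IVF partitions."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        """Return the k most similar (score, doc_id) pairs, best first."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cosine(query, vec), doc_id) for doc_id, vec in self._items]
        return heapq.nlargest(k, scored)
```

The approximate indices trade a little recall for orders-of-magnitude faster search - which is why "millisecond retrieval from millions of documents" is possible at all.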

4

Query Processing

When a user asks a question, the query is also converted to a vector. The system finds the most semantically similar chunks.
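Step 4 in miniature: embed the query with the same model used for the chunks, then rank by similarity. The bag-of-words "embedding" below is purely illustrative - a real system would call the same embedding model used at ingestion:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (word -> count). Illustration only:
    real embeddings are dense vectors from a trained model."""
    return Counter(w.lower().strip(".,?!") for w in text.split())

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity over sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed the query, then return the k most similar chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]
```

The key property - shared with real embeddings - is that query and chunks live in the same space, so one similarity function ranks everything.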

5

Context Assembly

Top-k retrieved chunks are assembled into a prompt alongside the user's question. The LLM now has relevant context to work with.
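Context assembly is ordinary string templating. A minimal sketch - the numbering lets the model cite chunks as [1], [2], and the exact instructions are a design choice, not a fixed API:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Pack the top-k retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below. "
        "Cite supporting chunks as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The "say so if insufficient" instruction matters: it gives the model an explicit alternative to inventing an answer when retrieval misses.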

6

Generation

The LLM generates an answer grounded in the retrieved documents, with citations pointing back to source material.
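Wired together, the six steps fit in a page of code. Everything below - the toy embedding, the in-memory index, the prompt wording - is illustrative; `llm` is any callable that takes a prompt string and returns text, such as a thin wrapper around whichever model API you use:

```python
import math
from collections import Counter

def toy_rag(documents: list[str], question: str, llm,
            chunk_size: int = 30, k: int = 2) -> str:
    """End-to-end sketch of the six RAG steps with toy components."""
    # 1. Ingestion: fixed-size word chunks.
    chunks = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))

    # 2. Embedding: bag-of-words counts stand in for a learned model.
    def embed(text):
        return Counter(w.lower().strip(".,?!") for w in text.split())

    # 3. Indexing: an in-memory list instead of a vector database.
    index = [(embed(c), c) for c in chunks]

    # 4. Query processing: embed the question, rank by cosine similarity.
    def cos(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = embed(question)
    top = sorted(index, key=lambda item: cos(q, item[0]), reverse=True)[:k]

    # 5. Context assembly: numbered chunks so the answer can cite sources.
    context = "\n".join(f"[{i + 1}] {c}" for i, (_, c) in enumerate(top))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations:"

    # 6. Generation: delegate to the supplied LLM callable.
    return llm(prompt)
```

Swapping each toy piece for its production counterpart - a tokenizer-aware chunker, a real embedding model, a vector database, a hosted LLM - turns this sketch into the architecture described above.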

Key Components

Document Loaders

LangChain, LlamaIndex, Unstructured - parse PDFs, HTML, Office files, and more into plain text

Chunking Strategies

Fixed-size, semantic, recursive - how to split documents

Embedding Models

OpenAI text-embedding-ada-002, Cohere Embed, BGE, Voyage AI

Vector Stores

Pinecone, pgvector, Weaviate, Qdrant, ChromaDB

Retrievers

Hybrid search, re-ranking (Cohere, ColBERT), query decomposition

Generators

Claude, GPT-4, Gemini - any LLM with good instruction following
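Hybrid search typically runs a keyword ranking (such as BM25) and a vector ranking in parallel, then merges the two lists. Reciprocal rank fusion is a common, very simple merge step; the sketch below assumes each ranking is just an ordered list of document IDs, best first:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document earns 1/(k + rank) per list,
    so items ranked highly by multiple retrievers rise to the top.
    k=60 is a conventional default that damps the influence of top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears mid-list in both rankings can beat one that tops a single ranking - which is exactly the behavior you want when keyword and vector search disagree.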

Who's Building With This

P

Perplexity

Real-time web search + RAG = AI-powered answer engine

N

Notion AI

RAG over your workspace - search across all your team's docs

G

GitHub Copilot

Retrieves relevant code files as context for suggestions

G

Glean

Enterprise search across Slack, Drive, Confluence with AI answers

Key Takeaway

RAG transforms LLMs from closed-book test-takers into open-book researchers. The quality of your retrieval directly determines the quality of your AI's answers.

