Deep Dive

Foundation Models

The engines of intelligence - how they differ and why it matters

Foundation models are large neural networks trained on internet-scale data that can be adapted to a wide range of downstream tasks. They're called 'foundation' because everything else - chatbots, agents, copilots - is built on top of them.

The landscape is a mix of closed-source leaders (Claude, GPT, Gemini) and open-weight challengers (Llama, Mistral, Qwen), which are often loosely called open-source. Each makes different tradeoffs in training data, architecture, safety, and capabilities.

How It Works

Data Collection

Trillions of tokens from books, web pages, code repositories, and curated datasets. Data quality is arguably more important than model size.
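To make the data-quality point concrete, here is a minimal sketch of two common cleaning steps: dropping very short documents and removing exact duplicates after light normalization. The function names and the `min_words` threshold are hypothetical, chosen only for illustration; real pipelines use far more elaborate filters.

```python
import hashlib

def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())

def clean_corpus(documents, min_words=5):
    """Keep documents that are long enough and not exact duplicates.

    min_words is an illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for doc in documents:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to be useful training text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append(doc)
    return kept

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalization
    "Click here",                                      # too short
    "Foundation models are trained on very large text corpora.",
]
cleaned = clean_corpus(docs)
```

In this toy run, only the first and last documents survive.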

Pre-training

The model learns to predict the next token. This simple objective, applied at massive scale, produces emergent capabilities - reasoning, coding, multilingual understanding.
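The next-token objective can be sketched in a few lines. The example below assumes a toy vocabulary of three tokens and hand-written logits in place of a real network; it only illustrates how the loss is computed, not how any particular lab trains its models.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits_per_position[t] holds one score per vocabulary entry for the
    token that follows token_ids[t].
    """
    targets = token_ids[1:]
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy setup (assumed for illustration): vocabulary {0, 1, 2}, sequence [0, 2, 1].
token_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # model has no idea
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]  # model favors the true next token

uniform_loss = next_token_loss(uniform_logits, token_ids)
confident_loss = next_token_loss(confident_logits, token_ids)
```

A model that guesses uniformly pays a loss of ln(3) per token here, while one that puts most of its probability on the correct next token pays much less. Pre-training is the process of driving this number down across trillions of tokens.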

Post-training

Supervised fine-tuning (SFT) on instruction-response pairs teaches the model to follow directions. This is where 'helpful assistant' behavior emerges.
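As a rough sketch of what an SFT training example can look like: the instruction and response are joined into one token sequence, and the loss is often computed only on the response tokens. The whitespace tokenizer, the `<user>`/`<assistant>`/`<eos>` markers, and the masking choice are all assumptions for illustration; actual chat templates and masking policies vary by lab.

```python
def build_sft_example(instruction, response):
    """Return (tokens, loss_mask) for one instruction-response pair.

    Whitespace splitting and the special markers are hypothetical stand-ins
    for a real tokenizer and chat template.
    """
    prompt_tokens = ["<user>"] + instruction.split() + ["<assistant>"]
    response_tokens = response.split() + ["<eos>"]
    tokens = prompt_tokens + response_tokens
    # 0 = ignore in the loss (prompt), 1 = train on this token (response).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

def masked_mean(per_token_losses, loss_mask):
    # Average only the losses at positions where the mask is 1.
    kept = [loss for loss, m in zip(per_token_losses, loss_mask) if m]
    return sum(kept) / len(kept)

tokens, mask = build_sft_example("Translate 'hello' to French", "bonjour")
# Made-up per-token losses: high on the prompt, lower on the response.
example_loss = masked_mean([9.0] * 6 + [1.0, 3.0], mask)
```

Because the prompt positions are masked out, the model is rewarded for producing the response rather than for reproducing the instruction.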

Alignment

RLHF (Reinforcement Learning from Human Feedback) or Constitutional AI refines the model to be helpful, harmless, and honest. Different labs use different approaches.
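One widely used ingredient of RLHF is a reward model trained on human preference pairs with a pairwise (Bradley-Terry style) loss. The sketch below shows only that loss, with scalar rewards supplied by hand; it is an illustration under those assumptions, not a description of any specific lab's pipeline, and it does not cover the reinforcement learning step or Constitutional AI.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Low when the reward model scores the human-preferred response higher.
    """
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hand-picked rewards, for illustration only.
agrees_with_humans = preference_loss(2.0, 0.0)     # preferred answer scored higher
undecided = preference_loss(0.0, 0.0)              # both answers scored the same
disagrees_with_humans = preference_loss(0.0, 2.0)  # preferred answer scored lower
```

The loss grows as the reward model's ranking moves away from the human ranking, which is what pushes it to encode the annotators' preferences.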

Evaluation

Models are tested on benchmarks (MMLU, HumanEval, MATH), arena rankings (Chatbot Arena), and real-world tasks. No single metric captures everything.
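As one concrete example of a benchmark metric, code benchmarks such as HumanEval are typically reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. A minimal sketch of the standard unbiased estimator follows, assuming n samples were generated per problem and c of them passed; the sample counts below are made up.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k from n samples of which c passed the tests.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    size-k subset of the n samples contains no passing sample.
    """
    if n - c < k:
        return 1.0  # every size-k subset must include a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 10 samples for one problem, 3 passed.
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)
```

The same model looks very different at k=1 and k=5, which is one reason a single headline number rarely tells the whole story.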

Deployment

Models are served through pay-per-token APIs, self-hosted from open weights, or embedded in products. Context windows range from 8K tokens to 1M or more.
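Pay-per-token pricing and context limits both come down to simple arithmetic. The sketch below uses hypothetical prices (quoted per million tokens) and assumes that input and output tokens share one context window; check a provider's own documentation for real rates and limits.

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_mtok, output_price_per_mtok):
    """Cost of one API call when prices are quoted per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000

def fits_context(input_tokens, max_output_tokens, context_window):
    # Assumption: prompt and generated tokens count against the same window.
    return input_tokens + max_output_tokens <= context_window

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = request_cost_usd(200_000, 1_000, 3.0, 15.0)

# The same 7,500-token prompt against a small and a large window.
small_ok = fits_context(7_500, 1_000, 8_000)
large_ok = fits_context(7_500, 1_000, 200_000)
```

Under these assumed prices, filling a 200K window costs roughly 60 cents per call before any output is generated, so long-context usage is a budgeting decision as much as a capability one.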

Key Components

Claude 4.5/4.6 (Anthropic)

Opus 4.6 is Anthropic's most capable model. Trained with Constitutional AI, 200K context, strongest at analysis and agentic coding

GPT-5 / o3 (OpenAI)

GPT-5 for broad capability, o3 for deep reasoning chains. Multimodal, massive API ecosystem

Gemini 2.5 (Google)

Natively multimodal with 1M context, Gemini Flash for speed, Deep Research for complex tasks

Llama 4 (Meta)

Open-source leader with Scout (10M context) and Maverick models. Runs locally, massive fine-tuning ecosystem

DeepSeek R1

Chinese open-source reasoning model that rivaled o1. Showed that reasoning ability can be distilled cheaply into smaller models

Qwen 2.5 (Alibaba)

Leading multilingual open-source. Strong at math, code, and tool use. 0.5B to 72B range

Who's Building With This

Anthropic

Claude 4.5/4.6 Opus - top of Chatbot Arena. Pioneered computer use agents and Constitutional AI.

OpenAI

GPT-5, o3 reasoning, Codex agent, DALL-E, Sora. Largest developer ecosystem and API platform.

Google DeepMind

Gemini 2.5 family, AlphaFold (biology), Veo 2 (video). Research + product integration.

Meta AI

Open-source strategy with Llama 4. Democratizing AI - powering thousands of fine-tuned derivatives.

Key Takeaway

Foundation models are converging in capabilities but diverging in philosophy. The real differentiator isn't raw intelligence - it's safety, reliability, context handling, and ecosystem integration.
