Big intelligence in small packages - AI that runs anywhere
Not every AI task needs a 175-billion-parameter model. Small Language Models (SLMs) - typically 1B to 13B parameters - deliver surprisingly strong performance at a fraction of the cost, latency, and energy of their larger counterparts. They can run on phones, laptops, and edge devices.
The secret: knowledge distillation (learning from larger models), better training data curation, and architectural innovations like Mixture of Experts. A well-trained 7B model today outperforms GPT-3 (175B) from 2020.
A large 'teacher' model generates training data, and the smaller 'student' model learns to mimic its outputs. The student captures 80-90% of the teacher's capability at roughly a tenth of the cost.
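The core of this mimicry is a loss that pulls the student's output distribution toward the teacher's. Below is a minimal sketch of that objective, assuming we already have teacher and student logits for a batch; the function names and the temperature value are illustrative, not from any specific framework:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about near-miss classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy example: a student whose logits are close to the teacher's
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[3.8, 1.1, 0.4]])
print(distillation_loss(student, teacher, T=2.0))
```

In practice this term is usually blended with the ordinary next-token cross-entropy loss, so the student learns from both the teacher's soft targets and the real data.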
Training on fewer, higher-quality tokens. Microsoft's Phi showed that textbook-quality data can train a 2.7B model that rivals models 25x its size.
Reducing numerical precision from 16-bit to 8-bit or 4-bit (GGUF, GPTQ, AWQ). A 7B model shrinks from 14GB to about 4GB - small enough to fit in phone memory.
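The idea can be shown in a few lines. This is a simplified per-tensor symmetric scheme for illustration only; real formats like GGUF, GPTQ, and AWQ quantize per-group or per-channel with calibration data, which is what keeps quality high at 4 bits:

```python
import numpy as np

def quantize(weights, bits=4):
    # Map floats to signed integers using one scale for the whole tensor.
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize(w, bits=4)
w_hat = dequantize(q, scale)

# Back-of-envelope memory: 7B params * 2 bytes = 14 GB in fp16,
# versus 7B * 0.5 bytes = 3.5 GB at 4-bit (plus some format overhead)
print(f"max reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

Each weight lands within half a quantization step of its original value, which is why perplexity degrades only slightly while memory drops 4x.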
Grouped Query Attention, SwiGLU, RoPE - architectural choices that maximize performance per parameter. Mixture of Experts activates only relevant subnetworks.
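Mixture of Experts routing can be sketched briefly. This is a toy single-token version with made-up shapes and no load balancing, just to show why only a fraction of parameters are active per step:

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    # A gating network scores every expert, but only the top-k experts
    # actually run - the rest of the parameters stay inactive this step.
    scores = x @ gate_weights                    # one score per expert
    top = np.argsort(scores)[-top_k:]            # indices of the best experts
    g = np.exp(scores[top])
    g = g / g.sum()                              # softmax over selected experts
    # Weighted sum of the selected experts' outputs
    return sum(w * (x @ expert_weights[i]) for w, i in zip(g, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
x = rng.standard_normal(d)
experts = rng.standard_normal((num_experts, d, d))
gates = rng.standard_normal((d, num_experts))

y = moe_forward(x, experts, gates, top_k=2)  # only 2 of 4 experts computed
```

With top_k=2 of 4 experts, roughly half the expert parameters are touched per token, so a model can hold many more parameters than it spends compute on.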
Frameworks like llama.cpp, MLX (Apple), ONNX Runtime, and MediaPipe run models directly on consumer hardware - no cloud needed.
2.7B-14B params, trained on 'textbook quality' data, punches above its weight
2B-27B open models, optimized for on-device, strong safety training
1B-3B lightweight models for mobile and edge deployment
The model that proved open-source 7B can compete with proprietary 30B+
0.5B-72B range, strong multilingual, excellent at math and code
On-device models for summarization, writing, Siri - privacy by design
On-device AI for iOS/macOS - summarize, rewrite, Siri, all private
Phi models prove small can be mighty. Powers Copilot on-device features.
Run any open model locally with one command. 1M+ developers.
NPUs in mobile chips - dedicated silicon for on-device AI inference
Key Takeaway
The future isn't just bigger models - it's the right-sized model for each task. SLMs enable AI everywhere: offline, private, instant, and cheap. The 7B model on your phone today is smarter than the 175B model of 2020.