🚧 ⚠️ Under Construction - Active Development 🔨 🚧
TinyTorch is under active construction! We're building in public and sharing our progress for early feedback. Expect frequent updates, changes, and improvements as we develop the framework together with the community.

TinyTorch: Build ML Systems from Scratch#

Don't just import it. Build it.

What is TinyTorch?#

TinyTorch is an educational ML systems course where you build complete neural networks from scratch. Instead of blindly using PyTorch or TensorFlow as black boxes, you implement every component yourself—from tensors and gradients to optimizers and attention mechanisms—gaining deep understanding of how modern ML frameworks actually work.

Core Learning Approach: Build → Profile → Optimize. You’ll implement each system component, measure its performance characteristics, and understand the engineering trade-offs that shape production ML systems.
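
In practice, the Profile step can be as simple as timing a kernel and counting bytes. Here is a minimal sketch using NumPy and the standard library as stand-ins for your own implementations (the shapes and names are illustrative, not part of TinyTorch's API):

import time
import numpy as np

# Profile one layer's matmul: measure latency, estimate throughput and activation memory.
batch, in_features, out_features = 64, 784, 256
x = np.random.randn(batch, in_features).astype(np.float32)
w = np.random.randn(in_features, out_features).astype(np.float32)

start = time.perf_counter()
y = x @ w
elapsed = time.perf_counter() - start

flops = 2 * batch * in_features * out_features   # one multiply-add per weight per sample
activation_bytes = y.size * y.itemsize           # memory held by the output alone
print(f"latency: {elapsed * 1e3:.3f} ms, "
      f"~{flops / elapsed / 1e9:.1f} GFLOP/s, "
      f"activations: {activation_bytes / 1024:.0f} KiB")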

The ML Evolution Story You’ll Experience#

Journey through 40+ years of ML breakthroughs by building each era yourself: 1980s neural foundations → 1990s backpropagation → 2012 CNN revolution → 2017 transformer unification → 2024 production optimization. Each module teaches both the breakthrough AND the systems engineering that made it possible.

📖 See Complete ML Evolution Timeline for the full historical context and technical progression.

🏆 Prove Your Mastery Through History#

As you complete modules, unlock historical milestone demonstrations that prove what you’ve built works! From Rosenblatt’s 1957 perceptron to modern CNNs achieving 75%+ accuracy on CIFAR-10, each milestone recreates a breakthrough using YOUR implementations:

  • 🧠 1957: Perceptron - First trainable network with YOUR Linear layer

  • ⚡ 1969: XOR Solution - Multi-layer networks with YOUR autograd

  • 🔢 1986: MNIST MLP - Backpropagation achieving 95%+ accuracy with YOUR optimizers

  • 🖼️ 1998: CIFAR-10 CNN - Spatial intelligence with YOUR Conv2d (75%+ accuracy!)

  • 🤖 2017: Transformers - Language generation with YOUR attention

  • ⚡ 2024: Systems Age - Production optimization with YOUR profiling

📖 See Journey Through ML History for complete milestone details and requirements.

Why Build Instead of Use?#

The difference between using a library and understanding a system is the difference between being limited by tools and being empowered to create them. When you build from scratch, you transform from a framework user into a systems engineer:

❌ Using PyTorch

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Your model trains but then...
# 🔥 OOM error! Why?
# 🔥 Loss is NaN! How to debug?
# 🔥 Training is slow! What's the bottleneck?

You're stuck when things break

❌ Using TensorFlow

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

# Magic happens somewhere...
# 🤷 How are gradients computed?
# 🤷 Why this initialization?
# 🤷 What's happening in backward pass?

Magic boxes you can't understand

✅ Building TinyTorch

class Linear:
    def __init__(self, in_features, out_features):
        self.weight = randn(in_features, out_features) * 0.01
        self.bias = zeros(out_features)

    def forward(self, x):
        self.input = x  # Save for backward
        return x @ self.weight + self.bias

    def backward(self, grad):
        # You wrote this! You know exactly why:
        self.weight.grad = self.input.T @ grad
        self.bias.grad = grad.sum(axis=0)
        return grad @ self.weight.T

You can debug anything
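
One payoff of writing backward yourself: you can check it numerically whenever you suspect a bug. A minimal gradient check, sketched here with plain NumPy (the shapes are arbitrary and nothing below is TinyTorch API), verifies the weight-gradient formula from the class above against finite differences:

import numpy as np

# Compare the analytic formula (input.T @ grad) against a finite-difference estimate.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 2)) * 0.01
b = np.zeros(2)

def forward(w):
    return x @ w + b

grad_out = np.ones((4, 2))        # pretend upstream gradient (d loss / d output)
analytic = x.T @ grad_out         # the formula backward() computes

eps = 1e-6
numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        numeric[i, j] = (forward(w_plus).sum() - forward(w_minus).sum()) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))   # True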

✅ Building KV Cache

class KVCache:
    def __init__(self, max_seq_len, n_heads, head_dim):
        # You understand EXACTLY the memory layout:
        self.k_cache = zeros(max_seq_len, n_heads, head_dim)
        self.v_cache = zeros(max_seq_len, n_heads, head_dim)
        # That's why GPT needs GBs of RAM!

    def update(self, k, v, pos):
        # You know why position matters:
        self.k_cache[pos:pos+len(k)] = k  # Reuse past computations
        self.v_cache[pos:pos+len(v)] = v  # O(n²) → O(n) speedup!
        # Now you understand why context windows are limited

You master modern LLM optimizations
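
To make the speedup concrete, here is a toy decoding loop in plain NumPy (shapes and names are illustrative, not TinyTorch API): each step computes k and v for one new token, writes them into the cache, and attends over everything cached so far instead of recomputing the whole history:

import numpy as np

max_seq_len, n_heads, head_dim = 8, 2, 4
k_cache = np.zeros((max_seq_len, n_heads, head_dim))
v_cache = np.zeros((max_seq_len, n_heads, head_dim))

rng = np.random.default_rng(0)
for pos in range(5):                                       # generate 5 tokens
    k_new = rng.standard_normal((1, n_heads, head_dim))    # k/v for the NEW token only
    v_new = rng.standard_normal((1, n_heads, head_dim))
    k_cache[pos:pos + 1] = k_new                           # update(), as in the card above
    v_cache[pos:pos + 1] = v_new

    q = rng.standard_normal((n_heads, head_dim))           # query for the new token
    # One query against pos+1 cached keys: O(n) work per step instead of O(n^2).
    scores = np.einsum('hd,thd->ht', q, k_cache[:pos + 1]) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    context = np.einsum('ht,thd->hd', weights, v_cache[:pos + 1])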

Who Is This For?#

Perfect if you’re asking these questions:

ML Systems Engineers: “Why does my model training OOM at batch size 32? How do attention mechanisms scale quadratically with sequence length? When does data loading become the bottleneck?” You’ll build and profile every component, understanding memory hierarchies, computational complexity, and system bottlenecks that production ML systems face daily.
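
The quadratic-scaling question has a concrete back-of-the-envelope answer. A quick sketch (the head count and dtype are illustrative assumptions) shows why long contexts get expensive: the attention score matrix alone grows with the square of sequence length:

bytes_per_float, n_heads = 4, 12
for seq_len in (512, 2048, 8192):
    score_bytes = n_heads * seq_len * seq_len * bytes_per_float
    print(f"seq_len={seq_len:>5}: {score_bytes / 2**20:>6,.0f} MiB of attention scores per layer")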

Students & Researchers: “How does that nn.Linear() call actually compute gradients? Why does Adam optimizer need 3× the memory of SGD? What’s actually happening during a forward pass?” You’ll implement the mathematics you learned in class and discover how theoretical concepts become practical systems with real performance implications.
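
The 3× figure, for example, falls directly out of the optimizer's state. A minimal Adam sketch in NumPy (sizes are illustrative) makes the two extra moment buffers visible: for every parameter tensor, Adam keeps a same-shaped first and second moment, while plain SGD keeps nothing extra:

import numpy as np

params = np.zeros(1_000_000, dtype=np.float32)   # 4 MB of weights
m = np.zeros_like(params)                        # first moment   (+4 MB)
v = np.zeros_like(params)                        # second moment  (+4 MB)

def adam_step(params, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m[:] = b1 * m + (1 - b1) * grad              # running mean of gradients
    v[:] = b2 * v + (1 - b2) * grad ** 2         # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)                    # bias correction
    v_hat = v / (1 - b2 ** t)
    params -= lr * m_hat / (np.sqrt(v_hat) + eps)

adam_step(params, grad=np.ones_like(params), m=m, v=v, t=1)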

Performance Engineers: “Where are the actual bottlenecks in transformer inference? How does KV-cache reduce computation by 10-100×? Why does my CNN use 4GB of memory?” By building these systems from scratch, you’ll understand memory access patterns, cache efficiency, and optimization opportunities that profilers alone can’t teach.
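
The CNN memory question, for instance, usually comes down to activations rather than weights. A quick estimate with illustrative layer sizes (not any specific model) shows the imbalance:

batch, channels, height, width = 64, 64, 224, 224
activation_bytes = batch * channels * height * width * 4   # one float32 feature map
weight_bytes = 64 * 64 * 3 * 3 * 4                         # one 3x3 conv's weights
print(f"one layer's activations: {activation_bytes / 2**30:.2f} GiB "
      f"vs its weights: {weight_bytes / 2**10:.0f} KiB")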

Academics & Educators: “How can I teach ML systems—not just ML algorithms?” TinyTorch provides a complete pedagogical framework emphasizing systems thinking: memory profiling, performance analysis, and scaling behavior are built into every module, not added as an afterthought.

ML Practitioners: “Why does training slow down after epoch 10? How do I debug gradient explosions? When should I use mixed precision?” Even experienced engineers often treat frameworks as black boxes. By understanding the systems underneath, you’ll debug faster, optimize better, and make informed architectural decisions.
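
Once nothing is a black box, instrumentation like this becomes natural. Here is a small sketch (plain NumPy, hypothetical helper names) of the kind of gradient-norm monitoring and clipping you can wire into your own training loop to catch explosions early:

import numpy as np

def global_grad_norm(grads):
    # L2 norm over every gradient tensor in the model.
    return float(np.sqrt(sum(float((g ** 2).sum()) for g in grads)))

def clip_gradients(grads, max_norm=1.0):
    norm = global_grad_norm(grads)
    if norm > max_norm:
        scale = max_norm / (norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads, norm

# Usage: after backward(), before the optimizer step.
grads = [np.full((784, 256), 0.5), np.full((256,), 0.5)]
grads, norm = clip_gradients(grads, max_norm=1.0)
print(f"grad norm before clipping: {norm:.1f}")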

How to Choose Your Learning Path#

Two Learning Approaches: You can either build it yourself (work through student notebooks and implement from scratch) or learn by reading (study the solution notebooks to understand how ML systems work). Both approaches use the same Build → Profile → Optimize methodology at different scales.

🔬 Quick Start

15-minute setup • Try foundational modules • Hands-on experience

Start Building →

📚 Full Course

8+ weeks of study • Complete ML framework • Systems understanding

Course Overview →

🎓 Instructors

Classroom-ready • NBGrader integration • Automated grading

Teaching Guide →

📊 Learning Community

Track progress • Join competitions • Student leaderboard

View Progress →

Getting Started#

Whether you’re just exploring or ready to dive in, here are helpful resources: 📖 See Essential Commands for complete setup and command reference, or 📖 See Complete Course Structure for detailed module descriptions.

TinyTorch is more than a course—it’s a community of learners building together. Join thousands exploring ML systems from the ground up.