PRACTICE · MINI PROJECTS · NOTEBOOKS WITH TESTS

Mini projects — build it, and the concept sticks

Eight hands-on notebooks, each a 2–4 hour build with staged TODOs and asserts that tell you when each stage is right. They are self-contained (synthetic data, NumPy + matplotlib, PyTorch only where noted) and runnable on a laptop CPU. Work the TODOs in order; reference solutions sit at the bottom of each notebook behind a no-peeking divider. Pair them with the matching course chapters listed on each card.

8 notebooks · 2–4h each · Asserts grade each stage · CPU-only · <2 min runs
📐 How to work a notebook — the rule
  1. Download the .ipynb, open in Jupyter / VS Code / Colab.
  2. Read the stage's markdown, implement the TODO(you) function, then un-comment and run that stage's check_stageN().
  3. Asserts fail → re-read the stage explanation, not the solution.
  4. Only after a real attempt, compare against the solution_* implementation at the bottom — then re-write yours from memory the next day.

Never: read the solutions first. Recognition feels like knowledge; only recall is.
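To make the stage pattern concrete, here is a minimal sketch of what one stage might look like. The `softmax` task, the `check_stage1` name, and the specific asserts are hypothetical; they are not taken from any of the notebooks, where the TODO body would start empty.

```python
import numpy as np

def softmax(x):
    """TODO(you): numerically stable softmax over the last axis."""
    # One possible answer, shown only so the sketch runs end to end.
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the row max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def check_stage1():
    """Hypothetical stage check: the asserts describe the contract, not the code."""
    x = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
    p = softmax(x)
    assert p.shape == x.shape, "output shape must match input shape"
    assert np.allclose(p.sum(axis=-1), 1.0), "each row must sum to 1"
    assert np.all(np.isfinite(p)), "large logits must not overflow"
    return "stage 1 ok"

print(check_stage1())
```

When an assert like these fails, its message points at the property you broke, which is why re-reading the stage explanation usually beats opening the solution.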

1 · Two-tower retrieval

Build: two-tower encoders on synthetic taste clusters, in-batch sampled-softmax with temperature, logQ correction, recall@10 vs a popularity baseline, then LSH vs brute-force retrieval timing.

Concepts: retrieval, in-batch negatives, ANN. · PyTorch.

Pairs with: Course ch14 · ML algorithms coding

Download notebook
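As a rough illustration of the loss this notebook builds toward, the sketch below computes an in-batch softmax with temperature and a logQ correction. It uses NumPy rather than PyTorch so it stays dependency-light, and the function name, the temperature of 0.1, and the uniform sampling probabilities are all assumptions made for the example.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb, item_prob, temperature=0.1):
    """Cross-entropy over a batch where row i's positive is item i
    and every other item in the batch serves as a negative."""
    logits = user_emb @ item_emb.T / temperature         # (B, B) similarity matrix
    logits = logits - np.log(item_prob)[None, :]         # logQ correction, one value per item column
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise before exponentiating
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives sit on the diagonal

rng = np.random.default_rng(0)
items = rng.normal(size=(4, 8))
items /= np.linalg.norm(items, axis=1, keepdims=True)    # unit-norm item embeddings
users = items.copy()                                     # toy case: each user matches its own item
uniform = np.full(4, 0.25)                               # assumed sampling probabilities

loss_aligned = in_batch_softmax_loss(users, items, uniform)
loss_shuffled = in_batch_softmax_loss(users, items[::-1], uniform)
print(loss_aligned, loss_shuffled)
```

Subtracting `log(item_prob)` offsets the fact that popular items show up as in-batch negatives more often. With uniform probabilities the correction shifts every logit equally and leaves the loss unchanged, which makes a handy sanity check.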

2 · Transformer block from scratch

Build: scaled dot-product attention in NumPy with a 4-token shape walkthrough, causal masking (an assert catches your leak), multi-head reshape, LayerNorm + residual + MLP, then a tiny char-LM in PyTorch.

Concepts: attention, masking, residual stream. · NumPy → PyTorch.

Pairs with: Transformers internals · DL course

Download notebook
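The causal-masking stage can be sketched in a few lines of NumPy. This is a single-head illustration with made-up shapes (4 tokens, width 8); the leak check at the end is one plausible way to write such an assert, not necessarily the one the notebook uses.

```python
import numpy as np

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position t only sees positions <= t."""
    T, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)               # (T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)           # hide future positions
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out, weights = causal_attention(q, k, v)

# Leak check: perturbing the last token must not change any earlier output.
k2, v2 = k.copy(), v.copy()
k2[-1] += 5.0
v2[-1] += 5.0
out2, _ = causal_attention(q, k2, v2)
assert np.allclose(out[:-1], out2[:-1]), "earlier positions changed: the mask leaks"
```

If you mask after the softmax instead of before it, the rows stop summing to 1 and the leak check still fails, so the order of operations matters.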

3 · BPE tokenizer

Build: byte-pair encoding from scratch — pair counting, the merge loop, vocab, encode/decode with round-trip asserts, compression ratio vs char-level.

Concepts: tokenization, why vocab size matters. · Pure Python.

Pairs with: Transformers · tokenization

Download notebook
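For orientation, a compact sketch of the merge loop and the round trip follows. It assumes UTF-8 bytes as the base vocabulary (ids 0 to 255) and applies merges in the order they were learned; the helper names and the toy training string are illustrative, and the notebook's own stages may slice the problem differently.

```python
from collections import Counter

def count_pairs(ids):
    """Count adjacent id pairs in the sequence."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; new ids start at 256."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pairs = count_pairs(ids)
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)      # most frequent adjacent pair
        ids = merge(ids, pair, 256 + step)
        merges[pair] = 256 + step
    return merges

def encode(text, merges):
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():       # dicts keep insertion order, so this replays training
        ids = merge(ids, pair, new_id)
    return ids

def decode(ids, merges):
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]   # a merged token is the concatenation of its parts
    return b"".join(vocab[i] for i in ids).decode("utf-8")

text = "low lower lowest low low"
merges = train_bpe(text, num_merges=5)
ids = encode(text, merges)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
```

The round-trip property `decode(encode(s)) == s` should hold for any string, including ones never seen in training, because unmerged bytes simply pass through.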

4 · KV-cache lab

Build: a 2-layer NumPy decoder, naive generation that recomputes everything, then the KV-cached version. Assert identical outputs, time both, plot the per-token attention cost (O(T²) naive vs O(T) cached), and measure cache bytes.

Concepts: why inference needs the cache — by measuring it. · NumPy.

Pairs with: Course ch17 · LLM inference

Download notebook
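The core idea can be shown on a much smaller model than the notebook's 2-layer decoder. The sketch below assumes a single attention head with random weights and feeds in a fixed sequence of token vectors; `step_naive`, `KVCache`, and the sizes are names and values invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    """Attention output for one query vector over all stored keys and values."""
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

def step_naive(prefix):
    """Recompute keys and values for the whole prefix, then attend from the last token."""
    X = np.asarray(prefix)
    return attend(X[-1] @ Wq, X @ Wk, X @ Wv)

class KVCache:
    """Keep past keys and values so each step only projects the newest token."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, x):
        self.K.append(x @ Wk)
        self.V.append(x @ Wv)
        return attend(x @ Wq, np.stack(self.K), np.stack(self.V))

tokens = rng.normal(size=(6, d))
cache = KVCache()
for t in range(len(tokens)):
    naive = step_naive(tokens[: t + 1])
    cached = cache.step(tokens[t])
    assert np.allclose(naive, cached), f"outputs diverge at step {t}"

cache_bytes = sum(a.nbytes for a in cache.K + cache.V)
print("cache bytes:", cache_bytes)
```

The comparison uses `np.allclose` rather than exact equality because batched and per-token matrix products can differ in the last few bits. The cache grows by two vectors of width `d` per token, which is the quantity the notebook asks you to measure.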

5 · Mini-RAG retrieval lab

Build: TF-IDF retriever from scratch over a 40-doc corpus with answer keys, recall@k/MRR eval, an SVD "embedding" retriever, reciprocal rank fusion (RRF) for hybrid search, and a chunking experiment, including one case where keywords beat semantics.

Concepts: retrieval quality is measurable. · NumPy.

Pairs with: Course ch20 (RAG)

Download notebook
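The fusion and eval pieces are small enough to sketch in plain Python. The document ids and the two rankings below are made up, and `k=60` is the constant commonly used for reciprocal rank fusion rather than a value taken from the notebook.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant documents that appear in the top k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant document, 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

tfidf_ranking = ["d3", "d1", "d7"]   # hypothetical keyword retriever output
svd_ranking = ["d1", "d9", "d3"]     # hypothetical SVD retriever output
fused = rrf_fuse([tfidf_ranking, svd_ranking])
print(fused)
```

RRF only looks at ranks, so it sidesteps the problem that TF-IDF scores and SVD cosine similarities live on different scales.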

6 · Calibration + A/B lab

Build: reliability diagrams and ECE from scratch on a miscalibrated classifier, Platt scaling by gradient descent, isotonic-lite; then an A/B simulator that shows peeking inflating false positives over 1000 runs, and CUPED shrinking CIs.

Concepts: calibration, testing discipline. · NumPy + matplotlib.

Pairs with: Prob/stats ch13 · Course ch12

Download notebook
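As a reference point for the calibration half, here is one way to compute ECE for a binary classifier. It assumes equal-width bins and compares the mean predicted probability in each bin with the observed positive rate; the function name, the bin count, and the toy inputs are choices made for this sketch.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions, where probs[i] is the predicted P(y=1)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(probs, edges[1:-1])   # interior edges give ids 0 .. n_bins-1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap          # weight the gap by the bin's share of samples
    return ece

confident = np.full(10, 0.9)                    # the model says 0.9 every time
ece_calibrated = expected_calibration_error(confident, [1] * 9 + [0])
ece_overconfident = expected_calibration_error(confident, [1] * 5 + [0] * 5)
print(ece_calibrated, ece_overconfident)
```

The per-bin pairs of mean prediction and observed rate are exactly the points you would plot in a reliability diagram, so the two stages share most of their code.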

7 · K-means + ANN index

Build: k-means with named helpers, k-means++ init, the elbow curve, then a random-hyperplane LSH index and the recall-vs-speed tradeoff plot.

Concepts: clustering, approximate search. · NumPy.

Pairs with: ML theory (k-means) · Course ch14 (ANN)

Download notebook
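The LSH half can be sketched as follows: hash each vector to a bit pattern by checking which side of each random hyperplane it falls on, then search only the query's bucket. The sizes (200 vectors, 16 dimensions, 6 planes) and the helper names are assumptions for illustration.

```python
import numpy as np

def lsh_signatures(vectors, planes):
    """One bit per hyperplane: True if the vector lies on the plane's positive side."""
    return (vectors @ planes.T) > 0

def build_index(vectors, planes):
    """Group vector indices by signature."""
    buckets = {}
    for i, sig in enumerate(lsh_signatures(vectors, planes)):
        buckets.setdefault(sig.tobytes(), []).append(i)
    return buckets

def query(q, vectors, planes, buckets, top_k=5):
    """Rank only the vectors sharing q's bucket, by cosine similarity."""
    key = lsh_signatures(q[None, :], planes)[0].tobytes()
    candidates = np.array(buckets.get(key, []), dtype=int)
    if candidates.size == 0:
        return candidates
    cand_vecs = vectors[candidates]
    sims = cand_vecs @ q / (np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(q))
    return candidates[np.argsort(-sims)[:top_k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 16))
planes = rng.normal(size=(6, 16))     # 6 bits, so at most 64 buckets
buckets = build_index(vectors, planes)
hits = query(vectors[7], vectors, planes, buckets)
print(hits)
```

More planes mean smaller buckets and faster queries, but a higher chance that a true neighbor lands in a different bucket. That is the recall-vs-speed tradeoff the notebook asks you to plot.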

8 · Production debugging — you're on call

Build: 14 days of service metrics hide two planted incidents (a cache-warmup regression and a feature-null spike). Plot, write rolling z-score detectors, localize each incident to its causal metric, write the postmortem timeline. Asserts check your detected day and metric.

Concepts: monitoring, drift, incident reasoning. · NumPy + matplotlib.

Pairs with: Course ch13 · Debug challenges

Download notebook
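A rolling z-score detector of the kind this notebook asks for might look like the sketch below. The 14-value latency series is made up, and the window of 5 days and threshold of 3 are arbitrary choices for the example, not the notebook's settings.

```python
import numpy as np

def rolling_zscore(series, window):
    """z-score of each point against the mean and std of the `window` points before it."""
    series = np.asarray(series, dtype=float)
    z = np.zeros(series.shape)                 # stays 0.0 where no baseline exists yet
    for t in range(window, len(series)):
        past = series[t - window : t]          # trailing window, excluding the current point
        std = past.std()
        if std > 0:
            z[t] = (series[t] - past.mean()) / std
    return z

def first_alert(series, window=5, threshold=3.0):
    """Index of the first point whose |z| exceeds the threshold, or None."""
    hits = np.flatnonzero(np.abs(rolling_zscore(series, window)) > threshold)
    return int(hits[0]) if hits.size else None

# Made-up daily latency in ms with a regression starting on day index 9.
latency = [100, 101, 99, 100, 102, 100, 99, 101, 100, 140, 141, 139, 100, 101]
print(first_alert(latency))
```

Excluding the current point from its own baseline matters: if the spike is included in the window, it inflates the standard deviation and partly hides itself.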

Suggested order

Foundations first: 3 (BPE) → 2 (transformer) → 4 (KV cache) builds the LLM stack bottom-up. Then 1 (two-tower) → 7 (k-means/ANN) → 5 (RAG) for the retrieval stack. Finish with the production pair: 6 (calibration/A-B) → 8 (debugging). If you're prepping ML-systems interviews specifically, do 4 → 1 → 8 first — they map to the most-probed topics.