📚 Learning Plan

Week 0: Overview of ML Fundamentals

The ML fundamentals section introduces model evaluation, classical algorithms, and related core ideas; these become the building blocks for the topics covered in the following weeks.

Week 1-2: Probability Foundations + Markov Assumption

Week 3: N-gram Models & Language Modeling

  • N-gram Language Modeling
  • Topics:
    • What is an n-gram?
    • How n-gram language models work
    • Perplexity and limitations of n-gram models
  • Activities:
    • Implement a bigram/trigram model on a toy corpus
  • Resources:
    • The Illustrated Transformer - start with n-gram part
    • Happy-LLM intro chapter
    • Optional: n-gram language model notebook
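The bigram activity above can be sketched in a few lines of plain Python. The toy corpus and the `<s>`/`</s>` boundary markers here are illustrative assumptions, not part of the plan:

```python
import math
from collections import Counter

# Toy corpus (an assumption for illustration); <s> and </s> mark sentence boundaries.
corpus = [
    "<s> the cat sat on the mat </s>",
    "<s> the dog sat on the log </s>",
]

tokens = [w for line in corpus for w in line.split()]
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

def perplexity(sentence):
    # Perplexity = exp(-average log-probability of the observed bigrams)
    words = sentence.split()
    logp = sum(math.log(bigram_prob(a, b)) for a, b in zip(words, words[1:]))
    return math.exp(-logp / (len(words) - 1))

print(bigram_prob("the", "cat"))   # 0.25: one of the four "the" tokens is followed by "cat"
print(perplexity("<s> the cat sat on the mat </s>"))
```

Extending this to trigrams means counting triples and conditioning on the previous two words; the zero counts that then appear motivate the smoothing and the limitations discussed in the topics list.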

Week 4: Intro to Information Theory

Week 5-6: Linear Algebra for ML

  • Linear Algebra for ML
  • Topics:
    • Vectors, Matrices, Matrix Multiplication
    • Dot product, norms, projections
    • Eigenvalues & Singular Value Decomposition (SVD)
  • Activities:
    • Practice via small matrix coding problems (NumPy or PyTorch)
  • Resources:
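A hedged NumPy sketch of the kinds of small matrix problems this week targets; the specific matrix and vectors are arbitrary examples:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])  # a symmetric 2x2 example matrix
v = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])

# Dot product and Euclidean norm
print(v @ v)              # 5.0
print(np.linalg.norm(v))  # sqrt(5)

# Projection of v onto u: (v.u / u.u) * u
proj = (v @ u) / (u @ u) * u   # [1., 0.]

# Eigendecomposition of a symmetric matrix (eigenvalues here are 2 and 4)
eigvals, eigvecs = np.linalg.eigh(A)

# SVD: A = U diag(s) V^T; reconstruction should match A
U, s, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```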

Week 7: Calculus + Gradient Descent

Week 8-9: Neural Networks & Backpropagation

  • Neural Networks and Deep Learning Overview
  • Topics:
  • Activities:
    • Implement a simple NN from scratch (e.g., on MNIST or XOR)
    • Derive gradient of softmax + cross-entropy
  • Resources:
    • Michael Nielsen's NN book: http://neuralnetworksanddeeplearning.com/
    • CS231n lecture on backprop
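The softmax + cross-entropy derivation has a famously compact result, dL/dz = softmax(z) - one_hot(y). A minimal NumPy check of that result against finite differences (the example logits are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    return -np.log(softmax(z)[y])

z = np.array([2.0, 1.0, 0.1])
y = 0

# Analytic gradient: dL/dz = softmax(z) - one_hot(y)
grad = softmax(z)
grad[y] -= 1.0

# Central finite-difference check of each component
eps = 1e-6
num = np.array([
    (cross_entropy(z + eps * np.eye(3)[i], y) -
     cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(grad, num, atol=1e-5))  # True
```

This style of gradient checking is also a useful habit for the from-scratch NN activity above.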

Week 10: Integration and Project

  • Integration and Project
  • Goal:
    • Build a mini-project combining n-gram + neural net ideas
    • Example: Predict the next word using both n-gram and a small MLP
  • Outcome:
    • Review all learned concepts
    • Prepare to transition to Happy-LLM's transformer section
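One possible shape for the mini-project, sketched in NumPy: a smoothed bigram model and a tiny one-hidden-layer MLP are trained on the same toy corpus, and their next-word distributions are interpolated. The corpus, layer sizes, and interpolation weight are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = "the cat sat on the mat the dog sat on the log".split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)
pairs = [(idx[a], idx[b]) for a, b in zip(tokens, tokens[1:])]

# n-gram side: bigram counts with add-one smoothing
counts = np.ones((V, V))
for a, b in pairs:
    counts[a, b] += 1
ngram_probs = counts / counts.sum(axis=1, keepdims=True)

# neural side: one-hidden-layer MLP over a one-hot previous word
H = 8
W1 = rng.normal(0, 0.1, (V, H))
W2 = rng.normal(0, 0.1, (H, V))
X = np.eye(V)[[a for a, _ in pairs]]
Y = np.array([b for _, b in pairs])

for _ in range(500):                         # full-batch gradient descent
    h = np.tanh(X @ W1)
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    g = p.copy()
    g[np.arange(len(Y)), Y] -= 1.0           # dL/dlogits = softmax - one_hot
    gW2 = h.T @ g / len(Y)
    gW1 = X.T @ ((g @ W2.T) * (1 - h ** 2)) / len(Y)
    W2 -= 0.5 * gW2
    W1 -= 0.5 * gW1

def predict(word, lam=0.5):
    # Interpolate the bigram and MLP next-word distributions
    h = np.tanh(np.eye(V)[idx[word]] @ W1)
    logits = h @ W2
    mlp = np.exp(logits - logits.max())
    mlp /= mlp.sum()
    mix = lam * ngram_probs[idx[word]] + (1 - lam) * mlp
    return vocab[int(mix.argmax())]

print(predict("sat"))  # both models should agree on "on" here
```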

Phase 2: Modern Deep Learning - Attention & Transformers

Week 11-12: Attention Mechanisms

Week 13-14: Self-Attention & Multi-Head Attention

Week 15-16: Transformer Architecture

  • Transformer Architecture Overview
  • Topics:
    • Full transformer architecture: encoder and decoder stacks
    • Encoder: multi-head self-attention + feed-forward networks
    • Decoder: masked self-attention + cross-attention + feed-forward
    • Residual connections and layer normalization
    • Training transformers: learning rate warmup, label smoothing
  • Activities:
    • Implement a complete transformer from scratch
    • Train on a small machine translation task
    • Experiment with different hyperparameters (heads, layers, dimensions)
    • Analyze attention patterns in trained model
  • Resources:
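Before implementing the full model, it can help to see the core computation in isolation. A minimal NumPy sketch of scaled dot-product and multi-head self-attention; the shapes and the 0.1 weight scale are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head(x, Wq, Wk, Wv, Wo, n_heads):
    """Split d_model across n_heads, attend per head, concatenate, project."""
    T, d = x.shape
    dh = d // n_heads
    Q = (x @ Wq).reshape(T, n_heads, dh).transpose(1, 0, 2)  # (heads, T, dh)
    K = (x @ Wk).reshape(T, n_heads, dh).transpose(1, 0, 2)
    V = (x @ Wv).reshape(T, n_heads, dh).transpose(1, 0, 2)
    out = attention(Q, K, V)                                 # (heads, T, dh)
    return out.transpose(1, 0, 2).reshape(T, d) @ Wo         # (T, d)

rng = np.random.default_rng(0)
T, d, heads = 5, 16, 4
x = rng.normal(size=(T, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
y = multi_head(x, Wq, Wk, Wv, Wo, heads)
print(y.shape)  # (5, 16)
```

A full encoder block would wrap this in residual connections and layer normalization, then a position-wise feed-forward network, as listed in the topics above.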

Week 17-18: BERT and GPT - Modern Applications

  • BERT & GPT Overview
  • Topics:
    • BERT (encoder-only): Masked language modeling, bidirectional context
    • GPT (decoder-only): Causal language modeling, autoregressive generation
    • Pre-training vs. fine-tuning paradigm
    • Transfer learning with transformers
    • Prompt engineering and few-shot learning (GPT-3)
    • Instruction tuning and RLHF (ChatGPT, InstructGPT)
  • Activities:
    • Fine-tune a pre-trained BERT model for text classification
    • Generate text with GPT-2/GPT-3 using different prompting strategies
    • Compare encoder-only vs. decoder-only architectures
    • Experiment with prompt engineering for various tasks
  • Resources:
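The architectural split between BERT and GPT ultimately comes down to the attention mask. A small NumPy illustration; the scores are set to zero just to make the resulting weights easy to read:

```python
import numpy as np

T = 5  # sequence length

# BERT-style (bidirectional): every position may attend to every other.
bidirectional = np.ones((T, T), dtype=bool)

# GPT-style (causal): position i may attend only to positions j <= i,
# enforced by setting masked scores to -inf before the softmax.
causal = np.tril(np.ones((T, T), dtype=bool))

scores = np.zeros((T, T))                    # placeholder attention scores
masked = np.where(causal, scores, -np.inf)
weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))
# Row i is uniform over the first i+1 positions: the model cannot "see" the future,
# which is what makes autoregressive generation possible.
```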

Week 19-20: Advanced Topics & Integration Project

  • Topics:
    • Efficient transformers: Sparse attention, Linformer, Reformer
    • Long-context models: Relative position encoding, ALiBi
    • Vision transformers (ViT): applying transformers to images
    • Multimodal transformers: CLIP, DALL-E, GPT-4
    • State-space models and alternatives to attention
  • Integration Project:
    • Build an end-to-end NLP application using transformers
    • Examples:
      • Question answering system with BERT
      • Text summarization with BART/T5
      • Chatbot with GPT-2 fine-tuning
      • Code generation assistant
  • Resources:
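As one concrete taste of the long-context topic, ALiBi replaces position embeddings with a per-head linear penalty on attention scores. A hedged NumPy sketch; the head count and sequence length are arbitrary:

```python
import numpy as np

def alibi_bias(n_heads, T):
    """Build the ALiBi additive bias: -slope * distance per head, with
    head slopes following the geometric sequence 2^(-8k/n) from the
    ALiBi paper, and -inf blocking future (non-causal) positions."""
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    distance = i - j                               # how far back position j is
    bias = -slopes[:, None, None] * distance       # (heads, T, T)
    return np.where(distance >= 0, bias, -np.inf)  # causal mask on the future

bias = alibi_bias(n_heads=8, T=4)
print(bias[0])  # head 0: zeros on the diagonal, increasingly negative further back
```

Adding this bias to the attention scores before the softmax is all the positional information the model receives, which is why ALiBi extrapolates to sequence lengths longer than those seen in training.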