N-gram Language Modeling

Introduction

N-gram language models are the classical approach to predicting the next word from a fixed-length window of preceding words. They remain a great entry point for understanding how language modeling works, how models are evaluated with perplexity, and why data sparsity motivates smoothing and, ultimately, neural approaches.

Knowledge Points

  • Textual Descriptive Models
  • What is an n-gram?
  • Building and training an n-gram language model
  • Perplexity: definition & interpretation
  • Limitations of n-gram models (data sparsity, context window)
  • Implementing bigram or trigram models on a toy corpus
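The last two points can be sketched together: a minimal bigram model with add-one (Laplace) smoothing trained on a tiny toy corpus, evaluated with perplexity. The corpus, the `<s>`/`</s>` sentence markers, and all function names here are illustrative choices, not a prescribed API:

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams per sentence, with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothing keeps unseen bigrams from getting zero probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentence, unigrams, bigrams, vocab_size):
    """Perplexity = exp of the average negative log-probability per predicted token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(prev, w, unigrams, bigrams, vocab_size))
        for prev, w in zip(tokens, tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))

corpus = ["the cat sat", "the dog sat", "the cat ran"]
uni, bi = train_bigram(corpus)
V = len(uni)  # vocabulary size, including the markers

# A sentence made of seen bigrams scores lower perplexity than one
# containing bigrams never observed in training.
print(perplexity("the cat sat", uni, bi, V))
print(perplexity("the dog ran", uni, bi, V))
```

Without the `+1` smoothing, any unseen bigram would make the product of probabilities zero and the perplexity infinite, which is exactly the data-sparsity limitation listed above.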