N-gram Models & Language Modeling
Introduction
N-gram language models are the classical approach to predicting words based on fixed-length context. They remain a great entry point for understanding how language modeling works, how to evaluate models with perplexity, and why data sparsity motivates neural approaches.
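To make "fixed-length context" concrete, here is a minimal sketch (corpus and names are illustrative) of the bigram case, where the probability of a word is estimated from counts of its immediately preceding word:

```python
from collections import Counter

# Toy corpus (hypothetical). A bigram model estimates
# P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}).
tokens = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# "the cat" occurs 2 times, "the" occurs 3 times, so P(cat | the) = 2/3.
print(bigram_prob("the", "cat"))
```

Unseen bigrams get probability zero under this maximum-likelihood estimate, which is exactly the data-sparsity problem that smoothing (and later, neural models) addresses.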
Knowledge Points
- Statistical language models
- What is an n-gram?
- Building and training an n-gram language model
- Perplexity: definition & interpretation
- Limitations of n-gram models (data sparsity, context window)
- Implementing bigram or trigram models on a toy corpus
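The last three points above can be sketched together: a bigram model with add-one (Laplace) smoothing trained on a toy corpus, evaluated by perplexity. The corpus, token names, and smoothing choice here are illustrative assumptions, not a prescribed implementation:

```python
import math
from collections import Counter

# Toy training corpus with sentence-boundary markers (illustrative).
train = "<s> the cat sat </s> <s> the dog sat </s>".split()
vocab = set(train)
V = len(vocab)

# Context counts exclude the final token, which never acts as a context.
context_counts = Counter(train[:-1])
bigram_counts = Counter(zip(train, train[1:]))

def prob(prev, word):
    # Add-one smoothing: unseen bigrams get a small nonzero probability
    # instead of zero, at the cost of some probability mass from seen ones.
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)

def perplexity(tokens):
    # Perplexity = exp(-average log-probability per predicted token);
    # lower is better, and a uniform model over V words scores V.
    log_p = sum(math.log(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_p / (len(tokens) - 1))

print(perplexity("<s> the cat sat </s>".split()))
```

Even this tiny example shows the trade-off: smoothing lets the model score sequences containing unseen bigrams, but the fixed two-word window means it can never use longer-range context.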