Machine Learning Notes

Hyperparameter Tuning

For many algorithm engineers, hyperparameter tuning is a real headache: there is usually no alternative to tuning the parameters empirically until they fall in a reasonable range, yet good settings are essential for the algorithm to be effective.

Common Ways of Hyperparameter Tuning

Grid Search

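Exhaustively evaluates every combination on a small, low-dimensional grid. Deterministic but expensive; it scales poorly because the number of trials grows exponentially with the number of hyperparameters. In practice, it tends to be used coarse-to-fine: first search a wide range with a large step size to locate the region where the optimum is likely to be, then shrink the search space and use a smaller step to find a more accurate optimum.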
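
The coarse-to-fine procedure can be sketched as follows. This is a minimal illustration in plain Python, not a reference implementation: `toy_score`, the hyperparameter names `lr` and `reg`, the ranges, and the helpers `grid_search` and `linspace` are all assumptions made up for the example. In real use `toy_score` would be replaced by training a model and returning its validation score.

```python
import itertools


def toy_score(lr, reg):
    # Hypothetical stand-in for a validation score (higher is better).
    # By construction it peaks at lr=0.1, reg=0.01.
    return -((lr - 0.1) ** 2) - ((reg - 0.01) ** 2)


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def grid_search(score_fn, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Coarse pass: wide range, large step (5 x 5 = 25 trials).
lr_step, reg_step = 0.25, 0.025
coarse = {"lr": linspace(0.0, 1.0, 5), "reg": linspace(0.0, 0.1, 5)}
coarse_best, coarse_score = grid_search(toy_score, coarse)

# Fine pass: one coarse step either side of the coarse winner,
# clipped at 0, with a smaller step (11 x 11 = 121 trials).
fine = {
    "lr": linspace(max(0.0, coarse_best["lr"] - lr_step),
                   coarse_best["lr"] + lr_step, 11),
    "reg": linspace(max(0.0, coarse_best["reg"] - reg_step),
                    coarse_best["reg"] + reg_step, 11),
}
fine_best, fine_score = grid_search(toy_score, fine)
```

The trial counts show the cost trade-off: two passes cost 25 + 121 evaluations, whereas covering the whole coarse range at the fine step size in a single pass would need far more. The refinement only helps if the coarse pass lands in the right region, which is an assumption about the score surface being reasonably smooth.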

Random Search

Sample hyperparameters at random (often log-uniformly for learning rates). Much more efficient than grid search when only a few dimensions matter, but it cannot guarantee finding the optimal solution.
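
A minimal sketch of random search with log-uniform sampling, using only the standard library. The objective `toy_score`, the hyperparameter names `lr` and `dropout`, and their ranges are hypothetical choices for illustration; the objective is built so that `lr` dominates the score, which is the situation where random search tends to beat a grid.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def toy_score(lr, dropout):
    # Hypothetical stand-in for a validation score (higher is better).
    # lr matters a lot (best near 1e-2); dropout barely matters.
    return -((math.log10(lr) + 2.0) ** 2) - 0.01 * (dropout - 0.3) ** 2


def random_search(score_fn, n_trials, seed=0):
    """Sample n_trials random configs; return (best_params, best_score)."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform: each decade between 1e-5 and 1e-1 is equally likely.
            "lr": sample_log_uniform(rng, 1e-5, 1e-1),
            # Plain uniform is fine for a parameter on a linear scale.
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_score, n_trials=50)
```

Every trial draws a fresh value for each hyperparameter, so 50 trials probe 50 distinct learning rates, whereas a grid with the same budget would reuse only a handful of learning-rate values. The result is still only the best of the sampled points, not a guaranteed optimum.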

Bayesian Optimization

Fit a surrogate model of the mapping config -> score and use it to pick promising next trials. Unlike random and grid search, which do not learn from past trials, Bayesian optimization (BO) uses what has been learned so far to place the next (expensive) trial where it is most likely to pay off.
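
The loop below is a rough, dependency-free sketch of the idea for a single hyperparameter on [0, 1]. It makes several assumptions that are not from the notes above: the surrogate is a Gaussian process with an RBF kernel and a fixed length scale, the next trial is chosen with an upper confidence bound (mean + `kappa` * std) over a fixed candidate grid, and `toy_score` is a hypothetical cheap stand-in for an expensive training run. Production libraries handle kernels, acquisition functions, and numerical details far more carefully.

```python
import math


def toy_score(x):
    # Hypothetical expensive objective (higher is better), best at x = 0.7.
    return -((x - 0.7) ** 2)


def rbf(a, b, length_scale=0.3):
    """RBF kernel with unit variance (assumed, fixed length scale)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L @ L.T == A, for a small SPD matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # Clamp guards against tiny negative values from round-off.
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    """Back substitution: solve L.T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Factor the kernel matrix of past trials and precompute K^-1 y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return L, solve_upper(L, solve_lower(L, ys))


def predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    k_star = [rbf(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def bayes_opt(score_fn, n_init=3, n_iter=10, kappa=2.0):
    xs = [i / (n_init - 1) for i in range(n_init)]  # evenly spaced start
    ys = [score_fn(x) for x in xs]
    candidates = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        L, alpha = fit_gp(xs, ys)  # learn from all past trials

        def ucb(x):
            mean, std = predict(xs, L, alpha, x)
            return mean + kappa * std  # trade off exploitation and exploration

        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(score_fn(x_next))  # the only expensive call per iteration
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


best_x, best_y, history = bayes_opt(toy_score)
```

The key contrast with the other two methods is the `fit_gp` call inside the loop: each new trial location depends on every score observed so far. `kappa` is an assumed tuning knob here; larger values favour unexplored regions, smaller values favour points the surrogate already predicts to be good.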

Related Topics

Evaluation Methods - Using evaluation methods for tuning
Metrics & Validation - Using metrics to guide tuning
Regularization - Tuning regularization parameters