Phase 2: Supervised Logic
In this phase, we move to Non-Linear models that can handle complex decision boundaries.
🟢 Level 1: Decision Trees
A series of "If-Then" rules.
- Criteria: Gini Impurity or Entropy (Information Gain).
- Pros: Highly interpretable.
- Cons: Prone to overfitting (a single deep tree has high variance).
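The split criteria above are simple to compute by hand. Here is a minimal, stdlib-only sketch of Gini impurity and entropy over a list of class labels (the example labels are illustrative, not from any dataset):

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A pure node (one class) has zero impurity; a 50/50 node is maximally impure.
print(gini(["a", "a", "a", "a"]))     # -> 0.0
print(gini(["a", "a", "b", "b"]))     # -> 0.5
print(entropy(["a", "a", "b", "b"]))  # -> 1.0
```

Information Gain is then the parent's impurity minus the size-weighted impurity of the children, and the tree greedily picks the split that maximizes it.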
🟡 Level 2: Ensemble Methods (Wisdom of the Crowd)
Combine multiple models to improve performance.
1. Bagging (Bootstrap Aggregating)
Train multiple trees on random bootstrap samples of the data and aggregate their predictions (averaging for regression, majority vote for classification).
- Standard Tool: Random Forest.
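A minimal sketch of bagging in practice, assuming scikit-learn is available; the synthetic dataset and hyperparameter values are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample; the forest
# aggregates their votes, which reduces the variance of a single tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print(forest.score(X_te, y_te))
```

Random Forest additionally decorrelates the trees by considering only a random subset of features at each split, which is what distinguishes it from plain bagging of full trees.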
2. Boosting
Train models sequentially. Each new model corrects the errors of the previous ones.
- Standard Tools: XGBoost, LightGBM, CatBoost.
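XGBoost, LightGBM, and CatBoost are separate libraries; as a dependency-light sketch of the same sequential error-correcting idea, here is scikit-learn's built-in gradient boosting on an assumed synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Trees are fit sequentially: each new shallow tree is fit to the
# residual errors of the ensemble built so far.
booster = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
booster.fit(X_tr, y_tr)
print(booster.score(X_te, y_te))
```

Note the contrast with bagging: boosting trains weak learners in sequence and weights them, rather than training strong learners in parallel and averaging.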
🔴 Level 3: Support Vector Machines (SVM)
Finding the Hyperplane that maximizes the "Margin" between classes.
The Kernel Trick
SVMs can implicitly map data into a higher-dimensional (even infinite-dimensional) space, where a linear boundary corresponds to a non-linear one in the original space.
- RBF Kernel: The most popular for complex clusters.
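To make the trick concrete, here is a hedged sketch using scikit-learn's `SVC` on concentric circles, a classic dataset no linear boundary can separate (the `gamma` value is an arbitrary illustrative choice):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in 2-D.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

# The RBF kernel separates the rings; the linear kernel cannot.
print(linear.score(X, y))
print(rbf.score(X, y))
```

The RBF kernel computes similarities as if the points had been mapped into an infinite-dimensional space, without ever constructing that mapping explicitly.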