Gradient Boosting From Scratch: Weak Trees Fixing Each Other

#machinelearning #ai #beginners #datascience

Random forests build many trees in parallel and average them. Gradient boosting builds trees one at a time, each one correcting the errors left by the trees before it, and it is a frequent top performer on tabular data in Kaggle competitions. Here it is, fitting residuals live.

🌲 Watch it boost (add trees one by one): https://dev48v.infy.uk/ml/day15-gradient-boosting.html

The core loop

Start with a constant prediction (the mean).
Compute the residuals — how far off you are at each point.
Fit a small, shallow tree to those residuals.
Add it to the ensemble, scaled by a learning rate.
Repeat. Each tree chips away at the remaining error.
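
Here is a minimal sketch of those five steps in plain Python, assuming 1-D inputs with at least two distinct x values and using a one-split tree (a "stump") as the weak learner. The names `fit_stump`, `boost`, and `mse`, and the toy data, are made up for illustration and are not taken from the demo page.

```python
from statistics import mean

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])

def fit_stump(xs, targets):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None  # (sse, threshold, left_value, right_value)
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = mean(ys)                      # step 1: constant prediction
    preds = [base] * len(xs)
    trees = []
    mse_history = [mse(ys, preds)]
    for _ in range(n_trees):             # step 5: repeat
        residuals = [y - p for y, p in zip(ys, preds)]   # step 2
        tree = fit_stump(xs, residuals)                  # step 3
        trees.append(tree)
        # step 4: add the new tree, scaled by the learning rate
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
        mse_history.append(mse(ys, preds))

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict, mse_history

# Toy data (hypothetical): a simple curve the stumps have to approximate.
xs = [float(i) for i in range(10)]
ys = [0.5 * x * x for x in xs]
predict, mse_history = boost(xs, ys)
```

`mse_history` holds the training MSE before the first tree and after each round, so plotting it gives the same "error shrinking" picture as the demo.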

In the demo you watch the prediction curve bend toward the data and the residual bars shrink while the training MSE drops round by round.

Forest vs boosting

Random forest: independent trees, built in parallel, averaged. Reduces variance.
Boosting: dependent trees, built sequentially, summed. Reduces bias by correcting errors.
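
The difference in how the two ensembles combine their trees fits in a few lines. This is only a sketch: `forest_predict` and `boosting_predict` are hypothetical helper names, and the numbers stand in for the outputs of already-trained trees at a single input point.

```python
from statistics import mean

def forest_predict(tree_preds):
    """Random forest: every tree predicts the target itself; average them."""
    return mean(tree_preds)

def boosting_predict(base, corrections, learning_rate):
    """Boosting: start from a base value and sum the scaled corrections."""
    return base + learning_rate * sum(corrections)

# Made-up outputs for one input point.
forest_out = forest_predict([4.0, 6.0, 5.0])
boost_out = boosting_predict(5.0, [1.0, 0.5, 0.25], learning_rate=0.5)
```

In the forest each tree is a full guess at the answer, so their order doesn't matter. In boosting each tree only predicts what is still missing, so it only makes sense on top of the trees that came before it.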

Learning rate = shrinkage

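Small steps (e.g. 0.1) usually generalize better than big ones, but need more trees. Too many trees or too high a rate leads to overfitting, so use early stopping: track the error on a held-out validation set and stop adding trees once it stops improving.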
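
One way to wire up early stopping is sketched below, assuming a simple train/validation split and a "patience" counter (stop after that many rounds without a new best validation MSE). The function names, the patience value, and the noisy sine data are all illustrative assumptions, not the demo's code.

```python
import math
import random
from statistics import mean

def mse(ys, preds):
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])

def fit_stump(xs, targets):
    """One-split regression tree minimizing squared error (needs 2+ distinct xs)."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def boost_early_stop(x_tr, y_tr, x_va, y_va,
                     learning_rate=0.1, max_trees=500, patience=10):
    base = mean(y_tr)
    p_tr = [base] * len(x_tr)
    p_va = [base] * len(x_va)
    best_mse, best_round = mse(y_va, p_va), 0
    for round_no in range(1, max_trees + 1):
        # Trees are fit on training residuals only.
        tree = fit_stump(x_tr, [y - p for y, p in zip(y_tr, p_tr)])
        p_tr = [p + learning_rate * tree(x) for p, x in zip(p_tr, x_tr)]
        p_va = [p + learning_rate * tree(x) for p, x in zip(p_va, x_va)]
        current = mse(y_va, p_va)
        if current < best_mse:
            best_mse, best_round = current, round_no
        elif round_no - best_round >= patience:
            break  # validation error has stalled: stop adding trees
    return best_round, best_mse

# Hypothetical noisy data, split into alternating train/validation points.
rng = random.Random(0)
xs = [i / 4 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]
x_tr, y_tr = xs[0::2], ys[0::2]
x_va, y_va = xs[1::2], ys[1::2]
best_round, best_mse = boost_early_stop(x_tr, y_tr, x_va, y_va)
```

`best_round` is the number of trees you would keep. Rerunning with a smaller `learning_rate` will typically push that number up, which is the trade-off described above.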

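The "gradient" part: for squared error, the residual y − F(x) is exactly the negative gradient of the loss ½(y − F(x))² with respect to the prediction F(x), so fitting a tree to the residuals is a gradient descent step on the predictions. Swap the loss, fit each tree to that loss's negative gradient instead, and the same loop generalizes (that's the idea behind XGBoost / LightGBM / CatBoost).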
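
To make the "residual = negative gradient" claim concrete, here is a small check that compares hand-written negative gradients against a numerical derivative. The function names are hypothetical, and absolute error is used only as an example of a swapped-in loss.

```python
def squared_loss(y, f):
    return 0.5 * (y - f) ** 2

def absolute_loss(y, f):
    return abs(y - f)

def neg_grad_squared(y, f):
    """Negative gradient of squared loss w.r.t. f: the plain residual."""
    return y - f

def neg_grad_absolute(y, f):
    """Negative gradient of absolute loss w.r.t. f: the sign of the residual."""
    return (y > f) - (y < f)

def numeric_neg_grad(loss, y, f, eps=1e-6):
    """Central-difference estimate of -d(loss)/df."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)

y, f = 3.0, 2.2  # made-up target and current prediction
check_squared = (neg_grad_squared(y, f), numeric_neg_grad(squared_loss, y, f))
check_absolute = (neg_grad_absolute(y, f), numeric_neg_grad(absolute_loss, y, f))
```

In the boosting loop, the only change for a new loss is the target each tree is fit to: `neg_grad_absolute(y, f)` in place of `y - f`, for instance.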

🔨 Built from scratch (mean → residuals → tree → add lr×tree → repeat) on the page: https://dev48v.infy.uk/ml/day15-gradient-boosting.html

Part of MachineLearningFromZero. 🌐 https://dev48v.infy.uk