Machine Learning from Scratch Implementation


The Why

At some point early in learning ML, I noticed a pattern: model.fit(X, y) worked, but I had no idea why. The math in the lecture made sense. The code made sense. But the connection between them — the actual mechanics of how a model adjusts its weights, why a loss function looks the way it does, what the algorithm is doing at each step — that part was fuzzy. The only fix I knew was to remove the abstraction entirely and build it myself.

This repo is that process, documented.

The Algorithms

No scikit-learn. No library .fit() calls. Each algorithm starts from the math and ends at working code:

Linear Regression — gradient descent loop written by hand, cost function, weight updates, learning rate effects
k-Nearest Neighbors — Euclidean distance, majority vote, the k hyperparameter's effect on decision boundaries
Decision Trees — Gini impurity or information gain, recursive splitting, when a tree overfits vs. generalizes
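For reference, here is the math each item above reduces to. It uses the same notation as the fit() loop further down: m samples, weights w, bias b, and η for the learning rate (lr in the code). The labels d, G and H, and p_k (the fraction of a node S belonging to class k), are my own:

```latex
\begin{aligned}
J(w, b) &= \frac{1}{m} \sum_{i=1}^{m} \left( x_i^\top w + b - y_i \right)^2 \\
\frac{\partial J}{\partial w} &= \frac{2}{m} X^\top \left( Xw + b\mathbf{1} - y \right), \qquad
\frac{\partial J}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} \left( x_i^\top w + b - y_i \right) \\
w &\leftarrow w - \eta \, \frac{\partial J}{\partial w}, \qquad
b \leftarrow b - \eta \, \frac{\partial J}{\partial b} \\
d(x, x') &= \sqrt{\sum_{j=1}^{n} \left( x_j - x'_j \right)^2} \\
G(S) &= 1 - \sum_{k} p_k^2, \qquad H(S) = -\sum_{k} p_k \log_2 p_k
\end{aligned}
```

The 2/m factor in both gradients comes from differentiating the square. It is exactly the (2 / m) that appears in dw and db in the code.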

The progression matters. Linear regression teaches you what a loss landscape looks like and how gradient descent navigates it. k-NN teaches you that sometimes the right answer is just "look at your neighbors": no training, all inference. Decision trees teach you that splitting rules are just arithmetic on class proportions, whether you score them with Gini impurity or entropy.
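The sketch below is a minimal illustration of that "no training, all inference" split. It assumes the fit/predict interface that scikit-learn popularized, Euclidean distance, and a plain majority vote. The class name KNNClassifier and the default k=3 are hypothetical choices:

```python
import numpy as np
from collections import Counter

class KNNClassifier:  # hypothetical name, for illustration only
    """Minimal k-NN sketch: 'training' only stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No learning happens here: the model *is* the stored training set.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            # Euclidean distance from x to every stored training point
            dists = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
            # Indices of the k closest points ([:k], not [:k + 1])
            nearest = np.argsort(dists)[: self.k]
            # Majority vote among the neighbors' labels
            label, _ = Counter(self.y_train[nearest]).most_common(1)[0]
            preds.append(label)
        return np.array(preds)

# Two well-separated clusters; each query sits inside one of them
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
model = KNNClassifier(k=3).fit(X, y)
print(model.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]
```

All of the cost lives in predict(). Every query scans the whole training set, which is why production libraries add search structures on top of this.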

examplecode.py

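import numpy as np

class LinearRegression:  # class wrapper so fit() runs standalone; the name is illustrative
    def fit(self, X, y, lr=0.01, epochs=1000):
        # X: (m samples, n features), y: (m,) targets
        m, n = X.shape
        self.weights = np.zeros(n)
        self.bias = 0.0

        for _ in range(epochs):
            # Predictions and residuals under the current parameters
            y_pred = X @ self.weights + self.bias
            error = y_pred - y

            # Gradients of the mean squared error with respect to weights and bias
            dw = (2 / m) * X.T @ error
            db = (2 / m) * np.sum(error)

            # Step against the gradient, scaled by the learning rate
            self.weights -= lr * dw
            self.bias -= lr * db
        return self

    def predict(self, X):
        return X @ self.weights + self.bias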

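That loop (compute the gradient, scale it by the learning rate, update the weights) is, at its core, what happens inside every gradient-based .fit() call you've ever made. Not every .fit() works this way: scikit-learn's own LinearRegression, for instance, solves the least-squares problem directly instead of iterating. Still, writing the loop once makes every gradient-based abstraction built on top of it far less opaque.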
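One quick way to check that the loop does what the math says is to run the same update rule on synthetic data and compare the result against NumPy's closed-form least-squares solver (np.linalg.lstsq). A second run with a step size that is too large shows the divergence. The helper name gd_linear_regression, the synthetic data, and the two learning rates (0.1 and 1.5, chosen for standard-normal features) are illustrative assumptions, not values from this repo:

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.1, epochs=2000):
    """Same update rule as the fit() loop above; returns final params and MSE."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        w -= lr * (2 / m) * (X.T @ error)
        b -= lr * (2 / m) * np.sum(error)
    mse = np.mean((X @ w + b - y) ** 2)
    return w, b, mse

# Illustrative synthetic data (assumed): y = X @ [2, -1, 0.5] + 3 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)

# Closed-form least squares for comparison: a column of ones stands in for the bias
X_aug = np.hstack([X, np.ones((200, 1))])
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

w, b, mse = gd_linear_regression(X, y, lr=0.1)
print(np.allclose(w, theta[:3], atol=1e-4), np.isclose(b, theta[3], atol=1e-4))  # True True

# Too large a step: every update overshoots the minimum and the loss explodes
_, _, bad_mse = gd_linear_regression(X, y, lr=1.5, epochs=50)
print(bad_mse)  # a very large value: the loss has diverged
```

On a quadratic loss like this one, each step multiplies the remaining error along each curvature direction by a fixed factor, 1 − lr·λ, where λ is that direction's curvature. Once that factor exceeds 1 in magnitude, the loss grows geometrically instead of shrinking.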

What You Discover

Implementing from scratch surfaces things that tutorials skip:

Learning rate sensitivity is immediate and brutal. Set it too high and the loss diverges. Too low and it barely moves. You feel this before you understand it theoretically, which is exactly the right order.
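k-NN has no training phase. The entire "model" is just the dataset stored in memory, and classification is all inference-time computation. That's an interesting design tradeoff (cheap training, expensive prediction) that a one-line scikit-learn .fit() call makes easy to miss, since KNeighborsClassifier.fit() at most builds a KD-tree or ball-tree index over the stored points.
Decision tree splits are just an argmax over a metric. Gini impurity sounds intimidating until you implement it and realize it's a one-liner: one minus the sum of the squared class proportions.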
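The sketch below implements that one-liner, its entropy counterpart, and the argmax for a single numeric feature. It assumes thresholds of the form x <= t. The function names gini, entropy and best_split are hypothetical. A full tree would run best_split across every feature and recurse on the two children:

```python
import numpy as np

def gini(labels):
    # G = 1 - sum_k p_k^2, where p_k is the fraction of labels in class k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    # H = sum_k p_k * log2(1 / p_k), the same as -sum_k p_k * log2(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def best_split(x, y, impurity=gini):
    """Best threshold on one numeric feature: argmax of impurity decrease."""
    x, y = np.asarray(x), np.asarray(y)
    thresholds = np.unique(x)[:-1]  # splitting at the largest value leaves one side empty
    if len(thresholds) == 0:
        return None, 0.0
    parent = impurity(y)
    gains = []
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        # Children's impurity, weighted by how many samples land on each side
        child = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        gains.append(parent - child)
    i = int(np.argmax(gains))
    return float(thresholds[i]), gains[i]

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))                    # (3.0, 0.5)
print(best_split(x, y, impurity=entropy))  # (3.0, 1.0)
```

Swapping Gini for entropy changes only the impurity function passed in. The split search itself, an argmax over candidate thresholds, stays the same.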

You don't really know an algorithm until you've debugged a broken implementation of it at 1am.

What I Learned

Fundamentals compound. The gradient descent loop in linear regression is the same loop, conceptually, inside neural networks. Understanding it in a simple setting makes the complex version approachable instead of magical.
NumPy is doing a lot of work. Vectorizing operations correctly — matrix multiply instead of nested loops, broadcasting instead of explicit expansion — is its own skill. Implementing algorithms from scratch forces you to learn it.
Mistakes are load-bearing. A wrong learning rate, a transposed matrix, an off-by-one in k-NN's neighbor count — each bug teaches you something the correct implementation doesn't.
Libraries are justified abstractions, not shortcuts. After implementing k-NN from scratch and timing it against scikit-learn's version on a real dataset, "use the library in production" becomes a reasoned conclusion rather than a default.
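To make the vectorization point concrete, the sketch below writes the pairwise Euclidean distance computation at the heart of k-NN twice: once with explicit nested loops and once with broadcasting. The function names and the random inputs are hypothetical. Both versions should return the same matrix, and you can time them with timeit on your own data to see the gap:

```python
import numpy as np

def pairwise_dist_loops(A, B):
    # Explicit nested loops: one Python-level iteration per (i, j) pair
    D = np.empty((len(A), len(B)))
    for i in range(len(A)):
        for j in range(len(B)):
            D[i, j] = np.sqrt(np.sum((A[i] - B[j]) ** 2))
    return D

def pairwise_dist_broadcast(A, B):
    # A[:, None, :] has shape (p, 1, n) and B[None, :, :] has shape (1, q, n);
    # subtracting broadcasts both to (p, q, n) without tiling rows by hand.
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(42)
A, B = rng.normal(size=(50, 4)), rng.normal(size=(30, 4))
print(np.allclose(pairwise_dist_loops(A, B), pairwise_dist_broadcast(A, B)))  # True
```

The broadcast version trades memory for speed because it materializes a (p, q, n) intermediate array. For large inputs, you would process B in chunks or expand the squared distance as ‖a‖² − 2a·b + ‖b‖², which turns the cross term into a single matrix multiply.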

This repo doesn't have a live demo or a deployed model. It has something more useful for right now: a clear record of what I understood, what confused me, and how the confusion resolved. The other projects in my portfolio run on top of this foundation — the StyleForge decoder, the LoanIQ pipeline, the T5 fine-tuning loop. None of them would have made sense without first getting my hands dirty here.