Expository notebooks
Here is a collection of Jupyter (IPython) notebooks I've created to solidify my understanding of ideas encountered while studying probability, statistics, and machine learning.
- Expectation Maximization with Coin Flips (Thu, 12/22)
- Simulating Random Variables with Inverse Transform Sampling (Thu, 6/9)
- The Sigmoid Function in Logistic Regression (Mon, 5/16)
- The Birthday Problem Simulated (Fri, 4/29)
- NBA Team Net Ratings (Thu, 3/17)
Kaggle notebooks
I've attempted a few Kaggle competitions; here are the relevant notebooks:
- Titanic: Machine Learning from Disaster
- Forest Cover Type Prediction
- Predicting Red Hat Business Value
  - Attempt 1: a quick first attempt using a routine preprocessing pipeline for categorical and quantitative variables, with logistic regression and random forest models. It ignores categorical variables with thousands of unique values, which are impractical to one-hot encode.
  - Attempt 2: explores ways of including categorical variables that have thousands of unique values (ordinal encoding, and a mix of one-hot and binary encoding).
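
To make the encoding options mentioned for Attempt 2 concrete, here is a minimal pure-Python sketch of ordinal and binary encoding for a high-cardinality categorical column. It is not the code from the notebook: the function names `ordinal_encode` and `binary_encode` and the sample `group_ids` values are hypothetical, chosen only for illustration, and the "mix of one-hot and binary" variant is not reproduced here.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return [codes[v] for v in values], codes


def binary_encode(values):
    """Write each category's ordinal code as a fixed-width list of bits.

    With n distinct categories this needs about log2(n) columns,
    compared with n columns for one-hot encoding.
    """
    ordinals, codes = ordinal_encode(values)
    # Number of bits needed to represent the largest code (at least 1).
    width = max(1, (len(codes) - 1).bit_length())
    return [[(o >> shift) & 1 for shift in reversed(range(width))]
            for o in ordinals]


# Hypothetical sample column; a real one might have thousands of distinct ids.
group_ids = ["g17", "g3", "g17", "g999", "g3", "g42", "g8"]

ordinals, mapping = ordinal_encode(group_ids)
bits = binary_encode(group_ids)
```

In this sketch the five distinct ids fit in three binary columns, whereas one-hot encoding would need five. The gap widens quickly: a few thousand distinct values need only about a dozen binary columns. The trade-off is that ordinal and binary codes impose an arbitrary ordering or bit pattern on the categories, which some models (logistic regression, for instance) may treat as meaningful.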