Here is a collection of Jupyter (IPython) notebooks I've created to solidify my understanding of ideas from my studies of probability, statistics and machine learning.
I've attempted a few Kaggle competitions; here are the relevant notebooks:
- Titanic: Machine Learning from Disaster
- Forest Cover Type Prediction
- Predicting Red Hat Business Value
  - Attempt 1: a quick first attempt using a routine preprocessing pipeline for categorical and quantitative variables, with logistic regression and random forest models. It ignores categorical variables with thousands of unique values, since one-hot encoding them would produce thousands of columns.
  - Attempt 2: explores ways of including the categorical variables that have thousands of unique values (ordinal encoding, and a mix of one-hot and binary encoding).
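To illustrate why ordinal and binary encoding help with high-cardinality categoricals, here is a minimal, stdlib-only sketch. It is not the code from the notebooks: the function names `ordinal_encode` and `binary_encode` are hypothetical, and it assumes ordinal codes are assigned in order of first appearance. Ordinal encoding maps each category to a single integer column. Binary encoding writes that integer as bits, so `k` categories need only `(k - 1).bit_length()` columns instead of `k` one-hot columns.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer
        encoded.append(codes[v])
    return encoded, codes


def binary_encode(values):
    """Ordinal-encode, then write each code as a fixed-width bit vector (MSB first).

    Returns (rows, width), where width is the number of binary columns.
    """
    ordinal, codes = ordinal_encode(values)
    # Codes run from 0 to k-1, so k-1 determines how many bits are needed (at least 1).
    width = max(1, (len(codes) - 1).bit_length())
    rows = [[(code >> i) & 1 for i in reversed(range(width))] for code in ordinal]
    return rows, width


if __name__ == "__main__":
    # Hypothetical column with 5000 distinct category labels.
    labels = [f"type {i}" for i in range(5000)]
    _, width = binary_encode(labels)
    print(width, "binary columns instead of", len(set(labels)), "one-hot columns")
```

Binary encoding keeps the column count logarithmic in the number of categories. The trade-off is that unrelated categories share bits, so the model may read similarity into arbitrary codes.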