Expository notebooks

Here is a collection of Jupyter (IPython) notebooks I've created to solidify my understanding of ideas while studying probability, statistics, and machine learning.

Kaggle notebooks

I've attempted a few Kaggle competitions; here are the relevant notebooks:

  • Titanic: Machine Learning from Disaster
    • Attempt 1: quick and dirty first attempt with a few models
    • Attempt 2: more exploration, feature engineering
  • Forest Cover Type Prediction
    • Attempt 1: quick and dirty first attempt with a few models, also using PCA
    • Attempt 2: pipelines, deeper performance analysis with k-fold cross-validation and learning curves, hyperparameter tuning
  • Predicting Red Hat Business Value
    • Attempt 1: quick first attempt using a routine preprocessing pipeline for categorical and quantitative variables, with logistic regression and random forest models. Ignores categorical variables with thousands of unique values, which are impractical to one-hot encode.
    • Attempt 2: exploring ways of including categorical variables that have thousands of unique values (ordinal encoding, or a mix of one-hot and binary encoding)
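
As a rough illustration of the "mix of one-hot and binary" idea in the last item, here is a minimal sketch in plain Python. It is not the code from the notebook: the function names and the `max_one_hot` threshold are made up for this example, and a real pipeline would more likely use a library encoder. The point is only that binary encoding needs about log2(k) columns for k categories, where one-hot needs k.

```python
def build_index(values):
    """Map each distinct category to an integer code, in first-seen order."""
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(index)
    return index


def one_hot(values):
    """One column per category: fine when there are only a few categories."""
    index = build_index(values)
    width = len(index)
    return [[1 if index[v] == j else 0 for j in range(width)] for v in values]


def binary_encode(values):
    """Write each category's integer code in base 2 (most significant bit first)."""
    index = build_index(values)
    # Bits needed to represent the largest code, with at least one column.
    width = max(1, (len(index) - 1).bit_length()) if index else 0
    return [[(index[v] >> bit) & 1 for bit in reversed(range(width))]
            for v in values]


def encode_column(values, max_one_hot=10):
    """Hypothetical rule: one-hot up to a cardinality threshold, binary above it."""
    if len(set(values)) <= max_one_hot:
        return one_hot(values)
    return binary_encode(values)


if __name__ == "__main__":
    print(encode_column(["red", "green", "red"]))             # low cardinality
    print(len(encode_column([f"id_{i}" for i in range(1000)])[0]))  # column count
```

The trade-off is that binary codes make unrelated categories share bits, so the model sees some arbitrary structure; whether that hurts depends on the model and the data.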