In the spring of 2016 I embarked on a learning sabbatical focused on machine learning. After a few months of full-time studying, I continue to study while on the job. Here are resources for you to follow along if you like.

  • Learning Log: day-to-day reflections as I work my way through the curriculum (I'm not as good about updating this as I was during my full-time summer of studying).
  • Problem Sets: solutions to individual math problems I've worked through.
  • Notebooks and Kaggle submissions: various IPython notebooks covering topics in probability and ML that I write to solidify my understanding and perhaps help others, plus some submissions to Kaggle competitions.
  • More IPython notebooks and Python code on GitHub: includes chapter-by-chapter notebooks for Python Machine Learning and various other materials.

How to Learn Machine Learning

Having the goal of "learning machine learning" is daunting. I've found the best way to make it tractable is to approach it in phases. Each phase should include at least one track that builds practical skills and one track focused on theoretical foundations. It's also always worth surveying the field at your current level of fluency, both to be on the lookout for the next phase of studies and to keep building a mental map of the interconnected topics that may be prerequisites for the techniques and applications you find most exciting.

Each track should focus on a specific curriculum resource, and then draw on supporting resources. For instance, you might choose a specific book or MOOC you want to work through, and then draw on several related resources to cross-reference as you proceed. Sticking with a single main resource is important for staying focused; it's easy and tempting to jump around between resources without ever making much progress past the introductory material. It's worth spending time upfront researching curriculum options before deciding; I usually find several good resources on a topic and do some initial skimming before deciding which becomes the primary resource, bookmarking the rest as supporting.

Phase 1: Applied Machine Learning & Probability & Statistics

Phase 1, which took me about 5 months of full-time study to complete, includes two tracks:

Track 1: Probability and Statistics

The goal of this track is to get comfortable with basic statistics and exploratory data analysis, and to build a solid theoretical foundation in probability theory that will make it possible to think more rigorously about machine learning. IMHO it is insufficient to rely solely on the brief intros to probability at the beginning of many machine learning texts; eating your spinach early will pay off over and over again as you approach more advanced models and techniques later on. As Wasserman puts it in the preface to All of Statistics:

Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.

Slowing way down and struggling through real math problems is key here. I'd often spend hours, sometimes days on a single problem. Getting stuck = real learning. I recommend finding a study buddy and/or expert who's willing to help you get over the hump when you are truly stuck.

The track:

  1. Stanford's free online Probability and Statistics course: a nice gentle introduction before approaching the more axiomatic coverage of Wasserman. Good coverage of basic exploratory data analysis too.
  2. The first 5 chapters of Wasserman's All of Statistics text, with problem sets from CMU's intermediate stats course (taught by the author) and from another, more introductory counterpart.

Supporting resources:

What I appreciate about the All of Statistics book compared to others I've looked at, including my text from college, is that it doesn't spend too much time on counting methods (knowing how many ways one can deal a full house with a deck of cards isn't particularly relevant) and is otherwise more comprehensive on probability theory most relevant to machine learning. It is concise and somewhat dry, but it serves as a great road map of topics to study; the supporting resources and lectures can provide the additional context when necessary. Math Monk's videos are a particularly nice companion.

Track 2: Applied Machine Learning

The goal of this track is to gain practical experience applying supervised and unsupervised learning and data analysis techniques using Python, scikit-learn and Jupyter notebooks, and to learn many of the practical considerations of wrangling data with tools like pandas and NumPy. By the end you will be able to build and evaluate predictive models that work with real data, and you will have been exposed to many theoretical models. It's nice to get your feet wet and gain powerful skills right away even if you don't yet fully understand how everything works under the hood.

The track:

Supporting resources:

The Python Machine Learning book provides a great blend of the practical concerns of working with data (preprocessing, cross-validating) and exposure to models used for classification, regression and unsupervised learning; it even gets into ensemble methods. By the end of chapter 4 you should be ready to take on your first Kaggle competition, and by the time you finish, you should be able to approach many more and even tackle new analysis problems that interest you. I recommend reading the book and working through all of the examples in a Jupyter notebook, looking at the author's provided notebooks whenever necessary and/or to copy over boilerplate code.
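To make "preprocessing, cross-validating" concrete, here is a minimal sketch of that workflow with scikit-learn. It is not taken from the book: the iris dataset, the logistic regression model, and the helper name `evaluate_pipeline` are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_pipeline(n_folds=5):
    """Return per-fold accuracy of a scaled logistic regression on iris.

    `evaluate_pipeline` is a hypothetical helper for this sketch.
    """
    X, y = load_iris(return_X_y=True)
    # Putting the scaler inside the pipeline means it is re-fit on each
    # training fold, so no statistics leak in from the held-out fold.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=n_folds)


scores = evaluate_pipeline()
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Wrapping preprocessing and model in one pipeline is a design choice worth copying: the same object can then be passed to cross-validation or a grid search without preprocessing the full dataset by hand first.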

Survey the field

These are the resources that inspired me to leap into machine learning headfirst, and continued to provide companionship throughout phase 1.

I also bought Bishop's PRML and Murphy's MLPP a couple of months into my studies and wish I had done so sooner. They are great for perusing to plant seeds for future studies, and to begin connecting the dots from exciting advanced concepts back to probability theory.

Phase 2: Probabilistic ML and Computer Vision

The goal of phase 2 is to build on the theoretical knowledge of probability theory from phase 1 to gain a richer, probabilistic understanding of machine learning, and build on the practical skills by diving into a more advanced topic.

I think track 1 would be appropriate for everyone, while track 2 depends on which field of machine learning you are most interested in (and perhaps where you have taken a job!); in my case it is computer vision, but it could just as well be something like natural language processing or bioinformatics.

Track 1: Probabilistic ML

The track:

Supporting Resources:

This track is all about going deeper into the theory underlying machine learning, often viewing models in terms of joint probability distributions. Why bother? Well, treating a field like supervised learning as a useful black box that makes predictions only gets you so far; reasoning soundly about how confident you are in a model's predictions requires the theory. And as you wade into more advanced topics and Bayesian methods, you will find you simply cannot understand the material without fluently seeing how things are modeled probabilistically and reasoning about when and how you can infer the model from data: for instance, which models allow exact inference, and which require sampling methods like MCMC.
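As a small illustration of the difference between exact inference and sampling, here is a sketch that estimates the posterior mean of a coin's bias two ways. The model (a Beta prior with a Bernoulli likelihood), the function names, and the tuning values are all assumptions picked for the example; it uses a plain random-walk Metropolis sampler, which is only one of many MCMC methods.

```python
import math
import random


def exact_posterior_mean(heads, flips, a=1.0, b=1.0):
    """Conjugate update: a Beta(a, b) prior gives a Beta(a + heads, b + tails) posterior."""
    return (a + heads) / (a + b + flips)


def log_unnormalized_posterior(theta, heads, flips, a=1.0, b=1.0):
    """Log of prior times likelihood, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    tails = flips - heads
    return (a + heads - 1) * math.log(theta) + (b + tails - 1) * math.log(1 - theta)


def metropolis_posterior_mean(heads, flips, n_samples=20000, burn_in=2000,
                              step=0.2, seed=0):
    """Estimate the posterior mean with random-walk Metropolis sampling."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, heads, flips)
    total = 0.0
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, heads, flips)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            total += theta
    return total / n_samples


exact = exact_posterior_mean(heads=7, flips=10)
sampled = metropolis_posterior_mean(heads=7, flips=10)
print(f"exact: {exact:.4f}, Metropolis estimate: {sampled:.4f}")
```

Here the exact answer is available because the prior is conjugate to the likelihood. The sampler never uses that fact; it only needs the posterior up to a constant, which is why the same approach still works for models where no closed form exists.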

Having spent time perusing both MLPP and PRML during phase 1 was helpful in determining which book to choose. I ultimately decided that PRML was the better choice, as I find it does a more thorough job covering the fundamentals and is structured to progress linearly. MLPP both benefits from and is burdened by the years of additional material published since PRML came out, so it feels more like a really good and pretty thorough survey of nearly every field of ML. That it is over a thousand pages long also means it would be pretty impractical to attempt to read it cover to cover in a single 4-6 month phase. And while PRML is "out of date", everything in it feels like essential material that should be covered before moving on to the more recent material covered in MLPP.

Track 2: Computer Vision

The track:

Supporting resources:

Once I started a job helping with research related to autonomous vehicles, the most exciting practical application of ML became computer vision. Examples of core tasks include image classification (given an image, what is it?), object detection (given an image, where are the things in it, and what are they?) and pose detection (given this image of a person, how are they oriented?). I've had a chance to learn about many topics, but much of the focus at the state of the art is on various applications of deep convolutional neural networks. So I'm focusing on learning the fundamentals of convolutional neural networks rather than some of the more traditional topics within computer vision, like multi-view geometry.

Stanford's cs231n course is perfect for mastering convolutional neural networks, as it presents the theory and has assignments that require implementing the core models. The main instructor, Andrej Karpathy, is a great teacher too. I have a GitHub repo for my WIP solutions here.
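To give a flavor of what "implementing the core models" involves, here is a minimal sketch of the operation at the heart of a convolutional layer, written with NumPy. It is not a cs231n assignment solution: the function name `conv2d_naive`, the single-channel 2-D input, and the toy edge kernel are assumptions made for illustration.

```python
import numpy as np


def conv2d_naive(image, kernel, stride=1, pad=0):
    """Slide `kernel` over a 2-D `image` and return the map of dot products.

    Like most deep learning code, this computes cross-correlation:
    the kernel is not flipped before sliding.
    """
    image = np.pad(image, pad, mode="constant")  # zero-pad all four borders
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k_h, left:left + k_w]
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])  # responds to left-to-right intensity changes
response = conv2d_naive(image, edge_kernel)
print(response.shape)
```

A real layer adds input and output channels, a bias term, and a backward pass, and replaces the double loop with vectorized code, but the sliding dot product above is the core idea.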

Survey the field: read books and papers!

I'm finding that during this phase I'm capable of reading recent research (e.g. the papers on object detection listed here) and watching research talks (e.g. this one about scalable Gaussian processes). It feels similar to how perusing PRML and MLPP felt during phase 1: I don't always understand everything, but I continue to build a mental map of what lies ahead and get a sense of what is most exciting.

Future Phases and tracks

I won't know for sure until I've completed phase 2, but I think future (and lifelong!) studying of machine learning will likely consist of diving deep into particular topics in machine learning, mathematics, computer science and engineering. Some ideas: generative adversarial networks, reinforcement learning, real analysis, information theory, projective geometry, and high-performance numerical computing. I will update this section as ideas for future tracks become clear.

Prior Art

I'm far from the first person to give advice on studying machine learning; here are some resources I've found helpful along the way: