In the spring of 2016 I embarked on a learning sabbatical focused on machine learning. After a few months of full-time studying, I continue to study while on the job. Here are resources for you to follow along if you like.
- Learning Log: day-to-day reflections as I work my way through the curriculum (I'm not quite as good about updating this as I was during my full-time summer of studying)
- Problem Sets: solutions to individual math problems I've worked through.
- Notebooks and Kaggle submissions: Various IPython notebooks covering topics in probability and ML that I write to solidify my understanding and perhaps help others. Also some submissions to Kaggle competitions.
- More IPython Notebooks and Python code on GitHub: includes chapter-by-chapter notebooks for Python Machine Learning and various other materials.
How to Learn Machine Learning
Having the goal of "learning machine learning" is daunting. I've found the best way to make it tractable is to approach it in phases. Each phase should include at least one track that builds practical skills and one track focused on theoretical foundations. Additionally, it's always worth surveying the field at your current level of fluency to be on the lookout for the next phase of studies and to continue to build a mental map of interconnected topics that may be prerequisites for the techniques and applications that you find most exciting.
Each track should focus on a specific curriculum resource, and then draw on supporting resources. For instance, you might choose a specific book or MOOC you want to work through, and then draw on several related resources to cross-reference as you proceed. Sticking with a single main resource is important to staying focused; it's really easy and tempting to jump around various resources without making much progress past introductory material. It's worth spending time upfront researching curriculum options before deciding; I usually find several good resources on a topic and do some initial skimming before deciding which becomes the primary resource, bookmarking the rest as supporting.
Phase 1: Applied Machine Learning & Probability & Statistics
Phase 1, which took me about 5 months to complete full-time studying, includes two tracks:
Track 1: Probability and Statistics
The goal of this track is to get comfortable with basic statistics and exploratory data analysis, and to build a solid theoretical foundation in probability theory that will make thinking more rigorously about machine learning possible. IMHO it is insufficient to rely solely on the brief intros to probability contained at the beginning of many machine learning texts; eating your spinach early will pay off over and over again as you approach more advanced models and techniques later on. As Wasserman puts it in the preface to All of Statistics:
Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.
Slowing way down and struggling through real math problems is key here. I'd often spend hours, sometimes days on a single problem. Getting stuck = real learning. I recommend finding a study buddy and/or expert who's willing to help you get over the hump when you are truly stuck.
The track:
- Stanford's Free online Probability and Statistics Course: a nice gentle introduction before approaching the more axiomatic coverage of Wasserman. Good coverage of basic exploratory data analysis too.
- The first 5 chapters of Wasserman's All of Statistics text and problem sets from CMU's intermediate stats course (from the author) and another more introductory counterpart
Supporting resources:
- Khan Academy's videos on Probability and Statistics: I didn't watch them comprehensively, but did turn to them for a second explanation many times.
- Math Monk's YouTube playlist: Probability Primer: These felt like the missing MOOC lectures for Wasserman's book
- UAH's Random: Probability, Mathematical Statistics, Stochastic Processes: comprehensive coverage of many important topics in probability and statistics, including some illustrative web app / simulations and exercises.
- First Look at Rigorous Probability Theory: another good textbook that I bought and consulted for a second explanation on many of the topics covered in Wasserman
- Peter Norvig's Introduction to Probability IPython Notebook
- Wikipedia's outlines of Statistics and Probability
- IUPUI's ECE 302 Probabilistic Methods in Electrical Engineering: course website with nicely written up homework and exam solutions.
- Penn State's Stat 414/415 Course Materials: another good place to cross reference concepts with some examples and solutions. I found perusing the section on functions of random variables helpful and wish I'd found it sooner!
- Guy Lebanon's The Analysis of Data Vol. 1: Probability: Prof turned industry ML champ (LinkedIn, Netflix) published a free book on probability theory. A nice resource to cross reference concepts covered in the 1st half of Wasserman. Guy also has a bunch of notes on his website that are interesting to peruse.
- Seeing Theory: very nice visualizations to aid understanding of fundamental concepts in probability and statistics
What I appreciate about the All of Statistics book compared to others I've looked at, including my text from college, is that it doesn't spend too much time on counting methods (knowing how many ways one can deal a full house with a deck of cards isn't particularly relevant) and is otherwise more comprehensive on probability theory most relevant to machine learning. It is concise and somewhat dry, but it serves as a great road map of topics to study; the supporting resources and lectures can provide the additional context when necessary. Math Monk's videos are a particularly nice companion.
Track 2: Applied Machine Learning
The goal of this track is to gain practical experience applying supervised and unsupervised learning and data analysis techniques using Python, scikit-learn and Jupyter notebooks, and to learn many of the practical considerations of wrangling data with tools like pandas and NumPy. By the end you will be able to build and evaluate predictive models that work with real data, and get exposed to many theoretical models. It's nice to get your feet wet and gain powerful skills right away even if you don't yet fully understand how everything works under the hood.
The track:
- Python Machine Learning and accompanying notebooks available on github
- Kaggle competitions starting with two basic classification challenges (as recommended here) and moving onto whatever catches your eye.
Supporting resources:
- Andrew Ng's Machine Learning course: even if not worked through as a main track, worth perusing video lectures for any overlapping topics; Ng does a great job of explaining things.
- scikit-learn website
The Python Machine Learning book provides a great blend of practical concerns working with data (preprocessing, cross-validating) and exposure to models used for classification, regression and unsupervised learning, and even gets into ensemble methods. By the end of chapter 4 you should be ready to take on your first Kaggle competition, and by the time you finish, you should be able to approach many more and even tackle new analysis problems that interest you. I recommend reading the book and working through all of the examples in a Jupyter notebook, looking at the author's provided notebooks whenever necessary and/or to copy over boilerplate code.
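To make "cross-validating" concrete, here is a minimal, dependency-free sketch of k-fold cross-validation. It is not taken from the book: the helper names (`k_fold_indices`, `fit_centroids`, `cross_val_accuracy`), the nearest-centroid classifier and the synthetic two-blob data are all assumptions chosen for illustration. In practice you would likely reach for scikit-learn's `cross_val_score` instead of writing this by hand.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        centroids = fit_centroids([X[j] for j in train_idx],
                                  [y[j] for j in train_idx])
        correct = sum(predict(centroids, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Synthetic toy data (an assumption for illustration): two well-separated blobs.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

scores = cross_val_accuracy(X, y, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

The point of the exercise is that every sample is used for testing exactly once and never appears in the training set of the fold that scores it, which gives a more honest estimate than a single train/test split.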
Survey the field
These are the resources that inspired me to leap into machine learning head first, and continued to provide companionship throughout phase 1.
- Talking Machines Podcast: each episode includes an introduction to a topic and an interview with an expert in the field.
- The Master Algorithm: great layperson's overview of the field. Also see my review on Amazon.
- Becoming a data scientist podcast: interviews with people who've successfully pursued or transitioned to careers in data science, which has plenty of overlap with aspirations to apply machine learning.
I also bought Bishop's PRML and Murphy's MLPP a couple of months into my studies and wish I had sooner. They are great for perusing to plant seeds for future studies, and to begin connecting the dots from exciting advanced concepts back to probability theory.
Phase 2: Probabilistic ML and Computer Vision
The goal of phase 2 is to build on the theoretical knowledge of probability theory from phase 1 to gain a richer, probabilistic understanding of machine learning, and build on the practical skills by diving into a more advanced topic.
I think track 1 would be appropriate for everyone, and track 2 depends on what field of machine learning you are most interested in (and perhaps where you have taken a job!); in my case it is computer vision, but could just as well be something like natural language processing, or bioinformatics.
Track 1: Probabilistic ML
The track:
- Read and work through Bishop's Pattern Recognition and Machine Learning book
- Author notebooks about topics along the way (e.g. Expectation Maximization)
Supporting Resources:
- Kevin Murphy's Machine Learning: A Probabilistic Perspective: was a close second for the learning track. More comprehensive than PRML, but feels more like a great reference than a book to work through cover to cover.
- Coursera course on Probabilistic Graphical Models: the original course has been broken into 3. I've found the video lectures a great complement to the coverage of graphical models in Bishop's book.
- Notebook Lectures from University of Michigan's EECS 545 and EECS 445 Machine Learning courses (some of which I helped to develop!)
- Mathematical Monk's ML YouTube playlist
This track is all about going deeper into the theory underlying machine learning, often viewing models in terms of joint probability distributions. Why bother? Treating supervised learning as a useful black box that makes predictions only gets you so far; reasoning soundly about how confident you are in a model's predictions requires a probabilistic view. And as you wade into more advanced topics and Bayesian methods, you will find you simply cannot understand the material without fluently seeing how things are modeled probabilistically, and reasoning about when and how you can infer the model from data: for instance, which models provide for exact inference, and which require sampling methods like MCMC.
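As a small illustration of the exact-versus-sampling distinction, the sketch below infers a coin's bias two ways. With a Beta prior on a Bernoulli likelihood the posterior is available in closed form (exact inference); the same posterior can also be approximated with a random-walk Metropolis sampler, a simple MCMC method. The function names, the uniform Beta(1, 1) prior, the step size and the data (7 heads, 3 tails) are assumptions picked for illustration, not something from Bishop or Murphy.

```python
import math
import random


def exact_posterior_mean(heads, tails, a=1.0, b=1.0):
    """Conjugate update: a Beta(a, b) prior becomes Beta(a + heads, b + tails)."""
    return (a + heads) / (a + b + heads + tails)


def log_unnormalized_posterior(theta, heads, tails, a=1.0, b=1.0):
    """Log of prior times likelihood, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero probability outside (0, 1)
    return ((a + heads - 1) * math.log(theta)
            + (b + tails - 1) * math.log(1 - theta))


def metropolis_posterior_mean(heads, tails, n_samples=20000, burn_in=2000,
                              step=0.2, seed=0):
    """Estimate the posterior mean with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    theta = 0.5
    log_p = log_unnormalized_posterior(theta, heads, tails)
    total = 0.0
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_unnormalized_posterior(proposal, heads, tails)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        if i >= burn_in:
            total += theta
    return total / n_samples


exact = exact_posterior_mean(7, 3)
approx = metropolis_posterior_mean(7, 3)
print("exact posterior mean:    ", exact)
print("Metropolis approximation:", approx)
```

Here the sampler is unnecessary because the conjugate prior gives the answer directly; it earns its keep in models where no such closed form exists, and only the unnormalized posterior can be evaluated.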
Having spent time perusing both MLPP and PRML during phase 1 was helpful in determining which book to choose. I ultimately decided that PRML was a better choice as I find it does a more thorough job covering the fundamentals and structuring the book to progress linearly. MLPP both benefits from and is burdened with nearly a decade more material, so it feels more like a really good and pretty thorough survey of nearly every field of ML. That it is nearly a thousand pages long also means it would be pretty impractical to attempt to read it cover to cover in a single 4-6 month phase. And while PRML is "out of date", everything in it feels like essential material and should be covered before moving onto the more recent material covered in MLPP.
Track 2: Computer Vision
The track:
- Stanford's cs231n course materials and video lectures
Supporting resources:
- Hugo Larochelle's deep learning lectures: could be a learning track in itself. Covers conv nets, great for cross referencing.
- Understanding Higher Order Local Gradient Computation for Backpropagation in Deep Neural Networks: nice tips on reasoning about computing gradients of functions of tensors with respect to tensors from Daniel Seita, who TA'd the cs231n-like course at Berkeley
- Vector Derivatives notes: more notes on computing derivatives of tensors from Erik Learned-Miller
Once I started a job helping with research related to autonomous vehicles, the most exciting practical application of ML became computer vision. Examples of core tasks include image classification (given an image, what is it?), object detection (given an image, where are the things in it, and what are they?) and pose detection (given this image of a person, how are they oriented?). I've had a chance to learn about many topics, but much of the focus at the state of the art involves various applications of deep convolutional neural networks. So I'm focusing on learning the fundamentals of convolutional neural networks instead of some of the more foundational topics within computer vision like multi-view geometry.
Stanford's cs231n course is perfect for mastering convolutional neural networks as it presents the theory and has assignments that require implementing the core models. The main instructor, Andrej Karpathy, is a great teacher too. I have a GitHub repo for my WIP solutions here.
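For a flavor of the building block that convolutional networks stack, here is a minimal pure-Python sketch of a single-channel 2D convolution (strictly speaking a cross-correlation, which is what deep learning libraries usually compute) followed by a ReLU. It is not taken from the cs231n assignments; the function names, the hand-picked vertical-edge kernel and the toy 4x4 image are assumptions for illustration. In a real network the kernel values are learned rather than chosen.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out = []
    for i in range(img_h - k_h + 1):
        row = []
        for j in range(img_w - k_w + 1):
            # Dot product between the kernel and the image patch at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(k_h) for dj in range(k_w)))
        out.append(row)
    return out


def relu(feature_map):
    """Elementwise max(0, x), the usual nonlinearity after a conv layer."""
    return [[max(0, value) for value in row] for row in feature_map]


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, vertical_edge))
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output lights up only in the column where the image goes from dark to bright, which is the sense in which a convolutional filter acts as a local feature detector.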
Survey the field: read books and papers!
I'm finding that during this phase I'm capable of reading recent research (e.g. the papers on object detection listed here) and watching research talks (e.g. this one about scalable Gaussian processes). It feels similar to how perusing PRML and MLPP felt during phase 1: I don't always understand everything, but I continue to build a mental map of what lies ahead and get a sense of what is most exciting.
Future Phases and tracks
I won't know for sure until I've completed phase 2, but I think future (and lifelong!) studying of machine learning will likely consist of diving deep into particular topics in machine learning, mathematics, computer science and engineering. Some ideas: generative adversarial networks, reinforcement learning, real analysis, information theory, projective geometry, and high performance numerical computing. I will update this section as ideas for future tracks become clear.
Prior Art
I'm far from the first person to give advice on studying machine learning; here are some resources I've found helpful along the way:
- The Open Source Data Science Masters: another person who laid out a curriculum and worked through it.
- Quora thread: How do I learn Machine Learning?
- Recommended ML Curriculum from Sebastian Raschka, the author of the Python Machine Learning book that I'm working through
- Metacademy Roadmaps: guides to learning ML and how to learn on your own in general
- Dive into Machine Learning: a good guide if you wish to get your hands dirty ASAP
- Metromap diagram to becoming a data scientist: cool way to visualize suggested curriculum across subfields
- How to Start Learning Deep Learning: nice roundup of recent material on deep learning
- Xavier Amatriain's (VP of Eng at Quora) How should you start a career in Machine Learning? and How do I learn machine learning?
- François Chollet (author of Keras) on What advice would you give to people studying ML/DL from MOOCs (Udacity, Coursera, edx, MIT Opencourseware) or from books in their own time?
- HOW TO LEARN ADVANCED MATHEMATICS WITHOUT HEADING TO UNIVERSITY - PART 1: a London quant's curriculum list for advanced mathematics. Great list of resources for linear algebra, real analysis, foundations of mathematics and the like.
- Machine Learning for Software Engineers: very thoroughly researched curriculum and study plans