IPython notebook on NBA game net ratings and "examining distributions" in stats course

Today I wrapped up playing with the NBA game net rating data set in a Jupyter (IPython) notebook. It's not exactly setting the world on fire, but it was nice to get the basics going with Jupyter notebooks and to figure out how to make one viewable on GitHub.

I also wrapped up the "examining distributions" section of the Stanford stats class.

Concepts:

  • The standard deviation is the square root of the average squared deviation from the mean (the average squared deviation itself is the variance)
  • Like the mean, it is heavily affected by outliers and is best suited to symmetric distributions; for skewed data or data with outliers, box plots are likely a better summary
  • The standard deviation rule: for a normal distribution, about 68% of the data falls within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3
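To make the definition concrete for myself, here is a minimal sketch in Python. The numbers are made up for illustration (they are not from the NBA data set), and `population_std` is just a name I picked. It divides by n; many courses and libraries divide by n - 1 for a sample instead, so treat the divisor as an assumption.

```python
import math
import statistics


def population_std(values):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    # Average squared deviation from the mean (the variance), dividing by n.
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Made-up values for illustration only.
ratings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sd = population_std(ratings)

# Same data plus one extreme value, to see how much a single outlier moves it.
sd_with_outlier = population_std(ratings + [40.0])

print(sd, statistics.pstdev(ratings))  # the two should agree
print(sd_with_outlier)                 # noticeably larger than sd
```

The standard library's `statistics.pstdev` uses the same divide-by-n definition, and `statistics.stdev` is the n - 1 version.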

Techniques:

  • Calculate the standard deviation of a data set
  • Report what % of observations fall within 1, 2 and 3 standard deviations of the mean
  • Given the mean and standard deviation, apply the standard deviation rule to answer questions like: in what range will 95% of the observations fall? What % of observations will fall more than 1 standard deviation above the mean?
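
The techniques above could be sketched roughly like this in Python. The function names and the example numbers (a mean of 100 and a standard deviation of 15) are made up for illustration, and the rule-based answers assume the data is roughly normal.

```python
import statistics


def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # divide-by-n version; an assumption
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


def rule_range(mean, sd, k):
    """Interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


def share_above_one_sd():
    """Standard deviation rule: 68% lies within 1 sd, so 32% lies outside,
    split evenly between the two tails of a symmetric distribution."""
    return (1 - 0.68) / 2


# Made-up values for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
for k in (1, 2, 3):
    print(k, share_within(values, k))

# Range expected to hold about 95% of observations (k = 2).
print(rule_range(100, 15, 2))

# Share expected more than 1 sd above the mean, about 16%.
print(share_above_one_sd())
```

With only eight made-up values the observed shares will not match 68/95/99.7; the rule only describes data that is close to normal.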