Probability intro and simulating the birthday problem

The next section of the Stanford Course: probability.

We use probability to quantify how much we expect random samples to vary. This gives us a way to draw conclusions about the population in the face of the uncertainty that is generated by the use of a random sample.

... if we find it quite unlikely that the sample percentage will be very different from the population percentage, then we have a lot of confidence that we can draw conclusions about the population based on the sample.

Our intuition can fool us

The Let's Make a Deal paradox and the birthday problem are two examples. In the birthday problem, if there are 60 people in a room, there's a > 99% chance that at least two of them share a birthday, while intuition usually tells us it's lower than that.
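To check the 60-person figure, here is a small sketch that computes the probability exactly. It assumes 365 equally likely birthdays and ignores leap years; the function name and the `days` parameter are mine, not from the course.

```python
def shared_birthday_probability(n_people, days=365):
    """Exact chance that at least two of n_people share a birthday.

    Assumption: `days` equally likely birthdays, no leap years.
    """
    if n_people > days:
        # Pigeonhole: more people than days forces a shared birthday.
        return 1.0
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (10, 23, 60):
    print(n, round(shared_birthday_probability(n), 4))
```

It works through the complement: it's easier to compute the chance that everyone has a different birthday and subtract that from 1.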

Defining probability

Probability is, roughly speaking, the likelihood that something will occur.

We can determine probability in two ways:

  • theoretically (classically): a game of chance where the rules determine the probability, e.g. when flipping a coin, the probability of heads is 1/2; when rolling a die, the probability of rolling a 1 is 1/6
  • empirically (observationally): running a number of trials and dividing the number of times something occurred by the total number of trials, which gives the relative frequency.

You can check the theoretical probability by conducting trials; as the number of trials increases, the observed relative frequency tends to converge towards the theoretical probability (e.g. after flipping a coin 1000 times, the proportion of heads will be close to 50%, but if you flip a coin only 10 times, you may very well observe something further from 50%).

The fact that the relative frequency approaches the theoretical probability as the number of trials grows is called the law of large numbers.
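A quick way to watch this happen is to simulate coin flips and compare the relative frequency of heads at different sample sizes. This is only a sketch: the function name is made up, the seed is arbitrary, and a "fair coin" here just means a pseudo-random number below 0.5.

```python
import random


def heads_frequency(n_flips, rng):
    """Relative frequency of heads in n_flips simulated fair coin flips."""
    # rng.random() is uniform on [0, 1), so "< 0.5" happens half the time.
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
for n in (10, 100, 1000, 100_000):
    print(n, heads_frequency(n, rng))
```

The small runs can land noticeably away from 0.5, while the large ones should sit very close to it.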

Simulating the birthday problem

This stuff is kind of boring, so breaking out into Python for a bit helped: here's a simulation of the birthday problem in a notebook.
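For a rough idea of what such a simulation can look like, here's a minimal sketch (not the notebook's actual code). It assumes 365 equally likely birthdays, and the function names, seed and trial count are all my own choices.

```python
import random


def has_shared_birthday(n_people, rng, days=365):
    """Draw n_people random birthdays and report whether any two match."""
    birthdays = [rng.randrange(days) for _ in range(n_people)]
    # A set drops duplicates, so a smaller set means a repeated birthday.
    return len(set(birthdays)) < n_people


def estimate_shared_birthday(n_people, n_trials, rng, days=365):
    """Empirical probability: the fraction of simulated rooms with a match."""
    hits = sum(has_shared_birthday(n_people, rng, days) for _ in range(n_trials))
    return hits / n_trials


rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable
for n in (23, 60):
    print(n, estimate_shared_birthday(n, 10_000, rng))
```

This is the empirical approach from earlier: each simulated room is one trial, and the estimate is just the relative frequency of rooms with a match.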