The Birthday Problem Simulated¶
Let's say we can't figure out how to formally calculate the probability that 2 people out of N have the same birthday (ignoring leap years). Not to worry: we can simulate it!
First let's write a function to simulate the number of people who have the same birthday out of N:
import random
from collections import defaultdict
def num_people_same_birthday(n):
bdays = defaultdict(int)
for i in range(n):
bdays[random.randrange(0, 365)] += 1
return len([k for (k, v) in bdays.items() if v > 1])
num_people_same_birthday(60)
That doesn't give as an emperical probability though, we need to run some trials to do that.
def prob_n_people_have_same_birthday(n, *, num_trials):
num_positive_trials = len([1 for i in range(num_trials) if num_people_same_birthday(n) > 0])
return float(num_positive_trials) / num_trials
prob_n_people_have_same_birthday(60, num_trials=1000)
Note that because the liklihood is high, we need 1000 trials to suss out that it's not 100%; here's what we get if we use 10 a couple times:
prob_n_people_have_same_birthday(60, num_trials=10)
prob_n_people_have_same_birthday(60, num_trials=10)
Just to round out the "fun", let's see a bar chart of % of time we expect people to have the same birthday with groups ranging from 5 to 50 (counting by 5).
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
group_sizes = list(range(5, 55, 5))
probs = [prob_n_people_have_same_birthday(n, num_trials=1000) for n in group_sizes]
plt.xlabel("Group Size")
plt.ylabel("Prob")
plt.title("Probability that among given group size at least 2 people have the same birthday")
plt.bar(group_sizes, probs, width=2.5)
plt.show()
And now let's see a histogram of the number of people with the same birthday among 60 people out of 1000 trials
plt.hist([num_people_same_birthday(60) for i in range(1000)])
None
The histogram shows that the probability of there being zero people is unlikely, hence the overall probability of there being at least 1 is high.