The Birthday Paradox
Valentin Cornaciu
rcor.ro
2022-12-02
The Problem
- The birthday paradox, also known as the birthday problem, states that in a random group of 23 people, there is about a 50 percent chance that two people have the same birthday.
- There are multiple reasons why this seems like a paradox. New sources will be added it to this ever growing presentation.
- Scientific American proposes the same problem, creating another paradox, by stating:
- Every one of the 253 combinations (of two persons) has the same odds, \(p=0.99726027\), of not being a match, they say.
- If you calculate (364/365)^253, you’ll find there’s a 49.952 percent chance that all 253 comparisons contain no matches, they say.
3 people only
- Why go for 23, before understanding the problem with only 3 persons?
- Define \(A_i\) the event where the \(i\) person will not share the same birthday with nobody and the negation by \(\bar{A}_i\).
- We have 3 combinations (of two persons) \(A_1A_2\), \(A_1A_3\), \(A_2A_3\).
- \(P(A_1A_2)\) denotes the probability of both \(A_1\) and \(A_2\) being TRUE.
- If \(A_1A_2\) and \(A_1A_3\) both not being a match (TRUE), then \(A_2A_3\) can be FALSE (share the same birthday).
- BUT: \(P(A_1A_2A_3|G_3)=P(A_2A_3|G_3)\), so problem solved (one combination), why asking for \(p^3\)?
Bayes settings
- We start by declaring the sample space, \(G_3\), being a group of 3 persons, and here the negation \(\bar{A}_i\) depends on \(G_3\).
- The Problem is symmetric, so finding \(P({A}_3|G_3)\) will determine \(P({A}_2|G_3)\) as well.
- \(P(\bar{A}_1\bar{A}_2\bar{A}_3|G_3)=1/365^2\) and this can be generalized for any group
- \(P(\bar{A}_1\bar{A}_2\bar{A}_3|G_3)=P(\bar{A}_1|\bar{A}_2\bar{A}_3G_3)·P(\bar{A}_2\bar{A}_3|G_3)\Rightarrow P(\bar{A}_2\bar{A}_3|G_3)=1/365\) (1)
- \(P(A_1A_2A_3|G_3)=P(A_1|A_2A_3G_3)·P(A_2A_3|G_3)=P(A_2A_3|G_3)\), by simple logic: if \(A_3\) is TRUE and \(A_2\) is TRUE, then the first person can’t share the same birthday with nobody else in \(G_3\), so the first probability is one. Also holds to be true for any \(n>2\): \[P(A_1A_2...A_n|G_n)=P(A_1|A_2\dots A_nG_n)·P(A_2...A_n|G_n)\]
Bayes calculations
- (2): \(P(\bar{A}_2|\bar{A}_3G_3)=1/2\), by knowing that the third person shares the birthday, we have a 50% chance that will mach either of the remaining persons.
- (1)+(2): \(P(\bar{A}_2\bar{A}_3|G_3)=P(\bar{A}_2|\bar{A}_3G_3)·P(\bar{A}_3|G_3)\) so we get \(P(\bar{A}_3|G_3)=2/365\) or \(P({A}_3|G_3)=363/365\)
- Actually \(P({A}_2|{A}_3G_3)=364/365\) so only by knowing that the third person is alone, will impose that the other persons will have that claimed chance of being alone (not matching the remaining person, or a \(G_2\) problem)
- Finally, \(P({A}_2{A}_3|G_3))=P({A}_2|{A}_3G_3)P({A}_3|G_3)=363·364/365^2\)
Another solution
- Even though not recommended in general, Bayes being preferred, sampling all possible different birthdays of 23 persons can pave the way.
- The number of ordered arrangements of 23 days taken from 365 unlike days is \(365!/(365-23)!\)
- Not to confuse with the number of ways of selecting 23 different days! Not looking for \(365\choose23\), because
- We consider the total number of cases \(365^{23}\), so we get \(P(A_1A_2...A_{23}|G_{23})=364·363\cdots343/365^{22}\approx0.4927\)
Mistakes explained
- \(p=P(A_1A_2)\) has different meanings for different groups of people, so claiming that \(p=364/365\) it is false in general, but TRUE in \(G_2\)
- Same mistakes can be found almost everywhere, the difference in calculations are luckily close (49.952% vs. 49.27%)
- Can not mix results from \(G_2\) to \(G_n\), for \(n>2\)
- We might guess that the intention was to calculate the probabilities of having 2 different days as a pair and impose it to all pairs. But \(({A}_2, {A}_3)\) or any other pair of events are not independent!
- Now imagine there were 366 persons in the group. It is clear that \(P(A_1A_2...A_{366}|G_{366})=0\), but \((364/365)^{66795}>0\)