{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Birthday paradox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem 1a.** During the exercise session, we showed that for $d$ equally likely dates in a year and $n$ people, the probability of a repeat date is $p(n)=1-\\frac{d^{\\underline{n}}}{d^n}$, where $d^{\\underline{n}}=d(d-1)\\cdots(d-n+1)$ is the falling factorial. We also derived a simple approximation of this probability: $p_a(n) = 1-e^{-\\frac{n(n-1)}{2d}}$. In particular, this approximation implies that for $d=365$ dates, $p(n)$ is close to $\\frac{1}{2}$ for $n=23$.\n",
"\n",
"Verify how good the approximation is. To this end, write a program that computes the exact value of $p(n)$ as well as the value of the approximation $p_a(n)$ for $d=365$ and $n=1,\\ldots,60$. Plot the graphs of the two functions."
]
},
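{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one way to approach Problem 1a, not a reference solution. The helper names `p_exact` and `p_approx` are illustrative and do not come from the exercise session; Matplotlib is assumed to be installed. The exact value is computed as a running product of the factors $(d-k)/d$, which avoids forming very large integers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"def p_exact(n, d=365):\n",
"    # Exact probability of a repeat: 1 - d(d-1)...(d-n+1) / d^n.\n",
"    # The product is accumulated factor by factor to stay in floating point.\n",
"    no_repeat = 1.0\n",
"    for k in range(n):\n",
"        no_repeat *= (d - k) / d\n",
"    return 1.0 - no_repeat\n",
"\n",
"\n",
"def p_approx(n, d=365):\n",
"    # Approximation 1 - exp(-n(n-1) / (2d)) from the problem statement.\n",
"    return 1.0 - math.exp(-n * (n - 1) / (2 * d))\n",
"\n",
"\n",
"ns = list(range(1, 61))\n",
"exact = [p_exact(n) for n in ns]\n",
"approx = [p_approx(n) for n in ns]\n",
"\n",
"# Largest gap between the two curves over the plotted range.\n",
"max_err = max(abs(e - a) for e, a in zip(exact, approx))\n",
"print('largest absolute difference for n = 1..60:', max_err)\n",
"\n",
"plt.plot(ns, exact, label='exact p(n)')\n",
"plt.plot(ns, approx, '--', label='approximation p_a(n)')\n",
"plt.axhline(0.5, color='gray', linewidth=0.5)\n",
"plt.xlabel('n')\n",
"plt.ylabel('probability of a repeat')\n",
"plt.legend()\n",
"plt.show()"
]
},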
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem 1b.**\n",
"Implement a function that draws dates uniformly at random from $1,\\ldots,d$ until a repeat occurs. The function should return the number of the step at which the first repeat occurred. Run the function $N=100\\,000$ times.\n",
" * Plot a histogram of the return values.\n",
" * Which return value seems most likely?\n",
" * What is the average of the return values?\n",
" * Is there any simple relation between these two numbers and the value $23$ in the previous problem?\n",
" * Is there any simple relation between the histogram obtained, and the plot in the previous problem?"
]
},
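{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch of the simulation in Problem 1b, under a few assumptions: the name `first_repeat_step` is made up for illustration, NumPy and Matplotlib are assumed to be available, and the seed `0` is arbitrary. `np.random.RandomState` is used because it is available in both old and recent NumPy versions. The histogram is drawn as a bar chart of relative frequencies, one bar per return value. Interpreting the output is left to the questions above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"\n",
"def first_repeat_step(d, rng):\n",
"    # Draw dates uniformly from 1..d until one repeats;\n",
"    # return the number of the step at which the first repeat occurs.\n",
"    seen = set()\n",
"    step = 0\n",
"    while True:\n",
"        step += 1\n",
"        date = rng.randint(1, d + 1)  # upper bound is exclusive\n",
"        if date in seen:\n",
"            return step\n",
"        seen.add(date)\n",
"\n",
"\n",
"rng = np.random.RandomState(0)  # arbitrary seed, for reproducibility only\n",
"N = 100000\n",
"steps = np.array([first_repeat_step(365, rng) for _ in range(N)])\n",
"\n",
"# Empirical frequency of each return value.\n",
"values, counts = np.unique(steps, return_counts=True)\n",
"print('most frequent return value:', values[np.argmax(counts)])\n",
"print('average return value:', steps.mean())\n",
"\n",
"plt.bar(values, counts / float(N))\n",
"plt.xlabel('step of the first repeat')\n",
"plt.ylabel('relative frequency')\n",
"plt.show()"
]
},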
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem 1c.** Here we investigate how our analysis using the classical approach compares to a more empirical one. The file *us_births_69_88.csv* contains the counts of births per date from 1969 to 1988. Only births for which the full date is known are included in the file.\n",
" * Read the file.\n",
" * Investigate the data. Do you notice anything strange?\n",
" * Plot a heatmap of the data. Do you notice any significant deviations from the uniform assumption? Can you explain them?\n",
" * Implement a sampling function as in Problem 1b, but this time draw dates according to the supplied data. Compare the results with those of Problem 1b."
]
}
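,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch of the sampling step in Problem 1c. It makes no assumption about the column layout of *us_births_69_88.csv*: the function takes a plain array with one non-negative count per calendar date, which would have to be built from the file after inspecting it. The name `first_repeat_step_weighted` and the `placeholder_counts` array are hypothetical; the placeholder is synthetic, not real data, and is there only so that the cell runs on its own. Dates are drawn by inverting the cumulative distribution with `np.searchsorted`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def first_repeat_step_weighted(counts, rng):\n",
"    # counts: 1-D array, one non-negative birth count per calendar date.\n",
"    probs = np.asarray(counts, dtype=float)\n",
"    cdf = np.cumsum(probs / probs.sum())\n",
"    cdf[-1] = 1.0  # guard against floating-point round-off\n",
"    # By the pigeonhole principle a repeat occurs within len(cdf) + 1 draws.\n",
"    u = rng.random_sample(len(cdf) + 1)\n",
"    dates = np.searchsorted(cdf, u, side='right')  # inverse-CDF sampling\n",
"    seen = set()\n",
"    for step, date in enumerate(dates, start=1):\n",
"        if date in seen:\n",
"            return step\n",
"        seen.add(date)\n",
"\n",
"\n",
"# Synthetic placeholder, NOT taken from the file: 366 dates, with index 59\n",
"# (standing in for 29 February) about four times rarer than the others.\n",
"# Replace it with the per-date totals computed from the CSV.\n",
"placeholder_counts = np.full(366, 1000.0)\n",
"placeholder_counts[59] = 250.0\n",
"\n",
"rng = np.random.RandomState(0)  # arbitrary seed\n",
"steps = np.array([first_repeat_step_weighted(placeholder_counts, rng)\n",
"                  for _ in range(10000)])\n",
"print('average return value (placeholder weights):', steps.mean())"
]
}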
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}