{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Birthday paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem 1a.** During the exercise session, we showed that for $d$ equally likely dates in a year and $n$ people, the probability of a repeat date is $p(n)=1-\\frac{d^{\\underline{n}}}{d^n}$. We also derived a simple approximate formula for this probability: $p_a(n) = 1-e^{-\\frac{n(n-1)}{2d}}$. In particular, this approximation implies that for $d=365$ dates, $p(n)$ is close to $\\frac{1}{2}$ for $n=23$.\n", "\n", "Verify how good the approximation is. To this end, write a program that computes the exact value of $p(n)$ as well as the value of the approximation $p_a(n)$ for $d=365$ and $n=1,\\ldots,60$. Plot the graphs of the two functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem 1b.**\n", "Implement a function that chooses random dates out of $1,\\ldots,d$ until a repeat occurs. The function should return the number of the step in which the repeat happened. Run the function $N=100\\,000$ times.\n", " * Plot a histogram of the return values.\n", " * Which return value seems most likely?\n", " * What is the average of the return values?\n", " * Is there any simple relation between these two numbers and the value $23$ in the previous problem?\n", " * Is there any simple relation between the histogram obtained and the plot in the previous problem?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem 1c.** Here we investigate how our analysis using the classical approach compares to a more empirical one. The file *us_births_69_88.csv* contains the counts for birthdates from 1969 to 1988. Only those births for which the full date is known are included in the file.\n", " * Read the file.\n", " * Investigate the data. Do you notice anything strange?\n", " * Plot a heatmap of the data. Do you notice any significant deviations from the uniform assumption? Can you explain them?\n", " * Implement a sampling function as in 1b, but this time using the supplied data. Compare the results with those from 1b." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }