Monte Carlo simulation of Birthday paradox in python 3 - python

The setup of the birthday paradox is that everyone has an equal probability of having their birthday on any one of the 365 days. We start adding people to a room. What is the probability that 2 people have their birthday on the same day, as a function of the number of people in the room? The code I wrote is as follows:
import numpy as np
import matplotlib.pyplot as plt
x=[0]
y=[0]
for j in range(1000):
    if j!=0:
        freq = []
        L1 = list(np.random.randint(low = 1, high=366, size = j))
        result = list((i, L1.count(i)) for i in L1)
        for a_tuple in result:
            freq.append(a_tuple[1])
        print(freq)
        rep = j - freq.count(1)
        prob = rep/j
        y = y + [prob]
        x = x + [j]
        print(prob)
plt.plot(x,y)
plt.show()  # display the figure when run as a script
Here, in L1 = list(np.random.randint(low = 1, high=366, size = j)) I select the day on which each person has their birthday, and in result = list((i, L1.count(i)) for i in L1) I count, for each person, how many people share that birthday. The entire thing is looped over to account for an increasing number of people.
In the following for loop, I collect those counts; the number of people whose count is not 1 (that is, who share a birthday with someone) is stored in rep.
Next, I calculated the probability as the fraction of people sharing a birthday and plotted it as a function of the number of people.
However, the question asks for the probability that at least one birthday is shared. How do I calculate that? I think I have to loop this entire thing for a number of trials, but that would just give a more accurate estimate of the same quantity, with less variation. Currently, I think my program gives the fraction of people who share a birthday.
See the Birthday problem article on Wikipedia for reference.

NOTE
I assume that when n persons have been in the room, they are all thrown out of the room and then n+1 persons enter the room.
========================================
I would think of it this way;
First, set probs = [0]*366 (so that probs[365] exists). Now, say 2 persons get in the room: we write their birthdays onto a piece of paper and check if those two dates are equal. If they are, we increase probs[2] by 1 (yes, there are some indexes that we don't need, and Python is 0-indexed etc., but this keeps it simple).
Now do the same for 3 persons, for 4 persons, for 5 persons ... all the way up to 365.
Your array might look something like probs==[0,0,0,0,0,1,0,1,1,0,1,1,1,1,0,1....].
You can now start over from 2 persons (still keeping the same array as before, i.e. don't create a new one with 0's!), then 3 persons etc., and repeat the whole thing 1000 times. Your array might look like
probs==[0,0,2,0,4,1,5,2,9,12,10,17....,967,998]
If you divide that array by 1000 (elementwise) you now have your simulated probability as a function of n persons.
import numpy as np
import matplotlib.pyplot as plt
N_TOTAL_PERS = 366
N_SIM = 10000  # number of simulations
counts = np.zeros(N_TOTAL_PERS)
for _ in range(N_SIM):
    for n in range(2, N_TOTAL_PERS):
        b_days = np.random.randint(1, 366, size=n)  # get each person's birthday
        counts[n] += len(b_days) != len(set(b_days))  # increment if some birthdays are equal
total_probs = counts / N_SIM  # convert to probabilities
print(total_probs[70])  # probability when 70 persons are together (0.9988 in one run)
plt.plot(range(N_TOTAL_PERS), total_probs)
plt.show()
which generates a plot of the estimated probability against the number of people in the room.
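The nested loop above calls np.random.randint once per room, which gets slow for N_SIM = 10000. A vectorized sketch of the same estimate, assuming NumPy's Generator API (np.random.default_rng) is available: it draws every room for one value of n as a single 2-D array and checks for equal neighbours after sorting. The function name simulate_collision_prob is hypothetical.

```python
import numpy as np


def simulate_collision_prob(n_people, n_sim, rng=None):
    """Estimate P(at least two of n_people share a birthday) from n_sim random rooms."""
    rng = np.random.default_rng() if rng is None else rng
    # One row per simulated room, one column per person; birthdays are days 1..365.
    b_days = rng.integers(1, 366, size=(n_sim, n_people))
    # Sorting each row puts equal birthdays next to each other.
    b_days.sort(axis=1)
    # A room has a shared birthday if any two neighbours in its sorted row are equal.
    has_shared = (np.diff(b_days, axis=1) == 0).any(axis=1)
    # The mean of a boolean array is the fraction of rooms with a shared birthday.
    return has_shared.mean()


print(simulate_collision_prob(23, 10_000, np.random.default_rng(0)))
```

Sorting and comparing neighbours replaces the len(set(...)) test, so the per-room Python loop disappears; only the loop over n remains.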

You should run multiple experiments for each number of people in the room. Note that for N_people > 365, the probability should come out equal to 1, since with more people than days at least two of them must share a birthday.
Refactoring your code, and changing the logic a bit, I came up with the following:
import numpy as np
import matplotlib.pyplot as plt
def random_birthdays(n_people):
    return list(np.random.randint(low=1, high=366, size=n_people))
def check_random_room(n_people):
    """
    Generates a random sample of `n_people` and checks if at least two of them
    have the same birthday
    """
    birthdays = random_birthdays(n_people)
    return len(birthdays) != len(set(birthdays))
def estimate_probability(n_people, n_experiments):
    results = [check_random_room(n_people) for _ in range(n_experiments)]
    return sum(results)/n_experiments
N_EXPERIMENTS = 1000
x = list(range(1, 400))
y = [estimate_probability(x_i, N_EXPERIMENTS) for x_i in x]
plt.plot(x, y)
plt.show()
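To check the simulated curve, the exact probability can be computed directly: the chance that n people all have distinct birthdays is the product of (365 - k)/365 for k = 0, ..., n - 1, and the answer is one minus that. A minimal sketch assuming uniform birthdays over 365 days; exact_shared_birthday_prob is a hypothetical name.

```python
def exact_shared_birthday_prob(n_people, days=365):
    """Exact P(at least two of n_people share a birthday), birthdays uniform over `days`."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The first k people have distinct birthdays; person k+1 must avoid all k of them.
        p_all_distinct *= (days - k) / days
    # For n_people > days one factor is (days - days) / days = 0, so the result is exactly 1.
    return 1 - p_all_distinct


x = list(range(1, 400))
y_exact = [exact_shared_birthday_prob(x_i) for x_i in x]
# With matplotlib, plt.plot(x, y_exact) overlays this curve on the simulated one.
```

This also confirms the note above: every room with more than 365 people gets probability exactly 1.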

Related

Numpy arrays incorrectly have identical values?

I have a small program where I am playing with creating an evolutionary algorithm related to disease spread. I have run into an issue that has driven me slightly mad trying to figure out the problem.
I have two numpy arrays: "infections", where each element is a binary flag for whether that individual has been exposed, and "current_infections", which holds only ongoing infections and is supposed to be incremented each day.
For Example:
infections = [0,0,0,1,1]
current_infections = [0,0,0,15,0]
This would represent five individuals: the individual at index 3 has had the disease for 15 days, and the individual at index 4 has had it long enough that they have recovered and no longer currently have it.
import random as rd
import numpy as np
infections = []
current_infections = []
responsible_infections = []
spread_rate = 0.1
contagious_period = 20
initial_node_infections = 2
contacts_per_day = 1  # not defined in the question; assumed value so the code runs
# this is an array of adjacency matrices for nodes in a network
# (hypothetical placeholder: a random 50-node adjacency matrix, since the question does not show it)
current_network = (np.random.random((50, 50)) < 0.1).astype(int)
#initial infections.
def initialize_infections():
    global infections, current_infections, responsible_infections
    responsible_infections = np.zeros(current_network.shape[0])
    infections = np.zeros(current_network.shape[0])
    for each in rd.sample(range(len(current_network)), k=initial_node_infections):
        infections[each] = 1
    current_infections = infections[:]
# runs a day in simulation.
# returns 1 if there are still ongoing infections at the end of day, 0 if not
def progress_day():
    global current_infections
    print(np.sum(current_infections), np.sum(infections)) #should not be equivalent, yet they are
    for i in range(len(current_infections)):
        if current_infections[i] >= 1 and current_infections[i]<contagious_period:
            current_infections[i]+=1
        elif current_infections[i]>=contagious_period:
            #patient recovered
            current_infections[i] = 0
    for i in range(contacts_per_day):
        for j in range(len(current_infections)):
            if current_infections[j] >= 1:
                spread_infection(current_network[j], j)
    if not np.sum(current_infections):
        return 0
    else:
        return 1
#given infected node it calculates spread of disease to adjacent nodes.
def spread_infection(person, rp):
    global infections, current_infections, responsible_infections
    for x in range(len(person)):
        if person[x] == 1 and infections[x] == 0 and rd.random()<=spread_rate:
            current_infections[x] = 1
            infections[x] = 1
            responsible_infections[rp]+=1 #infections a given person is responsible for.
def main():
    global current_infections, infections
    initialize_infections()
    day = 0
    while day<100:
        if not progress_day():
            break
        day+=1
main()
For some reason, changes made to an element in current_infections are also being made to that element in infections, so they are both incrementing. Am I doing something incorrectly with numpy such that they are somehow the same array?
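current_infections = infections[:] makes current_infections a view over the elements in infections: for a NumPy array, basic slicing returns a view that shares memory with the original (unlike slicing a Python list, which makes a copy). Use current_infections = infections.copy().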
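A small demonstration of the difference, using np.shares_memory to check whether two arrays use the same buffer. The array names mirror the question; the values are made up for illustration.

```python
import numpy as np

infections = np.zeros(5)
infections[3] = 1

view = infections[:]        # basic slicing of a NumPy array: a view on the same memory
copy = infections.copy()    # an independent array with its own buffer

view[3] += 1                # also changes infections[3]
copy[4] = 7                 # leaves infections untouched

print(infections)           # [0. 0. 0. 2. 0.]
print(np.shares_memory(infections, view), np.shares_memory(infections, copy))  # True False

# For comparison, [:] on a plain Python list makes a shallow copy.
numbers = [0, 1]
numbers_slice = numbers[:]
numbers_slice[0] = 5
print(numbers)              # [0, 1]
```

This is why the [:] idiom that works for copying lists silently shares state when the object is a NumPy array.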

Python programming probability marble out of a bag coding

Hey, I'm new to programming, but I can't seem to code probability questions. For example, how would I code this?
A box contains 12 transistors of type A and 18 of type B. One transistor is taken out at random and returned. This process is repeated. Determine the probability that the first one chosen is type A and the second is type B. Thanks!
This is my first try.
from scipy import stats as st
import numpy as np
import random
total=30
totalA=12
totalB=18
def transistor():
    return random.choice(["A", "B"])  # random.choice takes a single sequence
random.seed(0)
for _ in range(30):
    try1=transistor()
    try2=transistor()
    if try1=="A":
        prob1=totalA/total
    else:
        prob1=totalB/total
    if try2=="A":
        prob2=totalA/total
    else:
        prob2=totalB/total
    if try1=="A" and try2=="A":
        prob=2*totalA/total
If you're trying to run a simulation, this code will give you a probability from 10000 trials. It will generate a different result every time. The more trials, the more accurate it is. The correct, theoretical answer is 0.24.
import random
trials = 10000  # total number of trials
totalA = 12  # total number of A transistors
totalB = 18  # total number of B transistors
successes = 0  # how many trials were successful
choicelist = list("A" * totalA + "B" * totalB)  # list containing transistors in the correct proportion
def transistor():
    return random.choice(choicelist)  # pick a random transistor from the list
for i in range(trials):
    try1 = transistor()
    try2 = transistor()
    if try1 == "A" and try2 == "B":  # if the first pull is type A and the second is type B...
        successes += 1  # ...then it's a success
print(successes / trials)  # print the proportion of successes to trials
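The theoretical 0.24 follows because the transistor is returned, so the two draws are independent and their probabilities multiply. A short check using the standard library's fractions module to keep the arithmetic exact:

```python
from fractions import Fraction

total_a, total_b = 12, 18
total = total_a + total_b
# With replacement the two draws are independent, so the probabilities multiply.
p_first_a = Fraction(total_a, total)   # 12/30 = 2/5
p_second_b = Fraction(total_b, total)  # 18/30 = 3/5
p_a_then_b = p_first_a * p_second_b
print(p_a_then_b, float(p_a_then_b))   # 6/25 0.24
```

The simulated proportion should land near this value, and closer as the number of trials grows.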

python - generating a non repeating random pairs of numbers

I'm trying to generate random pairs of numbers to place objects at random locations in a grid. I've tried looking for answers but haven't found one that works for what I need. I don't want a pair to repeat, but objects can still be placed in the same row or column. Also, the size of the grid and the number of objects are input by the user.
def create_bombs(self):
    bombs_flaged = 0
    #creates the bombs
    for i in range(self.bomb_num):
        bomb_row = randint(0,self.board_size - 1)
        bomb_col = randint(1,self.board_size)
        self.bomb_list.append(Bomb(bomb_row, bomb_col, self, bombs_flaged))
One way to think about this is: there are X*Y possible positions (specifically board_size * board_size, in your case), and you want to pick N (self.bomb_num) random samples from those positions, without repetition.
The sample function in the random module does this perfectly:
possible_coordinates = [(x, y) for x in range(X) for y in range(1, Y+1)]
bomb_coordinates = random.sample(possible_coordinates, N)
Creating that list is a little wasteful, but given that board_size is probably something small, like 30, a temporary list of 900 elements is not worth worrying about.
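If you would rather not build the list of coordinate tuples at all, random.sample also accepts a range, so you can sample distinct flat cell indices and decode each one with divmod. A sketch under the same row/column ranges as the question's randint calls (rows 0 to board_size - 1, columns 1 to board_size); random_unique_cells is a hypothetical helper name.

```python
import random


def random_unique_cells(board_size, n_bombs, rng=random):
    """Pick n_bombs distinct (row, col) cells on a board_size x board_size grid.

    Rows run 0..board_size-1 and columns 1..board_size, matching the
    randint ranges used in the question.
    """
    # random.sample draws distinct elements and accepts a range directly.
    flat = rng.sample(range(board_size * board_size), n_bombs)
    # divmod turns a flat index into (row, 0-based column); shift the column by 1.
    return [(row, col + 1) for row, col in (divmod(i, board_size) for i in flat)]


cells = random_unique_cells(30, 100)
```

Each cell would then be passed to Bomb(row, col, ...) as in the question; like the list version, random.sample raises ValueError if more bombs are requested than there are cells.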
Python's sets are meant to do just what you need: membership testing with them is very fast (constant time on average):
def create_bombs(self):
    bombs_flagged = 0
    existing_bomb_coords = set()  # All bomb coordinates so far
    # Creates the bombs
    while len(existing_bomb_coords) < self.bomb_num:  # Looping as much as needed
        bomb_row = randint(0, self.board_size-1)
        bomb_col = randint(1, self.board_size)
        bomb_coords = (bomb_row, bomb_col)
        if bomb_coords not in existing_bomb_coords:  # Very fast test
            self.bomb_list.append(Bomb(bomb_row, bomb_col, self, bombs_flagged))
            existing_bomb_coords.add(bomb_coords)  # New bomb registration
Now, I like @abarnert's answer too: it is a bit wasteful, as he indicates, but it is very legible.

Python Set Birthday

So I am trying to make a program that computes the probability of a bunch of people in a room having the same birthday... I can't figure out how to create the function. Here is what I have so far:
def birthday():
    mySet = set()
    x = 1
    for item in mySet:
        if item in mySet:
            return x
        else:
            mySet().append() # don't know what to do here.
Edit:
Alright, so what I am trying to accomplish is a function that uses a set to store birthdays as numbers 1 through 365. For example, if you randomly pick a room with 30 people in it, they may not share a birthday. However, if there are twins in the room, you only need 2 people in the room for a shared birthday. Eventually I want a parameter that runs this function several times and averages the results. Unfortunately, I can't figure out how to write this. I want x to count how many people are in the room, and when there is a match, the loop stops. I also don't know what to append to.
Is there a reason why you're trying to simulate this rather than using the closed-form solution to this problem? There's a pretty decent approximation that's fast and easy to code:
import math
def closed_form_approx_birthday_collision_probability(num_people):
    return 1 - math.exp(-num_people * (num_people - 1) / (2 * 365.0))
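Where that approximation comes from (a standard derivation, not part of the original answer): the probability that n people all have distinct birthdays is a product of n factors, which for n at most 365 equals the factorial expression used in the exact solution.

$$
P(\text{no shared}) = \prod_{k=0}^{n-1}\left(1 - \frac{k}{365}\right) = \frac{365!}{365^{n}\,(365-n)!}
$$

Using $1 - x \approx e^{-x}$ for small $x$:

$$
P(\text{no shared}) \approx \prod_{k=0}^{n-1} e^{-k/365} = \exp\!\left(-\frac{1}{365}\sum_{k=0}^{n-1} k\right) = \exp\!\left(-\frac{n(n-1)}{2 \cdot 365}\right)
$$

so $P(\text{shared}) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 365}\right)$, which is what the function above returns. The approximation is best when $k/365$ stays small, which is why it slightly underestimates the exact value at $n = 20$.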
You could also implement a very good "exact" solution (in quotes because some precision is lost when converting to float):
import operator
import functools
import fractions
def slow_fac(n):
    return functools.reduce(operator.mul, range(2, n+1), 1)
def closed_form_exact_birthday_collision_probability(num_people):
    p_no_collision = fractions.Fraction(slow_fac(365), 365 ** num_people * slow_fac(365 - num_people))
    return float(1 - p_no_collision)
To do a simulation, you'd do something like this. I'm using a list rather than a set because the number of possibilities is small and this avoids some extra work that using a set would do:
import random
def birthday_collision_simulate_once(num_people):
    s = [False] * 365
    for _ in range(num_people):
        birthday = random.randint(0, 364)
        if s[birthday]:
            return True
        else:
            s[birthday] = True
    return False
def birthday_collision_simulation(num_people, runs):
    collisions = 0
    for _ in range(runs):
        if birthday_collision_simulate_once(num_people):
            collisions += 1
    return collisions / float(runs)
The numbers I get from the simulation and the closed form solution look similar to the table at http://en.wikipedia.org/wiki/Birthday_problem
>>> closed_form_approx_birthday_collision_probability(20)
0.40580512747932584
>>> closed_form_exact_birthday_collision_probability(20)
0.41143838358058
>>> birthday_collision_simulation(20, 100000)
0.41108
Although the simulation with that many runs comes closer to the actual 41.1% than the approximation does, it is much slower to calculate. I'd choose one of the closed-form solutions, depending on how accurate it needs to be.
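The edit to the question asks for a slightly different experiment: keep a set of birthdays, let a counter x track how many people are in the room, stop at the first match, and average over several runs. A minimal sketch of that, assuming uniform birthdays over days 1 to 365; the function names are hypothetical. Note that sets use add(), not append().

```python
import random


def people_until_shared_birthday(rng=random):
    """Add people one at a time until two share a birthday; return the room size."""
    seen = set()             # birthdays (days 1..365) of everyone in the room so far
    x = 0                    # number of people in the room
    while True:
        x += 1
        day = rng.randint(1, 365)
        if day in seen:      # match found, so the loop stops
            return x
        seen.add(day)        # sets use add(), not append()


def average_room_size(trials, rng=random):
    """Repeat the experiment `trials` times and average the room sizes."""
    return sum(people_until_shared_birthday(rng) for _ in range(trials)) / trials


print(average_room_size(10_000))
```

The loop always terminates, because the 366th person must repeat one of the 365 days. The average is the expected room size at the first match, which is a different quantity from the probability for a fixed number of people computed above.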

how do I use modular expression/ working with large integers

I want to make a program that calculates the population after x years, where the population in 2002 is 6.2 billion people and increases by 1.3% each year.
The formula I will use is
population = ((1.013)**x) * 6.2B
How do I make 6.2B easier to work with?
Here is your code. Read and learn well. This is probably a problem that you could have solved with Google.
import math
def calculate_population(years_since_2002): #the original calculation
    population_2002 = 6.2*10**9
    final_population = int(((1.013)**years_since_2002)*population_2002)
    return final_population
def pretty_print(num,trunc=0):
    multiplier = int(math.log10(num)) #finds the power of 10
    remainder = float(num)/(10**multiplier) #finds the float after
    str_remainder = str(remainder)
    if trunc != 0:
        str_remainder = str_remainder[:trunc+1] #truncates the string (a float cannot be sliced)
    return str_remainder+'e'+str(multiplier) #can also be print
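A shorter variant using only built-in features: Python accepts 6.2e9 as a float literal for 6.2 billion, and the format specification mini-language can print a number in scientific notation or with thousands separators, so no custom formatting helper is needed. population_after_years is a hypothetical name.

```python
def population_after_years(years_since_2002):
    """Population after the given number of years at 1.3% annual growth."""
    population_2002 = 6.2e9          # scientific-notation float literal for 6.2 billion
    return population_2002 * 1.013 ** years_since_2002


# ".2e" prints a float in scientific notation with 2 decimals.
print(f"{population_after_years(0):.2e}")    # 6.20e+09
# ",.0f" rounds to a whole number and inserts thousands separators.
print(f"{population_after_years(10):,.0f}")
```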
