Birthday paradox in Python: incorrect probability output

I am having issues programming the birthday paradox in Python. The birthday paradox says that if there are 23 people in a class, the probability that at least two of them share a birthday is about 50%.
I have attempted to code this paradox in Python, but it keeps coming back with a probability closer to 25%. I am very new to Python, so no doubt there is a simple solution to this problem. Here is my code:
import random
def random_birthdays():
    bdays = []
    bdays = [random.randint(1, 365) for i in range(23)]
    bdays.sort()
    for x in bdays:
        while x < len(bdays)-1:
            if bdays[x] == bdays[x+1]:
                print(bdays[x])
                return True
            x+=1
    return False
count = 0
for i in range (1000):
    if random_birthdays() == True:
        count = count + 1
print('In a sample of 1000 classes each with 23 pupils, there were', count, 'classes with individuals with the same birthday')

Besides that, your function could be implemented much more simply, like this:
import random
def random_birthdays(pupils):
    bdays = [random.randint(1, 365) for _ in range(pupils)]
    return pupils > len(set(bdays))
This eliminates so many sources of error.
This can be called as @Zefick has indicated:
count = sum(random_birthdays(23) for _ in range(1000))
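Putting those two pieces together, a complete estimate might look like the sketch below. The helper name estimate_shared_birthday, the seed value and the trial count of 10,000 are arbitrary choices for illustration; with more trials the estimate settles closer to the exact value of about 50.7%.

```python
import random

def random_birthdays(pupils):
    # True if at least two pupils share a birthday (365 days, leap years ignored)
    bdays = [random.randint(1, 365) for _ in range(pupils)]
    return pupils > len(set(bdays))

def estimate_shared_birthday(pupils, trials):
    # Hypothetical helper: fraction of simulated classes with a shared birthday
    count = sum(random_birthdays(pupils) for _ in range(trials))
    return count / trials

random.seed(1)  # arbitrary seed, only so repeated runs give the same estimate
print(estimate_shared_birthday(23, 10_000))
```

Summing booleans works because True counts as 1 and False as 0, so the sum is the number of classes with a match.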

The error is in this line:
for x in bdays:
should be
for x in range(len(bdays)):
because you need to iterate over the indices of the birthdays, not over the birthdays themselves.
And one more optimization:
count = 0
for i in range (1000):
    if random_birthdays() == True:
        count = count + 1
can be replaced by
count = sum(random_birthdays() for _ in range(1000))
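For reference, here is one way the original function could look with that fix applied. Because the list is sorted, equal birthdays end up next to each other, so a single pass over the indices is enough and the inner while loop is no longer needed. The helper name has_adjacent_duplicate is hypothetical; it is split out only so the check can be tried on a fixed list.

```python
import random

def has_adjacent_duplicate(bdays):
    # Hypothetical helper: sorting puts equal birthdays next to each other,
    # so comparing each element with its neighbour finds any repeat.
    bdays = sorted(bdays)
    for x in range(len(bdays) - 1):  # iterate over indices, not values
        if bdays[x] == bdays[x + 1]:
            return True
    return False

def random_birthdays():
    bdays = [random.randint(1, 365) for _ in range(23)]
    return has_adjacent_duplicate(bdays)

count = sum(random_birthdays() for _ in range(1000))
print('In a sample of 1000 classes each with 23 pupils, there were', count,
      'classes with individuals with the same birthday')
```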

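Using the standard approximation n ≈ √(2 · 365 · ln(1/(1 − p))) for the number of people n needed to reach a shared-birthday probability p, the 25% you are seeing corresponds to a class of only about 15 people:
import math
def find(p):
    return math.ceil(math.sqrt(2*365*math.log(1/(1-p))))
print(find(0.25))  # prints 15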
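The formula above is an approximation. As a cross-check, the exact probability that at least two of n people share a birthday is 1 minus the probability that all n birthdays differ, that is 1 − (365/365)·(364/365)···((365 − n + 1)/365). A short sketch, assuming 365 equally likely birthdays (exact_shared_probability is a name chosen here for illustration):

```python
import math

def exact_shared_probability(n, days=365):
    # Probability that all n birthdays are different, built up one person at a time
    all_different = 1.0
    for k in range(n):
        all_different *= (days - k) / days
    return 1 - all_different

def find(p):
    # Approximate number of people needed for probability p (formula above)
    return math.ceil(math.sqrt(2 * 365 * math.log(1 / (1 - p))))

print(round(exact_shared_probability(23), 4))  # 0.5073
print(find(0.5))  # 23
```

Both agree that 23 is the smallest class where the chance of a shared birthday passes 50%.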

Here's how I wrote it.
# Check the probability of a shared birthday in a class of 23 students (the birthday paradox)
import random as r
def check_date(students):
    date = []
    count = 0
    for i in range(students):  # generate a random birthday for each of the n students
        date += [r.randint(1, 365)]  # the full list of birthdays is built here
    for day in date:  # check whether the date repeats anywhere else
        if date.count(day) >= 2:  # count() is simple and easy
            count += 1
    return count  # number of students who share a birthday with someone else
def simulations(s, students):
    result = []  # keeps the result of every simulation
    simulation_match = 0
    for i in range(s):
        matches = check_date(students)  # run one simulation and reuse its result
        result += [matches]
        if matches > 1:  # at least 2 students share a birthday in this simulation
            simulation_match += 1
    return simulation_match, s, int(simulation_match / s * 100), '%'
print(simulations(1000, 23))  # 1000 simulations with 23 students each
Output: (494, 1000, 49, '%'). The percentage varies from run to run because the birthdays are generated randomly.

Related

How do I print only 3 values per line with right alignment?

Below is my current code. However, I want to print only 3 values per line, right-aligned, with 5 spaces between columns. I am trying to get my output to match the image below.
def formats():
    import random
    lst = []
    for i in range(100):
        lst.append(random.uniform(0, 1000)) #Get a random number btw 0 and 1000
    num = eval(input('Enter number of values to retrieve: '))
    for x in range(num+1):
        print('${:>5,.2f}'.format(lst[x]), end=' ')
Well, I tried this, and it works fine:
for x in range(num):
    print('${:>6,.2f}'.format(lst[x]), end=' ' * 5)  # width 6 fits values up to 999.99; 5 spaces between columns
    if (x+1)%3==0:
        print()  # start a new line after every third value
I'm not sure why you were using range(num+1): if you want 19 values, that returns 20.
Not the most elegant solution, but it works, including the justification:
import random
lst = []
for i in range(100):
    lst.append(random.uniform(0, 1000))  # get a random number between 0 and 1000
num = int(input('Enter number of values to retrieve: '))
groups_of_3 = [lst[0:num][i:i+3] for i in range(0, num, 3)]
for group in groups_of_3:
    for value in group:
        justified_num = f'{value:.2f}'.rjust(6)  # round to 2 decimals, right-align in 6 characters
        print(f'${justified_num}', end=' ')
    print()
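If the dollar sign should sit right next to each number while the whole column stays right-aligned, one option is to build each cell as a string first and pad it afterwards. A minimal sketch; format_rows and its width of 10 characters are assumptions chosen here, and num is hard-coded instead of read with input():

```python
import random

def format_rows(values, per_line=3, gap=5, width=10):
    # Hypothetical helper: one string per output line, `per_line` values each,
    # every cell right-aligned in `width` characters, cells separated by `gap` spaces.
    lines = []
    for start in range(0, len(values), per_line):
        row = values[start:start + per_line]
        cells = []
        for v in row:
            money = '$' + format(v, ',.2f')   # e.g. '$1,234.50'
            cells.append(money.rjust(width))  # right-align in a fixed-width column
        lines.append((' ' * gap).join(cells))
    return lines

lst = [random.uniform(0, 1000) for _ in range(100)]
num = 7  # stands in for the value the user typed
for line in format_rows(lst[:num]):
    print(line)
```

Returning the lines instead of printing inside the loop makes it easy to check the layout before printing it.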

Print Each Combination in Shell

If you have a range of numbers from 1 to 49 and choose 6 of them, there are nearly 14 million combinations. Using my current code (below), I have only 85,805 combinations remaining. I want to print all those 85,805 combinations to the Python shell, showing every combination rather than just the number of combinations that I currently see. Is that possible? Here's my code:
import functools
_MIN_SUM = 152
_MAX_SUM = 152
_MIN_NUM = 1
_MAX_NUM = 49
_NUM_CHOICES = 6
_MIN_ODDS = 2
_MAX_ODDS = 4
@functools.lru_cache(maxsize=None)
def f(n, l, s = 0, odds = 0):
    if s > _MAX_SUM or odds > _MAX_ODDS:
        return 0
    if n == 0 :
        return int(s >= _MIN_SUM and odds >= _MIN_ODDS)
    return sum(f(n-1, i+2, s+i, odds + i % 2) for i in range(l, _MAX_NUM+1))
result = f(_NUM_CHOICES, _MIN_NUM)
print('Number of choices = {}'.format(result))
Thank you!
Printing to the console is rather slow. You might want to print it to a file instead.
print("Hello World")
# vs
with open("file.txt", "w") as f:
    print("Hello World", file=f)
Try using for loops and recursion together:
def combinations(base, numbers, placesRemaining):
    out = []
    for i in numbers:
        if placesRemaining <= 1:
            out.append(base*i)
        else:
            out.extend(combinations(base*i, numbers, placesRemaining-1))
    return out
places = 6
numbers = range(1, 50)
answer = combinations(1, numbers, places)
That solution is not likely to run into the recursion limit, as the maximum recursion depth is equal to places. I did not run this on the full problem, but it performed well on smaller ones. Altering the starting base will multiply every number you calculate by that number, so I do not recommend it.
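Another option is to keep the pruning rules of the question's f() but turn it into a generator that yields each combination as a tuple instead of counting it. A sketch under that assumption; combos is a hypothetical name, and the limits are keyword arguments whose defaults copy the question's constants so the function can also be tried on a small problem:

```python
def combos(n, start, s=0, odds=0, prefix=(), *,
           min_sum=152, max_sum=152, max_num=49, min_odds=2, max_odds=4):
    # Same pruning as f(): give up once the sum or the number of odds is too big
    if s > max_sum or odds > max_odds:
        return
    if n == 0:
        if s >= min_sum and odds >= min_odds:
            yield prefix  # one complete combination
        return
    for i in range(start, max_num + 1):
        # i + 2 as the next start, like f(): no two consecutive numbers
        yield from combos(n - 1, i + 2, s + i, odds + i % 2, prefix + (i,),
                          min_sum=min_sum, max_sum=max_sum, max_num=max_num,
                          min_odds=min_odds, max_odds=max_odds)

# Small demo: 3 numbers from 1..9 summing to exactly 9
for combo in combos(3, 1, min_sum=9, max_sum=9, max_num=9, min_odds=0, max_odds=3):
    print(combo)  # prints (1, 3, 5)
```

For the full problem, call combos(6, 1) with the default limits and write each tuple to a file, e.g. with print(*combo, file=f) inside a with open(...) block as suggested above. Unlike the counting version, a generator that yields the combinations themselves cannot benefit from lru_cache, so expect it to be slower than f().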

Algorithm is too slow; do I have higher odds of winning the lotto if I pick the same 4 numbers?

Here is the problem statement.
There is a lottery where 4 random numbers are picked every day.
I want to find out whether I have better odds of winning the lottery if I pick the same numbers every day or if I pick 4 random numbers each time (let's say over 1,000,000 trials).
I have added the solution I wrote to solve this problem, but it is very slow: anything over 3000 trials takes a very long time.
I have added comments to my code to show my reasoning.
Edit: I need help finding the bottleneck.
Edit 2: The code is complete; sorry, I had renamed a few variables.
#lottery is 4 numbers
#lottery runs 365 days a year
#i pick the same number every day, what are my odds of winning/how many times will i win
#what are my odds of winning picking 4 random numbers
import random
my_pick = [4,4,4,7]
lotto_nums = list(range(0,9))
iterations = 3000
#function to pick 4 numbers at random
def rand_func ():
    rand_pick = [random.choice(lotto_nums) for _ in range(4)]
    return rand_pick
#pick 4 random numbers X amount of times
random_pick = [rand_func() for _ in range(iterations)]
#pick 4 random numbers for the lottery itself
def lotto ():
    lotto_pick = [random.choice(lotto_nums) for _ in range(4)]
    return lotto_pick
#check how many times I picked the correct lotto numbers v how many times i randomly generated numbers that would have won me the lottery
def lotto_picks ():
    lotto_yr =[]
    for _ in range(iterations):
        lotto_yr.append(lotto())
    my_count = 0
    random_count = 0
    for lotto_one in lotto_yr:
        if my_pick == lotto_one:
            my_count = my_count +1
        elif random_pick == lotto_one:
            random_count = random_count +1
    print('I have {} % chance of winning if pick the same numbers versus {} % if i picked random numbers. The lotto ran {} times'.format(((my_count/iterations)*100), ((random_count/iterations)*100), iterations))
lotto_picks()
Your code is slow because in each iteration you recalculate all the simulations from scratch. In reality you only need to check whether you won the lottery once per simulation, so lotto_picks() could look something like this:
def lotto_picks ():
    lotto_yr = []
    my_count = 0
    random_count = 0
    for i in range(iterations):
        new_numbers = lotto()
        lotto_yr.append(new_numbers) # You can still save them for later analysis
        if my_pick == new_numbers:
            my_count = my_count +1
        if random_pick[i] == new_numbers: # Changed from elif to if; compare with the i-th random pick, not the whole list
            random_count = random_count +1
    print('I have {} % chance of winning if pick the same numbers versus {} % if i picked random numbers. The lotto ran {} times'.format(((my_count/iterations)*100), ((random_count/iterations)*100), iterations))
This makes your program run in linear time, O(n), whereas before it ran in quadratic time, O(n^2).
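The same single-pass idea can be packaged so that the whole comparison fits in one function. A sketch, assuming a fresh random ticket is drawn for each day inside the loop (equivalent to precomputing random_pick, but without the extra list); simulate is a hypothetical name:

```python
import random

def simulate(iterations, my_pick, lotto_nums):
    # Hypothetical helper: one pass over the draws, so the work is O(iterations)
    my_count = 0
    random_count = 0
    for _ in range(iterations):
        draw = [random.choice(lotto_nums) for _ in range(4)]         # today's lottery numbers
        random_pick = [random.choice(lotto_nums) for _ in range(4)]  # a fresh random ticket
        if draw == my_pick:
            my_count += 1
        if draw == random_pick:
            random_count += 1
    return my_count, random_count

mine, rand = simulate(100_000, [4, 4, 4, 7], list(range(0, 9)))
print(f'Same numbers: {mine} wins, random numbers: {rand} wins')
```

Both strategies hit any particular 4-number sequence with the same probability, so over many runs the two counts should come out roughly equal.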
Your problem is with the nested for loop.
The running time of the outer for loop on its own is O(n) (linear).
On the k-th outer iteration, your nested loop runs k times:
for i in range(iterations):
    for lotto_one in lotto_yr:  # lotto_yr grows by one draw per outer iteration
This means that in total your nested loop runs 4,501,500 times (the sum of the numbers from 1 to 3000, i.e. 3000 × 3001 / 2). Add the 3000 iterations of the outer loop and you get 4,504,500 "real" iterations in total. That is n(n + 1)/2 + n, which grows as O(n^2), i.e. quadratic running time. That's your bottleneck.
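A quick way to check that arithmetic, and to see the quadratic growth, is to count the inner-loop iterations directly. This sketch models the inner loop as running k times on the k-th outer iteration, matching the sum 1 + 2 + ... + 3000 above; count_inner_iterations is a name chosen here for illustration:

```python
def count_inner_iterations(n):
    # Inner loop runs i + 1 times on outer iteration i (so 1, 2, ..., n times)
    total = 0
    for i in range(n):
        for _ in range(i + 1):
            total += 1
    return total

for n in (1000, 2000, 3000):
    inner = count_inner_iterations(n)
    print(n, inner, inner + n)  # inner-loop count, then plus the n outer iterations
```

Doubling n roughly quadruples the count, which is the signature of O(n^2) work.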

How can I get the average of a range of inputs?

I have to create a program that shows the arithmetic mean of a list of variables. There are supposed to be 50 grades.
I'm pretty much stuck. Right now I've only got:
for c in range (0,50):
    grade = ("What is the grade?")
Also, how could I print the count of grades that are below 50?
Any help is appreciated.
If you don't mind using numpy this is ridiculously easy:
import numpy as np
print(np.mean(grades))
Or if you'd rather not import anything,
print(float(sum(grades)) / len(grades))
To get the number of grades below 50, assuming you have them all in a list, you could do:
grades2 = [x for x in grades if x < 50]
print(len(grades2))
Assuming you have a list with all the grades.
avg = sum(gradeList)/len(gradeList)
For a plain Python list, this is typically faster than numpy.mean(), which first has to convert the list into a NumPy array.
To find the number of grades below 50, you can use a loop with a conditional statement.
numPoorGrades = 0
for g in grades:
    if g < 50:
        numPoorGrades += 1
You could also write this a little more compactly using a list comprehension.
numPoorGrades = len([g for g in grades if g < 50])
First of all, assuming grades is a list containing the grades, you would want to iterate over the grades list, and not iterate over range(0,50).
Second, in every iteration you can use one variable to count how many grades you have seen so far, and another variable to sum all the grades so far. Something like this:
num_grades = 0
sum_grades = 0
for grade in grades:
    num_grades += 1 # this is the same as writing num_grades = num_grades + 1
    sum_grades += grade # same as writing sum_grades = sum_grades + grade
Now all you need to do is to divide sum_grades by num_grades to get the result.
average = float(sum_grades) / max(num_grades, 1)
I used the max function, which returns the larger of num_grades and 1: if the list of grades is empty, num_grades is 0, and division by 0 is undefined.
I used float to get a fractional result (in Python 2, dividing two integers with / performs integer division; in Python 3 it already returns a float).
To count the number of grades lower than 50, add another variable, num_failed, and initialize it to 0 just like num_grades; then add an if that checks whether grade is lower than 50 and, if so, increases num_failed by 1.
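Combining those steps, a single pass can produce both the average and the number of failing grades. A minimal sketch; grade_stats and its threshold parameter are hypothetical names, and the grades are assumed to already be numbers in a list:

```python
def grade_stats(grades, threshold=50):
    # One pass: running count, running sum, and count of grades below the threshold
    num_grades = 0
    sum_grades = 0
    num_failed = 0
    for grade in grades:
        num_grades += 1
        sum_grades += grade
        if grade < threshold:
            num_failed += 1
    average = float(sum_grades) / max(num_grades, 1)  # max() avoids dividing by zero
    return average, num_failed

print(grade_stats([40, 65, 80, 45, 90]))  # (64.0, 2)
```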
Try the following. The function isNumber tries to convert the input, which is read as a string, to a float; float() also accepts integer strings such as "75", so integer grades are covered too. I'm using Python 3. The try...except block is similar to the try...catch statement found in other programming languages.
#Checks whether the value is a valid number:
def isNumber( value ):
    try:
        float( value )
        return True
    except ValueError:
        return False
#Variables initialization:
numberOfGrades = 50  # use a smaller value such as 5 while testing
numberOfGradesBelow50 = 0
sumOfAllGrades = 0
#Input:
for c in range( 0, numberOfGrades ):
    currentGradeAsString = input( "What is the grade? " )
    while not isNumber( currentGradeAsString ):
        currentGradeAsString = input( "Invalid value. What is the grade? " )
    currentGradeAsFloat = float( currentGradeAsString )
    sumOfAllGrades += currentGradeAsFloat
    if currentGradeAsFloat < 50.0:
        numberOfGradesBelow50 += 1
#Displays results:
print( "The average is " + str( sumOfAllGrades / numberOfGrades ) + "." )
print( "You entered " + str( numberOfGradesBelow50 ) + " grades below 50." )

Summing dice rolls and plotting a histogram

I am stuck on some Python code that takes the number of dice and the number of rolls and returns the sums obtained. It should also print a histogram of the sums. I am stuck on the first part of the code. Can someone help me fix this? I'm not sure where I am going wrong. Any help with the second part (the histogram) would also help me learn.
from random import choice
def roll(rolls,dice):
    d = []
    for _ in range(rolls):
        d[sum(choice(range(1,7)) for _ in range(dice))] += 1
    return(d)
Your problem here is that you can't arbitrarily index into an empty list:
l = []
l[13] += 1 # fails with IndexError
Instead, you could use a defaultdict, which is a special type of dictionary that doesn't mind if a key hasn't been used yet:
from collections import defaultdict
d = defaultdict(int) # default to integer (0)
d[13] += 1 # works fine, adds 1 to the default
or Counter, which is designed for cases like this ("provided to support convenient and rapid tallies") and provides extra handy functions (like most_common(n), to get the n most common entries):
from collections import Counter
c = Counter()
c[13] += 1
To manually use a standard dict to do this, just add a check:
d = {}
if 13 in d: # already there
    d[13] += 1 # increment
else: # not already there
    d[13] = 1 # create
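Applying the Counter suggestion to the original function only requires changing the d = [] line. A sketch of the first part with that change (the example values of 1000 rolls of 2 dice are arbitrary):

```python
from collections import Counter
from random import choice

def roll(rolls, dice):
    # Tally how often each total appears over `rolls` throws of `dice` dice
    d = Counter()  # missing keys start at 0, so += 1 works on new totals
    for _ in range(rolls):
        d[sum(choice(range(1, 7)) for _ in range(dice))] += 1
    return d

totals = roll(1000, 2)
print(sorted(totals.items()))  # (total, count) pairs, smallest total first
print(totals.most_common(3))   # the three most frequent totals
```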
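Try this:
from random import choice
import matplotlib.pyplot as plt
def roll( rolls, dice ):
    s = list()
    for r in range( rolls ):
        # add up the faces of all the dice thrown in this roll
        s.append( sum( choice( range(1,7) ) for d in range( dice ) ) )
    return s
rolls, dice = 1000, 2  # example values
s = roll( rolls, dice )
sum_of_rolls = sum( s )  # total of all rolls
# then to plot the histogram of the sums
plt.hist( s, bins=range( dice, 6 * dice + 2 ) )
plt.show()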
This should do it:
import random
def rolls(N, r): # N=number of dice. r=number of rolls
    myDie = [1,2,3,4,5,6]
    answer = {}
    for _rolling in range(r):
        rolls = []
        for _die in range(N):
            rolls.append(random.choice(myDie))
        total = 0
        for roll in rolls:
            total += roll
        if total not in answer:
            answer[total] = 0
        answer[total] += 1
    return answer
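For the second part, if a plotting library isn't needed, a text histogram can be printed straight from a dictionary of totals like the one rolls() returns. A minimal sketch; text_histogram and its scale parameter are hypothetical names chosen here (scale lets each '*' stand for several rolls when the counts are large):

```python
def text_histogram(counts, scale=1):
    # One line per total, smallest first; each '*' stands for `scale` rolls
    lines = []
    for total in sorted(counts):
        lines.append(f'{total:>3}: ' + '*' * (counts[total] // scale))
    return lines

# Example with a hand-written tally of the same shape as the dict from rolls()
for line in text_histogram({7: 6, 2: 1, 12: 2}):
    print(line)
```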
