Below python code represents a small 'game':
I flip a coin 100 times and get a sequence of Heads and Tails
I try to figure out how many times in that sequence it happens that there are 6 Heads or 6 Tails after each other.
I want to run this 10000 times and then calculate the average occurrence of streaks per sequence.
I know the solution is that there are about 0.8 streaks per sequence, but I get to a number of 1.6 and I cannot figure out what I am doing wrong...
I have obviously seen other solutions, but I would like to figure out how I can make this specific code work.
Could you have a look at the below code and let me know what I am doing wrong?
import random
numberOfStreaks = 0
possib = ['H', 'T']
folge = ''
x = 0
while x < 10000:
for i in range (100):
folge = folge + str(random.choice(possib))
numberOfStreaks = folge.count('TTTTTT') + folge.count('HHHHHH')
x = x + 1
print(numberOfStreaks)
Since you "know" the answer you seek is ~= 0.8:
I believe you have misinterpreted the question. I suspect that the question you really want to answer is the (in)famous one from "Automate the Boring Stuff with Python" by Al Sweigart (emphasis mine):
If you flip a coin 100 times ...
... Write a program to find out how often a streak of six heads or a
streak of six tails comes up in a randomly generated list of heads and
tails. Your program breaks up the experiment into two parts: the first
part generates a list of randomly selected 'heads' and 'tails' values,
and the second part checks if there is a streak in it. Put all of this
code in a loop that repeats the experiment 10,000 times so we can find
out what percentage of the coin flips (experiments) contains a
streak of six heads or tails in a row.
Part 1 (generate a list of randomly selected 'heads' and 'tails' values):
observations = "".join(random.choice("HT") for _ in range(100))
Part 2 (checks if there is a streak in it.):
has_streak = observations.find("H"*6) != -1 or observations.find("T"*6) != -1
Part Do Loop (put code in a loop that repeats the experiment 10,000 times):
experimental_results = []
for _ in range(10_000):
observations = "".join(random.choice("HT") for _ in range(100))
has_streak = observations.find("H"*6) != -1 or observations.find("T"*6) != -1
experimental_results.append(has_streak)
Part Get Result (find percentage of the experiments that contain a streak):
print(sum(experimental_results)/len(experimental_results))
This should give you something close to:
0.8
Full Code:
import random
experimental_results = []
for _ in range(10_000):
observations = "".join(random.choice("HT") for _ in range(100))
has_streak = observations.find("H"*6) != -1 or observations.find("T"*6) != -1
experimental_results.append(has_streak)
print(sum(experimental_results)/len(experimental_results))
If however, the question you seek to answer is:
On average, how many occurrences of of at least 6 consecutive
heads or tails there are in 100 flips of a coin?
Then we can count them up and average that like:
import random
def count_streaks(observations):
streaks = 0
streak_length = 1
prior = observations[0]
for current in observations[1:]:
if prior == current:
streak_length += 1
if streak_length == 6:
streaks += 1
else:
streak_length = 1
prior = current
return streaks
experimental_results = []
for _ in range(10_000):
observations = [random.choice("HT") for _ in range(100)]
observed_streaks = count_streaks(observations)
experimental_results.append(observed_streaks)
print(sum(experimental_results)/len(experimental_results))
This will give you a result of about:
1.50
Note:
Your code uses folge.count('TTTTTT'). I believe this code and any answer that uses a similar strategy is likely (over the course of 10k experiments) to overestimate the answer as ("H"*12).count("H"*6) is 2 not 1.
For example:
This otherwise excellent answer by #samwise (Probability of streak of heads or tails in sequence of coin tossing) consistently generates results in the range of:
1.52
You're appending to folge each time through the x loop, so the 10000 different runs aren't independent of one another -- you don't have 10000 different sets of 100 tosses, you have a single set of 1000000 tosses (which is going to have slightly more streaks in it since you aren't "breaking" it after 100 tosses).
What you want to do is count the streaks for each set of 100 tosses, and then take the mean of all those counts:
from random import choice
from statistics import mean
def count_streaks(folge: str) -> int:
return folge.count("TTTTTT") + folge.count("HHHHHH")
print(mean(
count_streaks(''.join(
choice("HT") for _ in range(100)
))
for _ in range(10000)
))
import random as rd
n = 0
ListOfStreaks = []
ListOfResults = []
while n != 10:
numberOfStreaks = 0
for i in range(100):
Flip = rd.randint(0,1)
ListOfResults.append(Flip)
for i in range(96):
count = 0
for j in range(6):
if ListOfResults[i] == ListOfResults[i + j]:
count += 1
if count == 6:
numberOfStreaks += 1
count = 0
else:
continue
else:
break
ListOfStreaks.append(numberOfStreaks)
n += 1
print(ListOfStreaks)
print(len(ListOfResults))
In the code above, I am able to successfully flip a coin 100 times, and examine how many times in the 100 flips Heads or Tails came up six time in a row. I am unable to properly set up the code to run the experiment 10 times in order to examine how many times Heads or Tails came up six times in a row in each of the single experiments. The goal is to not flip the coins 1,000 times in a row but 10 experiments of flipping 100 coins in a row.
The exercise focuses on later being able to simulate the experiment 10,000 times in order to see what the probability is of Heads or Tails appearing six times in a row in 100 flips. Essentially, I am trying to gather enough of a sample size. While there are actual statistical/probability methods to get the exact answer, that isn't what I am trying to focus on.
CoinFlip Code
Your key problem appears to be that you have ListOfResults = [] outside of your while loop, so each run adds another 100 entries to the list instead of setting up a new test.
I've replaced the initial for loop with a list comprehension which sets up a new sample each time.
import random as rd
list_of_streaks = []
for _ in range(10):
list_of_results = [rd.randint(0,1) for _ in range(100)]
number_of_streaks = 0
for i in range(96):
if sum(list_of_results[i: i+6]) in(0, 6):
number_of_streaks += 1
list_of_streaks.append(number_of_streaks)
print(list_of_streaks)
print(len(list_of_results))
You also don't need the inner for loop to add up all of the 6 flips - you can just sum them to see if the sum is 6 or 0. You appear to have just tested for heads - I tested for 6 identical flips, either heads or tails, but you can adjust that easily enough.
It's also much easier to use a for loop with a range, rather than while with a counter if you are iterating over a set number of iterations.
The first comment from #JonSG is also worth noting. If you had set up the individual test as a function, you'd have been forced to have ListOfResults = [] inside the function, so you would have got a new sample of 100 results each time. Something like:
import random as rd
def run_test():
list_of_results = [rd.randint(0,1) for _ in range(100)]
number_of_streaks = 0
for i in range(96):
if sum(list_of_results[i: i+6]) in(0, 6):
number_of_streaks += 1
return number_of_streaks
print([run_test() for _ in range(10)])
print(len(list_of_results))
So i was learning how to handle probabilities and how to plot them in Python. I came across a problem where i needed to find the probability of the sum of 2 dices being > 7 or odd. I know the result is around 75% but it was translating that to Python that i had a problem with. My code to solve the problem is something like this:
import random
import numpy as np
dice = [1,2,3,4,5,6]
value = 0
simulation_number = len(dice)**2
percentage = []
for i in range(0,simulation_number):
dice1 = random.choice(dice)
dice2 = random.choice(dice)
random_match = dice1,dice2
if (sum(random_match))>7 or (sum(random_match)%2) != 0:
value += 1
percentage.append(np.round((value/simulation_number)*100,2))
print(percentage,"%")
It works just fine but everytime i run the code it gives a different solution because the loop is repeating outcomes for random_match. How do I include in the code the condition of not repeating random_match values?
Generating random values from 1 to 6 won't work. Assume that you are tossing a coin 10 times. theoretically you should get 5 heads and 5 tails. But that does not happens in real life because of sampling error. When you generate random values, there is always some sampling error.
import random
import numpy as np
dice = [1,2,3,4,5,6]
value = 0
simulation_number = len(dice)**2
for i in range(len(dice)):
for j in range(len(dice)):
dice1 = dice[i]
dice2 = dice[j]
x = dice1+dice2
if x>7 or x%2== 1:
value += 1
percentage = np.round((value/simulation_number)*100,2)
print(f'{percentage} %')
This works as you need every case only once. Using random might take same case again and again
I guess you're looking for random.sample:
import random
dice = [1,2,3,4,5,6]
dice1, dice2 = random.sample(dice, 2)
Here is the problem statement.
There is a lottery where 4 random numbers are picked everyday.
I want to find out whether I have better odds of winning the lottery (let's say over 1 000 000 trials).
I have added the solution I have written to solve this problem, but it is very slow, running. Anything over 3000 trials is very very slow.
I have added comments to my code to show my reasoning
ADD: I need help finding the bottleneck
ADD2: Code is complete, sorry, had renamed a few variables
#lottery is 4 numbers
#lottery runs 365 days a year
#i pick the same number every day, what are my odds of winning/how many times will i win
#what are my odds of winning picking 4 random numbers
import random
my_pick = [4,4,4,7]
lotto_nums = list(range(0,9))
iterations = 3000
#function to pick 4 numbers at random
def rand_func ():
rand_pick = [random.choice(lotto_nums) for _ in range(4)]
return rand_pick
#pick 4 random numbers X amount of times
random_pick = [rand_func() for _ in range(iterations)]
#pick 4 random numbers for the lottery itself
def lotto ():
lotto_pick = [random.choice(lotto_nums) for _ in range(4)]
return lotto_pick
#check how many times I picked the correct lotto numbers v how many times i randomly generated numbers that would have won me the lottery
def lotto_picks ():
lotto_yr =[]
for _ in range(iterations):
lotto_yr.append(lotto())
my_count = 0
random_count = 0
for lotto_one in lotto_yr:
if my_pick == lotto_one:
my_count = my_count +1
elif random_pick == lotto_one:
random_count = random_count +1
print('I have {} % chance of winning if pick the same numbers versus {} % if i picked random numbers. The lotto ran {} times'.format(((my_count/iterations)*100), ((random_count/iterations)*100), iterations))
lotto_picks()
The reason of why your code is slow is because in each iteration you are calculating all simulations all over again. In reality you need to check if you won the lottery only once per simulation. So lotto_picks() should probably look something like this:
def lotto_picks ():
lotto_yr = []
my_count = 0
random_count = 0
for _ in range(iterations):
new_numbers = lotto()
lotto_yr.append(new_numbers) # You can still save them for later analysis
if my_pick == new_numbers:
my_count = my_count +1
if random_pick == new_numbers: # Changed from elif to if
random_count = random_count +1
print('I have {} % chance of winning if pick the same numbers versus {} % if i picked random numbers. The lotto ran {} times'.format(((my_count/iterations)*100), ((random_count/iterations)*100), iterations))
This will make your program run in linear time O(n), and before your code was running at a quadratic time complexity O(n^2).
Your problem is with the nested for loop.
Your initial running time for your first for loop is of the order O(n) (aka linear).
For each initial iteration (let's say i) your nested loop runs i times.
for i in range(iterations):
for lotto_one in i:
This means that in total your nested loop will be run 4501500 times (sum of numbers from 1 to 3000). Add your initial outer loop iterations to it (3000) and you get 4 504 500 "real" iterations total. Which gives you something like O(n^1.9) running time, almost ^2 running time. That's your bottleneck.
I'm trying to understand this solution of the Monty Hall problem, I understand most of the code, but am stuck on two pieces.
Below is the code, but specifically I'm stuck on these two parts
result[bad] = np.random.randint(0,3, bad.sum())
and the entire switch_guess function.
If anyone could explain in plain English for me that would be awesome.
#Simulates picking a prize door
def simulate_prizedoor(nsim):
return np.random.randint(0,3,(nsim))
#Simulates the doors guessed
def simulate_guesses(nsim):
return np.zeros(nsim, dtype=np.int)
#Simulates the "game host" showing whats behind a door
def goat_door(prize_doors, guesses):
result = np.random.randint(0,3, prize_doors.size)
while True:
bad = (result == prize_doors) | (result == guesses)
if not bad.any():
return result
result[bad] = np.random.randint(0,3, bad.sum())
#Used to change your guess
def switch_guess(guesses, goat_doors):
result = np.zeros(guesses.size)
switch = {(0, 1): 2, (0, 2): 1, (1, 0): 2, (1, 2): 1, (2, 0): 1, (2, 1): 0}
for i in [0,1,2]:
#print "i = ", i
for j in [0,1,2]:
#print "j = ", j
mask = (guesses == i) & (goat_doors == j)
#print "mask = ", mask
if not mask.any():
continue
result = np.where(mask, np.ones_like(result) * switch[(i, j)], result)
return result
#Calculates the win percentage
def win_percentage(guesses, prizedoors):
return 100 * (guesses == prizedoors).mean()
#The code to pull everything together
nsim = 10000
#keep guesses
print "Win percentage when keeping original door"
print win_percentage(simulate_prizedoor(nsim), simulate_guesses(nsim))
#switch
pd = simulate_prizedoor(nsim)
guess = simulate_guesses(nsim)
goats = goat_door(pd, guess)
guess = switch_guess(guess, goats)
print "Win percentage when switching doors"
print win_percentage(pd, guess)
… specifically I'm stuck on these two parts
result[bad] = np.random.randint(0,3, bad.sum())
Let's break this down into pieces. It may help to reduce that 10000 to something small, like 5, so you can print out the values (either with print calls, or in the debugger) and see what's going on.
When we start this function, prize_doors is going to have 5 random values from 0 to 2, like 2 2 0 1 2, and guesses will have 5 values, all 0, like 0 0 0 0 0. result will therefore start off with 5 random values from 0 to 2, like 0 2 2 0 1.
Each first time through the loop, bad will be a list of 5 bool values, which are each True if the corresponding value in result matches the corresponding value in either prize_doors or guesses. So, in this example, True True False True False, because guess #1 matches prize_doors, and guesses #0 and #3 match goats.
Unfortunately, we're just going to go around that loop forever, because there's nothing inside the loop that modifies result, and therefore bad is going to be the same forever, and doing the same check forever is always going to return the same values.
But if you indent that result[bad] = … line so it's inside the loop, that changes everything. So, let's assume that's what you were supposed to do, and you just copied it wrong.
When treated as numbers, True and False have values 1 and 0, respectively. So, bad.sum() is a count of how many matches there were in bad—in this case, 3.
So, np.random.randint(0, 3, bad.sum()) picks 3 random values from 0 to 2, let's say 1 0 1.
Now, result[bad] selects all of the elements of result for which the corresponding value in bad is True, so in this example it's result[0], result[1], and result[3].
So we end up assigning that 1 0 1 to those three selected locations, so result is now 1 0 2 1 1.
So, next time through the loop, bad is now True False False False False. We've still got at least one True value, so we run that result[bad] = line again. This time, bad.sum() is 1, so we pick 1 random value, let's say 0, and we then assign that 1 value to result[0], so result is now 0 0 2 1 1.
The next time through, bad is now False False False False False, so bad.any() is False, so we're done.
In other words, each time through, we take all the values that don't match either the prize door or the goat door, and pick a new door for them, until finally there are no such values.
It also confused me, until 5 mins ago when I finally figured it out.
Since the first question has been solved, I will only talk about the second one.
The intuition goes like this : given a sequence of (guesses, goatdoors),in the (i,j) loop, there are always some simulation (e.g., simulation[0] and simulation[5]) that 'hit' by the (i,j), that is the say, the 0th and 5th simulation have guess i and goatdoor j.
Variable mask record 0 and 5 in this example. Then result in 0th and 5th can be decided, because in these simulation, the only possible door to switch to is determined by i and j. So np.where refreshes result in these simulation, leave other simulations unchanged.
Intuition is above. You need to know how np.where work if you want to know what I'm talking about. Good luck.