Probability seems wrong using Python randint - python

I'm using Python 2.7 to generate two random numbers, both 1-100 (including 1 and 100), and if they are the same, an event occurs. I would think that the probability of this would be 1/10000 because
1/100 * 1/100 = 1/10000
but the number times that it takes for the two numbers to match up is usually between 10-200. Why does this happen and is there any way to fix it?
Here's my full code:
import random
p5SickGen1 = random.randint(1,100)
p5SickGen2 = random.randint(1,100)
counter = 0
while p5SickGen1 != p5SickGen2:
counter += 1
p5SickGen1 = random.randint(1,100)
p5SickGen2 = random.randint(1,100)
print(counter)

As #jgritty said earlier, your assumption is wrong.
The probability would not be 1/10000 because you are selecting from two different sets of numbers at the same time, which doesn't mean that you are picking a number from a set of numbers twice.
You can easily find the solution like this;
The number of the possibilities of getting same numbers are;
(1,1), (2,2), (3,3), (4,4), (5,5), ..., (100, 100) = 100
Your sample space is 100*100 = 10000. Thus the probability of getting same numbers in one pick;
100 / 10000 = 0.01
Hope this helps.
Btw, for those who are interested in learning the basics of probability, you can start from here.

Your assumption is wrong. There's nothing wrong here.
The odds of the numbers coming up the same twice in a row is simply 1 in 100.
Now, if you pick a specific number, say 42. The odds of getting 42 both times is 1 in 10000.

Related

How to generate random binary numbers 0 or 1 with length of N and with option to control the probability of having 0 or 1?

I want to generate random binary numbers (0 or 1) having N length size. The tricky part is that, It should be able to control the probability of having either more 1 or 0. For example, I want total 100 random numbers with 0 having probability of 40% and 1 having probability of 60%. Please help.
A general solution for controlling this distribution is as follows:
First generate a uniform random number between 0-100 (or 0-1000 for more control, i.e if you need 60.1% chance for a number)
Then if the number is below or equal to 60, assign 1, now you have a 60% chance to assign 1.
I hope this helps, I think you will figure it out.
You can always store the count of 0s and 1s. Here since you need a total of 100 random numbers with 0 having the probability of 40% and 1 having the probability of 60% so lets initialize count_0=40, count_1=60, total_count=100. Now you can generate a random number between 0 and 100.
Let us assume the first randomly generated number happens to be 35. Since 35 is less than 40 so first result happens to be 0. Decrement count_0. Decrement total_count. Now, for the next random no pick a randomly generated number between 0 and 99(total_count). This time if the randomly generated number is less than or equal to count_0 ( which equals 39 ) you set the result to be 0 else 1.
Follow this process for 100 iterations to generate 100 random 0s and 1s. This approach uses no extra space and O(n) time complexity where n is the number of 0s and 1s to be generated.
In Python:
prob = 0.6 #p = prob of having 1
n_samples = np.random.choice([0,1], size=N, p=[prob, 1-prob])

Been using rand.int for a while and seeing unexpected results

I've been running some code for an hour or so using a rand.int function, where the code models a dice's roll, where the dice has ten faces, and you have to roll it six times in a row, and each time it has to roll the same number, and it is tracking how many tries it takes for this to happen.
success = 0
times = 0
count = 0
total = 0
for h in range(0,100):
for i in range(0,10):
times = 0
while success == 0:
numbers = [0,0,0,0,0,0,0,0,0,0]
for j in range(0,6):
x = int(random.randint(0,9))
numbers[x] = 1
count = numbers.count(1)
if count == 1:
success = 1
else:
times += 1
print(i)
total += times
success = 0
randtst = open("RandomTesting.txt", "a" )
randtst.write(str(total / 10)+"\n")
randtst.close()
And running this code, this has been going into a file, the contents of which is below
https://pastebin.com/7kRK1Z5f
And taking the average of these numbers using
newtotal = 0
totalamounts = 0
with open ('RandomTesting.txt', 'rt') as rndtxt:
for myline in rndtxt: ,
newtotal += float(myline)
totalamounts += 1
print(newtotal / totalamounts)
Which returns 742073.7449342106. This number is incorrect, (I think) as this is not near to 10^6. I tried getting rid of the contents and doing it again, but to no avail, the number is nowhere near 10^6. Can anyone see a problem with this?
Note: I am not asking for fixes to the code or anything, I am asking whether something has gone wrong to get the above number rather that 100,000
There are several issues working against you here. Bottom line up front:
your code doesn't do what you described as your intent;
you currently have no yardstick for measuring whether your results agree with the theoretical answer; and
your expectations regarding the correct answer are incorrect.
I felt that your code was overly complex for the task you were describing, so I wrote my own version from scratch. I factored out the basic experiment of rolling six 10-sided dice and checking to see if the outcomes were all equal by creating a list of length 6 comprised of 10-sided die rolls. Borrowing shamelessly from BoarGules' comment, I threw the results into a set—which only stores unique elements—and counted the size of the set. The dice are all the same value if and only if the size of the set is 1. I kept repeating this while the number of distinct elements was greater than 1, maintaining a tally of how many trials that required, and returned the number of trials once identical die rolls were obtained.
That basic experiment is then run for any desired number of replications, with the results placed in a numpy array. The resulting data was processed by numpy and scipy to yield the average number of trials and a 95% confidence interval for the mean. The confidence interval uses the estimated variability of the results to construct a lower and an upper bound for the mean. The bounds produced this way should contain the true mean for 95% of estimates generated in this way if the underlying assumptions are met, and address the second point in my BLUF.
Here's the code:
import random
import scipy.stats as st
import numpy as np
NUM_DIGITS = 6
SAMPLE_SIZE = 1000
def expt():
num_trials = 1
while(len(set([random.randrange(10) for _ in range(NUM_DIGITS)])) > 1):
num_trials += 1
return num_trials
data = np.array([expt() for _ in range(SAMPLE_SIZE)])
mu_hat = np.mean(data)
ci = st.t.interval(alpha=0.95, df=SAMPLE_SIZE-1, loc=mu_hat, scale=st.sem(data))
print(mu_hat, ci)
The probability of producing 6 identical results of a particular value from a 10-sided die is 10-6, but there are 10 possible particular values so the overall probability of producing all duplicates is 10*10-6, or 10-5. Consequently, the expected number of trials until you obtain a set of duplicates is 105. The code above took a little over 5 minutes to run on my computer, and produced 102493.559 (96461.16185897154, 108525.95614102845) as the output. Rounding to integers, this means that the average number of trials was 102493 and we're 95% confident that the true mean lies somewhere between 96461 and 108526. This particular range contains 105, i.e., it is consistent with the expected value. Rerunning the program will yield different numbers, but 95% of such runs should also contain the expected value, and the handful that don't should still be close.
Might I suggest if you're working with whole integers that you should be receiving a whole number back instead of a floating point(if I'm understanding what you're trying to do.).
##randtst.write(str(total / 10)+"\n") Original
##randtst.write(str(total // 10)+"\n")
Using a floor division instead of a division sign will round down the number to a whole number which is more idea for what you're trying to do.
If you ARE using floating point numbers, perhaps using the % instead. This will not only divide your number, but also ONLY returns the remainder.
% is Modulo in python
// is floor division in python
Those signs will keep your numbers stable and easier to work if your total returns a floating point integer.
If this isn't the case, you will have to account for every number behind the decimal to the right of it.
And if this IS the case, your result will never reach 10x^6 because the line for totalling your value is stuck in a loop.
I hope this helps you in anyway and if not, please let me know as I'm also learning python.

Why does this not seem to be random?

I was running a procedure to be like one of those games were people try to guess a number between 0 and 100 where there are 100 people guessing.I then averaged how many different guesses there are.
import random
def averager(times):
tests=[]
for i in range(times):
l=[]
for i in range(0,100):
l.append(random.randint(0,100))
tests.append(len(set(l)))
return (sum(tests))/len(tests)
print(averager(1000))
For some reason, the number of different guesses averages out to 63.6
Why is this?Is it due to a flaw in the python random library?
In a scenario where people were guessing a number between 1 and 10
The first person has a 100% chance to guess a previously unguessed number
The second person has a 90% chance to guess a previously unguessed number
The third person has a 80% chance to guess a previously unguessed number
and so on...
The average chance of guessing a new number(by my reasoning) is 55%.
But the data doesn't reflect this.
Your code is for finding the average number of unique guesses made by 100 people each guessing a number from 1 to 100.
As for why it converges to a number around 63... you should post your question to the math Stack Exchange.
If this was a completely flat distribution, you would expect the average to come out as 100, meaning everybody's guess was different. However, you know that such a scenario is much less random than a scenario where you have duplication. The fact that you get repeated numbers during a random sequence should be comforting.
All you are doing here is measuring some kind of uniqueness within very small sets: ie 1000 repeats of an experiment involving 100 random values. You might get a better appreciation of this if you use some sort of bootstrapping algorithm to sample from.
Also, if you scale up the number of repeats to millions, and perhaps measure the sample distribution (not just the mean), you'll have a little more confidence in the results you're getting.
It may be that the pseudo-random generator has a characteristic which yields approximately 60-70% non-repeated values inside a sequence the same length as the range. However, you would need to experiment with far more samples, as well as different random seeds. Otherwise your results are meaningless.
I modified your code so it would take an already generated sequence as input, rather than calculating random numbers:
def averager(seqs):
tests = []
for s in seqs:
tests.append(len(set(s)))
return float(sum(tests))/len(tests)
Then I made a function to return all possible choices for any given number of people and guess range:
def combos(n, limit):
return itertools.product(*((range(limit),) * n))
(One of the things I love about Python is that it's so easy to break apart a function into trivial pieces.)
Then I started testing with increasing numbers:
for n in range(2,100):
x = averager(combos(n, n))
print n, x, x/n
2 1.5 0.75
3 2.11111111111 0.703703703704
4 2.734375 0.68359375
5 3.3616 0.67232
6 3.99061213992 0.66510202332
7 4.62058326038 0.660083322911
8 5.25112867355 0.656391084194
This algorithm has a horrible complexity, so at this point I got a MemoryError. As you can see, the percentage of unique results keeps dropping as the number of people and guess range keeps increasing.
Repeating the test with random numbers:
def rands(repeats, n, limit):
for i in range(repeats):
yield [random.randint(0, limit) for j in range(n)]
for n in range(10, 101, 10):
x = averager(rands(10000, n, n))
print n, x, x/n
10 6.7752 0.67752
20 13.0751 0.653755
30 19.4131 0.647103333333
40 25.7309 0.6432725
50 32.0471 0.640942
60 38.3333 0.638888333333
70 44.6882 0.638402857143
80 50.948 0.63685
90 57.3525 0.63725
100 63.6322 0.636322
As you can see the results are consistent with what we saw earlier and with your own observation. I'm sure a bit of combinatorial math could explain it all.

Probability exercise returning different result that expected

As an exercise I'm writing a program to calculate the odds of rolling 5 die with the same number. The idea is to get the result via simulation as opposed to simple math though. My program is this:
# rollFive.py
from random import *
def main():
n = input("Please enter the number of sims to run: ")
hits = simNRolls(n)
hits = float(hits)
n = float(n)
prob = hits/n
print "The odds of rolling 5 of the same number are", prob
def simNRolls(n):
hits = 0
for i in range(n):
hits = hits + diceRoll()
return hits
def diceRoll():
firstDie = randrange(1,7,1)
for i in range(4):
nextDie = randrange(1,7,1)
if nextDie!=firstDie:
success = 0
break
else:
success = 1
return success
The problem is that running this program with a value for n of 1 000 000 gives me a probability usually between 0.0006 and 0.0008 while my math makes me believe I should be getting an answer closer to 0.0001286 (aka (1/6)^5).
Is there something wrong with my program? Or am I making some basic mistake with the math here? Or would I find my result revert closer to the right answer if I were able to run the program over larger iterations?
The probability of getting a particular number five times is (1/6)^5, but the probability of getting any five numbers the same is (1/6)^4.
There are two ways to see this.
First, the probability of getting all 1's, for example, is (1/6)^5 since there is only one way out of six to get a 1. Multiply that by five dice, and you get (1/6)^5. But, since there are six possible numbers to get the same, then there are six ways to succeed, which is 6((1/6)^5) or (1/6)^4.
Looked at another way, it doesn't matter what the first roll gives, so we exclude it. Then we have to match that number with the four remaining rolls, the probability of which is (1/6)^4.
Your math is wrong. The probability of getting five dice with the same number is 6*(1/6)^5 = 0.0007716.
Very simply, there are 6 ** 5 possible outcomes from rolling 5 dice, and only 6 of those outcomes are successful, so the answer is 6.0 / 6 ** 5
I think your expected probability is wrong, as you've stated the problem. (1/6)^5 is the probability of rolling some specific number 5 times in a row; (1/6)^4 is the probability of rolling any number 5 times in a row (because the first roll is always "successful" -- that is, the first roll will always result in some number).
>>> (1.0/6.0)**4
0.00077160493827160479
Compare to running your program with 1 million iterations:
[me#host:~] python roll5.py
Please enter the number of sims to run: 1000000
The odds of rolling 5 of the same number are 0.000755

Combinatorics Counting Puzzle: Roll 20, 8-sided dice, what is the probability of getting at least 5 dice of the same value

Assume a game in which one rolls 20, 8-sided die, for a total number of 8^20 possible outcomes. To calculate the probability of a particular event occurring, we divide the number of ways that event can occur by 8^20.
One can calculate the number of ways to get exactly 5 dice of the value 3. (20 choose 5) gives us the number of orders of 3. 7^15 gives us the number of ways we can not get the value 3 for 15 rolls.
number of ways to get exactly 5, 3's = (20 choose 5)*7^15.
The answer can also be viewed as how many ways can I rearrange the string 3,3,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 (20 choose 5) times the total number of values we the zero's (assuming 7 legal values) 7^15 (is this correct).
Question 1: How can I calculate the number of ways to get exactly 5 dice of the same value(That is, for all die values).
Note: if I just naively use my first answer above and multiply bt 8, I get an enormous amount of double counting?
I understand that I could solve for each of the cases (5 1's), (5, 2's), (5, 3's), ... (5's, 8) sum them (more simply 8*(5 1's) ). Then subtract the sum of number of overlaps (5 1's) and (5 2's), (5 1's) and (5 3's)... (5 1's) and (5, 2's) and ... and (5, 8's) but this seems exceedingly messy. I would a generalization of this in a way that scales up to large numbers of samples and large numbers of classes.
How can I calculate the number of ways to get at least 5 dice of the same value?
So 111110000000000000000 or 11110100000000000002 or 11111100000001110000 or 11011211222222223333, but not 00001111222233334444 or 000511512252363347744.
I'm looking for answers which either explain the math or point to a library which supports this (esp python modules). Extra points for detail and examples.
I suggest that you spend a little bit of time writing up a Monte Carlo simulation and let it run while you work out the math by hand. Hopefully the Monte Carlo simulation will converge before you're finished with the math and you'll be able to check your solution.
A slightly faster option might involve creating a SO clone for math questions.
Double counting can be solved by use of the Inclusion/Exclusion Principle
I suspect it comes out to:
Choose(8,1)*P(one set of 5 Xs)
- Choose(8,2)*P(a set of 5 Xs and a set of 5 Ys)
+ Choose(8,3)*P(5 Xs, 5 Ys, 5 Zs)
- Choose(8,4)*P(5 Xs, 5 Ys, 5 Zs, 5 As)
P(set of 5 Xs) = 20 Choose 5 * 7^15 / 8^20
P(5 Xs, 5 Ys) = 20 Choose 5,5 * 6^10 / 8^20
And so on. This doesn't solve the problem directly of 'more then 5 of the same', as if you simply summed the results of this applied to 5,6,7..20; you would over count the cases where you have, say, 10 1's and 5 8's.
You could probably apply inclusion exclusion again to come up with that second answer; so, P(of at least 5)=P(one set of 20)+ ... + (P(one set of 15) - 7*P(set of 5 from 5 dice)) + ((P(one set of 14) - 7*P(one set of 5 from 6) - 7*P(one set of 6 from 6)). Coming up with the source code for that is proving itself more difficult.
The exact probability distribution Fs,i of a sum of i s-sided dice can be calculated as the repeated convolution of the single-die probability distribution with itself.
where for all and 0 otherwise.
http://en.wikipedia.org/wiki/Dice
This problem is really hard if you have to generalize it (get the exact formula).
But anyways, let me explain the algorithm.
If you want to know
the number of ways to get exactly 5
dice of the same value
you have to rephrase your previous problem, as
calculate the number of ways to get
exactly 5 dice of the value 3 AND no
other value can be repeated exactly 5
times
For simplicity's sake, let's call function F(20,8,5) (5 dice, all values) the first answer, and F(20,8,5,3) (5 dice, value 3) the second.
We have that F(20,8,5) = F(20,8,5,3) * 8 + (events when more than one value is repeated 5 times)
So if we can get F(20,8,5,3) it should be pretty simple isn't it?
Well...not so much...
First, let us define some variables:
X1,X2,X3...,Xi , where Xi=number of times we get the dice i
Then:
F(20,8,5)/20^8 = P(X1=5 or X2=5 or ... or X8=5, with R=20(rolls) and N=8(dice number))
, P(statement) being the standard way to write a probability.
we continue:
F(20,8,5,3)/20^8 = P(X3=5 and X1<>5 and ... and X8<>5, R=20, N=8)
F(20,8,5,3)/20^8 = 1 - P(X1=5 or X2=5 or X4=5 or X5=5 or X6=5 or X7=5 or X8=5, R=15, N=7)
F(20,8,5,3)/20^8 = 1 - F(15,7,5)/7^15
recursively:
F(15,8,5) = F(15,7,5,1) * 7
P(X1=5 or X2=5 or X4=5 or X5=5 or X6=5 or X7=5 or X8=5, R=15, N=7) = P(X1=5 and X2<>5 and X4<>5 and .. and X8<>5. R=15, N=7) * 7
F(15,7,5,1)/7^15 = 1 - F(10,6,5)/6^10 F(10,6,5) = F(10,6,5,2) * 6
F(10,6,5,2)/6^10 = 1 - F(5,5,5)/5^5
F(5,5,5) = F(5,5,5,4) * 5
Well then... F(5,5,5,4) is the number of ways to get 5 dices of value 4 in 5 rolls, such as no other dice repeats 5 times. There is only 1 way, out of a total 5^5. The probability is then 1/5^5.
F(5,5,5) is the number of ways to get 5 dices of any value (out of 5 values) in 5 rolls. It's obviously 5. The probability is then 5/5^5 = 1/5^4.
F(10,6,5,2) is the number of ways to get 5 dices of value 2 in 10 rolls, such as no other dice repeats 5 times.
F(10,6,5,2) = (1-F(5,5,5)/5^5) * 6^10 = (1-1/5^4) * 6^10
Well... I think it may be incorrect at some part, but anyway, you get the idea. I hope I could make the algorithm understandable.
edit:
I did some checks, and I realized you have to add some cases when you get more than one value repeated exactly 5 times. Don't have time to solve that part thou...
Here is what I am thinking...
If you just had 5 dice, you would only have eight ways to get what you want.
For each of those eight ways, all possible combinations of the other 15 dice work.
So - I think the answer is: (8 * 815) / 820
(The answer for at least 5 the same.)
I believe you can use the formula of x occurrences in n events as:
P = probability^n * (n!/((n - x)!x!))
So the final result is going to be the sum of results from 0 to n.
I don't really see any easy way to combine it into one step that would be less messy. With this way you have the formula spelled out in the code as well. You may have to write your own factorial method though.
float calculateProbability(int tosses, int atLeastNumber) {
float atLeastProbability = 0;
float eventProbability = Math.pow( 1.0/8.0, tosses);
int nFactorial = factorial(tosses);
for ( i = 1; i <= atLeastNumber; i++) {
atLeastProbability += eventProbability * (nFactorial / (factorial(tosses - i) * factorial(i) );
}
}
Recursive solution:
Prob_same_value(n) = Prob_same_value(n-1) * (1 - Prob_noone_rolling_that_value(N-(n-1)))

Categories

Resources