I tried two different ways to get a coin flip result, seeding the RNG first in order to get reproducible results.
First, I tried using random.randint:
import random
random.seed(23412)
flip = random.randint(0,1)
if flip == 0:
    print("Tails")
else:
    print("Heads")
For this seed, I get a Heads result; the flip result is 1.
Then, I tried rounding the result of random.random:
import random
random.seed(23412)
flip = random.random()
print(flip) # for testing purposes
new_flip = round(flip)
if new_flip == 0:
    print("Tails")
else:
    print("Heads")
This time, I get a value for flip of 0.27484468113952387, which rounds to 0, i.e. a Tails result.
Why does the result differ? Shouldn't random.randint pull the same random value (since the seed was the same), and flip the coin the same way?
Seeding a random number generator ensures a reproducible stream of underlying raw data. Different random module methods will compute their results in different ways, using different amounts of data.
The exact computation is an implementation detail. In the implementation I use (although I suspect it is the same for many other versions), random.randint will use the next single bit of raw data from the stream - it does a bunch of math first to figure out how many possible answers there are in range, then rounds that up to the next power of two, then chooses the corresponding number of bits n to get a value (0..2^n-1), repeats that until it's in range, and finally does more math to scale that to the original range. (If you tried, for example, random.randint(0, 2), it would repeatedly grab two bits interpreted as an integer, until the result isn't 3, which is out of range.)
The implementation of random.random is hidden in the C code, but I assume that it grabs multiple bits of data (at least 53) in order to fill the mantissa of a machine double-precision floating-point value.
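The rejection loop described above can be sketched roughly like this. It is only an illustration of the idea, not a copy of the library source: rand_below and rand_int are names made up for this example, and picking the bit count with int.bit_length() is an assumption about one reasonable way to do it.

```python
import random

def rand_below(n, rng=random):
    """Return an integer in [0, n) by rejection sampling on raw bits (illustrative helper)."""
    k = n.bit_length()          # enough bits to represent every value below n
    r = rng.getrandbits(k)
    while r >= n:               # out of range: discard these bits and draw again
        r = rng.getrandbits(k)
    return r

def rand_int(a, b, rng=random):
    """Return an integer in [a, b], both ends inclusive (illustrative helper)."""
    return a + rand_below(b - a + 1, rng)

random.seed(23412)
rolls = [rand_int(0, 2) for _ in range(1000)]
print(sorted(set(rolls)))
```

Because a function like this consumes a different number of raw bits per call than random.random does, the two read different parts of the same seeded stream, so there is no reason for their results to line up.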
Once you set a seed, all random numbers that follow will be consistent. It does not mean that all random numbers will be the same (and not be random any more).
However, once you re-set the seed, the random number generation will start from the beginning. Therefore it's predictable what will happen, e.g. like this, where 200 random numbers are generated in 100 loop iterations and I can predict each outcome (via assertions):
import random
for i in range(100):
    test_seed = 23412
    random.seed(test_seed)
    flip = round(random.random())
    assert flip == 0  # predict "Tails"
    if flip == 0:
        print("Tails")
    else:
        print("Heads")
    flip = random.randint(0,1)
    assert flip == 1  # predict "Heads"
    if flip == 0:
        print("Tails")
    else:
        print("Heads")
I've been running some code for an hour or so using the random.randint function. The code models rolling a ten-faced die six times in a row, where every roll has to come up with the same number, and it tracks how many tries it takes for this to happen.
import random
success = 0
times = 0
count = 0
total = 0
for h in range(0,100):
    for i in range(0,10):
        times = 0
        while success == 0:
            numbers = [0,0,0,0,0,0,0,0,0,0]
            for j in range(0,6):
                x = int(random.randint(0,9))
                numbers[x] = 1
            count = numbers.count(1)
            if count == 1:
                success = 1
            else:
                times += 1
        print(i)
        total += times
        success = 0
    randtst = open("RandomTesting.txt", "a" )
    randtst.write(str(total / 10)+"\n")
    randtst.close()
And running this code, this has been going into a file, the contents of which is below
https://pastebin.com/7kRK1Z5f
And taking the average of these numbers using
newtotal = 0
totalamounts = 0
with open('RandomTesting.txt', 'rt') as rndtxt:
    for myline in rndtxt:
        newtotal += float(myline)
        totalamounts += 1
print(newtotal / totalamounts)
Which returns 742073.7449342106. This number is incorrect, (I think) as this is not near to 10^6. I tried getting rid of the contents and doing it again, but to no avail, the number is nowhere near 10^6. Can anyone see a problem with this?
Note: I am not asking for fixes to the code or anything, I am asking whether something has gone wrong to get the above number rather than 100,000.
There are several issues working against you here. Bottom line up front:
your code doesn't do what you described as your intent;
you currently have no yardstick for measuring whether your results agree with the theoretical answer; and
your expectations regarding the correct answer are incorrect.
I felt that your code was overly complex for the task you were describing, so I wrote my own version from scratch. I factored out the basic experiment of rolling six 10-sided dice and checking to see if the outcomes were all equal by creating a list of length 6 comprised of 10-sided die rolls. Borrowing shamelessly from BoarGules' comment, I threw the results into a set—which only stores unique elements—and counted the size of the set. The dice are all the same value if and only if the size of the set is 1. I kept repeating this while the number of distinct elements was greater than 1, maintaining a tally of how many trials that required, and returned the number of trials once identical die rolls were obtained.
That basic experiment is then run for any desired number of replications, with the results placed in a numpy array. The resulting data was processed by numpy and scipy to yield the average number of trials and a 95% confidence interval for the mean. The confidence interval uses the estimated variability of the results to construct a lower and an upper bound for the mean. The bounds produced this way should contain the true mean for 95% of estimates generated in this way if the underlying assumptions are met, and address the second point in my BLUF.
Here's the code:
import random
import scipy.stats as st
import numpy as np
NUM_DIGITS = 6
SAMPLE_SIZE = 1000
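def expt():
    num_trials = 1
    # Roll NUM_DIGITS ten-sided dice; repeat while more than one distinct value shows up
    while(len(set([random.randrange(10) for _ in range(NUM_DIGITS)])) > 1):
        num_trials += 1
    return num_trials
data = np.array([expt() for _ in range(SAMPLE_SIZE)])
mu_hat = np.mean(data)
# Confidence level is passed positionally, since its keyword name differs between SciPy versions
ci = st.t.interval(0.95, df=SAMPLE_SIZE-1, loc=mu_hat, scale=st.sem(data))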
print(mu_hat, ci)
The probability of producing 6 identical results of a particular value from a 10-sided die is 10^-6, but there are 10 possible particular values so the overall probability of producing all duplicates is 10*10^-6, or 10^-5. Consequently, the expected number of trials until you obtain a set of duplicates is 10^5. The code above took a little over 5 minutes to run on my computer, and produced 102493.559 (96461.16185897154, 108525.95614102845) as the output. Rounding to integers, this means that the average number of trials was 102493 and we're 95% confident that the true mean lies somewhere between 96461 and 108526. This particular range contains 10^5, i.e., it is consistent with the expected value. Rerunning the program will yield different numbers, but 95% of such runs should also contain the expected value, and the handful that don't should still be close.
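For anyone who wants to check that arithmetic, here is a small sketch using exact fractions. The constant names SIDES and ROLLS are just labels chosen for this example; the last step relies on the standard fact that the number of attempts until the first success has mean 1/p.

```python
from fractions import Fraction

SIDES = 10   # faces on the die
ROLLS = 6    # rolls that must all match

# Probability that all rolls show one particular face
p_specific = Fraction(1, SIDES) ** ROLLS
# Any of the faces will do, so multiply by the number of faces
p_any = SIDES * p_specific
# Mean number of attempts until the first success is 1/p
expected_trials = 1 / p_any

print(p_specific, p_any, expected_trials)
```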
Might I suggest if you're working with whole integers that you should be receiving a whole number back instead of a floating point(if I'm understanding what you're trying to do.).
##randtst.write(str(total / 10)+"\n") Original
##randtst.write(str(total // 10)+"\n")
Using floor division instead of a division sign will round the number down to a whole number, which is more ideal for what you're trying to do.
If you ARE using floating point numbers, perhaps use % instead. This will not only divide your number, but also returns ONLY the remainder.
% is Modulo in python
// is floor division in python
Those operators will keep your numbers stable and easier to work with if your total comes back as a floating point number.
If this isn't the case, you will have to account for every digit to the right of the decimal point.
And if this IS the case, your result will never reach 10^6 because the line for totalling your value is stuck in a loop.
I hope this helps you in any way and if not, please let me know as I'm also learning Python.
I am trying to write a program that generates a random number from a random number generated by Perlin noise. If you ask why: I am trying to add a little more randomness to a random number generator and see how that would work.
Anyway, my problem is that from the Perlin noise I always get 12-digit random numbers like: 124592051214, 431268750000, 420799999999, 613979257812...
The thing is, I want this function to be used just like a normal Python random function: you give the borders (a, b) and you get a random number within those borders.
So, how can I turn a 12 digit number to match the given borders? Thanks in advance
ex:
num = 124592051214
perlinRand(50,100)
62
You can do something like
perlin_noise % (higher-lower) + lower
where "higher" is the higher limit of the random number you want, "lower" is the lower limit and perlin_noise is the number you generate
def perlinRand(lower, higher):
    perlin_noise = get_perlin_noise_random_number()
    return perlin_noise % (higher-lower) + lower
perlinRand(50, 100)
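One detail worth checking: the modulo by (higher-lower) can never return higher itself. If you want both borders included, the way random.randint does it, a variation along these lines might work. This is only a sketch: map_to_range is a made-up name, and it takes the noise value as a parameter because I don't know how your generator is called.

```python
def map_to_range(noise_value, lower, higher):
    """Map a non-negative integer onto lower..higher, both ends inclusive (hypothetical helper)."""
    if lower > higher:
        raise ValueError("lower must not exceed higher")
    span = higher - lower + 1          # number of distinct results wanted
    return lower + noise_value % span

# The 12-digit values quoted in the question, used here purely as sample input
samples = [124592051214, 431268750000, 420799999999, 613979257812]
mapped = [map_to_range(s, 50, 100) for s in samples]
print(mapped)
```

Keep in mind that a plain modulo only gives evenly spread results if the incoming values are themselves evenly spread over a range much larger than the span; otherwise some outputs will show up a little more often than others.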
So I am simply playing around with trying to make a "dice roller" using random.getrandbits() and the "wasteful" methodology stated here: How to generate an un-biased random number within an arbitrary range using the fewest bits
My code seems to be working fine, however when I roll D6's the Max/Min ratio is in the 1.004... range but with D100's it's in the 1.05... range. Considering my dataset is only about a million rolls, is this ok or is the pRNG nature of random affecting the results? Or am I just being an idiot and overthinking it and it's due to D100s simply having a larger range of values than a D6?
Edit: Max/Min ratio is the frequency of the most common result divided by the frequency of the least common result. For a perfectly fair dice this should be 1.
from math import ceil, log2
from random import getrandbits
def wasteful_die(dice_size: int):
    #Generate the minimum number of random bits needed to cover dice_size values
    bits = getrandbits(ceil(log2(dice_size)))
    #If bits is a valid number (i.e. it is less than dice_size), yield it
    if bits < dice_size:
        yield 1 + bits
def generate_rolls(dice_size: int, number_of_rolls: int) -> list:
    #Store the results
    list_of_numbers = []
    #Command line activity indicator
    print('Rolling '+f'{number_of_rolls:,}'+' D'+str(dice_size)+'s',end='',flush=True)
    activityIndicator = 0
    #As this is a wasteful algorithm, keep rolling until you have the desired number of valid rolls.
    while len(list_of_numbers) < number_of_rolls:
        #Print a period every 1000 attempts
        if activityIndicator % 1000 == 0:
            print('.',end='',flush=True)
        #Build up the list of rolls with valid rolls.
        for value in wasteful_die(dice_size):
            list_of_numbers.append(value)
        activityIndicator+=1
    print(' ',flush=True)
    #Use list slice just in case something went wrong.
    return list_of_numbers[0:number_of_rolls]
#Rolls one million, forty-eight thousand, five hundred and seventy-six D6s
print(generate_rolls(6, 1048576), file=open("RollsD6.txt", "w"))
#Rolls one million, forty-eight thousand, five hundred and seventy-six D100s
print(generate_rolls(100, 1048576), file=open("RollsD100.txt", "w"))
Your final statement is incorrect: for a perfectly fair douse (never say die :-) ), the ratio should tend to 1.0, but should rarely land directly on that value for large numbers of rolls. To hit 1.0 regularly requires the die to know the history of previous rolls, which violates the fairness principles.
A variation of 0.4% for a D6 is reasonable over 10^6 rolls, as is 5% for a D100. As you surmised, this is because the D100 has many more "buckets" (different values).
The D6 will average 10^6/6, or nearly 170K expected instances per "bucket". A D100 has only 10K expected instances per bucket: somewhat less room for the Law of Central Tendency to influence the numbers. Having a 50:4 difference in a single test run is well within expectations.
I suggest that you try running a chi-squared test, rather than a simple max/min metric.
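As a rough illustration of what that could look like, here is a sketch that computes the Pearson chi-squared statistic by hand with only the standard library. The function name chi_squared_stat, the seed, and the sample size are all choices made for this example, and random.randint stands in for your own roller.

```python
import random

def chi_squared_stat(rolls, sides):
    """Pearson chi-squared statistic for rolls of a die numbered 1..sides."""
    counts = [0] * sides
    for r in rolls:
        counts[r - 1] += 1
    expected = len(rolls) / sides      # a fair die fills every bucket equally
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(12345)                     # arbitrary seed, only for repeatability
d6_rolls = [random.randint(1, 6) for _ in range(60000)]
stat = chi_squared_stat(d6_rolls, 6)
print(stat)
```

The statistic is compared against a chi-squared distribution with (sides - 1) degrees of freedom. For a D6 that is 5 degrees of freedom, where the usual 5% cutoff is about 11.07; values well above that would be a reason to doubt fairness. If you have SciPy installed, scipy.stats.chisquare will give you the p-value directly from the bucket counts.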
I wrote a program that records how many times 2 fair dice need to be rolled to match the probabilities for each result that we should expect.
I think it works but I'm wondering if there's a more resource friendly way to solve this problem.
import random
expected = [0.0, 0.0, 0.028, 0.056, 0.083,
0.111, 0.139, 0.167, 0.139, 0.111,
0.083, 0.056, 0.028]
results = [0.0] * 13 # store our empirical results here
emp_percent = [0.0] * 13 # results / by count
count = 0.0 # how many times have we rolled the dice?
while True:
    r = random.randrange(1,7) + random.randrange(1,7) # roll our dice
    count += 1
    results[r] += 1
    emp_percent = results[:]
    for i in range(len(emp_percent)):
        emp_percent[i] /= count
        emp_percent[i] = round(emp_percent[i], 3)
    if emp_percent == expected:
        break
print(count)
print(emp_percent)
There are several problems here.
Firstly, there is no guarantee that this will ever terminate, nor is it particularly likely to terminate in a reasonable amount of time. Ignoring floating point arithmetic issues, this should only terminate when your numbers are distributed exactly right. But the law of large numbers does not guarantee this will ever happen. The law of large numbers works like this:
Your initial results are (by random chance) almost certainly biased one way or another.
Eventually, the trials not yet performed will greatly outnumber your initial trials, and the lack of bias in those later trials will outweigh your initial bias.
Notice that the initial bias is never counterbalanced. Rather, it is dwarfed by the rest of the results. This means the bias tends to zero, but it does not guarantee the bias actually vanishes in a finite number of trials. Indeed, it specifically predicts that progressively smaller amounts of bias will continue to exist indefinitely. So it would be entirely possible that this algorithm never terminates, because there's always that tiny bit of bias still hanging around, statistically insignificant, but still very much there.
That's bad enough, but you're also working with floating point, which has its own issues; in particular, floating point arithmetic violates lots of conventional rules of math because the computer keeps doing intermediate rounding to ensure the numbers continue to fit into memory, even if they are repeating (in base 2) or irrational. The fact that you are rounding the empirical percents to three decimal places doesn't actually fix this, because not all terminating decimals (base 10) are terminating binary values (base 2), so you may still find mismatches between your empirical and expected values. Instead of doing this:
    if emp_percent == expected:
        break
...you might try this (in Python 3.5+ only, with import math at the top of the script):
    if all(map(math.isclose, emp_percent, expected)):
        break
This solves both problems at once. By default, math.isclose() requires the values to agree to within a relative tolerance of 1e-09 (about 9 significant digits), so it inserts the necessary give for this algorithm to actually have a chance of working. Note that it does require special handling for comparisons involving zero, because a relative tolerance alone never treats a nonzero value as close to zero. You may need to tweak this code for your use case, like this (this also needs import functools):
is_close = functools.partial(math.isclose, abs_tol=1e-9)
    if all(map(is_close, emp_percent, expected)):
        break
math.isclose() also removes the need to round your empiricals, since it can do this approximation for you:
is_close = functools.partial(math.isclose, rel_tol=1e-3, abs_tol=1e-5)
    if all(map(is_close, emp_percent, expected)):
        break
If you really don't want these approximations, you will have to give up floating point and work with fractions exclusively. They produce exact results when divided by one another. However, you still have the problem that your algorithm is unlikely to terminate quickly (or perhaps at all), for the reasons discussed above.
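To give an idea of the exact-fraction route, here is a minimal sketch using the standard fractions module. The names EXPECTED and matches_exactly are invented for this example, and the counts list is assumed to hold the tallies for the sums 2 through 12 in order.

```python
from fractions import Fraction

# Exact probabilities for the sums 2..12 of two fair dice
EXPECTED = [Fraction(n, 36) for n in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]

def matches_exactly(counts):
    """True if the observed proportions equal the exact probabilities."""
    total = sum(counts)
    if total == 0:
        return False
    # Fraction comparisons are exact, so no tolerance or rounding is involved
    return all(Fraction(c, total) == e for c, e in zip(counts, EXPECTED))

print(matches_exactly([2, 4, 6, 8, 10, 12, 10, 8, 6, 4, 2]))   # 72 rolls, perfectly distributed
print(matches_exactly([2, 4, 6, 8, 10, 12, 10, 8, 6, 4, 3]))   # one extra 12
```

This removes the floating point concerns, but it does nothing about the termination problem.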
Rather than trying to match floating point numbers -- you could try to match expected values for each possible sum. This is equivalent to what you are trying to do since (observed number)/(number of trials) == (theoretical probability) if and only if the observed number equals the expected number. These will always be an integer exactly when the number of rolls is a multiple of 36. Hence, if the number of rolls is not a multiple of 36 then it is impossible for your observations to equal expectations exactly.
To get the expected values, note that the numerators that appear in the exact probabilities of the various sums (1,2,3,4,5,6,5,4,3,2,1 for the sums 2,3,..., 12 respectively) are the expected values for the sums if the dice are rolled 36 times. If the dice are rolled 36i times then multiply these numerators by i to get the expected values of the sums. The following code simulates repeatedly rolling a pair of fair dice 36 times, accumulating the total counts and then comparing them with the expected counts. If there is a perfect match, the number of trials (where a trial is 36 rolls) needed to get the match is returned. If this doesn't happen by max_trials, a vector showing the discrepancy between the final counts and final expected value is given:
import random
def roll36(counts):
    for i in range(36):
        r1 = random.randint(1,6)
        r2 = random.randint(1,6)
        counts[r1+r2 - 2] += 1
def match_expected(max_trials):
    counts = [0]*11
    numerators = [1,2,3,4,5,6,5,4,3,2,1]
    for i in range(1, max_trials+1):
        roll36(counts)
        expected = [i*j for j in numerators]
        if counts == expected:
            return i
    #else:
    return [c-e for c,e in zip(counts,expected)]
Here is some typical output:
>>> match_expected(1000000)
[-750, 84, 705, -286, 5783, -3504, -1208, 1460, 543, -1646, -1181]
Not only have the exact expected values never been observed in 36 million simulated rolls of a pair of fair dice, in the final state the discrepancies between observations and expectations have become quite large (in absolute value -- the relative discrepancies are approaching zero, as the law of large numbers predicts). This approach is unlikely to ever yield a perfect match. A variation that would work (while still focusing on expected numbers) would be to iterate until the observations pass a chi-squared goodness of fit test when compared with the theoretical distribution. In that case there would no longer be any reason to focus on multiples of 36.
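A sketch of that chi-squared variation might look like the following. Everything here is an assumption made for illustration: the function names, the batch size of 360 rolls, and the choice of the 5% significance level, for which the cutoff with 10 degrees of freedom (11 possible sums minus 1) is about 18.307.

```python
import random

NUMERATORS = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]   # ways to roll the sums 2..12, out of 36
CRITICAL_5PCT_DF10 = 18.307                      # chi-squared 95th percentile, 10 degrees of freedom

def chi_squared_two_dice(counts):
    """Pearson chi-squared statistic of observed sum counts against two fair dice."""
    total = sum(counts)
    stat = 0.0
    for observed, num in zip(counts, NUMERATORS):
        expected = total * num / 36
        stat += (observed - expected) ** 2 / expected
    return stat

def rolls_until_pass(batch=360, max_batches=1000, rng=random):
    """Roll in batches until the statistic drops below the cutoff; None if it never does."""
    counts = [0] * 11
    for b in range(1, max_batches + 1):
        for _ in range(batch):
            counts[rng.randint(1, 6) + rng.randint(1, 6) - 2] += 1
        if chi_squared_two_dice(counts) < CRITICAL_5PCT_DF10:
            return b * batch
    return None

print(rolls_until_pass())
```

With fair dice this usually stops after very few batches, which is the point: passing a goodness of fit test is a far weaker (and far more attainable) condition than an exact match.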
I was running a procedure to be like one of those games where people try to guess a number between 0 and 100, where there are 100 people guessing. I then averaged how many different guesses there are.
import random
def averager(times):
    tests=[]
    for i in range(times):
        l=[]
        for i in range(0,100):
            l.append(random.randint(0,100))
        tests.append(len(set(l)))
    return (sum(tests))/len(tests)
print(averager(1000))
For some reason, the number of different guesses averages out to 63.6
Why is this? Is it due to a flaw in the Python random library?
In a scenario where people were guessing a number between 1 and 10
The first person has a 100% chance to guess a previously unguessed number
The second person has a 90% chance to guess a previously unguessed number
The third person has a 80% chance to guess a previously unguessed number
and so on...
The average chance of guessing a new number(by my reasoning) is 55%.
But the data doesn't reflect this.
Your code is for finding the average number of unique guesses made by 100 people each guessing a number from 0 to 100 (random.randint(0, 100) includes both endpoints, so there are 101 possible values).
As for why it converges to a number around 63... you should post your question to the math Stack Exchange.
If this was a completely flat distribution, you would expect the average to come out as 100, meaning everybody's guess was different. However, you know that such a scenario is much less random than a scenario where you have duplication. The fact that you get repeated numbers during a random sequence should be comforting.
All you are doing here is measuring some kind of uniqueness within very small sets: ie 1000 repeats of an experiment involving 100 random values. You might get a better appreciation of this if you use some sort of bootstrapping algorithm to sample from.
Also, if you scale up the number of repeats to millions, and perhaps measure the sample distribution (not just the mean), you'll have a little more confidence in the results you're getting.
It may be that the pseudo-random generator has a characteristic which yields approximately 60-70% non-repeated values inside a sequence the same length as the range. However, you would need to experiment with far more samples, as well as different random seeds. Otherwise your results are meaningless.
I modified your code so it would take an already generated sequence as input, rather than calculating random numbers:
def averager(seqs):
    tests = []
    for s in seqs:
        tests.append(len(set(s)))
    return float(sum(tests))/len(tests)
Then I made a function to return all possible choices for any given number of people and guess range:
import itertools
def combos(n, limit):
    return itertools.product(*((range(limit),) * n))
(One of the things I love about Python is that it's so easy to break apart a function into trivial pieces.)
Then I started testing with increasing numbers:
for n in range(2,100):
    x = averager(combos(n, n))
    print n, x, x/n  # Python 2 print statement
2 1.5 0.75
3 2.11111111111 0.703703703704
4 2.734375 0.68359375
5 3.3616 0.67232
6 3.99061213992 0.66510202332
7 4.62058326038 0.660083322911
8 5.25112867355 0.656391084194
This algorithm has a horrible complexity, so at this point I got a MemoryError. As you can see, the percentage of unique results keeps dropping as the number of people and guess range keeps increasing.
Repeating the test with random numbers:
def rands(repeats, n, limit):
    for i in range(repeats):
        yield [random.randint(0, limit) for j in range(n)]
for n in range(10, 101, 10):
    x = averager(rands(10000, n, n))
    print n, x, x/n  # Python 2 print statement
10 6.7752 0.67752
20 13.0751 0.653755
30 19.4131 0.647103333333
40 25.7309 0.6432725
50 32.0471 0.640942
60 38.3333 0.638888333333
70 44.6882 0.638402857143
80 50.948 0.63685
90 57.3525 0.63725
100 63.6322 0.636322
As you can see the results are consistent with what we saw earlier and with your own observation. I'm sure a bit of combinatorial math could explain it all.
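For what it's worth, the math is short. Any particular value is missed by all n guessers with probability (1 - 1/N)^n, where N is the number of possible values, so by linearity of expectation the expected number of distinct guesses is N * (1 - (1 - 1/N)^n). When n and N are about equal and large, the fraction tends to 1 - 1/e, roughly 0.632. A quick sketch to check this against the numbers above (expected_unique is just a name picked for this example):

```python
import math

def expected_unique(n_guessers, n_values):
    """Expected number of distinct values when each guesser picks uniformly at random."""
    return n_values * (1 - (1 - 1 / n_values) ** n_guessers)

# Exhaustive cases from the table: n people guessing among n values
for n in (2, 3, 4):
    print(n, expected_unique(n, n))

# The original question: 100 people, 101 possible values (0..100 inclusive)
print(expected_unique(100, 101))

# Limiting fraction for large n
print(1 - 1 / math.e)
```

The 100-guesser case comes out at about 63.66, in line with both the 63.6 in the question and the simulated 63.6322.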