Trouble with calculating random occurrence - python

Let's say I have this code:
import random

def fooFunc():
    return 1
What is the overall chance of fooFunc being executed when using the code below?
if random.randrange(4096) == 1:
    fooFunc()
if random.randrange(256) == 1:
    fooFunc()

I'd suggest this isn't really a Python problem and would be better suited to https://math.stackexchange.com/, where probability questions belong.
As random.randrange(x) produces an integer between 0 and x (including 0, but NOT including x), each specific value has a 1/x probability of being produced.
Please see Neil Slater's answer for calculating the specific probability in your situation.
(Please see here if you want to look at the internals of random.randrange(): How does a randrange() function work?)

Each call to random.randrange can be treated as an independent random selection, provided you don't know the seed and are happy to treat the output of a PRNG as a random variable.
What's the overall chance of fooFunc being executed?
Assuming you don't care about tracking whether fooFunc is called twice?
This is just the normal probability calculation, similar to "what is the chance of rolling at least one 6 when I roll two dice". To do this, it is easier to re-formulate the question as "What is the probability that I don't roll any 6", and subtract that from 1.0, because there is only one combination of failing both checks, whilst there are 3 combinations of succeeding one or other or both.
So p = 1 - ((4095/4096) * (255/256))
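That calculation is easy to check numerically. The sketch below (not part of the original answer) computes p directly and then sanity-checks it with a quick Monte Carlo run; the seed and trial count are arbitrary choices:

```python
import random

# Direct calculation: P(at least one of the two checks succeeds)
p = 1 - (4095 / 4096) * (255 / 256)
print(p)  # roughly 0.00415, i.e. about 0.4%

# Monte Carlo sanity check (seeded only for reproducibility)
random.seed(42)
trials = 200_000
hits = 0
for _ in range(trials):
    # 'or' matches the "at least one call" question
    if random.randrange(4096) == 1 or random.randrange(256) == 1:
        hits += 1
print(hits / trials)  # should land close to p
```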

Related

Is this considered a fitness function for a genetic algorithm?

I start off with a population. I also have properties each individual in the population can have. If an individual DOES have the property, its score goes up by 5. If it DOESN'T have it, its score increases by 0.
Example code using length as a property:
for x in individual:
    if len(x) < 5:
        score += 5
    else:
        score += 0
Then I add up the total score and select the individuals I want to continue. Is this a fitness function?
Anything can be a fitness function as long as it gives better scores to better DNA. The code you wrote looks like it scores a single gene of a DNA rather than enforcing a constraint. If it were a constraint, you'd give it a growing score penalty (is this a minimization of score?) depending on the distance to the constraint point, so that the selection/crossover stage could prioritize the DNAs closer to 5 and deprioritize the distant ones. But currently it reads as "anything on the passing side of 5 works equally well", so there will be a lot of random solutions with high diversity rather than values like 4.9, 4.99, etc., even if you apply elitism.
If there are many variables like "len" with equal score, then one gene's failure could be shadowed by another gene's success. To prevent this, you can give them different scores like 5, 10, 20, 40, ... so that selection and crossover can tell whether real progress was made without any failure.
If you meant a constraint by that 5, then you should tell the selection that "failed" values closer to 5 (i.e. 4, 4.5, 4.9, 4.99) are better than distant ones, by applying a variable score like this:
if gene < constraint_value:
    score += (constraint_value - gene) ** 2
# if you meant to add zero otherwise, there's no need to add zero
In comments, you said this is for molecular computations. Molecules have floating-point coordinates and masses, so if you are optimizing those, a constraint with a variable penalty will make it easier for the selection to keep better groups of DNAs for future generations, especially if the mutation adds onto the current value of a gene rather than setting it to a totally random value.
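To make the graded-penalty idea concrete, here is a minimal sketch. The gene layout and the target value 5 are assumptions for illustration only, not the asker's actual representation:

```python
def fitness(individual, constraint_value=5.0):
    """Lower is better: a quadratic penalty that grows with distance
    from the constraint, so near-misses score better than far misses."""
    score = 0.0
    for gene in individual:
        if gene < constraint_value:
            score += (constraint_value - gene) ** 2
    return score

# An individual whose genes sit close to 5 is penalized far less than
# one whose genes are far away, so selection can tell them apart:
print(fitness([4.9, 4.99]))  # small penalty
print(fitness([1.0, 2.0]))   # much larger penalty
```

With the original all-or-nothing +5 scoring, both individuals above could end up with identical scores, which is exactly the diversity problem the answer describes.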

Fitness function with multiple weights in DEAP

I'm learning to use the Python DEAP module and I have created a minimising fitness function and an evaluation function. The code I am using for the fitness function is below:
ct.create("FitnessFunc", base.Fitness, weights=(-0.0001, -100000.0))
Notice the very large difference in weights. This is because the DEAP documentation for Fitness says:
The weights can also be used to vary the importance of each objective one against another. This means that the weights can be any real number and only the sign is used to determine if a maximization or minimization is done.
To me, this says that you can prioritise one weight over another by making it larger.
I'm using algorithms.eaSimple (with a HallOfFame) to evolve and the best individuals in the population are selected with tools.selTournament.
The evaluation function returns abs(sum(input)), len(input).
After running, I take the values from the HallOfFame and evaluate them, however, the output is something like the following (numbers at end of line added by me):
(154.2830144, 3) 1
(365.6353634, 4) 2
(390.50576340000003, 3) 3
(390.50576340000003, 14) 4
(417.37616340000005, 4) 5
The thing that is confusing me is that I thought that the documentation stated that the larger second weight meant that len(input) would have a larger influence and would result in an output like so:
(154.2830144, 3) 1
(365.6353634, 4) 2
(390.50576340000003, 3) 3
(417.37616340000005, 4) 5
(390.50576340000003, 14) 4
Notice that lines 4 and 5 are swapped. This is because the second value of line 4 was much larger than that of line 5, and the second objective carries the larger weight.
It appears that the fitness is actually evaluated based on the first element first, and then the second element is only considered if there is a tie between the first elements. If this is the case, then what is the purpose of setting a weight other than -1 or +1?
From a Pareto-optimality standpoint, neither of the two solutions A=(390.50576340000003, 14) and B=(417.37616340000005, 4) is superior to the other, regardless of the weights; f1(A) < f1(B) while f2(A) > f2(B), and therefore neither dominates the other (source):
If they are on the same frontier, the winner can then be selected based on a secondary metric: the density of solutions surrounding each solution in the frontier, which does account for the weights (weighted crowding distance), provided you select an appropriate operator, like selNSGA2. The selTournament operator you are using selects on the basis of the first objective only:
def selTournament(individuals, k, tournsize, fit_attr="fitness"):
    chosen = []
    for i in xrange(k):
        aspirants = selRandom(individuals, tournsize)
        chosen.append(max(aspirants, key=attrgetter(fit_attr)))
    return chosen
If you still want to use that, you can consider updating your evaluation function to return a single output of the weighted sum of the objectives. This approach would fail in the case of a non-convex objective space though (Page 12 here for details).
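A minimal sketch of that weighted-sum reformulation follows. The weight constants are illustrative assumptions taken from the question's original weights; in DEAP you would register this with toolbox.register("evaluate", ...) and declare a single-objective minimizing fitness with weights=(-1.0,):

```python
# Collapse the two objectives into one scalar so that selTournament's
# single comparison becomes meaningful. W_SUM and W_LEN are assumed values.
W_SUM = 0.0001
W_LEN = 100000.0

def evaluate(individual):
    # DEAP expects a tuple of objectives, hence the trailing comma
    return (W_SUM * abs(sum(individual)) + W_LEN * len(individual),)

print(evaluate([1.0, -2.5, 4.0]))
```

Because the length term is weighted so heavily, a 3-element individual will always beat a 14-element one here, which is the ordering the question expected.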

a question about "randoms and my question"

example:
import random
random.seed(10)
n1=random.randint(1,5)
n2=random.randint(1,5)
print(n1,n2) # => 5,1
I am not good at English, so I used a translator. Please understand if it's awkward.
If the same number is put in the parentheses after 'seed', does the same expression always produce the same values? I am wondering what the number in the parentheses does. When I run the expression without a seed number, the values change every time.
Given two random instances with the same seed, the nth call to randint on the first instance will yield the same number as the nth call on the second instance.
That does not mean that the random value returned across multiple calls for the same instance will be the same.
You will see the same ordered series of values, meaning if you were to run your python program at some different time, you would see the output 5,1 once again.
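This reproducibility is easy to demonstrate: re-seeding with the same value replays the identical sequence of draws.

```python
import random

random.seed(10)
first = [random.randint(1, 5) for _ in range(5)]

random.seed(10)  # re-seed with the same value
second = [random.randint(1, 5) for _ in range(5)]

print(first == second)  # True: the two runs produce the identical sequence
```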

Exponentially distributed random generator (log function) in python?

I really need help as I am stuck at the beginning of the code.
I am asked to create a function to investigate the exponential distribution on a histogram. The function is x = −log(1−y)/λ. λ is a constant, which I referred to as lamdr in the code and simply gave the value 10. I gave N (the number of random numbers) the value 10 and ran the code, yet the results and the generated random numbers gave me totally different results. Below you can find the code; I don't know what went wrong, hope you guys can help me!! (I use Python 2.)
import random
import math

N = raw_input('How many random numbers you request?: ')
N = int(N)
lamdr = raw_input('Enter a value:')
lamdr = int(lamdr)

def exprand(lamdr):
    y = []
    for i in range(N):
        y.append(random.uniform(0, 1))
    return y

y = exprand(lamdr)
print 'Randomly generated numbers:', (y)

x = []
for w in y:
    x.append((math.log((1 - w) / lamdr)) * -1)
print 'Results:', x
After viewing the code you provided, it looks like you have the pieces you need but you're not putting them together.
You were asked to write function exprand(lambdr) using the specified formula. Python already provides a function called random.expovariate(lambd) for generating exponentials, but what the heck, we can still make our own. Your formula requires a "random" value for y which has a uniform distribution between zero and one. The documentation for the random module tells us that random.random() will give us a uniform(0,1) distribution. So all we have to do is replace y in the formula with that function call, and we're in business:
def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr
An historical note: Mathematically, if y has a uniform(0,1) distribution, then so does 1-y. Implementations of the algorithm dating back to the 1950's would often leverage this fact to simplify the calculation to -math.log(random.random()) / lambdr. Mathematically this gives distributionally correct results since P{X = c} = 0 for any continuous random variable X and constant c, but computationally it will blow up in Python for the 1 in 2^64 occurrence where you get a zero from random.random(). One historical basis for doing this was that when computers were many orders of magnitude slower than now, ditching the one additional arithmetic operation was considered worth the minuscule risk. Another was that Prime Modulus Multiplicative PRNGs, which were popular at the time, never yield a zero. These days it's primarily of historical interest, and an interesting example of where math and computing sometimes diverge.
Back to the problem at hand. Now you just have to call that function N times and store the results somewhere. Likely candidates to do so are loops or list comprehensions. Here's an example of the latter:
abuncha_exponentials = [exprand(0.2) for _ in range(5)]
That will create a list of 5 exponentials with λ=0.2. Replace 0.2 and 5 with suitable values provided by the user, and you're in business. Print the list, make a histogram, use it as input to something else...
Replacing exprand with random.expovariate in the list comprehension should produce equivalent results using Python's built-in exponential generator. That's the beauty of functions as an abstraction: once somebody writes them, you can just use them to your heart's content.
Note that because of the use of randomness, this will give different results every time you run it unless you "seed" the random generator to the same value each time.
What @pjs wrote is true up to a point. While the statement "mathematically, if y has a uniform(0,1) distribution, so does 1-y" is correct, the proposal to replace the code with -math.log(random.random()) / lambdr is just wrong. Why? Because Python's random module provides U(0,1) in the range [0,1) (as mentioned here), which makes such a replacement non-equivalent.
In more layman's terms: if your U(0,1) generator actually produces numbers in the [0,1) range, then the code
import math
import random

def exprand(lambd):  # 'lambda' is a reserved word in Python, so use another name
    return -math.log(1.0 - random.random()) / lambd
is correct, but code
import math
import random

def exprand(lambd):
    return -math.log(random.random()) / lambd
is wrong: it will occasionally raise an exception (math domain error), because log(0) will be called whenever random.random() returns exactly 0.
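The failure mode is easy to demonstrate by feeding in the boundary value directly, standing in for the rare run where random.random() returns 0:

```python
import math

try:
    -math.log(0.0)  # what -math.log(random.random()) does when the draw is 0
except ValueError as exc:
    print("log(0) raised:", exc)  # math domain error
```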

How to do binomial distribution in python where trial probabilities are unequal

I know how to do a standard binomial distribution in Python where the probability of each trial is the same. My question is what to do if the trial probabilities change each time. I'm drafting an algorithm based on the paper below, but thought I should check on here to see whether there's already a standard way to do it.
http://www.tandfonline.com/doi/abs/10.1080/00949658208810534#.UeVnWT6gk6w
Thanks in advance,
James
Is this kind of what you are looking for?
import numpy as np

def random_MN_draw(n, probs):  # n=2 since binomial
    """ get X random draws from the multinomial distribution whose probability is given by 'probs' """
    mn_draw = np.random.multinomial(n, probs)  # do 1 multinomial experiment with the given probs; with probs=[0.5, 0.5] this is a fair coin-flip
    return mn_draw

def simulate(sim_probabilities):
    len_sim = len(sim_probabilities)
    simulated_flips = np.zeros((2, len_sim))
    for i in range(0, len_sim):
        simulated_flips[:, i] = random_MN_draw(2, sim_probabilities[i])
    # Here, at the end of the simulation, you can count the number of heads
    # in 'simulated_flips' to get your MLE's on P(H) and P(T).
    return simulated_flips
Suppose you want to do 9 coin tosses, and P(H) on each flip is 0.1 .. 0.9, respectively: a 10% chance of a head on the first flip, 90% on the last.
For E(H), the expected number of heads, you can just sum the 9 individual expectations.
For a distribution, you could enumerate the ordered possible outcomes (itertools.product(["H", "T"], repeat=9))
(HHH HHH HHH)
(HHH HHH HHT)
...
(TTT TTT TTT)
and calculate a probability for the ordered outcome in a straightforward manner.
For each ordered outcome, increment a defaultdict(float), indexed by the number of heads, by the calculated p.
When done, compute the sum of the dictionary values, then divide every value in the dictionary by that sum.
You'll have 10 values that correspond to the chances of observing 0 .. 9 heads.
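Putting those steps together, a sketch of the enumeration (assuming the 0.1 .. 0.9 flip probabilities above):

```python
from collections import defaultdict
from itertools import product

probs = [i / 10 for i in range(1, 10)]  # P(H) for each of the 9 flips

dist = defaultdict(float)
for outcome in product("HT", repeat=len(probs)):
    # probability of this exact ordered outcome
    p = 1.0
    for flip, p_head in zip(outcome, probs):
        p *= p_head if flip == "H" else (1.0 - p_head)
    dist[outcome.count("H")] += p

# dist[k] is the chance of observing exactly k heads; since every ordered
# outcome was enumerated, the ten values already sum to 1.
for k in sorted(dist):
    print(k, dist[k])
```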
Gerry
Well, the question is old and I can't answer it fully since I don't know Python's math libraries well enough.
However, it might be helpful to other readers to know that this distribution often runs under the name
Poisson binomial distribution
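For what it's worth, sampling from that distribution needs nothing beyond the standard library: run each unequal-probability Bernoulli trial independently and count the successes. This is a sketch, not from the original answers:

```python
import random

def poisson_binomial_draw(probs):
    """One draw: the number of successes across independent trials,
    where probs[i] is the success probability of trial i."""
    return sum(random.random() < p for p in probs)

random.seed(0)  # seeded only so the demo is reproducible
probs = [i / 10 for i in range(1, 10)]  # the 0.1 .. 0.9 example again
draws = [poisson_binomial_draw(probs) for _ in range(10_000)]
print(sum(draws) / len(draws))  # sample mean, close to sum(probs) = 4.5
```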
