I can find decent information on how to pick numbers according to a given set of probabilities with numpy.random.choice, e.g.:
np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
which picks 0 with probability 0.1, 1 with probability 0, 2 with probability 0.3, 3 with probability 0.6, and 4 with probability 0.
What I would like to know is: what function will let me vary the probabilities themselves? For example, one time I might have the probability distribution above, and the next time maybe p=[0.25, 0.1, 0.18, 0.2, 0.27]. I would like to generate such probability distributions on the fly. Is there a Python library that does this?
What I want to do is generate arrays, each of length n, with numbers drawn from a probability distribution such as the ones above.
One good option is the Dirichlet distribution: each sample from it lies on the K-dimensional probability simplex, i.e., it is a vector of K non-negative numbers summing to 1 (exactly the parameter vector of a multinomial distribution).
Naturally there's a convenient numpy function for generating as many such random distributions as you'd like:
import numpy as np

# 10 length-4 probability distributions:
np.random.dirichlet((1, 1, 1, 3), size=10)
And these would get fed to the p= argument in your np.random.choice call.
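A minimal sketch of that pipeline (the shapes and sizes here are just illustrative):
import numpy as np

# One fresh probability vector per row, then one draw of 3 numbers per vector
prob_vectors = np.random.dirichlet((1, 1, 1, 1, 1), size=10)
draws = [np.random.choice(5, 3, p=p) for p in prob_vectors]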
You can consult Wikipedia for more info about how the concentration parameters (the tuple above) affect the sampled distributions.
AFAIK there's no built-in way to do this. You can use roulette wheel selection, which should accomplish what you want.
The basic idea is simple:
import random

def roulette(weights):
    total = sum(weights)
    mark = random.random() * total  # a random point along the cumulative weight
    runner = 0.0
    for index, val in enumerate(weights):
        runner += val
        if runner >= mark:
            return index
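For example, with the weights from the first question (an illustrative check, output varies per run):
picks = [roulette([0.1, 0, 0.3, 0.6, 0]) for _ in range(1000)]
# picks is dominated by index 3, contains some 0s and 2s, and never 1 or 4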
You can read more at https://en.wikipedia.org/wiki/Fitness_proportionate_selection
The goal is to simulate the actual number of occurrences given theoretical probabilities.
For example, take a six-faced biased die whose probabilities of landing on (1, 2, 3, 4, 5, 6) are (0.1, 0.2, 0.15, 0.25, 0.1, 0.2).
Roll the die 1000 times and output the simulated count for each face.
I know numpy.random.choice offers a way to generate each roll, but I need a summary of how many times each face landed.
What is an efficient way to script this in Python?
Numpy can be used to do that easily and very efficiently:
import numpy as np

faces = np.arange(0, 6)  # faces 0-5 stand in for die faces 1-6
faceProbs = [0.1, 0.2, 0.15, 0.25, 0.1, 0.2] # Define the face probabilities
v = np.random.choice(faces, p=faceProbs, size=1000) # Roll the dice for 1000 times
counts = np.bincount(v, minlength=6) # Count the existing occurrences
prob = counts / len(v) # Compute the probability
It can be done without NumPy too:
import random
random.choices([1, 2, 3, 4, 5, 6], weights=[0.1, 0.2, 0.15, 0.25, 0.1, 0.2], k=1000)
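To get the per-face summary the question asks for, one way (a sketch, not part of the original answer) is to tally the result with collections.Counter:
import random
from collections import Counter

rolls = random.choices([1, 2, 3, 4, 5, 6],
                       weights=[0.1, 0.2, 0.15, 0.25, 0.1, 0.2], k=1000)
counts = Counter(rolls)  # e.g. Counter({4: 251, 2: 198, ...}) -- count per face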
I am trying to fit a beta distribution that should be defined between 0 and 1 on a data set that only has samples in a subrange. My problem is that using the fit() function will cause the fitted PDF to be defined only between my smallest and largest values.
For instance, if my dataset has samples between 0.2 and 0.3, what I get is a PDF defined between 0.2 and 0.3, instead of between 0 and 1, as it should be. The code I am using is:
ps1 = beta.fit(selected, loc=0, scale=1)
Am I missing something?
So:
you know that the distribution has lower and upper bounds a = 0 and b = 1,
but the sample does not contain any values close to these limits.
This may happen if the distribution truly is a Beta distribution and the alpha and beta parameters are such that the density near 0 and 1 is zero.
In this case, I would suggest using the maximum likelihood method, restricting the optimization to the alpha and beta parameters, with a and b known.
This is easy with the MaximumLikelihoodFactory class of OpenTURNS, which has a setKnownParameter method. This method lets you restrict which parameters are optimized by the maximum likelihood procedure.
To reproduce this situation, I created a Beta distribution with the following parameters.
import openturns as ot
distribution = ot.Beta(3.0, 2.0, 0.0, 1.0)
sampleSize = 100
sample = distribution.getSample(sampleSize)
Fitting a Beta distribution with known a and b parameters is straightforward.
factory = ot.MaximumLikelihoodFactory(distribution)
factory.setKnownParameter([0.0, 1.0], [2, 3])
inf_distribution = factory.build(sample)
The list [0.0, 1.0] contains the values of the a and b parameters, and [2, 3] gives their positions in the Beta distribution's parameter list.
This produces:
Beta(alpha = 3.02572, beta = 1.88172, a = 0, b = 1)
with the sample I simulated.
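If you are unsure which indices correspond to a and b, the distribution can report its parameter ordering; a small check, assuming the getParameterDescription method of the OpenTURNS API:
print(distribution.getParameterDescription())
# Expected something like [alpha, beta, a, b], so a and b sit at indices 2 and 3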
I came up with a partial solution that does the trick for me: I replicate my samples (for datasets that are too small) and add dummy samples at 0 and 1. Although this increases the fit error, it is low enough for my purposes.
Also, I asked on Google Groups and got an answer that works fine, but it occasionally gives me some errors. I hope this helps anyone with the same problem.
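As a side note (not from the original answers): SciPy's fit can also hold parameters fixed. Passing floc and fscale pins the location and scale, so only alpha and beta are estimated, which avoids the shrunken support without padding the data. A minimal sketch, using synthetic data as a stand-in for the real dataset:
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
selected = rng.beta(3.0, 2.0, size=100)  # stand-in for the real dataset

# floc/fscale fix loc=0 and scale=1; only alpha and beta are optimized
alpha_hat, beta_hat, loc, scale = beta.fit(selected, floc=0, fscale=1)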
Suppose you have a number that you want to represent a total -- let's say it's 123,456,789.
Now, suppose you want to generate some numbers that add up to that number, but with fuzzy weights.
For instance, suppose I want to generate three numbers. The first should be around 60% of the total, but with some small level of variance. The second should be 30% of the total, again with some variance. And the third would end up being about 10%, depending on the other two.
I tried doing it this way:
import numpy as np

percentages = [0.6, 0.3]
total = 123456789
still_need = total
values = []
for i in range(2):
    # Perturb the target percentage with a little Gaussian noise
    x = int(total * (percentages[i] + np.random.normal(scale=0.05)))
    values.append(x)
    still_need = still_need - x
values.append(still_need)
But that doesn't seem very elegant.
Is there a better way?
A clean way to do it would be to draw from a multinomial distribution:
import numpy as np

total = 123456789
percentages = [0.6, 0.3, 0.1]
values = np.random.multinomial(total, percentages)
In this case, the multinomial distribution models rolling a 3-sided die 123456789 times, where the probability of each face turning up is [0.6, 0.3, 0.1]. Calling multinomial() is like running a single trial of this experiment. It returns 3 random integers that sum to 123456789, representing the number of times each face of the die turned up. If you want multiple draws, you can use the size parameter.
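For instance, a quick sketch of multiple draws (the numbers are the ones from the answer above):
import numpy as np

draws = np.random.multinomial(123456789, [0.6, 0.3, 0.1], size=5)
# draws has shape (5, 3); each row sums to 123456789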
I would like to pick a number randomly between 1-100 such that the probability of getting numbers 60-100 is higher than 1-59.
I would like the probabilities to follow a left-skewed distribution over the numbers 1-100, that is, one with a long tail to the left and a peak toward the higher numbers.
Something along the lines:
pers = np.arange(1,101,1)
prob = <left-skewed distribution>
number = np.random.choice(pers, 1, p=prob)
I do not know how to generate a left-skewed discrete probability function. Any ideas? Thanks!
You can do this with the SciPy function skewnorm. It can make a set of positive integers either left- or right-skewed.
import matplotlib.pyplot as plt
from scipy.stats import skewnorm

numValues = 10000
maxValue = 100
skewness = -5  # Negative values are left-skewed, positive values are right-skewed

data = skewnorm.rvs(a=skewness, loc=maxValue, size=numValues)  # Draw skew-normal samples
data = data - min(data)  # Shift the set so the minimum value is equal to zero
data = data / max(data)  # Standardize all the values between 0 and 1
data = data * maxValue   # Multiply the standardized values by the maximum value

# Plot a histogram to check the skewness
plt.hist(data, 30, density=True, color='red', alpha=0.1)
plt.show()
Please reference the documentation here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewnorm.html
The code generates a histogram of the resulting left-skewed distribution.
Like you described, just make sure your skewed distribution adds up to 1.0:
import numpy as np

pers = np.arange(1, 101, 1)
# Make each of the last 41 elements (60-100) 5x more likely
prob = np.array([1.0] * (len(pers) - 41) + [5.0] * 41)
# Normalize so the probabilities sum to 1.0
prob /= np.sum(prob)
number = np.random.choice(pers, 1, p=prob)
The p argument of np.random.choice is the probability associated with each element in the array in the first argument. So something like:
np.random.choice(pers, 1, p=[0.01, 0.01, 0.01, 0.01, ..... , 0.02, 0.02])
Here 0.01 is the lower probability for 1-59 and 0.02 is the higher probability for 60-100; the full vector must still sum to 1 for np.random.choice to accept it.
The NumPy documentation has some useful examples:
http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
EDIT:
You might also try this link and look for a distribution (about halfway down the page) that fits the model you are looking for.
http://docs.scipy.org/doc/scipy/reference/stats.html
I need to generate a vector of random floats in [0, 1] such that their sum equals 1 and the values are distributed non-uniformly.
Is there any Python function that generates such a vector?
Best wishes
The distribution you are probably looking for is called the Dirichlet distribution. There's no built-in function in Python for drawing random numbers from a Dirichlet distribution, but NumPy contains one:
>>> from numpy.random import dirichlet
>>> n = 5  # length of the vector you want
>>> print(dirichlet([1] * n))
This will give you n numbers that sum to 1, with every such combination equally likely (uniform over the simplex).
Alternatively, if you don't have NumPy, you can make use of the fact that a random sample from an n-dimensional Dirichlet distribution can be generated by drawing n independent samples from a gamma distribution with shape and scale parameters equal to 1 and then dividing them by their sum:
>>> from random import gammavariate
>>> def dirichlet(n):
...     samples = [gammavariate(1, 1) for _ in range(n)]
...     sum_samples = sum(samples)
...     return [x / sum_samples for x in samples]
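A quick check of the pure-Python version (output varies, but the sum is always 1 up to floating-point error):
>>> vec = dirichlet(4)
>>> len(vec), abs(sum(vec) - 1.0) < 1e-9
(4, True)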
The reason why you need a Dirichlet distribution is because if you simply draw random numbers uniformly from some interval and then divide them by the sum of them, the resulting distribution will be biased towards samples consisting of roughly equal numbers. See Luc Devroye's book for more on this topic.
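A quick simulation, assuming NumPy, illustrates that bias:
import numpy as np

# Naive approach: normalize uniform draws -- samples crowd around (1/3, 1/3, 1/3)
naive = np.random.uniform(size=(10000, 3))
naive /= naive.sum(axis=1, keepdims=True)

# Dirichlet with all parameters 1: uniform over the simplex
flat = np.random.dirichlet([1, 1, 1], size=10000)

# The per-coordinate spread of the naive samples is visibly smaller
print(naive.std(axis=0), flat.std(axis=0))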
There is a nicer example on the Wikipedia page for the Dirichlet distribution.
The code below generates a k-dimensional sample:
import random

params = [a1, a2, ..., ak]  # concentration parameters
sample = [random.gammavariate(a, 1) for a in params]
total = sum(sample)
sample = [v / total for v in sample]