I would like to pick a number randomly between 1-100 such that the probability of getting numbers 60-100 is higher than 1-59.
I would like to have the probability to be a left-skewed distribution for numbers 1-100. That is to say, it has a long tail and a peak.
Something along the lines:
pers = np.arange(1,101,1)
prob = <left-skewed distribution>
number = np.random.choice(pers, 1, p=prob)
I do not know how to generate a left-skewed discrete probability function. Any ideas? Thanks!
You can do this with SciPy's skewnorm distribution, which can produce either left-skewed or right-skewed samples. The code below rescales the samples to the range 0 to maxValue; round them if you need integers.
from scipy.stats import skewnorm
import matplotlib.pyplot as plt
numValues = 10000
maxValue = 100
skewness = -5  # Negative values are left-skewed, positive values are right-skewed.
random = skewnorm.rvs(a=skewness, loc=maxValue, size=numValues)  # Draw samples from the skew-normal distribution.
random = random - min(random)  # Shift the set so the minimum value is equal to zero.
random = random / max(random)  # Standardize all the values between 0 and 1.
random = random * maxValue  # Multiply the standardized values by the maximum value.
#Plot histogram to check skewness
plt.hist(random,30,density=True, color = 'red', alpha=0.1)
plt.show()
Please reference the documentation here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewnorm.html
The code generates the following plot.
[Figure: histogram of left-skewed distribution]
Like you described, just make sure your skewed distribution adds up to 1.0:
import numpy as np
pers = np.arange(1,101,1)
# Make each of the last 41 elements (60-100) 5x more likely
prob = np.array([1.0]*(len(pers)-41) + [5.0]*41)
# Normalising to 1.0
prob /= np.sum(prob)
number = np.random.choice(pers, 1, p=prob)
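The same normalise-then-choose pattern also works for a smooth left-skewed shape instead of a two-level step. The sketch below is only one possible choice: the Beta(5, 2)-shaped weights are an assumption made for illustration and do not come from any of the answers here.

```python
import numpy as np

# Support: the integers 1..100, as in the question.
pers = np.arange(1, 101)

# Assumed shape (not from the original answers): weights proportional to a
# Beta(5, 2) density evaluated on the grid, which peaks near 80 and has a
# long tail towards 1.
x = pers / 101.0                  # map 1..100 into the open interval (0, 1)
weights = x**4 * (1.0 - x)        # unnormalised Beta(5, 2) density

# Normalise so the probabilities sum to 1, as the p argument requires.
prob = weights / weights.sum()

rng = np.random.default_rng(0)    # seeded only to make the sketch repeatable
numbers = rng.choice(pers, size=10, p=prob)
```

Changing the two exponents moves the peak and changes the length of the tail, so they are worth experimenting with.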
The p argument of np.random.choice is the probability associated with each element in the array in the first argument. So something like:
np.random.choice(pers, 1, p=[0.01, 0.01, 0.01, 0.01, ..... , 0.02, 0.02])
Here 0.01 stands for the lower probability for 1-59 and 0.02 for the higher probability for 60-100. The values are only illustrative: the full list must sum to 1.
The NumPy documentation has some useful examples.
http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
EDIT:
You might also try this link and look for a distribution (about half way down the page) that fits the model you are looking for.
http://docs.scipy.org/doc/scipy/reference/stats.html
I'm looking to create a random function where 100 is the rarest number while 1 is the most common. It should be a linear distribution so for example the function returning 100 is the lowest chance, then 99 is the second lowest, then 98 is the third lowest and so forth. I've tried this code below:
import math
import random
def getPercentContent():
    minPercent = 5
    maxPercent = 102 # this will return 101 as highest number
    power = 1.5 # higher number, more concentration to lower numbers
    num = math.floor(minPercent+(maxPercent-minPercent)*random.random()**power)
    return str(num)
This does return a lot more low numbers in the range 1-10, but after that, since it's an exponential function, numbers 10-100 have a very similar count.
Is there any way to create a linear distribution like below:
For the distribution you're looking for, simply generate two numbers in the range and take their minimum. Here is an example:
min(random.randint(minPercent,maxPercent-1),
random.randint(minPercent,maxPercent-1))
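To see why taking the minimum of two draws gives a linear fall-off, the exact probabilities can be worked out directly. This is a small sketch; the helper names min_of_two and min_of_two_pmf are made up for illustration.

```python
from fractions import Fraction
import random

def min_of_two(low, high):
    """Return the smaller of two independent uniform integers in [low, high]."""
    return min(random.randint(low, high), random.randint(low, high))

def min_of_two_pmf(low, high):
    """Exact probability of each outcome k of min_of_two(low, high)."""
    n = high - low + 1
    # The minimum equals k when both draws are >= k but not both >= k + 1.
    return {k: Fraction((high - k + 1) ** 2 - (high - k) ** 2, n * n)
            for k in range(low, high + 1)}

pmf = min_of_two_pmf(1, 100)
```

Each step from k to k + 1 lowers the probability by the same amount, 2/n², which is exactly the linear shape asked for.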
You just need to return an inverse of whatever number you're inputting and then normalise all values by the sum of all frequencies to get percentages:
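def frequency(start: int = 1, end: int = 100):
    # Weight each n by 1/n; start must be at least 1 to avoid dividing by zero.
    freq = [1/n for n in range(start, end + 1)]  # end is inclusive
    total = sum(freq)
    perc = [k/total for k in freq]
    return perc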
What you're describing is a triangular distribution with its mode (most frequently occurring value) equal to the min. These are built-in as a continuous distribution in the random module or (if you want a lot of them fast) in numpy. If you want integer outcomes from 1 to 100, inclusive, generate with min = mode = 0 and max = 100, take the floor, and add 1 to the result.
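As a rough sketch of the floor-and-shift recipe using only the standard library (the function name linear_pick is hypothetical):

```python
import random

def linear_pick(low=1, high=100):
    """Integer in [low, high] whose probability falls off linearly from low to high."""
    span = high - low + 1
    # random.triangular(low, high, mode): setting mode equal to the lower
    # bound puts the peak of the density at the minimum.
    return int(random.triangular(0, span, 0)) + low

draws = [linear_pick() for _ in range(10)]
```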
The following code generates and plots a million triangular values in half a second on my laptop:
from numpy.random import default_rng
import matplotlib.pyplot as plt
rng = default_rng()
data = rng.triangular(0, 0, 100, size = 1000000).astype(int) + 1
h = plt.hist(data, bins=100, density=True)
plt.show()
Sample output:
I am new to numpy. I want to generate random values from 0 to 1 with random distribution using numpy, the known input is the standard deviation = 0.2 and the Mean = 0.55 and no. of population = 1000. I used this code:
number = np.random.normal(avg, std_dev, num_pop).round(2)
However it generated number with negative values and also values greater than 1. How to limit the value from 0 to 1?
The normal distribution has no lower or upper bound, and sampling from it and discarding the results outside your bounds until you get the 1000 necessary points would be awkward.
Luckily there's the truncated normal distribution:
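from scipy import stats
low = 0
high = 1
mean = 0.55
stddev = 0.2
num_pop = 1000
# truncnorm expects its bounds in units of standard deviations from the mean
a = (low - mean) / stddev
b = (high - mean) / stddev
number = stats.truncnorm.rvs(a, b,
                             loc = mean, scale = stddev,
                             size = num_pop)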
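For comparison, the discard-and-redraw approach mentioned above can be sketched with NumPy alone. The function name bounded_normal and its parameters are made up for this example. Note that after truncation the sample mean and standard deviation will differ slightly from the requested 0.55 and 0.2.

```python
import numpy as np

def bounded_normal(mean, stddev, size, low=0.0, high=1.0, seed=None):
    """Draw `size` normal samples, redrawing any that fall outside [low, high]."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, stddev, size)
    outside = (samples < low) | (samples > high)
    while outside.any():
        # Redraw only the rejected entries, then check them again.
        samples[outside] = rng.normal(mean, stddev, outside.sum())
        outside = (samples < low) | (samples > high)
    return samples

number = bounded_normal(0.55, 0.2, 1000, seed=0).round(2)
```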
I faced the same issue. Here's what I used to resolve it:
number = abs(np.random.normal(0,1,1))
if number > 1:
    number = number - 1
abs finds the absolute value of number, making it positive.
The if statement at the end is meant to keep the number from going over one, because we are using a normal distribution which does not have an upper or lower limit. Note that it subtracts 1 only once, so a draw above 2 still ends up above 1.
Hope this helped!
You can use
number = numpy.random.rand(1).round(2)
It will generate a uniformly distributed random number in the interval [0,1), rounded to two decimals.
While I can find decent information on how to generate numbers based on probabilities for picking each number with numpy.random.choice e.g.:
np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
which picks 0 with probability p =.1, 1 with p = 0, 2 with p = .3, 3 with p = .6 and 4 with p = 0.
What I would like to know is, what function is there that will vary the probabilities? So for example, one time I might have the probability distribution above and the next maybe p=[0.25, .1, 0.18, 0.2, .27]. So I would like to generate probability distributions on the fly. Is there a Python library that does this?
What I am wanting to do is to generate arrays, each of length n with numbers from some probability distribution, such as above.
One good option is the Dirichlet distribution: each sample from it is a vector of K non-negative numbers that sum to 1 (a point on the probability simplex), so it can be used directly as the probabilities of a multinomial distribution.
Naturally there's a convenient numpy function for generating as many such random distributions as you'd like:
# 10 length-4 probability distributions:
np.random.dirichlet((1,1,1,3),size = 10)
And these would get fed to the p= argument in your np.random.choice call.
You can consult Wikipedia for more info about how the tuple parameter affects the sampled multinomial distributions.
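Putting the two steps together might look like the sketch below. The all-ones concentration parameter is an assumption chosen for illustration; it makes every probability vector equally likely.

```python
import numpy as np

rng = np.random.default_rng(0)         # seeded only to make the sketch repeatable

k = 5                                  # number of outcomes, as in the question
p = rng.dirichlet(np.ones(k))          # one random probability vector of length k
sample = rng.choice(k, size=3, p=p)    # three draws from 0..k-1 with those odds
```

Calling rng.dirichlet again yields a fresh probability vector, so each generated array can use its own distribution.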
AFAIK there's no inbuilt way to do this. You can do roulette wheel selection which should accomplish what you want.
The basic idea is simple:
import random
def roulette(weights):
    total = sum(weights)
    mark = random.random() * total
    runner = 0
    for index, val in enumerate(weights):
        runner += val
        if runner >= mark:
            return index
You can read more at https://en.wikipedia.org/wiki/Fitness_proportionate_selection
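As a variation on the same idea, the running total can be precomputed and searched with bisect, and the standard library's random.choices performs weighted selection directly. This is a sketch; roulette_bisect is a hypothetical name.

```python
import bisect
import itertools
import random

def roulette_bisect(weights):
    """Pick an index with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))
    mark = random.random() * cumulative[-1]
    # Find the first running total that exceeds the mark; the search is
    # capped at the last index to guard against floating-point rounding.
    return bisect.bisect_right(cumulative, mark, 0, len(cumulative) - 1)

weights = [0.1, 0, 0.3, 0.6, 0]
index = roulette_bisect(weights)

# The standard library offers the same kind of selection directly:
index_builtin = random.choices(range(len(weights)), weights=weights, k=1)[0]
```

If many draws are needed from the same weights, computing the cumulative list once and reusing it avoids repeating the summation on every call.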
I have a list ordered by some quality function from which I'd like to take elements, preferring the good elements at the beginning of the list.
Currently, my function to generate the random indices looks essentially as follows:
import itertools
import random
def pick():
    p = 0.2
    for i in itertools.count():
        if random.random() < p:
            break
    return i
It does a good job, but I wonder:
What's the name of the generated random distribution?
Is there a built-in function in Python for that distribution?
What you are describing sounds a lot like the exponential distribution. It already exists in the random module.
Here is some code that takes just the integer part of samples from an exponential distribution with a mean of 100 (that is, a rate parameter of 1/100).
import random
import numpy as np
import matplotlib.pyplot as plt
d = [int(random.expovariate(1/100)) for i in range(10000)]
h,b = np.histogram(d, bins=np.arange(0,max(d)))
plt.bar(b[:-1], h, ec='none', width=1)
plt.show()
You could simulate it via the exponential distribution, but that is like making a square peg fit a round hole. As Mark said, it is the geometric distribution: discrete, and shifted by 1. It is available directly in NumPy:
import numpy as np
import random
import itertools
import matplotlib.pyplot as plt
p = 0.2
def pick():
    for i in itertools.count():
        if random.random() < p:
            break
    return i
q = np.random.geometric(p, size = 100000) - 1
z = [pick() for i in range(100000)]
bins = np.linspace(-0.5, 30.5, 32)
plt.hist(q, bins, alpha=0.2, label='geom')
plt.hist(z, bins, alpha=0.2, label='pick')
plt.legend(loc='upper right')
plt.show()
Output:
random.random() samples from a uniform distribution, but there are other methods within random that would also work. For your given use case, I would suggest random.expovariate(2) (Documentation, Wikipedia). This is an exponential distribution that will heavily prefer lower values. If you google some of the other methods listed in the documentation, you can find some other built-in distributions.
Edit: Be sure to play around with the argument value for expovariate. Also note that it doesn't guarantee a value less than 1, so you might need to ensure that you only use values less than 1.
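For the pick() loop in the question, the geometric distribution can also be sampled in closed form without looping. This is a sketch using only the standard library; pick_geometric and geometric_pmf are made-up names.

```python
import math
import random

def pick_geometric(p=0.2):
    """Number of failures before the first success when each trial succeeds with probability p."""
    u = 1.0 - random.random()          # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p))

def geometric_pmf(i, p=0.2):
    """Probability that the loop-based pick() returns i."""
    return (1.0 - p) ** i * p

index = pick_geometric()
```

The result can be arbitrarily large, so when it is used as a list index it still has to be checked against the length of the list.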
I want to specify the probability density function of a distribution and then pick up N random numbers from that distribution in Python. How do I go about doing that?
In general, you want to have the inverse cumulative distribution function. Once you have that, generating random numbers from the distribution is simple:
import random
def sample(n):
    return [ icdf(random.random()) for _ in range(n) ]
Or, if you use NumPy:
import numpy as np
def sample(n):
    return icdf(np.random.random(n))
In both cases icdf is the inverse cumulative distribution function which accepts a value between 0 and 1 and outputs the corresponding value from the distribution.
To illustrate the nature of icdf, we'll take a simple uniform distribution between values 10 and 12 as an example:
The probability density function is 0.5 between 10 and 12, zero elsewhere.
The cumulative distribution function is 0 below 10 (no samples below 10), 1 above 12 (no samples above 12) and increases linearly between the values (integral of the PDF).
The inverse cumulative distribution function is only defined between 0 and 1. At 0 it is 10, at 1 it is 12, and it changes linearly between the values.
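For this uniform example the inverse CDF can be written down directly, which gives a minimal concrete version of icdf:

```python
import random

def icdf(u):
    """Inverse CDF of the uniform distribution on [10, 12]."""
    return 10.0 + 2.0 * u

def sample(n):
    """Draw n values by pushing uniform random numbers through the inverse CDF."""
    return [icdf(random.random()) for _ in range(n)]

values = sample(5)
```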
Of course, the difficult part is obtaining the inverse cumulative distribution function. It really depends on your distribution: sometimes you may have an analytical function, sometimes you may want to resort to interpolation. Numerical methods may be useful, as numerical integration can be used to create the CDF and interpolation can be used to invert it.
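A rough sketch of the numerical route, assuming the PDF is strictly positive on a bounded interval. The helper name make_icdf and the example density are made up for illustration.

```python
import numpy as np

def make_icdf(pdf, low, high, points=2001):
    """Tabulate the CDF of `pdf` on [low, high] and return an interpolating inverse."""
    x = np.linspace(low, high, points)
    y = pdf(x)
    # Trapezoidal rule: cumulative area under the PDF, starting at 0.
    areas = (y[1:] + y[:-1]) / 2 * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    cdf /= cdf[-1]                     # normalise so the CDF ends at exactly 1
    # Swapping the roles of x and cdf in the interpolation inverts the CDF.
    return lambda u: np.interp(u, cdf, x)

# Hypothetical example density: proportional to x on [0, 1], so icdf(u) = sqrt(u).
icdf = make_icdf(lambda x: x, 0.0, 1.0)
samples = icdf(np.random.default_rng(0).random(5))
```

Because the table is normalised inside the function, the PDF passed in does not need to integrate to 1.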
This is my function to retrieve a single random number distributed according to the given probability density function. I used a Monte-Carlo like approach. Of course n random numbers can be generated by calling this function n times.
import numpy as np
def draw_random_number_from_pdf(pdf, interval, pdfmax = 1, integers = False, max_iterations = 10000):
    """
    Draws a random number from given probability density function.
    Parameters
    ----------
    pdf -- the function pointer to a probability density function of form P = pdf(x)
    interval -- the resulting random number is restricted to this interval
    pdfmax -- the maximum of the probability density function
    integers -- boolean, indicating if the result is desired as integer
    max_iterations -- maximum number of 'tries' to find a combination of random numbers (rand_x, rand_y) located below the function value calc_y = pdf(rand_x).
    returns a single random number according to the pdf distribution.
    """
    for i in range(max_iterations):
        if integers:
            # note: np.random.randint excludes the upper bound interval[1]
            rand_x = np.random.randint(interval[0], interval[1])
        else:
            rand_x = (interval[1] - interval[0]) * np.random.random(1) + interval[0] #(b - a) * random_sample() + a
        rand_y = pdfmax * np.random.random(1)
        calc_y = pdf(rand_x)
        if rand_y <= calc_y:
            return rand_x
    raise Exception("Could not find a matching random number within pdf in " + str(max_iterations) + " iterations.")
In my opinion this solution is performing better than other solutions if you do not have to retrieve a very large number of random variables. Another benefit is that you only need the PDF and avoid calculating the CDF, inverse CDF or weights.
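If many values are needed, the same accept-or-reject idea can be applied to whole arrays at once. This is a sketch rather than a drop-in replacement: sample_from_pdf is a hypothetical name, and unlike the function above it has no iteration cap, so the PDF must be positive somewhere on the interval.

```python
import numpy as np

def sample_from_pdf(pdf, low, high, pdfmax, n, seed=None):
    """Rejection sampling: keep each uniform x whose uniform y falls under pdf(x)."""
    rng = np.random.default_rng(seed)
    accepted = np.empty(0)
    while accepted.size < n:
        x = rng.uniform(low, high, n)          # candidate values
        y = rng.uniform(0.0, pdfmax, n)        # heights to compare against the PDF
        accepted = np.concatenate((accepted, x[y <= pdf(x)]))
    return accepted[:n]

# Hypothetical example density: 2x on [0, 1], whose maximum is 2.
values = sample_from_pdf(lambda x: 2.0 * x, 0.0, 1.0, pdfmax=2.0, n=1000, seed=0)
```

As with the single-value version, pdfmax must be at least as large as the true maximum of the PDF, otherwise the peaks are under-sampled.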