Select a non-uniformly distributed random element from an array - python

I'm trying to pick numbers from an array at random.
I can easily do this by picking an element using np.random.randint(len(myArray)), but that gives a uniform distribution.
For my needs I need to pick numbers with a higher probability of landing near the beginning of the array, so I think something like an exponential probability function would suit better.
Is there a way for me to generate a random integer in a range (1, 1000) using an exponential (or other, non-uniform distribution) to use as an array index?

You can pass an exponential probability vector to NumPy's choice function. The probability vector must sum to 1, so normalize it by dividing by the sum of all the weights.
import numpy as np
from numpy.random import choice
arr = np.arange(0, 1001)
prob = np.exp(-arr / 1000)  # negative exponent favors small indices; the /1000 sets the decay rate
rand_draw = choice(arr, 1, p=prob / prob.sum())
To check that the draws follow the exponential shape, you can plot a histogram of 100000 random draws between 0 and 1000.
import matplotlib.pyplot as plt
# above code here
rand_draw = choice(arr, 100000, p=prob/sum(prob))
plt.hist(rand_draw, bins=100)
plt.show()
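As an alternative sketch (not from the answer above), you could also draw a continuous exponential variate with the standard library and truncate it to an integer index, rejecting draws that fall past the end of the array. The scale value of 100 here is an assumed tuning parameter, not anything from the original question:

```python
import random

def exp_index(n=1000, scale=100.0):
    """Draw an index in [0, n) with exponentially decaying probability.

    scale controls how strongly small indices are preferred; draws
    at or beyond n are rejected and retried.
    """
    while True:
        i = int(random.expovariate(1.0 / scale))
        if i < n:
            return i

random.seed(0)
draws = [exp_index() for _ in range(10000)]
# small indices dominate: the mean sits near `scale`, far below n/2
print(sum(draws) / len(draws))
```

Rejection wastes very few draws here, since an exponential with mean 100 rarely exceeds 1000.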

Related

Generate random samples for each sample length for a distribution

My goal is to draw sample points from a distribution, take their mean, and repeat that 6000 times for each sample length. Basically:
Take sample lengths ranging from N = 1 to 500. For each sample length,
draw 6000 samples and estimate the mean from each of the samples.
Calculate the standard deviation from these means for each sample
length, and show graphically that the decrease in standard deviation
corresponds to a square root reduction.
I am trying to do this on a gamma distribution, but all of my standard deviations are coming out as zero... and I'm not sure why.
This is the program so far:
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import gamma
# now taking random gamma samples
stdevs = []
length = np.arange(1, 401,1)
mean=[]
for i in range(400):
    sample = np.random.gamma(shape=i, size=1000)
    mean.append(np.mean(sample))
    stdevs.append(np.std(mean))
# then trying to plot the standard deviations but it's just a line..
# thought there should be a decrease
plt.plot(length, stdevs,label='sampling')
plt.show()
I thought there should be a decrease in the standard deviation, not an increase. What might I be doing wrong when trying to draw 1000 samples from a gamma distribution and estimate the mean and standard deviation?
I think you are misusing shape: it is the shape parameter of the gamma distribution, not the number of independent draws.
import numpy as np
import matplotlib.pyplot as plt
# Reproducible
gen = np.random.default_rng(20210513)
# Generate 400 (max sample size) by 1000 (number of indep samples)
sample = gen.gamma(shape=2, size=(400, 1000))
# Use cumsum to compute the cumulative sum
means = np.cumsum(sample, axis=0)
# Divide the cumulative sum by the number of observations used in each mean.
# A little care is needed to get broadcasting to work right.
means = means / np.arange(1,401)[:,None]
# Compute the std dev using the observations in each row
stdevs = means.std(axis=1)
# Plot
plt.plot(np.arange(1,401), stdevs,label='sampling')
plt.show()
This produces the expected picture: the standard deviation falls off like the square root of the sample size.
The problem is with the line stdevs.append(np.std(sample.mean(axis=0)))
This takes the standard deviation of a single value, i.e. the mean of your sample array, so it will always be 0.
You need to pass np.std() all the values in your sample, not just its mean.
stdevs.append(np.std(sample)) will give you your array of standard deviations for each sampling.
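The same point can be checked with a simple loop (a sketch, not the vectorized code from the answer above): draw 1000 independent samples of each size, take their means, and watch the standard deviation of those means shrink like 1/sqrt(N). The shape parameter and sample sizes below are assumed example values:

```python
import numpy as np

gen = np.random.default_rng(0)
shape = 2  # gamma shape parameter
sizes = [10, 40, 160, 640]  # each size is 4x the previous one
stdevs = []
for n in sizes:
    # 1000 independent samples, each of length n
    samples = gen.gamma(shape, size=(1000, n))
    # std dev of the 1000 sample means
    stdevs.append(samples.mean(axis=1).std())

# quadrupling the sample size should roughly halve the std dev of the mean
for a, b in zip(stdevs, stdevs[1:]):
    print(round(a / b, 2))  # each ratio should be near 2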

How can I get the percentage of random variates in an interval? (Python)

Let's say I generate 10000 normally distributed random variates with σ = 1 and μ = 0:
from scipy.stats import norm
x = norm.rvs(size=10000,loc=0,scale=1)
How can I count the percentage of random variates that fall into the intervals [-1, 1] or [-3, 3]?
You can do this:
import numpy as np
print(sum(np.abs(x)<1) / len(x) * 100)
sum(np.abs(x)<1) finds the number of samples in the (-1, 1) range and dividing that by the number of samples, you get what you need.
Edit: You can replace np.abs(x) < 1 with (x < 1) & (-1 < x) to make it work for non-symmetric ranges as well.
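As a quick sanity check (a sketch, assuming SciPy is available as in the question), the empirical percentages should land near the theoretical values from the normal CDF, about 68.3% for [-1, 1] and 99.7% for [-3, 3]:

```python
import numpy as np
from scipy.stats import norm

x = norm.rvs(size=100000, loc=0, scale=1, random_state=0)

for lo, hi in [(-1, 1), (-3, 3)]:
    empirical = np.mean((lo < x) & (x < hi)) * 100
    theoretical = (norm.cdf(hi) - norm.cdf(lo)) * 100
    print(f"[{lo}, {hi}]: {empirical:.1f}% vs {theoretical:.2f}%")
```

np.mean of a boolean mask gives the fraction of True entries directly, which is slightly tidier than sum(...) / len(x).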

Weird FFT plot with numpy random set

Code below:
import numpy as np
from numpy import random_intel
import mkl_fft
import matplotlib.pyplot as plt
n = 10**5
a = np.random_intel.rand(n)
b = mkl_fft.fft(a)
plt.scatter(b.real,b.imag)
plt.show()
print(b)
for i in b:
    if i.real > n/2:
        print("Weird FFT Number is ", i)
The resulting scatter plot shows a single point far away from the cluster near the origin, and the loop prints:
Weird FFT Number is (50020.99077289924+0j)
Why does the FFT of a random set produce this one particular number?
(Thanks to Paul Panzer & SleuthEye)
With mkl_fft.fft(a - 0.5) the outlier disappears.
[2019/03/29 Updated]
With normalized data everything went well:
b = mkl_fft.fft((a - np.mean(a)) / np.std(a))
The average value of (a - np.mean(a)) / np.std(a) is near zero.
That is the constant (zero-frequency) mode, which is essentially the mean of your signal. You are sampling uniformly from the unit interval, so the mean is ~0.5. Some FFT implementations scale this by the number of points to save a multiplication.
The large value in the FFT output happens to be the very first one which corresponds to the DC component. This indicates that the input has a non-zero average value over the entire data set.
Indeed if you look closer at the input data, you might notice that the values are always between 0 and 1, with an average value around 0.5. This is consistent with the rand function implementation which provides pseudo-random samples drawn from a uniform distribution over [0, 1).
You may confirm this to be the case by subtracting the average value with
b = mkl_fft.fft(a - np.mean(a))
and noting that the large initial value b[0] should be near zero.
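A minimal sketch with plain NumPy (mkl_fft follows the same convention) showing that the first FFT bin is just the sum of the input, i.e. n times the mean, and vanishes once the mean is removed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**5
a = rng.random(n)  # uniform on [0, 1), mean ~0.5

b = np.fft.fft(a)
# the DC bin equals the plain sum of the samples: n * mean(a), ~50000
print(b[0].real, a.sum())

# subtracting the mean removes the DC spike
c = np.fft.fft(a - a.mean())
print(abs(c[0]))  # effectively zero
```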

Randomly generate integers with a distribution that prefers low ones

I have an list ordered by some quality function from which I'd like to take elements, preferring the good elements at the beginning of the list.
Currently, my function to generate the random indices looks essentially as follows:
import itertools
import random

def pick():
    p = 0.2
    for i in itertools.count():
        if random.random() < p:
            break
    return i
It does a good job, but I wonder:
What's the name of the generated random distribution?
Is there a built-in function in Python for that distribution?
What you are describing sounds a lot like the exponential distribution, which already exists in the random module.
Here is some code that takes just the integer part of samples from an exponential distribution with mean 100 (rate parameter 1/100).
import random
import numpy as np
import matplotlib.pyplot as plt
d = [int(random.expovariate(1/100)) for i in range(10000)]
h, b = np.histogram(d, bins=np.arange(0, max(d)))
plt.bar(b[:-1], h, ec='none', width=1)
plt.show()
You could simulate it via the exponential distribution, but that is like fitting a square peg into a round hole. As Mark said, it is the geometric distribution, discrete and shifted by 1, and it is available directly in NumPy:
import numpy as np
import random
import itertools
import matplotlib.pyplot as plt
p = 0.2
def pick():
    for i in itertools.count():
        if random.random() < p:
            break
    return i
q = np.random.geometric(p, size = 100000) - 1
z = [pick() for i in range(100000)]
bins = np.linspace(-0.5, 30.5, 32)
plt.hist(q, bins, alpha=0.2, label='geom')
plt.hist(z, bins, alpha=0.2, label='pick')
plt.legend(loc='upper right')
plt.show()
Output: the two histograms coincide, confirming that pick() draws from a geometric distribution shifted to start at 0.
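As a quick numeric check (a sketch, not part of the original answer), the shifted geometric sample should have mean (1 - p) / p, which is 4 for p = 0.2:

```python
import numpy as np

p = 0.2
rng = np.random.default_rng(0)
# numpy's geometric counts trials until the first success (support 1, 2, ...);
# subtracting 1 counts failures before it (support 0, 1, ...), matching pick()
q = rng.geometric(p, size=100000) - 1
print(q.mean())  # close to (1 - p) / p = 4.0
```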
random.random() draws from a uniform distribution, but there are other methods in random that would also work. For your use case, I would suggest random.expovariate(2) (Documentation, Wikipedia). This is an exponential distribution that heavily prefers lower values. If you look up the other methods listed in the documentation, you will find several other built-in distributions.
Edit: Be sure to experiment with the argument to expovariate. Also note that it does not guarantee a value less than 1, so you might need to discard values that are too large.

Uniformly distributed data in d dimensions

How can I generate uniformly distributed data on [-1, 1]^d in Python? E.g. d is a dimension like 10.
I know how to generate uniformly distributed data like np.random.randn(N), but the dimension thing confuses me a lot.
Assuming independence of the individual coordinates, the following will generate a random point in [-1, 1)^d:
np.random.random(d) * 2 - 1
The following will generate n observations, where each row is an observation
np.random.random((n, d)) * 2 - 1
As has been pointed out, randn produces normally distributed numbers (aka Gaussian). To get uniformly distributed numbers you should use "uniform".
If you just want a single sample at a time of 10 uniformly distributed numbers you can use:
import numpy as np
x = np.random.uniform(low=-1,high=1,size=10)
OR if you'd like to generate lots (e.g. 100) of them at once then you can do:
import numpy as np
X = np.random.uniform(low=-1,high=1,size=(100,10))
Now X[0], X[1], ... each has length 10.
You can import the random module and call random.random to get a random sample from [0, 1). You can double that and subtract 1 to get a sample from [-1, 1).
Draw d values this way and the tuple will be a uniform draw from the cube [-1, 1)^d.
Without numpy:
[random.uniform(-1, 1) for _ in range(d)]
There may be reasons to prefer numpy's internal mechanisms, or to call random() manually and rescale, etc. But those are implementation details, also related to how the random number generator rations the bits of entropy the operating system provides.
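A minimal sketch tying the answers together: scaling [0, 1) samples by 2 and shifting by -1 yields points in [-1, 1)^d, with roughly zero mean (the sample counts below are assumed example values):

```python
import numpy as np

n, d = 10000, 10
rng = np.random.default_rng(0)
# scale [0, 1) samples to [-1, 1)
X = rng.random((n, d)) * 2 - 1

print(X.shape)          # (10000, 10)
print(X.min() >= -1)    # True
print(X.max() < 1)      # True
```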
