I'm using x = numpy.random.rand(1) to generate a random number between 0 and 1. How do I make it so that x > .5 is 2 times more probable than x < .5?
Just do a little manipulation of the inputs. First draw x uniformly from the range 0 to 1.5:
x = numpy.random.uniform(0, 1.5)
x then has a 2/3 chance of being greater than 0.5 and a 1/3 chance of being smaller. Then, if x is greater than or equal to 1.0, subtract 0.5 from it:
if x >= 1.0:
    x = x - 0.5
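A quick sanity check (just a sketch; the sample size is arbitrary): folding [1.0, 1.5) back onto [0.5, 1.0) should leave about two thirds of the draws above 0.5.
import numpy

x = numpy.random.uniform(0, 1.5, size=10**6)  # draw many values in [0, 1.5)
x[x >= 1.0] -= 0.5                            # fold the top third back onto [0.5, 1.0)
print((x >= 0.5).mean())                      # ~0.667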
This is overkill for you, but it's good to know an actual method for generating a random number with any probability density function (pdf).
You can do that by subclassing scipy.stats.rv_continuous, provided you do it correctly. You will have to supply a normalized pdf (so that its integral is 1); scipy will not normalize it for you, and an unnormalized pdf will give wrong samples. In this case, your pdf has a value of 2/3 for x<0.5, and 4/3 for x>0.5, with a support of [0, 1) (the support is the interval over which it's nonzero):
import scipy.stats as spst
import matplotlib.pyplot as plt

def pdf_shape(x, k):
    if x < 0.5:
        return 2/3.
    elif 0.5 <= x < 1:
        return 4/3.
    else:
        return 0.

class custom_pdf(spst.rv_continuous):
    def _pdf(self, x, k):
        # k is a dummy shape parameter, required because rvs is called with k=1
        return pdf_shape(x, k)

instance = custom_pdf(a=0, b=1)
samps = instance.rvs(k=1, size=10000)
plt.hist(samps, bins=20)
plt.show()
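As a quick check on the samples from above, the fraction above 0.5 should again be close to 2/3:
print((samps > 0.5).mean())  # ~0.667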
tmp = random()
if tmp < 0.5: tmp = random()
is a pretty easy way to do it.
Ehh, I guess this actually makes it 3x as likely (P(tmp >= 0.5) comes out to 0.75, i.e. 3:1, not 2:1)... that's what I get for sleeping through that class, I guess.
from random import random, uniform

def rand1():
    tmp = random()
    if tmp < 0.5:
        tmp = random()
    return tmp

def rand2():
    tmp = uniform(0, 1.5)
    return tmp if tmp <= 1.0 else tmp - 0.5

sample1 = []
sample2 = []
for i in range(10000):
    sample1.append(rand1() >= 0.5)
    sample2.append(rand2() >= 0.5)

print(sample1.count(True))  # ~7500, i.e. 75%
print(sample2.count(True))  # ~6666, i.e. 66% <- desired, I believe :)
First off, numpy.random.rand(1) doesn't return a single value in the [0,1) range (half-open, includes zero but not one), it returns an array of size one containing such values; the argument is the output shape, not the upper end of the range.
The function you're probably after is the uniform distribution one, numpy.random.uniform(), since this allows an arbitrary upper bound.
And, to make the upper half twice as likely is a relatively simple matter.
Take, for example, a random number generator r(n) which returns a uniformly distributed integer in the range [0,n). All you need to do is adjust the values to change the distribution:
x = r(3)  # 0, 1 or 2, with 1/3 probability each
if x == 2:
    x = 1  # now either 0 (prob 1/3) or 1 (prob 2/3)
Now the chances of getting zero are 1/3 while the chances of getting one are 2/3, basically what you're trying to achieve with your floating point values.
So I would simply get a random number in the range [0,1.5), then subtract 0.5 if it's greater than or equal to one.
x = numpy.random.uniform(high=1.5)
if x >= 1: x -= 0.5
Since the original distribution should be even across the [0,1.5) range, the subtraction should make [0.5,1.0) twice as likely (and [1.0,1.5) impossible), while keeping the distribution even within each section ([0,0.5) and [0.5,1)):
[0.0,0.5) [0.5,1.0) [1.0,1.5) before
<---------><---------><--------->
[0.0,0.5) [0.5,1.0) [0.5,1.0) after
You could take a "mixture model" approach where you split the process into two steps: first, decide whether to take option A or B, where B is twice as likely as A; then, if you chose A, return a random number between 0.0 and 0.5, else if you chose B, return one between 0.5 and 1.0.
In the example below, randint returns 0, 1, or 2 with equal probability, so the else branch is twice as likely as the if branch.
m = numpy.random.randint(3)
if m == 0:
    x = numpy.random.uniform(0.0, 0.5)  # option A: lower half
else:
    x = numpy.random.uniform(0.5, 1.0)  # option B: upper half, twice as likely
This is a little more expensive (two random draws instead of one) but it can generalize to more complicated distributions in a fairly straightforward way.
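For instance, here is a sketch of how the same two-step idea generalizes to an arbitrary piecewise-uniform density; the bin edges and weights below are made up for illustration:
import numpy as np

edges = np.array([0.0, 0.5, 1.0])  # bin boundaries (illustrative)
weights = np.array([1.0, 2.0])     # relative probability mass of each bin
p = weights / weights.sum()

def sample_piecewise(n):
    bins = np.random.choice(len(p), size=n, p=p)            # step 1: pick a bin per sample
    return np.random.uniform(edges[bins], edges[bins + 1])  # step 2: uniform within that bin

print((sample_piecewise(10**6) >= 0.5).mean())  # ~0.667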
If you want a smoother, non-uniform density rather than a piecewise-constant one, you can square the output of the random function (and subtract it from 1 to make x > 0.5 more probable instead of x < 0.5):
x = 1 - numpy.random.rand(1)**2
Note that this gives P(x > 0.5) = 1/sqrt(2), roughly 0.71, rather than exactly 2/3.
I have two questions:
1- This code takes too long to execute. Any idea how I can make it faster?
With the code below I want to generate 100 random discrete values between 700 and 1200.
I chose the Weibull distribution because I wanted to generate failure-rate data; please see the histogram below.
import random

nums = []
alpha = 0.6
beta = 0.4
while len(nums) != 100:
    temp = int(random.weibullvariate(alpha, beta))
    if 700 <= temp < 1200:
        nums.append(temp)
print(nums)

# plotting a graph
#plt.hist(nums, bins = 200)
#plt.show()
I wanted to generate a histogram like this one:
[histogram image]
2- I have this function for the discrete Weibull distribution:
def DiscreteWeibull(q, b, x):
    return q**(x**b) - q**((x + 1)**b)
How can I generate random values that follow this distribution?
Since the Weibull distribution with shape parameter K and scale parameter lambda can be characterized as the function W = lambda * (-ln(U))**(1/K) of a Uniform(0,1) variable U, we can 'cut' the distribution to a desired minimum and maximum value. We do this by inverting the equation, setting W to 700 or 1200, and finding the values of U between 0 and 1 that correspond. Here's some sample code.
import math
import random
import matplotlib.pyplot as plt

def weibull_from_uniform(shape, scale, x):
    assert 0 <= x <= 1
    return scale * pow(-1 * math.log(x), 1.0 / shape)

scale_param = 0.6
shape_param = 0.4
min_value = 700.0
max_value = 1200.0

# invert W = scale * (-ln U)**(1/shape) at the desired min and max
lower_bound = math.exp(-1 * pow(min_value / scale_param, shape_param))
upper_bound = math.exp(-1 * pow(max_value / scale_param, shape_param))
if lower_bound > upper_bound:
    lower_bound, upper_bound = upper_bound, lower_bound

nums = []
while len(nums) < 100:
    nums.append(weibull_from_uniform(shape_param, scale_param, random.uniform(lower_bound, upper_bound)))
print(nums)
plt.hist(nums, bins=8)
plt.show()
This code gives a histogram very similar to the one you provided; the method will give values from the same distribution as your original method, just faster. Note that this direct approach only works when our shape parameter K <= 1, so that the density function is strictly decreasing. When K > 1, the Weibull density function increases to a mode, then decreases, so you may need to draw from two uniform intervals for particular min and max values (since inverting for W and U may give two answers).
Your question is not very clear on why you thought using this Weibull distribution was a good idea, nor what distribution you are looking to achieve.
Discrete uniform distribution
Here are two ways to achieve the discrete uniform distribution on [700, 1200).
1) With random
import random
nums = [random.randrange(700, 1200) for _ in range(100)]
2) With numpy
import numpy
nums = numpy.random.randint(700, 1200, 100)
Geometric distribution
You have edited your question with an example histogram, and the mention "I wanted to generate a histogram like this one". The histogram vaguely looks like a geometric distribution.
We can use numpy.random.geometric:
import numpy
n_samples = 100
p = 0.5
a, b = 50, 650
cap = 1200
nums = numpy.random.geometric(p, size=2 * n_samples) * a + b  # oversample, then shift and scale
nums = nums[numpy.where(nums < cap)][:n_samples]  # drop values past the cap, keep n_samples
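To eyeball the result against the target histogram (assuming matplotlib is available):
import matplotlib.pyplot as plt

plt.hist(nums, bins=20)
plt.show()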
I have a function f(x) which I know has two zeros within an interval, and I need to compute both x values at which the function crosses 0.
I usually use
import scipy.optimize as opt
opt.brentq(f, xmin, xmax)
But the problem is that this method only works if the function has exactly one zero in the interval, and it is not simple to know where to split the interval in two.
The function is also costly to evaluate...
I think a good approach would be to pre-process the search for the zeros by sampling f before running the root-finder. During that pre-processing, you evaluate f to detect where its sign changes.
def preprocess(f, xmin, xmax, step):
    first_sign = f(xmin) > 0  # True if f(xmin) > 0, otherwise False
    x = xmin + step
    while x <= xmax:  # this loop detects where the function changes its sign
        fstep = f(x)
        if first_sign and fstep < 0:
            return x
        elif not first_sign and fstep > 0:
            return x
        x += step
    return x  # if you ever reach here, no sign change was found (x is now beyond xmax)
With this function, you can split your initial interval into several smaller intervals. For example:
import scipy.optimize as opt

step = ...
xmid = preprocess(f, xmin, xmax, step)
z0 = opt.brentq(f, xmin, xmid)
z1 = opt.brentq(f, xmid, xmax)
Depending on the function f, you may need to split your interval into more than two sub-intervals. Just iterate through [xmin, xmax] like this:
x_list = []
x = xmin
while x < xmax:  # this discovers where f changes its sign
    x_list.append(x)
    x = preprocess(f, x, xmax, step)
x_list.append(xmax)

z_list = []
for i in range(len(x_list) - 1):
    # brentq needs opposite signs at the endpoints, so skip the last
    # sub-interval when f does not actually cross zero in it
    if f(x_list[i]) * f(x_list[i + 1]) < 0:
        z_list.append(opt.brentq(f, x_list[i], x_list[i + 1]))
In the end, z_list contains all the zeros in the given interval [xmin,xmax].
Keep in mind that this algorithm is time-consuming but will do the job.
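As a concrete illustration, here is the whole recipe applied to a made-up test function with two zeros in [-2, 2]; f, xmin, xmax and step below are all assumptions for the demo:
import scipy.optimize as opt

def f(x):
    return x**2 - 1.0  # hypothetical test function, zeros at -1 and +1

xmin, xmax, step = -2.0, 2.0, 0.1

x_list = []
x = xmin
while x < xmax:
    x_list.append(x)
    x = preprocess(f, x, xmax, step)
x_list.append(xmax)

z_list = [opt.brentq(f, lo, hi) for lo, hi in zip(x_list, x_list[1:]) if f(lo) * f(hi) < 0]
print(z_list)  # ~[-1.0, 1.0]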
I want to find the closest representation of a floating point number in the form N/2**M in Python, where N and M are integers. I attempted to use the minimisation function from scipy.optimize, but it cannot be confined to the case where N and M are integers.
I ended up using a simple implementation that iterates through values of M and N and finds the minimum, but this is computationally expensive and time consuming for arrays of many numbers. What might be a better way of doing this?
My simple implementation is shown below:
import numpy as np

def ValueRepresentation(X):
    M, Dp = X
    return M / (2**Dp)

def Diff(X, value):
    return abs(ValueRepresentation(X) - value)

def BestApprox(value):
    mindiff = 1000000000
    for i in np.arange(0, 1000, 1):
        for j in np.arange(0, 60, 1):
            diff = Diff([i, j], value)
            if diff < mindiff:
                mindiff = diff
                M = i
                Dp = j
    return M, Dp
Just use the built-in functionality:
In [10]: 2.5.as_integer_ratio() # get representation as fraction
Out[10]: (5, 2)
In [11]: (2).bit_length() - 1 # convert 2**M to M
Out[11]: 1
Note that all non-infinite, non-NaN floats are dyadic rationals, so we can rely on the denominator being an exact power of 2.
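Putting the two together (a minimal sketch; to_dyadic is my own name for it): for any finite float, as_integer_ratio() returns the numerator and a power-of-two denominator in lowest terms, so M falls out of the denominator's bit_length():

def to_dyadic(x):
    n, d = x.as_integer_ratio()   # d is an exact power of two for finite floats
    return n, d.bit_length() - 1  # x == n / 2**M exactly

print(to_dyadic(2.5))  # (5, 1)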
Thanks to jasonharper I realised my implementation is ridiculously inefficient and could be much simpler.
The implementation of his method is shown below:
def BestApprox_fast(value):
    mindiff = 1000000000
    for Dp in np.arange(0, 32, 1):
        M = round(value * 2**Dp)
        if abs(M) < 1000:
            diff = Diff([M, Dp], value)
            if diff < mindiff:
                mindiff = diff
                M_best = M
                Dp_best = Dp
    return M_best, Dp_best
It is approximately 200 times quicker.
With the limits on M and N given, the range of N/2**M is a well-defined discrete number scale:
[{0..1000}/2^60, {501..1000}/2^59, {501..1000}/2^58, ..., {501..1000}/2^1, {501..1000}/2^0].
In this discrete set, different subsets have different accuracy/resolution. The first subset, {0..1000}/2^60, has an accuracy of 2^-60, i.e. 60 binary bits of resolution. So whenever the given number falls in the corresponding continuous domain [0, 1000/2^60], the best accuracy achievable is 2^-60. Successively, the best accuracy is 2^-59 when the given number is beyond the first domain but falls in the domain [500/2^59, 1000/2^59], which corresponds to the second subset, {501..1000}/2^59. (Note the difference between the discrete set and the continuous domain.)
With the above logic, we know that the best accuracy, defined by the exponent, depends on where the given number falls on the scale. We can thus implement it as the following Python code:
import numpy as np

limits = 1000.0 / 2**np.arange(0, 61)  # limits[i] is the upper end of the domain for exponent i

a = 103.23  # test value
for i in range(60, -1, -1):  # search from the finest resolution down
    if a <= limits[i]:
        N = i
        M = round(a * 2**N)
        r = [M, N]
        break
if a > 1000:
    r = [round(a), 0]  # beyond the scale, fall back to the nearest integer
This solution has O(1) (constant) execution time, so it is ideal for multiple invocations.
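For repeated use, the same lookup can be wrapped in a function (a sketch; best_dyadic is a made-up name):

import numpy as np

limits = 1000.0 / 2**np.arange(0, 61)

def best_dyadic(a):
    if a > 1000:
        return [round(a), 0]  # beyond the scale, fall back to the integer
    for i in range(60, -1, -1):  # search from the finest resolution down
        if a <= limits[i]:
            return [round(a * 2**i), i]

print(best_dyadic(103.23))  # [826, 3], i.e. 826/2**3 = 103.25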
I have 10k data points like this:
0.010222
0.010345
0.010465
0.010611
0.010768
0.010890
0.011049
0.011206
0.011329
0.011465
0.011613
0.11763
0.011888
0.012015
0.012154
0.012282
0.012408
0.012524
....
I want to calculate the Lyapunov exponent for that. This is what I've done so far:
lyapunovs = []
eps = 0.0001
for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(1, min(N - i, N - j)):
                d0 = np.abs(data[i] - data[j])
                dn = np.abs(data[i + k] - data[j + k])
                lyapunovs.append(math.log(dn) - math.log(d0))  # problem
My problem is that I don't know whether the first Lyapunov exponent is the average of all the lyapunovs when k = 1, or the average of all the lyapunovs for the first time that data[i] - data[j] < eps.
Is this the right implementation for the Lyapunov exponent?
And this is the numerical calculation of the Lyapunov exponent I am following: [formula image]
I would calculate the Lyapunov exponent in this way and then output the results as tuples in a file (see this blog post:
https://blog.abhranil.net/2014/07/22/calculating-the-lyapunov-exponent-of-a-time-series-with-python-code/):
from math import log
import numpy as np

with open('data.txt', 'r') as f:
    data = [float(i) for i in f.read().split()]

N = len(data)
eps = 0.001
lyapunovs = [[] for i in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(min(N - i, N - j)):
                lyapunovs[k].append(log(np.abs(data[i + k] - data[j + k])))

with open('lyapunov.txt', 'w') as f:
    for i in range(len(lyapunovs)):
        if len(lyapunovs[i]):
            string = str((i, sum(lyapunovs[i]) / len(lyapunovs[i])))
            f.write(string + '\n')
I see from the chosen loop structure in the question that a triangle of the Cartesian product of the points is being used. This might improve the estimate of the derivatives, which are susceptible to noise, but it is not part of the Lyapunov exponent explicitly. See this example of the calculations on a known function in the absence of measurement error. Feel free to look into that aspect more, but below I will assume the comparison of signal points adjacent in time.
Your original question uses NumPy, so I will also make use of it. One of the rules of thumb for using NumPy well is to avoid explicit loops, although it is possible to vectorize functions that contain loops. With no explicit time measurements, and no repeated values, you could simply do:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
np.mean(np.log(np.abs(np.diff(x))))
Or if the signal is paired with an array of timepoints, then the numerical derivative can involve time:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
t = np.arange(10**4) # Mock time data
np.mean(np.log(np.abs(np.diff(x) / np.diff(t))))
However, in some datasets it is possible for adjacent values to repeat! This can occur when you've measured the signal only to a few decimal places, and it is a problem because it leads to np.log(0) (=-np.inf) which will blow up your calculation. A simple solution is to remove duplicated values, but this will only be suitable if duplicates are relatively rare and you have a large sample size. It is possible to estimate an upper bound on the estimate of the L-exponent by considering the precision of your measurements, but that is not the estimate of the L-exponent itself.
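For example, a minimal sketch of dropping repeated adjacent values before taking differences (assuming x is the signal array from above, and that duplicates are rare):

import numpy as np

keep = np.insert(np.diff(x) != 0, 0, True)  # True where a value differs from its predecessor
x_clean = x[keep]                           # adjacent duplicates removed
print(np.mean(np.log(np.abs(np.diff(x_clean)))))  # no log(0) terms now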
I just want to mention that knowing the literal (analytical) expression is best.
I will take the logistic map equation as an example:
import numpy as np
import matplotlib.pyplot as plt

def logisticmap(x_init, r, length):
    x = [x_init]
    for t in range(length):
        x.append(r * x[-1] * (1 - x[-1]))
    return np.array(x)
Now let's generate the data (keeping r around for later):
r = 3.92
x = logisticmap(0.2, r, 1000)
plt.plot(x)
plt.show()
[plot of the logistic map time series]
Here is the proposed solution by Galan:
np.mean(np.log(abs(np.diff(x))))
which gives -1.0379.
When you derive the Lyapunov exponent from the logistic map equation:
np.mean(np.log(abs(r * (1 - 2 * x))))
it gives 0.538296.
This is the actual true value for the Lyapunov exponent: since the system is in its chaotic regime, it must be positive. So I guess the evaluation from data points is not working in this example; you can try with more data points, but it will still give you a negative LE.
Unfortunately I don't know enough to guide you towards a better estimation when you can't derive a mathematical expression, but I would be interested to know!
I tried to reduce computational complexity with numpy vectorization.
def lyapunov_exponent(series: np.ndarray, threshold: float) -> np.ndarray:
    N = len(series)
    eps = threshold
    L = [np.array([0] * N)]
    for i in range(1, N):
        diff = np.abs(series[i:] - series[:-i])  # distances between points i steps apart
        dist = np.log(diff)
        L.append(np.concatenate([[0] * i, dist]))
    L = np.array(L)
    tf_L = np.where(L < eps, 1, 0)  # note: the threshold is applied to the log-distances here
    count_L = np.zeros_like(tf_L)
    for i in range(N):
        indices = (np.array(range(0, N - i)), np.array(range(i, N)))
        count_L[indices] = np.cumsum(tf_L[indices])
    avg = np.sum(count_L * L, axis=0) / np.sum(count_L, axis=0)
    return avg
If there is room for improvement or you get some different result than already answered, please reply.
I have been trying to solve integration with a Riemann sum. My function has 3 arguments a, b, d, where a is the lower limit, b is the upper limit, and d is the step size such that a + (n-1)*d < b. This is my code so far. My output is 28.652667999999572, but what I should get is 28.666650000000388. Also, if the input b is lower than a it still has to calculate, but I have solved that problem already.
def integral(a, b, d):
    if a > b:
        a, b = b, a
    delta_x = float((b - a) / 1000)
    j = abs((b - a) / delta_x)
    i = int(j)
    n = s = 0
    x = a
    while n < i:
        delta_A = (x**2 + 3*x + 4) * delta_x
        x += delta_x
        s += delta_A
        n += 1
    return abs(s)

print(integral(1, 3, 0.01))
There is no fault here, neither with the algorithm nor with your code (or Python). The Riemann sum is an approximation of the integral and is per se not "exact". You approximate the area of a (small) strip of width dx, say between x and x+dx, under f(x) with the area of a rectangle of the same width whose height is f(x), the value at its upper-left corner. If the function changes its value as you go from x to x+dx, then the area of the rectangle deviates from the true integral.
As you have noticed, you can make the approximation closer by taking thinner and thinner slices, at the cost of more computational effort and time.
In your example, the function is f(x) = x^2 + 3*x + 4, and its exact integral over x in [1.0, 3.0) is 28 2/3, or 28.66666...
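For reference, the exact value comes straight from the antiderivative:
integral from 1 to 3 of (x^2 + 3x + 4) dx = [x^3/3 + 3x^2/2 + 4x] evaluated from 1 to 3 = (9 + 27/2 + 12) - (1/3 + 3/2 + 4) = 86/3 = 28.666...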
The approximation by rectangles is a crude one; you cannot change that. But what you could change is the time it takes for your code to evaluate, say, 10^8 steps instead of 10^3. Look at this code:
def riemann(a, b, dx):
    if a > b:
        a, b = b, a
    # dx = (b-a)/n
    n = int((b - a) / dx)
    s = 0.0
    x = a
    for i in xrange(n):
        f_i = (x + 3.0) * x + 4.0
        s += f_i
        x += dx
    return s * dx
Here, I've used three tricks for speedup, and one for greater precision. First, if you write a loop and you know the number of repetitions in advance, use a for-loop instead of a while-loop; it's faster. (BTW, loop variables are conventionally named i, j, k, ..., whereas a limit or final value is n.) Secondly, using xrange instead of range is faster for users of Python 2.x. Thirdly, factor polynomials (Horner's scheme) when evaluating them often; you should see from the code what I mean here, and this way the result is also numerically stable. Last trick: operations within the loop which do not depend on the loop variable can be extracted and applied after the loop has ended; here, the final multiplication by dx.
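For comparison, here is a minimal vectorized sketch of the same sum using NumPy (my addition, not part of the original answer), which removes the Python-level loop entirely:

import numpy as np

def riemann_np(a, b, dx):
    if a > b:
        a, b = b, a
    x = np.arange(a, b, dx)                  # left endpoint of every strip
    return np.sum((x + 3.0) * x + 4.0) * dx  # Horner form of x**2 + 3x + 4

print(riemann_np(1, 3, 0.002))  # approaches 28.666... as dx shrinks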