I want to generate random numbers from a lognormal distribution on a background of an exponential distribution, as follows:
I have 100 integers (say localities) from 1 to 25. These integers are generated from my own exponential-like distribution.
On these localities I want to distribute N items. But these items have their own lognormal distribution, with some mode (between 1 and 25) and standard deviation (from 1 to 7). My code works like this:
I have an array of localities called vec_of_variable, I know N, I know the mode called pref_value and I know the standard deviation called power_of_preference.
First I will compute the shape and scale parameters from pref_value and power_of_preference.
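(For illustration, one possible such mapping, assuming power_of_preference is used directly as the lognormal shape parameter sigma; scipy's lognorm(shape, 0, scale) has its mode at scale * exp(-shape**2), so choosing scale this way puts the mode at pref_value. The actual conversion may differ.)
import numpy as np

pref_value = 10          # example mode between 1 and 25
power_of_preference = 2  # example "standard deviation" between 1 and 7
shape = power_of_preference
scale = pref_value * np.exp(shape ** 2)
Then my procedure is as follows: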
import random
import numpy as np
from scipy import stats

unique_localities = np.unique(np.array(vec_of_variable))  # all values of localities
res1 = [0 for i in range(len(unique_localities))]
res = [0 for i in range(len(vec_of_variable))]  # this will be the desired output
for i in range(len(res1)):
    res1[i] = stats.lognorm.pdf(unique_localities[i], shape, 0, scale)  # pdfs of the locality values
res1 = np.array([x / min(res1) for x in res1])  # here is the problem, min(res1) could be zero, see text
res1 = np.round(res1)
res1 = np.cumsum(res1)

item = 0
while item < N:
    r = random.uniform(0, max(res1))
    site_pdf_value_vec = [x for x in res1 if x >= r]
    site_pdf_value = min(site_pdf_value_vec)  # this is the value of the locality where I'll place one item
The code continues, but the crucial part is here. Simply put, the lognorm pdf values of the localities are the 'probabilities' that I'll place my item in that locality. This is why I need the pdf values.
PS: This approach is approved by my supervisor so I do not want to change it.
The problem is that sometimes min(res1) = 0. Then I divide by zero and res1 becomes an array of infinities. The lognormal pdf for x between 0 and 25 is never exactly zero, but it can be very close. I think the problem is that one of these pdf values is so close to zero that it underflows to zero in floating point.
My question is: how do I avoid getting zeros in res1 in my code? My idea was to replace the zeros with the smallest positive float in Python, but I don't know that value. Or is there another, more elegant solution?
Thanks for the help.
PS: Someone might think that dividing by min(res1) is unnecessary and that the problem step looks superfluous. But it is the check that the minimum of these values is not zero. In other words, every locality must get some "interval" of probability of receiving an item; if one pdf value is zero, its probability is not an interval but a single point.
Compute lognorm.logpdf rather than lognorm.pdf, and then work in log space. This should have better accuracy for the very small probabilities that are being rounded to zero.
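A minimal sketch of that idea, with example values standing in for the question's unique_localities, shape and scale; subtracting the minimum log-pdf replaces the division by min(res1), so nothing ever has to be divided by an underflowed zero. (Incidentally, the "smallest positive float" mentioned in the question is sys.float_info.min, but working in log space avoids needing it.)
import numpy as np
from scipy import stats

# Example inputs standing in for the question's variables.
unique_localities = np.arange(1, 26)
shape, scale = 0.5, 10.0   # illustrative values only

log_pdf = stats.lognorm.logpdf(unique_localities, shape, 0, scale)  # log of each locality's pdf
log_ratio = log_pdf - log_pdf.min()                                 # log(pdf / min(pdf)), always >= 0
res1 = np.round(np.exp(log_ratio))                                  # same ratios as before; the smallest entry is 1, never 0
res1 = np.cumsum(res1)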
So like the title says, I need help trying to map points from a 2D plane to a number line in such a way that each point is associated with a unique positive integer. Put another way, I need a function f: Z x Z -> Z+ and I need f to be injective. Additionally, I need it to run in a reasonable time.
The way I've thought about doing this is to basically just count points, starting at (1,1) and spiraling outwards.
Below I've written some Python code to do this for some point (i,j):
def plot_to_int(i, j):
    a = max(abs(i), abs(j))  # we want to find which "square" we are in
    b = (a - 1) ** 2         # we can start the count from the last square
    J = abs(j)
    I = abs(i)
    if i > 0 and j > 0:      # the first quadrant
        # we start counting anticlockwise
        if I > J:
            b += J
            # we start from the edge and count up along j
        else:
            b += J + (J - I)
            # when we turn the corner, we add to the count, increasing as i decreases
    elif i < 0 and j > 0:    # the second quadrant
        b += 2 * a - 1       # the total count from the first quadrant
        if J > I:
            b += I
        else:
            b += I + (I - J)
    elif i < 0 and j < 0:    # the third quadrant
        b += (2 * a - 1) * 2  # the count from the first two quadrants
        if I > J:
            b += J
        else:
            b += J + (J - I)
    else:                     # the fourth quadrant
        b += (2 * a - 1) * 3
        if J > I:
            b += I
        else:
            b += I + (I - J)
    return b
I'm pretty sure this works, but as you can see it's quite a bulky function. I'm trying to think of some way to simplify this "spiral counting" logic. Or, if there's another counting method that is simpler to code, that would work too.
Here's a half-baked idea:
For every point, calculate f = x + (y-y_min)/(y_max-y_min)
Find the smallest delta d between any given f_n and f_{n+1} (after sorting the f values). Multiply all the f values by 1/d so that all f values are at least 1 apart.
Take the floor() of all the f values.
This is sort of like a projection onto the x-axis, but it tries to spread out the values so that it preserves uniqueness.
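A rough sketch of those three steps, assuming all the points are known up front (the function name and the (n, 2) points array are placeholders for illustration):
import numpy as np

def project_to_ints(points):
    # points: an (n, 2) array of integer (x, y) coordinates, all known in advance
    x = points[:, 0].astype(float)
    y = points[:, 1].astype(float)
    f = x + (y - y.min()) / (y.max() - y.min())   # step 1: x plus a fractional y offset
    gaps = np.diff(np.sort(f))
    d = gaps[gaps > 0].min()                      # step 2: smallest nonzero gap between f values
    return np.floor(f / d).astype(int)            # step 3: rescale so distinct f values are >= 1 apart, then floor

pts = np.array([[2, 1], [3, 3], [6, 2]])
print(project_to_ints(pts))
Note that this needs the whole point set up front, which is what the update below is about.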
UPDATE:
If you don't know all the data and will need to feed in new data in the future, maybe there's a way to hardcode an arbitrarily large or small constant for y_max and y_min in step 1, and an arbitrary delta d for step 2, according to the boundaries of the data values you expect. Or a way to calculate values for these according to the limits of the floating-point arithmetic.
For alpha and k fixed integers with i < k also fixed, I am trying to encode a sum of the form
where all the x and y variables are known beforehand. (this is essentially the alpha coordinate of a big iterated matrix-vector multiplication)
For a normal sum varying over one index, I usually create a 1D array A, set A[i] equal to the i-th entry of the sum, and then use sum(A). But in the above instance the entries of the innermost sum depend on the indices of the previous sum, which in turn depend on the indices of the sum before that, all the way back out to the first sum, which prevents me from using this tactic in a straightforward manner.
I tried making a 2D array B of appropriate length and width, setting row 0 to the entries of the innermost sum, then row 1 to the entries of the next sum times sum(np.transpose(B), 0), and so on. But the value of the first sum (of row 0) needs to vary with each entry in row 1, since that sum still has indices dependent on our position in row 1, and so on all the way up to sum k-i.
A sum which allows a 'variable' to be filled in by each position of the array it is summing over would do the trick, but I can't find anything along these lines in numpy, and my attempts to hack one together have so far failed. My intuition says there is a solution that involves summing along the axes of a (k-i)-dimensional array, but I haven't been able to make this precise yet. Any assistance is greatly appreciated.
One simple attempt to hard-code something like this would be:
for j0 in range(0, n0):
    for j1 in range(0, n1):
        ....
Edit: (a vectorized version)
You could do something like this: (I didn't test it)
temp = np.ones(n[k - i])
for j in range(0, k - i):
    temp = x[:n[k - i - 1 - j], :n[k - i - j]] @ (y[:n[k - i - j]] * temp)
result = x[alpha, :n[0]] @ (y[:n[0]] * temp)
The basic idea is that you try to press it into matrix-vector form. (Note that this uses Python 3 syntax, in particular the @ matrix-multiplication operator.)
Edit: You should note that you need to change the "k-1" to where the innermost sum is (I just did it for all sums up to index k-i)
This is 95% identical to @sehigle's answer, but includes a generic N vector:
import numpy as np

def nested_sum(XX, Y, N, alpha):
    # start with a vector of ones for the innermost index
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    # work outwards: each pass contracts one index of the nested sum
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)
Similarly, I have no knowledge of the expression, so I'm not sure how to build appropriate tests. But it runs :\
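For what it's worth, a hypothetical smoke test with made-up shapes (the matrix XX, vector Y, and limits N are arbitrary placeholders, so the printed number is only a sanity check that the shapes line up):
import numpy as np

rng = np.random.default_rng(0)
XX = rng.random((10, 10))   # stand-in for the x matrix
Y = rng.random(10)          # stand-in for the y vector
N = [5, 7, 4]               # arbitrary truncation limits n_0, n_1, n_2
print(nested_sum(XX, Y, N, alpha=3))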
Is there an elegant way or function to compute the mean of the last X elements of a list?
I have a list register that grows in size at each iteration:
register = np.append(register, value)
I want to create another list in which element i corresponds to the mean of the last X elements in register:
register_mean[i] = np.mean(register[i-X:i])
The tricky part is the first X iterations, when there aren't X values in register yet. For these specific cases, I would like it to compute the mean over the first values of register, and to take the first value of register as the first value of register_mean.
This could be done during the iterations or after, when register is complete.
I know there are lots of similar questions, but I haven't found one that answers this particular problem.
Could it be something as simple as
if i < X:
    register_mean[i] = np.mean(register[:i])
This just averages however many prior points there are until you have enough to average X points
Perhaps I misinterpreted your intent!
If I understand your question correctly, this should do the work:
X = 4 # Span of mean
register_mean = [np.mean(register[max(i - X, 0): i + 1]) for i in range(len(register))]
It will essentially create a moving average of the register elements between i - X and i; however, whenever i - X is negative, it will only take the values between 0 and i + 1.
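If register grows long, a cumulative-sum variant computes the same means without re-averaging each window from scratch; a small sketch using the same window convention as the comprehension above (the example register values are made up):
import numpy as np

register = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
X = 4

# Running sums make each window mean an O(1) lookup:
# the window for index i is register[max(i - X, 0) : i + 1].
csum = np.concatenate(([0.0], np.cumsum(register)))
idx = np.arange(len(register))
lo = np.maximum(idx - X, 0)
register_mean = (csum[idx + 1] - csum[lo]) / (idx + 1 - lo)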
Today my task is to make a histogram to represent the behaviour of A^n, where A is a matrix, but only for specific entries of the matrix.
For example, say I have a matrix whose rows sum to one. The first entry is some specific decimal number. However, if I raise that matrix to the 2nd power, that first entry becomes something else, and if I raise it to the 3rd power it changes again, and so on ad nauseam; that's what I need to plot.
Right now my attempt is to create an empty list and then use a for loop to append the entries that result from the matrix multiplication. However, all it ends up holding is the result of the final matrix multiplication, rather than the value at each iteration.
Here's the specific bit of code that I'm talking about:
print("The intial probability matrix.")
print(tabulate(matrix))
baseprob = []
for i in range(1000):
matrix_n = numpy.linalg.matrix_power(matrix, s)
baseprob.append(matrix_n.item(0))
print(baseprob)
print("The final probability matrix.")
print(tabulate(matrix_n))
Here is the full code, as well as the output I got.
http://pastebin.com/EkfQX2Hu
Of course it only prints the final value: you are doing the same operation, matrix^s, 1000 times. You need to have s change on each of those 1000 iterations.
If you want to calculate all values in location matrix(0) for matrix^i where i is each value from 1 to s (your final power) do:
baseprob = []
for i in range(1, s + 1):  # changed to do the range 1..s instead of 1000
    # must use the loop variable here, not s (s is always the same)
    matrix_n = numpy.linalg.matrix_power(matrix, i)
    baseprob.append(matrix_n.item(0))
Then baseprob will hold matrix(0) for matrix^1, matrix^2, etc. all the way to matrix^s.
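Since the original goal was a histogram, a minimal plotting sketch (assuming matplotlib is available and baseprob was filled by the loop above) could look like this:
import matplotlib.pyplot as plt

plt.hist(baseprob, bins=30)   # distribution of the (0, 0) entry over powers 1..s
plt.xlabel("value of entry (0, 0) of matrix^i")
plt.ylabel("frequency")
plt.show()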
I would like to create an array of Zipf-distributed values within the range [0, 1000].
I am using numpy.random.zipf to create the values but I cannot create them within the range I want.
How can I do that?
normalize and multiply by 1000 ?
import numpy as np

a = 2
s = np.random.zipf(a, 1000)
result = (s / float(max(s))) * 1000

print(min(s), max(s))
print(min(result), max(result))
Although, isn't the whole point of Zipf that the range of values is a function of the number of values generated?
I agree with the original answer (Felix) that forcing Zipf values to a specific range is a very unusual thing, and it likely means that you're doing something wrong.
Having said that, I actually had a similar problem, where I really did need to generate Zipf values conforming to certain criteria. In my case, I wanted to generate a brand new set of data that was similar to an existing data set: I wanted the sum to be the same as the existing distribution, but the values to be different.
My insight is that it's possible to re-generate the values a few times until you get ones you like.
import numpy as np

# Generate a quantity of Zipf-distributed values close to a desired sum
def gen_zipf_values(alpha, sum, quantity):
    best = []
    best_sum = 0
    for _ in range(10):
        s = np.random.zipf(alpha, quantity)
        this_sum = s.sum()
        if (this_sum > best_sum) and (this_sum <= sum):
            best = s
            best_sum = this_sum
    return best
Again, this solution is tailored to my problem, where I wanted to generate values close to a sum without going over. I also had a pretty good idea of what I wanted alpha to be each time. I omitted some of the condition checking, sorting, etc. for clarity.
If you had to do it more than a few times though (i.e. you had to run the for loop 1 million times to get your distribution), you probably have something wrong (like alpha, or unrealistic expectations on the values). I feel it's valid to 'let the computer do the work', or to hand-pick the best option from a few reasonable ones.
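For illustration, a hypothetical call (the alpha, target sum, and quantity are made-up numbers):
values = gen_zipf_values(2.0, 1000, 50)
if len(values):
    print(len(values), "values, sum =", values.sum())
else:
    print("no draw satisfied the sum constraint in 10 tries")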