I'm trying to get familiar with programming in Python but have just started and am struggling with the following problem. Maybe someone can give me a hint on how to proceed or where to look for a nice solution.
I'd like to plot Planck curves for 132 wavelengths at 6 different temperatures via a loop within a loop. The function planckwavel receives two parameters, wavelength and temperature, which I separated into two loops.
So far I managed to use lists, which worked, though probably not in an elegant way:
plancks = []
temp = [280, 300, 320, 340, 360, 380]
temp_len = len(temp)

### via fun planckwavel
for i in range(temp_len):
    t_list = []  # clear the list again after each j loop
    for j in range(wl_centers_ar.shape[0]):
        t = planckwavel(wl_centers_ar[j], temp[i])
        t_list.append(t)
    plancks.append(t_list)
### PLOT Planck curves
plancks = np.array(plancks).T  # convert list to array and transpose
view_7 = plt.figure(figsize=(8.5, 4.5))
plt.plot(wl_centers_ar, plancks)
plt.xticks(rotation='vertical')
But I would like to use arrays instead of lists, as I want to continue afterwards with much larger multi-dimensional images. So I tried the same with arrays but unfortunately failed with this code:
plancks_ar = zeros([132,6], dtype=float)  # create array and fill with zeros
temp_ar = array([273, 300, 310, 320, 350, 373])

for i in range(temp_ar.shape[0]):
    t_ar = np.zeros(plancks_ar.shape[0])
    for j in range(plancks_ar.shape[0]):
        t = planck(wl_centers_ar[j]*1e-6, temp[1])/10**6
        np.append(t_ar, t)
    np.append(plancks_ar, t_ar)

plt.plot(wl_centers_ar, plancks)
I would be very thankful, if someone can give me some advice.
Thanx,
best regards,
peter
I think you're asking about how to use NumPy's broadcasting and vectorization. Here's a way to remove the explicit Python loops:
import numpy as np

# Some physical constants we'll need
h, kB, c = 6.626e-34, 1.381e-23, 2.998e8

def planck(lam, T):
    # The Planck function, using NumPy vectorization
    return 2*h*c**2/lam**5 / (np.exp(h*c/lam/kB/T) - 1)

# wavelength array, 3 - 75 um
lam = np.linspace(3, 75, 132)
# temperature array
T = np.array([280, 300, 320, 340, 360, 380])

# Remember to convert wavelength from um to m
pfuncs = planck(lam * 1.e-6, T[:,None])

import pylab
for pfunc in pfuncs:
    pylab.plot(lam, pfunc)
pylab.show()
We want to calculate planck for each wavelength and for each T, so we need to broadcast the calculation over the two arrays. Following NumPy's broadcasting rules, we can do that by adding a new axis to the temperature array (with T[:, None]):
lam:       132
T:     6 x   1
--------------
       6 x 132
The final dimension of T[:, None] is 1, so the 132 values of lam can be broadcast across it to produce a 6 x 132 array: 6 rows (one for each T) of 132 values (the wavelengths).
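If it helps to make the broadcast concrete, a quick shape check (using the arrays defined above) confirms the 6 x 132 result:

print(lam.shape, T.shape, T[:, None].shape)  # (132,) (6,) (6, 1)
print(pfuncs.shape)                          # (6, 132): one row per temperature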
I tried to double-check the above Planck equation using its inverse (brightness temperature). Based on your code I defined the following function and expected to get 300 Kelvin back (for 10 microns and 10 W/m2/sr/micron):
def planckInv(lam, rad):
    rad = rad*1.e6   # convert to W/m^2/m/sr
    lam = lam*1.e-6  # convert wavelength to m
    return (h*c/kB*lam)*( 1/ np.log( (2*h*c**2/lam**5) / rad +1 ))
but received a strange result:
planckInv(10, 10) -> 3.0039933569668916e-08
Any suggestions what's wrong with my brightness temperature function?
thanks,
peter
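For what it's worth, the likely culprit is operator precedence: h*c/kB*lam groups as (h*c/kB)*lam, not h*c/(kB*lam). A sketch of the corrected prefactor (using the constants defined earlier):

def planckInv(lam, rad):
    rad = rad * 1.e6   # convert to W/m^2/m/sr
    lam = lam * 1.e-6  # convert wavelength to m
    # parenthesize (kB * lam): without the parentheses, lam multiplies
    # the prefactor instead of dividing it
    return h * c / (kB * lam) / np.log(2 * h * c**2 / lam**5 / rad + 1)

print(planckInv(10, 10))  # ~300.4 K, as expected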
GOAL
I have values v given at specific 3D coordinates x y z. The data is stored as a pandas dataframe:
x y z v
0 -68.5 68.50 -10.00 0.297845
1 -68.5 -23.29 61.10 0.148683
2 -68.5 -23.29 63.47 0.142325
3 -68.5 -23.29 65.84 0.135908
4 -68.5 -23.29 68.21 0.129365
... ... ... ...
91804 68.5 23.29 151.16 0.118460
91805 68.5 23.29 153.53 0.119462
91806 68.5 23.29 155.90 0.120386
91807 68.5 23.29 139.31 0.112257
91808 68.5 -68.50 227.00 0.127948
I would like to find the values at new coordinates that are not part of the dataframe, hence I am looking into how to efficiently interpolate the data.
What I have done:
Since the coordinates are on a grid, I can use interpn:
import numpy as np
from scipy.interpolate import interpn
# Extract the list of coordinates (I know that they are on a grid)
xs = np.array(df["x"].to_list())
ys = np.array(df["y"].to_list())
zs = np.array(df["z"].to_list())
# Extract the associated values
vs = np.array(df["v"].to_list())
Reshape the data to fit the scipy function:
points = (np.unique(xs), np.unique(ys), np.unique(zs))
values = vs.reshape(len(np.unique(xs)), len(np.unique(ys)), len(np.unique(zs)))
To test the interpolation, I would like to see if I get the same values back, if I put in the same points as the original points:
request = (xs,ys,zs)
output = interpn(points, values, request)
... BUT the values interpn returns do not match the original vs (the visualization below shows the mismatch).
I am wondering, what am I doing wrong?
Other:
Dataset
Please find the complete dataset here: https://filebin.net/u10lrw956enqhg5i
Visualization
from mayavi import mlab
# Create figure
fig = mlab.figure(1, fgcolor=(0, 0, 0), bgcolor=(0, 0, 0))
mlab.points3d(xs,ys,zs,output)
mlab.view(azimuth=270, elevation=90, roll=180, figure=fig)
# View plot
mlab.show()
I strongly suspect that your data, while on a grid, is not ordered so as to allow a simple reshape of the values. You have two solutions available, both involving reordering the data in different ways.
Solution 1
Since you're already using np.unique to extract the grid, you can get the correct ordering of vs using the return_inverse parameter:
px, ix = np.unique(xs, return_inverse=True)
py, iy = np.unique(ys, return_inverse=True)
pz, iz = np.unique(zs, return_inverse=True)
points = (px, py, pz)
values = np.empty_like(vs, shape=(px.size, py.size, pz.size))
values[ix, iy, iz] = vs
return_inverse is sort of magical, largely because it's so counterintuitive. In this case, for each element of xs (and likewise ys and zs), it tells you which unique, sorted grid location it corresponds to.
By the way, if you are missing grid elements, you may want to replace np.empty_like(vs, shape=(px.size, py.size, pz.size)) with either np.zeros_like(vs, shape=(px.size, py.size, pz.size)) or np.full_like(vs, np.nan, shape=(px.size, py.size, pz.size)). In the latter case, you could interpolate the nans in the grid first.
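If the behaviour of return_inverse is unclear, here is a tiny standalone illustration (hypothetical values):

vals = np.array([30., 10., 20., 10.])
u, inv = np.unique(vals, return_inverse=True)
print(u)    # [10. 20. 30.]  the sorted unique grid coordinates
print(inv)  # [2 0 1 0]      for each element of vals, its index into u
# so u[inv] reconstructs vals exactly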
Solution 2
The more obvious solution would be to rearrange the indices so you can reshape vs as you tried to do. That only works if you're sure that there are no missing grid elements. The easiest way would be to sort the whole dataframe, since the pandas methods are less annoying than np.lexsort (IMO):
df.sort_values(['x', 'y', 'z'], inplace=True, ignore_index=True)
When you extract, do it efficiently:
xs, ys, zs, vs = df.to_numpy().T
Since everything is sorted, you don't need np.unique to identify the grid any more. The number of unique x values is:
nx = np.count_nonzero(np.diff(xs)) + 1
And the unique values are:
bx = xs.size // nx
ux = xs[::bx]
y values go through a full cycle every bx elements, so
ny = np.count_nonzero(np.diff(ys[:bx])) + 1
by = bx // ny
uy = ys[:bx:by]
And for z (bz == 1):
nz = by
uz = zs[:nz]
Now you can construct your original arrays:
points = (ux, uy, uz)
values = vs.reshape(nx, ny, nz)
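Whichever solution you use, a quick sanity check is to feed the original coordinates back through interpn and compare against vs (a sketch, reusing the variables above):

request = np.column_stack([xs, ys, zs])  # shape (N, 3), one row per query point
output = interpn(points, values, request)
assert np.allclose(output, vs)  # grid points should interpolate to themselves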
I'm trying to plot a simple moving averages function but the resulting array is a few numbers short of the full sample size. How do I plot such a line alongside a more standard line that extends for the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This is using standard matplotlib.pyplot. I've tried deleting X values using remove and del, switching all arrays to numpy arrays (since that's the output format of my moving averages function), and adding an if condition to the append in the while loop, but none of these has worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += (random.randint(min, max))
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights,'same')
The 'valid' option only convolves where the window completely overlaps the values array. What you want is 'same', which does what you are looking for.
Edit: This, however, also comes with its own issue: it acts as though there are extra data points with value 0 wherever your window does not fully sit on top of the data. You can ignore that, as this solution does, or pad the array with specific values of your choosing instead (see Mike Sperry's answer).
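A minimal check of the two modes' output sizes:

import numpy as np

values = np.arange(10, dtype=float)
weights = np.repeat(1.0, 5) / 5
print(np.convolve(values, weights, 'valid').shape)  # (6,): only full-overlap positions
print(np.convolve(values, weights, 'same').shape)   # (10,): edges behave like zero-padded data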
Here is how you would pad a numpy array out to the desired length with NaNs (replace np.nan with other values, or replace 'constant' with another mode, depending on the desired results):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np

# NaN is a float, so the array to pad must be float as well
bob = np.asarray([1, 2, 3], dtype=float)
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=(np.nan, np.nan))
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    # pad symmetrically with NaN back up to the input length
    shorted = (len(values) - len(smas)) // 2
    print(shorted)
    smas = np.pad(smas, (shorted, shorted), 'constant', constant_values=(np.nan, np.nan))
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you have a convolution of 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand to have some optimization done on it. For example, numpy arrays are stored in fixed size static buffers. Any time you do append or delete on them, the entire thing gets reallocated, unlike Python lists, which have amortization built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. In a more general sense, it's usually not very effective to mix Python and numpy computational functions, including random.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    # this step creates a strided view into the same buffer:
    # each column of the view is one window of the input
    values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    smas = values.sum(axis=0, dtype=float)
    smas /= window  # in-place to avoid a temporary array
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.randint(min, max + 1, sampleSize))  # max + 1: randint's upper bound is exclusive

plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
A note on names: in Python, variable and function names are conventionally name_with_underscore; CamelCase is reserved for class names. np.random.randint has an exclusive upper bound, more like random.randrange than random.randint, which is why the code above passes max + 1; unlike the random module, it also lets you specify the number of samples to generate in one call. (The older np.random.random_integers, which used inclusive bounds like random.randint, is deprecated.)
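As an aside that isn't in the original answer: NumPy 1.20 added sliding_window_view, a documented, safer wrapper around the same striding trick, so the function above could plausibly be written as:

from numpy.lib.stride_tricks import sliding_window_view

def movingaverage(values, window):
    # each row of the view is one window of the input; average over it
    return sliding_window_view(values, window).mean(axis=-1)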
I have a numpy array whose values are distributed in the following manner
From this array I need to get a random sub-sample which is normally distributed.
I need to get rid of the values from the array which are above the red line in the picture, i.e. I need to remove some occurrences of certain values from the array so that my distribution gets smoothed when the abrupt peaks are removed.
And my array's distribution should become like this:
Can this be achieved in python, without manually looking for the entries corresponding to the peaks and removing some occurrences of them? Can this be done in a simpler way?
The following kind of works; it is rather aggressive, though:
It works by ordering the samples, transforming to uniform via the CDF, and then trying to select a regular grid-like subsample. If you feel it is too aggressive you could increase ns, which is essentially the number of samples kept.
Also, please note that it requires the knowledge of the true distribution. In case of normal distribution you should be fine with using sample mean and unbiased variance estimate (the one with n-1).
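Concretely, using estimated parameters would mean replacing the exact-normal CDF line inside smooth() below with something like this (a sketch; a and o as in the code below):

# sample mean and unbiased (n-1) standard deviation instead of the true parameters
s = ss.norm.cdf(a[o], loc=a.mean(), scale=a.std(ddof=1))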
Code (without plotting):
import scipy.stats as ss
import numpy as np

# synthetic data: a normal sample with artificial ripples added to ~40% of it
a = ss.norm.rvs(size=1000)
b = ss.uniform.rvs(size=1000) < 0.4
a[b] += 0.1*np.sin(10*a[b])

def smooth(a, gran=25):
    o = np.argsort(a)
    s = ss.norm.cdf(a[o])  # transform the sorted samples to uniform via the CDF
    ns = int(gran / np.max(s[gran:] - s[:-gran]))
    grid, dp = np.linspace(0, 1, ns, endpoint=False, retstep=True)
    grid += dp/2
    idx = np.searchsorted(s, grid)
    # resolve collisions: force the selected indices to be strictly increasing
    c = np.flatnonzero(idx[1:] <= idx[:-1])
    while c.size > 0:
        idx[c+1] = idx[c] + 1
        c = np.flatnonzero(idx[1:] <= idx[:-1])
    idx = idx[:np.searchsorted(idx, len(a))]
    return o[idx]

ap = a[smooth(a)]

c, b = np.histogram(a, 40)
cp, _ = np.histogram(ap, b)
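To compare the two histograms visually, a minimal plotting addition (not part of the original code):

import matplotlib.pyplot as plt

centers = 0.5 * (b[:-1] + b[1:])  # bin centers from the shared edges
plt.step(centers, c, where='mid', label='original')
plt.step(centers, cp, where='mid', label='smoothed subsample')
plt.legend()
plt.show()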
I have a set of 46 years worth of rainfall data. It's in the form of 46 numpy arrays each with a shape of 145, 192, so each year is a different array of maximum rainfall data at each lat and lon coordinate in the given model.
I need to create a global map of tau values by doing an M-K test (Mann-Kendall) for each coordinate over the 46 years.
I'm still learning python, so I've been having trouble finding a way to go through all the data simply, without making 27840 new arrays, one for each coordinate.
So far I've looked into using scipy.stats.kendalltau and the definition from here: https://github.com/mps9506/Mann-Kendall-Trend
EDIT:
To clarify and add a little more detail, I need to perform the test for each coordinate and not just for each file individually. For example, for the first M-K test, I would want my x=46 and y=data1[0,0], data2[0,0], data3[0,0], ..., data46[0,0]. Then I would repeat this process for every single coordinate in each array. In total the M-K test would be done 27840 times, leaving me with 27840 tau values that I can then plot on a global map.
EDIT 2:
I'm now running into a different problem. Going off of the suggested code, I have the following:
for i in range(145):
    for j in range(192):
        out[i,j] = mk_test(yrmax[:,i,j], alpha=0.05)
print out
I used numpy.stack to stack all 46 arrays into a single array (yrmax) with shape (46L, 145L, 192L). I've tested it out and it calculates p and tau correctly if I change the code from out[i,j] to just out. However, doing this messes up the for loop so that it only keeps the results from the last coordinate instead of all of them. And if I leave the code as it is above, I get the error: TypeError: list indices must be integers, not tuple
My first guess was that it has to do with mk_test and how the information is supposed to be returned in the definition. So I've tried altering the code from the link above to change how the data is returned, but I keep getting errors relating back to tuples. So now I'm not sure where it's going wrong and how to fix it.
EDIT 3:
One more clarification I thought I should add. I've already modified the definition in the link so it returns only the two number values I want for creating maps, p and z.
I don't think this is as big an ask as you may imagine. From your description it sounds like you don't actually want the scipy kendalltau, but the function in the repository you posted. Here is a little example I set up:
from time import time
import numpy as np
from mk_test import mk_test

data = np.array([np.random.rand(145, 192) for _ in range(46)])
mk_res = np.empty((145, 192), dtype=object)

start = time()
for i in range(145):
    for j in range(192):
        mk_res[i, j] = mk_test(data[:, i, j], alpha=0.05)
print(f'Elapsed Time: {time() - start} s')
Elapsed Time: 35.21990394592285 s
My system is a MacBook Pro with a 2.7 GHz Intel Core i7 and 16 GB of RAM, so nothing special.
Each entry in the mk_res array (shape 145, 192) corresponds to one of your coordinate points and contains an entry like so:
array(['no trend', 'False', '0.894546014835', '0.132554125342'], dtype='<U14')
One thing that might be useful would be to modify the code in mk_test.py to return all numerical values. So instead of 'no trend'/'positive'/'negative' you could return 0/1/-1, and 1/0 for True/False and then you wouldn't have to worry about the whole object array type. I don't know what kind of analysis you might want to do downstream but I imagine that would preemptively circumvent any headaches.
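A hypothetical wrapper along those lines, assuming mk_test returns a (trend, h, p, z) tuple as the linked repository's version does (the trend labels may be 'increasing'/'decreasing' or 'positive'/'negative' depending on the copy):

def mk_test_numeric(x, alpha=0.05):
    # map the string/bool outputs of mk_test onto plain numbers
    trend, h, p, z = mk_test(x, alpha=alpha)
    codes = {'increasing': 1, 'positive': 1, 'decreasing': -1, 'negative': -1}
    return codes.get(trend, 0), int(h), p, z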
Thanks to the answers provided and some work I was able to work out a solution that I'll provide here for anyone else that needs to use the Mann-Kendall test for data analysis.
The first thing I needed to do was flatten the original array into a 1D array. I know there is probably an easier way to do this, but I ultimately used the following code, based on the code Grr suggested:

x = 46
out1 = np.empty(x)
out = np.empty((0))
for i in range(145):
    for j in range(192):
        out1 = yrmax[:,i,j]
        out = np.append(out, out1, axis=0)
Then I reshaped the resulting array (out) as follows:
out2 = np.reshape(out,(27840,46))
I did this so my data would be in a format compatible with scipy.stats.kendalltau. 27840 is the total number of values I have at every coordinate that will be on my map (i.e. it's just 145*192) and 46 is the number of years the data spans.
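For the record, the flatten-and-reshape above can plausibly be collapsed into a single transpose/reshape (a sketch, assuming yrmax has shape (46, 145, 192)):

out2 = yrmax.transpose(1, 2, 0).reshape(-1, 46)  # shape (27840, 46), same row order as the loop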
I then used the following loop, modified from Grr's code, to find Kendall's tau and its respective p-value at each latitude and longitude over the 46-year period:
x = range(46)
y = np.zeros((0))
for j in range(27840):
    b = sc.stats.kendalltau(x, out2[j,:])
    y = np.append(y, b, axis=0)
Finally, I reshaped the data one more time, as shown: newdata = np.reshape(y,(145,192,2)), so the final array is in a suitable format for creating a global map of both tau and p-values.
Thanks everyone for the assistance!
Depending on your situation, it might just be easiest to make the arrays.
You won't really need them all in memory at once (not that it sounds like a terrible amount of data). Something like this only has to deal with one "copied out" coordinate trend at once:
SIZE = (145,192)

year_matrices = load_years()  # list of one 145x192 array per year
result_matrix = numpy.zeros(SIZE)

for x in range(SIZE[0]):
    for y in range(SIZE[1]):
        coord_trend = map(lambda d: d[x][y], year_matrices)
        result_matrix[x][y] = analyze_trend(coord_trend)

print result_matrix
Now, there are things like itertools.izip that could help you if you really want to avoid actually copying the data.
Here's a concrete example of how Python's zip might work with data like yours (although as if you'd used ndarray.flatten on each year):
year_arrays = [
    ['y0_coord0_val', 'y0_coord1_val', 'y0_coord2_val', 'y0_coord3_val'],
    ['y1_coord0_val', 'y1_coord1_val', 'y1_coord2_val', 'y1_coord3_val'],
    ['y2_coord0_val', 'y2_coord1_val', 'y2_coord2_val', 'y2_coord3_val'],
]
assert len(year_arrays) == 3
assert len(year_arrays[0]) == 4

coord_arrays = zip(*year_arrays)  # i.e. `zip(year_arrays[0], year_arrays[1], year_arrays[2])`; Python 2, where zip returns a list
# original data is essentially transposed
assert len(coord_arrays) == 4
assert len(coord_arrays[0]) == 3
assert coord_arrays[0] == ('y0_coord0_val', 'y1_coord0_val', 'y2_coord0_val')
assert coord_arrays[1] == ('y0_coord1_val', 'y1_coord1_val', 'y2_coord1_val')
assert coord_arrays[2] == ('y0_coord2_val', 'y1_coord2_val', 'y2_coord2_val')
assert coord_arrays[3] == ('y0_coord3_val', 'y1_coord3_val', 'y2_coord3_val')
flat_result = map(analyze_trend, coord_arrays)
The example above still copies the data (and all at once, rather than a coordinate at a time!) but hopefully shows what's going on.
Now, if you replace zip with itertools.izip and map with itertools.imap (this example is Python 2; in Python 3 the built-in zip and map are already lazy), the copies needn't occur: itertools wraps the original arrays and keeps track of where it should be fetching values from internally.
There's a catch, though: to take advantage of itertools you have to access the data only sequentially (i.e. through iteration). In your case, it looks like the code at https://github.com/mps9506/Mann-Kendall-Trend/blob/master/mk_test.py might not be compatible with that. (I haven't reviewed the algorithm itself to see if it could be.)
Also please note that in the example I've glossed over the numpy ndarray stuff and just shown flat coordinate arrays. It looks like numpy has some options of its own for handling this instead of itertools; e.g. this answer says "Taking the transpose of an array does not make a copy". Your question was somewhat general, so I've tried to give some general tips as to ways one might deal with larger data in Python.
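In numpy terms, the no-copy access pattern this alludes to might look like the following (a sketch; x and y are loop indices as in the first snippet, and np.stack allocates the cube once, after which each coordinate slice is a view):

import numpy as np

year_cube = np.stack(year_matrices)  # shape (n_years, 145, 192)
coord_trend = year_cube[:, x, y]     # one coordinate's time series, a view, no extra copy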
I ran into the same task and have managed to come up with a vectorized solution using numpy and scipy.
The formulas are the same as on this page: https://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm.
The trickiest part is working out the adjustment for tied values. I modified the code from this answer to compute the number of tied values for each record in a vectorized manner.
Below are the two functions:
import copy
import numpy as np
from scipy.stats import norm

def countTies(x):
    '''Count number of ties in rows of a 2D matrix

    Args:
        x (ndarray): 2d matrix.
    Returns:
        result (ndarray): 2d matrix with same shape as <x>. In each
            row, the number of ties are inserted at (not really) arbitrary
            locations.
            The locations of the tie counts are not important, since
            they will subsequently be put into a formula of sum(t*(t-1)*(2t+5)).

    Inspired by: https://stackoverflow.com/a/24892274/2005415.
    '''
    if np.ndim(x) != 2:
        raise Exception("<x> should be 2D.")

    m, n = x.shape
    pad0 = np.zeros([m, 1]).astype('int')

    x = copy.deepcopy(x)
    x.sort(axis=1)
    diff = np.diff(x, axis=1)

    cated = np.concatenate([pad0, np.where(diff==0, 1, 0), pad0], axis=1)
    absdiff = np.abs(np.diff(cated, axis=1))

    rows, cols = np.where(absdiff==1)
    rows = rows.reshape(-1, 2)[:, 0]
    cols = cols.reshape(-1, 2)
    counts = np.diff(cols, axis=1)+1

    result = np.zeros(x.shape).astype('int')
    result[rows, cols[:,1]] = counts.flatten()

    return result
def MannKendallTrend2D(data, tails=2, axis=0, verbose=True):
    '''Vectorized Mann-Kendall tests on 2D matrix rows/columns

    Args:
        data (ndarray): 2d array with shape (m, n).
    Keyword Args:
        tails (int): 1 for 1-tail, 2 for 2-tail test.
        axis (int): 0: test trend in each column. 1: test trend in each
            row.
    Returns:
        z (ndarray): If <axis> = 0, 1d array with length <n>, standard scores
            for the trend in each column of <data>.
            If <axis> = 1, 1d array with length <m>, standard scores
            for the trend in each row of <data>.
        p (ndarray): p-values corresponding to <z>.
    '''
    if np.ndim(data) != 2:
        raise Exception("<data> should be 2D.")

    # always put records in rows and do M-K test on each row
    if axis == 0:
        data = data.T

    m, n = data.shape
    mask = np.triu(np.ones([n, n])).astype('int')
    mask = np.repeat(mask[None,...], m, axis=0)
    s = np.sign(data[:,None,:] - data[:,:,None]).astype('int')
    s = (s * mask).sum(axis=(1,2))

    #--------------------Count ties--------------------
    counts = countTies(data)
    tt = counts * (counts - 1) * (2*counts + 5)
    tt = tt.sum(axis=1)

    #-----------------Sample Gaussian-----------------
    var = (n * (n-1) * (2*n+5) - tt) / 18.
    eps = 1e-8  # avoid dividing by 0
    z = (s - np.sign(s)) / (np.sqrt(var) + eps)
    p = norm.cdf(z)
    p = np.where(p>0.5, 1-p, p)

    if tails == 2:
        p = p*2

    return z, p
I assume your data come in the layout of (time, latitude, longitude), and you are examining the temporal trend for each lat/lon cell.
To simulate this task, I synthesized a sample data array of shape (50, 145, 192). The 50 time points are taken from Example 5.9 of the book Wilks 2011, Statistical methods in the atmospheric sciences. And then I simply duplicated the same time series 27840 times to make it (50, 145, 192).
Below is the computation:
x = np.array([0.44,1.18,2.69,2.08,3.66,1.72,2.82,0.72,1.46,1.30,1.35,0.54,\
    2.74,1.13,2.50,1.72,2.27,2.82,1.98,2.44,2.53,2.00,1.12,2.13,1.36,\
    4.9,2.94,1.75,1.69,1.88,1.31,1.76,2.17,2.38,1.16,1.39,1.36,\
    1.03,1.11,1.35,1.44,1.84,1.69,3.,1.36,6.37,4.55,0.52,0.87,1.51])

# create a big cube with shape: (T, Y, X)
arr = np.zeros([len(x), 145, 192])
for i in range(arr.shape[1]):
    for j in range(arr.shape[2]):
        arr[:, i, j] = x

print(arr.shape)

# re-arrange into tabular layout: (Y*X, T)
arr = np.transpose(arr, [1, 2, 0])
arr = arr.reshape(-1, len(x))
print(arr.shape)
import time

t1 = time.time()
z, p = MannKendallTrend2D(arr, tails=2, axis=1)
p = p.reshape(145, 192)
t2 = time.time()
print('time =', t2-t1)
The p-value for that sample time series is 0.63341565, which I have validated against the pymannkendall module result. Since arr contains merely duplicated copies of x, the resultant p is a 2d array of size (145, 192), with all 0.63341565.
And it took me only 1.28 seconds to compute that.
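To reproduce the cross-check mentioned above (assuming the pymannkendall package and its original_test function):

import pymannkendall as mk

res = mk.original_test(x)  # x is the 50-point series defined above
print(res.p)               # should print ~0.63341565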
I need a list or 2-d array of integers between a minimum value and a maximum value, where the interval between the integers varies inversely with a distribution function. In other words, at the maximum of the distribution the density of points should be highest. In my case something like a Weibull probability density function with k parameter 1.5 would be nice.
Output would look something like this:
>>> min = 1
>>> max = 500
>>> peak = 100
>>> n = 18
>>> myfunc(min, max, peak, n)
[1, 50, 75, 88, 94, 97, 98, 99, 100, 102, 106, 112, 135, 176, 230, 290, 360, 500]
I already tried one method using the np.random.weibull() function to populate a numpy array but this doesn't work out nicely enough; the randomization when producing a list of 20 items means that the spacing is not satisfactory. It would be much better to avoid generating random numbers from a distribution and instead do what I described above, controlling the spacing directly.
Thank you.
Edit: I mention a Weibull distribution because it is asymmetric, but of course any similar distribution function that gives similar results is also OK and may be more suitable.
Edit2: So I want a numpy non-linear space!
Edit3: As I answered in one comment, I want to avoid random number generation so that the function output is identical each time it is run with the same input parameters.
If I'm understanding your question right, this function should do what you're asking:
def weibullspaced(min, max, k, arrsize):
    wb = np.random.weibull(k, arrsize - 1)
    spaced = np.zeros((arrsize,))
    spaced[1:] = np.cumsum(wb)
    diff = max - min
    spaced *= diff / spaced[-1]
    return min + np.rint(spaced)
You can of course substitute in any distribution you want, but you said you wanted Weibull. Is that the function you're looking for?
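Since Edit3 rules out randomness, one deterministic variant of the same idea (a sketch, swapping the sampled spacings for the distribution's quantile function via scipy.stats.weibull_min; names here are hypothetical) would be:

import numpy as np
from scipy.stats import weibull_min

def weibullspaced_det(lo, hi, k, arrsize):
    # evaluate the inverse CDF on an even probability grid; equal probability
    # mass between consecutive points makes them densest where the pdf peaks
    q = np.linspace(0.0, 0.99, arrsize)  # stop short of 1: ppf(1) is infinite
    spaced = weibull_min.ppf(q, k)
    spaced *= (hi - lo) / spaced[-1]     # rescale to span [lo, hi]
    return (lo + np.rint(spaced)).astype(int)

Being quantile-based, it returns the same array every run for the same inputs.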
Here is a rather inelegant but simple solution to my own question. I simplified things by using a triangular distribution function, which makes it easy to specify a minimum and maximum value. A function named spacing() provides the spacing amount at a given x value according to a specified mathematical function. After incrementing through a while loop I add the Max value to the list so that the full range is present. I then convert to integers while converting to a numpy array.
The downside of this method is that I must manually specify a minimum and maximum step size. I would rather specify the length of the returned array!
import numpy as np

Min = 1.0
Max = 500.0
peak = 100.0
minstep = 1.0
maxstep = 50.0

def spacing(x):
    # Triangle distribution:
    if x < peak:
        # Since we are calculating gradients I keep everything as floats for now.
        grad = (minstep - maxstep)/(peak - Min)
        return grad*x + maxstep
    elif x == peak:
        return minstep
    else:
        grad = (maxstep - minstep)/(Max - peak)
        return grad*x + minstep

def myfunc(Min, Max, peak, minstep, maxstep):
    x = 1.0
    chosen = []
    while x < Max:
        space = spacing(x)
        chosen.append(x)
        x += space
    chosen.append(Max)
    # I cheat with the integers by casting the list to ints right at the end:
    chosen = np.array(chosen, dtype='int')
    return chosen

print(myfunc(1.0, 500.0, 100.0, 1.0, 50.0))
Output:
[ 1 50 75 88 94 97 99 100 113 128 145 163 184 208 235 264 298 335 378 425 478 500]