Related
I have x,y,v arrays of data points and I am binning v on x-y plane. I am trying to get the x,y,v values back after binning but I want them as arrays corresponding to each bin. My code can get them individually but that will not work for large data sets with many bins. Maybe I need to use loops of some kind but my understanding of loops is weak. Code:
from scipy import stats
import numpy as np
x=np.array([-10,-2,4,12,3,6,8,14,3])
y=np.array([5,5,-6,8,-20,10,2,2,8])
v=np.array([4,-6,-10,40,22,-14,20,8,-10])
ret = stats.binned_statistic_2d(x,
y,
values,
'count',
bins=2,
expand_binnumbers=True)
print('counts=',ret.statistic)
print('binnumber=', ret.binnumber)
binnumber = ret.binnumber
statistic = ret.statistic
# get the bin numbers according to some condition
idx_bin_x, idx_bin_y = np.where(statistic==statistic[1][1])#[0]
print('idx_binx=',idx_bin_x)
print('idx_bin_y=',idx_bin_y)
# A binnumber of i means the corresponding value is
# between (bin_edges[i-1], bin_edges[i]).
# -> increment the bin indices by one
idx_bin_x += 1
idx_bin_y += 1
print('idx_binx+1=',idx_bin_x)
print('idx_bin_y+1=',idx_bin_y)
# get the boolean mask and apply it
is_event_x = np.in1d(binnumber[0], idx_bin_x)
print('eventx=',is_event_x)
is_event_y = np.in1d(binnumber[1], idx_bin_y)
print('eventy=',is_event_y)
is_event_xy = np.logical_and(is_event_x, is_event_y)
print('event_xy=', is_event_xy)
events_x = x[is_event_xy]
events_y = y[is_event_xy]
event_v=v[is_event_xy]
print('x=', events_x)
print('y=', events_y)
print('v=',event_v)
This outputs x,y,v for the bin with count=5 but I want all 4 bins returning 4 arrays for each x,y,v. eg for bin1: x_bin1=[...], y_bin1=[...], v_bin1=[...] and so on for 4 bins.
Also, feel free to suggest if you think there are easier ways to bin 2d planes (x,y) with values (v) like mine and getting binned values. Thank you!
Using np.array facilitates a compact way to recover the arrays you are after:
from scipy import stats
# coordinates
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])
ret = stats.binned_statistic_2d(x, y, None, 'count', bins=2, expand_binnumbers=True)
b = ret.binnumber
for i in [1,2]:
for j in [1,2]:
m = (b[0] == i) & (b[1] == j) # mask
print((list(x[m]),list(y[m]),list(v[m])))
which gives for each of the four bins a tuple of 3 lists corresponding to x, y and v values:
([], [], [])
([-10, -2], [5, 5], [4, -6])
([4, 3], [-6, -20], [-10, 22])
([12, 6, 8, 14, 3], [8, 10, 2, 2, 8], [40, -14, 20, 8, -10])
I am trying to extract first and last element from array "x" and then repeat for five times then finally concatenate to original "x".
Error: operands cannot be broadcast together with shapes (5,) (1000,)
Here is the code
import numpy as np
import random
x= np.random.uniform(0, 1, 1000)
x = [x[0]]*5 + x + [x[-1]]*5
It is not clear what you are tying to achieve. There are not really two lists in your example.
It seems like you're trying to do one of these two things:
Add the first and last element (times 5) to all elements of the array
Add the value of the previous and next elements (times 5) to each element
For the first objective, you would only be adding scalars to the array so you must not enclose the first/last items in aquare brackets:
import numpy as np
x= np.random.uniform(0, 1, 1000)
x = x[0]*5 + x + x[-1]*5
for the second objective, you could do it using assignments to a copy of the array:
y = x.copy()
y[:-1] += 5*x[1:]
y[1:] += 5*x[:-1]
[EDIT] based on the OP's comment, he needs padding of 5 on each side. the np.pad() function can do it directly:
x = np.pad(x,(5,5),mode="edge")
example:
a = np.array([7,8,9])
np.pad(a,(5,5),mode="edge")
# array([7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 9, 9, 9])
I want to do something quite simple but I'm unable to find it in the depths of numpy. I want to numerically and continuously integrate a function given by its values (not by its formula!). That means I simply want an array which holds the sums of the beginning of the input array. Example:
Input:
[ 4, 3, 5, 8 ]
Output:
[ 4, 7, 12, 20 ] # [ sum(i[0:1]), sum(i[0:2]), sum(i[0:3]), sum(i[0:4]) ]
Sounds pretty straight forward, so I'm hopeful this must be easy with some numpy functionality I'm currently unable to find.
I found stuff like scipy.integrate.quad() but that seems to integrate over a given range (from a to b) and the returns a single value. I need an array as output.
You're looking for numpy.cumsum:
>>> numpy.cumsum([ 4, 3, 5, 8 ])
array([ 4, 7, 12, 20])
You would simply need numpy.cumsum().
import numpy as np
a = np.array([ 4, 3, 5, 8 ])
print np.cumsum(a) # prints [ 4 7 12 20]
You can use quadpy (pip install quadpy), a project of mine, which as opposed to scipy.integrate.quad() does vectorized compution. Provide it with many intervals, and get all the integral values over these intervals back.
import numpy
import quadpy
a = 0.0
b = 3.0
h = 1.0e-2
n = int((b-a) / h)
x0 = numpy.linspace(a, b, num=n, endpoint=False)
x1 = x0 + h
intervals = numpy.stack([x0, x1])
vals = quadpy.line_segment.integrate(
lambda x: numpy.sin(x),
intervals,
quadpy.line_segment.GaussLegendre(5)
)
res = numpy.cumsum(vals)
import matplotlib.pyplot as plt
plt.plot(x1, numpy.sin(x1), label='f')
plt.plot(x1, res, label='F')
plt.legend()
plt.show()
You don't need numpy to get the output. Using standard itertools we get the following:
from itertools import accumulate
a = [4, 3, 5, 8]
*b, = accumulate(a)
print(b)
# [4, 7, 12, 20]
I already found two solutions for the strides moving windows which can compute mean, max, min, variance, etc. Now, I look to add a count of unique value function by axis. By axis, I mean compute all 2D arrays in single pass.
len(numpy.unique(array)) can make it but a lot of iterations will be needed to compute all arrays. I may work with image as big as 2000 x 2000, so iterations are not a good option. It's all about performance and memory effectiveness.
Here is the two solutions for the strides moving windows:
First is directly taken from Erik Rigtorp's at http://www.mail-archive.com/numpy-discussion#scipy.org/msg29450.html
import numpy as np
def rolling_window_lastaxis(a, window):
if window < 1:
raise ValueError, "`window` must be at least 1."
if window > a.shape[-1]:
raise ValueError, "`window` is too long."
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def rolling_window(a, window):
if not hasattr(window, '__iter__'):
return rolling_window_lastaxis(a, window)
for i, win in enumerate(window):
if win > 1:
a = a.swapaxes(i, -1)
a = rolling_window_lastaxis(a, win)
a = a.swapaxes(-2, i)
return a
filtsize = (3, 3)
a = np.zeros((10,10), dtype=np.float)
a[5:7,5] = 1
b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)
Second is from Alex Rogozhnikov at http://gozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html.
def compute_window_mean_and_var_strided(image, window_w, window_h):
w, h = image.shape
strided_image = np.lib.stride_tricks.as_strided(image,
shape=[w - window_w + 1, h - window_h + 1, window_w, window_h],
strides=image.strides + image.strides)
# important: trying to reshape image will create complete 4-dimensional compy
means = strided_image.mean(axis=(2,3))
mean_squares = (strided_image ** 2).mean(axis=(2, 3))
maximums = strided_image.max(axis=(2,3))
variations = mean_squares - means ** 2
return means, maximums, variations
image = np.random.random([500, 500])
compute_window_mean_and_var_strided(image, 20, 20)
Is there a way to add/implement a count of unique value function in one or both solutions?
Clarification: Basically, I need a Unique Value filter for a 2D array, just like numpy.ndarray.mean.
Thanks you
Alex
Here's one approach with scikit-image's view_as_windows for efficient sliding window extraction.
Steps involved :
Get sliding windows.
Reshape into 2D array. Note that this would make a copy and thus we would lose the efficiency of views, but keep it vectorized.
Sort along the axis of merged block axes.
Get the differentiation along that axes and count the number of different elements, which when added with 1 would be the count of unique values in each of those sliding windows and hence the final expected result.
The implementation would be like so -
from skimage.util import view_as_windows as viewW
def sliding_uniq_count(a, BSZ):
out_shp = np.asarray(a.shape) - BSZ + 1
a_slid4D = viewW(a,BSZ)
a_slid2D = np.sort(a_slid4D.reshape(-1,np.prod(BSZ)),axis=1)
return ((a_slid2D[:,1:] != a_slid2D[:,:-1]).sum(1)+1).reshape(out_shp)
Sample run -
In [233]: a = np.random.randint(0,10,(6,7))
In [234]: a
Out[234]:
array([[6, 0, 5, 7, 0, 8, 5],
[3, 0, 7, 1, 5, 4, 8],
[5, 0, 5, 1, 7, 2, 3],
[5, 1, 3, 3, 7, 4, 9],
[9, 0, 7, 4, 9, 1, 1],
[7, 0, 4, 1, 6, 3, 4]])
In [235]: sliding_uniq_count(a, [3,3])
Out[235]:
array([[5, 4, 4, 7, 7],
[5, 5, 4, 6, 7],
[6, 6, 6, 6, 6],
[7, 5, 6, 6, 6]])
Hybrid approach
To make it work with very large arrays, to accommodate everything into memory, we might have to keep one loop that would iterate along each row of the input data, like so -
def sliding_uniq_count_oneloop(a, BSZ):
S = np.prod(BSZ)
out_shp = np.asarray(a.shape) - BSZ + 1
a_slid4D = viewW(a,BSZ)
out = np.empty(out_shp,dtype=int)
for i in range(a_slid4D.shape[0]):
a_slid2D_i = np.sort(a_slid4D[i].reshape(-1,S),-1)
out[i] = (a_slid2D_i[:,1:] != a_slid2D_i[:,:-1]).sum(-1)+1
return out
Hybrid approach - Version II
Another version of hybrid one, with the explicit usage of np.lib.stride_tricks.as_strided -
def sliding_uniq_count_oneloop(a, BSZ):
S = np.prod(BSZ)
out_shp = np.asarray(a.shape) - BSZ + 1
strd = np.lib.stride_tricks.as_strided
m,n = a.strides
N = out_shp[1]
out = np.empty(out_shp,dtype=int)
for i in range(out_shp[0]):
a_slid3D = strd(a[i], shape=((N,) + tuple(BSZ)), strides=(n,m,n))
a_slid2D_i = np.sort(a_slid3D.reshape(-1,S),-1)
out[i] = (a_slid2D_i[:,1:] != a_slid2D_i[:,:-1]).sum(-1)+1
return out
np.mean operates on a given axis without making any copies. Looking at just the shape of the as_strided array it looks much bigger than the original array. But because each 'window' is a view, it doesn't take up any additional space. Reduction operators like mean work fine with that kind of view.
But note that your second example warns about reshape. That creates a copy; it replicates the values in all of those windows.
unique starts with
ar = np.asanyarray(ar).flatten()
so right off the bat is is making a reshapened copy. It's a copy, and 1d. Then it sorts elements, looks for duplicates etc.
There are ways of finding unique rows, but they require converting rows into large structured array elements. In effect turning a 2d array into a 1d that unique can work with.
I have a large array of thousands of vals in numpy. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n number of elements in array, and I want to tell it to average down to X number of values, and it averages like above.
Is there some way to do that with numpy (already using it for other things, so I'd like to stick with it).
Using reshape and mean, you can average every m adjacent values of an 1D-array of size N*m, with N being any positive integer number. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1)a.reshape(-1, m) will create a 2D image of the array without copying data:
array([[ 2, 3, 4],
[ 8, 9, 10]])
2)taking the mean in the second axis (axis=1) will then calculate the mean value of each row, resulting in:
array([3., 9.])
Try this:
n_averaged_elements = 3
averaged_array = []
a = np.array([ 2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
slice_from_index = i
slice_to_index = slice_from_index + n_averaged_elements
averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>>> averaged_array
>>>> [3.0, 9.0]
Looks like a simple non-overlapping moving window average to me, how about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
a[:len(a)/window_sz*window_sz].reshape(-1,window_sz).mean(1)
#you want to be sure your array can be reshaped properly, so the [:len(a)/window_sz*window_sz] part
Out[3]:
array([ 3., 9.])
In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method that I give below, we first find the factors of the length of this array a. And, then we choose the an appropriate factor as the step size to average the array with.
Here is the code.
import numpy as np
from functools import reduce
''' Function to find factors of a given number 'n' '''
def factors(n):
return list(set(reduce(list.__add__,
([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))
a = [2,3,4,8,9,10] #Given array.
'''fac: list of factors of length of a.
In this example, len(a) = 6. So, fac = [1, 2, 3, 6] '''
fac = factors(len(a))
'''step: choose an appropriate step size from the list 'fac'.
In this example, we choose one of the middle numbers in fac
(3). '''
step = fac[int( len(fac)/3 )+1]
'''avg: initialize an empty array. '''
avg = np.array([])
for i in range(0, len(a), step):
avg = np.append( avg, np.mean(a[i:i+step]) ) #append averaged values to `avg`
print avg #Prints the final result
[3.0, 9.0]