Related
Note that this question is not about multiple conditions within a single np.where(), see this thread for that.
I have a numpy array arr0 with some numbers (without a particular structure):
arr0 = \
np.array([[0,3,0],
[1,3,2],
[1,2,0]])
and a list of all the entries in this array:
entries = [0,1,2,3]
I also have another array, arr1:
arr1 = \
np.array([[4,5,6],
[6,2,4],
[3,7,9]])
I would like to perform some function on multiple subsets of elements of arr1. A subset consists of the numbers which are at the same positions as the arr0 entries with a certain value. Let this function be finding the max value. Performing the function on each subset via a list comprehension:
res = [np.where(arr0==index,arr1,0).max() for index in entries]
res is [9, 6, 7, 5]
As expected: 0 in arr0 is in the top left, top right and bottom right corners, and the biggest number among the top left, top right and bottom right entries of arr1 (i.e. 4, 6, 9) is 9. The rest follow the same logic.
How can I achieve this without iteration?
My actual arrays are much bigger than these examples.
With broadcasting
res = np.where(arr0[...,None] == entries, arr1[...,None], 0).max(axis=(0, 1))
The result of np.where(...) is a (3, 3, 4) array, where slicing [...,0] would give you the same 3x3 array you get by manually doing the np.where with just entries[0], etc. Then taking the max of each 3x3 subarray leaves you with the desired result.
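To make the shapes concrete, here is a small sketch using the arrays from the question. The name `masked` is only for illustration; it holds the intermediate (3, 3, 4) array before the reduction.

```python
import numpy as np

arr0 = np.array([[0, 3, 0],
                 [1, 3, 2],
                 [1, 2, 0]])
arr1 = np.array([[4, 5, 6],
                 [6, 2, 4],
                 [3, 7, 9]])
entries = [0, 1, 2, 3]

# arr0[..., None] has shape (3, 3, 1); comparing it with the 4 entries
# broadcasts to shape (3, 3, 4)
masked = np.where(arr0[..., None] == entries, arr1[..., None], 0)
print(masked.shape)  # (3, 3, 4)

# Slice k along the last axis equals the single-entry np.where from the question
for k, entry in enumerate(entries):
    assert np.array_equal(masked[..., k], np.where(arr0 == entry, arr1, 0))

# Reduce over the two original axes, leaving one maximum per entry
res = masked.max(axis=(0, 1))
print(res)  # [9 6 7 5]
```

Note that the intermediate array holds one full copy of the data per entry, which is why memory use grows with both the array size and the number of entries.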
Timings
Apparently this method doesn't scale well for bigger arrays. The other answer using np.unique is more efficient because it reduces the maximum operation down to a few unique values regardless of how big the original arrays are.
import timeit
import matplotlib.pyplot as plt
import numpy as np
def loops():
    return [np.where(arr0==index,arr1,0).max() for index in entries]
def broadcast():
    return np.where(arr0[...,None] == entries, arr1[...,None], 0).max(axis=(0, 1))
def numpy_1d():
    arr0_1D = arr0.ravel()
    arr1_1D = arr1.ravel()
    arg_idx = np.argsort(arr0_1D)
    u, idx = np.unique(arr0_1D[arg_idx], return_index=True)
    return np.maximum.reduceat(arr1_1D[arg_idx], idx)
sizes = (3, 10, 25, 50, 100, 250, 500, 1000)
lengths = (4, 10, 25, 50, 100)
methods = (loops, broadcast, numpy_1d)
fig, ax = plt.subplots(len(lengths), sharex=True)
for i, M in enumerate(lengths):
    entries = np.arange(M)
    times = [[] for _ in range(len(methods))]
    for N in sizes:
        arr0 = np.random.randint(1000, size=(N, N))
        arr1 = np.random.randint(1000, size=(N, N))
        for j, method in enumerate(methods):
            times[j].append(np.mean(timeit.repeat(method, number=1, repeat=10)))
    for t in times:
        ax[i].plot(sizes, t)
    ax[i].legend(['loops', 'broadcasting', 'numpy_1d'])
    ax[i].set_title(f'Entries size {M}')
plt.xticks(sizes)
fig.text(0.5, 0.04, 'Array size (NxN)', ha='center')
fig.text(0.04, 0.5, 'Time (s)', va='center', rotation='vertical')
plt.show()
It's more convenient to work with the 1D case. You need to sort arr0, then find the starting index of every group and use np.maximum.reduceat.
arr0_1D = np.array([[0,3,0],[1,3,2],[1,2,0]]).ravel()
arr1_1D = np.array([[4,5,6],[6,2,4],[3,7,9]]).ravel()
arg_idx = np.argsort(arr0_1D)
>>> arr0_1D[arg_idx]
array([0, 0, 0, 1, 1, 2, 2, 3, 3])
u, idx = np.unique(arr0_1D[arg_idx], return_index=True)
>>> idx
array([0, 3, 5, 7], dtype=int64)
>>> np.maximum.reduceat(arr1_1D[arg_idx], idx)
array([9, 6, 7, 5], dtype=int32)
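The steps above can be wrapped into a small function. This is only a sketch: `group_max` is a hypothetical helper name, not a NumPy function. One side effect of this approach is that it does not rely on filling the non-matching positions with 0, so it also works when arr1 contains negative values.

```python
import numpy as np

def group_max(labels, values):
    """Maximum of `values` for each distinct value in `labels` (hypothetical helper)."""
    labels_1d = np.ravel(labels)
    values_1d = np.ravel(values)
    # Sort so that equal labels become contiguous runs
    order = np.argsort(labels_1d, kind="stable")
    # Start index of each run in the sorted labels
    uniq, starts = np.unique(labels_1d[order], return_index=True)
    # Reduce each run [starts[k], starts[k+1]) with max
    return uniq, np.maximum.reduceat(values_1d[order], starts)

arr0 = np.array([[0, 3, 0], [1, 3, 2], [1, 2, 0]])
arr1 = np.array([[4, 5, 6], [6, 2, 4], [3, 7, 9]])

uniq, maxima = group_max(arr0, arr1)
print(uniq)    # [0 1 2 3]
print(maxima)  # [9 6 7 5]

# With all-negative values, np.where(arr0 == k, values, 0).max() would return 0
# for every group, while the grouped reduction returns the true maxima
_, neg_maxima = group_max(arr0, -arr1)
print(neg_maxima)  # [-4 -3 -4 -2]
```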
I have a three dimensional numpy source array and a two-dimensional numpy array of indexes.
For example:
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,1],
[1,2]])
I'd like to get a 2d array, where each element represents the indexed value in the innermost dimension in that position:
array([[1,5],
[8,12]])
How do I do this with numpy?
You can try np.take, here is the documentation.
However, np.take indexes into the flattened array by default, so each index must be the element's position after flattening src. For example, you should use:
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,4],
[7,11]])
# Wanted result
res = np.take(src, idx)
where src is treated as the flattened array [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
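If you would rather not work out the flat positions by hand, they can be derived from the original idx. The sketch below is one possible way, using np.indices and np.ravel_multi_index; the names `i`, `j` and `flat_idx` are just for illustration.

```python
import numpy as np

src = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
idx = np.array([[0, 1],
                [1, 2]])

# Row and column index of every position in idx, each of shape (2, 2)
i, j = np.indices(idx.shape)

# Convert each (i, j, idx[i, j]) triple into a position in the flattened src
flat_idx = np.ravel_multi_index((i, j, idx), src.shape)
print(flat_idx)
# [[ 0  4]
#  [ 7 11]]

res = np.take(src, flat_idx)
print(res)
# [[ 1  5]
#  [ 8 12]]
```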
You can also try np.take_along_axis, here is the documentation.
This method requires src and idx to have the same number of dimensions, so (starting from the original idx in the question) first add a trailing axis to idx and then squeeze that axis out of the result.
# Add a trailing axis to idx
idx = np.expand_dims(idx, axis=-1)
# Squeeze the trailing axis out of the result
res = np.take_along_axis(src, idx, axis=2).squeeze(-1)
You can use the np.choose method after moving the last axis of src to the front, so that each "choice" is a 2x2 array:
np.choose(idx, np.moveaxis(src, -1, 0))
>>> array([[ 1,  5],
       [ 8, 12]])
Direct indexing:
src[np.arange(2)[:, None], np.arange(2), idx]
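The direct-indexing line can be written without hard-coding the sizes. The sketch below uses np.ogrid for that and, as an assumption for illustration, a non-symmetric idx (different from the one in the question) so that an accidentally transposed result would be noticed.

```python
import numpy as np

src = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
# Hypothetical non-symmetric index array, chosen only for this check
idx = np.array([[2, 0],
                [1, 2]])

# Open grids of shape (2, 1) and (1, 2) that broadcast against idx
rows, cols = np.ogrid[:src.shape[0], :src.shape[1]]

# res[i, j] == src[i, j, idx[i, j]]
res = src[rows, cols, idx]
print(res)
# [[ 3  4]
#  [ 8 12]]

# Cross-check against take_along_axis along the last axis
check = np.take_along_axis(src, idx[..., None], axis=-1)[..., 0]
assert np.array_equal(res, check)
```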
I have an array of data-points, for example:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
and I need to perform the following sum on the values:
However, the problem is that I need to perform this sum on each value > i. For example, using the last 3 values in the set the sum would be:
and so on up to 10.
If I run something like:
import numpy as np
x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1/np.log(2)
for i in x:
    y = sum(x**(alpha)*np.log(x))
print (y)
It returns a single value of y = 247.7827060452275, whereas I need an array of values. I think I need to reverse the order of the data to achieve what I want but I'm having trouble visualising the problem (hope I explained it properly) as a whole so any suggestions would be much appreciated.
The following computes all the partial sums of the grand sum in your formula:
import numpy as np
# Generate numpy array [1, 10]
x = np.arange(1, 11)
alpha = 1 / np.log(2)
# Compute parts of the sum
parts = x ** alpha * np.log(x)
# Compute all partial sums
part_sums = np.cumsum(parts)
print(part_sums)
You really do not need any explicit loop, or a non-numpy operation (like sum()), here. numpy takes care of all your needs.
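If the data must stay in the descending order given in the question, the same idea works on the reversed terms. This is a sketch under the assumption that "the sum over the last k values" is what is wanted; the names `last_k_sums` and `tail_sums` are hypothetical.

```python
import numpy as np

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1 / np.log(2)

# One term per data point, in the original (descending) order
parts = x ** alpha * np.log(x)

# last_k_sums[k-1] is the sum over the last k values of x
last_k_sums = np.cumsum(parts[::-1])

# tail_sums[i] is the sum over x[i:], aligned with the original order
tail_sums = last_k_sums[::-1]

print(last_k_sums)
print(tail_sums)
```

The final entry of `last_k_sums` (equivalently the first entry of `tail_sums`) is the grand total that the loop in the question printed.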
I already found two solutions for strided moving windows which can compute the mean, max, min, variance, etc. Now I would like to add a function that counts unique values by axis. By axis, I mean computing all 2D arrays in a single pass.
len(numpy.unique(array)) can do it, but a lot of iterations would be needed to compute all the arrays. I may work with images as big as 2000 x 2000, so iterations are not a good option. It's all about performance and memory efficiency.
Here are the two solutions for the strided moving windows:
First is directly taken from Erik Rigtorp's at http://www.mail-archive.com/numpy-discussion#scipy.org/msg29450.html
import numpy as np
def rolling_window_lastaxis(a, window):
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def rolling_window(a, window):
    if not hasattr(window, '__iter__'):
        return rolling_window_lastaxis(a, window)
    for i, win in enumerate(window):
        if win > 1:
            a = a.swapaxes(i, -1)
            a = rolling_window_lastaxis(a, win)
            a = a.swapaxes(-2, i)
    return a
filtsize = (3, 3)
a = np.zeros((10,10), dtype=float)
a[5:7,5] = 1
b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)
Second is from Alex Rogozhnikov at http://gozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html.
def compute_window_mean_and_var_strided(image, window_w, window_h):
    w, h = image.shape
    strided_image = np.lib.stride_tricks.as_strided(image,
                                                    shape=[w - window_w + 1, h - window_h + 1, window_w, window_h],
                                                    strides=image.strides + image.strides)
    # important: trying to reshape image will create complete 4-dimensional copy
    means = strided_image.mean(axis=(2,3))
    mean_squares = (strided_image ** 2).mean(axis=(2, 3))
    maximums = strided_image.max(axis=(2,3))
    variations = mean_squares - means ** 2
    return means, maximums, variations
image = np.random.random([500, 500])
compute_window_mean_and_var_strided(image, 20, 20)
Is there a way to add/implement a count of unique value function in one or both solutions?
Clarification: Basically, I need a Unique Value filter for a 2D array, just like numpy.ndarray.mean.
Thank you
Alex
Here's one approach with scikit-image's view_as_windows for efficient sliding window extraction.
Steps involved:
1. Get sliding windows.
2. Reshape into a 2D array, one row per window. Note that this makes a copy, so we lose the efficiency of views, but it stays vectorized.
3. Sort along the axis of the merged block axes.
4. Take the differences along that axis and count the positions where consecutive elements differ. Adding 1 to that count gives the number of unique values in each sliding window, which is the final expected result.
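The steps above can be sketched with NumPy alone. This assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available; the function name `sliding_unique_count` and the small test array are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_unique_count(a, window_shape):
    """Count unique values in every sliding window of a 2D array (sketch)."""
    # Step 1: windows is a view of shape (H-h+1, W-w+1, h, w)
    windows = sliding_window_view(a, window_shape)
    out_shape = windows.shape[:a.ndim]
    # Step 2: merge the two window axes; this reshape makes a copy
    flat = windows.reshape(out_shape + (-1,))
    # Step 3: sort the values inside each window
    flat = np.sort(flat, axis=-1)
    # Step 4: count value changes between neighbours, plus 1
    return (flat[..., 1:] != flat[..., :-1]).sum(axis=-1) + 1

a = np.array([[1, 1, 2, 3],
              [1, 1, 2, 2],
              [4, 5, 2, 2]])
counts = sliding_unique_count(a, (2, 2))
print(counts)
# [[1 2 2]
#  [3 3 1]]
```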
The implementation would be like so -
from skimage.util import view_as_windows as viewW
def sliding_uniq_count(a, BSZ):
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a,BSZ)
    a_slid2D = np.sort(a_slid4D.reshape(-1,np.prod(BSZ)),axis=1)
    return ((a_slid2D[:,1:] != a_slid2D[:,:-1]).sum(1)+1).reshape(out_shp)
Sample run -
In [233]: a = np.random.randint(0,10,(6,7))
In [234]: a
Out[234]:
array([[6, 0, 5, 7, 0, 8, 5],
[3, 0, 7, 1, 5, 4, 8],
[5, 0, 5, 1, 7, 2, 3],
[5, 1, 3, 3, 7, 4, 9],
[9, 0, 7, 4, 9, 1, 1],
[7, 0, 4, 1, 6, 3, 4]])
In [235]: sliding_uniq_count(a, [3,3])
Out[235]:
array([[5, 4, 4, 7, 7],
[5, 5, 4, 6, 7],
[6, 6, 6, 6, 6],
[7, 5, 6, 6, 6]])
Hybrid approach
To make it work with very large arrays, to accommodate everything into memory, we might have to keep one loop that would iterate along each row of the input data, like so -
def sliding_uniq_count_oneloop(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a,BSZ)
    out = np.empty(out_shp,dtype=int)
    for i in range(a_slid4D.shape[0]):
        a_slid2D_i = np.sort(a_slid4D[i].reshape(-1,S),-1)
        out[i] = (a_slid2D_i[:,1:] != a_slid2D_i[:,:-1]).sum(-1)+1
    return out
Hybrid approach - Version II
Another version of hybrid one, with the explicit usage of np.lib.stride_tricks.as_strided -
def sliding_uniq_count_oneloop(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    strd = np.lib.stride_tricks.as_strided
    m,n = a.strides
    N = out_shp[1]
    out = np.empty(out_shp,dtype=int)
    for i in range(out_shp[0]):
        a_slid3D = strd(a[i], shape=((N,) + tuple(BSZ)), strides=(n,m,n))
        a_slid2D_i = np.sort(a_slid3D.reshape(-1,S),-1)
        out[i] = (a_slid2D_i[:,1:] != a_slid2D_i[:,:-1]).sum(-1)+1
    return out
np.mean operates on a given axis without making any copies. Looking at just the shape of the as_strided array it looks much bigger than the original array. But because each 'window' is a view, it doesn't take up any additional space. Reduction operators like mean work fine with that kind of view.
But note that your second example warns about reshape. That creates a copy; it replicates the values in all of those windows.
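A quick way to see the difference between the view and the reshaped copy is np.shares_memory. The sketch below assumes NumPy 1.20 or later for sliding_window_view (as_strided would behave the same way); the 6x6 test image is made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.arange(36.0).reshape(6, 6)

# 4D window view: looks much bigger than image, but owns no data of its own
windows = sliding_window_view(image, (3, 3))
print(windows.shape)                      # (4, 4, 3, 3)
print(np.shares_memory(image, windows))   # True

# Reductions work directly on the view
means = windows.mean(axis=(2, 3))
print(means[0, 0])                        # 7.0

# Merging the window axes cannot be expressed with strides, so reshape copies
flat = windows.reshape(-1, 9)
print(np.shares_memory(image, flat))      # False
print(flat.nbytes // image.nbytes)        # 4
```

For this small example the copy is 4 times the size of the image; the factor grows with the window area.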
unique starts with
ar = np.asanyarray(ar).flatten()
so right off the bat it is making a reshaped copy. It's a copy, and 1d. Then it sorts the elements, looks for duplicates, etc.
There are ways of finding unique rows, but they require converting rows into large structured array elements. In effect turning a 2d array into a 1d that unique can work with.
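For reference, a minimal sketch of finding unique rows. It assumes NumPy 1.13 or later, where np.unique accepts an axis argument and takes care of treating each row as a single element; the example data is made up.

```python
import numpy as np

rows = np.array([[1, 2],
                 [3, 4],
                 [1, 2],
                 [0, 9]])

# Each row is treated as one element; the result is sorted lexicographically
uniq_rows, counts = np.unique(rows, axis=0, return_counts=True)
print(uniq_rows)
# [[0 9]
#  [1 2]
#  [3 4]]
print(counts)  # [1 2 1]
```

This still sorts and copies the data, so it does not remove the cost described above; it only hides the conversion.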
I have a large array of thousands of vals in numpy. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n number of elements in array, and I want to tell it to average down to X number of values, and it averages like above.
Is there some way to do that with numpy (already using it for other things, so I'd like to stick with it).
Using reshape and mean, you can average every m adjacent values of a 1D array of size N*m, with N being any positive integer. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1) a.reshape(-1, m) will create a 2D view of the array without copying data:
array([[ 2, 3, 4],
[ 8, 9, 10]])
2)taking the mean in the second axis (axis=1) will then calculate the mean value of each row, resulting in:
array([3., 9.])
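When the length of the array is not a multiple of m, the reshape raises an error. Two possible ways around this are sketched below: dropping the leftover elements, or splitting into a fixed number of nearly equal groups with np.array_split. The 7-element array and the names `trimmed_means` and `split_means` are assumptions for illustration.

```python
import numpy as np

a = np.array([2, 3, 4, 8, 9, 10, 20])  # 7 values, not a multiple of 3
m = 3

# Option 1: drop the trailing elements that do not fill a whole group
n_full = (a.size // m) * m
trimmed_means = a[:n_full].reshape(-1, m).mean(axis=1)
print(trimmed_means)  # [3. 9.]

# Option 2: ask for X output values and let array_split pick the group sizes;
# here the groups are [2, 3, 4, 8] and [9, 10, 20]
X = 2
split_means = np.array([chunk.mean() for chunk in np.array_split(a, X)])
print(split_means)  # [ 4.25 13.  ]
```

Option 2 keeps every element but uses a short Python loop over the X groups, which is usually acceptable when X is small compared to the array length.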
Try this:
n_averaged_elements = 3
averaged_array = []
a = np.array([ 2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
slice_from_index = i
slice_to_index = slice_from_index + n_averaged_elements
averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>> averaged_array
[3.0, 9.0]
Looks like a simple non-overlapping moving window average to me, how about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
a[:len(a)//window_sz*window_sz].reshape(-1,window_sz).mean(1)
#you want to be sure your array can be reshaped properly, so the [:len(a)//window_sz*window_sz] part
Out[3]:
array([ 3., 9.])
In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method that I give below, we first find the factors of the length of this array a. Then we choose an appropriate factor as the step size to average the array with.
Here is the code.
import numpy as np
from functools import reduce
''' Function to find factors of a given number 'n' '''
def factors(n):
    return sorted(set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))
a = [2,3,4,8,9,10] #Given array.
'''fac: list of factors of length of a.
In this example, len(a) = 6. So, fac = [1, 2, 3, 6] '''
fac = factors(len(a))
'''step: choose an appropriate step size from the list 'fac'.
In this example, we choose one of the middle numbers in fac
(3). '''
step = fac[int( len(fac)/3 )+1]
'''avg: initialize an empty array. '''
avg = np.array([])
for i in range(0, len(a), step):
    avg = np.append( avg, np.mean(a[i:i+step]) ) #append averaged values to `avg`
print(avg) #Prints the final result
[3. 9.]