How can I get an element-wise count of each element's number of occurrences in a numpy array, along a given axis? By "element-wise," I mean each value of the array should be converted to the number of times it appears.
Simple 2D input:
[[1, 1, 1],
[2, 2, 2],
[3, 4, 5]]
Should output:
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]
The solution also needs to work relative to a given axis. For example, if my input array a has shape (4, 2, 3, 3), which I think of as "a 4x2 matrix of 3x3 matrices," running solution(a) should spit out a (4, 2, 3, 3) solution of the form above, where each 3x3 "submatrix" contains counts of the corresponding elements relative to that submatrix alone, rather than the entire numpy array at large.
More complex example: suppose I take the example input above a and call skimage.util.shape.view_as_windows(a, (2, 2)). This gives me array b of shape (2, 2, 2, 2):
[[[[1 1]
[2 2]]
[[1 1]
[2 2]]]
[[[2 2]
[3 4]]
[[2 2]
[4 5]]]]
Then solution(b) should output:
[[[[2 2]
[2 2]]
[[2 2]
[2 2]]]
[[[2 2]
[1 1]]
[[2 2]
[1 1]]]]
So even though the value 1 occurs 3 times in a and 4 times in b, it only occurs twice in each 2x2 window.
Starting off approach
We can use np.unique to get the counts of occurrences and also tag each element from 0 onwards, letting us index into those counts with the tags for the desired output, like so -
In [43]: a
Out[43]:
array([[1, 1, 1],
[2, 2, 2],
[3, 4, 5]])
In [44]: _,ids,c = np.unique(a, return_counts=True, return_inverse=True)
In [45]: c[ids].reshape(a.shape)
Out[45]:
array([[3, 3, 3],
[3, 3, 3],
[1, 1, 1]])
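As a minimal sketch of why the np.unique route is the more general one, the same three steps also work for values that np.bincount cannot handle, such as floats. The sample array below is made up for illustration and is not from the question.

```python
import numpy as np

# Hypothetical float input; np.bincount would not accept this directly.
a = np.array([[0.5, 0.5, -1.0],
              [2.0, -1.0, 0.5]])

# ids tags every element with the position of its value in the sorted
# unique values; c holds the count of each unique value.
_, ids, c = np.unique(a, return_inverse=True, return_counts=True)

# Look up each element's count and restore the input shape.
out = c[ids].reshape(a.shape)
print(out)
```

Here -1.0 appears twice, 0.5 three times and 2.0 once, so every element is replaced by one of those counts.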
For non-negative integers in the input array, we can also use np.bincount -
In [73]: c = np.bincount(a.ravel())
In [74]: c[a]
Out[74]:
array([[3, 3, 3],
[3, 3, 3],
[1, 1, 1]])
For arrays containing negative integers, simply offset the array by its minimum before counting.
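The offset idea could look roughly like this. It is a sketch on a made-up array containing negative values, not code from the original answer.

```python
import numpy as np

# Hypothetical input with negative integers.
a = np.array([[-2, -2, 0],
              [1, 1, 1],
              [0, 3, -2]])

# Shift so that the smallest value maps to bin 0.
shifted = a - a.min()

# Count occurrences of every shifted value, then index the counts
# with the shifted array to get element-wise counts.
c = np.bincount(shifted.ravel())
out = c[shifted]
print(out)
```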
Extending to generic n-dims
Let's use bincount for this -
In [107]: ar
Out[107]:
array([[[1, 1, 1],
[2, 2, 2],
[3, 4, 5]],
[[2, 3, 5],
[4, 3, 4],
[3, 1, 2]]])
In [104]: ar2D = ar.reshape(-1,ar.shape[-2]*ar.shape[-1])
# bincount2D_vectorized from https://stackoverflow.com/a/46256361/ #Divakar
In [105]: c = bincount2D_vectorized(ar2D)
In [106]: c[np.arange(ar2D.shape[0])[:,None], ar2D].reshape(ar.shape)
Out[106]:
array([[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]],
[[2, 3, 1],
[2, 3, 2],
[3, 1, 2]]])
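Since bincount2D_vectorized lives in the linked answer, here is a self-contained sketch of the same idea. The function name count_in_submatrices is hypothetical, and it assumes non-negative integers with the last two axes forming each submatrix. Each flattened submatrix is shifted into its own range of bin ids so that a single np.bincount call counts all of them separately.

```python
import numpy as np

def count_in_submatrices(ar):
    """Replace each element by its count within its own last-two-axes block.

    Hypothetical helper; assumes a non-empty array of non-negative integers.
    """
    h, w = ar.shape[-2:]
    flat = ar.reshape(-1, h * w)              # one row per submatrix
    n_bins = int(flat.max()) + 1              # bins needed per submatrix
    # Give every row a disjoint range of bin ids.
    shifted = flat + n_bins * np.arange(flat.shape[0])[:, None]
    counts = np.bincount(shifted.ravel(), minlength=n_bins * flat.shape[0])
    return counts[shifted].reshape(ar.shape)

ar = np.array([[[1, 1, 1],
                [2, 2, 2],
                [3, 4, 5]],
               [[2, 3, 5],
                [4, 3, 4],
                [3, 1, 2]]])
res = count_in_submatrices(ar)
print(res)
```

The memory used by counts grows with the number of submatrices times the largest value, so this sketch suits small integer ranges best.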
Related
I have the following NumPy matrix:
m = np.array([[1, 2, 3, 4],
[10, 5, 3, 4],
[12, 8, 1, 2],
[7, 0, 2, 4]])
Now, I need the indices of the N (say, N=2) lowest values of each row in this matrix. So with the example above, I expect the following output:
[[0, 1],
[2, 3],
[2, 3],
[1, 2]]
where the rows of the output matrix correspond to the respective rows of the original, and the elements of the rows of the output matrix are the indices of the N lowest values in the corresponding original rows (preferably in ascending order by values in the original matrix). How could I do it in NumPy?
You could either use a simple loop (not recommended) or use np.argpartition, which returns the indices of the N lowest values per row but does not guarantee their order:
In [13]: np.argpartition(m, 2)[:, :2]
Out[13]:
array([[0, 1],
[2, 3],
[2, 3],
[1, 2]])
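Because np.argpartition only guarantees which indices end up in the first N columns, not their order, one possible way to get them in ascending order by value is to sort just those N candidates afterwards. This is a sketch, not part of the original answer. It uses kth = n - 1 so that it also works when n equals the number of columns.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [10, 5, 3, 4],
              [12, 8, 1, 2],
              [7, 0, 2, 4]])
n = 2

# Indices of the n smallest values per row, in unspecified order.
part = np.argpartition(m, n - 1, axis=1)[:, :n]

# Sort only those n candidates by their values.
vals = np.take_along_axis(m, part, axis=1)
order = np.argsort(vals, axis=1)
result = np.take_along_axis(part, order, axis=1)
print(result)
```

For wide rows and small n this avoids fully sorting every row, which is the usual reason to prefer np.argpartition over np.argsort.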
You could use np.argsort on your array and then slice out the first N columns (for the lowest values) or the last N columns (for the highest values).
np.argsort(m, axis=1)[:, :2]
array([[0, 1],
[2, 3],
[2, 3],
[1, 2]], dtype=int64)
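For the "highest" case mentioned above, a sketch could take the last N columns of the argsort result and reverse them, so the largest value comes first. The variable names here are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [10, 5, 3, 4],
              [12, 8, 1, 2],
              [7, 0, 2, 4]])
n = 2

order = np.argsort(m, axis=1)

lowest = order[:, :n]            # indices of the n smallest, ascending by value
highest = order[:, -n:][:, ::-1] # indices of the n largest, descending by value
print(lowest)
print(highest)
```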
Try this:
import numpy as np
m = np.array([[1, 2, 3, 4],
[10, 5, 3, 4],
[12, 8, 1, 2],
[7, 0, 2, 4]])
# Print the indices of the 2 lowest values of each row
for arr in m:
    print(arr.argsort()[:2])
I have a 2D numpy array
import numpy as np
np.array([[1, 2], [3, 4]])
[[1 2]
[3 4]]
I would like the above array to be enlarged to the following:
np.array([[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]])
[[1 1 2 2]
[1 1 2 2]
[3 3 4 4]
[3 3 4 4]]
Each element in the original array becomes a 2x2 array in the new array.
How do I go from the first array to the second 'enlarged' array?
Edit
This is different from the question about scaling here How to "scale" a numpy array? because np.kron(a, np.ones((2,2))) is not the same as a.repeat(2, axis=1).repeat(2, axis=0)
Edit #2: still waiting to get the [duplicate] flag removed.
As posted by Michael Szczesny, the NumPy API has seen a slight change. This question remains relevant, as updated answers are being provided.
You can use np.repeat twice, with axis=1 and axis=0:
out = a.repeat(2, axis=1).repeat(2, axis=0)
Output:
>>> out
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
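The same pattern generalizes to any integer factor. The helper name enlarge below is hypothetical. The sketch also compares against np.kron: on this input the values match, but with the default float np.ones the result is a float array, so an integer np.ones is needed to keep the dtype.

```python
import numpy as np

def enlarge(a, k):
    """Turn every element of a 2D array into a k x k block (hypothetical helper)."""
    return a.repeat(k, axis=1).repeat(k, axis=0)

a = np.array([[1, 2], [3, 4]])
out = enlarge(a, 2)

# np.kron with a block of ones gives the same values on this input.
kron_float = np.kron(a, np.ones((2, 2)))                 # float result
kron_int = np.kron(a, np.ones((2, 2), dtype=a.dtype))    # keeps a's dtype

print(out)
print(kron_float.dtype, kron_int.dtype)
```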
How can I add a column containing only "1" to the beginning of a 2D numpy array?
X = np.array([[1, 2], [3, 4], [5, 6]])
I want to have X become
[[1,1,2], [1,3,4],[1,5,6]]
You can use np.insert:
new_x = np.insert(x, 0, 1, axis=1)
Alternatively, you can use np.append to attach your array to the right of a column of ones:
x = np.array([[1, 2], [3, 4], [5, 6]])
ones = np.array([[1]] * len(x))
new_x = np.append(ones, x, axis=1)
Both will give you the expected result
[[1 1 2]
[1 3 4]
[1 5 6]]
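Another option along the same lines, not mentioned in the answers here, is np.hstack with a column built by np.ones. The sketch below also checks that it agrees with the np.insert result.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

# Column of ones with one entry per row, matching x's dtype.
ones = np.ones((x.shape[0], 1), dtype=x.dtype)

# Stack the ones column to the left of x.
new_x = np.hstack((ones, x))
print(new_x)

# Same result as the np.insert approach.
same = np.array_equal(new_x, np.insert(x, 0, 1, axis=1))
print(same)
```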
Try this:
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> X
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.insert(X, 0, 1, axis=1)
array([[1, 1, 2],
[1, 3, 4],
[1, 5, 6]])
Since a new array is going to be created in any event, it is sometimes easier to build it from the start. Since you want a column of 1's at the beginning, you can use built-in functions together with the input array's existing shape and dtype.
a = np.arange(6).reshape(3,2) # input array
z = np.ones((a.shape[0], 3), dtype=a.dtype) # use the row shape and your desired columns
z[:, 1:] = a # place the old array into the new array
z
array([[1, 0, 1],
[1, 2, 3],
[1, 4, 5]])
numpy.insert() will do the trick.
X = np.array([[1, 2], [3, 4], [5, 6]])
np.insert(X,0,[1,2,3],axis=1)
The Output will be:
array([[1, 1, 2],
[2, 3, 4],
[3, 5, 6]])
Note that the second argument is the index before which you want to insert, and axis=1 indicates that you want to insert a column rather than insert into the flattened array.
For reference:
numpy.insert()
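To make the roles of the index and axis arguments concrete, here is a small sketch with a few variations on the same X. The variable names are illustrative.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])

# Insert a column of zeros before column index 1 (a scalar is broadcast).
middle = np.insert(X, 1, 0, axis=1)

# Insert a row of ones before row index 0.
row = np.insert(X, 0, 1, axis=0)

# Without axis, X is flattened first and the value goes in at position 0.
flat = np.insert(X, 0, 1)

print(middle)
print(row)
print(flat)
```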
a = np.zeros((5,4,3))
v = np.ones((5, 4), dtype=int)
data = a[v]
shp = data.shape
This code gives shp == (5, 4, 4, 3).
I don't understand why. How can the output be a larger array than the input? It makes no sense to me and I would love an explanation.
This is known as advanced indexing. Advanced indexing allows you to select arbitrary elements in the input array based on an N-dimensional index.
Let's use another example to make it clearer:
a = np.random.randint(1, 5, (5,4,3))
v = np.ones((5, 4), dtype=int)
Say in this case a is:
array([[[2, 1, 1],
[3, 4, 4],
[4, 3, 2],
[2, 2, 2]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[3, 1, 3],
[4, 3, 1],
[2, 1, 4],
[1, 2, 2]],
...
By indexing with an array of np.ones:
print(v)
array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
You will simply be indexing a with 1 along the first axis, once for every element of v. Putting it another way, when you do:
a[1]
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]]
You're indexing along the first axis, as no indexing is specified along the remaining axes. It is the same as doing a[1, ...], i.e. taking a full slice along the remaining axes. Hence, by indexing with a 2D array of ones of shape (5, 4), you get the above 2D array a[1], of shape (4, 3), repeated once for every element of v, i.e. 5*4=20 times, resulting in an ndarray of shape (5, 4, 4, 3).
Hence, in this case you'd be getting:
array([[[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
...
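The shape reasoning above can be checked with a short sketch: when a single integer array indexes the first axis, the result shape is the index array's shape followed by the remaining axes of a. The np.arange data is used here only so the values are reproducible.

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape(5, 4, 3)
v = np.ones((5, 4), dtype=int)

data = a[v]

# Result shape: shape of the index array, then the un-indexed axes of a.
expected_shape = v.shape + a.shape[1:]
print(data.shape, expected_shape)

# Every (i, j) entry of the result is a copy of a[1].
all_rows_are_a1 = bool((data == a[1]).all())
print(all_rows_are_a1)
```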
The value of v is:
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
Every single 1 in v selects a[1], the second "row" of a along the first axis, and every such "row" is itself a 4x3 matrix. So every row of v selects a row of matrices from a.
(Does this make any sense to you?)
So you get 5 * 4 copies of a[1], each of which is a 4x3 matrix.
If, instead of zeros, you define a as a = np.arange(5*4*3).reshape((5, 4, 3)),
it might be easier to understand, because you get to see which parts of a are being chosen:
import numpy as np
a = np.arange(5*4*3).reshape((5, 4, 3))
v = np.ones((5,4), dtype=int)
data = a[v]
print(data)
(output is pretty long, I don't want to paste it here)
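As a shorter illustration of "which parts of a are being chosen", the sketch below uses a small index array with different values instead of all ones. The index array v2 is made up for this example.

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape((5, 4, 3))

# Hypothetical index array with mixed values along the first axis of a.
v2 = np.array([[0, 4],
               [2, 2]])

data2 = a[v2]
print(data2.shape)   # v2.shape followed by a.shape[1:]

# data2[i, j] is exactly a[v2[i, j]].
print(np.array_equal(data2[0, 1], a[4]))
print(np.array_equal(data2[1, 0], a[2]))
print(data2[0, 1][0])   # first row of a[4]
```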
I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4]
[1, 8]
[2, 3]
[4, 6]
[4, 2]
[5, 1]
[5, 2]
[5, 6]]
I would like the result to be:
arr = [[1, 8]
[2, 3]
[4, 6]
[5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
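import numpy as np
def grouby_maxY(a):
    # Sort rows by x so equal x-values are contiguous (skip if already sorted)
    b = a[a[:,0].argsort()]
    # Start index of each run of equal x-values
    grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
    # Maximum y within each run
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]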
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=True)[1].
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: grouby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
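The np.unique variant mentioned above could be sketched as follows, run on the example array from the question. The function name groupby_max_y is hypothetical. It relies on np.unique returning the index of the first occurrence of each value, which in a sorted column is the start of each group.

```python
import numpy as np

def groupby_max_y(a):
    """Keep, for each distinct x, the row's highest y (hypothetical helper)."""
    # Sort rows by x so equal x-values are contiguous.
    b = a[a[:, 0].argsort()]
    # Unique x-values and the start index of each group in the sorted array.
    xs, grp_idx = np.unique(b[:, 0], return_index=True)
    # Maximum y within each group.
    grp_max = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.column_stack((xs, grp_max))

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])
result = groupby_max_y(arr)
print(result)
```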