Find equal value indices in numpy array [duplicate] - python

I have a 2D Numpy array containing values from 0 to n.
I want to get a list of length n, such that the i'th element of that list is an array of all the indices with value i+1 (0 is excluded).
For example, for the input
array([[1, 0, 1],
       [2, 2, 0]])
I'm expecting to get
[array([[0, 0], [0, 2]]), array([[1,0], [1,1]])]
I found this related question:
Get a list of all indices of repeated elements in a numpy array
which may be helpful, but I hoped to find a more direct solution that doesn't require flattening and sorting the array and that is as efficient as possible.

Here's a vectorized approach that works for arrays with an arbitrary number of dimensions. The idea is to extend the functionality of the return_index parameter of np.unique and return a list of arrays, each containing the N-dimensional indices of one unique value in a NumPy array.
To keep the solution compact, I've wrapped it in the following function, with comments explaining the different steps:
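import numpy as np

def ndix_unique(x):
    """
    Returns an N-dimensional array of indices
    of the unique values in x

    Parameters
    ----------
    x: np.array
       Array with arbitrary dimensions

    Returns
    -------
    - 1D-array of sorted unique values
    - List of arrays. Each array contains the indices where a
      given value in x is found
    """
    # Flatten the array and get the order that sorts its values
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat)
    # Unique values and the position where each one first appears in the sorted data
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    # Convert the flat indices back to N-dimensional ones, one row per element
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    # Split the indices at the start of each new value
    return u, np.split(ix_ndim, ix_u[1:])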
Checking with the array from the question:
a = np.array([[1, 0, 1],[2, 2, 0]])
vals, ixs = ndix_unique(a)
print(vals)
array([0, 1, 2])
print(ixs)
[array([[0, 1],
        [1, 2]]),
 array([[0, 0],
        [0, 2]]),
 array([[1, 0],
        [1, 1]])]
Let's try with this other case:
a = np.array([[1,1,4],[2,2,1],[3,3,1]])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 2, 3, 4])
print(ixs)
[array([[0, 0],
        [0, 1],
        [1, 2],
        [2, 2]]),
 array([[1, 0],
        [1, 1]]),
 array([[2, 0],
        [2, 1]]),
 array([[0, 2]])]
For a 1D array:
a = np.array([1,5,4,3,3])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 3, 4, 5])
print(ixs)
[array([0]), array([3, 4]), array([2]), array([1])]
Finally another example with a 3D ndarray:
a = np.array([[[1,1,2]],[[2,3,4]]])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 2, 3, 4])
print(ixs)
[array([[0, 0, 0],
        [0, 0, 1]]),
 array([[0, 0, 2],
        [1, 0, 0]]),
 array([[1, 0, 1]]),
 array([[1, 0, 2]])]
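To get exactly the output asked for in the question (groups for the non-zero values only), the same sort-and-split idea can be applied and the group for 0 dropped afterwards. The following is only a sketch: the function name nonzero_index_groups and the choice of a stable sort (so that indices inside each group stay in row-major order) are my own assumptions, not part of the answer above. Values that never occur in the array produce no group.

```python
import numpy as np

def nonzero_index_groups(x):
    """Hypothetical helper: N-d indices grouped by value, skipping the value 0."""
    x = np.asarray(x)
    flat = x.ravel()
    # A stable sort keeps equal values in their original (row-major) order.
    order = np.argsort(flat, kind="stable")
    values, starts = np.unique(flat[order], return_index=True)
    # One row of N-d coordinates per element, in sorted-value order.
    coords = np.stack(np.unravel_index(order, x.shape), axis=-1)
    groups = np.split(coords, starts[1:])
    # Drop the group that belongs to the value 0, if there is one.
    return [g for v, g in zip(values, groups) if v != 0]

a = np.array([[1, 0, 1], [2, 2, 0]])
groups = nonzero_index_groups(a)
```

Here groups[0] holds the positions of the value 1 and groups[1] those of the value 2.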

You can first get the non-zero elements of your array and then use argwhere in a list comprehension to get a separate array for each non-zero value. Here np.unique(arr[arr!=0]) gives the distinct non-zero values, over which you can iterate to get the indices.
arr = np.array([[1, 0, 1],
                [2, 2, 0]])
indices = [np.argwhere(arr==i) for i in np.unique(arr[arr!=0])]
# [array([[0, 0],
#         [0, 2]]), array([[1, 0],
#         [1, 1]])]
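If looking indices up by value is more convenient than by list position, the same comprehension can build a dictionary instead. This is a minimal sketch; the name index_by_value is made up for illustration. Keep in mind that this approach scans the whole array once per distinct value.

```python
import numpy as np

arr = np.array([[1, 0, 1],
                [2, 2, 0]])

# Map each distinct non-zero value to the (row, col) positions where it occurs.
index_by_value = {int(v): np.argwhere(arr == v) for v in np.unique(arr[arr != 0])}
```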

Related

Sort array based on value and create new array

Imagine a two dimensional array:
a = np.array([[1,1],[1, 0],[0, 0],[0, 0],[0, 0],[1, 1],[1, 1],[0, 1]])
I want to sort the array based on its first value like:
[[1,1],[1, 1],[1, 1],[1, 0],[0, 1],[0, 0],[0, 0],[0, 0]]
If I simply use .sort() like this:
a[::-1].sort(axis=0)
the resulting array looks like:
array([[1, 1],
[1, 1],
[1, **1**],
[**1**, 1],
[0, 0],
[0, 0],
[0, 0],
[0, 0]])
As you can see the bold 1 used to be a zero. Why is the function flipping around my numbers? I searched the internet and haven't found any answers.
The problem is that NumPy's sort with axis=0 sorts each column independently (see the examples on the doc page), so values from different rows get mixed. If you want to sort whole rows, you can use Python's sorted instead:
np.array(sorted(a, key=lambda x: x.tolist(), reverse=True))
In your case the result is
[[1 1]
[1 1]
[1 1]
[1 0]
[0 1]
[0 0]
[0 0]
[0 0]]
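The difference between the two behaviours can be checked directly on the array from the question. The sketch below is only an illustration (the variable names are mine): it compares the set of rows before and after each kind of sort.

```python
import numpy as np

a = np.array([[1, 1], [1, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [0, 1]])

# Column-wise sort: every column is ordered on its own, then flipped to descending.
col_sorted = np.sort(a, axis=0)[::-1]

# Row-wise sort: whole rows are compared as lists, largest first.
row_sorted = np.array(sorted(a.tolist(), reverse=True))

original_rows = sorted(map(tuple, a.tolist()))
rows_kept_by_row_sort = sorted(map(tuple, row_sorted.tolist())) == original_rows
rows_kept_by_col_sort = sorted(map(tuple, col_sorted.tolist())) == original_rows
```

rows_kept_by_row_sort is True, while rows_kept_by_col_sort is False: the column-wise sort produces rows that never existed in the input.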
The sort you are doing sorts every column of the array independently of the others.
From the first example on this page: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
>>> a = np.array([[1,4],[3,1]])
>>> np.sort(a) # sort along the last axis
array([[1, 4],
[1, 3]])
>>> np.sort(a, axis=None) # sort the flattened array
array([1, 1, 3, 4])
>>> np.sort(a, axis=0) # sort along the first axis
array([[1, 1],
[3, 4]])
Also see this answer on sorting the rows by a single column: Sorting arrays in NumPy by column
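Sorting rows by a single column usually comes down to indexing the array with the result of argsort on that column. A minimal sketch on the question's data, assuming descending order on the first column is what is wanted; note that it leaves the second column in input order within ties, so it does not reproduce the fully sorted output from the question.

```python
import numpy as np

a = np.array([[1, 1], [1, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [0, 1]])

# Indices that order the rows by the first column, largest first.
# Negating the key gives descending order; kind="stable" keeps ties in input order.
order = np.argsort(-a[:, 0], kind="stable")
by_first_col = a[order]
```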
You can use np.lexsort, passing the two columns as separate keys, and then reverse the order. lexsort returns the indices that sort the data by the given keys. The first column has to be passed last, because the primary key in lexsort is the last key in the sequence:
>>> a[np.lexsort((a[:,1], a[:,0]))][::-1]
array([[1, 1],
[1, 1],
[1, 1],
[1, 0],
[0, 1],
[0, 0],
[0, 0],
[0, 0]])
Here is the output of np.lexsort on your data:
#               key1    key2 (primary)
>>> np.lexsort((a[:,1], a[:,0]))
array([2, 3, 4, 7, 1, 0, 5, 6], dtype=int64)

Merge three numpy arrays, keep largest value

I want to merge three numpy arrays, for example:
a = np.array([[0,0,1],[0,1,0],[1,0,0]])
b = np.array([[1,0,0],[0,1,0],[0,0,1]])
c = np.array([[0,1,0],[0,2,0],[0,1,0]])
a = array([[0, 0, 1],
[0, 1, 0],
[1, 0, 0]])
b = array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
c = array([[0, 1, 0],
[0, 2, 0],
[0, 1, 0]])
Desired result would be to overlay them but keep the largest value where multiple elements are not 0, like in the middle.
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
I solved this by iterating over all elements with multiple if-conditions. Is there a more compact and more beautiful way to do this?
You can stack the arrays along an extra dimension with NumPy's np.dstack
and then take the maximum along that added dimension:
# Stacking arrays together
d = np.dstack([a,b,c])
d.max(axis=2)
Out:
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
NumPy's ufunc.reduce applies a binary function cumulatively along a given axis (axis 0 by default). Passing the list [a, b, c] treats the arrays as stacked along a new first axis, so reducing with np.maximum keeps the elementwise maximum:
np.maximum.reduce([a,b,c])
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
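Both answers do the same thing: put the arrays on a new axis and take the maximum across it. A small sketch that makes the equivalence explicit (the variable names are just for illustration):

```python
import numpy as np

a = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
c = np.array([[0, 1, 0], [0, 2, 0], [0, 1, 0]])

stacked = np.stack([a, b, c])                   # shape (3, 3, 3): one layer per input array
via_max = stacked.max(axis=0)                   # maximum across the layers
via_reduce = np.maximum.reduce([a, b, c])       # same reduction, written with the ufunc
via_dstack = np.dstack([a, b, c]).max(axis=2)   # layers on the last axis instead
```

All three results are identical; np.maximum.reduce simply avoids naming the intermediate stacked array.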

Numpy: swap values of 2D array based on a separate vector

Let's say I have a 4x3 numpy array (4 rows, 3 columns), like so:
[[0, 1, 2],
[2, 0, 1],
[0, 2, 1],
[1, 2, 0]]
And let's say that I have an additional vector:
[2,
1,
2,
1]
For each row, I want to find the index of the value found in my additional vector, and swap it with the first column in my numpy array.
For example, the first entry in my vector is 2, and in the first row of my numpy array, 2 is in the 3rd column, so I want to swap the first and third columns for that row, and continue this for each additional row.
[[2, 1, 0], # the number in the 0th position (0) and 2 have swapped placement
 [1, 0, 2], # the number in the 0th position (2) and 1 have swapped placement
 [2, 0, 1], # the number in the 0th position (0) and 2 have swapped placement
 [1, 2, 0]] # the number in the 0th position (1) and 1 have swapped placement
What's the best way to accomplish this?
Setup
arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
First, you need to find the index of your values, which we can accomplish using broadcasting and argmax (This will find the first index, not necessarily the only index):
idx = (arr == vals[:, None]).argmax(1)
# array([2, 2, 1, 0], dtype=int64)
Now swap the values using integer-array indexing and tuple assignment:
r = np.arange(len(arr))
arr[r, idx], arr[:, 0] = arr[:, 0], arr[r, idx]
Output:
array([[2, 1, 0],
[1, 0, 2],
[2, 0, 1],
[1, 2, 0]])
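The one-line swap relies on the order in which the right-hand side is evaluated and assigned. A more explicit variant, written as a sketch, copies the first column before overwriting it and leaves the input untouched; the helper name swap_to_front is hypothetical.

```python
import numpy as np

def swap_to_front(arr, vals):
    """Hypothetical helper: per row, swap column 0 with the first column equal to vals[row]."""
    out = arr.copy()                      # leave the caller's array untouched
    rows = np.arange(len(out))
    # argmax on a boolean row returns the first True position
    # (and 0 if the value does not occur in that row, which makes the swap a no-op).
    idx = (out == vals[:, None]).argmax(axis=1)
    first = out[:, 0].copy()              # copy, because out[:, 0] is a view
    out[:, 0] = out[rows, idx]
    out[rows, idx] = first
    return out

arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
swapped = swap_to_front(arr, vals)
```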

Get array of indices for array

If I have a multidimensional array like this:
a = np.array([[9,9,9],[9,0,9],[9,9,9]])
I'd like to get an array of each index in that array, like so:
i = np.array([[0,0],[0,1],[0,2],[1,0],[1,1],...])
One way of doing this that I've found is like this, using np.indices:
i = np.transpose(np.indices(a.shape)).reshape(a.shape[0] * a.shape[1], 2)
But that seems somewhat clumsy, especially given the presence of np.nonzero which almost does what I want.
Is there a built-in numpy function that will produce an array of the indices of every item in a 2D numpy array?
Here is one more concise way (if the order is not important):
In [56]: np.indices(a.shape).T.reshape(a.size, 2)
Out[56]:
array([[0, 0],
[1, 0],
[2, 0],
[0, 1],
[1, 1],
[2, 1],
[0, 2],
[1, 2],
[2, 2]])
If you want it in your intended order you can use dstack:
In [46]: np.dstack(np.indices(a.shape)).reshape(a.size, 2)
Out[46]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 2],
[2, 0],
[2, 1],
[2, 2]])
For the first approach, if you don't want to use reshape, another option is to concatenate along the first axis with np.concatenate():
np.concatenate(np.indices(a.shape).T)
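The reshape calls above hard-code 2 as the number of dimensions. A possible generalization to any number of dimensions is sketched below; the helper name all_indices is made up, and it assumes row-major order is the one wanted.

```python
import numpy as np

def all_indices(shape):
    """Hypothetical helper: every index of an array with the given shape, in row-major order."""
    # np.indices returns one coordinate grid per axis; move that axis to the end
    # and flatten so that each row is one complete index.
    return np.moveaxis(np.indices(shape), 0, -1).reshape(-1, len(shape))

a = np.array([[9, 9, 9], [9, 0, 9], [9, 9, 9]])
idx = all_indices(a.shape)
```

For small arrays, np.array(list(np.ndindex(a.shape))) gives the same rows with a plain Python loop under the hood.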
