Numpy: swap values of 2D array based on a separate vector - python

Let's say I have a 4x3 numpy array (4 rows, 3 columns), like so:
[[0, 1, 2],
[2, 0, 1],
[0, 2, 1],
[1, 2, 0]]
And let's say that I have an additional vector:
[2,
1,
2,
1]
For each row, I want to find the index of the value found in my additional vector, and swap it with the first column in my numpy array.
For example, the first entry in my vector is 2, and in the first row of my numpy array, 2 is in the 3rd column, so I want to swap the first and third columns for that row, and continue this for each additional row.
[[2, 1, 0], # the number in the 0th position (0) and 2 have swapped placement
[1, 0, 2], # the number in the 0th position (2) and 1 have swapped placement
[2, 0, 1], # the number in the 0th position (0) and 2 have swapped placement
[1, 2, 0]] # the number in the 0th position (1) and 1 have swapped placement
What's the best way to accomplish this?

Setup
import numpy as np
arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
First, find the column index of each row's target value. Broadcasting plus argmax does this (argmax returns the first match, so if a value occurs more than once in a row, only its first occurrence is used):
idx = (arr == vals[:, None]).argmax(1)
# array([2, 2, 1, 0], dtype=int64)
Now using basic indexing and assignment:
r = np.arange(len(arr))
arr[r, idx], arr[:, 0] = arr[:, 0], arr[r, idx]
Output:
array([[2, 1, 0],
[1, 0, 2],
[2, 0, 1],
[1, 2, 0]])
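One caveat with the argmax step: if a value from vals does not occur in its row at all, the boolean row is all False and argmax returns 0, so that row is silently left unchanged. Here is a minimal sketch that wraps the steps above in a helper and checks for that case. The name swap_to_front is hypothetical (it is not a NumPy function), and raising an error is just one possible way to handle a missing value.

```python
import numpy as np

def swap_to_front(arr, vals):
    """Hypothetical helper: for each row i, swap column 0 with the first
    column holding vals[i]. Returns a new array; the input is not modified."""
    arr = np.asarray(arr)
    vals = np.asarray(vals)
    mask = arr == vals[:, None]           # True where row i equals vals[i]
    if not mask.any(axis=1).all():
        raise ValueError("at least one value does not occur in its row")
    idx = mask.argmax(axis=1)             # first matching column per row
    r = np.arange(len(arr))
    out = arr.copy()
    out[r, idx] = arr[:, 0]               # old first column goes to the match position
    out[:, 0] = arr[r, idx]               # matched value goes to the front
    return out

arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
swapped = swap_to_front(arr, vals)
print(swapped)
```

Because it works on a copy, the original arr stays intact, which makes it easier to compare before and after.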

Related

Sort array based on value and create new array

Imagine a two dimensional array:
a = np.array([[1,1],[1, 0],[0, 0],[0, 0],[0, 0],[1, 1],[1, 1],[0, 1]])
I want to sort the array based on its first value like:
[[1,1],[1, 1],[1, 1],[1, 0],[0, 1],[0, 0],[0, 0],[0, 0]]
If I am simply going with a .sort() like:
a[::-1].sort(axis=0)
and the returned array looks like:
array([[1, 1],
[1, 1],
[1, **1**],
[**1**, 1],
[0, 0],
[0, 0],
[0, 0],
[0, 0]])
As you can see the bold 1 used to be a zero. Why is the function flipping around my numbers? I searched the internet and haven't found any answers.
The problem is that sort with axis=0 sorts each column independently of the others (see the examples on the doc page), so values from different rows get mixed. If you want to sort whole rows, you can use sorted instead:
np.array(sorted(a, key=lambda x: x.tolist(), reverse=True))
In your case the result is
[[1 1]
[1 1]
[1 1]
[1 0]
[0 1]
[0 0]
[0 0]
[0 0]]
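To see the difference concretely, here is a small sketch on the array from the question. It compares the set of rows before and after each approach; the variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 1], [1, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [0, 1]])

# Each column is sorted on its own, then the row order is reversed
col_sorted = np.sort(a, axis=0)[::-1]

# Whole rows are compared as lists, so every row stays intact
row_sorted = np.array(sorted(a.tolist(), reverse=True))

rows_before = sorted(map(tuple, a.tolist()))
rows_kept_by_col_sort = sorted(map(tuple, col_sorted.tolist())) == rows_before
rows_kept_by_row_sort = sorted(map(tuple, row_sorted.tolist())) == rows_before

print(rows_kept_by_col_sort)  # False: rows such as [1, 0] and [0, 1] are gone
print(rows_kept_by_row_sort)  # True: same rows, only reordered
```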
The sort you are doing sorts every column in the array independently of the others.
Compare the first example on this page: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
>>> a = np.array([[1,4],[3,1]])
>>> np.sort(a) # sort along the last axis
array([[1, 4],
[1, 3]])
>>> np.sort(a, axis=None) # sort the flattened array
array([1, 1, 3, 4])
>>> np.sort(a, axis=0) # sort along the first axis
array([[1, 1],
[3, 4]])
Also see this answer for sorting the rows based on a single column: Sorting arrays in NumPy by column
You can use np.lexsort, passing the two columns as separate keys, and then reverse the order. lexsort returns the indices that sort the data by the given keys. You need to pass the first column as the second key, because the primary key in lexsort is the last key in the sequence:
>>> a[np.lexsort((a[:,1], a[:,0]))][::-1]
array([[1, 1],
[1, 1],
[1, 1],
[1, 0],
[0, 1],
[0, 0],
[0, 0],
[0, 0]])
Here is the output of np.lexsort on your data:
# keys, left to right: a[:,1] is key1 (secondary), a[:,0] is key2 (primary)
>>> np.lexsort((a[:,1], a[:,0]))
array([2, 3, 4, 7, 1, 0, 5, 6], dtype=int64)
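As a self-contained sketch of the lexsort approach (same data as the question, variable names are mine), this shows the key order and confirms that the rows themselves are only reordered, not altered:

```python
import numpy as np

a = np.array([[1, 1], [1, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [0, 1]])

# The LAST key given to lexsort is the primary one, so column 0 goes last
order = np.lexsort((a[:, 1], a[:, 0]))

ascending = a[order]          # rows sorted by column 0, ties broken by column 1
descending = ascending[::-1]  # reverse the row order for largest-first

print(descending)
```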

Find equal value indices in numpy array [duplicate]

I have a 2D Numpy array containing values from 0 to n.
I want to get a list of length n, such that the i'th element of that list is an array of all the indices with value i+1 (0 is excluded).
For example, for the input
array([[1, 0, 1],
[2, 2, 0]])
I'm expecting to get
[array([[0, 0], [0, 2]]), array([[1,0], [1,1]])]
I found this related question:
Get a list of all indices of repeated elements in a numpy array
which may be helpful, but I hoped to find a more direct solution that doesn't require flattening and sorting the array and that is as efficient as possible.
Here's a vectorized approach, which works for arrays with an arbitrary number of dimensions. The idea is to extend what the return_index parameter of np.unique gives you, and return an array of arrays, each containing the N-dimensional indices of one unique value in a numpy array.
To keep it compact, I've wrapped it in the following function, with some explanation of the different steps:
def ndix_unique(x):
    """
    Returns an N-dimensional array of indices
    of the unique values in x

    Parameters
    ----------
    x: np.array
        Array with arbitrary dimensions

    Returns
    -------
    - 1D-array of sorted unique values
    - Array of arrays. Each array contains the indices where a
      given value in x is found
    """
    # Flatten, then get the order that sorts the flattened values
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat)
    # In the sorted values, ix_u marks where each unique value starts
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    # Turn the flat positions back into N-dimensional indices
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    # Split the index array at the start of every new unique value
    return u, np.split(ix_ndim, ix_u[1:])
Checking with the array from the question -
a = np.array([[1, 0, 1],[2, 2, 0]])
vals, ixs = ndix_unique(a)
print(vals)
array([0, 1, 2])
print(ixs)
[array([[0, 1],
[1, 2]]),
array([[0, 0],
[0, 2]]),
array([[1, 0],
[1, 1]])]
Let's try with this other case:
a = np.array([[1,1,4],[2,2,1],[3,3,1]])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 2, 3, 4])
print(ixs)
array([array([[0, 0],
[0, 1],
[1, 2],
[2, 2]]),
array([[1, 0],
[1, 1]]),
array([[2, 0],
[2, 1]]),
array([[0, 2]])], dtype=object)
For a 1D array:
a = np.array([1,5,4,3,3])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 3, 4, 5])
print(ixs)
array([array([0]), array([3, 4]), array([2]), array([1])], dtype=object)
Finally another example with a 3D ndarray:
a = np.array([[[1,1,2]],[[2,3,4]]])
vals, ixs = ndix_unique(a)
print(vals)
array([1, 2, 3, 4])
print(ixs)
array([array([[0, 0, 0],
[0, 0, 1]]),
array([[0, 0, 2],
[1, 0, 0]]),
array([[1, 0, 1]]),
array([[1, 0, 2]])], dtype=object)
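Since the question asks to leave out the value 0, here is a hedged variant of the same idea. The function name nonzero_index_groups is hypothetical, and two details differ from the function above: it requests a stable argsort (the default sort kind is not guaranteed to be stable, so index order within a group could otherwise vary), and it drops the group belonging to 0.

```python
import numpy as np

def nonzero_index_groups(x):
    """Hypothetical variant: one (k, ndim) index array per nonzero unique
    value of x, in ascending order of value."""
    x = np.asarray(x)
    flat = x.ravel()
    # Stable sort keeps equal values in their original (row-major) order
    order = np.argsort(flat, kind="stable")
    values, starts = np.unique(flat[order], return_index=True)
    # One row of N-dimensional coordinates per element, in sorted order
    coords = np.column_stack(np.unravel_index(order, x.shape))
    groups = np.split(coords, starts[1:])
    return [g for v, g in zip(values, groups) if v != 0]

a = np.array([[1, 0, 1], [2, 2, 0]])
groups = nonzero_index_groups(a)
for g in groups:
    print(g)
```

This still sorts the flattened array once, so it does not avoid the flatten-and-sort step the question mentions; it only packages it.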
You can first get the unique non-zero values in your array and then use argwhere in a list comprehension to get a separate index array for each of them. Here np.unique(arr[arr!=0]) gives you the non-zero values to iterate over.
arr = np.array([[1, 0, 1],
[2, 2, 0]])
indices = [np.argwhere(arr==i) for i in np.unique(arr[arr!=0])]
# [array([[0, 0],
# [0, 2]]), array([[1, 0],
# [1, 1]])]

Merge three numpy arrays, keep largest value

I want to merge three numpy arrays, for example:
a = np.array([[0,0,1],[0,1,0],[1,0,0]])
b = np.array([[1,0,0],[0,1,0],[0,0,1]])
c = np.array([[0,1,0],[0,2,0],[0,1,0]])
a = array([[0, 0, 1],
[0, 1, 0],
[1, 0, 0]])
b = array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
c = array([[0, 1, 0],
[0, 2, 0],
[0, 1, 0]])
Desired result would be to overlay them but keep the largest value where multiple elements are not 0, like in the middle.
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
I solved this by iterating over all elements with multiple if-conditions. Is there a more compact and more beautiful way to do this?
You can stack the arrays along an extra dimension with NumPy's np.dstack
and then take the maximum along that added dimension:
# Stacking arrays together
d = np.dstack([a,b,c])
d.max(axis=2)
Out:
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
NumPy's np.ufunc.reduce lets you apply a ufunc cumulatively along a given axis. We can pass the arrays as a list and reduce with numpy.maximum to keep the accumulated elementwise maximum:
np.maximum.reduce([a,b,c])
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
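For reference, a short self-contained sketch that runs both suggestions on the arrays from the question and checks that they agree. np.stack with axis=0 is included as a third, equivalent way to add the extra dimension; the variable names are only for illustration.

```python
import numpy as np

a = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
c = np.array([[0, 1, 0], [0, 2, 0], [0, 1, 0]])

via_reduce = np.maximum.reduce([a, b, c])         # elementwise max across the list
via_dstack = np.dstack([a, b, c]).max(axis=2)     # stack on a new last axis
via_stack = np.stack([a, b, c]).max(axis=0)       # stack on a new first axis

print(via_reduce)
print(np.array_equal(via_reduce, via_dstack) and np.array_equal(via_reduce, via_stack))
```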

How to find the indices of the maximum in each line, a concatenation of rows, in numpy?

I don't know whether this is simple, or whether it has been asked before. (I searched but did not find the correct way to do it. I have found numpy.argmax and numpy.amax but I am not able to use them correctly.)
I have a numpy array (it is a CxKxN matrix) as follows (C=K=N=3):
array([[[1, 2, 3],
[2, 1, 4],
[4, 3, 3]],
[[2, 1, 1],
[1, 3, 1],
[3, 4, 2]],
[[5, 2, 1],
[3, 3, 3],
[4, 1, 2]]])
I would like to find the indices of the maximum elements across each line. A line is the concatenation of the three (C) rows of each matrix. In other words, the i-th line is the concatenation of the i-th row in the first matrix, the i-th row in the second matrix, ..., until the i-th row in the C-th matrix.
For example, the first line is
[1, 2, 3, 2, 1, 1, 5, 2, 1]
So I would like to return
[2, 0, 0] # the index of the maximum in the first line
and
[0, 1, 2] # the index of the maximum in the second line
and
[0, 2, 0] # the index of the maximum in the third line
or
[1, 2, 1] # the index of the maximum in the third line
or
[2, 2, 0] # the index of the maximum in the third line
Now, I am trying this
np.argmax(a[:,0,:], axis=None) # for the first line
It returns 6 and
np.argmax(a[:,1,:], axis=None)
and it returns 2 and
np.argmax(a[:,2,:], axis=None)
and it returns 0
but I am not able to convert these numbers to indices like 6 = (2,0,0), etc.
With a transpose and reshape I get your 'lines' as rows:
In [367]: arr.transpose(1,0,2).reshape(3,9)
Out[367]:
array([[1, 2, 3, 2, 1, 1, 5, 2, 1],
[2, 1, 4, 1, 3, 1, 3, 3, 3],
[4, 3, 3, 3, 4, 2, 4, 1, 2]])
In [368]: np.argmax(_, axis=1)
Out[368]: array([6, 2, 0])
These argmax values are the same as yours. Here are the same positions, converted to indices into a (3,3) array (matrix index C and column index N):
In [372]: np.unravel_index([6,2,0],(3,3))
Out[372]: (array([2, 0, 0]), array([0, 2, 0]))
Join them with middle dimension range:
In [373]: tup = (_[0],np.arange(3),_[1])
In [374]: np.transpose(tup)
Out[374]:
array([[2, 0, 0],
[0, 1, 2],
[0, 2, 0]])
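The interactive steps above can be put together as a plain script. This is a sketch under the assumption that the array has shape (C, K, N) as in the question; the variable names are mine.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [2, 1, 4], [4, 3, 3]],
                [[2, 1, 1], [1, 3, 1], [3, 4, 2]],
                [[5, 2, 1], [3, 3, 3], [4, 1, 2]]])
C, K, N = arr.shape

# Row i of `lines` is the i-th row of every matrix, concatenated
lines = arr.transpose(1, 0, 2).reshape(K, C * N)

# Position of the first maximum within each concatenated line
flat_idx = lines.argmax(axis=1)

# Split each flat position into (matrix index, column index)
c_idx, n_idx = np.unravel_index(flat_idx, (C, N))

# Reassemble full (c, k, n) indices; k is simply the line number
result = np.column_stack((c_idx, np.arange(K), n_idx))
print(result)
```

Note that argmax reports only the first maximum, so for the third line you get [0, 2, 0] and not the other two positions that also hold the value 4.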

Updating by index in a multi-dimensional numpy array

I am using numpy to tally a lot of values across many large arrays, and keep track of which positions the maximum values appear in.
In particular, imagine I have a 'counts' array:
data = numpy.array([[ 5, 10, 3],
[ 6, 9, 12],
[13, 3, 9],
[ 9, 3, 1],
...
])
counts = numpy.zeros(data.shape, dtype=int)
data is going to change a lot, but I want 'counts' to reflect the number of times the max has appeared in each position:
max_value_indices = numpy.argmax(data, axis=1)
# this is now [1, 2, 0, 0, ...] representing the positions of 10, 12, 13 and 9, respectively.
From what I understand of broadcasting in numpy, I should be able to say:
counts[max_value_indices] += 1
What I expect is the array to be updated:
[[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[1, 0, 0],
...
]
But instead this increments ALL the values in counts giving me:
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
...
]
I also thought perhaps if I transformed max_value_indices to a 100x1 array, it might work:
counts[max_value_indices[:,numpy.newaxis]] += 1
but this has the effect of updating just the rows in positions 0, 1, and 2:
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0],
...
]
I'm also happy to turn the indices array into an array of 0's and 1's, and then add it to the counts array each time, but I'm not sure how to construct that.
You could use so-called advanced integer indexing (aka Multidimensional list-of-locations indexing):
In [24]: counts[np.arange(data.shape[0]),
np.argmax(data, axis=1)] += 1
In [25]: counts
Out[25]:
array([[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[1, 0, 0]])
The first array, np.arange(data.shape[0]) specifies the row. The second array, np.argmax(data, axis=1) specifies the column.
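Here is a runnable sketch of this on the four rows shown in the question, with one simulated change to data so the tally is updated twice. Indexing counts with a single integer array selects whole rows, which is why the original attempt incremented everything; pairing it with a row array picks one element per row. The one_hot part is an optional extra for the 0/1 array the question mentions, and the variable names are only illustrative.

```python
import numpy as np

data = np.array([[5, 10, 3],
                 [6, 9, 12],
                 [13, 3, 9],
                 [9, 3, 1]])
counts = np.zeros(data.shape, dtype=int)
rows = np.arange(data.shape[0])

# First tally: one increment per row, at that row's argmax column
counts[rows, data.argmax(axis=1)] += 1

# Simulate the data changing, then tally again
data[0] = [20, 1, 1]
idx = data.argmax(axis=1)
counts[rows, idx] += 1

# Alternative: build a 0/1 array marking the argmax position of each row
one_hot = (np.arange(data.shape[1]) == idx[:, None]).astype(int)

print(counts)
print(one_hot)
```

Adding one_hot to counts on each update would give the same tally as the indexed increment.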
