Finding indices of values in 2D numpy array - python

I'm trying to get the indices of matching values out of a numpy array. I've tried using intersects instead, to no avail. I'm simply trying to find like values in 2 arrays. One is 2D and I'm selecting a column, and the other is 1D, just a list of values to search for, so effectively just two 1D arrays.
We'll call this array a:
array([[ 1, 97553, 1],
[ 1, 97587, 1],
[ 1, 97612, 1],
[ 1, 97697, 1],
[ 1, 97826, 3],
[ 1, 97832, 1],
[ 1, 97839, 1],
[ 1, 97887, 1],
[ 1, 97944, 1],
[ 1, 97955, 2]])
And we're searching for, say, values = numpy.array([97612, 97633, 97697, 97999, 97943, 97944])
So I try:
numpy.where(a[:, 1] == values)
And I'd expect a bunch of indices of the values, but instead I get back an empty array: it spits out [(array([], dtype=int64),)].
If I try this though:
numpy.where(a[:, 1] == 97697)
It gives me back (array([2]),), which is what I would expect.
What weirdness of arrays am I missing here? Or is there maybe even an easier way to do this? Finding array indices and matching arrays seems to not work as I expect at all. When I want to find the unions or intersects of arrays, by index or unique value, it just doesn't seem to function. Any help would be super. Thanks.
Edit:
As per Warren's request:
import numpy
a = numpy.array([[ 1, 97553, 1],
[ 1, 97587, 1],
[ 1, 97612, 1],
[ 1, 97697, 1],
[ 1, 97826, 3],
[ 1, 97832, 1],
[ 1, 97839, 1],
[ 1, 97887, 1],
[ 1, 97944, 1],
[ 1, 97955, 2]])
values = numpy.array([97612, 97633, 97697, 97999, 97943, 97944])
I've found that numpy.in1d will give me a correct truth table of booleans for the operation, with a 1d array of the same length that should map to the original data. My only issue here is now how to act with that, for instance deleting or modifying the original array at those indices. I could do it laboriously with a loop, but as far as I know there are better ways in numpy. Truth tables as masks are supposed to be quite powerful with numpy from what I have been able to find.

np.where with a single argument is equivalent to np.nonzero. It gives you the indices where a condition, the input array, is True.
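In your example you are checking for element-wise equality between a[:,1] and values:
a[:, 1] == values
False
So it's giving you the correct result: the shapes (10,) and (6,) cannot be broadcast against each other, so the comparison collapses to a single False and no index in the input is True.
You should use np.isin instead: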
np.isin(a[:,1], values)
array([False, False, True, True, False, False, False, False, True, False], dtype=bool)
Now you can use np.where to get the indices
np.where(np.isin(a[:,1], values))
(array([2, 3, 8]),)
and use those to address the original array
a[np.where(np.isin(a[:,1], values))]
array([[ 1, 97612, 1],
[ 1, 97697, 1],
[ 1, 97944, 1]])
Your initial solution with a simple equality check could indeed have worked with proper broadcasting:
np.where(a[:,1] == values[..., np.newaxis])[1]
array([2, 3, 8])
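To see why the broadcast version works, it may help to look at the shapes involved. The following is a minimal sketch using the arrays from the question; the names col, eq, val_pos and row_pos are introduced here for illustration only.

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

col = a[:, 1]                          # shape (10,)
# values[:, np.newaxis] has shape (6, 1), so the comparison broadcasts to (6, 10):
# eq[i, j] is True where values[i] == col[j].
eq = values[:, np.newaxis] == col

val_pos, row_pos = np.nonzero(eq)      # positions in values, positions in col
print(row_pos)                         # [2 3 8]
print(val_pos)                         # [0 2 5]

# Collapsing the values axis gives the same boolean mask as np.isin.
mask = eq.any(axis=0)
print(np.array_equal(mask, np.isin(col, values)))   # True
```

The intermediate eq array has len(values) * len(col) elements, so for large inputs np.isin is likely the more economical choice.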
EDIT: given that you seem to have issues using the above results to index and manipulate your array, here are a couple of simple examples.
Now you should have two ways of accessing your matching elements in the original array, either the binary mask or the indices from np.where.
mask = np.isin(a[:,1], values) # np.in1d if np.isin is not available
idx = np.where(mask)
Let's say you want to set all matching rows to zero
a[mask] = 0 # or a[idx] = 0
array([[ 1, 97553, 1],
[ 1, 97587, 1],
[ 0, 0, 0],
[ 0, 0, 0],
[ 1, 97826, 3],
[ 1, 97832, 1],
[ 1, 97839, 1],
[ 1, 97887, 1],
[ 0, 0, 0],
[ 1, 97955, 2]])
Or you want to multiply the third column of matching rows by 100
a[mask, 2] *= 100
array([[ 1, 97553, 1],
[ 1, 97587, 1],
[ 1, 97612, 100],
[ 1, 97697, 100],
[ 1, 97826, 3],
[ 1, 97832, 1],
[ 1, 97839, 1],
[ 1, 97887, 1],
[ 1, 97944, 100],
[ 1, 97955, 2]])
Or you want to delete matching rows (here using indices is more convenient than masks)
np.delete(a, idx, axis=0)
array([[ 1, 97553, 1],
[ 1, 97587, 1],
[ 1, 97826, 3],
[ 1, 97832, 1],
[ 1, 97839, 1],
[ 1, 97887, 1],
[ 1, 97955, 2]])
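If deleting is all you need, inverting the boolean mask is another option. This is a sketch using the question's data; kept and same are names made up for the example.

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

mask = np.isin(a[:, 1], values)

# ~mask selects the rows that do NOT match; this builds a new array
# and leaves a itself untouched.
kept = a[~mask]
print(kept.shape)                      # (7, 3)

# Same rows as the index-based np.delete call.
same = np.array_equal(kept, np.delete(a, np.where(mask)[0], axis=0))
print(same)                            # True
```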

Just a thought:
Try to flatten the 2D array and compare using numpy.intersect1d.
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.flatten.html
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.intersect1d.html
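For what it's worth, flattening should not be needed when only one column is searched. A sketch, assuming NumPy 1.15 or later, where np.intersect1d accepts return_indices (the names common, a_idx and v_idx are illustrative):

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

# common: sorted shared values
# a_idx:  index of the first occurrence of each shared value in a[:, 1]
# v_idx:  index of the first occurrence of each shared value in values
common, a_idx, v_idx = np.intersect1d(a[:, 1], values, return_indices=True)

print(common)   # [97612 97697 97944]
print(a_idx)    # [2 3 8]
print(v_idx)    # [0 2 5]
```

Only the first occurrence of each value is reported, so if the column can contain duplicates a mask from np.isin is probably the safer tool.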

Related

Unexpected result from Numpy Matrix insert, How does this work?

My goal was to insert a column to the right on a numpy matrix. However, I found that the code I was using is putting in two columns rather than just one.
# This one results in a 4x1 matrix, as expected
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 0)
>>>matrix([[0],
[0],
[0],
[0]])
# I would expect this line to return a 2x2 matrix, but it returns a 2x3 matrix instead.
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 1)
>>>matrix([[0, 0, 0],
[0, 0, 0]])
Why do I get the above, in the second example, instead of [[0,0], [0,0]]?
While new use of np.matrix is discouraged, we get the same result with np.array:
In [41]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 0)
Out[41]:
array([[ 1],
[10],
[20],
[ 2]])
In [42]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 1)
Out[42]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
In [44]: np.insert(np.array([[1],[2]]),1, np.array([10,20]), 1)
Out[44]:
array([[ 1, 10],
[ 2, 20]])
Insert as [1]:
In [46]: np.insert(np.array([[1],[2]]),[1], np.array([[10],[20]]), 1)
Out[46]:
array([[ 1, 10],
[ 2, 20]])
In [47]: np.insert(np.array([[1],[2]]),[1], np.array([10,20]), 1)
Out[47]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
np.insert is a complex function written in Python. So we need to look at that code, and see how values are being mapped on the target space.
The docs elaborate on the difference between insert at 1 and [1]. But off hand I don't see an explanation of how the shape of values matters.
Difference between sequence and scalars:
>>> np.insert(a, [1], [[1],[2],[3]], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> np.array_equal(np.insert(a, 1, [1, 2, 3], axis=1),
... np.insert(a, [1], [[1],[2],[3]], axis=1))
True
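The documentation snippet above relies on an array a that is not defined in the excerpt. The sketch below assumes a = np.array([[1, 1], [2, 2], [3, 3]]) (an assumption that is consistent with the output shown) and reuses the 2x1 arrays from this answer to compare the resulting shapes for a scalar index and a one-element list. The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 1], [2, 2], [3, 3]])   # assumed definition, not in the excerpt

scalar_idx = np.insert(a, 1, [1, 2, 3], axis=1)
list_idx = np.insert(a, [1], [[1], [2], [3]], axis=1)
print(np.array_equal(scalar_idx, list_idx))   # True

base = np.array([[1], [2]])
col = np.array([[10], [20]])                  # shape (2, 1)
flat = col.ravel()                            # shape (2,)

# Scalar index: a (2, 1) value is spread over two new columns,
# while a flat (2,) value becomes a single column.
s_col = np.insert(base, 1, col, axis=1)
s_flat = np.insert(base, 1, flat, axis=1)

# One-element list index: the behaviour is the other way round.
l_col = np.insert(base, [1], col, axis=1)
l_flat = np.insert(base, [1], flat, axis=1)

print(s_col.shape, s_flat.shape)   # (2, 3) (2, 2)
print(l_col.shape, l_flat.shape)   # (2, 2) (2, 3)
```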
When adding an array at the end of another, I'd use concatenate (or one of its stack variants) rather than insert. None of these operate in-place.
In [48]: np.concatenate([np.array([[1],[2]]), np.array([[10],[20]])], axis=1)
Out[48]:
array([[ 1, 10],
[ 2, 20]])

Indexing the max elements in a multidimensional tensor in PyTorch

I'm trying to index the maximum elements along the last dimension in a multidimensional tensor. For example, say I have a tensor
A = torch.randn((5, 2, 3))
_, idx = torch.max(A, dim=2)
Here idx stores the maximum indices, which may look something like
>>> A
tensor([[[ 1.0503, 0.4448, 1.8663],
[ 0.8627, 0.0685, 1.4241]],
[[ 1.2924, 0.2456, 0.1764],
[ 1.3777, 0.9401, 1.4637]],
[[ 0.5235, 0.4550, 0.2476],
[ 0.7823, 0.3004, 0.7792]],
[[ 1.9384, 0.3291, 0.7914],
[ 0.5211, 0.1320, 0.6330]],
[[ 0.3292, 0.9086, 0.0078],
[ 1.3612, 0.0610, 0.4023]]])
>>> idx
tensor([[ 2, 2],
[ 0, 2],
[ 0, 0],
[ 0, 2],
[ 1, 0]])
I want to be able to access these indices and assign to another tensor based on them. Meaning I want to be able to do
B = A.new_zeros(A.size())
B[idx] = A[idx]
where B is 0 everywhere except where A is maximum along the last dimension. That is B should store
>>> B
tensor([[[ 0, 0, 1.8663],
[ 0, 0, 1.4241]],
[[ 1.2924, 0, 0],
[ 0, 0, 1.4637]],
[[ 0.5235, 0, 0],
[ 0.7823, 0, 0]],
[[ 1.9384, 0, 0],
[ 0, 0, 0.6330]],
[[ 0, 0.9086, 0],
[ 1.3612, 0, 0]]])
This is proving to be much more difficult than I expected, as the idx does not index the array A properly. Thus far I have been unable to find a vectorized solution to use idx to index A.
Is there a good vectorized way to do this?
You can use torch.meshgrid to create an index tuple:
>>> index_tuple = torch.meshgrid([torch.arange(x) for x in A.size()[:-1]]) + (idx,)
>>> B = torch.zeros_like(A)
>>> B[index_tuple] = A[index_tuple]
Note that you can also mimic meshgrid via (for the specific case of 3D):
>>> index_tuple = (
... torch.arange(A.size(0))[:, None],
... torch.arange(A.size(1))[None, :],
... idx
... )
Bit more explanation:
We will have the indices something like this:
In [173]: idx
Out[173]:
tensor([[2, 1],
[2, 0],
[2, 1],
[2, 2],
[2, 2]])
From this, we want to go to three indices (since our tensor is 3D, we need three numbers to retrieve each element). Basically we want to build a grid in the first two dimensions, as shown below. (And that's why we use meshgrid).
In [174]: A[0, 0, 2], A[0, 1, 1]
Out[174]: (tensor(0.6288), tensor(-0.3070))
In [175]: A[1, 0, 2], A[1, 1, 0]
Out[175]: (tensor(1.7085), tensor(0.7818))
In [176]: A[2, 0, 2], A[2, 1, 1]
Out[176]: (tensor(0.4823), tensor(1.1199))
In [177]: A[3, 0, 2], A[3, 1, 2]
Out[177]: (tensor(1.6903), tensor(1.0800))
In [178]: A[4, 0, 2], A[4, 1, 2]
Out[178]: (tensor(0.9138), tensor(0.1779))
In the above 5 lines, the first two numbers in the indices are basically the grid that we build using meshgrid and the third number is coming from idx.
i.e. the first two numbers form a grid.
(0, 0) (0, 1)
(1, 0) (1, 1)
(2, 0) (2, 1)
(3, 0) (3, 1)
(4, 0) (4, 1)
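The grid-plus-idx indexing described above can be sketched end to end as follows. This is a minimal example, assuming a reasonably recent PyTorch; the names rows, cols and max_vals are made up for illustration, and plain broadcasting is used in place of torch.meshgrid.

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 2, 3)
idx = A.argmax(dim=2)                      # shape (5, 2)

# The first two index tensors broadcast to idx's shape (5, 2),
# which yields the (i, j) grid listed above.
rows = torch.arange(A.size(0))[:, None]    # shape (5, 1)
cols = torch.arange(A.size(1))[None, :]    # shape (1, 2)

B = torch.zeros_like(A)
B[rows, cols, idx] = A[rows, cols, idx]

# Cross-check against the values returned by torch.max.
max_vals = A.max(dim=2).values
print(torch.equal(A[rows, cols, idx], max_vals))   # True
```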
An ugly hackaround is to create a binary mask out of idx and use it to index the arrays. The basic code looks like this:
import torch
torch.manual_seed(0)
A = torch.randn((5, 2, 3))
_, idx = torch.max(A, dim=2)
mask = torch.arange(A.size(2)).reshape(1, 1, -1) == idx.unsqueeze(2)
B = torch.zeros_like(A)
B[mask] = A[mask]
print(A)
print(B)
The trick is that torch.arange(A.size(2)) enumerates the possible values in idx and mask is nonzero in places where they equal the idx. Remarks:
If you really discard the first output of torch.max, you can use torch.argmax instead.
I assume that this is a minimal example of some wider problem, but be aware that you are currently reinventing torch.nn.functional.max_pool3d with kernel of size (1, 1, 3).
Also, be aware that in-place modification of tensors with masked assignment can cause issues with autograd, so you may want to use torch.where as shown here.
I would expect that somebody comes up with a cleaner solution (avoiding the intermediate allocation of the mask array), likely making use of torch.index_select, but I can't get it to work right now.
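One possible out-of-place variant of the mask approach uses torch.where. This is only a sketch of the idea, not necessarily the version the remark above refers to; it assumes a recent PyTorch and sets requires_grad on A purely to illustrate the autograd behaviour.

```python
import torch

torch.manual_seed(0)
A = torch.randn((5, 2, 3), requires_grad=True)
idx = torch.argmax(A, dim=2)

# True exactly at the position of the maximum along the last dimension.
mask = torch.arange(A.size(2)).reshape(1, 1, -1) == idx.unsqueeze(2)

# Builds B without writing into an existing tensor in place.
B = torch.where(mask, A, torch.zeros_like(A))

# Gradients flow only to the selected (maximum) entries of A.
B.sum().backward()
print(torch.equal(A.grad, mask.to(A.dtype)))   # True
```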
You could use torch.scatter here:
>>> import torch
>>> a = torch.randn(4,2,3)
>>> a
tensor([[[ 0.1583, 0.1102, -0.8188],
[ 0.6328, -1.9169, -0.5596]],
[[ 0.5335, 0.4069, 0.8403],
[-1.2537, 0.9868, -0.4947]],
[[-1.2830, 0.4386, -0.0107],
[ 1.3384, 0.5651, 0.2877]],
[[-0.0334, -1.0619, -0.1144],
[ 0.1954, -0.7371, 1.7001]]])
>>> val, ind = torch.max(a, 1, keepdim=True)
>>> ind
tensor([[[1, 0, 1]],
[[0, 1, 0]],
[[1, 1, 1]],
[[1, 1, 1]]])
>>> torch.zeros_like(a).scatter(1, ind, val)
tensor([[[ 0.0000, 0.1102, 0.0000],
[ 0.6328, 0.0000, -0.5596]],
[[ 0.5335, 0.0000, 0.8403],
[ 0.0000, 0.9868, 0.0000]],
[[ 0.0000, 0.0000, 0.0000],
[ 1.3384, 0.5651, 0.2877]],
[[ 0.0000, 0.0000, 0.0000],
[ 0.1954, -0.7371, 1.7001]]])

Indexing into the last axis of a 3D array with another 3D array

Suppose I have an array x with shape (2,3,4) and the following values:
array([[[ 0.15845319, 0.57808432, 0.05638804, 0.56237656],
[ 0.73164208, 0.80562342, 0.64561066, 0.15397456],
[ 0.34734043, 0.88063258, 0.4863103 , 0.09881028]],
[[ 0.35823078, 0.71260357, 0.49410944, 0.94909165],
[ 0.02730397, 0.67890392, 0.74340148, 0.47434223],
[ 0.02494292, 0.59827256, 0.20550867, 0.30859339]]])
and an index array y with shape (2, 3, 3) and the following values:
array([[[0, 2, 2],
[2, 0, 2],
[0, 0, 2]],
[[1, 2, 1],
[1, 1, 1],
[1, 2, 2]]])
So I can use x[0,0,y[0][0]] to index the array x, which generates the following output:
array([ 0.15845319, 0.05638804, 0.05638804])
My question is: is there a simple way to do this for the whole array? I already tried x[y], but it did not work.
You could use fancy-indexing -
m,n = y.shape[:2]
out = x[np.arange(m)[:,None,None],np.arange(n)[:,None],y]
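As a sanity check, the fancy-indexing expression can be compared against an explicit loop. The sketch below uses random data rather than the arrays from the question, and also tries np.take_along_axis (available in NumPy 1.15 and later) as a possible alternative; expected and alt are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3, 4))
y = rng.integers(0, 4, size=(2, 3, 3))   # indices into the last axis of x

m, n = y.shape[:2]
# Index arrays of shapes (m, 1, 1), (n, 1) and (m, n, 3) broadcast to (m, n, 3),
# so out[i, j, k] == x[i, j, y[i, j, k]].
out = x[np.arange(m)[:, None, None], np.arange(n)[:, None], y]

# Reference result built with plain loops.
expected = np.empty(y.shape)
for i in range(m):
    for j in range(n):
        expected[i, j] = x[i, j, y[i, j]]

# Same gather along the last axis.
alt = np.take_along_axis(x, y, axis=-1)

print(np.array_equal(out, expected), np.array_equal(out, alt))   # True True
```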

Inserting a row into a NumPy array

I have:
A = np.array([[0,1,1],[0,3,2],[1,1,1],[1,5,2]])
where the NumPy array is sorted based on first element and then second element and so on.
I want to insert [1,4,10] into A, such that the output would be:
A = array([[0,1,1],[0,3,2],[1,1,1],[1,4,10],[1,5,2]])
How should I do it?
First off, stack the new 1D array as the last row with np.vstack -
B = np.vstack((A,[1,4,10]))
Next, to preserve the sort precedence (first column, then second, and so on), treat each row as an indexing tuple and compute its linear index with np.ravel_multi_index(B.T,B.max(0)+1); sorting those linear indices gives the row order. Note that this requires non-negative integer entries. Then, use the sorted indices to rearrange the rows of B and get the desired output. Thus, the final code would be -
out = B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
It seems there's an alternative with np.lexsort to get the sorted indices that respects that precedence, but does so in the opposite sense: the last key is the primary one. So, we need to reverse the order of elements row-wise, use lexsort and then get the sorted indices. These indices could then be used for indexing into B just like in the previous approach and get us the output. So, the alternative final code with np.lexsort would be -
out = B[np.lexsort(B[:,::-1].T)]
Sample run -
In [60]: A
Out[60]:
array([[0, 1, 1],
[0, 3, 2],
[1, 1, 1],
[1, 5, 2]])
In [61]: B = np.vstack((A,[1,4,10]))
In [62]: B
Out[62]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 5, 2],
[ 1, 4, 10]]) # <= New row
In [63]: B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
Out[63]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
In [64]: B[np.lexsort(B[:,::-1].T)]
Out[64]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
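The lexsort variant can be wrapped in a small helper. This is a sketch only; insert_sorted_row is a hypothetical name, not a NumPy function, and the second call uses a made-up row with a negative entry to show that lexsort does not need the non-negative values that np.ravel_multi_index requires.

```python
import numpy as np

def insert_sorted_row(A, row):
    """Return a new array with row added, sorted by column 0, then 1, and so on."""
    B = np.vstack((A, row))
    # np.lexsort treats the LAST key as the primary one, so the columns
    # are reversed before being passed as keys.
    order = np.lexsort(B[:, ::-1].T)
    return B[order]

A = np.array([[0, 1, 1], [0, 3, 2], [1, 1, 1], [1, 5, 2]])

out = insert_sorted_row(A, [1, 4, 10])
print(out.tolist())
# [[0, 1, 1], [0, 3, 2], [1, 1, 1], [1, 4, 10], [1, 5, 2]]

# Hypothetical row with a negative first entry: it sorts to the front.
out_neg = insert_sorted_row(A, [-1, 7, 7])
print(out_neg[0].tolist())   # [-1, 7, 7]
```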

Joining two 2D numpy arrays into a single 2D array of 2-tuples

I have two 2D numpy arrays like this, representing the x/y distances between three points. I need the x/y distances as tuples in a single array.
So from:
x_dists = array([[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0]])
y_dists = array([[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0]])
I need:
dists = array([[[ 0, 0], [-1, -1], [-2, -2]],
[[ 1, 1], [ 0, 0], [-1, -1]],
[[ 2, 2], [ 1, 1], [ 0, 0]]])
I've tried using various permutations of dstack/hstack/vstack/concatenate, but none of them seem to do what I want. The actual arrays in code are liable to be gigantic, so iterating over the elements in python and doing the rearrangement "manually" isn't an option speed-wise.
Edit:
This is what I came up with in the end: https://gist.github.com/807656
import numpy as np
dists = np.vstack(([x_dists.T], [y_dists.T])).T
This returns dists as you wanted. Note that the result is not "a single 2D array of 2-tuples", but a normal 3D array whose last axis holds the corresponding elements of the two original arrays.
You can see this from the shape:
dists.shape # (3, 3, 2)
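The same layout can probably be reached more directly by stacking along a new last axis. In this sketch y_dists is deliberately scaled by 10 (a made-up choice, not the data from the question) so that the x and y components are easy to tell apart.

```python
import numpy as np

x_dists = np.array([[0, -1, -2],
                    [1,  0, -1],
                    [2,  1,  0]])
y_dists = 10 * x_dists   # hypothetical values, only to distinguish x from y

# Stack along a new last axis: dists[i, j] == [x_dists[i, j], y_dists[i, j]].
dists = np.stack((x_dists, y_dists), axis=-1)
print(dists.shape)            # (3, 3, 2)
print(dists[0, 1].tolist())   # [-1, -10]

# Matches the vstack/transpose construction and np.dstack.
same_as_vstack = np.array_equal(dists, np.vstack(([x_dists.T], [y_dists.T])).T)
same_as_dstack = np.array_equal(dists, np.dstack((x_dists, y_dists)))
print(same_as_vstack, same_as_dstack)   # True True
```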
numpy.rec.fromarrays([x_dists, y_dists], names='x,y')
