This question already has answers here:
find and delete from more-dimensional numpy array
(3 answers)
Closed 9 years ago.
I have two numpy arrays that represent 2D coordinates. Each row represents (x, y) pairs:
a = np.array([[1, 1], [2, 1], [3, 1], [3, 2], [3, 3], [5, 5]])
b = np.array([[1, 1], [5, 5], [3, 2]])
I would like to remove elements from a which are in b efficiently. So the result would be:
array([[2, 1], [3, 1], [3, 3]])
I can do it by looping and comparing, I was hoping I could do it easier.
Python sets do a nice job of computing differences. They don't maintain order, though:
np.array(list(set(tuple(x) for x in a.tolist()).difference(set(tuple(x) for x in b.tolist()))))
Or, to use boolean indexing, use broadcasting to create an outer equality test, then reduce with all and any:
A = np.all((a[None,:,:]==b[:,None,:]),axis=-1)
A = np.any(A,axis=0)
a[~A,:]
Or make a and b complex:
ac = np.dot(a,[1,1j])
bc = np.dot(b,[1,1j])
A = np.any(ac==bc[:,None],axis=0)
a[~A,:]
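or use setxor1d (a symmetric difference, so it matches the wanted result here only because every row of b also occurs in a; the output is sorted):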
xx = np.setxor1d(ac,bc)
# array([ 2.+1.j, 3.+1.j, 3.+3.j])
np.array([xx.real,xx.imag],dtype=int).T
=================
In [222]: ac = np.dot(a,[1,1j])
...: bc = np.dot(b,[1,1j])
In [223]: ac
Out[223]: array([ 1.+1.j, 2.+1.j, 3.+1.j, 3.+2.j, 3.+3.j, 5.+5.j])
In [225]: bc
Out[225]: array([ 1.+1.j, 5.+5.j, 3.+2.j])
In [226]: ac == bc[:,None]
Out[226]:
array([[ True, False, False, False, False, False],
[False, False, False, False, False, True],
[False, False, False, True, False, False]], dtype=bool)
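As a rough sketch, the broadcasting approach above can be wrapped in a small helper. The name remove_rows is made up for illustration. Unlike the set-based variants, boolean indexing keeps the surviving rows of a in their original order.

```python
import numpy as np

def remove_rows(a, b):
    """Return the rows of `a` that do not appear in `b` (hypothetical helper)."""
    # eq[i, j] is True when row j of `a` equals row i of `b`
    eq = np.all(a[None, :, :] == b[:, None, :], axis=-1)
    # a row of `a` is dropped if it matches any row of `b`
    drop = np.any(eq, axis=0)
    return a[~drop]

a = np.array([[1, 1], [2, 1], [3, 1], [3, 2], [3, 3], [5, 5]])
b = np.array([[1, 1], [5, 5], [3, 2]])
result = remove_rows(a, b)
print(result)
```

The intermediate comparison array has len(b) * len(a) * 2 elements, so for very large inputs this trades memory for the absence of a Python loop.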
So let's say I have a matrix mat = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
and a lower bound vector vector_low = [2.1,1.9,1.7] and upper bound vector vector_up = [3.1,3.5,4.1].
How do I get the values in the matrix in between the upper and lower bounds for every row?
Expected Output:
[[3],[2,3],[2,3,4]] (it's a list @mozway)
alternatively a vector with all of them would also do...
(Extra question: get the values of the matrix that are between the upper and lower bounds, but with the bounds rounded down/up to the next value in the matrix.
Expected Output:
[[2,3,4],[1,2,3,4],[1,2,3,4,5]])
There should be a fast solution without loop, hope someone can help, thanks!
PS: In the end I just want to sum over the list entries, so the output format is not important...
I probably shouldn't indulge you since you haven't provided the code I asked for, but to satisfy my own curiosity, here are my solution(s):
Your lists:
In [72]: alist = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
In [73]: low = [2.1,1.9,1.7]; up = [3.1,3.5,4.1]
A utility function:
In [74]: def between(row, l, u):
    ...:     return [i for i in row if l <= i <= u]
and the straightforward list comprehension solution - VERY PYTHONIC:
In [75]: [between(row, l, u) for row, l, u in zip(alist, low, up)]
Out[75]: [[3], [2, 3], [2, 3, 4]]
A numpy solution requires starting with arrays:
In [76]: arr = np.array(alist)
In [77]: Low = np.array(low)
...: Up = np.array(up)
We can check the bounds with:
In [79]: Low[:, None] <= arr
Out[79]:
array([[False, False, True, True, True, True],
[False, True, True, True, True, True],
[False, True, True, True, True, True]])
In [80]: (Low[:, None] <= arr) & (Up[:,None] >= arr)
Out[80]:
array([[False, False, True, False, False, False],
[False, True, True, False, False, False],
[False, True, True, True, False, False]])
Applying the mask to index arr produces a flat array of values:
In [81]: arr[_]
Out[81]: array([3, 2, 3, 2, 3, 4])
to get values by row, we still have to iterate:
In [82]: [row[mask] for row, mask in zip(arr, Out[80])]
Out[82]: [array([3]), array([2, 3]), array([2, 3, 4])]
For the small case I expect the list approach to be faster. For larger cases [81] will do better - IF we already have arrays. Creating arrays from the lists is not a time-trivial task.
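Since the question's PS says only the sums are needed, the per-row iteration can be skipped entirely. A minimal sketch, assuming the same data as above (the names mask, row_sums and total are just for illustration): values outside the bounds are replaced by 0 before summing.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6]] * 3)
low = np.array([2.1, 1.9, 1.7])
up = np.array([3.1, 3.5, 4.1])

# mask[i, j] is True when low[i] <= arr[i, j] <= up[i]
mask = (low[:, None] <= arr) & (arr <= up[:, None])

# zero out the excluded values, then sum each row
row_sums = np.where(mask, arr, 0).sum(axis=1)

# or sum every selected value at once
total = arr[mask].sum()
print(row_sums, total)
```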
This question already has answers here:
check for identical rows in different numpy arrays
(7 answers)
Closed 1 year ago.
I have two arrays:
A = np.array([[3, 1], [4, 1], [1, 4]])
B = np.array([[0, 1, 5], [2, 4, 5], [2, 3, 5]])
Is it possible to use numpy.isin rowwise for 2D arrays? I want to check if A[i,j] is in B[i] and return this result into C[i,j]. At the end I would get the following C:
np.array([[False, True], [True, False], [False, False]])
It would be great, if this is also doable with the == operator, then I could use it also with PyTorch.
Edit:
I also considered check for identical rows in different numpy arrays. The question is somehow similar, but I am not able to apply its solutions to this slightly different problem.
I'm not sure that my code solves your problem perfectly, so please run it on more test cases to confirm. I would do something like the following, taking advantage of numpy's ability to do outer operations via broadcasting (similar to a vector outer product). If it works as intended, it should work with PyTorch as well.
import numpy as np
A = np.array([[3, 1], [4, 1], [1, 4]])
B = np.array([[0, 1, 5], [2, 4, 5], [2, 3, 5]])
AA = A.reshape(3, 2, 1)
BB = B.reshape(3, 1, 3)
(AA == BB).sum(-1).astype(bool)
output:
array([[False, True],
[ True, False],
[False, False]])
Updated answer
Here is a way to do this, comparing every A[i, j] against every element of B[i]:
(A[:, :, None] == B[:, None, :]).any(axis=-1)
# > array([[False, True],
# [ True, False],
# [False, False]])
Previous answer
You could also do it inside a list comprehension:
[np.isin(a, b) for a,b in zip(A, B)]
# > [array([False, True]), array([ True, False]), array([False, False])]
np.array([np.isin(a, b) for a,b in zip(A, B)])
# > array([[False, True],
# [ True, False],
# [False, False]])
But, as @alex said, it defeats the purpose of numpy.
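For reference, here is a sketch of the broadcasting version with the axes spelled out, so that it does not depend on A and B having any particular number of columns. The helper name rowwise_isin is hypothetical. It uses only indexing, == and any, so the same pattern is expected to carry over to PyTorch tensors, though that is not tested here.

```python
import numpy as np

def rowwise_isin(A, B):
    """C[i, j] is True when A[i, j] occurs anywhere in row B[i] (hypothetical helper)."""
    # A[:, :, None] has shape (n, p, 1); B[:, None, :] has shape (n, 1, q)
    # the comparison broadcasts to (n, p, q); reduce over the last axis (elements of B[i])
    return (A[:, :, None] == B[:, None, :]).any(axis=-1)

A = np.array([[3, 1], [4, 1], [1, 4]])
B = np.array([[0, 1, 5], [2, 4, 5], [2, 3, 5]])
C = rowwise_isin(A, B)
print(C)
```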
I can perform a boolean mask on an array of arrays like this
import numpy as np
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[True, False, False], [True, True, False], [False, False, False]]
np.array(a)[np.array(b)]
and get array([1, 4, 5])
How would I preserve the information of which numbers belonged to the same array?
something like this would work
is_in_original(1, 4)
> False
is_in_original(5, 4)
> True
One thing I could think of is this
def is_in_original(x, y):
    for arry in np.array(a):
        if x in arry and y in arry:
            return True
    return False
I am wondering if this is the most computationally efficient method. I will be working with very large array of arrays, and need the throughput to be as fast as possible.
You can use np.where(mask, array, 0) to preserve dimensions.
import numpy as np
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[True, False, False], [True, True, False], [False, False, False]]
ret = np.where(np.array(b), np.array(a), 0)
Output:
array([[1, 0, 0],
[4, 5, 0],
[0, 0, 0]])
Here the third argument of np.where is the fill value, 0. You can change it to any other number, or to inf (which makes the result a float array).
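If a fill value is not wanted, another option is to keep the flat masked result and record which row each value came from. This is only a sketch under that assumption; the names row_ids, counts and groups are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[True, False, False], [True, True, False], [False, False, False]])

flat = a[b]                  # selected values in row-major order
row_ids = np.nonzero(b)[0]   # source row index of each selected value

# split the flat result back into one array per original row
counts = b.sum(axis=1)
groups = np.split(flat, np.cumsum(counts)[:-1])
print(row_ids, groups)
```

Two selected values came from the same row exactly when their entries in row_ids are equal, which avoids scanning every row with the `in` operator.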
With 2 numpy 2-d arrays, I'd like a result array filled with row-by-row equality results. For example:
>>> a = np.array([[1, 2], [3, 4], [5, 6]])
>>> b = np.array([[5, 6], [3, 4], [1, 2]])
>>> a == b # not quite what I want
array([[False, False],
[ True, True],
[False, False]])
>>> np.equal(a, b) # also not quite
array([[False, False],
[ True, True],
[False, False]])
The result I want, each row's equality as one element, would be:
array([False, True, False])
What's the compact/idiomatic way to get this result?
You can use all() over axis=1 to turn each row of [bool, ...] into True if all-true, and False if any false.
For example:
>>> a = np.array([[1, 2], [3, 4], [5, 6]])
>>> b = np.array([[5, 6], [3, 4], [1, 2]])
>>> np.all(a == b, axis=1)
array([False, True, False])
I wanted to convert the specified elements of the NumPy array A: 1, 5, and 8 into 0.
So I did the following:
import numpy as np
A = np.array([[1,2,3,4,5],[6,7,8,9,10]])
bad_values = (A==1)|(A==5)|(A==8)
A[bad_values] = 0
print(A)
Yes, I got the expected result, i.e., new array.
However, in my real-world problem, the given array (A) is very large and also 2-dimensional, and there are many bad values to convert to 0. So I tried the following way of doing that:
bads = [1,5,8] # Suppose they are the values to be converted into 0
bad_values = A == x for x in bads # HERE is the problem I am facing
How can I do this?
Then, of course the remaining is the same as before.
A[bad_values] = 0
print(A)
If you want to get the index of where a bad value occurs in your array A, you could use in1d to find out which values are in bads:
>>> np.in1d(A, bads)
array([ True, False, False, False, True, False, False, True, False, False], dtype=bool)
in1d returns a flat mask, so for a 1D A you can just write A[np.in1d(A, bads)] = 0 to set the bad values of A to 0.
EDIT: If your array is 2D, one way would be to use the in1d method and then reshape:
>>> B = np.arange(9).reshape(3, 3)
>>> B
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.in1d(B, bads).reshape(3, 3)
array([[False, True, False],
[False, False, True],
[False, False, True]], dtype=bool)
So you could do the following:
>>> B[np.in1d(B, bads).reshape(3, 3)] = 0
>>> B
array([[0, 0, 2],
[3, 4, 0],
[6, 7, 0]])
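In newer NumPy releases, np.isin does the same membership test but returns a mask with the shape of its first argument, so no reshape is needed. A short sketch using the array from the question:

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
bads = [1, 5, 8]

# mask has the same shape as A: True wherever the element is in bads
mask = np.isin(A, bads)
A[mask] = 0
print(A)
```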