I saw this code in some examples online and am trying to understand and modify it:
c = a[b == 1]
Why does this work? It appears b == 1 returns true for each element of b that satisfies the equality. I don't understand how something like a[True] ends up evaluating to something like "For all values in a for which the same indexed value in b is equal to 1, copy them to c"
a, b, and c are all NumPy arrays of the same length containing some data.
I've searched around quite a bit but don't even know what to call this sort of thing.
If I want to add a second condition, for example:
c = a[b == 1 and d == 1]
I get
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I know this happens because that combination of equality operations is ambiguous for reasons explained here, but I am unsure of how to add a.any() or a.all() into that expression in just one line.
EDIT:
For question 2, c = a[(b == 1) & (d == 1)] works. Any input on my first question about how/why this works?
Why wouldn't your example in point (1) work? This is Boolean indexing. If the arrays were different shapes then it may be a different matter, but:
c = a[b == 1]
Is indistinguishable from:
c = a[a == 1]
When you don't know the actual arrays. Nothing specific to a is going on here; a == 1 just sets up a boolean mask, which you then re-apply to a in a[mask_here]. It doesn't matter what generated the mask.
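To see the mask concretely, here is a minimal sketch. The sample values for a and b are made up for illustration; only their equal length matters.

```python
import numpy as np

# Made-up sample data (an assumption, not from the question).
a = np.array([10, 20, 30, 40])
b = np.array([1, 0, 1, 0])

mask = b == 1   # element-wise comparison: one boolean per element of b
c = a[mask]     # keeps the elements of a at positions where mask is True

print(mask)  # [ True False  True False]
print(c)     # [10 30]
```

So a[b == 1] never evaluates something like a[True]; the index is a whole boolean array, and NumPy selects the positions where it is True.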
You just need to put the conditions separately in brackets. Try using this
c = a[(b == 1) & (d == 1)]
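As a rough sketch of why the brackets matter and which operators are available, assuming small made-up arrays a, b and d:

```python
import numpy as np

# Made-up sample data (an assumption, not from the question).
a = np.array([10, 20, 30, 40])
b = np.array([1, 0, 1, 0])
d = np.array([1, 1, 0, 0])

# & binds more tightly than ==, so each comparison needs its own parentheses.
both = a[(b == 1) & (d == 1)]      # element-wise AND
either = a[(b == 1) | (d == 1)]    # element-wise OR
not_b = a[~(b == 1)]               # element-wise NOT

# np.logical_and is the function form of the same element-wise AND.
same_as_both = a[np.logical_and(b == 1, d == 1)]

print(both)    # [10]
print(either)  # [10 20 30]
print(not_b)   # [20 40]
```

Python's own and / or ask each operand for a single truth value, which is what triggers the ValueError for arrays with more than one element.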
Suppose a is an array_like and we want to check if it is empty. Two possible ways to accomplish this are:
if not a:
    pass
if numpy.array(a).size == 0:
    pass
The first solution would also evaluate to True if a=None. However I would like to only check for an empty array_like.
The second solution seems good enough for that. I was just wondering if there is a NumPy built-in function for this, or a better solution than checking the size?
If you want to check whether the size is zero, you can use the numpy.size function to get more concise code:
import numpy
a = []
b = [1,2]
c = [[1,2],[3,4]]
print(numpy.size(a) == 0) # True
print(numpy.size(b) == 0) # False
print(numpy.size(c) == 0) # False
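If the check is needed in several places, it could be wrapped in a small helper. This is only a sketch: is_empty is a hypothetical name, not a NumPy function. The explicit None test is there for readability, since numpy.size(None) is 1 and would not count as empty anyway.

```python
import numpy as np

def is_empty(x):
    """Hypothetical helper: True only for an array_like with zero elements."""
    # None is rejected explicitly; np.size(None) is 1, so it would fail the size test too.
    return x is not None and np.size(x) == 0

print(is_empty([]))                 # True
print(is_empty(np.zeros((0, 3))))   # True
print(is_empty([1, 2]))             # False
print(is_empty(None))               # False
```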
I have 2 tensors of unequal size
a = torch.tensor([[1,2], [2,3],[3,4]])
b = torch.tensor([[4,5],[2,3]])
I want a boolean array of whether each value exists in the other tensor, without iterating. Something like
a in b
and the result should be
[False, True, False]
as only the value of a[1] is in b
I think it's impossible without using at least some type of iteration. The most succinct way I can manage is using list comprehension:
[True if i in b else False for i in a]
This checks which elements of a are in b and gives [False, True, False]. It can also be reversed to check which elements of b are in a, giving [False, True].
this should work
result = []
for i in a:
    try:  # to avoid error for the case of empty tensors
        result.append(max(i.numpy()[1] == b.T.numpy()[1, i.numpy()[0] == b.T.numpy()[0, :]]))
    except:
        result.append(False)
result
Neither of the solutions that use tensor in tensor work in all cases for the OP. If the tensors contain elements/tuples that match in at least one dimension, the aforementioned operation will return True for those elements, potentially leading to hours of debugging. For example:
torch.tensor([2,5]) in torch.tensor([2,10]) # returns True
torch.tensor([5,2]) in torch.tensor([5,10]) # returns True
A solution for the above is to force the equality check in every dimension and then apply a tensor Boolean AND, so that all columns must match within the same row of b. Note that the following two methods may not be very efficient, because tensors are rather slow for iterating and equality checking, so converting to numpy may be needed for large data:
[(i == b).all(dim=1).any().item() for i in a] # OR
[any((i[0] == b[:, 0]) & (i[1] == b[:, 1])) for i in a]
That being said, @yuri's solution also seems to work for these edge cases, but it still seems to fail occasionally, and it is rather unreadable.
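For larger data, the per-row loop can be replaced by a single broadcast comparison. Below is a minimal sketch on NumPy arrays (for tensors, obtained for example via a.numpy()); it is one possible approach under the assumption that both inputs are 2-D with the same number of columns. The same expression should carry over to torch tensors by writing dim= in place of axis=.

```python
import numpy as np

# The data from the question, as NumPy arrays.
a = np.array([[1, 2], [2, 3], [3, 4]])
b = np.array([[4, 5], [2, 3]])

# Shapes (3, 1, 2) and (1, 2, 2) broadcast to (3, 2, 2):
# pairwise[i, j, k] is True when a[i, k] == b[j, k].
pairwise = a[:, None, :] == b[None, :, :]

# A row of a is "in" b only if every column matches within the same row of b.
a_in_b = pairwise.all(axis=2).any(axis=1)
print(a_in_b)  # [False  True False]

# The partial match [2, 5] vs [2, 10] from the example above is rejected.
x = np.array([[2, 5]])
y = np.array([[2, 10]])
partial = (x[:, None, :] == y[None, :, :]).all(axis=2).any(axis=1)
print(partial)  # [False]
```

The intermediate array holds len(a) * len(b) * columns booleans, so this trades memory for speed.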
If you need to compare all subtensors across the first dimension of a, use in:
>>> [i in b for i in a]
[False, True, False]
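I recently encountered this issue too, though my goal was to select the row sub-tensors that are not "in" the other tensor. My solution is to first convert the tensors to pandas DataFrames, use .drop_duplicates() on the second one, and then left-merge it onto the first. More specifically, for OP's problem, one can do:
import pandas as pd
import torch

tensor1 = torch.tensor([[1, 2], [2, 3], [3, 4]])  # a in the question
tensor2 = torch.tensor([[4, 5], [2, 3]])          # b in the question
tensor1_df = pd.DataFrame(tensor1.numpy())
# Drop repeated rows so the merge cannot multiply rows of tensor1, then mark every row of tensor2.
tensor2_df = pd.DataFrame(tensor2.numpy()).drop_duplicates().assign(val=True)
# Left merge on the shared data columns: rows of tensor1 missing from tensor2 get NaN in 'val'.
merged = tensor1_df.merge(tensor2_df, how='left')
tensor1_in_tensor2 = torch.from_numpy(merged['val'].notna().to_numpy())  # True where the row also occurs in tensor2
tensor1_notin_tensor2 = ~tensor1_in_tensor2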
I have seen code that puts a condition inside the index of a NumPy array.
For example, if the array is a = np.zeros((10,10))
and then doing something like
a[ a == 255 ] = 0
I have seen people build quite complex things on top of this simple idea.
What is this concept called?
This is usually called boolean indexing (or boolean mask indexing). It is still ordinary indexing syntax; each type can define what such indexing means, so the common example of a list being an integer-indexed container is not the only thing you can do.
A numpy.array first overloads its __eq__ operator so that an expression like a == 255 doesn't return a single Boolean value. Instead, it returns an entire array of Boolean values, where the ith element of the result is True if a[i] == 255. That is, result = a == 255 is similar to
result = [a[i] == 255 for i in range(len(a))]
Then, __getitem__ is overloaded to handle the case where you try to index an array with another array. In this case, the result "selects" each element of the array where the corresponding Boolean value is true. The assignment
a[a == 255] = 0
then is roughly equivalent to
for x in range(len(a)):
    if (a == 255)[x]:
        a[x] = 0
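A small sketch of the same idea on a 2-D array, with made-up values. It also shows numpy.where as a variant that builds a new array instead of modifying a in place.

```python
import numpy as np

# Made-up sample data (an assumption, not from the question).
a = np.array([[0, 255, 7],
              [255, 3, 255]])

mask = a == 255                      # boolean array with the same shape as a
replaced = np.where(mask, 0, a)      # new array; a itself is untouched here

a[mask] = 0                          # in-place assignment through the mask

print(mask.tolist())  # [[False, True, False], [True, False, True]]
print(a.tolist())     # [[0, 0, 7], [0, 3, 0]]
```

For arrays with more than one dimension the mask works element by element, which is where the loop over range(len(a)) is only a rough analogy.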
I'm new to Pandas, the answer may be obvious.
I have 3 series of the same length: a, b, c
a[b > c] = 0
works, but:
a[math.fabs(b) > c] = 0
doesn't work, and
a[(b > c or b < -c)] = 0
doesn't work either.
How can I implement that logic?
Your issue is that the first expression is vectorized while the other two are not.
In the first expression, the > operation between two series returns a series as well.
In the second expression, math.fabs expects a single number, not an array/series of elements; use the NumPy version instead (numpy.fabs or numpy.abs), which is applied element by element.
In the third expression, the or operation is not vectorized; you should use | instead.
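Both fixes can be sketched as follows. The three series hold made-up values; copies of a are used only so that the two variants can be compared.

```python
import numpy as np
import pandas as pd

# Made-up series of equal length (an assumption, not from the question).
a = pd.Series([1.0, 2.0, 3.0, 4.0])
b = pd.Series([-5.0, 0.5, 2.0, -0.1])
c = pd.Series([1.0, 1.0, 1.0, 1.0])

a1 = a.copy()
a1[np.abs(b) > c] = 0            # vectorized absolute value instead of math.fabs

a2 = a.copy()
a2[(b > c) | (b < -c)] = 0       # element-wise OR instead of Python's or

print(a1.tolist())  # [0.0, 2.0, 0.0, 4.0]
print(a2.tolist())  # [0.0, 2.0, 0.0, 4.0]
```

As with NumPy arrays, each comparison goes in its own parentheses because | binds more tightly than > and <.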
Consider the following list of two arrays:
from numpy import array
a = array([0, 1])
b = array([1, 0])
l = [a,b]
Then finding the index of a correctly gives
l.index(a)
>>> 0
while this does not work for b:
l.index(b)
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
It seems to me, that calling a list's .index function is not working for lists of numpy arrays.
Does anybody know an explanation?
Up to now, I always solved this problem kind of daggy by converting the arrays to strings. Does someone know a more elegant and fast solution?
The real question is in fact how l.index(a) can return a correct value, because numpy arrays treat equality in a special manner: l[0] == b compares individual values and returns an array, not a boolean. Here it gives array([False, False]), which cannot be directly converted to a boolean, hence the error.
In fact, Python uses rich comparison, specifically PyObject_RichCompareBool, to compare the searched value to every element of the list in sequence. That means it first tests identity (a is b) and only then equality (a == b). So for the first element, as a is l[0], identity is true and index 0 is returned.
But when searching for b, the identity test against the first element is false, and the equality test causes the error. (Thanks to Ashwini Chaudhary for the nice explanation in a comment.)
You can confirm it by testing a new copy of an array containing same elements as l[0]:
d = array([0,1])
l.index(d)
It gives the same error, because identity is false and the equality test raises the error.
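The identity-then-equality behaviour can be observed directly with a short sketch that reuses the arrays from the question:

```python
import numpy as np

a = np.array([0, 1])
b = np.array([1, 0])
l = [a, b]

# a is l[0], so the identity check succeeds before == is ever evaluated.
print(l.index(a))  # 0

# b is not l[0], so Python needs bool(l[0] == b), and that raises.
raised = False
try:
    l.index(b)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# The element-wise result that could not be reduced to one boolean:
print((l[0] == b).tolist())  # [False, False]
```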
It means that you cannot rely on any list method that uses comparison (index, in, remove) and must use custom functions such as the one proposed by @orestiss. Alternatively, since a list of numpy arrays is hard to use, you could consider wrapping the arrays:
>>> class NArray(object):
...     def __init__(self, arr):
...         self.arr = arr
...     def array(self):
...         return self.arr
...     def __eq__(self, other):
...         if (other.arr is self.arr):
...             return True
...         return (self.arr == other.arr).all()
...     def __ne__(self, other):
...         return not (self == other)
...
>>> a = array([0, 1])
>>> b = array([1, 0])
>>> l = [ NArray(a), NArray(b) ]
>>> l.index(NArray(a))
0
>>> l.index(NArray(b))
1
This error comes from the way numpy treats comparison between array elements (see: link).
So I am guessing that since the first element is the instance of the search you get the index for it, but trying to compare the first element with the second you get this error.
I think you could use something like:
[i for i, temp in enumerate(l) if (temp == b).all()]
to get a list with the indices of equal arrays but since I am no expert in python there could be a better solution (it seems to work...)
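If only the first match is needed, as with list.index, the comprehension could be turned into a small function. This is just a sketch: index_of_array is a hypothetical name, and it relies on numpy.array_equal, which returns a single boolean and is False for arrays of different shapes.

```python
import numpy as np

def index_of_array(arrays, target):
    """Hypothetical helper: index of the first array in `arrays` equal to `target`."""
    for i, candidate in enumerate(arrays):
        # np.array_equal reduces the comparison to one bool, so no ambiguity error.
        if np.array_equal(candidate, target):
            return i
    raise ValueError("target array is not in the list")

a = np.array([0, 1])
b = np.array([1, 0])
l = [a, b]

print(index_of_array(l, b))                 # 1
print(index_of_array(l, np.array([0, 1])))  # 0, even though it is a different object
```

Like list.index, it raises ValueError when no array matches.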