How to search a list of arrays - python

Consider the following list of two arrays:
from numpy import array
a = array([0, 1])
b = array([1, 0])
l = [a,b]
Then finding the index of a correctly gives
l.index(a)
>>> 0
while this does not work for b:
l.index(b)
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
It seems to me that calling a list's .index method does not work for lists of numpy arrays.
Does anybody know an explanation?
Up to now I have worked around this rather clumsily by converting the arrays to strings. Does someone know a more elegant and faster solution?

The real question is in fact how l.index(a) can return a correct value at all, because numpy arrays treat equality in a special manner: l[1] == b compares individual values and returns an array, not a boolean. Here it gives array([ True, True], dtype=bool), which cannot be directly converted to a boolean, hence the error.
In fact, Python uses rich comparison, specifically PyObject_RichCompareBool, to compare the searched value to every element of the list in sequence. That function first tests identity (a is b) and only then equality (a == b). So for the first element, as a is l[0], identity is true and index 0 is returned.
But when searching for b, the first element compared is l[0], which is not b, so the identity test is false and the equality test causes the error. (Thanks to Ashwini Chaudhary for the nice explanation in a comment.)
You can confirm it by testing a new copy of an array containing same elements as l[0]:
d = array([0,1])
l.index(d)
it gives the same error, because identity is false, and the equality test raises the error.
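The identity-before-equality order can also be observed without numpy. The following is only a minimal sketch using float("nan"), a value that is not equal to itself; the variable names are made up for illustration:

```python
# nan is never equal to itself, so only the identity check can succeed here
nan = float("nan")
items = [nan, 1.0]

equal_to_itself = (nan == nan)      # False: the equality test fails
found_at = items.index(nan)         # still found, because items[0] is nan

other_nan = float("nan")            # a distinct object with the same value
try:
    items.index(other_nan)
    other_found = True
except ValueError:
    # neither the identity test nor the equality test succeeds
    other_found = False

print(equal_to_itself, found_at, other_found)
```

The second lookup fails for the same reason as l.index(d) above: the object is a different one, so the result depends entirely on ==.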
It means that you cannot rely on any list method using comparison (index, in, remove) and must use custom functions such as the one proposed by @orestiss. Alternatively, as a list of numpy arrays seems hard to use, you should consider wrapping the arrays:
>>> class NArray(object):
...     def __init__(self, arr):
...         self.arr = arr
...     def array(self):
...         return self.arr
...     def __eq__(self, other):
...         if (other.arr is self.arr):
...             return True
...         return (self.arr == other.arr).all()
...     def __ne__(self, other):
...         return not (self == other)
...
>>> a = array([0, 1])
>>> b = array([1, 0])
>>> l = [ NArray(a), NArray(b) ]
>>> l.index(NArray(a))
0
>>> l.index(NArray(b))
1

This error comes from the way numpy treats comparison between array elements (see: link).
So I am guessing that since the first element is the very instance you are searching for, you get its index, but when you search for the second array it is first compared with the first element, and you get this error.
I think you could use something like:
[i for i, temp in enumerate(l) if (temp == b).all()]
to get a list with the indices of equal arrays but since I am no expert in python there could be a better solution (it seems to work...)
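If only the first match is needed, as with list.index, the comprehension above can be wrapped in a small helper. This is just a sketch: index_of_array is a made-up name, and it uses np.array_equal instead of (temp == b).all() on the assumption that arrays of different shapes should simply count as unequal.

```python
import numpy as np

def index_of_array(arrays, target):
    """Return the index of the first array equal to target (hypothetical helper)."""
    for i, candidate in enumerate(arrays):
        # np.array_equal returns a plain bool and is False when shapes differ
        if np.array_equal(candidate, target):
            return i
    # mimic list.index when nothing matches
    raise ValueError("array is not in list")

a = np.array([0, 1])
b = np.array([1, 0])
l = [a, b]
print(index_of_array(l, b))
print(index_of_array(l, np.array([0, 1])))
```

Unlike l.index, the helper also finds a fresh copy of an array, because it never relies on identity.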

Related

Comparing nested structures of arrays in Python

Good evening everyone,
I am comparing nested structures of two arrays of the same length (i.e.: [ 1, [ 1, 1 ] ], [ [ 2, 2 ], 2 ] ) by checking for the type and length equality as follows:
def array_structure(array1, array2):
    for i in range(len(array1)):
        if type(array1[i]) == type(array2[i]):
            if isinstance(array1[i], int) == False:
                if len(array1[i]) == len(array2[i]):
                    return True
                else:
                    return False
            elif len(str(array1[i])) == len(str(array2[i])):
                return True
            else:
                return False
        else:
            return False
The function should return True or False depending on whether the nested structures of the arrays are equal or not.
My attempt works for some arrays, but I feel like I'm missing the point, so I would appreciate any help to understand and find a more pythonic way to code this and improve the logical approach.
You can use itertools.zip_longest and all:
from itertools import zip_longest
def compare(a, b):
    if not isinstance(a, list) or not isinstance(b, list):
        return a == b
    return all(compare(j, k) for j, k in zip_longest(a, b))
print(compare([[1, [2, [3, 4]]]], [[1, [2, [3, 4]]]]))
print(compare([3, [4], 5], [3, [6], 5]))
Output:
True
False
Note that you are manipulating lists, not arrays.
The exact meaning of "structure" is a bit vague, I am guessing that you mean that there are either integers or nested lists of the same length, at the same positions.
If that is so, here are some problems:
As soon as you find two identical things, you return and exit. This means that if the first elements are identical, you never compare the second element !
>>> array_structure([1,2],[4,[5]])
True
Finding the "same structure" at one position isn't enough to deduce that it is true for all indices. You need to keep checking. However, you are right that as soon as you find "different structure", you can safely return, because it is enough to make the structures different overall.
When you have the same type for both elements, if you have an integer, then you convert them to strings, then compare lengths, otherwise, you compare the lengths. This looks strange to me: Why would you say that [1] and [11] have different structures ?
>>> array_structure([1],[11])
False
What about nested arrays ? maybe array_structure([1,[2]],[1,[2,[3,4,5,6]]]) should be false, and array_structure([4,[5,[8,3,0,1]]],[1,[2,[3,4,5,6]]]) should be true ? If you have seen the notion of recursion, this might be a good place to use it.
A good strategy for your problem would be to go over each element, make some tests to check if you have "different structures", and return False if that is the case. Otherwise, don't return, so you can go over all elements. This means that if anything is different, you will have exited during iteration, so you will only reach the end of the loop if everything was identical. At that point, you can return True. The bulk of the function should look like:
for i in range(<length>):
    if <different structure>:
        return False
return True
More tips for later:
Testing for types in general is not very pythonic. Unless the exercise tells you that you will only ever get integers or lists, you could have other things in these arrays that are not integers and don't have a length. Rather than testing whether the things are ints to deduce if they have a length, you could test for hasattr(array1[i], '__len__'). This will be True if the element is of a type that has a length, and False otherwise.
Each time you have something that looks like if <condition>: return True else return False you can be sure it is identical to return <condition>. This helps in simplifying code, but in your example, it's not a great idea to change it right away, because you should not be returning when you find True to your conditions.
Each time you have something that looks like <condition> == False, it is better and more readable to write not <condition>. It is also possible to switch the logic over so that you have a positive condition first, which is often more readable.
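Putting the strategy and the tips together, one possible recursive version is sketched below. It rests on assumptions the question does not spell out: only list objects count as containers, anything else is a leaf, and leaf values are ignored. The name same_structure is made up for this sketch.

```python
def same_structure(first, second):
    """Return True if both arguments nest lists the same way (values are ignored)."""
    first_is_list = isinstance(first, list)
    second_is_list = isinstance(second, list)
    if not first_is_list and not second_is_list:
        return True       # two leaves always match, whatever their values
    if first_is_list != second_is_list:
        return False      # a list on one side, a leaf on the other
    if len(first) != len(second):
        return False
    # Check every position; all() stops at the first mismatch
    return all(same_structure(x, y) for x, y in zip(first, second))

print(same_structure([1, [1, 1]], [2, [2, 2]]))
print(same_structure([1, [1, 1]], [[2, 2], 2]))
```

Note that the function only returns True after every position has been checked, which is exactly the "return False early, return True at the end" pattern described above.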

How to check whether tensor values in a different tensor pytorch?

I have 2 tensors of unequal size
a = torch.tensor([[1,2], [2,3],[3,4]])
b = torch.tensor([[4,5],[2,3]])
I want a boolean array of whether each value exists in the other tensor without iterating. something like
a in b
and the result should be
[False, True, False]
as only the value of a[1] is in b
I think it's impossible without using at least some type of iteration. The most succinct way I can manage is using list comprehension:
[True if i in b else False for i in a]
Checks for elements in b that are in a and gives [False, True, False]. Can also be reversed to get elements a in b [False, True].
this should work
result = []
for i in a:
    try: # to avoid error for the case of empty tensors
        result.append(max(i.numpy()[1] == b.T.numpy()[1,i.numpy()[0] == b.T.numpy()[0,:]]))
    except:
        result.append(False)
result
Neither of the solutions that use tensor in tensor work in all cases for the OP. If the tensors contain elements/tuples that match in at least one dimension, the aforementioned operation will return True for those elements, potentially leading to hours of debugging. For example:
torch.tensor([2,5]) in torch.tensor([2,10]) # returns True
torch.tensor([5,2]) in torch.tensor([5,10]) # returns True
A solution for the above could be forcing the check for equality in each dimension, and then combining the results with a Boolean AND, so that a row only counts as found when all of its entries match within the same row of b. Note, the following 2 methods may not be very efficient because Tensors are rather slow for iterating and equality checking, so converting to numpy may be needed for large data:
[bool(torch.any(torch.all(i == b, dim=1))) for i in a] # OR
[any((i[0] == b[:, 0]) & (i[1] == b[:, 1])) for i in a]
That being said, @yuri's solution also seems to work for these edge cases, but it still seems to fail occasionally, and it is rather unreadable.
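Since converting to numpy was mentioned for large data, here is a rough sketch of the same row-wise check done with numpy broadcasting instead of a Python loop. rows_in is a made-up helper name; the arrays are the ones from the question converted by hand. The same indexing pattern should carry over to torch tensors, but that is not shown or tested here.

```python
import numpy as np

def rows_in(rows, table):
    """For each row of `rows`, report whether an identical row exists in `table`."""
    # rows[:, None, :] has shape (n, 1, k); comparing with table (m, k)
    # broadcasts to (n, m, k): every row of `rows` against every row of `table`
    pairwise_equal = rows[:, None, :] == table
    # two rows match only if all k entries agree
    row_matches = pairwise_equal.all(axis=-1)   # shape (n, m)
    # a row is "in" table if it matches at least one of its rows
    return row_matches.any(axis=1)

a = np.array([[1, 2], [2, 3], [3, 4]])
b = np.array([[4, 5], [2, 3]])
print(rows_in(a, b).tolist())
```

The intermediate array holds n * m * k booleans, so this trades memory for the absence of an explicit loop.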
If you need to compare all subtensors across the first dimension of a, use in:
>>> [i in b for i in a]
[False, True, False]
I recently also encountered this issue though my goal is to select those row sub-tensors not "in" the other tensor. My solution is to first convert the tensors to pandas dataframe, then use .drop_duplicates(). More specifically, for OP's problem, one can do:
import numpy as np
import pandas as pd
import torch
tensor1_df = pd.DataFrame(tensor1)
tensor1_df['val'] = False
tensor2_df = pd.DataFrame(tensor2)
tensor2_df['val'] = True
tensor1_notin_tensor2 = torch.from_numpy(pd.concat([tensor1_df, tensor2_df]).reset_index().drop(columns=['index']).drop_duplicates(keep='last').reset_index().loc[np.arange(tensor1_df.shape[0])].val.values)

Python: Using Equality Operator Inside of Numpy Array Assignment

I saw this code in some examples online and am trying to understand and modify it:
c = a[b == 1]
Why does this work? It appears b == 1 returns true for each element of b that satisfies the equality. I don't understand how something like a[True] ends up evaluating to something like "For all values in a for which the same indexed value in b is equal to 1, copy them to c"
a,b, and c are all NumPy arrays of the same length containing some data.
I've searched around quite a bit but don't even know what to call this sort of thing.
If I want to add a second condition, for example:
c = a[b == 1 and d == 1]
I get
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I know this happens because that combination of equality operations is ambiguous for reasons explained here, but I am unsure of how to add a.any() or a.all() into that expression in just one line.
EDIT:
For question 2, c = a[(b == 1) & (d == 1)] works. Any input on my first question about how/why this works?
Why wouldn't your example in point (1) work? This is Boolean indexing. If the arrays were different shapes then it may be a different matter, but:
c = a[b == 1]
Is indistinguishable from:
c = a[a == 1]
When you don't know the actual arrays. Nothing specific to a is going on here; a == 1 is just setting up a boolean mask, that you then re-apply to a in a[mask_here]. Doesn't matter what generated the mask.
You just need to put the conditions separately in brackets. Try using this
c = a[(b == 1) & (d == 1)]
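To make the mask idea concrete, here is a small sketch with made-up arrays (the values of a, b and d are assumptions chosen only for illustration):

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.array([1, 0, 1, 1])
d = np.array([1, 1, 0, 1])

# Each comparison yields a boolean array; & combines them element by element
mask = (b == 1) & (d == 1)
# Indexing with a boolean array keeps the entries where the mask is True
c = a[mask]

print(mask.tolist())
print(c.tolist())

# The parentheses matter: & binds more tightly than ==, so
# b == 1 & d == 1 would be read as b == (1 & d) == 1, not the intended test.
```

The keyword and cannot be used here because it asks each whole array for a single truth value, which is what triggers the ValueError; & works element by element.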

What does x[x < 2] = 0 mean in Python?

I came across some code with a line similar to
x[x<2]=0
Playing around with variations, I am still stuck on what this syntax does.
Examples:
>>> x = [1,2,3,4,5]
>>> x[x<2]
1
>>> x[x<3]
1
>>> x[x>2]
2
>>> x[x<2]=0
>>> x
[0, 2, 3, 4, 5]
This only makes sense with NumPy arrays. The behavior with lists is useless, and specific to Python 2 (not Python 3). You may want to double-check if the original object was indeed a NumPy array (see further below) and not a list.
But in your code here, x is a simple list.
Since
x < 2
is False
i.e 0, therefore
x[x<2] is x[0]
x[0] gets changed.
Conversely, x[x>2] is x[True] or x[1]
So, x[1] gets changed.
Why does this happen?
The rules for comparison are:
When you order two strings or two numeric types the ordering is done in the expected way (lexicographic ordering for string, numeric ordering for integers).
When you order a numeric and a non-numeric type, the numeric type comes first.
When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their typenames:
So, we have the following order
numeric < list < string < tuple
See the accepted answer for How does Python compare string and int?.
If x is a NumPy array, then the syntax makes more sense because of boolean array indexing. In that case, x < 2 isn't a boolean at all; it's an array of booleans representing whether each element of x was less than 2. x[x < 2] = 0 then selects the elements of x that were less than 2 and sets those cells to 0. See Indexing.
>>> x = np.array([1., -1., -2., 3])
>>> x < 0
array([False, True, True, False], dtype=bool)
>>> x[x < 0] += 20 # All elements < 0 get increased by 20
>>> x
array([ 1., 19., 18., 3.]) # Only elements < 0 are affected
>>> x = [1,2,3,4,5]
>>> x<2
False
>>> x[False]
1
>>> x[True]
2
The bool is simply converted to an integer. The index is either 0 or 1.
The original code in your question works only in Python 2. If x is a list in Python 2, the comparison x < y is False if y is an integer. This is because it does not make sense to compare a list with an integer. However in Python 2, if the operands are not comparable, the comparison is based in CPython on the alphabetical ordering of the names of the types; additionally all numbers come first in mixed-type comparisons. This is not even spelled out in the documentation of CPython 2, and different Python 2 implementations could give different results. That is [1, 2, 3, 4, 5] < 2 evaluates to False because 2 is a number and thus "smaller" than a list in CPython. This mixed comparison was eventually deemed to be too obscure a feature, and was removed in Python 3.0.
Now, the result of < is a bool; and bool is a subclass of int:
>>> isinstance(False, int)
True
>>> isinstance(True, int)
True
>>> False == 0
True
>>> True == 1
True
>>> False + 5
5
>>> True + 5
6
So basically you're taking the element 0 or 1 depending on whether the comparison is true or false.
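If you try the code above in Python 3, you will get a TypeError (worded as unorderable types: list() < int() in early 3.x releases, and as '<' not supported between instances of 'list' and 'int' in later ones) due to a change in Python 3.0: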
Ordering Comparisons
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other. Note that this does not apply to the == and != operators: objects of different incomparable types always compare unequal to each other.
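A quick way to see the rule quoted above in action under Python 3 is sketched below; the exact wording of the error message depends on the Python version, so the sketch only prints it:

```python
x = [1, 2, 3, 4, 5]

message = None
try:
    x < 2                      # ordering a list against an int
except TypeError as exc:
    message = str(exc)         # wording differs between Python 3 releases

print("TypeError:", message)

# == and != remain valid between unrelated types and just report inequality
print(x == 2)
print(x != 2)
```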
There are many datatypes that overload the comparison operators to do something different (dataframes from pandas, numpy's arrays). If the code that you were using did something else, it was because x was not a list, but an instance of some other class with operator < overridden to return a value that is not a bool; and this value was then handled specially by x[] (aka __getitem__/__setitem__)
This has one more use: code golf. Code golf is the art of writing programs that solve some problem in as few source code bytes as possible.
return(a,b)[c<d]
is roughly equivalent to
if c < d:
    return b
else:
    return a
except that both a and b are evaluated in the first version, but not in the second version.
c<d evaluates to True or False.
(a, b) is a tuple.
Indexing on a tuple works like indexing on a list: (3,5)[1] == 5.
True is equal to 1 and False is equal to 0.
(a,b)[c<d]
(a,b)[True]
(a,b)[1]
b
or for False:
(a,b)[c<d]
(a,b)[False]
(a,b)[0]
a
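The difference in evaluation mentioned above can be made visible with two functions that record when they are called. This is only an illustrative sketch; left, right and calls are made-up names:

```python
calls = []

def left():
    calls.append("left")
    return "a"

def right():
    calls.append("right")
    return "b"

c, d = 1, 2

# Tuple indexing: both functions run before the index picks one result
picked = (left(), right())[c < d]
calls_eager = list(calls)

calls.clear()
# Conditional expression: only the selected branch runs
picked_lazy = right() if c < d else left()
calls_lazy = list(calls)

print(picked, calls_eager)
print(picked_lazy, calls_lazy)
```

Outside of code golf, the conditional expression is usually the clearer way to write the same choice.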
There's a good list on the stack exchange network of many nasty things you can do to python in order to save a few bytes. https://codegolf.stackexchange.com/questions/54/tips-for-golfing-in-python
In normal code this should never be used, though, and in your case it would mean that x acts both as something that can be compared to an integer and as a container that supports indexing, which is a very unusual combination. It's probably NumPy code, as others have pointed out.
In general it could mean anything. It was already explained what it means if x is a list or numpy.ndarray but in general it only depends on how the comparison operators (<, >, ...) and also how the get/set-item ([...]-syntax) are implemented.
x.__getitem__(x.__lt__(2)) # this is what x[x < 2] means!
x.__setitem__(x.__lt__(2), 0) # this is what x[x < 2] = 0 means!
Because:
x < value is equivalent to x.__lt__(value)
x[value] is (roughly) equivalent to x.__getitem__(value)
x[value] = othervalue is (also roughly) equivalent to x.__setitem__(value, othervalue).
This can be customized to do anything you want. Just as an example (it loosely mimics NumPy's boolean indexing):
class Test:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        # You could do anything in here. For example create a new list indicating if that
        # element is less than the other value
        res = [item < other for item in self.value]
        return self.__class__(res)
    def __repr__(self):
        return '{0} ({1})'.format(self.__class__.__name__, self.value)
    def __getitem__(self, item):
        # If you index with an instance of this class use "boolean-indexing"
        if isinstance(item, Test):
            res = self.__class__([i for i, index in zip(self.value, item) if index])
            return res
        # Something else was given just try to use it on the value
        return self.value[item]
    def __setitem__(self, item, value):
        if isinstance(item, Test):
            self.value = [i if not index else value for i, index in zip(self.value, item)]
        else:
            self.value[item] = value
So now let's see what happens if you use it:
>>> a = Test([1,2,3])
>>> a
Test ([1, 2, 3])
>>> a < 2 # calls __lt__
Test ([True, False, False])
>>> a[Test([True, False, False])] # calls __getitem__
Test ([1])
>>> a[a < 2] # or short form
Test ([1])
>>> a[a < 2] = 0 # calls __setitem__
>>> a
Test ([0, 2, 3])
Notice this is just one possibility. You are free to implement almost everything you want.

How to get the index of an integer from a list if the list contains a boolean?

I am just starting with Python.
How to get index of integer 1 from a list if the list contains a boolean True object before the 1?
>>> lst = [True, False, 1, 3]
>>> lst.index(1)
0
>>> lst.index(True)
0
>>> lst.index(0)
1
I think Python considers 0 as False and 1 as True in the argument of the index method. How can I get the index of integer 1 (i.e. 2)?
Also what is the reasoning or logic behind treating boolean object this way in list?
As from the solutions, I can see it is not so straightforward.
The documentation says that
Lists are mutable sequences, typically used to store collections of
homogeneous items (where the precise degree of similarity will vary by
application).
You shouldn't store heterogeneous data in lists.
The implementation of list.index only performs the comparison using Py_EQ (== operator). In your case that comparison returns truthy value because True and False have values of the integers 1 and 0, respectively (the bool class is a subclass of int after all).
However, you could use generator expression and the built-in next function (to get the first value from the generator) like this:
In [4]: next(i for i, x in enumerate(lst) if not isinstance(x, bool) and x == 1)
Out[4]: 2
Here we check if x is an instance of bool before comparing x to 1.
Keep in mind that next can raise StopIteration, in that case it may be desired to (re-)raise ValueError (to mimic the behavior of list.index).
Wrapping this all in a function:
def index_same_type(it, val):
    gen = (i for i, x in enumerate(it) if type(x) is type(val) and x == val)
    try:
        return next(gen)
    except StopIteration:
        raise ValueError('{!r} is not in iterable'.format(val)) from None
Some examples:
In [34]: index_same_type(lst, 1)
Out[34]: 2
In [35]: index_same_type(lst, True)
Out[35]: 0
In [37]: index_same_type(lst, 42)
ValueError: 42 is not in iterable
Booleans are integers in Python, and this is why you can use them just like any integer:
>>> 1 + True
2
>>> [1][False]
1
[this doesn't mean you should :)]
This is due to the fact that bool is a subclass of int, and almost always a boolean will behave just like 0 or 1 (except when it is cast to string - you will get "False" and "True" instead).
Here is one more idea for how you can achieve what you want (however, try to rethink your logic, taking into account the information above):
>>> class force_int(int):
...     def __eq__(self, other):
...         return int(self) == other and not isinstance(other, bool)
...
>>> force_int(1) == True
False
>>> lst.index(force_int(1))
2
This code redefines int's method, which is used to compare elements in the index method, to ignore booleans.
Here is a very simple naive one-liner solution using map and zip (this relies on Python 2, where zip returns a list):
>>> zip(map(type, lst), lst).index((int, 1))
2
Here we map the type of each element and create a new list by zipping the types with the elements and ask for the index of (type, value).
And here is a generic iterative solution using the same technique:
>>> from itertools import imap, izip
>>> def index(xs, x):
...     it = (i for i, (t, e) in enumerate(izip(imap(type, xs), xs)) if (t, e) == x)
...     try:
...         return next(it)
...     except StopIteration:
...         raise ValueError(x)
...
>>> index(lst, (int, 1))
2
Here we basically do the same thing but iteratively, so as to not cost us much in terms of memory/space efficiency. We build an iterator from the same expression as above, but using imap and izip instead, and write a custom index function that returns the next value from the iterator or raises a ValueError if there is no match.
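itertools.imap and itertools.izip only exist in Python 2. In Python 3 the built-in map and zip are already lazy, so the same (type, value) technique might be sketched as follows; index_typed is a made-up name to avoid confusion with list.index:

```python
def index_typed(xs, x):
    """Index of the first element whose (type, value) pair equals x."""
    # zip and map are lazy in Python 3, so no itertools helpers are needed
    it = (i for i, pair in enumerate(zip(map(type, xs), xs)) if pair == x)
    try:
        return next(it)
    except StopIteration:
        # mimic list.index when there is no match
        raise ValueError(x) from None

lst = [True, False, 1, 3]
print(index_typed(lst, (int, 1)))
# The one-liner needs an explicit list() in Python 3
print(list(zip(map(type, lst), lst)).index((int, 1)))
```

The pair (bool, True) does not equal (int, 1) because the two type objects differ, even though True == 1.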
Try this.
for i, j in enumerate([True, False, 1, 3]):
    if not isinstance(j, bool) and j == 1:
        print(i)
Output:
2
