What does x[x < 2] = 0 mean in Python?

I came across some code with a line similar to
x[x<2]=0
Playing around with variations, I am still stuck on what this syntax does.
Examples:
>>> x = [1,2,3,4,5]
>>> x[x<2]
1
>>> x[x<3]
1
>>> x[x>2]
2
>>> x[x<2]=0
>>> x
[0, 2, 3, 4, 5]

This only makes sense with NumPy arrays. The behavior with lists is useless, and specific to Python 2 (not Python 3). You may want to double-check if the original object was indeed a NumPy array (see further below) and not a list.
But in your code here, x is a simple list. Since x < 2 is False, i.e. 0, x[x<2] is x[0], so x[0] gets changed.
Conversely, x[x>2] is x[True], i.e. x[1], so x[1] would get changed.
Why does this happen?
In CPython 2, the ordering rules are:
When you order two strings or two numeric types the ordering is done in the expected way (lexicographic ordering for string, numeric ordering for integers).
When you order a numeric and a non-numeric type, the numeric type comes first.
When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their typenames:
So, we have the following order
numeric < list < string < tuple
See the accepted answer for How does Python compare string and int?.
If x is a NumPy array, then the syntax makes more sense because of boolean array indexing. In that case, x < 2 isn't a boolean at all; it's an array of booleans representing whether each element of x was less than 2. x[x < 2] = 0 then selects the elements of x that were less than 2 and sets those cells to 0. See Indexing.
>>> import numpy as np
>>> x = np.array([1., -1., -2., 3])
>>> x < 0
array([False, True, True, False], dtype=bool)
>>> x[x < 0] += 20 # All elements < 0 get increased by 20
>>> x
array([ 1., 19., 18., 3.]) # Only elements < 0 are affected
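A minimal, self-contained sketch of the line from the question applied to a NumPy array (it assumes NumPy is installed; the name `mask` is only illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# x < 2 is an element-wise comparison: a boolean array with the same shape as x
mask = x < 2
print(mask)  # [ True False False False False]

# Assigning through the boolean mask writes 0 into every position where it is True
x[x < 2] = 0
print(x)  # [0 2 3 4 5]

# Reading through a mask returns a new 1-D array holding only the selected elements
print(x[x > 2])  # [3 4 5]
```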

>>> x = [1,2,3,4,5]
>>> x<2
False
>>> x[False]
1
>>> x[True]
2
The bool is simply converted to an integer. The index is either 0 or 1.

The original code in your question works only in Python 2. If x is a list in Python 2, the comparison x < y is False if y is an integer. This is because it does not make sense to compare a list with an integer. However, in Python 2, if the operands are not comparable, CPython bases the comparison on the alphabetical ordering of the names of the types; additionally, all numbers come first in mixed-type comparisons. This is not even spelled out in the documentation of CPython 2, and different Python 2 implementations could give different results. That is, [1, 2, 3, 4, 5] < 2 evaluates to False because 2 is a number and thus "smaller" than a list in CPython. This mixed comparison was eventually deemed too obscure a feature and was removed in Python 3.0.
Now, the result of < is a bool; and bool is a subclass of int:
>>> isinstance(False, int)
True
>>> isinstance(True, int)
True
>>> False == 0
True
>>> True == 1
True
>>> False + 5
5
>>> True + 5
6
So basically you're taking element 1 if the comparison is true, or element 0 if it is false.
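The same mechanism still works in Python 3, where only the list-to-int comparison was removed; indexing with a bool is unaffected. A small sketch (the list `pair` is illustrative):

```python
# bool is a subclass of int, so True and False act as 1 and 0
assert issubclass(bool, int)
assert True + True == 2

pair = ['at index 0', 'at index 1']
print(pair[False])  # at index 0
print(pair[True])   # at index 1

# A comparison result can therefore be used directly as an index
print(pair[3 > 2])  # at index 1
```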
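If you try the code above in Python 3, you will get TypeError: unorderable types: list() < int() (Python 3.6 and later word it as TypeError: '<' not supported between instances of 'list' and 'int') due to a change in Python 3.0: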
Ordering Comparisons
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other. Note that this does not apply to the == and != operators: objects of different incomparable types always compare unequal to each other.
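A quick way to observe the Python 3 rule quoted above is a tiny helper that reports whether an ordering comparison raises. `ordering_raises` is a hypothetical name used only for this sketch:

```python
def ordering_raises(a, b):
    """Return True if a < b raises TypeError, as Python 3 does for unorderable operands."""
    try:
        a < b
    except TypeError:
        return True
    return False


x = [1, 2, 3, 4, 5]
print(ordering_raises(x, 2))        # True: list < int is rejected
print(ordering_raises(None, None))  # True: None < None is rejected too
print(ordering_raises(1, 2.5))      # False: int and float have a natural ordering
print(x == 2)                       # False: == and != still work across types
```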
There are many data types that overload the comparison operators to do something different (pandas DataFrames, NumPy arrays). If the code you were using did something else, it was because x was not a list but an instance of some other class whose < operator is overridden to return a value that is not a bool, and this value was then handled specially by x[] (i.e. __getitem__/__setitem__).

This has one more use: code golf. Code golf is the art of writing programs that solve some problem in as few source code bytes as possible.
return(a,b)[c<d]
is roughly equivalent to
if c < d:
    return b
else:
    return a
except that both a and b are evaluated in the first version, but not in the second version.
c<d evaluates to True or False.
(a, b) is a tuple.
Indexing on a tuple works like indexing on a list: (3,5)[1] == 5.
True is equal to 1 and False is equal to 0.
(a,b)[c<d]
(a,b)[True]
(a,b)[1]
b
or for False:
(a,b)[c<d]
(a,b)[False]
(a,b)[0]
a
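The difference in evaluation can be observed with a small tracing sketch. `value`, `golfed` and `conditional` are hypothetical helpers, and `calls` simply records which operands were evaluated:

```python
calls = []


def value(name):
    # Record that this operand was evaluated, then return it unchanged
    calls.append(name)
    return name


def golfed(c, d):
    # Golfed form: the tuple (and therefore both operands) is built before indexing
    return (value('a'), value('b'))[c < d]


def conditional(c, d):
    # Conditional expression: only the selected operand is evaluated
    return value('b') if c < d else value('a')


print(golfed(1, 2), calls)       # b ['a', 'b']
calls.clear()
print(conditional(1, 2), calls)  # b ['b']
```

In ordinary code the conditional expression is the idiomatic choice, since it skips the unused branch.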
There's a good list on the Stack Exchange network of many nasty things you can do to Python in order to save a few bytes: https://codegolf.stackexchange.com/questions/54/tips-for-golfing-in-python
In normal code this should never be used, though. In your case it would mean that x acts both as something that can be compared to an integer and as a container that supports indexing, which is a very unusual combination. It's probably NumPy code, as others have pointed out.

In general it could mean anything. It was already explained what it means if x is a list or numpy.ndarray but in general it only depends on how the comparison operators (<, >, ...) and also how the get/set-item ([...]-syntax) are implemented.
x.__getitem__(x.__lt__(2)) # this is what x[x < 2] means!
x.__setitem__(x.__lt__(2), 0) # this is what x[x < 2] = 0 means!
Because:
x < value is equivalent to x.__lt__(value)
x[value] is (roughly) equivalent to x.__getitem__(value)
x[value] = othervalue is (also roughly) equivalent to x.__setitem__(value, othervalue).
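To see this desugaring in action, here is a small sketch with a hypothetical `Tracer` class that only records which special methods Python calls; it implements no real comparison or indexing logic:

```python
class Tracer:
    """Hypothetical class that logs the special-method calls Python makes on it."""

    def __init__(self):
        self.log = []

    def __lt__(self, other):
        self.log.append(('__lt__', other))
        return 'mask'  # any object may be returned; a placeholder string here

    def __getitem__(self, key):
        self.log.append(('__getitem__', key))
        return None

    def __setitem__(self, key, value):
        self.log.append(('__setitem__', key, value))


x = Tracer()
x[x < 2] = 0
print(x.log)  # [('__lt__', 2), ('__setitem__', 'mask', 0)]
```

Whatever `__lt__` returns is passed unchanged as the key to `__setitem__`, which is exactly what NumPy relies on for boolean indexing.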
This can be customized to do anything you want. Just as an example (this mimics NumPy's boolean indexing a little):
class Test:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        # You could do anything in here. For example, create a new list indicating
        # whether each element is less than the other value
        res = [item < other for item in self.value]
        return self.__class__(res)
    def __repr__(self):
        return '{0} ({1})'.format(self.__class__.__name__, self.value)
    def __getitem__(self, item):
        # If you index with an instance of this class, use "boolean indexing":
        # keep only the elements whose mask entry is True
        if isinstance(item, Test):
            res = self.__class__([i for i, index in zip(self.value, item.value) if index])
            return res
        # Something else was given; just try to use it on the value
        return self.value[item]
    def __setitem__(self, item, value):
        # Boolean indexing: replace every element whose mask entry is True
        if isinstance(item, Test):
            self.value = [i if not index else value for i, index in zip(self.value, item.value)]
        else:
            self.value[item] = value
So now let's see what happens if you use it:
>>> a = Test([1,2,3])
>>> a
Test ([1, 2, 3])
>>> a < 2 # calls __lt__
Test ([True, False, False])
>>> a[Test([True, False, False])] # calls __getitem__
Test ([1])
>>> a[a < 2] # or short form
Test ([1])
>>> a[a < 2] = 0 # calls __setitem__
>>> a
Test ([0, 2, 3])
Notice this is just one possibility. You are free to implement almost anything you want.

Related

How to search a list of arrays

Consider the following list of two arrays:
from numpy import array
a = array([0, 1])
b = array([1, 0])
l = [a,b]
Then finding the index of a correctly gives
>>> l.index(a)
0
while this does not work for b:
>>> l.index(b)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It seems to me that calling a list's .index method does not work for lists of NumPy arrays.
Does anybody know an explanation?
Up to now, I have always worked around this problem somewhat clumsily by converting the arrays to strings. Does someone know a more elegant and fast solution?
The real question is in fact how l.index(a) can return a correct value at all. NumPy arrays treat equality in a special manner: l[1] == b returns an array, not a boolean, by comparing individual values. Here it gives array([ True, True], dtype=bool), which cannot be directly converted to a boolean, hence the error.
In fact, Python uses rich comparison, specifically PyObject_RichCompareBool, to compare the searched value to every element of the list in sequence. That function first tests identity (a is b) and only then equality (a == b). So for the first element, as a is l[0], identity is true and index 0 is returned.
But for any other element, identity with the first element is false, and the equality test causes the error. (Thanks to Ashwini Chaudhary for the nice explanation in a comment.)
You can confirm it by testing a new copy of an array containing same elements as l[0]:
d = array([0,1])
l.index(d)
it gives the same error, because identity is false, and the equality test raises the error.
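The identity-then-equality behaviour can be reproduced without NumPy. In this sketch the hypothetical class `Ambiguous` raises from `__eq__`, imitating a multi-element NumPy array whose comparison result cannot be turned into a single bool:

```python
class Ambiguous:
    """Hypothetical stand-in for a NumPy array whose == result has no single truth value."""

    def __eq__(self, other):
        raise ValueError('truth value is ambiguous')


a = Ambiguous()
b = Ambiguous()
l = [a, b]

# l[0] is a, so the identity check succeeds and __eq__ is never called
print(l.index(a))  # 0

# For b, l[0] is not b, so Python falls back to ==, which raises
try:
    l.index(b)
except ValueError as exc:
    print('ValueError:', exc)  # ValueError: truth value is ambiguous
```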
It means that you cannot rely on any list method that uses comparison (index, in, remove) and must use custom functions such as the one proposed by @orestiss. Alternatively, as a list of NumPy arrays seems hard to use, you should consider wrapping the arrays:
>>> class NArray(object):
...     def __init__(self, arr):
...         self.arr = arr
...     def array(self):
...         return self.arr
...     def __eq__(self, other):
...         if (other.arr is self.arr):
...             return True
...         return (self.arr == other.arr).all()
...     def __ne__(self, other):
...         return not (self == other)
...
>>> a = array([0, 1])
>>> b = array([1, 0])
>>> l = [ NArray(a), NArray(b) ]
>>> l.index(NArray(a))
0
>>> l.index(NArray(b))
1
This error comes from the way NumPy treats comparison between arrays.
So I am guessing that, since the first element is the very object you search for, you get its index, but when the search has to compare the first element with the second array, you get this error.
I think you could use something like:
[i for i, temp in enumerate(l) if (temp == b).all()]
to get a list with the indices of equal arrays but since I am no expert in python there could be a better solution (it seems to work...)
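A slightly more robust variant of the list comprehension above, assuming NumPy is available: `np.array_equal` returns a single bool and reports False for arrays of different shapes instead of broadcasting them, and the loop stops at the first match. The helper name `index_of_array` is hypothetical:

```python
import numpy as np


def index_of_array(arrays, target):
    """Return the position of the first array equal to target, mimicking list.index."""
    for i, arr in enumerate(arrays):
        # array_equal checks shape and all elements and yields one truth value
        if np.array_equal(arr, target):
            return i
    raise ValueError('array not found in list')


a = np.array([0, 1])
b = np.array([1, 0])
l = [a, b]
print(index_of_array(l, b))                 # 1
print(index_of_array(l, np.array([0, 1])))  # 0
```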

Precise Membership Test in Python

The in operator tests for equivalence using comparison, but Python's comparison isn't precise in the sense that True == 1 and 0 == False, yielding:
>>> True in [ 1 ]
True
>>> False in [ 0 ]
True
>>> 1 in [ True ]
True
>>> 0 in [ False ]
True
whereas I need a precise comparison (similar to === in other languages) that would yield False in all of the above examples. I could of course iterate over the list:
res = False
for member in mylist:
    if subject == member and type( subject ) == type( member ):
        res = True
        break
This is obviously much less efficient than using the built-in in operator, even if I pack it as a list comprehension. Is there some native alternative to in, such as a list method, or some way to tweak in's behavior to get the required result?
The in operator is used in my case for testing the uniqueness of all list members, so a native uniqueness test would do as well.
Important note: The list may contain mutable values, so using set isn't an option.
Python version is 3.4, would be great for the solution to work on 2.7 too.
EDIT TO ALL THOSE WHO SUGGEST USING IS:
I look for a non-iterating, native alternative to a in b.
The is operator is not relevant for this case. For example, in the following situation in works just fine but is won't:
>>> [1,2] in [[1,2]]
True
Please, do read the question before answering it...
in doesn't test for equivalence at all. It checks if an item is in a container. Example:
>>> 5 in [1,2,3,4,5]
True
>>> 6 in [1,2,3,4,5]
False
>>> True in {True, False}
True
>>> "k" in ("b","c")
False
What you are looking for is is.
>>> True == 1
True
>>> True is 1
False
>>> False == 0
True
>>> False is 0
False
EDIT
After reading your edit, I don't think there is anything built into Python's libraries that suits your needs. What you want is basically to differentiate between int and bool (True, False). But Python itself treats True and False as integers, because bool is a subclass of int. This is why True == 1 and False == 0 both evaluate to True. You can even do:
>>> isinstance ( True, int)
True
I cannot think of anything better than your own solution. However, if your list is certain to contain each value at most once (counting True and 1 as the same value), you can use list.index():
def precise_in(subject, mylist):  # hypothetical wrapper name
    try:
        index_val = mylist.index(subject)
    except ValueError:
        index_val = None
    if index_val is not None:
        return type(subject) == type(mylist[index_val])
    return False
Since index is built-in, it might be a little faster, though rather inelegant.
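For comparison, the same type-strict test can be written as a lazy generator passed to `any`, which stops at the first match and also works with unhashable members such as lists. It still iterates in Python rather than in C, so it is not expected to be faster than the loop in the question; it is only more compact. The names `strict_in` and `all_strictly_unique` are hypothetical:

```python
def strict_in(subject, seq):
    """True if seq holds an element equal to subject AND of exactly the same type."""
    return any(type(m) is type(subject) and m == subject for m in seq)


def all_strictly_unique(seq):
    """Pairwise uniqueness under strict_in semantics; O(n**2), but needs no hashing."""
    return not any(strict_in(item, seq[i + 1:]) for i, item in enumerate(seq))


print(strict_in(True, [1]))                      # False
print(strict_in(1, [True, 1]))                   # True
print(strict_in([1, 2], [[1, 2]]))               # True
print(all_strictly_unique([1, True, [1], [1]]))  # False
```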
Python's in operator is precise, and the behavior you're complaining about is perfectly expected, since bool is a subclass of int.
Below is the excerpt of the official Python documentation describing the boolean type:
Booleans
These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects. The Boolean type is a subtype of plain integers, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings "False" or "True" are returned, respectively.
You can also have a look at PEP 285.
You're looking for the is operator:
if any(x is True for x in l):
    ...
is, however, isn't exactly === from other languages. is checks identity, not just equality without type coercion. CPython interns only some strings and small integers, so two objects that are equal may not be the same object:
In [19]: a = '12'
In [20]: b = '123'
In [21]: a += '3'
In [22]: a is b
Out[22]: False
In [23]: a == b
Out[23]: True
In [27]: 100001 is 100000 + 1
Out[27]: False
In [28]: 100001 == 100000 + 1
Out[28]: True
In Python 3, None, True, and False are essentially singletons, so using is for discerning True from 1 will work perfectly fine. In Python 2, however, this is possible:
In [29]: True = 1
In [31]: True is 1
Out[31]: True
Equality can be overridden with the __eq__ method, so you can define an object that is equal to any other object:
In [1]: %paste
class Test(object):
    def __eq__(self, other):
        return True
## -- End pasted text --
In [2]: x = Test()
In [3]: x == None
Out[3]: True
In [4]: x == True
Out[4]: True
In [5]: x == False
Out[5]: True
In this case, how would === work? There is no general solution, so Python has no built-in method of lists that does what you want.

How to get the index of an integer from a list if the list contains a boolean?

I am just starting with Python.
How to get index of integer 1 from a list if the list contains a boolean True object before the 1?
>>> lst = [True, False, 1, 3]
>>> lst.index(1)
0
>>> lst.index(True)
0
>>> lst.index(0)
1
I think Python considers 0 as False and 1 as True in the argument of the index method. How can I get the index of integer 1 (i.e. 2)?
Also what is the reasoning or logic behind treating boolean object this way in list?
From the solutions, I can see it is not so straightforward.
The documentation says that
Lists are mutable sequences, typically used to store collections of homogeneous items (where the precise degree of similarity will vary by application).
You shouldn't store heterogeneous data in lists.
The implementation of list.index only performs the comparison using Py_EQ (== operator). In your case that comparison returns truthy value because True and False have values of the integers 1 and 0, respectively (the bool class is a subclass of int after all).
However, you could use a generator expression and the built-in next function (to get the first value from the generator) like this:
In [4]: next(i for i, x in enumerate(lst) if not isinstance(x, bool) and x == 1)
Out[4]: 2
Here we check if x is an instance of bool before comparing x to 1.
Keep in mind that next can raise StopIteration, in that case it may be desired to (re-)raise ValueError (to mimic the behavior of list.index).
Wrapping this all in a function:
def index_same_type(it, val):
    gen = (i for i, x in enumerate(it) if type(x) is type(val) and x == val)
    try:
        return next(gen)
    except StopIteration:
        raise ValueError('{!r} is not in iterable'.format(val)) from None
Some examples:
In [34]: index_same_type(lst, 1)
Out[34]: 2
In [35]: index_same_type(lst, True)
Out[35]: 0
In [37]: index_same_type(lst, 42)
ValueError: 42 is not in iterable
Booleans are integers in Python, and this is why you can use them just like any integer:
>>> 1 + True
2
>>> [1][False]
1
[this doesn't mean you should :)]
This is due to the fact that bool is a subclass of int, and almost always a boolean will behave just like 0 or 1 (except when it is cast to string - you will get "False" and "True" instead).
Here is one more idea of how you can achieve what you want (however, try to rethink your logic, taking into account the information above):
>>> class force_int(int):
...     def __eq__(self, other):
...         return int(self) == other and not isinstance(other, bool)
...
>>> force_int(1) == True
False
>>> lst.index(force_int(1))
2
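This code redefines int's __eq__ method, which is used to compare elements in the index method, so that it ignores booleans. Note that this depends on which operand's comparison Python consults first, so verify it on your Python version: on Python 3 the bool element's own int comparison may be used, in which case True still matches at index 0.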
Here is a very simple naive one-liner solution using map and zip:
>>> list(zip(map(type, lst), lst)).index((int, 1))
2
Here we map the type of each element and create a new list by zipping the types with the elements and ask for the index of (type, value).
And here is a generic iterative solution using the same technique:
>>> from itertools import imap, izip
>>> def index(xs, x):
...     it = (i for i, (t, e) in enumerate(izip(imap(type, xs), xs)) if (t, e) == x)
...     try:
...         return next(it)
...     except StopIteration:
...         raise ValueError(x)
...
>>> index(lst, (int, 1))
2
Here we basically do the same thing, but iteratively, so as not to cost much in terms of memory. We create an iterator from the same expression as above, but using imap and izip instead, and build a custom index function that returns the next value from the iterator or raises a ValueError if there is no match.
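On Python 3, `itertools.imap` and `itertools.izip` no longer exist because the built-in `map` and `zip` are already lazy. A sketch of the same function for Python 3 (the name `index` is kept from the answer above):

```python
def index(xs, x):
    """Return the first position i with (type(xs[i]), xs[i]) == x, like list.index."""
    # map and zip return lazy iterators in Python 3, so no intermediate list is built
    it = (i for i, pair in enumerate(zip(map(type, xs), xs)) if pair == x)
    try:
        return next(it)
    except StopIteration:
        raise ValueError(x) from None


lst = [True, False, 1, 3]
print(index(lst, (int, 1)))      # 2
print(index(lst, (bool, True)))  # 0
```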
Try this:
for i, j in enumerate([True, False, 1, 3]):
    if not isinstance(j, bool) and j == 1:
        print(i)
Output:
2

Why is cmp( ) useful?

According to the doc and this tutorial,
cmp(x, y) returns -1 if x < y, 0 if x == y, and 1 if x > y.
The tutorial also said that
cmp() returns the sign of the difference of two numbers
I don't really get what "sign of the difference of two numbers" means. Doesn't that mean that it returns a value when the signs of the numbers aren't equal? Since...
cmp(80, 100) : -1 # both have positive sign.
cmp(180, 100) : 1 # both also have positive sign.
cmp(-80, 100) : -1
cmp(80, -100) : 1
Note: code from the tutorial.
Despite my confusion about sign differences, I can't really think of why we need a built-in function that returns -1 when x < y.
Isn't the function cmp() easily implemented? Is there any reason why Python's creators kept the cmp() function, or is there some hidden usage of Python's cmp() function?
Why is cmp() useful?
It isn't very useful, which is why it was deprecated (the builtin cmp is gone and builtin sorts no longer accept one in Python 3). Rich comparison methods supplanted it:
object.__lt__(self, other)
object.__le__(self, other)
object.__eq__(self, other)
object.__ne__(self, other)
object.__gt__(self, other)
object.__ge__(self, other)
This allows the < symbol (and other symbols) to be overloaded comparison operators, enabling, for example, subset and superset comparisons of set objects.
>>> set('abc') < set('cba')
False
>>> set('abc') <= set('cba')
True
>>> set('abc') == set('cba')
True
>>> set('abc') >= set('cba')
True
>>> set('abc') > set('cba')
False
While cmp could express the relationships above, it can't express the following, where neither set is a subset of the other, so every ordering comparison is False:
>>> set('abc') == set('bcd')
False
>>> set('abc') >= set('bcd')
False
>>> set('abc') <= set('bcd')
False
Toy usage for cmp
Here's an interesting usage which uses its result as an index (it returns -1 if the first is less than the second, 0 if equal, and 1 if greater than):
def cmp_to_symbol(val, other_val):
    '''returns the symbol representing the relationship between two values'''
    return '=><'[cmp(val, other_val)]
>>> cmp_to_symbol(0, 1)
'<'
>>> cmp_to_symbol(1, 1)
'='
>>> cmp_to_symbol(1, 0)
'>'
According to the docs, you should treat cmp as if it wasn't there:
https://docs.python.org/3/whatsnew/3.0.html#ordering-comparisons
cmp removed, equivalent operation
But you can use this as the equivalent:
(a > b) - (a < b)
in our little toy function, that's this:
def cmp_to_symbol(val, other_val):
    '''returns the symbol representing the relationship between two values'''
    return '=><'[(val > other_val) - (val < other_val)]
I don't really get what does it mean sign of the difference of two numbers.
This means: take the difference, and then the sign of that difference. For example, if x and y are two numbers:
x < y => x - y < 0 and the function returns -1.
x == y => x - y == 0 and the function returns 0.
x > y => x - y > 0 and the function returns 1.
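In Python 3 there is no built-in `cmp`, but the "sign of the difference" definition translates directly. A sketch; `sign` and `cmp` here are local helper functions, not built-ins:

```python
def sign(n):
    """-1, 0 or 1 depending on the sign of n (bools subtract like ints)."""
    return (n > 0) - (n < 0)


def cmp(x, y):
    """Python 2 style three-way comparison, written for Python 3."""
    return (x > y) - (x < y)


# The tutorial's examples: cmp(x, y) equals the sign of x - y
for x, y in [(80, 100), (180, 100), (-80, 100), (80, -100)]:
    assert cmp(x, y) == sign(x - y)
    print(x, y, cmp(x, y))
```

The signs of x and y themselves play no role; only the sign of x - y matters, which is why cmp(80, 100) is -1 even though both numbers are positive.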
For more information on three-way comparisons, see Wikipedia.
Trivalued comparators are very useful when sorting. You don't just want to know whether two elements are equal; you also want to know their relative order so that you know how to rearrange them to move closer to a sorted list. This is why C (strcmp) and Perl (cmp) both have similar operations (in those cases for strings, but it's the same idea).
For sorting sequences of items. When you are sorting a list of items, you only need to know whether one item is greater or less than another item.
More info here: http://wiki.python.org/moin/HowTo/Sorting/#The_Old_Way_Using_the_cmp_Parameter
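Python 3's sorting functions accept only a `key`, but an old-style three-way comparator can still be adapted with `functools.cmp_to_key`. A sketch; `by_length_then_alpha` is a hypothetical comparator:

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Three-way comparator: negative if a sorts first, 0 if tied, positive otherwise."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ['pear', 'fig', 'apple', 'kiwi']
print(sorted(words, key=cmp_to_key(by_length_then_alpha)))
# ['fig', 'kiwi', 'pear', 'apple']
```

For this particular ordering a plain key such as `lambda w: (len(w), w)` gives the same result and is simpler; `cmp_to_key` is mainly useful when porting existing comparison functions.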
Another use case: Finding the sign (- / +) of a number
If you want to find out what the sign (+/-) of a number is, you can simply use 0 as the second argument to the cmp function:
cmp(-123, 0) #returns -1
cmp( 123, 0) #returns 1

Is there a difference between -1 and False in Python?

I have always thought that using -1 in a condition is always the same as writing False (boolean value). But from my code, I get different results:
Using True and False:
def count(sub, s):
    count = 0
    index = 0
    while True:
        if string.find(s, sub, index) != False:
            count += 1
            index = string.find(s, sub, index) + 1
        else:
            return count
print count('nana', 'banana')
Result: Takes too long for the interpreter to respond.
Using 1 and -1:
def count(sub, s):
    count = 0
    index = 0
    while 1:
        if string.find(s, sub, index) != -1:
            count += 1
            index = string.find(s, sub, index) + 1
        else:
            return count
print count('nana', 'banana')
Result: 1
Why does using -1 and 1 give me the correct result whereas using the bool values True and False do not?
string.find doesn't return a boolean so string.find('banana', 'nana', index) will NEVER return 0 (False) regardless of the value of index.
>>> import string
>>> help(string.find)
Help on function find in module string:
find(s, *args)
find(s, sub [, start [, end]]) -> int
Return the lowest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
>>>
Your example simply repeats:
index = string.find('banana', 'nana', 0) + 1 # index = 3
index = string.find('banana', 'nana', 3) + 1 # index = 0
The -1 version works because it correctly interprets the return value of string.find!
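For reference, a Python 3 sketch of the intended function that uses the `str.find` method and tests against -1 explicitly. The name `count_overlapping` is hypothetical; note that the built-in `str.count` counts non-overlapping matches only, so the two can differ:

```python
def count_overlapping(sub, s):
    """Count occurrences of sub in s, allowing overlaps (e.g. 'aa' in 'aaaa' gives 3)."""
    count = 0
    index = s.find(sub)
    # str.find returns -1 when sub is absent; 0 is a valid "found at the start" result
    while index != -1:
        count += 1
        index = s.find(sub, index + 1)
    return count


print(count_overlapping('nana', 'banana'))  # 1
print(count_overlapping('ana', 'banana'))   # 2
print('banana'.count('ana'))                # 1 (non-overlapping)
```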
False is of type bool, which is a sub-type of int, and its value is 0.
In Python, False is similar to using 0, not -1.
There's a difference between equality and converting to a boolean value for truth testing, for both historical and flexibility reasons:
>>> True == 1
True
>>> True == -1
False
>>> bool(-1)
True
>>> False == 0
True
>>> bool(0)
False
>>> True == 2
False
>>> bool(2)
True
I have always thought that using -1 in a condition is always the same as writing False (boolean value).
1) No. It is never the same, and I can't imagine why you would have ever thought this, let alone always thought it. Unless for some reason you had only ever used if with string.find or something.
2) You shouldn't be using the string module in the first place. Quoting directly from the documentation:
DESCRIPTION
Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object. They used to be implemented by
a built-in module called strop, but strop is now obsolete itself.
So instead of string.find('foobar', 'foo'), we use the .find method of the str class itself (the class that 'foobar' and 'foo' belong to); and since we have objects of that class, we can make bound method calls, thus: 'foobar'.find('foo').
3) The .find method of strings returns a number that tells you where the substring was found, if it was found. If the substring wasn't found, it returns -1. It cannot return 0 in this case, because that would mean "was found at the beginning".
4) False will compare equal to 0. It is worth noting that Python actually implements its bool type as a subclass of int.
5) No matter what language you are using, you should not compare to boolean literals. x == False or equivalent is, quite simply, not the right thing to write. It gains you nothing in terms of clarity, and creates opportunities to make mistakes.
You would never, ever say "If it is true that it is raining, I will need an umbrella" in English, even though that is grammatically correct. There is no point; it is not more polite nor more clear than the obvious "If it is raining, I will need an umbrella".
If you want to use a value as a boolean, then use it as a boolean. If you want to use the result of a comparison (i.e. "is the value equal to -1 or not?"), then perform the comparison.
