How does this list comprehension work? - python

list1 = ['Hello', 10, None]
list2 = [g.lower() for g in list1 if isinstance(g, str)]
list3 = [g.lower() if isinstance(g,str) else g for g in list1]
list4 = [isinstance(g, str) and g.lower() or g for g in list1]
If I want to convert the string in list to lowercase, I can use the method in list2 and the output will be ['hello'].
In addition to this conversion, if I want to keep integers (which is 10 in this case) and None, the methods in both list3 and list4 will work and the output will be ['hello', 10, None].
My question is that I can't understand how the method in list4 works.

To start, writing code like this:
condition and value1 or value2
was how people implemented a ternary conditional operator in Python before the:
value1 if condition else value2
conditional expression was introduced in version 2.5 by PEP 308. The old idiom is now discouraged in favor of the newer form, which is far more readable and also works when value1 is falsy.
The old method works because of how and and or operate in Python. Instead of always returning a boolean like in many other languages, these operators return one of their operands.
Doing a and b returns a if a evaluates to False; otherwise, it returns b:
>>> 0 and 1
0
>>> 1 and 0
0
>>> 1 and 2
2
>>>
Doing a or b returns a if a evaluates to True; otherwise, it returns b:
>>> 1 or 0
1
>>> 0 or 1
1
>>> 1 or 2
1
>>>
Also, in case you do not know, zero (0, 0.0, 0j) evaluates to False while every nonzero number evaluates to True.
Coming to your code, this:
isinstance(g, str) and g.lower() or g
is actually interpreted by Python like:
(isinstance(g, str) and g.lower()) or g
Now if isinstance(g, str) returns False (g is not a string):
(False and g.lower()) or g
False is returned by and:
False or g
and then or returns g. Thus, we avoided calling .lower() on a non-string type.
If however isinstance(g, str) returns True (g is a string):
(True and g.lower()) or g
and returns g.lower():
g.lower() or g
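and then or returns g.lower(), which is fine because g is a string. (If g.lower() is the empty string, or falls through to g, but g is then the same empty string, so the result is unchanged.)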
Summed up, these two expressions:
g.lower() if isinstance(g,str) else g
isinstance(g, str) and g.lower() or g
are functionally equivalent here. But please use the first! The other is terrible for readability.
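To see where the two forms stop agreeing, here is a minimal sketch. The helpers old_style and new_style are hypothetical names used only for illustration, and because they are functions their arguments are evaluated eagerly, unlike the operators themselves. An empty string is added to list1 to exercise the falsy case.

```python
def old_style(condition, value1, value2):
    # Pre-2.5 idiom: silently yields value2 whenever value1 is falsy.
    return condition and value1 or value2


def new_style(condition, value1, value2):
    # Conditional expression: yields value1 whenever condition is true.
    return value1 if condition else value2


list1 = ['Hello', 10, None, '']
list3 = [g.lower() if isinstance(g, str) else g for g in list1]
list4 = [isinstance(g, str) and g.lower() or g for g in list1]
print(list3 == list4)          # the two forms agree for this input

print(new_style(True, 0, 99))  # value1 is 0 and is returned
print(old_style(True, 0, 99))  # value1 is 0, so `or` falls through to 99
```

The list comprehensions agree only because a falsy g.lower() is the empty string, which equals g in that case.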

Quoting the doc:
The expression x and y first evaluates x; if x is false, its
value is returned; otherwise, y is evaluated and the resulting value
is returned.
The expression x or y first evaluates x; if x is true, its value is
returned; otherwise, y is evaluated and the resulting value is
returned.
Because of precedence rules, isinstance(g, str) and g.lower() or g is actually evaluated as
(isinstance(g, str) and g.lower()) or g (and binds more tightly than or, much as multiplication has higher precedence than addition).
That basically means the following:
if isinstance(g, str) is true, result of g.lower() will be taken
otherwise g will be taken
As you see, it's the same thing that you have in list3 operation.

Related

What does x[x < 2] = 0 mean in Python?

I came across some code with a line similar to
x[x<2]=0
Playing around with variations, I am still stuck on what this syntax does.
Examples:
>>> x = [1,2,3,4,5]
>>> x[x<2]
1
>>> x[x<3]
1
>>> x[x>2]
2
>>> x[x<2]=0
>>> x
[0, 2, 3, 4, 5]
This only makes sense with NumPy arrays. The behavior with lists is useless, and specific to Python 2 (not Python 3). You may want to double-check if the original object was indeed a NumPy array (see further below) and not a list.
But in your code here, x is a simple list.
Since x < 2 is False, i.e. 0, x[x<2] is x[0], so x[0] gets changed.
Conversely, x[x>2] is x[True] or x[1]
So, x[1] gets changed.
Why does this happen?
The rules for comparison in Python 2 (CPython) are:
When you order two strings or two numeric types the ordering is done in the expected way (lexicographic ordering for string, numeric ordering for integers).
When you order a numeric and a non-numeric type, the numeric type comes first.
When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their typenames.
So, we have the following order:
numeric < list < string < tuple
See the accepted answer for How does Python compare string and int?.
If x is a NumPy array, then the syntax makes more sense because of boolean array indexing. In that case, x < 2 isn't a boolean at all; it's an array of booleans representing whether each element of x was less than 2. x[x < 2] = 0 then selects the elements of x that were less than 2 and sets those cells to 0. See Indexing.
>>> x = np.array([1., -1., -2., 3])
>>> x < 0
array([False, True, True, False], dtype=bool)
>>> x[x < 0] += 20 # All elements < 0 get increased by 20
>>> x
array([ 1., 19., 18., 3.]) # Only elements < 0 are affected
>>> x = [1,2,3,4,5]
>>> x<2
False
>>> x[False]
1
>>> x[True]
2
The bool is simply converted to an integer. The index is either 0 or 1.
The original code in your question works only in Python 2. If x is a list in Python 2, the comparison x < y is False if y is an integer. It does not really make sense to compare a list with an integer, but in Python 2, if the operands are not comparable, CPython bases the comparison on the alphabetical ordering of the names of the types; additionally, all numbers come first in mixed-type comparisons. This is not even spelled out in the documentation of CPython 2, and different Python 2 implementations could give different results. That is, [1, 2, 3, 4, 5] < 2 evaluates to False because 2 is a number and thus "smaller" than a list in CPython. This mixed comparison was eventually deemed to be too obscure a feature, and was removed in Python 3.0.
Now, the result of < is a bool; and bool is a subclass of int:
>>> isinstance(False, int)
True
>>> isinstance(True, int)
True
>>> False == 0
True
>>> True == 1
True
>>> False + 5
5
>>> True + 5
6
So basically you're taking the element 0 or 1 depending on whether the comparison is true or false.
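A small sketch of that behavior which also runs on Python 3, since it uses a comparison between two integers rather than between a list and an integer. The list contents are made up for illustration.

```python
x = [10, 20, 30]

# A bool used as an index behaves like the integer 0 or 1.
print(x[False])   # 10
print(x[True])    # 20

# The comparison is evaluated first, then its result picks the slot.
x[3 < 2] = 0      # 3 < 2 is False, so this assigns to x[0]
print(x)          # [0, 20, 30]
```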
If you try the code above in Python 3, you will get TypeError: unorderable types: list() < int() (newer Python 3 releases word it as '<' not supported between instances of 'list' and 'int') due to a change in Python 3.0:
Ordering Comparisons
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other. Note that this does not apply to the == and != operators: objects of different incomparable types always compare unequal to each other.
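As a rough illustration of the Python 3 behavior, the sketch below catches the TypeError and then shows one plain-list way to get the element-wise effect. The list comprehension is an assumption about what the original line was meant to do, not something stated in the question.

```python
x = [1, 2, 3, 4, 5]

try:
    x[x < 2] = 0           # list < int is not an ordering Python 3 supports
except TypeError as exc:
    error = exc
    print(type(exc).__name__)
else:
    error = None

# Element-wise version of what the line was probably meant to do.
zeroed = [0 if item < 2 else item for item in x]
print(zeroed)              # [0, 2, 3, 4, 5]
```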
There are many datatypes that overload the comparison operators to do something different (dataframes from pandas, numpy's arrays). If the code that you were using did something else, it was because x was not a list, but an instance of some other class with operator < overridden to return a value that is not a bool; and this value was then handled specially by x[] (aka __getitem__/__setitem__)
This has one more use: code golf. Code golf is the art of writing programs that solve some problem in as few source code bytes as possible.
return(a,b)[c<d]
is roughly equivalent to
if c < d:
    return b
else:
    return a
except that both a and b are evaluated in the first version, but not in the second version.
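The evaluation difference can be made visible with a short sketch. Here a and b are hypothetical functions that record when they are called, and the conditional expression stands in for the if/else version.

```python
calls = []


def a():
    calls.append('a')
    return 'A'


def b():
    calls.append('b')
    return 'B'


c, d = 1, 2

# Tuple indexing: both a() and b() run before the index is applied.
picked = (a(), b())[c < d]
print(picked, calls)        # B ['a', 'b']

calls.clear()

# Conditional expression: only the selected branch runs.
picked_lazy = b() if c < d else a()
print(picked_lazy, calls)   # B ['b']
```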
c<d evaluates to True or False.
(a, b) is a tuple.
Indexing on a tuple works like indexing on a list: (3,5)[1] == 5.
True is equal to 1 and False is equal to 0.
(a,b)[c<d]
(a,b)[True]
(a,b)[1]
b
or for False:
(a,b)[c<d]
(a,b)[False]
(a,b)[0]
a
There's a good list on the Stack Exchange network of many nasty things you can do to Python in order to save a few bytes. https://codegolf.stackexchange.com/questions/54/tips-for-golfing-in-python
In normal code this should never be used, though, and in your case it would mean that x acts both as something that can be compared to an integer and as a container that supports indexing, which is a very unusual combination. It's probably NumPy code, as others have pointed out.
In general it could mean anything. It was already explained what it means if x is a list or numpy.ndarray but in general it only depends on how the comparison operators (<, >, ...) and also how the get/set-item ([...]-syntax) are implemented.
x.__getitem__(x.__lt__(2)) # this is what x[x < 2] means!
x.__setitem__(x.__lt__(2), 0) # this is what x[x < 2] = 0 means!
Because:
x < value is equivalent to x.__lt__(value)
x[value] is (roughly) equivalent to x.__getitem__(value)
x[value] = othervalue is (also roughly) equivalent to x.__setitem__(value, othervalue).
This can be customized to do anything you want. Just as an example (it loosely mimics NumPy's boolean indexing):
class Test:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        # You could do anything in here. For example create a new list indicating if that
        # element is less than the other value
        res = [item < other for item in self.value]
        return self.__class__(res)
    def __repr__(self):
        return '{0} ({1})'.format(self.__class__.__name__, self.value)
    def __getitem__(self, item):
        # If you index with an instance of this class use "boolean-indexing"
        if isinstance(item, Test):
            res = self.__class__([i for i, index in zip(self.value, item) if index])
            return res
        # Something else was given just try to use it on the value
        return self.value[item]
    def __setitem__(self, item, value):
        if isinstance(item, Test):
            self.value = [i if not index else value for i, index in zip(self.value, item)]
        else:
            self.value[item] = value
So now let's see what happens if you use it:
>>> a = Test([1,2,3])
>>> a
Test ([1, 2, 3])
>>> a < 2 # calls __lt__
Test ([True, False, False])
>>> a[Test([True, False, False])] # calls __getitem__
Test ([1])
>>> a[a < 2] # or short form
Test ([1])
>>> a[a < 2] = 0 # calls __setitem__
>>> a
Test ([0, 2, 3])
Notice this is just one possibility. You are free to implement almost anything you want.

Booleans with numbers in Python

I came across this implementation of gcd in Python:
def gcd(x,y): return y and gcd(y, x % y) or x
What I don't understand is how the boolean is working in the return? After trying some numbers in the interpreter I noticed that and always returns the number on the right, while or returns the number on the left. Why is this? Also, can you walk me step by step through a simple call of this function so I can understand what is happening?
This is because of how and and or operators evaluate in Python.
From documentation -
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
They do not return True or False; they return the last evaluated value, and that is why we can write things like -
s = s or "Some default value"
to default the value of s if it is None, an empty string, an empty list, or 0.
Basically, or returns the first non-false-like value (where false-like values are 0, None, empty string/list/tuple, etc.), or the last false-like value if all the values are false-like. Example -
In [1]: 0 or 10
Out[1]: 10
In [2]: 5 or 0 or 10
Out[2]: 5
In [7]: 0 or '' or [] or ()
Out[7]: ()
Similarly, and returns the first false-like value, or the last true-like value if all the values are true-like. Example -
In [3]: 0 and 10
Out[3]: 0
In [4]: 5 and 10
Out[4]: 10
In [6]: 5 and 0 and 10
Out[6]: 0
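The two rules can be written out as ordinary loops. This is only a sketch: or_chain and and_chain are hypothetical helpers, they need at least one argument, and unlike the real operators they receive all of their arguments already evaluated.

```python
def or_chain(*values):
    # Mirrors `v1 or v2 or ...`: first truthy value, else the last value.
    for value in values:
        if value:
            return value
    return values[-1]


def and_chain(*values):
    # Mirrors `v1 and v2 and ...`: first falsy value, else the last value.
    for value in values:
        if not value:
            return value
    return values[-1]


print(or_chain(0, '', [], ()))   # ()
print(and_chain(5, 0, 10))       # 0
print(and_chain(5, 10))          # 10
```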
In your case, it works as -
if y is 0, it returns x (irrespective of the value of x).
otherwise it computes gcd(y, x % y) and, if that is non-zero, returns it (which is always the case when y is non-zero).
if the result of gcd(y, x % y) were 0, it would return x instead.
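For the step-by-step walk the question asks for, here is a sketch that puts the one-liner next to an explicit version and traces one call in comments. gcd_explicit is a hypothetical name, and math.gcd is included only as a cross-check.

```python
import math


def gcd(x, y):
    # Original one-liner from the question.
    return y and gcd(y, x % y) or x


def gcd_explicit(x, y):
    # Same recursion with the base case spelled out.
    if y == 0:
        return x
    return gcd_explicit(y, x % y)


# Trace of gcd(48, 18):
#   gcd(48, 18) -> y is 18 (truthy), recurse into gcd(18, 48 % 18) = gcd(18, 12)
#   gcd(18, 12) -> recurse into gcd(12, 6)
#   gcd(12, 6)  -> recurse into gcd(6, 0)
#   gcd(6, 0)   -> y is 0, so `0 and ...` is 0 and `0 or 6` returns 6
#   each caller then evaluates `6 or x`, which is 6
print(gcd(48, 18))           # 6
print(gcd_explicit(48, 18))  # 6
print(math.gcd(48, 18))      # 6
```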
This is called "short circuiting." Whenever Python knows what the result of a boolean expression is going to be, it stops evaluating. This is an optimization, but it also makes for some handy idioms, like assigning default values.
def do_a_thing(maybelist=None):
    # this is done all the time when you want the default argument to be
    # a list, but you don't want to make the mistake of a mutable default argument
    maybelist = maybelist or []
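One thing to keep in mind with this idiom is that or replaces any falsy argument, not just None. A minimal sketch, with hypothetical function names, comparing it with an explicit None check:

```python
def append_with_or(item, maybelist=None):
    # `or` replaces every falsy argument, including a caller's empty list.
    maybelist = maybelist or []
    maybelist.append(item)
    return maybelist


def append_with_is_none(item, maybelist=None):
    # Only the missing-argument case gets a fresh list.
    if maybelist is None:
        maybelist = []
    maybelist.append(item)
    return maybelist


shared = []
append_with_or(1, shared)
print(shared)    # [] because a new list was created inside the function

append_with_is_none(1, shared)
print(shared)    # [1] because the caller's list was used
```

Which behavior is wanted depends on whether callers expect their own empty list to be mutated.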
The implementation example you gave is confusing to someone who doesn't know about Euclid's Algorithm for calculating gcd's, because it's not obvious how the heck it actually calculates the gcd. I would say that that is an example of someone abusing short circuit evaluation.
In Python, the only integer that evaluates to False is zero. Logical operators short-circuit, as one of the comments says.
With the statement:
a and b
If a evaluates to False, then b does not need to be evaluated, so the result of the expression is a. If a evaluates to True, then b still needs to be evaluated, and b is the result of the expression.
or works the other way around, which makes sense if you think about it.
and and or are not strictly Boolean operators in Python. x and y and x or y will always evaluate to either x or y, depending on the "truthiness" of the value. False values are None, False, numerical zeros, empty strings, and empty containers; all other values are true by default. This means that
x and y == x if x is false; y is not evaluated
x and y == y if x is true
x or y == x if x is true; y is not evaluated
x or y == y if x is false
The only time either evaluates to an actual Boolean value is when the x or y value involved is an actual Boolean value.
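A short sketch of these four rules, using identity checks to show that the result is one of the operands itself. The division by zero on the right-hand side is there only to show that it is never evaluated.

```python
x = []
y = [1, 2]

# The result is one of the operands itself, not a new bool.
print((x and y) is x)   # True: x is falsy, so `and` returns x
print((x or y) is y)    # True: x is falsy, so `or` returns y
print((y and x) is x)   # True: y is truthy, so `and` returns x
print((y or x) is y)    # True: y is truthy, so `or` returns y

# Short-circuiting: the right-hand side is never evaluated here.
print(0 and 1 / 0)      # 0
print(1 or 1 / 0)       # 1
```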

How to get the index of an integer from a list if the list contains a boolean?

I am just starting with Python.
How to get index of integer 1 from a list if the list contains a boolean True object before the 1?
>>> lst = [True, False, 1, 3]
>>> lst.index(1)
0
>>> lst.index(True)
0
>>> lst.index(0)
1
I think Python considers 0 as False and 1 as True in the argument of the index method. How can I get the index of integer 1 (i.e. 2)?
Also what is the reasoning or logic behind treating boolean object this way in list?
From the solutions, I can see it is not so straightforward.
The documentation says that
Lists are mutable sequences, typically used to store collections of
homogeneous items (where the precise degree of similarity will vary by
application).
You shouldn't store heterogeneous data in lists.
The implementation of list.index only performs the comparison using Py_EQ (the == operator). In your case that comparison returns a truthy value because True and False have the values of the integers 1 and 0, respectively (the bool class is a subclass of int after all).
However, you could use a generator expression and the built-in next function (to get the first value from the generator) like this:
In [4]: next(i for i, x in enumerate(lst) if not isinstance(x, bool) and x == 1)
Out[4]: 2
Here we check if x is an instance of bool before comparing x to 1.
Keep in mind that next can raise StopIteration, in that case it may be desired to (re-)raise ValueError (to mimic the behavior of list.index).
Wrapping this all in a function:
def index_same_type(it, val):
    gen = (i for i, x in enumerate(it) if type(x) is type(val) and x == val)
    try:
        return next(gen)
    except StopIteration:
        raise ValueError('{!r} is not in iterable'.format(val)) from None
Some examples:
In [34]: index_same_type(lst, 1)
Out[34]: 2
In [35]: index_same_type(lst, True)
Out[35]: 0
In [37]: index_same_type(lst, 42)
ValueError: 42 is not in iterable
Booleans are integers in Python, and this is why you can use them just like any integer:
>>> 1 + True
2
>>> [1][False]
1
[this doesn't mean you should :)]
This is due to the fact that bool is a subclass of int, and almost always a boolean will behave just like 0 or 1 (except when it is cast to string - you will get "False" and "True" instead).
Here is one more idea for how you can achieve what you want (however, try to rethink your logic, taking into account the information above):
>>> class force_int(int):
...     def __eq__(self, other):
...         return int(self) == other and not isinstance(other, bool)
...
>>> force_int(1) == True
False
>>> lst.index(force_int(1))
2
This code redefines int's method, which is used to compare elements in the index method, to ignore booleans.
Here is a very simple, naive one-liner solution using map and zip (Python 2, where zip returns a list):
>>> zip(map(type, lst), lst).index((int, 1))
2
Here we map the type of each element and create a new list by zipping the types with the elements and ask for the index of (type, value).
And here is a generic iterative solution using the same technique (also Python 2, since itertools.imap and itertools.izip do not exist in Python 3):
>>> from itertools import imap, izip
>>> def index(xs, x):
...     it = (i for i, (t, e) in enumerate(izip(imap(type, xs), xs)) if (t, e) == x)
...     try:
...         return next(it)
...     except StopIteration:
...         raise ValueError(x)
...
>>> index(lst, (int, 1))
2
Here we basically do the same thing, but iteratively, so that it costs little in terms of memory. We build an iterator from the same expression as above, using imap and izip instead, and write a custom index function that returns the next value from the iterator or raises a ValueError if there is no match.
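On Python 3 the same idea can be sketched with the built-in map and zip, which are already lazy there. The function name index_typed is hypothetical and chosen only to avoid clashing with the examples above.

```python
def index_typed(xs, type_and_value):
    # In Python 3, map and zip are already lazy, so no itertools is needed.
    pairs = zip(map(type, xs), xs)
    matches = (i for i, pair in enumerate(pairs) if pair == type_and_value)
    try:
        return next(matches)
    except StopIteration:
        raise ValueError(type_and_value) from None


lst = [True, False, 1, 3]
print(index_typed(lst, (int, 1)))      # 2
print(index_typed(lst, (bool, True)))  # 0
```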
Try this (Python 2 print syntax):
for i, j in enumerate([True, False, 1, 3]):
    if not isinstance(j, bool) and j == 1:
        print i
Output:
2

any() function in Python

I am new to Python and was playing with it until I ran into a problem with the any() function. According to the Python documentation, it is equivalent to:
def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
I created a list: list = [-1, 1] and I expected:
print any(list) < 0
print any(x < 0 for x in list)
would print two True's and the two statements are equivalent. But instead Python printed
False
True
Why is the first statement False? How is it different from the second one?
any(list) returns a boolean value, based only on the contents of list. Both -1 and 1 are true values (they are not numeric 0), so any() returns True:
>>> lst = [1, -1]
>>> any(lst)
True
Boolean values in Python are a subclass of int, where True == 1 and False == 0, so True is not smaller than 0:
>>> True < 0
False
The statement any(list) is in no way equivalent to any(x < 0 for x in list) here. That expression uses a generator expression to test each element individually against 0, and there is indeed one value smaller than 0 in that list, -1:
>>> (x < 0 for x in lst)
<generator object <genexpr> at 0x1077769d8>
>>> list(x < 0 for x in lst)
[False, True]
so any() returns True as soon as it encounters the second value in the sequence produced.
Note: You should avoid using list as a variable name as that masks the built-in type. I used lst in my answer instead, so that I could use the list() callable to illustrate what the generator expression produces.
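The difference between the two statements can be reproduced on Python 3 with the sketch below. The variable name values is used instead of list, for the reason given in the note above.

```python
values = [-1, 1]

# any(values) is True, and True behaves like 1 in the comparison.
first = any(values) < 0
print(first)                  # False

# Each element is compared with 0 before any() looks at the results.
second = any(x < 0 for x in values)
print(second)                 # True
```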
As stated in the docs, any(list) returns a boolean. You're comparing that boolean to the integer 0:
>>> any(list)
True
>>> True < 0
False

Python "and" operator with ints

What is the explanation for this behavior in Python?
a = 10
b = 20
a and b # 20
b and a # 10
a and b evaluates to 20, while b and a evaluates to 10. Are positive ints equivalent to True? Why does it evaluate to the second value? Because it is second?
The documentation explains this quite well:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
And similarly for or which will probably be the next question on your lips.
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
See the docs:
x and y if x is false, then x, else y
non-zero integers are treated as boolean true, so you get exactly the behavior described in the docs:
>>> a = 10
>>> b = 20
>>> a and b
20
>>> b and a
10
In Python, everything other than None, 0, False, "", [], (), {} (and other zeros and empty containers) is treated as true.
a and b is read as True and True in this case,
and the same goes for b and a.
Since the first operand is true, and returns the second operand, which is why a and b gives 20 and b and a gives 10.
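A quick sketch to check this in the interpreter. The list of falsy values is illustrative rather than exhaustive.

```python
falsy_values = [None, 0, 0.0, False, "", [], (), {}, set()]
print([bool(v) for v in falsy_values])   # all False

a = 10
b = 20
print(bool(a), bool(b))   # True True
print(a and b)            # 20: a is truthy, so the result is b
print(b and a)            # 10: b is truthy, so the result is a
print(0 and b)            # 0: the first operand is falsy and is returned
```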
