Why is cmp( ) useful? - python

According to the doc and this tutorial,
cmp() returns -1 if x < y
and
cmp() returns 0 if x == y
and
cmp() returns 1 if x > y
The tutorial also said that
cmp() returns the sign of the difference of two numbers
I don't really get what "sign of the difference of two numbers" means. Doesn't that mean that it returns a value when the signs of the numbers aren't equal? Since...
cmp(80, 100) : -1 # both have positive sign.
cmp(180, 100) : 1 # both also have positive sign.
cmp(-80, 100) : -1
cmp(80, -100) : 1
Note: code from the tutorial.
Besides my confusion about the sign of the difference, I can't really think of why we need a built-in function that returns -1 when x < y.
Isn't cmp() easy to implement? Is there any reason why the Python creators kept the cmp() function, or is there some hidden use for it?

Why cmp( ) is useful?
It isn't very useful, which is why it was deprecated (the builtin cmp is gone and builtin sorts no longer accept one in Python 3). Rich comparison methods supplanted it:
object.__lt__(self, other)
object.__le__(self, other)
object.__eq__(self, other)
object.__ne__(self, other)
object.__gt__(self, other)
object.__ge__(self, other)
This allows the < symbol (and other symbols) to be overloaded comparison operators, enabling, for example, subset and superset comparisons of set objects.
>>> set('abc') < set('cba')
False
>>> set('abc') <= set('cba')
True
>>> set('abc') == set('cba')
True
>>> set('abc') >= set('cba')
True
>>> set('abc') > set('cba')
False
While cmp could support the comparisons above, it couldn't express the following, where neither set is less than, equal to, or greater than the other:
>>> set('abc') == set('bcd')
False
>>> set('abc') >= set('bcd')
False
>>> set('abc') <= set('bcd')
False
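To make the limitation concrete, here is a minimal sketch. The cmp function below is a stand-in built from the (a > b) - (a < b) equivalent, not the removed builtin, and the variable names are only for illustration.

```python
def cmp(a, b):
    """Stand-in three-way comparison built from (a > b) - (a < b)."""
    return (a > b) - (a < b)

a, b = set('abc'), set('bcd')

# Neither set contains the other, so both a > b and a < b are False.
# The three-way result collapses to 0, which would normally mean "equal".
three_way = cmp(a, b)
print(three_way)   # 0
print(a == b)      # False
```

A single integer cannot represent "not comparable", which is why the six rich comparison methods are needed for partial orderings such as subset relations.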
Toy usage for cmp
Here's an interesting usage which uses its result as an index (it returns -1 if the first is less than the second, 0 if equal, and 1 if greater than):
def cmp_to_symbol(val, other_val):
    '''returns the symbol representing the relationship between two values'''
    return '=><'[cmp(val, other_val)]
>>> cmp_to_symbol(0, 1)
'<'
>>> cmp_to_symbol(1, 1)
'='
>>> cmp_to_symbol(1, 0)
'>'
According to the docs, you should treat cmp as if it wasn't there:
https://docs.python.org/3/whatsnew/3.0.html#ordering-comparisons
cmp removed, equivalent operation
But you can use this as the equivalent:
(a > b) - (a < b)
in our little toy function, that's this:
def cmp_to_symbol(val, other_val):
    '''returns the symbol representing the relationship between two values'''
    return '=><'[(val > other_val) - (val < other_val)]

I don't really get what sign of the difference of two numbers means.
This means: take the difference, and then the sign of that difference. For example, if x and y are two numbers:
x < y => x - y < 0 and the function returns -1.
x == y => x - y == 0 and the function returns 0.
x > y => x - y > 0 and the function returns 1.
For more information on three-way comparisons, see Wikipedia.

Trivalued comparators are very useful when sorting. You don't just want to know whether two elements are equal; you also want to know their relative order so that you know how to rearrange them to move closer to a sorted list. This is why C (strcmp) and Perl (cmp) both have similar operations (in those cases for strings, but it's the same idea).
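In Python 3, where sorted() no longer accepts a cmp argument, a three-way comparator can still be used through functools.cmp_to_key. The following is a minimal sketch; the comparator name by_length_then_alpha and the sample words are made up for illustration.

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    """Three-way comparator: negative if a sorts first, 0 if tied, positive if b sorts first."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Same length: fall back to ordinary string ordering.
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi"]
result = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(result)  # ['fig', 'kiwi', 'pear', 'apple']
```

For a rule this simple, key=lambda w: (len(w), w) would do the same job; cmp_to_key is mostly useful when porting an existing comparator.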

It is useful for sorting sequences of items: when you are sorting a list, you only need to know whether one item is greater or less than another.
More info here: http://wiki.python.org/moin/HowTo/Sorting/#The_Old_Way_Using_the_cmp_Parameter

Another use case: Finding the sign (- / +) of a number
If you want to find out what the sign (+/-) of a number is, you can simply pass 0 as the second argument to the cmp function:
cmp(-123, 0) #returns -1
cmp( 123, 0) #returns 1
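Since cmp is not available in Python 3, the same idea can be sketched with the (a > b) - (a < b) equivalent. The helper name sign is hypothetical.

```python
def sign(n):
    """Return -1, 0, or 1 according to the sign of n."""
    # Each comparison yields a bool, and bools subtract like the integers 0 and 1.
    return (n > 0) - (n < 0)

print(sign(-123))  # -1
print(sign(0))     # 0
print(sign(123))   # 1
```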

Related

What does the following line accomplish? [duplicate]

This question already has answers here:
How do "and" and "or" act with non-boolean values?
So I was trying to solve an algorithm and while trying to find other solutions to it, I found one which was very short and very fast, just one problem...I can't seem to understand what this line is doing:
Full solution:
def proper_fractions(n):
    phi = n > 1 and n
    print(phi)
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    if n > 1: phi -= phi // n
    return phi
Line that I don't understand:
phi = n > 1 and n
Please forgive me if this is very easy to understand; I have just never come across something like this, as I've only used and in if statements. Here is what I changed the line to (I think it behaves the same way, but I'm not sure how the original line achieves it):
phi = n if n > 1 else False
Please could someone clear-up how the line that I don't understand works?
As can be seen in the Python docs, Python logical operators are not necessarily "purely Boolean". In particular, and and or are not actually guaranteed to return True or False. Instead, they return one of their operands. Here's how Python defines the value of x and y:
if x is false, then x, else y
"False" in this context does not mean that x has to be the value False. Instead, it just has to be anything with a falsy value, like a zero of any type or an empty sequence or collection.
In this case, when n > 1 evaluates to False, the operator short-circuits and returns the value of n > 1, i.e. False. But if n > 1 evaluates to True, the operator simply returns n without modifying it in any way, as the docs describe.
The truth table for a and b looks like this:
True and True == True
True and False == False
False and True == False
False and False == False
We can observe three things:
When a is True, the result is always the same as b.
When a is False, the result is always False, in other words, it is always the same as a.
When a is False, b is completely irrelevant, so we don't even need to evaluate it.
Note that in Python, True and False are not the only objects that have a boolean value. In fact, every single object in Python has a boolean value. E.g. 0 has a falsey value, "Hello" has a truthy value, and so on.
So, with the optimizations we discovered about the truth table, and the added condition that we need to handle values other than True and False, we can construct the following, revised, truth table:
a and b == a # if `a` is *falsey*
a and b == b # if `a` is *truthy*
This matches up with the documentation of and:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
Similar reasoning applies to or:
a or b == b # if `a` is *falsey*
a or b == a # if `a` is *truthy*
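The revised truth tables can be checked directly. This is a small sketch; the record helper and the calls list are hypothetical names used only to show which operands get evaluated.

```python
# `and` returns the left operand if it is falsey, otherwise the right one.
print(0 and 5)          # 0
print(3 and 5)          # 5

# `or` returns the left operand if it is truthy, otherwise the right one.
print('' or 'foo')      # foo
print('bar' or 'foo')   # bar

calls = []

def record(value):
    """Remember that this operand was evaluated, then pass it through."""
    calls.append(value)
    return value

# The left operand is falsey, so the right operand is never evaluated.
result = record(0) and record(1)
print(result, calls)    # 0 [0]
```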
So, the result of the line in question:
phi = n > 1 and n
will be that phi is assigned False if n <= 1 and n if n > 1.
The further computation that is performed with phi in turn works because False is equivalent to 0 in a numeric context, i.e.
False + 1 == 1
False - 1 == -1
This makes the rest of the algorithm work, which contains statements like:
phi -= phi // p
Where arithmetic is performed with the value of phi.
See the documentation on the numeric types for details, which contains the following statement [bold emphasis mine]:
There are three distinct numeric types: integers, floating point numbers, and complex numbers. In addition, Booleans are a subtype of integers.
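For comparison, here is a sketch of the same function with the and trick replaced by an explicit conditional expression that yields the integer 0 rather than False. The name proper_fractions_explicit is hypothetical, the debugging print is left out, and n is assumed to be a non-negative integer.

```python
def proper_fractions_explicit(n):
    """Same computation as the original, but phi always starts as an int."""
    phi = n if n > 1 else 0
    # Trial division: for every prime factor p of n, remove the multiples of p.
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    # Whatever is left above 1 is a single remaining prime factor.
    if n > 1:
        phi -= phi // n
    return phi

print(proper_fractions_explicit(1))   # 0
print(proper_fractions_explicit(15))  # 8
print(proper_fractions_explicit(25))  # 20
```

The results match the original for these inputs; the only visible difference is that n <= 1 gives 0 instead of False.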
From section 6.11 of the documentation:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
Note that neither and nor or restrict the value and type they return to False and True, but rather return the last evaluated argument. This is sometimes useful, e.g., if s is a string that should be replaced by a default value if it is empty, the expression s or 'foo' yields the desired value. Because not has to create a new value, it returns a boolean value regardless of the type of its argument (for example, not 'foo' produces False rather than ''.)
So it first checks whether n > 1; if that is true, it returns n, otherwise it returns False.
It first checks whether the first operand evaluates to True or False (null values such as None are considered false). If it is True, the second value is returned; in this case, n.
More details:
> if n = 3
> 1. phi = n > 1 and n
> 2. phi = 3 > 1 and 3
> 3. phi = True and 3
> 4. phi = 3

Is “An expression that has one of two values, depending on a condition.” an accurate definition of conditional expression?

In Think Python, 2nd Edition, the author defines a conditional expression as "An expression that has one of two values, depending on a condition." After reflecting on it, I think the accuracy of the definition may be questionable. Here's a function written using a conditional expression:
def get_sign(n):
    """Returns 1 if n is a positive number, -1 if n is a negative number,
    or 0 if n is a zero
    """
    return 1 if n > 0 else -1 if n < 0 else 0
Here the conditional expression is 1 if n > 0 else -1 if n < 0 else 0. And there are two observations about that:
the expression has one of three possible values, namely 1, -1, or 0.
the value depends on two conditions, namely n > 0, and n < 0.
So, is the author's definition accurate, why or why not? Is "An expression whose value depends on one or more conditions, and that has one of several values (at least two)." a more accurate definition of conditional expression, why or why not?
You still have two outcomes. That one of those two outcomes is itself dependent on another conditional expression doesn't change this.
I've added parentheses here to illustrate my point:
1 if n > 0 else (-1 if n < 0 else 0)
So the outcome of that expression is one of these two options:
1
-1 if n < 0 else 0
That second expression is itself another conditional expression. The first value is also just an expression, which has a value once you've evaluated it; the only difference is that it produces a simple literal value. All of this makes no difference to the top-level conditional expression, it still only deals with two outcomes.
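The nesting can be confirmed with the standard ast module. This is a sketch only; the function names get_sign_flat and get_sign_nested are made up for the comparison.

```python
import ast

def get_sign_flat(n):
    return 1 if n > 0 else -1 if n < 0 else 0

def get_sign_nested(n):
    return 1 if n > 0 else (-1 if n < 0 else 0)

# Both spellings give the same answers.
samples = [-5, 0, 7]
print([get_sign_flat(n) for n in samples])  # [-1, 0, 1]
print(all(get_sign_flat(n) == get_sign_nested(n) for n in samples))  # True

# The parser sees one conditional expression whose "else" branch
# is itself a conditional expression.
tree = ast.parse("1 if n > 0 else -1 if n < 0 else 0", mode="eval")
outer = tree.body
print(isinstance(outer, ast.IfExp))         # True
print(isinstance(outer.orelse, ast.IfExp))  # True
```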
Note that only one of the expressions is actually evaluated. This matters if one of those expressions has side effects (alters state outside of the expression) or is 'expensive' in terms of memory or processing time. For example:
import time
def sleep10secs():
    time.sleep(10)
    return 'slow'
print('instant' if True else sleep10secs())
This will print instant instantly; the sleep10secs() function is never called.

What does x[x < 2] = 0 mean in Python?

I came across some code with a line similar to
x[x<2]=0
Playing around with variations, I am still stuck on what this syntax does.
Examples:
>>> x = [1,2,3,4,5]
>>> x[x<2]
1
>>> x[x<3]
1
>>> x[x>2]
2
>>> x[x<2]=0
>>> x
[0, 2, 3, 4, 5]
This only makes sense with NumPy arrays. The behavior with lists is useless, and specific to Python 2 (not Python 3). You may want to double-check if the original object was indeed a NumPy array (see further below) and not a list.
But in your code here, x is a simple list.
Since x < 2 is False, i.e. 0, x[x<2] is x[0], so x[0] gets changed.
Conversely, x[x>2] is x[True], i.e. x[1], so x[1] gets changed.
Why does this happen?
The rules for comparison are:
When you order two strings or two numeric types the ordering is done in the expected way (lexicographic ordering for string, numeric ordering for integers).
When you order a numeric and a non-numeric type, the numeric type comes first.
When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their typenames:
So, we have the following order
numeric < list < string < tuple
See the accepted answer for How does Python compare string and int?.
If x is a NumPy array, then the syntax makes more sense because of boolean array indexing. In that case, x < 2 isn't a boolean at all; it's an array of booleans representing whether each element of x was less than 2. x[x < 2] = 0 then selects the elements of x that were less than 2 and sets those cells to 0. See Indexing.
>>> x = np.array([1., -1., -2., 3])
>>> x < 0
array([False, True, True, False], dtype=bool)
>>> x[x < 0] += 20 # All elements < 0 get increased by 20
>>> x
array([ 1., 19., 18., 3.]) # Only elements < 0 are affected
>>> x = [1,2,3,4,5]
>>> x<2
False
>>> x[False]
1
>>> x[True]
2
The bool is simply converted to an integer. The index is either 0 or 1.
The original code in your question works only in Python 2. If x is a list in Python 2, the comparison x < y is False if y is an integer. This is because it does not make sense to compare a list with an integer. However in Python 2, if the operands are not comparable, the comparison is based in CPython on the alphabetical ordering of the names of the types; additionally all numbers come first in mixed-type comparisons. This is not even spelled out in the documentation of CPython 2, and different Python 2 implementations could give different results. That is [1, 2, 3, 4, 5] < 2 evaluates to False because 2 is a number and thus "smaller" than a list in CPython. This mixed comparison was eventually deemed to be too obscure a feature, and was removed in Python 3.0.
Now, the result of < is a bool; and bool is a subclass of int:
>>> isinstance(False, int)
True
>>> isinstance(True, int)
True
>>> False == 0
True
>>> True == 1
True
>>> False + 5
5
>>> True + 5
6
So basically you're taking the element 0 or 1 depending on whether the comparison is true or false.
If you try the code above in Python 3, you will get TypeError: unorderable types: list() < int() (newer releases word it as '<' not supported between instances of 'list' and 'int') due to a change in Python 3.0:
Ordering Comparisons
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other. Note that this does not apply to the == and != operators: objects of different incomparable types always compare unequal to each other.
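The Python 3 behaviour described in this excerpt can be checked with a short sketch. The exact wording of the error message differs between Python 3 releases, so the code only records it rather than relying on it.

```python
x = [1, 2, 3, 4, 5]

# Ordering a list against an int has no natural meaning in Python 3.
try:
    x < 2
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)   # wording depends on the Python 3 version

# == and != are still allowed between unrelated types and simply compare unequal.
print(x == 2)    # False
print(x != 2)    # True
```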
There are many datatypes that overload the comparison operators to do something different (dataframes from pandas, numpy's arrays). If the code that you were using did something else, it was because x was not a list, but an instance of some other class with operator < overridden to return a value that is not a bool; and this value was then handled specially by x[] (aka __getitem__/__setitem__)
This has one more use: code golf. Code golf is the art of writing programs that solve some problem in as few source code bytes as possible.
return(a,b)[c<d]
is roughly equivalent to
if c < d:
    return b
else:
    return a
except that both a and b are evaluated in the first version, but not in the second version.
c<d evaluates to True or False.
(a, b) is a tuple.
Indexing on a tuple works like indexing on a list: (3,5)[1] == 5.
True is equal to 1 and False is equal to 0.
(a,b)[c<d]
(a,b)[True]
(a,b)[1]
b
or for False:
(a,b)[c<d]
(a,b)[False]
(a,b)[0]
a
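The difference in evaluation can be made visible with a small sketch. The make helper and the evaluated list are hypothetical names that just log which operands were computed.

```python
evaluated = []

def make(label):
    """Log that this operand was computed, then return it."""
    evaluated.append(label)
    return label

c, d = 1, 2

# Tuple indexing: both elements are built before one is picked.
picked_by_index = (make('a'), make('b'))[c < d]
after_index = list(evaluated)

evaluated.clear()

# Conditional expression: only the chosen branch is evaluated.
picked_by_conditional = make('b') if c < d else make('a')
after_conditional = list(evaluated)

print(picked_by_index, after_index)              # b ['a', 'b']
print(picked_by_conditional, after_conditional)  # b ['b']
```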
There's a good list on the Stack Exchange network of many nasty things you can do to Python in order to save a few bytes: https://codegolf.stackexchange.com/questions/54/tips-for-golfing-in-python
In normal code this should never be used, though. In your case it would mean that x acts both as something that can be compared to an integer and as a container that supports indexing, which is a very unusual combination. It's probably NumPy code, as others have pointed out.
In general it could mean anything. It was already explained what it means if x is a list or numpy.ndarray but in general it only depends on how the comparison operators (<, >, ...) and also how the get/set-item ([...]-syntax) are implemented.
x.__getitem__(x.__lt__(2)) # this is what x[x < 2] means!
x.__setitem__(x.__lt__(2), 0) # this is what x[x < 2] = 0 means!
Because:
x < value is equivalent to x.__lt__(value)
x[value] is (roughly) equivalent to x.__getitem__(value)
x[value] = othervalue is (also roughly) equivalent to x.__setitem__(value, othervalue).
This can be customized to do anything you want. Just as an example (it loosely mimics NumPy's boolean indexing):
class Test:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        # You could do anything in here. For example create a new list indicating if that
        # element is less than the other value
        res = [item < other for item in self.value]
        return self.__class__(res)
    def __repr__(self):
        return '{0} ({1})'.format(self.__class__.__name__, self.value)
    def __getitem__(self, item):
        # If you index with an instance of this class use "boolean-indexing"
        if isinstance(item, Test):
            res = self.__class__([i for i, index in zip(self.value, item) if index])
            return res
        # Something else was given just try to use it on the value
        return self.value[item]
    def __setitem__(self, item, value):
        if isinstance(item, Test):
            self.value = [i if not index else value for i, index in zip(self.value, item)]
        else:
            self.value[item] = value
So now let's see what happens if you use it:
>>> a = Test([1,2,3])
>>> a
Test ([1, 2, 3])
>>> a < 2 # calls __lt__
Test ([True, False, False])
>>> a[Test([True, False, False])] # calls __getitem__
Test ([1])
>>> a[a < 2] # or short form
Test ([1])
>>> a[a < 2] = 0 # calls __setitem__
>>> a
Test ([0, 2, 3])
Notice this is just one possibility. You are free to implement almost everything you want.

Associativity of comparison operators in Python

What is the associativity of comparison operators in Python? It is straightforward for three comparisons, but for more than that, I'm not sure how it does it. They don't seem to be right- or left-associative.
For example:
>>> 7410 >= 8690 <= -4538 < 9319 > -7092
False
>>> (((7410 >= 8690) <= -4538) < 9319) > -7092
True
So, not left-associative.
>>> 81037572 > -2025 < -4722 < 6493
False
>>> (81037572 > (-2025 < (-4722 < 6493)))
True
So it's not right-associative either.
I have seen some places that they are 'chained', but how does that work with four or more comparisons?
Chained comparisons are expanded with and, so:
a <= b <= c
becomes:
a <= b and b <= c
(b is only evaluated once, though). This is explained in the language reference on comparisons.
Note that lazy evaluation means that if a > b, the result is False and b is never compared to c.
Your versions with parentheses are completely different; a <= (b <= c) will evaluate b <= c and then compare a to the result of that, so chaining isn't involved at all, and it's not meaningful to compare the results to determine associativity.
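A short sketch of how chaining behaves, including the five-operand example from the question. The traced helper and the calls list are hypothetical names used to log which operands are evaluated.

```python
calls = []

def traced(name, value):
    """Log that this operand was evaluated, then return its value."""
    calls.append(name)
    return value

# Every operand is evaluated exactly once, including the middle one.
full = traced('a', 1) <= traced('b', 5) <= traced('c', 3)
full_calls = list(calls)
print(full, full_calls)    # False ['a', 'b', 'c']

calls.clear()

# The first comparison fails, so the chain stops before c is evaluated.
short = traced('a', 9) <= traced('b', 5) <= traced('c', 3)
short_calls = list(calls)
print(short, short_calls)  # False ['a', 'b']

# A longer chain expands the same way: adjacent pairs joined with `and`.
chained = 7410 >= 8690 <= -4538 < 9319 > -7092
expanded = 7410 >= 8690 and 8690 <= -4538 and -4538 < 9319 and 9319 > -7092
print(chained, expanded)   # False False
```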
Python short-circuits boolean tests from left to right:
7410>=8690<=-4538<9319>-7092 -> False
7410>=8690 is False. That's it; the rest of the tests are not performed.
Note that
True == 1
False == 0
are both True, and this applies when you compare booleans with integers. So when you surround the statement with brackets, you force Python to do all the tests; in detail:
(((7410>=8690)<=-4538)<9319)>-7092
False <=-4538
False <9319
True >-7092
True
You are making an error with types. When you write 81037572>-2025 inside parentheses, Python evaluates it to True or False, which are treated as 1 and 0, and the next comparison is then made against that number.

Is there a difference between -1 and False in Python?

I have always thought that using -1 in a condition is always the same as writing False (the boolean value). But from my code, I get different results:
Using True and False:
def count(sub, s):
    count = 0
    index = 0
    while True:
        if string.find(s, sub, index) != False:
            count += 1
            index = string.find(s, sub, index) + 1
        else:
            return count
print count('nana', 'banana')
Result: Takes too long for the interpreter to respond.
Using 1 and -1:
def count(sub, s):
    count = 0
    index = 0
    while 1:
        if string.find(s, sub, index) != -1:
            count += 1
            index = string.find(s, sub, index) + 1
        else:
            return count
print count('nana', 'banana')
Result: 1
Why does using -1 and 1 give me the correct result whereas using the bool values True and False do not?
string.find doesn't return a boolean so string.find('banana', 'nana', index) will NEVER return 0 (False) regardless of the value of index.
>>> import string
>>> help(string.find)
Help on function find in module string:
find(s, *args)
find(s, sub [, start [, end]]) -> int
Return the lowest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
>>>
Your example simply repeats:
index = string.find('banana', 'nana', 0) + 1 # index = 3
index = string.find('banana', 'nana', 3) + 1 # index = 0
The -1 version works because it correctly interprets the return value of string.find!
False is of type bool, which is a sub-type of int, and its value is 0.
In Python, False is similar to using 0, not -1
There's a difference between equality and converting to a boolean value for truth testing, for both historical and flexibility reasons:
>>> True == 1
True
>>> True == -1
False
>>> bool(-1)
True
>>> False == 0
True
>>> bool(0)
False
>>> True == 2
False
>>> bool(2)
True
I have always thought that using -1 in a condition is always the same as writing False (the boolean value).
1) No. It is never the same, and I can't imagine why you would have ever thought this, let alone always thought it. Unless for some reason you had only ever used if with string.find or something.
2) You shouldn't be using the string module in the first place. Quoting directly from the documentation:
DESCRIPTION
Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object. They used to be implemented by
a built-in module called strop, but strop is now obsolete itself.
So instead of string.find('foobar', 'foo'), we use the .find method of the str class itself (the class that 'foobar' and 'foo' belong to); and since we have objects of that class, we can make bound method calls, thus: 'foobar'.find('foo').
3) The .find method of strings returns a number that tells you where the substring was found, if it was found. If the substring wasn't found, it returns -1. It cannot return 0 in this case, because that would mean "was found at the beginning".
4) False will compare equal to 0. It is worth noting that Python actually implements its bool type as a subclass of int.
5) No matter what language you are using, you should not compare to boolean literals. x == False or equivalent is, quite simply, not the right thing to write. It gains you nothing in terms of clarity, and creates opportunities to make mistakes.
You would never, ever say "If it is true that it is raining, I will need an umbrella" in English, even though that is grammatically correct. There is no point; it is not more polite nor more clear than the obvious "If it is raining, I will need an umbrella".
If you want to use a value as a boolean, then use it as a boolean. If you want to use the result of a comparison (i.e. "is the value equal to -1 or not?"), then perform the comparison.
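Putting these points together, a Python 3 version of the counting function might look like the sketch below. It uses the str.find method and compares its result to -1 explicitly; the name count_overlapping is hypothetical.

```python
def count_overlapping(sub, s):
    """Count occurrences of sub in s, including overlapping ones."""
    count = 0
    index = s.find(sub)
    # str.find returns -1 when nothing is found; 0 is a valid match position.
    while index != -1:
        count += 1
        index = s.find(sub, index + 1)
    return count

print(count_overlapping('nana', 'banana'))  # 1
print(count_overlapping('ana', 'banana'))   # 2
```

Calling find once per iteration also avoids searching the same position twice, as the original loop did.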
