Comparing strings and numbers in Python [duplicate]

Why does the following code behave the way it does?
>>> '10' > 100
True
>>> 100 < '10'
True
Shouldn't it raise an exception?

From the documentation:
CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.
So it's just something that happens in CPython ('int' < 'str'), but that isn't guaranteed to happen in other implementations.
In fact, this behaviour has been removed in python3:
>>> '10' > 100
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() > int()
>>> 100 < '10'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
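In Python 3 the comparison therefore has to be made explicit. A minimal sketch, assuming the string is meant to hold a decimal integer; the helper name is_greater is hypothetical and not part of the question:

```python
def is_greater(text, number):
    """Compare a numeric string with an int by converting the string first."""
    return int(text) > number


print(is_greater('10', 100))        # False: 10 is not greater than 100

try:
    result = '10' > 100             # mixed str/int ordering is rejected in Python 3
except TypeError as exc:
    print('TypeError:', exc)
```

Converting the int to a string instead (str(number)) would give a lexicographic comparison, which is rarely what is wanted for numbers.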

From the manual:
CPython implementation detail: Objects of different types except numbers are
ordered by their type names; objects of the same types that don’t
support proper comparison are ordered by their address.
So if you compare these two types (int and str), the result follows the lexicographic order of the type names.
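The rule quoted above can be approximated in Python 3 for illustration. This is only a sketch under assumptions: the function names are hypothetical, numbers are assumed to rank before every other type, and special cases such as None and same-named types ordered by address are ignored.

```python
import numbers


def py2_type_rank(obj):
    """Name used for ordering: numbers rank first, other objects by type name."""
    return '' if isinstance(obj, numbers.Number) else type(obj).__name__


def py2_style_less(left, right):
    """Approximate Python 2's `left < right` for mixed built-in types."""
    rank_left, rank_right = py2_type_rank(left), py2_type_rank(right)
    if rank_left == rank_right:
        return left < right          # same kind: ordinary comparison
    return rank_left < rank_right    # different kinds: compare the type names


print(py2_style_less(100, '10'))     # True, the number ranks before 'str'
print(py2_style_less('10', 100))     # False
print(py2_style_less([1], (1,)))     # True, because 'list' < 'tuple'
```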

Different operators get called: in one case int's __gt__, in the other str's __lt__.
Check this (Python 2):
class t(int):
    def __gt__(self, other):
        # v was undefined in the snippet as posted; computing it from the
        # plain int value is an assumption
        v = int(self) > other
        print 'here', v
        return v

class t2(str):
    def __lt__(self, other):
        v = super(t2, self).__lt__(other)
        print 'ohere', v
        return v

if __name__ == '__main__':
    a = t('10')
    b = t2(100)
    a > b
    b < a

Because Python implements implicit type conversions for numbers so they may be printed as strings without doing an explicit conversion.
Python is converting 1000 into the string "1000" when doing the comparison to the string "10". And according to the Python interpreter, "1000" is indeed larger than "10".
This is why: "I've got %s bananas" % 5000 works, and unlike in C or another language without implicit type conversion, I didn't have to do printf("I've got %i bananas", 5000);
Check out Python docs chapter 5: Built-in Types

I am not 100% sure, but some internal type conversion might be happening here. It might be doing what is called lexicographic comparison, where '1' which is 49 in ASCII is greater than 1 (the first digit), and so on.

Related

When I run this python command in the interpreter, I get a TypeError [duplicate]

I use a negative index in replacement fields to output a formatted list, but it raises a TypeError. The code is as follows:
>>> a=[1,2,3]
>>> a[2]
3
>>> a[-1]
3
>>> 'The last:{0[2]}'.format(a)
'The last:3'
>>> 'The last:{0[-1]}'.format(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
It's what I would call a design glitch in the format string specs. Per the docs,
element_index ::= integer | index_string
but, alas, -1 is not "an integer" -- it's an expression. The unary-minus operator doesn't even have particularly high priority, so that for example print(-2**2) emits -4 -- another common issue and arguably a design glitch (the ** operator has higher priority, so the raise-to-power happens first, then the change-sign requested by the lower priority unary -).
Anything in that position in the format string that's not an integer (but, for example, an expression) is treated as a string, to index a dict argument -- for example:
$ python3 -c "print('The last:{0[2+2]}'.format({'2+2': 23}))"
The last:23
Not sure whether this is worth raising an issue in the Python trac, but it's certainly a somewhat surprising behavior:-(.
There are a few problems here, once you start digging:
The item in question is called "element_index" which is defined to be an integer.
Problem 1: unless users follow the link from "integer" to the language reference manual, they won't know that -1 is deemed to be an expression, not an integer. By the way, anyone tempted to say "works as documented" should see problem 7 first :-)
Preferred solution: change the definition so that "element_index" can have an optional '-' before the integer.
It's an integer, right? Not so fast ... later the docs say that "an expression of the form '[index]' does an index lookup using __getitem__()"
Problem 3: Should say '[element_index]' (index is not defined).
Problem 4: Not everybody knows off the top of their heads what __getitem__() does. Needs clearer docs.
So we can use a dict here as well as an integer, can we? Yes, with a problem or two:
The element_index is an integer? Yes, that works with a dict:
>>> "{0[2]}".format({2: 'int2'})
'int2'
It seems that we can also use non-integer strings, but this needs more explicit documentation (Problem 5):
>>> "{0[foo]}".format({'foo': 'bar'})
'bar'
But we can't use a dict with a key like '2' (Problem 6):
>>> "{0[2]}".format({'2': 'str2'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 2
>>> "{0['2']}".format({'2': 'str2'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: "'2'"
Problem 7: That "integer" should really be documented to be "decimalinteger" ... 0x22 and 0b11 are treated as str, and 010 (an "octalinteger") is treated as 10, not 8:
>>> "{0[010]}".format('0123456789abcdef')
'a'
Update: PEP 3101 tells the true story:
"""
The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string.
Because keys are not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings "10" or ":-]") from within a format string.
"""
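The key-parsing rule can be observed directly by recording what str.format passes to __getitem__. A small sketch; KeyRecorder is a hypothetical helper written for this illustration, not part of any library:

```python
class KeyRecorder:
    """Remember every key that str.format looks up, and echo it back."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return key


recorder = KeyRecorder()
"{0[2]} {0[-1]} {0[foo]} {0[010]} {0[0x22]}".format(recorder)

# Keys made only of decimal digits arrive as int, everything else as str
print(recorder.keys)   # [2, '-1', 'foo', 10, '0x22']
```

This matches the observations above: -1 and 0x22 arrive as strings, and 010 arrives as the integer 10.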
Correct, it does not work. A solution is to index the list outside the format string:
>>> 'The last:{0}'.format(a[-1])
'The last:3'
I often take Python format strings as config options - with the format string provided with a specific, known list of keyword arguments. Therefore addressing the indexes of a variable length list forwards or backwards within the format string is exactly the kind of thing I end up needing.
I've just written this hack to make the negative indexing work:
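import re

string_to_tokenise = "Hello_world"
# Split on anything that is not a letter or a digit
tokens = re.split(r"[^A-Z\d]+", string_to_tokenise, flags=re.I)
# Non-negative keys stay integers; negative keys become strings such as '-1'
token_dict = {str(i) if i < 0 else i: tokens[i] for i in range(-len(tokens) + 1, len(tokens))}
print("{thing[0]} {thing[-1]}".format(thing=token_dict))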
Result:
Hello world
To explain: instead of passing in the list of tokens, I create a dictionary with integer keys 0 to len(..)-1 for indexing from the start, plus negative keys -1 to -(len(..)-1) for indexing from the end. The negative keys are converted from integers to strings, as that's how format will interpret them.
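An alternative to building the dictionary is a thin wrapper that converts string keys back to integers. This is a sketch only; StringIndexList is a hypothetical name, and it assumes every string key that reaches it is a valid integer literal such as '-1':

```python
class StringIndexList:
    """Wrap a list so that format-string keys such as '-1' index it."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, key):
        # str.format passes non-digit keys (e.g. '-1') as strings
        if isinstance(key, str):
            key = int(key)
        return self._items[key]


tokens = StringIndexList(["Hello", "world"])
result = "{thing[0]} {thing[-1]}".format(thing=tokens)
print(result)   # Hello world
```

Unlike the dictionary, the wrapper does not need to enumerate every index in advance, but a non-numeric key raises ValueError instead of KeyError.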

(Help) TypeError: 'str' object cannot be interpreted as an integer

Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
get_odd_palindrome_at('racecar', 3)
File "C:\Users\musar\Documents\University\Courses\Python\Assignment 2\palindromes.py", line 48, in get_odd_palindrome_at
for i in range(string[index:]):
TypeError: 'str' object cannot be interpreted as an integer
I want to use the value index refers to but how do I do that?
It seems from your error that the 'index' variable is a string, not an int. You could convert it using int().
index = int(index)
for i in range(string[index:]):
Now, string[index:] will also be a string, so you would need to convert that too:
>>> string = "5"
>>> range(string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: range() integer end argument expected, got str.
>>> range(int(string))
[0, 1, 2, 3, 4]
>>>
That's assuming that string[index:] only contains a number. If that's not always the case, you can do something like:
# 'index' contains only numbers
index = int(index)
number = string[index:]
if number.isdigit():
    number = int(number)
    for i in range(number):
        pass  # loop body goes here
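If the goal was instead to loop over the positions from index to the end of the string (an assumption, since the question does not show the rest of the function), range() needs integers rather than a slice. A minimal sketch with a hypothetical helper name:

```python
def positions_from(string, index):
    """Return the indexes of `string` from `index` to the end."""
    index = int(index)                      # accepts 3 as well as '3'
    return list(range(index, len(string)))


print(positions_from('racecar', 3))         # [3, 4, 5, 6]
```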
From the Wikipedia article on Python:
Python uses duck typing and has typed objects but untyped variable names. Type constraints are not checked at compile time; rather, operations on an object may fail, signifying that the given object is not of a suitable type. Despite being dynamically typed, Python is strongly typed, forbidding operations that are not well-defined (for example, adding a number to a string) rather than silently attempting to make sense of them.
In this case, you try to pass a string to range(). This function expects an integer. That's why you need to convert your string to int. You could actually do a bit more checking, depending on your needs. Python cares about types.
HTH,

Why doesn't join() automatically convert its arguments to strings? When would you ever not want them to be strings?

We have a list:
myList = [1, "two"]
And want to print it out, normally I would use something like:
"{0} and {1}".format(*myList)
But you could also do:
" and ".join(myList)
But unfortunately:
>>> " and ".join(myList)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, int found
Why doesn't it just automatically convert the list it receives to strings?
When would you ever not need it to convert them to strings? Is there some tiny edge case I'm missing?
From the Zen of Python:
Explicit is better than implicit.
and
Errors should never pass silently.
Converting to strings implicitly can easily hide bugs, and I'd really want to know if I suddenly have different types somewhere that were meant to be strings.
If you want to explicitly convert to strings, you can do so using map(), for example:
''.join(map(str, myList))
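Applied to the list from the question, with the same " and " separator, the explicit conversion might look like this (a sketch; a generator expression is used here in place of map()):

```python
my_list = [1, "two"]

# Convert each item explicitly, then join with the separator
joined = " and ".join(str(item) for item in my_list)
print(joined)   # 1 and two
```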
The problem with attempting to execute something like x = 4 + "8" as written is that the intended meaning is ambiguous. Should x contain "48" (implicitly converting 4 to str) or 12 (implicitly converting "8" to int)? We can't justify either result.
To avoid this confusion, Python requires explicit conversion of one of the operands:
>>> x = str(4) + "8"
>>> y = 4 + int("8")
>>> print x
48
>>> print y
12
Using the correct type is part of programming in Python. A general built-in like print does do the conversion (if the class supports __str__), which is where you should be doing it:
Let print do the work:
print(*myList, sep = " and ")
That's for Python 3; if you are still on Python 2, first add:
from __future__ import print_function

converting hex to int, the 'L' character [duplicate]

I have a 64bit hex number and I want to convert it to unsigned integer. I run
>>> a = "ffffffff723b8640"
>>> int(a,16)
18446744071331087936L
So what is the 'L' at the end of the number?
The following commands don't help either:
>>> int(a,16)[:-1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'long' object is unsubscriptable
>>> int(a,16).rstrip("L")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'long' object has no attribute 'rstrip'
Python 2.x has two classes of integer (neither of them unsigned, by the way). There is the usual class int, which is based on your system's C long (often a 4-byte integer). There is also long, an arbitrary-"precision" integer type. They behave the same in almost [1] all circumstances, and int objects automatically convert to long if they overflow. Don't worry about the L in the representation: it just means your integer is too big for int, so Python automatically created a long instead.
It is also worth pointing out that Python 3.x removed Python 2.x's int in favor of always using long. Since long is now the only integer type, it was renamed to int, as that name is much more common in code. PEP 237 gives more of the rationale behind this decision.
[1] The only time they behave differently that I can think of is that long's __repr__ adds that extra L on the end that you're seeing.
You are trying to apply string methods to an integer. But the string representation of a long integer doesn't have the L at the end:
In [1]: a = "ffffffff723b8640"
In [2]: int(a, 16)
Out[2]: 18446744071331087936L
In [3]: str(int(a, 16))
Out[3]: '18446744071331087936'
The __repr__ does, though (as @mgilson notes):
In [4]: repr(int(a, 16))
Out[4]: '18446744071331087936L'
In [5]: repr(int(a, 16))[:-1]
Out[5]: '18446744071331087936'
You can't call rstrip on an integer; you have to call it on the string representation of the integer.
>>> a = "ffffffff723b8640"
>>> b = int(a,16)
>>> c = repr(b).rstrip("L")
>>> c
'18446744071331087936'
Note however, that this would only be for displaying the number or something. Turning the string back into an integer will append the 'L' again:
>>> int(c)
18446744071331087936L
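For display purposes no stripping is needed at all, since str() and the formatting operators never include the L. A short sketch using the value from the question; it behaves the same in Python 3, where there is only one integer type:

```python
hex_text = "ffffffff723b8640"
value = int(hex_text, 16)

print(str(value))            # 18446744071331087936
print("%d" % value)          # same digits, no trailing L
print(format(value, "x"))    # ffffffff723b8640
```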
