Why is int(50)<str(5) in python 2.x? - python

In Python 3, int(50)<'2' causes a TypeError, and well it should. In Python 2.x, however, int(50)<'2' returns True (this is also the case for other numeric types, but int exists in both Py2 and Py3). My question, then, has several parts:
Why does Python 2.x (< 3?) allow this behavior?
(And who thought it was a good idea to allow this to begin with???)
What does it mean that an int is less than a str?
Is it referring to ord / chr?
Is there some binary format which is less obvious?
Is there a difference between '5' and u'5' in this regard?

It works like this1.
>>> float() == long() == int() < dict() < list() < str() < tuple()
True
Numbers compare as less than containers. Numeric types are converted to a common type and compared based on their numeric value. Containers are compared by the alphabetic value of their names.2
From the docs:
CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don't support proper comparison are ordered by their address.
Objects of different builtin types compare alphabetically by the name of their type: int starts with an 'i' and str starts with an 's', so any int is less than any str.
I have no idea.
A drunken master.
It means that a formal order has been introduced on the builtin types.
It's referring to an arbitrary order.
No.
No. Strings and unicode objects are considered the same for this purpose. Try it out.
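In Python 3 the distinction has disappeared entirely: the u'' prefix is still accepted for compatibility, but it produces an ordinary str:

```python
# In Python 3, u'5' and '5' are the same type and compare equal
assert '5' == u'5'
assert type(u'5') is str
print(type('5') is type(u'5'))  # True
```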
In response to the comment about long < int
>>> int < long
True
You probably meant values of those types though, in which case the numeric comparison applies.
1 This is all on Python 2.6.5
2 Thanks to kRON for clearing this up for me. I'd never thought to compare a number to a dict before, and comparison of numbers is one of those things that's so obvious it's easy to overlook.

The reason why these comparisons are allowed, is sorting. Python 2.x can sort lists containing mixed types, including strings and integers -- integers always appear first. Python 3.x does not allow this, for the exact reasons you pointed out.
Python 2.x:
>>> sorted([1, '1'])
[1, '1']
>>> sorted([1, '1', 2, '2'])
[1, 2, '1', '2']
Python 3.x:
>>> sorted([1, '1'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()
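If you ever want the old mixed-list sorting back under Python 3, it can be approximated with a sort key. This is only a sketch: py2_key is a made-up name, and it only distinguishes numbers from everything else, grouping the rest by type name as Python 2 did.

```python
def py2_key(obj):
    """Approximate Python 2's mixed-type ordering (a sketch).

    Numbers sort first, by value; everything else is grouped by
    type name and sorted by value within its own group."""
    if isinstance(obj, (int, float)):
        return (0, '', obj)
    return (1, type(obj).__name__, obj)

print(sorted([1, '1', 2, '2'], key=py2_key))  # [1, 2, '1', '2']
```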

(And who thought it was a good idea to allow this to begin with???)
I can imagine that the reason might be to allow objects of different types to be stored in tree-like structures, which use comparisons internally.

As Aaron said. Breaking it up into your points:
Because it makes sort do something halfway usable where it otherwise would make no sense at all (mixed lists). It's not a good idea generally, but much in Python is designed for convenience over strictness.
Ordered by type name. This means things of the same type group together, where they can be sorted. They should probably be grouped by type class, such as numbers together, but there's no proper type class framework. There may be a few more specific rules in there (there probably is one for numeric types); I'd have to check the source.
One is a string and the other is unicode. They may have a direct comparison operation, but it's conceivable that a non-comparable type would get grouped between them, causing a mess. I don't know if there's code to avoid this.
So, it doesn't make sense in the general case, but occasionally it's helpful.
from random import shuffle

letters = list('abcdefgh')
ints = range(8)
both = ints + letters
shuffle(ints)
shuffle(letters)
shuffle(both)
print sorted(ints + letters)
print sorted(both)
Both print the ints first, then the letters.
As a rule, you don't want to mix types randomly within a program; Python 3 prevents it, while Python 2 tries to make vague sense where none exists. You could still sort with lambda a, b: cmp(repr(a), repr(b)) (or something better) if you really want to, but the language developers evidently agreed it's impractical default behaviour. I expect opinions vary on which gives the least surprise, but a problem of this kind is a lot harder to detect under the Python 2 behaviour.

Related

Why does python max('a', 5) return the string value?

While tracing back a ValueError: cannot convert float NaN to integer, I found out that the lines:
max('a', 5)
max(5, 'a')
will return 'a' instead of 5.
In the above case I used the example string a but in my actual case the string is a NaN (the result of a fitting process that failed to converge).
What is the rationale behind this behaviour? Why doesn't python recognize automatically that there's a string there and that it should return the number?
Even more curious is that min() does work as expected since:
min('a', 5)
min(5, 'a')
returns 5.
In Python 2, numeric values always sort before strings and almost all other types:
>>> sorted(['a', 5])
[5, 'a']
Numbers, then, are considered smaller than strings. When using max(), that means the string is picked over a number.
That numbers are smaller is an arbitrary implementation choice. See the Comparisons documentation:
The operators <, >, ==, >=, <=, and != compare the values of two objects. The objects need not have the same type. If both are numbers, they are converted to a common type. Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily.
Bold emphasis mine.
Python 2 tried really hard to make heterogeneous types sortable, which has caused a lot of hard-to-debug problems, such as programmers trying to compare integers with strings and getting unexpected results. Python 3 corrected this mistake; you'll get a TypeError instead:
>>> max(5, 'a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() > int()
I've written elsewhere about the ordering rules, and even re-implemented the Python 2 rules for Python 3, if you really wanted those back.
In CPython 2.x strings are always greater than numbers; that's why you see those behaviors.
OTOH, I don't get why you think that 5 is "obviously" greater than "a"... Values of different types are comparable just for convenience (e.g. if you are building an RB tree with heterogeneous keys you want everything to be comparable), and such comparisons do define a strict weak ordering, but inter-type comparisons are not intended to be sensible in any way (how do you compare a number to a string or an object?), just coherent.

Passing dictionary to a function with **

I'm trying to understand the following.
def exp(**argd):
    print(argd)

a = {1: 'a', 2: 'b'}
exp(**a)
This will give TypeError: exp() keywords must be strings.
This works fine if I use a = {'1': 'a', '2': 'b'}. Why can't I pass a number as a dictionary key to the exp function?
exp(**a) in your example expands, conceptually, to exp(1='a', 2='b'), which is an error because integer literals cannot be used as keyword argument names.
You might think, why doesn't the ** process cast keys into strings as part of the expansion? There's no one singular reason, but in general Python's philosophy is "explicit is better than implicit", and implicit casting can have some pitfalls -- many object types that are distinct from each other, for instance, will cast to the same string, which could cause unintended consequences if you relied on implicit string casting during expansion.
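Given that the cast won't happen implicitly, the workaround is to do it explicitly before unpacking. A sketch using the question's own exp (returning rather than printing so the result can be inspected):

```python
def exp(**argd):
    return argd

a = {1: 'a', 2: 'b'}
# Convert the keys to strings ourselves before ** unpacking
result = exp(**{str(k): v for k, v in a.items()})
print(result)  # {'1': 'a', '2': 'b'}
```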
because you cannot (Guido is probably the only one who can tell you why) ... it makes them partially adhere to variable naming rules ... the **a_dict unpacks the dict
a={1:'a',2:'b'}
exp(**a) #is basically exp(1='a',2='b')
which is obviously a syntax error
although it does allow funny things like
a = {'a variable':7,'some$thing':88}
exp(**a)
as long as they are strings... it seems the only rule they enforce is that they are strings... this is likely to guarantee that they are hashable (a huge guess...)
disclaimer: this is probably a gross oversimplification
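The strings-only rule really does appear to be the only one enforced: keys that aren't valid identifiers still unpack fine into a **kwargs catch-all, as the "funny things" example above suggests. A quick check:

```python
def exp(**kw):
    return kw

d = {'a variable': 7, 'some$thing': 88}
# Neither key is a legal identifier, but ** unpacking accepts them
print(exp(**d))  # {'a variable': 7, 'some$thing': 88}
```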

Python min() of arguments with different types

I was surprised recently to find that you can take the min() of arguments of different types in Python and not get a ValueError.
min(3, "blah") ==> 3
min(300, 'zzz') ==> 300
The documentation is unclear on this - it just says min() takes "the smallest of the arguments". How is it actually determining which element is the smallest?
It determines this by comparing them using the usual rules. If objects are of different types and can't be compared sensibly (because neither of them implement the required special methods, or the implementation doesn't work with the type of the other object) then they are given a consistent order by type; e.g., all integers are less than all strings. Try it: 1 < "1"
(By the way, Booleans are implemented as a subclass of integers, and can be compared with numbers, so they'll sort False as 0 and True as 1.)
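Both halves of that parenthetical can be seen in one go:

```python
# bool is a subclass of int: False behaves as 0, True as 1
assert True == 1 and False == 0
print(sorted([True, 0.5, False]))  # [False, 0.5, True]
```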
It was implemented this way so that if you sort a list containing various types, like types will be sorted together. In Python 3 forward, however, this was changed and you can no longer implicitly compare dissimilar types.
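Under Python 3 the same comparison fails loudly instead of returning an arbitrary answer:

```python
# In Python 3, comparing dissimilar types raises TypeError
try:
    1 < "1"
except TypeError as e:
    print("refused:", e)
```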
In Python 2, there existed an arbitrary but predictable comparison of values with different types. I think it was something like lexicographic comparison of the type name (ints < floats < strs < tuples).
It is correct according to Python's definition of ordering:
>>> 3 < "blah"
True
>>> 300 < 'zzz'
True
The rule is:
If both are numbers, they are converted to a common type. Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily.

Python Remove last char from string and return it

While I know that there is the possibility:
>>> a = "abc"
>>> result = a[-1]
>>> a = a[:-1]
Now I also know that strings are immutable and therefore something like this:
>>> a.pop()
c
is not possible.
But is this really the preferred way?
Strings are "immutable" for good reason: It really saves a lot of headaches, more often than you'd think. It also allows python to be very smart about optimizing their use. If you want to process your string in increments, you can pull out part of it with split() or separate it into two parts using indices:
a = "abc"
a, result = a[:-1], a[-1]
This shows that you're splitting your string in two. If you'll be examining every byte of the string, you can iterate over it (in reverse, if you wish):
for result in reversed(a):
...
I should add this seems a little contrived: Your string is more likely to have some separator, and then you'll use split:
ans = "foo,blah,etc."
for a in ans.split(","):
...
Not only is it the preferred way, it's the only reasonable way. Because strings are immutable, in order to "remove" a char from a string you have to create a new string whenever you want a different string value.
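If you really want pop-like behaviour, the usual workaround is to go through a list, which is mutable, and join back to a string at the end (a sketch):

```python
s = "abc"
chars = list(s)      # a mutable list of characters
last = chars.pop()   # 'c' — removes and returns the last element
s = "".join(chars)   # "ab" — a brand-new string
print(s, last)
```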
You may be wondering why strings are immutable, given that you have to make a whole new string every time you change a character. After all, C strings are just arrays of characters and are thus mutable, and some languages that support strings more cleanly than C allow mutable strings as well. There are two reasons to have immutable strings: security/safety and performance.
Security is probably the most important reason for strings to be immutable. When strings are immutable, you can't pass a string into some library and then have that string change from under your feet when you don't expect it. You may wonder which library would change string parameters, but if you're shipping code to clients you can't control their versions of the standard library, and malicious clients may change out their standard libraries in order to break your program and find out more about its internals. Immutable objects are also easier to reason about, which is really important when you try to prove that your system is secure against particular threats. This ease of reasoning is especially important for thread safety, since immutable objects are automatically thread-safe.
Performance is surprisingly often better for immutable strings. Because a string can never change, the runtime can intern it, cache its hash, and share references to it freely without ever making a defensive copy. You get copy semantics without actually copying, which is a real performance win.
Eric Lippert explains more about the rationale behind the immutability of strings (in C#, not Python) here.
The precise wording of the question makes me think it's impossible.
return to me means you have a function, which you have passed a string as a parameter.
You cannot change this parameter. Assigning to it will only change the value of the parameter within the function, not the passed in string. E.g.
>>> def removeAndReturnLastCharacter(a):
...     c = a[-1]
...     a = a[:-1]
...     return c
...
>>> b = "Hello, Gaukler!"
>>> removeAndReturnLastCharacter(b)
'!'
>>> b    # b has not been changed
'Hello, Gaukler!'
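If you want both pieces back from a function, return them as a tuple instead of trying to mutate the argument. pop_last here is a hypothetical helper name, not anything from the standard library:

```python
def pop_last(s):
    # Return (everything but the last char, the last char) as new objects;
    # the original string is never touched.
    return s[:-1], s[-1]

rest, last = pop_last("Hello, Gaukler!")
# rest == "Hello, Gaukler", last == "!"
```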
Yes, python strings are immutable and any modification will result in creating a new string. This is how it's mostly done.
So, go ahead with it.
I decided to go with a for loop and just skip the item in question. Is it an acceptable alternative?
new = ''
for item in str:
    if item == str[n]:
        continue
    else:
        new += item
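A note on that loop: item == str[n] removes every character equal to str[n], not just the one at index n, and naming the variable str shadows the builtin. Slicing sidesteps both issues (remove_at is a hypothetical helper):

```python
def remove_at(s, n):
    # Remove only the character at index n; equal characters
    # elsewhere in the string survive.
    return s[:n] + s[n + 1:]

print(remove_at("banana", 1))  # bnana  (only the first 'a' is gone)
```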

Why does Python not perform type conversion when concatenating strings?

In Python, the following code produces an error:
a = 'abc'
b = 1
print(a + b)
(The error is "TypeError: cannot concatenate 'str' and 'int' objects").
Why does the Python interpreter not automatically try using the str() function when it encounters concatenation of these types?
The problem is that the conversion is ambiguous, because + means both string concatenation and numeric addition. The following question would be equally valid:
Why does the Python interpreter not automatically try using the int() function when it encounters addition of these types?
This is exactly the loose-typing problem that unfortunately afflicts Javascript.
There's a very large degree of ambiguity with such operations. Suppose this case instead:
a = '4'
b = 1
print(a + b)
It's not clear if a should be coerced to an integer (resulting in 5), or if b should be coerced to a string (resulting in '41'). Since type juggling rules are transitive, passing a numeric string to a function expecting numbers could get you in trouble, especially since almost all arithmetic operators have overloaded operations for strings too.
For instance, in Javascript, to make sure you deal with integers and not strings, a common practice is to multiply a variable by one; in Python, the multiplication operator repeats strings, so '41' * 1 is a no-op. It's probably better to just ask the developer to clarify.
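Concretely, the two operators do very different things on a numeric string in Python, which is why an explicit conversion is the only unambiguous option:

```python
value = '41'
print(value * 2)       # 4141 — * repeats strings; it does not convert
print(int(value) + 1)  # 42   — explicit conversion gives arithmetic
```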
The short answer would be because Python is a strongly typed language.
This was a design decision made by Guido. It could have gone one way or the other, really: concatenating str and int to either str or int.
The best explanation is still the one given by Guido; you can check it here
The other answers have provided pretty good explanations, but have failed to mention that this feature is known as strong typing. Languages that perform implicit conversions are weakly typed.
Python deliberately does not perform implicit type conversion when concatenating strings. This behavior is by design, and you should get in the habit of performing explicit type conversions when you need to coerce objects into strings or numbers.
Change your code to:
a = 'abc'
b = 1
print(a + str(b))
And you'll see the desired result.
Python would have to know what's in the string to do it correctly. There's an ambiguous case: what should '5' + 5 generate? A number or a string? That should certainly throw an error. Now, to determine whether that situation holds, Python would have to examine the string every time you try to concatenate or add two things. Better to just let the programmer convert the string explicitly.
More generally, implicit conversions like that are just plain confusing! They're hard to predict, hard to read, and hard to debug.
That's just how they decided to design the language. Probably the rationale is that requiring explicit conversions to string reduces the likelihood of unintended behavior (e.g. integer addition if both operands happen to be ints instead of strings).
Wrap the int in a list to disambiguate the '+' operation:
['foo', 'bar'] + [5]
This returns: ['foo', 'bar', 5]
