Wrong boolean result in simple expression - python

I am getting a strange result in this boolean in python. I keep getting the wrong result.
string = '94070'
string[0:2] is '95' or string[0:2] is '94'
returns False, but when I hardcode in the value '94', it works
'94' is '95' or '94' is '94'
returns True. I've checked the data types and they are both of type 'str' so I'm not sure what is going on here.

Use == instead of is. In Python, the is operator does an object identity check. The == operator checks two objects (which may be different objects) to see whether they contain the same contents.

is is an identity test (is this the exact same object?), not an equality test. While is sometimes works by coincidence, as an implementation detail, for objects that aren't logically singletons, it shouldn't be used like this; test value equality with ==.
Your test of '94' is '94' can work due to a couple of related possibilities:
Python often coalesces constant literals in a function (sometimes only on a single line)
String literals are often interned by Python, so the same string literal expressed anywhere in the code references a common copy of that string
When you slice off bits of a string, interning isn't involved, so the identity test fails.
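A minimal sketch of the fix for the snippet in the question, assuming the goal is to test a prefix of the string. The names zip_code, prefix, matches and matches_in are illustrative and not from the original post:

```python
# Compare string values with == (or membership), not with `is`.
zip_code = '94070'
prefix = zip_code[0:2]          # slicing builds a new str object at runtime

# Value comparison: true whenever the characters match.
matches = prefix == '95' or prefix == '94'

# Equivalent, and a little tidier when there are several candidates.
matches_in = prefix in ('94', '95')

print(matches, matches_in)
```

Both tests depend only on the characters in the string, so they give the same answer whether or not the interpreter happens to share the underlying objects.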

Use is to see if two arguments refer to the same object and == to see if they have the same value.
>>> a = 'this is some text.'
>>> b = 'this is some text.'
>>> a == b
True
>>> a is b
False
>>> a = 'this is some text.'
>>> b = a
>>> a == b
True
>>> a is b
True

Simplifying "a == True:" to "a" - is it a good idea?

PEP8 suggests the following code should be simplified.
The original
if a == True:
The suggestion
if a:
However, these two are not the same. I found this out when I followed the PEP8 recommendation. Try the following code:
import numpy as np
a = np.nan
if a == True:
    print('a is True')
else:
    print('a is not True')
if a:
    print('a is True')
else:
    print('a is not True')
You will see that the first test reports that a is not True (correctly), while the second one incorrectly reports that a is True.
a is not True
a is True
What is the point of this misleading suggestion?
You are misreading the PEP8 style guide. Here is the relevant part (emphasis mine):
Don't compare boolean values to True or False using ==:
# Correct:
if greeting:
# Wrong:
if greeting == True:
Since np.nan is not a boolean value, this advice does not apply.
Note that if you are comparing a numeric value to True, then you are normally doing something wrong in the first place. The numeric values 1 and 1.0 are both equal to True, so if you have a variable which could be either numeric or boolean, this test may give you unexpected results. It is also generally an anti-pattern to have a variable which could be either a boolean or something other than a boolean.
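To make the difference between the three tests concrete, here is a small sketch. The helper name compare_to_true is hypothetical and exists only for this illustration:

```python
def compare_to_true(value):
    """Return (value == True, bool(value), value is True) for one value."""
    equals_true = (value == True)   # value comparison; 1 and 1.0 pass this
    truthy = bool(value)            # what `if value:` actually tests
    is_true = (value is True)       # identity with the bool singleton True
    return equals_true, truthy, is_true


for sample in [True, 1, 1.0, 2, 0, float('nan'), 'yes']:
    print(repr(sample), compare_to_true(sample))
```

For a variable that really is a boolean, all three tests agree, which is why the style guide prefers the shortest one. They only diverge for non-boolean values such as 2 or nan.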
First off, np.nan works the same way as float('nan'); it is an ordinary Python float.
import numpy as np
print(type(np.nan)) # <class 'float'>
The Python documentation says:
By default, an object is considered true unless its class defines
either a __bool__() method that returns False or a __len__() method
that returns zero, when called with the object.
Then for built-in numeric types, it says any zeros are considered False:
zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
The only float that is considered False is 0.0; any other float, including nan, is considered True.
So:
print(bool(float('nan'))) # True
NumPy behaves the same way as plain Python here.
When you write if obj:, Python gets the truth value of obj the way bool() does, which looks at the __bool__ and __len__ special methods (__bool__ takes priority if it is implemented).
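The lookup order described above can be sketched with three toy classes. The class names are made up for this illustration:

```python
class Neither:
    """Defines neither __bool__ nor __len__: instances are always truthy."""


class EmptyBox:
    """Defines only __len__: truthiness follows the reported length."""

    def __len__(self):
        return 0


class Both:
    """Defines both: __bool__ takes priority over __len__."""

    def __bool__(self):
        return False

    def __len__(self):
        return 3


print(bool(Neither()))      # no special methods, so the default applies
print(bool(EmptyBox()))     # length 0 makes the object falsy
print(bool(Both()))         # __bool__ is consulted, __len__ is ignored
print(bool(float('nan')))   # nan is not zero, so it is truthy
```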
I would suggest using the explicit conditional. For a numeric a, the second option is true whenever a != 0; besides, that kind of conditional is confusing when you didn't write the code.
This asks whether the variable a has a value equal to True:
if a == True:
Whereas this asks whether the variable a has a truthy value at all:
if a:

Is there something about a string with an exclamation mark that prevent Python from string interning? [duplicate]

I learnt that in some immutable classes, __new__ may return an existing instance - this is what the int, str and tuple types sometimes do for small values.
But why do the following two snippets differ in the behavior?
With a space at the end:
>>> a = 'string '
>>> b = 'string '
>>> a is b
False
Without a space:
>>> c = 'string'
>>> d = 'string'
>>> c is d
True
Why does the space make a difference?
This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they don't have to. 'string' happens to be automatically interned when 'string ' isn't because 'string' contains only characters allowed in a Python identifier. I have no idea why that's the criterion they chose, but it is. The behavior may be different in different Python versions or implementations.
From the CPython 2.7 source code, stringobject.h, line 28:
Interning strings (ob_sstate) tries to ensure that only one string
object with a given value exists, so equality tests can be one pointer
comparison. This is generally restricted to strings that "look like"
Python identifiers, although the intern() builtin can be used to force
interning of any string.
You can see the code that does this in Objects/codeobject.c:
/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}
Also, note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the a and b assignments together, e.g. by placing them in a module or an if True:, you would find that a and b would be the same string.
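One way to try this without creating a file is to compile both assignments as a single unit yourself. This is only a sketch: the filename '<demo>' and the variable names are arbitrary, and the identity result is printed rather than promised because it depends on the interpreter:

```python
source = "a = 'string '\nb = 'string '\n"

# Compile both assignments together, as happens for the body of a module.
code = compile(source, '<demo>', 'exec')
namespace = {}
exec(code, namespace)

a, b = namespace['a'], namespace['b']
print(a == b)    # value equality holds on every implementation
print(a is b)    # identity is an implementation detail; do not rely on it
```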
This behavior is not consistent, and as others have mentioned depends on the variant of Python being executed. For a deeper discussion, see this question.
If you want to make sure that the same object is being used you can force the interning of strings by the appropriately named intern:
intern(...)
intern(string) -> string
``Intern'' the given string. This enters the string in the (global)
table of interned strings whose purpose is to speed up dictionary lookups.
Return the string itself or the previously interned string object with the
same value.
>>> a = 'string '
>>> b = 'string '
>>> id(a) == id(b)
False
>>> a = intern('string ')
>>> b = intern('string ')
>>> id(a) == id(b)
True
Note that in Python 3, intern is no longer a builtin; you have to import it explicitly with from sys import intern.
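A Python 3 version of the session above might look like the following sketch. The strings are built at runtime with join so that no compile-time sharing of literals is involved; the names parts, interned_a and interned_b are illustrative:

```python
from sys import intern   # Python 3 location of intern()

parts = ['string', ' ']
a = ''.join(parts)        # 'string ' built at runtime
b = ''.join(parts)        # an equal string, built separately

print(a == b)             # equal by value
print(a is b)             # implementation-dependent, shown only for comparison

# intern() returns the one canonical object for a given string value.
interned_a = intern(a)
interned_b = intern(b)
print(interned_a is interned_b)
```

Even with interning available, == remains the right way to compare string values. Interning is a memory and lookup optimization rather than a comparison tool.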

Why is eval('"\x27"') == eval('"\\x27"')?

I'm very confused with python's eval():
I tried eval('"\x27"') == eval('"\\x27"') and it evaluates to True. Can somebody explain why this is the case? Both expressions evaluate to "'". I understand why eval('"\x27"') does (the string evaluated has a single character, which is an escaped hexadecimal representing an apostrophe), but shouldn't eval('"\\x27"') be equal to "\\x27"?
Secondly, adding to the confusion, if I set the following variables,
s = "\x27"
t = "\\x27"
then eval('s') is again "'", but eval('t') is "\\x27". Why is that?
According to the docs, eval "parses and evaluates the argument as a python expression". In other words, it applies the same processing that happens when you write x = "foobar \n" inside a program or in IDLE. In this example, \n is turned into a newline character (which, note, is not identical to the two literal characters \n).
If you typed x = "\x27" into IDLE, you'd get x == "'". The x27 is escaped by the backslash and is therefore changed during evaluation. If you escape the backslash itself, x27 is not changed during evaluation. Instead, you simply get a string with a backslash followed by x27.
Now if you evaluate that string again, only one backslash is left, and it escapes x27. Thus, it is changed to '.
Another way to look at this: eval('"\x27"') processes the argument twice, but it only changes the first time, to "'". eval('"\\x27"') also processes the argument twice, first to "\x27", then to "'".
Here's an easier example to demonstrate how this works:
>>> x = "\"foobar\""
>>> x == "foobar"
False
>>> x == "\"foobar\""
True
>>> x = eval(x) # changes value of x from "foobar" to just foobar. Still string though, thus still ""
>>> x == "foobar"
True
>>> x == "\"foobar\""
False
Look at it like this: The right hand side of y = "2" contains two components: the information that y should be of type string, expressed using the two ", and the value of that string, expressed by the character 2. The separation of these two aspects is done during evaluation of the code you write. The string object itself never sees the " during initialization.
So in the above example, after the first line, we have x of type str with value "foobar". If you evaluate that again, the " are interpreted this time not as part of the value of x but as the type of x. So eval("\"foobar\"") basically transforms the string "foobar" to foobar, which in Python source code you have to write as "\"foobar\"" and "foobar" respectively.
For '"\x27"' the backslash escape is expanded during parsing, so this is literally '"\'"'. For '"\\x27"' parsing only collapses the escaped backslash into a single backslash, i.e. it is equal to r'"\x27"'.
A direct eval() on the string literals adds another iteration of special character expansion: In the first case, \' is a valid escape sequence yielding '. The second case is the same as above.
When you use variable names, only one round of unescaping is performed when you assign the values. eval('s') simply expands to the value of s without further unescaping. If you want an emulation of the first case, you need to eval(s), i.e. evaluate the value of the string referenced by s.
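The two rounds of escape processing can be checked step by step. In this sketch the names first, second and namespace are introduced for illustration, and an explicit namespace is passed to eval so the lookup does not depend on where the code runs:

```python
s = "\x27"     # one character: an apostrophe
t = "\\x27"    # four characters: backslash, x, 2, 7
print(len(s), len(t))

# By the time eval() receives its argument, the literal has already been
# through one round of escape processing.
first = '"\x27"'      # eval receives 3 characters: quote, apostrophe, quote
second = '"\\x27"'    # eval receives 6 characters: quote, \, x, 2, 7, quote
print(len(first), len(second))

# eval() then parses each text as a string literal: the second round.
print(eval(first) == eval(second) == "'")

# Evaluating a *name* only looks the variable up; nothing is unescaped again.
namespace = {'s': s, 't': t}
print(eval('s', namespace) == s)
print(eval('t', namespace) == t)

# Evaluating the *value* of t, wrapped in quotes, unescapes it once more.
print(eval('"' + t + '"') == s)
```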

'is' operator behaves differently when comparing strings with spaces

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for cpython3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler does very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: the is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, an identity comparison between them usually does not yield true. It works for some small strings because CPython, the reference implementation of Python, caches them, so that all those names refer to the same string object. So the is operator returns true for those.
This, however, is an implementation detail of CPython and is not guaranteed for CPython or any other implementation. So relying on it is a bad idea, as it can break at any time.
To compare strings, use the == operator, which compares objects for equality. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
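As a rule of thumb, identity checks fit singletons such as None, True and False, while values are compared with ==. A minimal sketch, where the function classify is hypothetical and only illustrates the split:

```python
def classify(value):
    """Use `is` for singletons and == for ordinary values."""
    if value is None:       # None is a singleton, so identity is correct
        return 'missing'
    if value is False:      # the bool singletons can be tested the same way
        return 'explicitly false'
    if value == '':         # strings are compared by value
        return 'empty string'
    return 'other'


print(classify(None), classify(False), classify(''), classify(0))
```

Here 0 falls through to 'other': it equals False by value, but it is not the False object, which is exactly the distinction an identity test is meant to capture.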
The is operator compares object identities, the same value that the id function reports, which is guaranteed to be unique among simultaneously existing objects. In CPython, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running python 2.7.3 on a macbook. Others might get entirely different results.
In fact, your code amounts to comparing the objects' ids (in CPython, their memory addresses). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But note that if the two literals appear directly in the comparison, the result is True:
>>> id('is it the space?') == id('is it the space?')
True
In fact, within a single expression identical string constants are shared. But at program scale, only word-like strings are shared (so nothing containing spaces or punctuation).
You should not rely on this behavior, as it's not documented anywhere and is an implementation detail.
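The equivalence between is and comparing id values holds only while both objects are alive, so this sketch keeps every object bound to a name. The variable names are illustrative, and c is built at runtime so that it is not a shared literal:

```python
a = 'is it the space?'
b = a                                    # a second name for the same object
c = ''.join(['is it ', 'the space?'])    # equal text, built at runtime

for left, right in [(a, b), (a, c)]:
    same_identity = left is right
    same_id = id(left) == id(right)
    # For objects alive at the same time, these two results always agree.
    print(left == right, same_identity == same_id)
```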
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
The 'is' operator compares the actual objects.
By that logic c is d should also be False. My guess is that Python performs some optimization, and in that case the two names refer to the same object.
