Weird behaviour of id function in cpython [duplicate]

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
Closed 6 years ago.
I did the following:
>>> a=10
>>> id(a)
31817408L
>>>
>>> id(10)
31817408L
So we can see that id(a) equals id(10).
Now I do:
>>> a='what is this'
>>> id(a)
35412416L
>>>
>>>
>>>
>>> id('what is this')
31951968L
Why, in this case, is id(a) not equal to id('what is this')?
What is actually happening behind the scenes?

Different IDs mean different objects (in CPython, different addresses in memory), so your two 'what is this' strings are truly two string objects, even though they store the same value. On the other hand, CPython caches the frequently used small integers (-5 through 256) so that all occurrences point to the same object in memory. And fortunately, that object is immutable, so you can't say 10=9. If you choose an integer outside the cached range, you can see what's going on:
>>> a=555555
>>> id(a)
44506456L
>>> id(555555)
44506528L
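
The same experiment can be sketched in Python 3 as follows. This is only an illustration, assuming CPython; the helper name shares_object is made up here. The integers are built at runtime with int() so that the compiler cannot merge equal literals, and whatever it prints is an implementation detail rather than a language guarantee:

```python
def shares_object(value):
    """Return True if two separately created ints with this value are one object."""
    first = int(str(value))   # built at runtime, not a compile-time constant
    second = int(str(value))  # a second, independent construction
    return first is second


# On CPython, 10 comes from the small-integer cache; 555555 does not.
for n in (10, 555555):
    print(n, shares_object(n))
```

On other implementations the results may differ, which is exactly why identity should not be used to compare numbers.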

Related

Why hyphen(-) behaves peculiarly in python strings? [duplicate]

This question already has answers here:
Python string interning
(2 answers)
Are strings cached? [duplicate]
(1 answer)
About the changing id of an immutable string
(5 answers)
Closed 4 years ago.
I found a peculiar behavior while going through Python 3 data types, especially strings. If two strings a and b have the same value, then a is b is True (provided the strings contain no hyphen, of course).
If:
>>> a = 'string_without_hyphen'
>>> b = 'string_without_hyphen'
Then:
>>> a is b
True
>>> a == b
True
But if:
>>> a = 'string-with-hyphen'
>>> b = 'string-with-hyphen'
Then,
>>> a is b
False
>>> a == b
True
which confused me.
Why is this happening?

Because of implementation details.
The is operator compares objects by identity, not by content.
The Python implementation you're using may or may not decide to reuse the same string object for both a and b, if it feels like it, since strings are immutable in Python. The same may or may not occur for integers (and in fact, this also happens with Java's Integers if they're sufficiently small).
The gist is: never use is unless you really do need identity (address) comparison; things may be weird. Use == instead.
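
If shared string objects are genuinely needed, the standard library's sys.intern can request them explicitly. A small sketch, with variable names chosen only for illustration; both strings are built at runtime so that the compiler cannot merge them:

```python
import sys

# Build both strings at runtime so the compiler cannot merge them as constants.
a = "-".join(["string", "with", "hyphen"])
b = "-".join(["string", "with", "hyphen"])

print(a == b)  # equal contents: always True
print(a is b)  # identity: implementation-dependent, typically False on CPython

# sys.intern returns one canonical object per distinct string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)
```

Even then, == remains the right operator for comparing contents; interning only changes which objects happen to be shared.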

Different behaviors of 'is' operator when comparing variables with same int values [duplicate]

This question already has answers here:
Is there a difference between "==" and "is"?
(13 answers)
"is" operator behaves unexpectedly with integers
(11 answers)
Closed 4 years ago.
I was playing with the 'is' operator in the interactive shell when I encountered odd behavior with the code below:
It goes as expected at first:
>>> x = 11
>>> y = 11
>>> x is y
True
But when I tried this one:
>>> x = 987456
>>> y = 987456
>>> x is y
False
After further tries using the id() function, I noticed that integers up to 256 point to the same object while larger ones do not. I also noticed that this behavior only occurs in the Python interactive shell. What's with this behavior?

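is checks identity, which in CPython means comparing memory addresses. CPython preallocates the integers from -5 through 256 and reuses those objects, so every 11 is the same object. 987456 lies outside that range, so each statement typed at the interactive prompt creates a new integer object. In a script, the compiler may merge equal constants within one code block, which is why the difference shows up mainly in the interactive shell.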
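
The difference between the shell and a script can be imitated by compiling the assignments either together or one at a time. This is a sketch assuming CPython; the two function names are invented for the example, and the results are implementation details:

```python
def identity_in_one_block():
    """Compile both assignments as a single code block, then compare identities."""
    namespace = {}
    source = "x = 987456\ny = 987456\nsame = x is y\n"
    exec(compile(source, "<one-block>", "exec"), namespace)
    return namespace["same"]


def identity_across_blocks():
    """Compile each assignment separately, as the interactive shell does per statement."""
    namespace = {}
    exec(compile("x = 987456", "<line-1>", "exec"), namespace)
    exec(compile("y = 987456", "<line-2>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


print(identity_in_one_block())   # CPython merges equal constants within one block
print(identity_across_blocks())  # usually False on CPython
```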

python reference about floating point number [duplicate]

This question already has answers here:
Why id function behaves differently with integer and float?
(6 answers)
Closed 7 years ago.
Before my question, here is some sample code.
Please take a look at it first.
>>> id(1)
1636939440
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
1636939440
>>> id(b)
1636939440
>>> id(c)
1636939440
>>> id("hello")
43566560
>>> a = "hello"
>>> b = "hello"
>>> c = "hello"
>>> id(a)
43566560
>>> id(b)
43566560
>>> id(c)
43566560
>>> id(3.14)
34312864
>>> a = 3.14
>>> b = 3.14
>>> c = 3.14
>>> id(a)
34312864
>>> id(b)
34312600
>>> id(c)
34312432
As you can see above, for integers and strings, Python variables with the same value reference the same object. But floating point numbers work differently.
Why is that? Is there any special reason for that?

For small integers and strings, Python uses an internal memory optimization. Since any variable in Python is a reference to an object in memory, Python puts such small values into memory only once. Then, whenever the same value is assigned to another variable, it makes that variable point to the object already kept in memory. This works for strings and integers because they are immutable: if the variable's value changes, it is the reference held by the variable that changes, and the object in memory with the original value is not affected.
Floating point numbers get no such cache. First, they are not 'small' in this sense: there are far too many distinct values for a cache to pay off. Second, values that display as 3.14 after rounding may be stored differently depending on the calculation that produced them, for example as 3.14123123456789 and 3.14123987654321 (just example numbers to explain). Those are different values and different objects, even though the meaningful part looks the same. That's why reusing the same floating point object in memory is problematic and isn't worth it.
See more on how floating point numbers are kept in memory here:
http://floating-point-gui.de/
http://docs.python.org/2/tutorial/floatingpoint.html
Also, there's a big article on floating point numbers at Oracle docs.
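
A short sketch of both points, assuming CPython for the identity result. The floats are created with float() at runtime so that no compile-time merging of literals is involved:

```python
# float() builds the objects at runtime, so no compile-time merging is involved.
a = float("3.14")
b = float("3.14")

print(a == b)  # same value
print(a is b)  # identity is not guaranteed; CPython keeps no cache of float values

# Two results can print alike after rounding and still differ in value.
total = 0.1 + 0.2
print(f"{total:.2f}", f"{0.3:.2f}")
print(total == 0.3)
```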

Mutable and immutable.
Strings, tuples and bytes are immutable, whilst lists and byte arrays are mutable. Read more about the concept here: Data models in Python.

'is' operator behaves differently when comparing strings with spaces

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.

Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace". Or rather, not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler does very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
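
On recent Python 3 versions, using is directly on literals triggers a SyntaxWarning, so the same checks are easier to reproduce through variables. A hedged sketch with illustrative names; the identity results are CPython implementation details and may differ between versions:

```python
count = 2                # a runtime value, so no constant folding applies
built = "a" * count      # evaluated when the line runs
literal = "aa"

print(built == literal)  # True: same characters
print(built is literal)  # implementation detail; do not rely on it

space = " "
print(chr(0x20) == space)  # True by value
print(chr(0x20) is space)  # CPython reuses single Latin-1 character strings
```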

To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then it is usually not the case that the object identity yields true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects refer to the same string content. So the is operator returns true for those.
This however is an implementation detail of CPython and is guaranteed neither for CPython nor for any other implementation. So relying on it is a bad idea, as it can break any day.
To compare strings, you use the == operator, which compares the equality of objects. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided if you do not explicitly want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
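
As a sketch of that division of labour: identity for singletons, equality for string contents. The function greet and its messages are made up for this example, and None is used as the singleton:

```python
def greet(name=None):
    """Return a greeting; None is the sentinel for 'no name given'."""
    if name is None:       # identity check against the None singleton
        return "hello, stranger"
    if name == "admin":    # value check: string contents are what matter
        return "hello, administrator"
    return "hello, " + name


print(greet())
print(greet("admin"))
print(greet("".join(["ad", "min"])))  # built at runtime, still matches by value
```

The last call would not be safe with is, because the joined string need not be the same object as the literal "admin".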

The is operator compares object identity, which the id function exposes as a number guaranteed to be unique among simultaneously existing objects. In CPython, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running Python 2.7.3 on a MacBook. Others might get entirely different results.
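
The one part that is guaranteed, uniqueness of ids among objects alive at the same time, can be sketched with plain object() instances. The variable names are illustrative only:

```python
first = object()
second = object()
alias = first            # a second name for the same object

# While both objects are alive, their ids are guaranteed to differ.
print(id(first) != id(second))

# For live objects, comparing ids agrees with the is operator.
print((first is alias) == (id(first) == id(alias)))
print((first is second) == (id(first) == id(second)))
```

Once an object is garbage collected, its id may be reused by a later object, so ids recorded from short-lived temporaries say little about sharing.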

In fact, your code amounts to comparing object ids (in CPython, their memory addresses). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But note that if the two literals appear directly in the comparison, the ids match:
>>> id('is it the space?') == id('is it the space?')
True
In fact, within a single expression, identical string literals are shared. But at program scale, only word-like strings are shared (so none containing spaces or punctuation).
You should not rely on this behavior, as it's not documented anywhere and is an implementation detail.

Two or more identical strings consisting only of consecutive alphanumeric characters are stored in one structure, so they share their memory reference. There have been posts about this phenomenon all over the internet since the 1990s. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.

The 'is' operator compares the actual objects.
c is d should also be false. My guess is that Python makes some optimization, and in that case it is the same object.

Python is vs == [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
String comparison in Python: is vs. ==
When is the == operator not equivalent to the is operator? (Python)
I'm pretty new to Python still. I heard someone say use is, not == because "this isn't C". But I had some code x is 5 and it was not working as expected.
So, following proper Python/PEP style, when is the time to use is and when is the time to use == ?

You should use == to compare two values. You should use is to see if two names are bound to the same object.
You should almost never use x is 5 because depending on the implementation small integers might be interned. This can lead to surprising results:
>>> x = 256
>>> x is 256
True
>>> x = 257
>>> x is 257
False

The two operators have different meaning.
is tests object identity. Do the two operands refer to the same object?
== tests equality of value. Do the two operands have the same value?
When it comes to comparing x and 5 you invariably are interested in the value rather than the object holding the value.
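
A minimal sketch of the two meanings side by side, using a hypothetical Point class that is not part of the question and defines equality by value:

```python
class Point:
    """Hypothetical value type: equality is defined by the coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)
r = p

print(p == q)  # True: same value
print(p is q)  # False: two separate objects
print(p is r)  # True: two names bound to one object
```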
