python reference about floating point number [duplicate] - python

This question already has answers here:
Why id function behaves differently with integer and float?
(6 answers)
Closed 7 years ago.
Before the question, here is some sample code.
Please take a look at it first.
>>> id(1)
1636939440
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
1636939440
>>> id(b)
1636939440
>>> id(c)
1636939440
>>> id("hello")
43566560
>>> a = "hello"
>>> b = "hello"
>>> c = "hello"
>>> id(a)
43566560
>>> id(b)
43566560
>>> id(c)
43566560
>>> id(3.14)
34312864
>>> a = 3.14
>>> b = 3.14
>>> c = 3.14
>>> id(a)
34312864
>>> id(b)
34312600
>>> id(c)
34312432
As you can see above, for integers and strings, Python variables reference
the same object. But floating point numbers behave differently.
Why is that? Is there any special reason for it?

For small integers and some strings, CPython uses an internal memory optimization. Since every variable in Python is a reference to an object in memory, CPython stores such values only once. Whenever the same value is then assigned to another variable, that variable is made to point to the object already in memory. This is safe for strings and integers because they are immutable: if a variable's value changes, it is the reference held by the variable that changes, and the object holding the original value is not affected.
Floats are immutable too, but CPython keeps no such cache for them, so each of the three assignments above creates its own float object. (id(3.14) and id(a) most likely match in the session above only because the temporary object was freed and its address reused.) There is also little to gain from sharing floats: depending on the calculation, a result might be stored as 3.14123123456789 or as 3.14123987654321 (just example numbers to explain). These are two different values and two different objects, even though the meaningful part looks the same when rounded for display, i.e. 3.14. That's why reusing float objects in memory is problematic and isn't worth it after all.
See more on how floating point numbers are kept in memory here:
http://floating-point-gui.de/
http://docs.python.org/2/tutorial/floatingpoint.html
Also, there's a big article on floating point numbers at Oracle docs.
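As a minimal sketch of the distinction between equal values and identical objects for floats (the helper name compare_floats is made up for illustration, and the identity result is a CPython implementation detail, not a language guarantee):

```python
import math


def compare_floats():
    """Return a few equality/identity facts about floats (illustrative helper)."""
    # Build the floats at run time so the compiler cannot merge literals.
    a = float("3.14")
    b = float("3.14")
    # 0.1 + 0.2 and 0.3 are stored as slightly different binary values.
    total = 0.1 + 0.2
    return {
        "same_value": a == b,
        "same_object": a is b,  # implementation detail, do not rely on it
        "sum_equals_0.3": total == 0.3,
        "sum_close_to_0.3": math.isclose(total, 0.3),
    }


print(compare_floats())
```

Comparing with == (or math.isclose for computed results) is the portable choice; whether two equal floats are the same object is up to the interpreter.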

Mutable and immutable.
Strings, tuples and bytes are immutable, whilst lists and byte arrays are mutable. Read more about the concept here: Data models in Python.
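A short sketch of what that difference means in practice (mutate_examples is a hypothetical helper written only for this illustration):

```python
def mutate_examples():
    """Show in-place mutation versus rebinding (illustrative helper)."""
    items = [1, 2, 3]
    alias = items        # both names refer to the same list object
    alias.append(4)      # the mutation is visible through either name

    text = "abc"
    other = text
    other += "d"         # builds a new string; `text` is untouched

    point = (1, 2, 3)
    try:
        point[0] = 99    # tuples do not support item assignment
        tuple_error = None
    except TypeError as exc:
        tuple_error = type(exc).__name__

    return items, text, other, tuple_error


print(mutate_examples())
```

Because an immutable object can never change, an interpreter is free to share one object between many names, which is what makes caching small integers and strings safe.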

Related

How does Python know the values already stored in its memory?

I want to know how Python knows (if it knows) that a value-type object is already stored in its memory (and also knows where it is).
For this code, when assigning the value 1 for b, how does it know that the value 1 is already in its memory and stores its reference in b?
>>> a = 1
>>> b = 1
>>> a is b
True
Python (CPython, to be precise) shares small integers for quick access. Integers in the range [-5, 256] already exist in memory, so if you check their addresses, they are the same. For larger integers, however, that is not true.
a = 100000
b = 100000
a is b # False
Wait, what? If you check the address of the numbers, you'll find something interesting:
a = 1
b = 1
id(a) # 4463034512
id(b) # 4463034512
a = 257
b = 257
id(a) # 4642585200
id(b) # 4642585712
This is called the integer cache. You can read more about the integer cache here.
Thanks to KlausD and user2357112 for pointing out in the comments that small integers written directly come from the integer cache, whereas the result of a calculation may not be the cached object even though it equals a number in the range [-5, 256], e.g.
pow(3, 47159012670, 47159012671) is 1 # False
pow(3, 47159012670, 47159012671) == 1 # True
“The current implementation keeps an array of integer objects for all
integers between -5 and 256, when you create an int in that range you
actually just get back a reference to the existing object.”
Why? Because small integers are used very frequently, for example in loops. Referring to an existing object instead of creating a new one saves overhead.
If you take a look at Objects/longobject.c, which implements the int type for CPython, you will see that the numbers between -5 (-NSMALLNEGINTS) and 256 (NSMALLPOSINTS - 1) are pre-allocated and cached. This is done to avoid the penalty of allocating multiple unnecessary objects for the most commonly used integers. This works because integers are immutable: you don't need multiple objects to represent the same number.
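The cache boundaries can be probed with a small sketch like the one below. It assumes CPython; cache_check is a made-up helper, and the integers are built from strings at run time so that the compiler cannot merge equal literals into one constant:

```python
def cache_check(value):
    """Build the same integer twice at run time and compare the objects."""
    # int(str(...)) avoids literals that the compiler may merge.
    first = int(str(value))
    second = int(str(value))
    return first == second, first is second


# Values just outside, on the edges of, and inside the range [-5, 256].
for n in (-6, -5, 0, 256, 257):
    print(n, cache_check(n))
```

On CPython the second element of each pair is expected to be True only for values inside the cached range; other implementations may behave differently, which is why == remains the right way to compare numbers.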
Python doesn't know anything until you tell it. In your code above, when you initialize a and b, you are storing those values (in a register or in RAM) and naming the places where they are stored a and b, so that you can refer to them later. If you didn't initialize a variable first, Python would just give you an error.

Why hyphen(-) behaves peculiarly in python strings? [duplicate]

This question already has answers here:
Python string interning
(2 answers)
Are strings cached? [duplicate]
(1 answer)
About the changing id of an immutable string
(5 answers)
Closed 4 years ago.
I found a peculiar behavior while going through the Python 3 data types, especially strings. If two strings a and b have the same value, then a is b is True (provided the strings contain no hyphen, of course).
If:
>>> a = 'string_without_hyphen'
>>> b = 'string_without_hyphen'
Then:
>>> a is b
True
>>> a == b
True
But if:
>>> a = 'string-with-hyphen'
>>> b = 'string-with-hyphen'
Then,
>>> a is b
False
>>> a == b
True
which confused me.
Why is this happening?
Because of moon rays and unicorns, also known as implementation details.
The is operator compares objects by identity, not by content.
The Python implementation you're using may or may not decide to reuse the same string object for both a and b, if it feels like it, since strings are immutable in Python. The same may or may not occur for integers (and in fact, this also happens with Java's Integers if they're sufficiently small).
The gist is: never use is unless you really do need identity (address) comparison; things may be weird. Use == instead.
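A minimal sketch of the difference, assuming Python 3 (build is a hypothetical helper that joins pieces at run time so that the strings are not compile-time constants; sys.intern is the standard-library way to request a shared object explicitly):

```python
import sys


def build(parts):
    """Join pieces at run time so the result is not a compile-time constant."""
    return "".join(parts)


a = build(["string", "-with-", "hyphen"])
b = build(["string", "-with-", "hyphen"])

print(a == b)                          # compares contents
print(a is b)                          # compares identity: implementation detail
print(sys.intern(a) is sys.intern(b))  # interned strings share one object
```

Only the first and last comparisons have results you can rely on; the middle one depends on what the interpreter decides to share.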

Curious Modulus Operator (%) Result

What's going on here?
>>> a = np.int8(1)
>>> a%2
1
>>> a = np.uint8(1)
>>> a%2
1
>>> a = np.int32(1)
>>> a%2
1
>>> a = np.uint32(1)
>>> a%2
1
>>> a = np.int64(1)
>>> a%2
1
>>> a = np.uint64(1)
>>> a%2
'1.0'
We suddenly get what appears to be a string containing the float 1.0!?
>>> a = np.uint64(1)
>>> type(a%2)
<type 'numpy.float64'>
...though it turns out it's simply a float.
What's the philosophy behind this?
I understand that NumPy wants to be stricter about things like types and typing rules in order to be more efficient than basic Python, but in this case the downsides of returning a very unexpected result to the user (likely breaking their program) seem to far outweigh the slight increase in cost of just checking the sign of the modulus before wandering down this slippery path.
It's not too rare to be working with uint64 values. For example, if you ever load an image into a numpy int array and then sum it, you have uint64(s). On the other hand, it's extremely rare to ever mod anything by a negative number (I've never done it except to see what would happen), because you generally mod things you can count such as indices, and different languages/standards/libraries can each have their own idea of what the result should be.
All this put together leaves me rather confused.
“We suddenly get what appears to be a string containing the float 1.0!?”
This is still a float64 - it just looks weird due to a bug in numpy 1.14.3, which is fixed in 1.15.0-dev.
You'd normally think that there are only two ways to convert to a string: __repr__ (tp_repr) and __str__ (tp_str).
It turns out that in python 2, there's one more - tp_print. This is only called when outputting directly to the console or the interpreter.
It turns out we implemented this wrong for only the interpreter. It's pretty tricky to test interpreter behavior in the test suite!
“though it turns out it's simply a float.”
This is sort of by design: 2 is inferred to be np.int64(2), and the pair {int64, uint64} is coerced to float64, because no integer type can hold every value of both and float64 avoids truncation. There are numerous issues about this, but it's tricky to fix.
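The range argument behind that coercion can be sketched in plain Python. NumPy itself is not used here; the constants are simply the value ranges of the two 64-bit integer types, and fits is a made-up helper:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MIN, UINT64_MAX = 0, 2**64 - 1


def fits(value, low, high):
    """True if value lies in the inclusive range [low, high]."""
    return low <= value <= high


# Neither integer type can hold every value of the other.
print(fits(UINT64_MAX, INT64_MIN, INT64_MAX))   # False
print(fits(-1, UINT64_MIN, UINT64_MAX))         # False

# A 64-bit float covers both magnitudes, but only approximately:
# with 53 bits of precision, 2**64 - 1 rounds to 2**64.
print(float(UINT64_MAX) == float(2**64))        # True
```

So the float result avoids overflow at the price of exactness for very large values, which is part of why the behavior surprises people.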

Weird behaviour of id function in cpython [duplicate]

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
Closed 6 years ago.
I did the following:
>>> a=10
>>> id(a)
31817408L
>>>
>>> id(10)
31817408L
So we can see that id(a) equals id(10).
Now I do:
>>> a='what is this'
>>> id(a)
35412416L
>>>
>>>
>>>
>>> id('what is this')
31951968L
Why, in this case, is id(a) not equal to id('what is this')?
What is actually happening behind the scenes?
Different IDs mean different addresses in memory, so your two 'what is this' strings are truly two strings, even though they store the same value. On the other hand, Python optimizes the frequently-used integers so that all the occurrences point to the same object in memory. And fortunately, that object is immutable, so you can't say 10=9. If you choose an infrequently-used integer, you can see what's going on:
>>> a=555555
>>> id(a)
44506456L
>>> id(555555)
44506528L

'is' operator behaves differently when comparing strings with spaces

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace". Or rather, not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside in the same block (file, function, class or single interpreter command).
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler does very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
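One way to see the compile-time folding described above, without relying on is, is to inspect the constants stored in a compiled code object. This is a sketch that assumes CPython; constants_of is a made-up helper, and the exact folding limits differ between versions:

```python
def constants_of(expression):
    """Compile an expression and return the constants stored in its code object."""
    return compile(expression, "<example>", "eval").co_consts


folded = constants_of("'a' * 10")     # both operands are constants
not_folded = constants_of("t * 10")   # `t` is only known at run time

print("a" * 10 in folded)
print("a" * 10 in not_folded)
```

When the folded string shows up among the constants, the expression was evaluated by the compiler, and the result is then subject to the same sharing rules as a literal.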
To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, the identity comparison will usually not yield true. It works for some small strings because CPython, the reference implementation of Python, stores their contents only once and makes all those references point to the same string object. So the is operator returns true for those.
This, however, is an implementation detail of CPython and is not guaranteed, either for CPython or for any other implementation. So relying on this fact is a bad idea, as it can break any day.
To compare strings, use the == operator, which compares objects for equality. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
The is operator compares the same identity that the id function reports, which is guaranteed to be unique among simultaneously existing objects. In CPython specifically, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running Python 2.7.3 on a MacBook. Others might get entirely different results.
In fact, your code amounts to comparing object ids (i.e. their physical addresses). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But note that if the two literals appear directly in the comparison, the result is True:
>>> id('is it the space?') == id('is it the space?')
True
In fact, within a single expression, identical static strings are shared. But at program scale, sharing only happens for word-like strings (so no spaces and no punctuation).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
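The equivalence between is and an id comparison can be sketched with objects that are never shared. Lists are used here instead of strings precisely because two separately created lists are always distinct; same_object is a hypothetical helper:

```python
def same_object(x, y):
    """Compare identities of two objects that are both still alive."""
    return id(x) == id(y)


a = ["is it the space?"]
b = ["is it the space?"]   # equal contents, separate list object
c = a                      # second name for the same object

print(same_object(a, b), a is b)
print(same_object(a, c), a is c)
```

The equivalence only holds while both objects exist: an id may be reused once an object is gone, so comparing the ids of temporaries can be misleading.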
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
The 'is' operator compares the actual objects.
c is d should also be False. My guess is that Python performs some optimization, and in that case it is the same object.
