Python atomic data types

It was written here that Python has both atomic and reference object types, where the atomic objects are int, long, and complex. When you assign an atomic object, its value is copied; when you assign a reference object, its reference is copied.
My question is:
why, then, do I get True when I run the code below?
a = 1234
b = a
print id(a) == id(b)
It seems to me that I'm not copying a value at all; I'm copying a reference, no matter what the type is.

Assignment (binding) in Python NEVER copies data. It ALWAYS copies a reference to the value being bound.
The interpreter computes the value on the right-hand side, and the left-hand side is bound to that value by reference. If the expression on the right-hand side is an existing value (in other words, if no operators are required to compute it), then the left-hand side will be a reference to the same object.
After
a = b
is executed,
a is b
will ALWAYS be true - that's how assignment works in Python. It's also true for containers, so x[i].some_attribute = y will make x[i].some_attribute is y true.
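A short runnable sketch of both claims (the Box class and the names here are just for illustration):

```python
# Assignment binds a name to the existing object; nothing is copied.
class Box(object):
    pass

x = [Box()]
y = object()
x[0].some_attribute = y
assert x[0].some_attribute is y  # the attribute holds the very same object

a = [1, 2, 3]
b = a
assert a is b                    # one list, two names
```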
The assertion that Python has atomic types and reference types seems unhelpful to me, if not just plain untrue. I'd say it has atomic types and container types. Containers are things like lists, tuples, dicts, and instances with private attributes (to a first approximation).
As @BallPointPen helpfully pointed out in their comment, mutable values can be altered without needing to re-bind the reference. Since immutable values cannot be altered, references must be re-bound in order to refer to a different value.
Edit: Recently reading the English version of the quoted page (I'm afraid I don't understand Russian) I see "Python uses dynamic typing, and a combination of reference counting and a cycle-detecting garbage collector for memory management." It's possible the Russian page has mistranslated this to give a false impression of the language, or that it was misunderstood by the OP. But Python doesn't have "reference types" except in the most particular sense for weakrefs and similar constructs.

int objects are immutable.
What you see is a reference to the int object 1234, and that object itself will never change.
For mutable objects like lists and dictionaries, you can make an independent copy:
import copy
a = copy.deepcopy(b)
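For instance, here is a minimal sketch of the difference between copy.copy and copy.deepcopy:

```python
import copy

a = {'nums': [1, 2]}
shallow = copy.copy(a)      # copies the dict, but shares the inner list
deep = copy.deepcopy(a)     # recursively copies everything

a['nums'].append(3)
assert shallow['nums'] == [1, 2, 3]  # the shallow copy sees the mutation
assert deep['nums'] == [1, 2]        # the deep copy is fully independent
```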

Actually, as @spectras said, there are only references, but some objects are immutable: floats, ints, tuples and so on. For immutable objects (apart from memory consumption) it just does not matter whether you pass around references or create copies.
The interpreter even performs optimizations that treat equal numbers as interchangeable, which makes checking numbers for identity interesting. For example, given
a=1
b=1
c=2/2
d=12345
e=12345*1
a is b is true and a is c is also true, but d is e is false at the interactive prompt (== works as expected in every case).
Immutable objects are atomic in the sense that "changing" them is thread-safe: you never actually change the object itself, you just bind the name to a new object, and rebinding a name is thread-safe.
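A small illustration: "changing" an immutable int just rebinds the name and leaves the original object untouched (the variable names are arbitrary):

```python
n = 1234
keep = n          # hold a second reference so the old object stays alive
n += 1            # builds a new int object and rebinds the name n
assert n == 1235
assert keep == 1234     # the original object is untouched
assert n is not keep    # n now refers to a different object
```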

Related

Python's Passing by References [duplicate]

Hello, I am trying to understand how Python's pass-by-reference works.
I have an example:
>>>a = 1
>>>b = 1
>>>id(a);id(b)
140522779858088
140522779858088
This makes perfect sense: since a and b both reference the same value, they have the same identity. What I don't quite understand is how this example:
>>>a = 4.4
>>>b = 1.0+3.4
>>>id(a);id(b)
140522778796184
140522778796136
Is different from this example:
>>>a = 2
>>>b = 2 + 0
>>>id(a);id(b)
140522779858064
140522779858064
Is it because in the third example the int object 0 is treated as doing nothing by the interpreter, so b is not given a different identity from the object that a references (2)? Whereas in the second example b is the sum of two different float objects, and the interpreter allocates memory for the result, which gives b a different identity from a?
In your first example, the names a and b are both "referencing" the same object because of interning. The assignment statement resulted in an integer with the same id only because it has reused a preexisting object that happened to be hanging around in memory already. That's not a reliable behavior of integers:
>>> a = 257
>>> b = 257
>>> id(a), id(b)
(30610608, 30610728)
As demonstrated above, if you pick a big enough integer then it will behave as the floats in your second example behaved. And interning small integers is optional in the Python language anyway, this happens to be a CPython implementation detail: it's a performance optimization intended to avoid the overhead of creating a new object. We can speed things up by caching commonly used integer instances, at the cost of a higher memory footprint of the Python interpreter.
Don't think about "reference" and "value" when dealing with Python, the model that works for C doesn't really work well here. Instead think of "names" and "objects".
Picture your third example as a diagram: 2 is an object, and a and b are names. We can have different names pointing at the same object, and objects can exist without any name.
Assigning a variable only attaches a nametag. And deleting a variable only removes a nametag. If you keep this idea in mind, then the Python object model will never surprise you again.
As stated here, CPython caches integers from -5 to 256, so all variables in this range with the same value share the same id. (I wouldn't bet on that for future versions, though; it's just the current implementation.)
There's no such cache for floats, probably because there is a practically unbounded range of possible float values, so the chance of computing the same value by different means is far lower than for small integers.
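A hedged sketch of both behaviours in CPython (this relies on an implementation detail, the small-int cache; int() is used to build the values at runtime so that the compiler's constant folding doesn't merge literals in a script):

```python
# CPython caches small ints (-5..256): equal small values built at runtime
# come out as the very same object. Larger ints are not cached.
small_a = int("100")
small_b = 50 + 50
assert small_a is small_b      # both are the single cached int 100 (CPython)

big_a = int("257")
big_b = int("257")
assert big_a is not big_b      # above the cached range: two distinct objects
```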
>>> a=4.0
>>> b=4.0
>>> a is b
False
Python variables are always references to objects. These objects can be divided into mutable and immutable objects.
A mutable type can be modified without its id being modified, thus every variable that points to this object will get updated. However, an immutable object can not be modified this way, so Python generates a new object with the changes and reassigns the variable to point to this new object, thus the id of the variable before the change will not match the id of the variable after the change.
Integers and floats are immutable, so after a change they will point to a different object and thus have different ids.
The catch is that CPython caches some common integer values so that there are not multiple objects with the same value, in order to save memory, and 2 is one of those cached integers. So every time a variable points to the integer 2 it will have the same id (the actual id value will differ between Python runs).
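The mutable/immutable distinction can be watched through ids directly (a minimal sketch):

```python
lst = [1, 2]
lst_id = id(lst)
lst.append(3)            # in-place mutation: same object, same id
assert id(lst) == lst_id

t = (1, 2)
t_keep = t
t = t + (3,)             # builds a brand-new tuple and rebinds the name t
assert t == (1, 2, 3)
assert t is not t_keep   # the old tuple is a different, untouched object
```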

When is a reference created, and when is a new memory block allocated and then copy?

>>> d
{1: 1, 2: 2, 3: 3}
>>> lst = [d, d]
>>> c=lst[0]
>>> c[1]=5
>>> lst
[{1: 5, 2: 2, 3: 3}, {1: 5, 2: 2, 3: 3}]
When lst = [d, d], are lst[0] and lst[1] both references to the memory block of d, rather than two new memory blocks into which the content of d is copied?
When c = lst[0], is c just a reference to the memory occupied by lst[0], rather than a new memory block with the content copied from lst[0]?
In Python, when is a reference created to point to an existing memory block, and when is a new memory block allocated and the content copied?
This language feature of Python is different from C. What is the name of this language feature?
Thanks.
All variables (and other containers, such as dictionaries, lists, and object attributes) hold references to objects. Memory allocation occurs when the object is instantiated. Simple assignment always creates another reference to the existing object. For example, if you have:
a = [1, 2, 3]
b = a
Then b and a point to the same object, a list. You can verify this using the is operator:
print(b is a) # True
If you change a, then b changes too, because they are two names for the same object.
a.append(4)
print(b[3] == 4) # True
print(b[3] is a[3]) # also True
If you want to create a copy, you must do so explicitly. Here are some ways of doing this:
For lists, use a slice: b = a[:].
For many types, you can use the type name to copy an existing object of that type: b = list(a). When creating your own classes, this is a good approach to take if you need copy functionality.
The copy module has methods that can be used to copy objects (either shallowly or deeply).
For immutable types, such as strings, numbers, and tuples, there is never any need to make a copy. You can only "change" these kinds of values by referencing different ones.
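For example, here is a small sketch showing that a slice makes a shallow copy (nested objects stay shared) while the copy module's deepcopy shares nothing:

```python
import copy

a = [1, [2, 3]]
b = a[:]               # shallow: new outer list, shared inner list
c = copy.deepcopy(a)   # deep: nothing shared

a[1].append(4)
assert b[1] == [2, 3, 4]        # the shared inner list changed
assert c[1] == [2, 3]           # the deep copy did not
assert b is not a and c is not a
```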
The best way of describing this is probably "everything's an object." In C, "primitive" types like integers are treated differently from arrays. In Python, they are not: all values are stored as references to objects—even integers.
This paragraph from the Python tutorial should help clear things up for you:
Objects have individuality, and multiple names (in multiple scopes)
can be bound to the same object. This is known as aliasing in other
languages. This is usually not appreciated on a first glance at
Python, and can be safely ignored when dealing with immutable basic
types (numbers, strings, tuples). However, aliasing has a possibly
surprising effect on the semantics of Python code involving mutable
objects such as lists, dictionaries, and most other types. This is
usually used to the benefit of the program, since aliases behave like
pointers in some respects. For example, passing an object is cheap
since only a pointer is passed by the implementation; and if a
function modifies an object passed as an argument, the caller will see
the change — this eliminates the need for two different argument
passing mechanisms as in Pascal.
To answer your individual questions in more detail:
When lst = [d, d], are lst[0] and lst[1] both references to the memory block of d, instead of creating two memory blocks and copy the content of d to them respectively?
No. They don't refer to the memory block of d. lst[0] and lst[1] alias the same object as d, at that point in time. Proof: if you rebind d to a new object after initializing the list, lst[0] and lst[1] are unchanged. If you mutate the object aliased by d, the mutation is visible in lst[0] and lst[1], because they alias the same object.
When c=lst[0], is c just a reference to the memory occupied by lst[0], instead of creating a new memory block and copy the content from lst[0]?
Again no. It's not a reference to the memory occupied by lst[0]. Proof: if you assign lst[0] to a new object, c will be unchanged. If you modify a mutable object (like the dictionary that lst[0] points to) you will see the change in c, because c is referring to the same object, the original dictionary.
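Both proofs can be run directly (a minimal sketch of the situation in the question):

```python
d = {1: 1}
lst = [d, d]
c = lst[0]                     # c aliases the same dict as d

lst[0] = {}                    # rebinding the slot does not affect c or d
assert c == {1: 1}

d[2] = 2                       # mutating d is visible through every alias
assert c == {1: 1, 2: 2}
assert lst[1] == {1: 1, 2: 2}
```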
In Python, when is a reference created to point to an existing memory block, and when is a new memory block allocated and then copy?
Python doesn't really work with "memory blocks" in the same way that C does. It is an abstraction away from that. Whenever you create a new object, and assign it to a variable, you've obviously got memory allocated for that object. But you will never work with that memory directly, you work with references to the objects in that memory.
Those references are the values that get assigned to symbolic names, AKA variables, AKA aliases. "pass-by-reference" is a concept from pointer-based languages like C and C++, and does not apply to Python. There is a blog post which I believe covers this topic the best.
It is often argued whether Python is pass-by-value, pass-by-reference, or pass-by-object-reference. The truth is that it doesn't matter how you think of it, as long as you understand that the entire language specification is just an abstraction for working with names and objects. Java and Ruby have similar execution models, but the Java docs call it pass-by-value while the Ruby docs call it pass-by-reference. The Python docs remain neutral on the subject, so it's best not to speculate and just see things for what they are.
This language feature of Python is different from C. What is the name of this language feature?
Associating names with objects is known as name binding. Allowing multiple names (in potentially multiple scopes) to be bound to the same object is known as aliasing. You can read more about aliasing in the Python tutorial and on Wikipedia.
It might also be helpful to read the execution model documentation, where name binding and scopes are discussed in more detail.
In short: Python is pass-by-reference. Objects are created and memory is allocated upon their construction. Referencing objects does not allocate more memory unless you are creating new objects or expanding existing ones (e.g. list.append()).
This post Is Python pass-by-reference or pass-by-value covers it very well.
As a side note: if you are worried about how memory is allocated in a managed programming language like Python, then you're probably using the wrong language and/or prematurely optimizing. Also, how memory is managed in Python is implementation-specific, as there are many implementations of Python: CPython (what you are probably using), Jython, IronPython, PyPy, MicroPython, etc.

Query on returning locally scoped objects in python

Below is a program that returns a function object defined inside a function f, whose stack frame (f1) stays alive until the program exits.
Below is a program that returns an int object whose value is 1024; here the stack frame no longer exists after the int is returned.
Why this difference in return mechanisms, where the frame is not kept alive when an int object is returned?
What is the idea behind the stack frame staying alive when a function object is returned?
Python never makes copies unless explicitly asked to (for example, slicing a list does ask Python to copy that part of the list, shallowly).
"Does add_three refer to same int object that n is pointing to?" -- yes, only references to that int are being passed around and held in frames. In this case this applies whatever the value of n.
Any Python implementation is allowed to keep a single copy, or multiple copies, of immutable objects like ints -- whatever is most convenient for that implementation, given that the semantics are not affected either way.
So in a given implementation it could happen that every mention of literal 3 refers to the same int object but mentions of literal 333 need not. E.g:
>>> a=333; b=333; print(id(a), id(b))
(4298804944, 4298804944)
>>> a=333
>>> b=333
>>> print(id(a), id(b))
(4298753600, 4298753336)
The semantics of the two cases are absolutely identical; in the first case the compiler (intrinsically called on the whole line at once) finds it handy to instantiate and use a single int worth 333, in the second case it prefers to make and use two such instances -- either is completely fine, given int's immutability (same goes for other number types, strings, tuples, frozen sets -- but not for mutable types).
Note that when the Python specification refers to "same semantics", it explicitly includes introspection, which may be able to pinpoint implementation differences between semantically equivalent states.
id (normally returning the memory address of an object, in current popular implementations of Python, but in any case an id that's unique per object as long as the object lives, per language specs) is introspection, as consequently is the is operator. So you can if you wish use it to understand some optimizations a given implementation may perform, or not.
So on to your other Qs: "Is my understanding correct?" -- no.
"Why this difference" -- def builds a function object, which is mutable, so any def even with identical function definitions must return a new object, just like e.g [] builds a list object, mutable, so any [] must return a new object. 3 build an int object, which is immutable, so any 3 is allowed (per language rules) to return either the same or a new object.
One more question was added in an edit: "What is the idea for stack frame being alive when function type object is returned?"
Answer: every object stays alive as long as it's reachable. An outer function's frame, in particular, stays alive as long as a returned inner (nested) function is alive, if that inner function refers to names in the outer frame.
(A Python implementation doesn't have to garbage-collect objects that no longer need to be alive -- it may delay that collection as long as it pleases, or perform it at once: implementation details!)
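A minimal closure example of an outer frame's name outliving the call:

```python
def outer():
    n = 10                  # lives in outer's frame
    def inner():
        return n            # refers to a name in the enclosing frame
    return inner

f = outer()                 # outer has returned, yet n is still reachable
assert f() == 10            # the closure keeps the outer frame's n alive
```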

How to make a variable (truly) local to a procedure or function

i.e. we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a, b):
    z = a + []  # yuk
    for k in range(0, len(b)):
        try:
            z.remove(b[k])
        except:
            continue
    return z
Absent the + [], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method that mutates the list in place.)
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
    z = a
    z = z.append(["!!"])
    return z

a = ["hello"]
print "a=", a
print "fun4(a)=", fun4(a)
print "a=", a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.
It's not that z isn't a local variable in your function. Rather when you have the line z = a, you are making z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list
Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and basically every other object in Python are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]
There is a great answer here that covers most of your question and explains mutable and immutable types and how they are kept in memory and referenced. The first section of the answer is the relevant one for you (everything before the "How do we get around this?" header).
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append, it updates the referenced object in place; it does not create and return a new one. A method or function that does not explicitly return anything returns None.
The link above also gives an immutable example, so you can see the real difference.
You cannot make a mutable object act as if it were immutable. But instead of passing the reference around, you can create a new object from an existing mutable one:
a = [1,2,3]
b = a[:]
For more options you can check here
What you're missing is that all variable assignment in Python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
Is that clear now?
Still not sure what's the "correct" way to force the copy; there are
various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
If, however, you have a tree of mutable objects and also you don't know a-priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time python), and most of those ended up causing bugs. If you need this, it's a code smell, in my opinion.
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the clojure language, all data types are immutable by default and mutability is treated as a special cases to be minimized.
If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy

def fn(x):
    y = copy(x)
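Putting it together, here is one possible rewrite of the AexclB function from the question that copies explicitly (the name a_excl_b is arbitrary):

```python
from copy import copy

def a_excl_b(a, b):
    z = copy(a)             # work on a shallow copy; the caller's list is safe
    for item in b:
        try:
            z.remove(item)  # remove one matching occurrence, if any
        except ValueError:
            continue
    return z

xs = [1, 2, 3, 2]
assert a_excl_b(xs, [2, 9]) == [1, 3, 2]  # one 2 removed, 9 silently skipped
assert xs == [1, 2, 3, 2]                 # the caller's list is untouched
```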

Why do Python variables take a new address (id) every time they're modified?

Just wondering what the logic behind this one is? On the surface it seems kind of inefficient, that every time you do something simple like "x=x+1" that it has to take a new address and discard the old one.
The Python variable (called an identifier or name in Python) is a reference to a value. The id() function tells you something about that value, not about the name.
Many values are not mutable; integers, strings and floats do not change in place. When you add 1 to an integer, a new integer is produced, and the reference to the old value is replaced with a reference to the new one.
You can look at Python names as labels tied to values. If you imagine values as balloons, you are retying the label to a new balloon each time you assign to that name. If no other labels are attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often used and small values, Python actually uses a process called interning, where it will cache a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the intern() function to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
Because the basic types are immutable: every time you "modify" one, a new object has to be instantiated
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned, at least in CPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
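For example (relying on CPython's small-int interning, which is an implementation detail, not a language guarantee):

```python
x = 1
y = 2
x = x + 1          # binds x to a different object; nothing is "modified"
assert x == 2
assert x is y      # CPython interns small ints, so both names share one 2
```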
In python "primitive" types like ints and strings are immutable, which means they can not be modified.
Python is actually quite efficient, because, as #Wooble commented, «Very short strings and small integers are interned.»: if two variables reference the same (small) immutable value their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
One reason behind the use of immutable types is that they make concurrent access to those values safe.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, the assignment x = x + 1 doesn't modify the object that x is referencing; it just makes x point to another object whose value is x + 1.
To understand the logic behind, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics means only its value matters, not its identity, while an object with reference semantics focuses on its identity (in Python, an object's identity can be obtained from id(obj)).
Typically, value semantics implies immutability of the object. Conversely, if an object is mutable (i.e. can be changed in place), it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value (not the identity) is significant, when a change is needed it's safe to swap in another identity with a different value. This is called referential transparency. Note that this is impossible for objects with reference semantics.
Secondly, this is beneficial in practice. As the OP noted, it seems inefficient to discard the old object each time it changes, but most of the time it's more efficient than the alternative. For one thing, Python (like other languages) has interning/caching schemes that reduce the number of objects created. What's more, if objects with value semantics were designed to be mutable, they would take up much more space in most cases.
For example, Date has value semantics. If it were designed to be mutable, any method returning a date from an internal field would expose a handle to the outside world, which is risky (e.g. outside code could directly modify the internal field without going through the public interface). Similarly, if you pass a date object by reference to some function or method, that object could be modified there, which may not be what you expect. To avoid these kinds of side effects you have to program defensively: instead of directly returning the inner date field, you return a clone of it; instead of passing by reference, you pass by value, which means extra copies are made. As one can imagine, this creates more objects than necessary. Worse, the code becomes more complicated with all the extra cloning.
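A hypothetical sketch of that defensive-copy pattern, with a mutable list of date strings standing in for the mutable Date field (the Schedule class is invented for illustration):

```python
class Schedule(object):
    def __init__(self, dates):
        self._dates = list(dates)    # defensive copy on the way in
    def dates(self):
        return list(self._dates)     # defensive copy on the way out

s = Schedule(['2020-01-01'])
leaked = s.dates()
leaked.append('1999-12-31')          # mutating the returned copy...
assert s.dates() == ['2020-01-01']   # ...does not corrupt internal state
```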
In a word, immutability enforces the value-semantics, it usually involves less object creation, has less side-effects and less hassles, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means less locks and better efficiency in multithreading environment.
That's the reason why basic value-semantic data types like numbers, strings, dates and times are all immutable (well, string in C++ is an exception, which is why there is so much const string& stuff around to avoid strings being modified unexpectedly). As a lesson, Java made the mistake of designing the value-semantic classes Date, Point, Rectangle and Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.
