I asked a previous question on stackoverflow here: Python immutable types in function calls
which made it clear that only references to immutable objects are passed to functions, and so passing a tuple to a function does not result in a full memory copy of that object.
However, according to: http://www.testingreflections.com/node/view/5126
"Some objects, like strings, tuples,
and numbers, are immutable. Altering
them inside a function/method will
create a new instance and the original
instance outside the function/method
is not changed."
I wrote some test code, where an immutable object is passed to a function. As expected, I can modify the object via the parameter-name/reference defined as part of the function header, and all changes only persist within the called function, leaving the original object outside of the function untouched.
So my question is:
Is the new instance created only when an attempt is made to alter/modify the object passed in? I'm guessing that if the object is not changed, a reference to it is all that is required. More importantly, if it does create a copy upon attempted modification, how does python manage the memory? With a zero-copy/copy-on-write, or does it create a complete replicated object (with the whole object's size reserved in memory) visible only within the called function?
You will think a lot more clearly about variables in Python if you think of them not as boxes that contain values, but names that are attached to objects. Any object can have any number of names attached to it; some of the names are local to functions and will be taken off the object automatically when the function returns.
So when you do something like this:
name = "Slartibartfast"
person = name
There is a string object, which contains the text "Slartibartfast", and there are two names by which it can be referred: name and person. You get the same object in both cases; you can verify this with the id() function.
Which is the "real" name of the string, name or person? This is a trick question. The string does not inherently have a name; it is just a string. name is not a box that contains "Slartibartfast", it is just an identifier that refers to the object. person has exactly the same standing in Python; name is not "more important" just because it was assigned first.
NOTE: Some objects, such as functions and classes, have a __name__ attribute that holds the name that was used to declare it in the def or class statement. This is the object's "real name" if it can be said to have one. However, you can still reference it through any number of assigned names.
Now, suppose you "modify" the string to give it a bit more of a Dutch flavor:
person = person.replace("art", "aart")
"Modify" is in quotes because you can't modify a string. Since a string is immutable, every string operation creates a new string. When does it happen? Immediately. In this case, the new string "Slaartibaartfast" is created and the name person is adjusted to refer to that. However, the name name still refers to the original string, because you haven't told it to refer to anything else. As long as at least one name refers to it, Python will keep good old "Slartibartfast" around.
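The steps above can be checked directly. This is a minimal sketch; the helper names same_before and same_after are introduced here only for illustration. The is operator compares object identity, which is the same thing comparing id() values does.

```python
name = "Slartibartfast"
person = name

# Both names refer to one and the same string object.
same_before = person is name

# replace() builds a new string; only the name `person` is rebound.
person = person.replace("art", "aart")
same_after = person is name

print(same_before, same_after)
print(name, person)
```

The original string is untouched because nothing ever changed it; only the binding of person moved.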
This is no different when dealing with a function:
def dutchnametag(name):
    name = name.replace("art", "aart")
    print("HELLO! My Dutch name is", name)

person = "Slartibartfast"
dutchnametag(person)
Here we assign the string "Slartibartfast" to the global name person. We then pass that string to our function, where it receives the additional local name name. The string's replace() method is then called through the name identifier, creating a new string. The identifier name is then reassigned to point to the new string. Outside the function, the global identifier person still refers to the original string, because nothing has changed it to point to anything else.
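To see that no copy is made when the string is passed in, a variant of the function can report the identity of what it received. This is only a sketch: unlike the function above, this version returns values instead of printing, and received_id and dutch are names made up for the example.

```python
def dutchnametag(name):
    # Identity of the object as it arrives, before any rebinding.
    received_id = id(name)
    # Rebinds the local name only; the caller's object is untouched.
    name = name.replace("art", "aart")
    return received_id, name

person = "Slartibartfast"
received_id, dutch = dutchnametag(person)

print(received_id == id(person))  # the very same object was passed in
print(person, dutch)
```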
I'm not speaking about python per se. But generally, in immutable data structures, every method that you use that needs to change state will return a new object (with the modified state). The old one will remain the same.
For example, a Java mutable list could have:
void addItem(Object item) { ... }
The corresponding immutable list would have a method along the lines of
List addItem(Object item) { ... }
So, there is generally nothing special about immutable data structures. In any language you may create immutable data structures. But some languages make it hard or impossible to create mutable data structures (generally, functional languages).
Some languages may provide pseudo-immutable data structures. They make certain data structures look immutable to the coder, while in fact they aren't.
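The same "return a new object" pattern can be sketched in Python rather than Java. The class ImmutableList and its add_item method are hypothetical names chosen to mirror the signatures above; a frozen dataclass is just one of several ways to get this behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmutableList:
    items: tuple = ()

    def add_item(self, item):
        # Never touches self; builds and returns a new instance.
        return ImmutableList(self.items + (item,))


empty = ImmutableList()
one = empty.add_item("a")

print(empty.items, one.items)

# Assigning to a field of a frozen dataclass raises an AttributeError subclass.
try:
    empty.items = ("x",)
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```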
If an object is immutable there is no way to change it. You could assign a new object to the name formerly associated with the argument object. To do this you first need to make a new object. So yes, you would allocate space for a complete new object.
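A small sketch of what "a complete new object" means for a tuple, with made-up names (extend, payload, original, extended): the new tuple is a separate object, but it stores references to the same elements, so the elements themselves are not duplicated.

```python
def extend(t):
    # Tuples cannot be changed, so this builds a new tuple.
    return t + (4,)

payload = ["large", "shared", "list"]
original = (payload, 2, 3)
extended = extend(original)

print(extended is original)        # a different tuple object
print(extended[0] is original[0])  # but the element is shared, not copied
print(len(original), len(extended))
```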
After getting acquainted with the deeper workings of Python, I've come to understand that assigning a variable amounts to creating a new object with its own specific address, regardless of the variable name assigned to the object.
That makes me wonder, though, what happens to an object that was created and later modified. Does it just sit there and consume memory?
The scenario I have in mind looks something like this:
# Creates object with id 10101001 (random numbers)
x = 5
# Creates object with id 10010010 using the value from object 10101001.
x += 10
What happens to object with the id 10101001?
Out of curiosity, too: why do objects need an ID and a reference (the variable name)? Wouldn't it be better to just associate the address with the variable name?
I apologize in advance for the cringe this question might invoke in someone.
Here is a great talk that was given at PyCon by Ned Batchelder this year about how Python manages variables.
https://www.youtube.com/watch?v=_AEJHKGk9ns
I think it will help clear up some of your confusion.
First of all, the documentation on augmented assignment statements states:
An augmented assignment expression like x += 1 can be rewritten x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
So depending on the type of x this might not create a new object.
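A short sketch of that difference, using illustrative names (nums, alias, old): a list implements += in place, while an int cannot be changed, so the name is rebound to a new object.

```python
# Mutable: += extends the existing list object in place.
nums = [1, 2]
alias = nums
nums += [3]
print(alias is nums, alias)

# Immutable: += produces a new int and rebinds the name x.
x = 5
old = x
x += 10
print(old, x)
```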
CPython uses reference counting, so the reference count of the object with id 10101001 is decremented. If this count hits zero, the object is freed almost immediately. Small integers are cached anyway, though. Refer to Objects, Types and Reference Counts for all the details.
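Reference counts can be observed with sys.getrefcount, assuming CPython (other implementations may not count references at all). The class Payload and the variable names are invented for this sketch. Absolute counts vary, so only the changes are compared.

```python
import sys


class Payload:
    pass


obj = Payload()
before = sys.getrefcount(obj)

alias = obj                       # one more name bound to the object
with_alias = sys.getrefcount(obj)

del alias                         # the extra binding goes away again
after = sys.getrefcount(obj)

print(with_alias - before, after - before)
```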
Regarding the id of an object:
CPython implementation detail: This is the address of the object in memory.
So basically id and reference are the same. The variable name is just a binding to the object itself.
Below is the program that returns a function object defined in function f, whose stack frame (f1) stays alive until the program exits.
Below is the program that returns an int object whose value is 1024, but the stack frame does not exist after we return the int object?
As per the above two diagrams, why is there this difference in return mechanisms, where the frame is not alive when you return an int object?
What is the idea behind the stack frame staying alive when a function object is returned?
Python never makes copies unless explicitly asked to (for example, slicing a list does ask Python to copy that part of the list, shallowly).
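A sketch of the slicing case, with illustrative names: plain assignment adds a name for the same list, a slice builds a new list, and because the copy is shallow the new list still refers to the same inner objects.

```python
inner = [1]
outer = [inner, 2]

alias = outer      # no copy: a second name for the same list
sliced = outer[:]  # explicit request for a (shallow) copy

print(alias is outer)      # same object
print(sliced is outer)     # different list object
print(sliced[0] is inner)  # elements are shared, not copied
```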
"Does add_three refer to same int object that n is pointing to?" -- yes, only references to that int are being passed around and held in frames. In this case this applies whatever the value of n.
Any Python implementation is allowed to keep a single copy, or multiple copies, of immutable objects like ints -- whatever's most convenient to that implementation, given the semantics are not affected anyway.
So in a given implementation it could happen that every mention of literal 3 refers to the same int object but mentions of literal 333 need not. E.g:
2>>> a=333; b=333; print(id(a), id(b))
(4298804944, 4298804944)
2>>> a=333
2>>> b=333
2>>> print(id(a), id(b))
(4298753600, 4298753336)
The semantics of the two cases are absolutely identical; in the first case the compiler (intrinsically called on the whole line at once) finds it handy to instantiate and use a single int worth 333, in the second case it prefers to make and use two such instances -- either is completely fine, given int's immutability (same goes for other number types, strings, tuples, frozen sets -- but not for mutable types).
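In code that does not inspect identity, the two cases cannot be told apart. The sketch below builds the ints at run time (the names a, b, equal and same_object are illustrative) and deliberately gives no expected value for the identity check, since that depends on the implementation.

```python
a = int("333")
b = int("333")

equal = (a == b)        # guaranteed by the language: the values are equal
same_object = (a is b)  # implementation detail: may be True or False

print(equal, same_object)
```

Code should therefore compare immutable values with ==, never with is.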
Note that when the Python specification refers to "same semantics", it explicitly excludes introspection, which may be able to pinpoint implementation differences between semantically equivalent states.
id (normally returning the memory address of an object, in current popular implementations of Python, but in any case an id that's unique per object as long as the object lives, per language specs) is introspection, as consequently is the is operator. So you can if you wish use it to understand some optimizations a given implementation may perform, or not.
So on to your other Qs: "Is my understanding correct?" -- no.
"Why this difference" -- def builds a function object, which is mutable, so any def even with identical function definitions must return a new object, just like e.g. [] builds a list object, mutable, so any [] must return a new object. 3 builds an int object, which is immutable, so any 3 is allowed (per language rules) to return either the same or a new object.
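A quick check of the mutable half of that statement; build, inner, first, second, list_a and list_b are names made up for this sketch.

```python
def build():
    # Each execution of this def statement creates a new function object.
    def inner():
        return 1
    return inner

first, second = build(), build()
print(first is second)   # two distinct function objects

list_a, list_b = [], []
print(list_a is list_b)  # two distinct list objects
print(list_a == list_b)  # equal in value all the same
```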
One more question was added in an edit: "What is the idea for stack frame being alive when function type object is returned?"
Answer: every object stays alive as long as it's reachable. An outer function's frame, in particular, stays alive as long as a returned inner (nested) function is reachable, if that function refers to names in the outer frame.
(No Python implementation has to garbage-collect objects the moment they no longer need to be alive -- it may delay that garbage collection as long as it pleases, or can perform it at once -- implementation details!-).
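Since the original programs are not reproduced here, the following is only a guess at their shape: make_adder is a hypothetical outer function, and add_three and n are the names mentioned above. In CPython the inner function reaches the outer variable through its __closure__ cells, which is what keeps that value reachable after the outer call has returned.

```python
def make_adder(n):
    def add(x):
        # `n` belongs to the enclosing call of make_adder.
        return x + n
    return add

add_three = make_adder(3)

# make_adder has returned, yet the value bound to n is still reachable.
print(add_three(4))
print(add_three.__closure__[0].cell_contents)
```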
I saw in a book about language description that says
On the other hand, a name can be bound to no object (a dangling pointer),
one object (the usual case), or several objects (a parameter name in a
recursive function).
How can we bind a name to several objects? Isn't that what we call an array, for example, where all elements have the same name but with an index? For a recursive function like the example here:
x = 0
def f(y):
    global x
    x += 1
    if x < 4:
        y += 100
        f(y)
    else:
        return
f(100)
Is the name y bound to multiple values that are created recursively, since the name table already has the name y bound to an initial value which is being reproduced with recursion?
EDITED Just press here Visualizer and see what it generates. :)
No.
A name is bound to one single object. In Python, it is either bound to a single object in a given context, or it does not exist at all.
What happens, is that the inner workings may have the name defined in several "layers" - but your code will only see one of those.
If a name is a variable in a recursive function, you will only see whatever is bound to it in the currently running context. Each time there is a function call in Python, the execution frame (an object that holds several attributes of the running code, including a reference to the local variables) is frozen. For the called function, a new execution frame is created, and there the variable names are bound again to whatever new values they have in the called context. Your code just "sees" this instance.
Then there is the matter of global variables and built-in objects in Python: if a name is not a local variable in the function's execution context, it is looked up in the global variables of the module (again, just one of those will be visible). And if the name is not defined in the globals, then Python looks for it in the builtins module (reachable through the global name __builtins__), which is the last place it checks.
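The lookup order described above (local, then module globals, then builtins) can be sketched as follows. All function and variable names are invented for the example, and it is meant to be run at module level.

```python
import builtins

value = "global value"

def read_local():
    value = "local value"  # a local binding hides the global one
    return value

def read_global():
    return value           # no local binding: found in the module globals

def read_builtin():
    return len             # neither local nor global: found in builtins

print(read_local(), "|", read_global())
print(read_builtin() is builtins.len)
```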
If I understand you correctly, you're asking about what rules Python has for creating variables in different scopes. Python uses lexical scoping on the function level.
It's hard to tell exactly what you're getting at with the code you've written, but, while there may be a different value associated with y in different scopes (with a value of y defined at each level of recursion), your code will only ever be able to see one at a time (the value defined at the scope in which you're operating).
To really understand scoping rules in Python, I would have a look at PEP 227. Also, have a look at this Stack Overflow question.
Finally, to be able to speak intelligently about what a "name" is in Python, I suggest you read about how Python is a "Call-By-Object" language.
At this point, we are capable of understanding that, instead of a "nametable", python uses a dictionary to hold what is accessible in a given scope. See this answer for a little more detail. The implication of this is that you can never have two of the same name in a single scope (for the same reason you can't have two of the same key in a python dictionary). So, while y may exist in a dictionary for a different scope, you have no way of accessing it, since you can only access the variables in the current scope's dictionary.
The key is:
several objects (a parameter name in a recursive function).
The passage is almost certainly not referring to arrays, but simply to the fact that in a recursive function (or any function, but a recursive function is likely to have multiple activations at one time), a parameter may be bound to a different value in each recursive call.
This does not mean that you can access each such object in every stack frame; indeed the point of the technique is to ensure that only one such value is accessible in each stack frame.
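A sketch of "one name, a different object per activation": the hypothetical function countdown records the value its own y is bound to, both before and after the recursive call. Each frame sees only its own y, and that binding is unaffected by the deeper calls.

```python
def countdown(y, seen):
    seen.append(y)              # the y visible in this frame
    if y > 0:
        countdown(y - 1, seen)  # the callee gets its own, separate y
    seen.append(y)              # this frame's y is still what it was

seen = []
countdown(2, seen)
print(seen)  # [2, 1, 0, 0, 1, 2]
```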
Firstly, you should mention in the question that the sentence from the book is not related explicitly to Python (as jsbueno wrote, one name is bound to exactly one object in Python).
Anyway, "a name bound to no object" is a bit inaccurate. Generally, names are related to variables, and the name related to a dangling pointer is the name of that pointer variable.
When speaking about variable scope (i.e. the part of the code where the variable is used), one variable name can be used for only a single value at a time. However, there may be other parts of the code, independent of the one where we think about that variable. In those other parts, the same name can be used; the two variables with the same name are totally isolated. This is the case for local variables in function bodies as well. If the language allows recursion, it must be capable of creating another isolated space of local variables for each further call of the same function.
In Python, each function can also access outer variables, but it is more usual to use the inner, local variables. Whenever you assign a name some value, it is created in the local space.
I have to write a testing module and have a C++ background. That said, I am aware that there are no pointers in Python, but how do I achieve the following:
I have a test method which looks in pseudocode like this:
def check(self, obj, prop, value):
    if obj.prop <> value:  # this does not work,
        # getattr does not work either (object has no such method (interpreter output))
        # I am working with objects from InCyte's python interface
        # the supplied findProp method does not work either (I get
        # None for objects I can access on the shell with obj.prop)
        # and yes, I supply the method with a string 'prop'
        if self._autoadjust:
            print("Adjusting prop from x to y")
            obj.prop = value  # setattr does not work, see above
        else:
            print("Warning Value != expected value for obj")
Since I want to check many different objects in separate functions I would like to be able to keep the check method in place.
In general, how do I ensure that a function affects the passed object and does not create a copy?
myobj.size=5
resize(myobj,10)
print myobj.size #jython =python2.5 => print is not a function
I can't make resize a member method since the myobj implementation is out of reach, and I don't want to type myobj=resize(myobj, 10) everywhere
Also, how can I make it so that I can access those attributes in a function to which I pass the object and the attribute name?
getattr isn't a method, you need to call it like this
getattr(obj, prop)
similarly setattr is called like this
setattr(obj, prop, value)
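Put together, the check function from the question might look roughly like this. It is a sketch, not the asker's actual code: Widget is a hypothetical stand-in for the InCyte objects, check is written as a plain function with an autoadjust parameter instead of a method reading self._autoadjust, and it returns a status string rather than printing.

```python
class Widget:
    """Hypothetical stand-in for an object whose attributes are checked."""

    def __init__(self, size):
        self.size = size


def check(obj, prop, value, autoadjust=False):
    # prop is the attribute name as a string, e.g. "size".
    current = getattr(obj, prop)
    if current == value:
        return "ok"
    if autoadjust:
        setattr(obj, prop, value)  # changes the caller's object, no copy
        return "adjusted"
    return "mismatch"


w = Widget(5)
print(check(w, "size", 5))
print(check(w, "size", 10))
print(check(w, "size", 10, autoadjust=True), w.size)
```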
In general how do I ensure that a function affects the passed object and does not create a copy?
Python is not C++, you never create copies unless you explicitly do so.
I cant make resize a member method since myobj implementation is out of reach, and I don't want to type myobj=resize(myobj,10) everywere
I don't get it. Why should it be out of reach? If you have the instance, you can invoke its methods.
In general, how do I ensure that a function affects the passed object
By writing code inside the function that affects the passed-in object, instead of re-assigning to the name.
and does not create a copy?
A copy is never created unless you ask for one.
Python "variables" are names for things. They don't store objects; they refer to objects. However, unlike C++ references, they can be made to refer to something else.
When you write
def change(parameter):
    parameter = 42

x = 23
change(x)
# x is still 23
The reason x is still 23 is not because a copy was made, because a copy wasn't made. The reason is that, inside the function, parameter starts out as a name for the passed-in integer object 23, and then the line parameter = 42 causes parameter to stop being a name for 23, and start being a name for 42.
If you do
def change(parameter):
    parameter.append(42)

x = [23]
change(x)
# now x is [23, 42]
The passed-in parameter changes, because .append on a list changes the actual list object.
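Applied to the resize example from the question, that could look like the sketch below. Box is a hypothetical class standing in for whatever type myobj really has; resize is a free function, so the class itself does not need to be modified.

```python
class Box:
    """Hypothetical stand-in for the type of myobj."""

    def __init__(self, size):
        self.size = size


def resize(obj, new_size):
    # Assigning to an attribute changes the object the caller passed in.
    obj.size = new_size


myobj = Box(5)
same = myobj       # a second name for the same object, to check identity
resize(myobj, 10)

print(myobj.size, same is myobj)
```

No return value and no myobj = resize(myobj, 10) is needed, because the function works on the very object it was handed.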
I can't make resize a member method since the myobj implementation is out of reach
That doesn't matter. When Python compiles, there is no type-checking step, and there is no step to look up the implementation of a method to insert the call. All of that is handled when the code actually runs. The code will get to the point myobj.resize(), look for a resize attribute of whatever object myobj currently refers to (after all, it can't know ahead of time even what kind of object it's dealing with; variables don't have types in Python but instead objects do), and attempt to call it (throwing the appropriate exceptions if (a) the object turns out not to have that attribute; (b) the attribute turns out not to actually be a method or other sort of function).
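Both failure cases only show up when the call actually runs. In this sketch, NoResize and BadResize are invented classes, and the error names are collected in a list so the outcome is easy to inspect.

```python
class NoResize:
    pass


class BadResize:
    resize = 5  # the attribute exists but is not callable


errors = []
for obj in (NoResize(), BadResize()):
    try:
        obj.resize()  # looked up on the object at run time
    except AttributeError:
        errors.append("AttributeError")  # case (a): no such attribute
    except TypeError:
        errors.append("TypeError")       # case (b): attribute is not callable

print(errors)
```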
Also, how can I make it so that I can access those attributes in a function to which i pass the object and the attribute name? / getattr does not work either
Certainly it works if you use it properly. It is not a method; it is a built-in top-level function. Same thing with setattr.
I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I wanted/needed to work with an object instance or a dictionary would suffice.
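Both variants, spelled out as a sketch. The key "fred", the dictionaries d_objects and d_dicts, and the value 42 are assumptions made for illustration; SimpleNamespace is used only as a convenient object that accepts arbitrary attributes.

```python
from types import SimpleNamespace

stringRESULTS = "fred"  # assumed: a string naming the target
b = 42

# Variant 1: the dictionary maps names to object instances.
d_objects = {"fred": SimpleNamespace()}
d_objects[stringRESULTS].b = b

# Variant 2: plain nested dictionaries, no attributes at all.
d_dicts = {"fred": {}}
d_dicts[stringRESULTS]["b"] = b

print(d_objects["fred"].b, d_dicts["fred"]["b"])
```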
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Since str is a low-level primitive type, you can't really set any arbitrary attribute on it. You probably need either a dict or a subclass of str:
class StringResult(str):
    pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
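The difference between the two can be reproduced in a few lines. This is a sketch with an assumed value for b; the flag plain_str_failed is introduced only to record that the plain str raised the same AttributeError as in the question.

```python
b = 42  # assumed value, for illustration only

# A plain str instance rejects new attributes.
try:
    setattr("spam_and_eggs", "b", b)
    plain_str_failed = False
except AttributeError:
    plain_str_failed = True


# A subclass instance accepts them and still behaves like a string.
class StringResult(str):
    pass


my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b

print(plain_str_failed, my_string_result.b, my_string_result.upper())
```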
EDIT:
If you're trying to do what DSM suggests, i.e. modify a property on a variable that has the same name as the value of the stringRESULTS variable, then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation and can wreak all kinds of havoc on your app if you aren't careful.