I thought I had a good handle on how Python passes objects (this article seemed enlightening).
Then I tried something simple, just assigning functions to variables.
class Thingy:
    def __init__(self):
        self.foo = {"egg": [1], "spam": [2]}
    def calc(self):
        self.foo["egg"][0] = 3
        self.foo["spam"][0] = 4
    def egg(self):
        return self.foo["egg"][0]
    def spam(self):
        return self.foo["spam"]
thingy = Thingy()
x = thingy.egg()
y = thingy.spam()
print(x) # prints 1
print(y[0]) # prints 2
print(thingy.foo)
thingy.calc()
print(x) # prints 1 (???)
print(y[0]) # prints 4
print(thingy.foo)
I'm not entirely sure what's going on, especially as the value in the dictionary has been updated. My guess is that when the variable x is assigned, it is actually referring to a function whose return value has been evaluated to "1" already.
Is my understanding correct? I'd appreciate a clear explanation of why Python is deciding to treat .egg() and .spam() differently.
The statements
x = thingy.egg()
y = thingy.spam()
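create x as an integer and y as a list. But what you must know is that y = thingy.spam() does not give you an independent list: no copy is made at all, and y refers to the same list object as thingy.foo["spam"]. The reference-sharing idea is the same one that underlies a shallow copy.
This is how a shallow copy is defined in an article on Medium: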
Shallow copy is a bit-wise copy of an object. A new object is created that has an exact copy of the values in the original object. If any of the fields of the object are references to other objects, just the reference addresses are copied i.e., only the memory address is copied.
So the variable y holds a reference to the same list that thingy.foo["spam"] refers to, and changing that list is visible through y as well. x, by contrast, refers to the int object 1, and calc() rebinds foo["egg"][0] to a new int object without touching x.
When you run
x = thingy.egg()
thingy.egg() returns an int which has the same value as the one in foo["egg"][0], and it is then assigned to x, whereas with
y = thingy.spam()
thingy.spam() returns a list, which is the same list object as foo["spam"]. This has to do with object mutability. In your calc method, the two slots foo["egg"][0] and foo["spam"][0] are rebound to new int objects. However, the lists they are contained in remain the same objects. As x refers to the integer, it stays bound to the old object, 1. y also stays bound to the same object, but that object is a list, and its first element now refers to the new int object, which is 4.
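The same behavior can be reproduced without the class. This is only a minimal sketch using a plain dict named foo (mirroring thingy.foo), not a change to the original code:

```python
foo = {"egg": [1], "spam": [2]}

x = foo["egg"][0]   # x is bound to the int object 1
y = foo["spam"]     # y is bound to the list object itself; no copy is made

foo["egg"][0] = 3   # rebinds slot 0 of the "egg" list; x is untouched
foo["spam"][0] = 4  # rebinds slot 0 of the "spam" list, which y also names

print(x)                 # 1
print(y[0])              # 4
print(y is foo["spam"])  # True: same list object
```

If egg() returned self.foo["egg"] instead of self.foo["egg"][0], x would behave like y.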
I know the is operator in Python has an unexpected behavior on immutable objects like integers and strings. See "is" operator behaves unexpectedly with integers
>>> a = 0
>>> b = 0
>>> a is b
True # Unexpected, we assigned b independently from a
When it comes to mutable objects, are we guaranteed that two variables expected (as written in the code) to reference two distinct objects (with equal value) will not be internally bound to the same object? (Until we mutate one of the two variables, then of course the references will differ.)
>>> a = [0]
>>> b = [0]
>>> a is b
# Is False guaranteed?
Put another way, if somewhere x is y returns True (x and y being mutable objects), are we guaranteed that mutating x will mutate y as well?
So long as you think some "is" behavior is "unexpected", your mental model falls short of reality ;-)
Your question is really about when Python guarantees to create a new object. And when it doesn't. For mutable objects, yes, a constructor (including a literal) yielding a mutable object always creates a new object. That's why:
>>> a = [0]
>>> b = [0]
>>> a is b
is always False. Python could have said that it's undefined whether each evaluation of [0] creates a new object, but it doesn't: it guarantees that each evaluation always creates a new object. The behavior of the is operator is a consequence of that, not a driver of it.
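A quick check of that guarantee, as a small sketch (the variable names are only illustrative):

```python
a = [0]
b = [0]

print(a == b)  # True: equal values
print(a is b)  # False: two separate list objects

a.append(1)    # mutating a leaves b alone
print(b)       # [0]
```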
Similarly,
>>> a = set()
>>> b = set()
>>> a is b
False
is also guaranteed. Because set() returns a mutable object, it always guarantees to create a new such object.
But for immutable objects, it's not defined. For example, the result of this is not defined:
>>> a = frozenset()
>>> b = frozenset()
>>> a is b
frozenset() - like integer literals - returns an immutable object, and it's up to the implementation whether to return a new object or reuse an existing one. In this specific example, a is b is True, because the implementation du jour happens to reuse an empty frozenset. But, e.g., it just so happens that
>>> a = frozenset([3])
>>> b = frozenset([3])
>>> a is b
False
today. It could just as well return True tomorrow (although that's unlikely - while an empty frozenset is an easy-to-detect special case, it would be expensive to ensure uniqueness across all frozenset objects).
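Since the identity of immutable objects is up to the implementation, code should only rely on == for them. A sketch that treats the result of is as unspecified:

```python
a = frozenset([3])
b = frozenset([3])

same_value = a == b    # always True: equal contents
same_object = a is b   # may be True or False depending on the implementation

print(same_value)
print(same_object)     # do not rely on this result
```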
I have read that while writing functions it is good practice to copy the arguments into other variables because it is not always clear whether the variable is immutable or not. [I don't remember where so don't ask]. I have been writing functions according to this.
As I understand creating a new variable takes some overhead. It may be small but it is there. So what should be done? Should I be creating new variables or not to hold the arguments?
I have read this and this. I'm confused as to why floats and ints are said to be immutable if they can be changed this easily.
EDIT:
I am writing simple functions. I'll post an example. I wrote the first one after I read that in Python arguments should be copied, and the second one after I realized by trial and error that it wasn't needed.
# When I copied arguments into another variable
def zeros_in_fact(num):
    '''Returns the number of zeros at the end of factorial of num'''
    temp = num
    if temp < 0:
        return 0
    fives = 0
    while temp:
        temp //= 5  # floor division, so temp stays an int and reaches 0
        fives += temp
    return fives
# When I did not copy arguments into another variable
def zeros_in_fact(num):
    '''Returns the number of zeros at the end of factorial of num'''
    if num < 0:
        return 0
    fives = 0
    while num:
        num //= 5  # rebinds the local name num only; the caller's value is unaffected
        fives += num
    return fives
I think it's best to keep it simple in questions like these.
The second link in your question is a really good explanation; in summary:
Methods take parameters which, as pointed out in that explanation, are passed "by value". The parameters in functions take the value of variables passed in.
For primitive types like strings, ints, and floats, the value of the variable is a pointer (the arrows in the following diagram) to a space in memory that represents the number or string.
code            | memory
                |
an_int = 1      | an_int ----> 1
                |              ^
another_int = 1 | another_int /
When you reassign within the method, you change where the arrow points.
an_int = 2      | an_int -------> 2
                | another_int --> 1
The numbers themselves don't change, and since those variables have scope only inside the functions, outside the function, the variables passed in remain the same as they were before: 1 and 1. But when you pass in a list or object, for example, you can change the values they point to outside of the function.
a_list = [1, 2, 3] |             1   2   3
                   | a_list ->| ^ | ^ | ^ |
                   |             0   2   3
a_list[0] = 0      | a_list ->| ^ | ^ | ^ |
Now, you can change where the arrows in the list, or object, point to, but the list's pointer still points to the same list as before. (There should probably actually only be one 2 and 3 in the diagram above for both sets of arrows, but the arrows would have gotten difficult to draw.)
So what does the actual code look like?
a = 5
def not_change(a):
    a = 6
not_change(a)
print(a) # a is still 5 outside the function
b = [1, 2, 3]
def change(b):
    b[0] = 0
change(b)
print(b) # b is now [0, 2, 3] outside the function
Whether you make a copy of the lists and objects you're given (ints and strings don't matter) and thus return new variables or change the ones passed in depends on what functionality you need to provide.
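As a rough sketch of that choice, here are two hypothetical helpers (zero_first_in_place and zero_first_copy are names made up for this example): one mutates the caller's list, the other returns a modified copy.

```python
def zero_first_in_place(items):
    """Mutates the list the caller passed in."""
    items[0] = 0


def zero_first_copy(items):
    """Leaves the caller's list alone and returns a new list."""
    result = list(items)  # shallow copy of the argument
    result[0] = 0
    return result


original = [1, 2, 3]
copied = zero_first_copy(original)
print(original, copied)  # [1, 2, 3] [0, 2, 3]

zero_first_in_place(original)
print(original)          # [0, 2, 3]
```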
What you are doing in your code examples involves no noticeable overhead, but it also doesn't accomplish anything because it won't protect you from mutable/immutable problems.
The way to think about this is that there are two kinds of things in Python: names and objects. When you do x = y you are operating on a name, attaching the name x to the object that y refers to. When you do x += y or other augmented assignment operators, you also are binding a name (in addition to doing the operation you use, + in this case). Anything else that you do is operating on objects. If the objects are mutable, that may involve changing their state.
Ints and floats cannot be changed. What you can do is change what int or float a name refers to. If you do
x = 3
x = x + 4
You are not changing the int. You are changing the name x so that it now is attached to the number 7 instead of the number 3. On the other hand when you do this:
x = []
x.append(2)
You are changing the list, not just pointing the name at a new object.
The difference can be seen when you have multiple names for the same object.
>>> x = 2
>>> y = x
>>> x = x + 3 # changing the name
>>> print(x)
5
>>> print(y) # y is not affected
2
>>> x = []
>>> y = x
>>> x.append(2) # changing the object
>>> print(x)
[2]
>>> print(y) # y is affected
[2]
Mutating an object means that you alter the object itself, so that all names that point to it see the changes. If you just change a name, other names are not affected.
The second question you linked to provides more information about how this works in the context of function arguments. The augmented assignment operators (+=, *=, etc.) are a bit trickier since they operate on names but may also mutate objects at the same time. You can find other questions on StackOverflow about how this works.
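To illustrate how augmented assignment can both rebind a name and mutate an object, here is a small sketch using only built-in types: += rebinds the name in both cases, but for a list it also changes the object in place, so an alias sees the change.

```python
x = 2
y = x
x += 3          # ints are immutable: x is rebound to a new object, 5
print(x, y)     # 5 2

a = [1]
b = a
a += [2]        # lists implement += in place, then a is rebound to the same list
print(a, b)     # [1, 2] [1, 2]
print(a is b)   # True
```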
If you are rebinding the name then mutability of the object it contains is irrelevant. Only if you perform mutating operations must you create a copy. (And if you read between the lines, that indirectly says "don't mutate objects passed to you".)
For a project I'm working on, I'm implementing a linked-list data-structure, which is based on the idea of a pair, which I define as:
class Pair:
    def __init__(self, name, prefs, score):
        self.name = name
        self.score = score
        self.preferences = prefs
        self.next_pair = 0
        self.prev_pair = 0
where self.next_pair and self.prev_pair are pointers to the next and previous links, respectively.
To set up the linked-list, I have an install function that looks like this.
def install(i, pair):
    flag = 0
    try:
        old_pair = pair_array[i]
        while old_pair.next_pair != 0:
            if old_pair == pair:
                #if pair in remainders: remainders.remove(pair)
                return 0
            if old_pair.score < pair.score:
                flag = 1
                if old_pair.prev_pair == 0: # we are at the beginning
                    old_pair.prev_pair = pair
                    pair.next_pair = old_pair
                    pair_array[i] = pair
                    break
                else: # we are not at the beginning
                    pair.prev_pair = old_pair.prev_pair
                    pair.next_pair = old_pair
                    old_pair.prev_pair = pair
                    pair.prev_pair.next_pair = pair
                    break
            else:
                old_pair = old_pair.next_pair
        if flag==0:
            if old_pair == pair:
                #if pair in remainders: remainders.remove(pair)
                return 0
            if old_pair.score < pair.score:
                if old_pair.prev_pair==0:
                    old_pair.prev_pair = pair
                    pair.next_pair = old_pair
                    pair_array[i] = pair
                else:
                    pair.prev_pair = old_pair.prev_pair
                    pair.next_pair = old_pair
                    old_pair.prev_pair = pair
                    pair.prev_pair.next_pair = pair
            else:
                old_pair.next_pair = pair
                pair.prev_pair = old_pair
    except KeyError:
        pair_array[i] = pair
        pair.prev_pair = 0
        pair.next_pair = 0
Over the course of the program, I am building up a dictionary of these linked-lists, and taking links off of some and adding them in others. Between being pruned and re-installed, the links are stored in an intermediate array.
Over the course of debugging this program, I have come to realize that my understanding of the way Python passes arguments to functions is flawed. Consider this test case I wrote:
def test_install():
    p = Pair(20000, [3, 1, 2, 50], 45)
    print(p.next_pair)
    print(p.prev_pair)
    parse_and_get(g)
    first_run()
    rat = len(juggler_array)/len(circuit_array)
    pref_size = get_pref_size()
    print(pref_size)
    print(install(3, p))
    print(p.next_pair.name)
    print(p.prev_pair)
When I run this test, I get the following result.
0
0
10
None
10108
0
What I don't understand is why the second call to p.next_pair produces a different result (10108) than the first call (0). install does not return a Pair object that can overwrite the one passed in (it returns None), and it's not as though I'm passing install a pointer.
My understanding of call-by-value is that the interpreter copies the values passed into a function, leaving the caller's variables unchanged. For example, if I say
def foo(x):
    x = x+1
    return x
baz = 2
y = foo(baz)
print(y)
print(baz)
Then 3 and 2 should be printed, respectively. And indeed, when I test that out in the Python interpreter, that's what happens.
I'd really appreciate it if anyone can point me in the right direction here.
In Python, everything is an object. Simple assignment stores a reference to the assigned object in the assigned-to name. As a result, it is more straightforward to think of Python variables as names that are assigned to objects, rather than objects that are stored in named locations.
For example:
baz = 2
... stores in baz a pointer, or reference, to the integer object 2 which is stored elsewhere. (Since the type int is immutable, Python actually has a pool of small integers and reuses the same 2 object everywhere, but this is an implementation detail that need not concern us much.)
When you call foo(baz), foo()'s local variable x also points to the integer object 2 at first. That is, the foo()-local name x and the global name baz are names for the same object, 2. Then x = x + 1 is executed. This changes x to point to a different object: 3.
It is important to understand: x is not a box that holds 2, and 2 is then incremented to 3. No, x initially points to 2 and that pointer is then changed to point to 3. Naturally, since we did not change what object baz points to, it still points to 2.
Another way to explain it is that in Python, all argument passing is by value, but all values are references to objects.
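One way to see this for yourself, as a sketch: a variant of foo that first checks whether its parameter is the very object the global name baz refers to, and only then rebinds its local name.

```python
baz = 2


def foo(x):
    received_same_object = x is baz  # x and baz name the same object on entry
    x = x + 1                        # rebinds the local name x only
    return received_same_object, x


same, result = foo(baz)
print(same)    # True
print(result)  # 3
print(baz)     # 2
```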
A counter-intuitive result of this is that if an object is mutable, it can be modified through any reference and all references will "see" the change. For example, consider this:
baz = [1, 2, 3]
def foo(x):
    x[0] = x[0] + 1
foo(baz)
print(baz) # prints [2, 2, 3]
This seems very different from our first example. But in reality, the argument is passed the same way. foo() receives a pointer to baz under the name x and then performs an operation on it that changes it (in this case, the first element of the list is pointed to a different int object). The difference is that the name x is never pointed to a new object; it is x[0] that is modified to point to a different object. x itself still points to the same object as baz. (In fact, under the hood the assignment to x[0] becomes a method call: x.__setitem__().) Therefore baz "sees" the modification to the list. How could it not?
You don't see this behavior with integers and strings because you can't change integers or strings; they are immutable types, and when you modify them (e.g. x = x + 1) you are not actually modifying them but binding your variable name to a completely different object. If you change baz to a tuple, e.g. baz = (1, 2, 3), you will find that foo() gives you an error because you can't assign to elements of a tuple; tuples are another immutable type. "Changing" a tuple requires creating a new one, and assignment then points the variable to the new object.
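Here is a short sketch of the tuple case, reusing the foo() from the list example. The exact wording of the error message may vary between Python versions, so it is printed rather than assumed:

```python
def foo(x):
    x[0] = x[0] + 1


baz = (1, 2, 3)
try:
    foo(baz)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# "Changing" the tuple means building a new one and rebinding the name.
baz = (baz[0] + 1,) + baz[1:]
print(baz)  # (2, 2, 3)
```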
Objects of classes you define are mutable and so your Pair instance can be modified by any function it is passed into -- that is, attributes may be added, deleted, or reassigned to other objects. None of these things will re-bind any of the names pointing to your object, so all the names that currently point to it will "see" the changes.
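A stripped-down sketch of what happens with your Pair. Node and link are hypothetical stand-ins for illustration, not your actual code:

```python
class Node:
    """Hypothetical stand-in for Pair with a single link attribute."""

    def __init__(self, name):
        self.name = name
        self.next_node = None


def link(first, second):
    # Mutates the object that `first` refers to; nothing is returned.
    first.next_node = second


p = Node("p")
q = Node("q")
print(p.next_node)       # None
link(p, q)
print(p.next_node.name)  # q
```

link() returns None, just like install(), yet p has changed, because p and the parameter first are two names for the same object.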
Python does not copy anything when passing variables to a function. It is neither call-by-value nor call-by-reference, but of those two it is more similar to call-by-reference. You could think of it as "call-by-value, but the value is a reference".
If you pass a mutable object to a function, then modifying that object inside the function will affect the object everywhere it appears. (If you pass an immutable object to a function, like a string or an integer, then by definition you can't modify the object at all.)
The reason this isn't technically pass-by-reference is that you can rebind a name so that the name refers to something else entirely. (For names of immutable objects, this is the only thing you can do to them.) Rebinding a name that exists only inside a function doesn't affect any names that might exist outside the function.
In your first example with the Pair objects, you are modifying an object, so you see the effects outside of the function.
In your second example, you are not modifying any objects, you are just rebinding names to other objects (other integers in this case). baz is a name that points to an integer object (in Python, everything is an object, even integers) with a value of 2. When you pass baz to foo(x), the name x is created locally inside the foo function on the stack, and x is set to the pointer that was passed into the function -- the same pointer as baz. But x and baz are not the same thing, they only contain pointers to the same object. On the x = x+1 line, x is rebound to point to an integer object with a value of 3, and that pointer is what is returned from the function and used to bind the integer object to y.
If you rewrote your first example to explicitly create a new Pair object inside your function based on the information from the Pair object passed into it (whether this is a copy you then modify, or if you make a constructor that modifies the data on construction) then your function would not have the side-effect of modifying the object that was passed in.
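A minimal sketch of that copy-based approach, assuming a shallow copy via the standard copy module is enough. The trimmed-down Pair and the with_next helper are hypothetical and only serve the example:

```python
import copy


class Pair:
    """Trimmed-down, hypothetical version of the question's Pair."""

    def __init__(self, name, score):
        self.name = name
        self.score = score
        self.next_pair = None


def with_next(pair, next_pair):
    """Return a modified shallow copy; the argument itself is not changed."""
    new_pair = copy.copy(pair)
    new_pair.next_pair = next_pair
    return new_pair


p = Pair(20000, 45)
q = Pair(10108, 50)
linked = with_next(p, q)
print(p.next_pair)            # None
print(linked.next_pair.name)  # 10108
```

Note that for a linked list this is usually not what you want, since the other links would still point at the original object; it only shows how to avoid the side effect.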
Edit: By the way, in Python you shouldn't use 0 as a placeholder to mean "I don't have a value" -- use None. And likewise you shouldn't use 0 to mean False, like you seem to be doing in flag. But all of 0, None and False evaluate to False in boolean expressions, so no matter which of those you use, you can say things like if not flag instead of if flag == 0.
I suggest that you forget about implementing a linked list, and simply use an instance of a Python list. If you need something other than the default Python list, maybe you can use something from a Python module such as collections.
A Python loop to follow the links in a linked list will run at Python interpreter speed, which is to say, slowly. If you simply use the built-in list class, your list operations will happen in Python's C code, and you will gain speed.
If you need something like a list but with fast insertion and fast deletion, can you make a dict work? If there is some sort of ID value (string or integer or whatever) that can be used to impose an ordering on your values, you could just use that as a key value and gain lightning fast insert and delete of values. Then if you need to extract values in order, you can use the dict.keys() method function to get a list of key values and use that.
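A rough sketch of the dict idea, with made-up IDs and scores. In Python 3, dict.keys() returns a view rather than a list, so sorted() is used here to get the keys in order:

```python
# Hypothetical data: scores keyed by an integer ID.
scores = {}
scores[10108] = 50   # insert
scores[20000] = 45
scores[3] = 12
del scores[10108]    # delete

# sorted() over a dict iterates its keys and returns them as an ordered list.
ordered_ids = sorted(scores)
print(ordered_ids)                       # [3, 20000]
print([scores[k] for k in ordered_ids])  # [12, 45]
```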
But if you really need linked lists, I suggest you find code written and debugged by someone else, and adapt it to your needs. Google search for "python linked list recipe" or "python linked list module".
I'm going to throw in a slightly complicating factor:
>>> def foo(x):
...     x *= 2
...     return x
...
Define a slightly different function using a method I know is supported for numbers, lists, and strings.
First, call it with strings:
>>> baz = "hello"
>>> y = foo(baz)
>>> y
'hellohello'
>>> baz
'hello'
Next, call it with lists:
>>> baz=[1,2,2]
>>> y = foo(baz)
>>> y
[1, 2, 2, 1, 2, 2]
>>> baz
[1, 2, 2, 1, 2, 2]
>>>
With strings, the argument isn't modified. With lists, the argument is modified.
If it were me, I'd avoid modifying arguments within methods.
Is there a way to assign references in Python?
For example, in PHP I can do this:
$a = 10;
$b = &$a;
$a = 20;
echo $a." ".$b; // 20, 20
How can I do the same thing in Python?
In Python, if you're doing this with mutable objects such as dicts and lists, it acts exactly like you want: assignment binds a reference, not a copy. That's why, when you run the following:
>>> a = {'key' : 'value'}
>>> b = a
>>> b['key'] = 'new-value'
>>> print(a['key'])
you get 'new-value'.
Strictly speaking, if you do the following:
>>> a = 5
>>> b = a
>>> print(id(a) == id(b))
you'll get True.
But! Because types such as int are immutable, you can't change the object that b refers to. You can only create a new object with a new value, based on b, and rebind the name. For example, if you do the following:
>>> print(id(b))
>>> b = b + 1
>>> print(id(b))
you'll get two different values.
This means that Python created a new object, computed its value from b's value, and then bound the name b to this new object. This applies to all immutable types. Putting the two previous examples together:
>>> a = 5
>>> b = a
>>> print(id(a) == id(b))
True
>>> b += 1
>>> print(id(b) == id(a))
False
So, when you assign in Python, you always assign a reference. But objects of some types cannot be changed, so when you "change" them, you actually create a new object and bind the name to it.
In Python, everything is by default a reference. So when you do something like:
x = [1, 2, 3]
y = x
x[1] = -1
print(y)
It prints [1, -1, 3].
The reason this does not work when you do
x = 1
y = x
x = -1
print(y)
is that ints are immutable. They cannot be changed. Think about it, does a number really ever change? When you assign a new value to x, you are binding the name x to a new object, not changing the old one. So y still points to the old one. Other immutable types (e.g. strings and tuples) behave in the same way.
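Python has no direct equivalent of PHP's $b = &$a for plain names. One common workaround, sketched here under the assumption that a small mutable holder is acceptable in your code, is to share a mutable object and change its contents instead of rebinding the name. Ref is a made-up class for this example:

```python
class Ref:
    """Hypothetical mutable holder; both names share one Ref object."""

    def __init__(self, value):
        self.value = value


a = Ref(10)
b = a          # b and a now name the same Ref object
a.value = 20   # mutate the shared object instead of rebinding a
print(a.value, b.value)  # 20 20
```

A one-element list or a dict would work the same way; the point is that both names must keep referring to the same mutable object.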