I've been learning Python for some time, but it keeps surprising me.
I have the following code:
def update_list(input_list):
    input_list.append(len(input_list))
    input_list[0] = 11
    return input_list
def update_string(input_string):
    input_string = 'NEW'
    return input_string
my_list = [0, 1, 2]
print(my_list)
print(update_list(my_list))
print(my_list)
my_string = 'OLD'
print(my_string)
print(update_string(my_string))
print(my_string)
This code produces the following output:
[0, 1, 2]
[11, 1, 2, 3]
[11, 1, 2, 3]
OLD
NEW
OLD
Why is my_list modified without any assignment to it, while my_string keeps its value after update_string()? I don't understand that mechanism; can you explain it to me?
There is nothing different about the behaviour of functions. What is different is that in one of them you rebound the name:
input_string = 'NEW'
This binds the name input_string to a new object. In the other function you make no assignments to a name. You only call a method on the object and assign to indices on the object, which alters the object's contents:
input_list.append(len(input_list))
input_list[0] = 11
Note that assigning to an index is not the same as assigning to a name. You could assign the list object to another name first, then do the index assignment separately, and nothing would change:
_temp = input_list
_temp[0] = 11
because assigning to an index alters one element contained in the list, not the name that you used to reference the list.
Had you assigned directly to input_list first, you'd have seen the same behaviour as with the string, and the caller's list would have been left untouched:
input_list = []
input_list.append(len(input_list))
input_list[0] = 11
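A small sketch contrasting the two cases side by side. The function names `mutate` and `rebind` are hypothetical; the first works on the caller's list, the second rebinds its local name before doing the same operations, so the caller never sees them.

```python
def mutate(input_list):
    # Operates on the caller's list object: append and index assignment
    input_list.append(len(input_list))
    input_list[0] = 11
    return input_list


def rebind(input_list):
    # Rebinding the local name detaches it from the caller's list;
    # everything after this line affects only the new, local list
    input_list = []
    input_list.append(len(input_list))
    input_list[0] = 11
    return input_list


caller_list = [0, 1, 2]
print(rebind(caller_list))  # [11]
print(caller_list)          # [0, 1, 2]
print(mutate(caller_list))  # [11, 1, 2, 3]
print(caller_list)          # [11, 1, 2, 3]
```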
You can do this outside a function too:
>>> a_str = 'OLD'
>>> b_str = a_str
>>> b_str = 'NEW'
>>> a_list = ['foo', 'bar', 'baz']
>>> b_list = a_list
>>> b_list.append('NEW')
>>> b_list[0] = 11
>>> a_str
'OLD'
>>> b_str
'NEW'
>>> a_list
[11, 'bar', 'baz', 'NEW']
>>> b_list
[11, 'bar', 'baz', 'NEW']
The initial assignments to b_str and b_list are exactly what happens when you call a function: the function's parameters are bound to the objects you passed in. Assignments do not create a copy; they create additional references to the object.
If you wanted to pass in a copy of the list object, do so by creating a copy:
new_list = old_list[:] # slicing from start to end creates a shallow copy
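Applied to the question's update_list (repeated here so the snippet runs on its own), passing a copy keeps the caller's list intact. The three spellings below all create a new, shallow-copied list; `list.copy()` needs Python 3.3 or later.

```python
def update_list(input_list):
    input_list.append(len(input_list))
    input_list[0] = 11
    return input_list


my_list = [0, 1, 2]
# Each call receives a new, shallow-copied list, so my_list is untouched
print(update_list(my_list[:]))      # [11, 1, 2, 3]
print(update_list(list(my_list)))   # [11, 1, 2, 3]
print(update_list(my_list.copy()))  # [11, 1, 2, 3]
print(my_list)                      # [0, 1, 2]
```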
I have a list which I pass to a function as an argument.
Inside the function I alter the list, but I don't return it.
I run the function and print the list afterwards.
Expected result: []
Actual result: [5]
How is this possible?
def foo(bar):
    for i in range(2):
        if bar == []:
            bar.append(4)
        else:
            bar[0] += 1
myvar = []
foo(myvar)
print(myvar)
One thing to know is that a variable holds a reference (a pointer) to a list object rather than the list itself. This can be seen by running the following code:
>>> list1 = [0, 1, 2, 3]
>>> list2 = list1
>>> list1[0] = 5
>>> list2
[5, 1, 2, 3]
Here, list1 stores a reference to the list [0, 1, 2, 3], and list2 is given that same reference. When the list is modified through the reference in list1, the list referenced by list2 is also modified.
>>> def foo(bar):
...     for i in range(2):
...         if bar == []:
...             bar.append(4)
...         else:
...             bar[0] += 1
...
>>> myvar = []
>>> foo(myvar)
>>> myvar
[5]
Passing the list myvar to the function foo() as the argument bar assigns a pointer to the list [] to the parameter bar. Since myvar and bar reference the same list, when the list is modified through bar, retrieving the list through myvar reflects the same change.
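To check that no copy is involved, here is a variant of foo that additionally returns bar (the return is added only for inspection; the question's version returns nothing). The identity test `is` confirms that bar and myvar are the same object.

```python
def foo(bar):
    for i in range(2):
        if bar == []:
            bar.append(4)   # first pass: the list is empty, so add 4
        else:
            bar[0] += 1     # second pass: bump the existing element
    return bar  # added only so the caller can inspect the object


myvar = []
result = foo(myvar)
print(result is myvar)  # True: bar was never a copy
print(myvar)            # [5]
```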
Hopefully this clarifies any confusion!
This happens because the empty list is created only once; on the second pass through the loop bar still refers to that same list rather than a new one.
First, bar = [] is empty.
The for loop checks whether bar is empty and, since it is, appends 4.
Now bar = [4], so on the next check bar == [] is False. The loop enters the else branch and adds 1 to bar[0].
Now bar is [5].
Throughout this process bar refers to the same list object as myvar.
The example below may help you understand it better:
>>> def foo(bar=[]):
...     bar.append('baz')
...     return bar
...
>>> foo()
['baz']
>>> foo()
['baz', 'baz']
>>> foo()
['baz', 'baz', 'baz']
It keeps appending 'baz' to the same list each time foo() is called.
This is because the default value for a function argument is evaluated only once, when the function is defined.
Thus bar is initialized to its default (i.e. an empty list) only when foo() is first defined; subsequent calls to foo() without an argument keep using that same list.
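The usual way around this is to use None as the default and create a fresh list inside the function. A minimal sketch of that idiom, reusing the foo example:

```python
def foo(bar=None):
    # None is an immutable sentinel; a new list is created on every call
    # that does not pass one in
    if bar is None:
        bar = []
    bar.append('baz')
    return bar


print(foo())  # ['baz']
print(foo())  # ['baz']

shared = []
foo(shared)
foo(shared)
print(shared)  # ['baz', 'baz']: an explicitly passed list is still mutated
```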
After I apply an operation to a list, I would like to get access to both the modified list and the original one.
Somehow I am not able to.
In the following code snippet, I define two functions with which I modify the original list.
Afterwards, I get my values from a class and apply the transformation.
def get_min_by_col(li, col):  # get minimum from list
    return min(li, key=lambda x: x[col - 1])[col - 1]
def hashCluster(coords):  # transform to origin
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    for pix in coords:
        pix[1] = pix[1] - min_row
        pix[0] = pix[0] - min_col
    return (coords)
pixCoords = hashCoords = originalPixCoords = []  # making sure they are empty
for j in dm.getPixelsForCluster(dm.clusters[i]):
    pixCoords.append([j['m_column'], j['m_row']])  # getting some values from a class -- ex: [[613, 265], [613, 266]] or [[615, 341], [615, 342], [616, 341], [616, 342]]
originalPixCoords = pixCoords.copy()  # just to be safe, I make a copy of the original list
print('Original : ', originalPixCoords)
hashCoords = hashCluster(pixCoords)  # apply transformation
print('Modified : ', hashCoords)
print('Original : ', originalPixCoords)  # should get the original list
Some results [Jupyter Notebook]:
Original : [[607, 268]]
Modified : [[0, 0]]
Original : [[0, 0]]
Original : [[602, 264], [603, 264]]
Modified : [[0, 0], [1, 0]]
Original : [[0, 0], [1, 0]]
Original : [[613, 265], [613, 266]]
Modified : [[0, 0], [0, 1]]
Original : [[0, 0], [0, 1]]
Can the function hashCluster modify the new list as well, even after the .copy()?
What am I doing wrong? My goal is to have access to both the original and modified lists, with as few operations and list copies as possible (since I am looping over a very large document).
You have a list of lists, and you are modifying the inner lists. pixCoords.copy() creates a shallow copy of the outer list only: pixCoords and originalPixCoords are now two distinct outer lists whose elements are the same mutable inner lists. There are two ways to handle this situation, each with its own pros and cons.
The knee-jerk method that most users seem to have is to make a deep copy:
import copy
originalPixCoords = copy.deepcopy(pixCoords)
I would argue that this is the less Pythonic and more error-prone approach. A better solution is to make hashCluster return a new list. That way it treats its input as immutable, which eliminates the problem entirely. I consider this more Pythonic because it reduces the maintenance burden. Also, by convention, Python functions that return a value create a new object without modifying their input, while in-place operations generally return None (compare sorted(x) with x.sort()).
def hashCluster(coords):
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]
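A self-contained sketch of the non-mutating version run on sample values from the question. The name hash_cluster is hypothetical, and it computes the minima directly with min() over each column instead of calling get_min_by_col, a simplification assuming each pixel is [column, row] as in the question's append call. No copy is needed: the original list is never touched.

```python
def hash_cluster(coords):
    # Each pixel is assumed to be [column, row]
    min_col = min(pix[0] for pix in coords)
    min_row = min(pix[1] for pix in coords)
    # Build brand-new inner lists; the input is only read, never modified
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]


pix_coords = [[613, 265], [613, 266]]  # sample values from the question
hash_coords = hash_cluster(pix_coords)
print('Modified : ', hash_coords)  # Modified :  [[0, 0], [0, 1]]
print('Original : ', pix_coords)   # Original :  [[613, 265], [613, 266]]
```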
Use:
import copy
originalPixCoords = copy.deepcopy(pixCoords)
What you're using is a shallow copy. It creates a new outer list, but that list just points to the same inner objects as the old one. So if those inner objects are modified, the change is visible through both lists, because there is still only one of each inner object.
>>> # Shallow Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> mynewlist = mylist.copy()
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'new value'}]
>>> # Now Deep Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> from copy import deepcopy
>>> mynewlist = deepcopy(mylist)
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'original'}]
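When the elements are flat lists of immutable values, as in the question's [[column, row], ...] data, copying just one level deeper is enough. This sketch copies each inner list with a slice; it is generally cheaper than deepcopy, which has to walk every object and keep a memo of what it has already copied.

```python
pix_coords = [[602, 264], [603, 264]]  # sample values from the question

# Copy the outer list AND each inner list; sufficient because the inner
# lists only hold ints, which are immutable
original_pix_coords = [inner[:] for inner in pix_coords]

pix_coords[0][0] -= 602  # mutate an inner list in place
print(pix_coords)           # [[0, 264], [603, 264]]
print(original_pix_coords)  # [[602, 264], [603, 264]]
```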
Another similar question: What is the difference between shallow copy, deepcopy and normal assignment operation?
Setting multiple variables equal to the same object makes them all refer to that one object, which is the closest Python equivalent of a pointer.
Check this out
a = b = [1,2,3]
a == b # True
a is b # True (same memory location)
b[1] = 3
print(b) # [1,3,3]
print(a) #[1,3,3]
Right now, you are creating shallow copies. If you need two independent copies (with separate values and data history), you can assign the variables like this:
import copy
data = [[1, 2], [3, 4]]  # illustrative sample data
original = data
original_copy = copy.deepcopy(data)
original_copy == original == data  # True
original_copy is original  # False
original_copy[0] = 4
original_copy == original  # False
Let's say we have the following list and we create an iterator for it:
lst = [1,2,3]
itr = iter(lst)
Next, let's say we assign a list with completely different values to lst:
lst = ['a', 'b', 'c']
And then we run the following loop:
for x in itr:
    print(x)
We get 1, 2, 3. But why? As far as I understand, an iterator doesn't copy all the values from the object it iterates over. An iterator over a three-element list has the same size as one over a 100000-element list: sys.getsizeof(itr) returns 64. How can an iterator be so small and still keep the 'old' values of the list?
The iterator itself contains a reference to the list. Since lst is rebound instead of mutated, this reference does not change.
>>> lst = [1, 2, 3]
>>> itr = iter(lst)
>>> lst[:] = ['a', 'b', 'c']
>>> for x in itr:
...     print(x)
...
a
b
c
The iterator references a list object, not a name. So reassigning the name lst to another object does not affect the iterator in any way; names are bound to objects and refer to objects, but the names are not the objects themselves.
You can peek at the object the iterator is referencing with gc.get_referents:
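A simplified, pure-Python iterator makes this concrete. The class ListIter is hypothetical; as far as I know, CPython's built-in list iterator works along the same lines (a reference to the list plus an integer index), which would explain why its size does not depend on the list's length. Treat it as a sketch, not the actual implementation.

```python
class ListIter:
    """Hypothetical, simplified list iterator: it stores only a reference
    to the list and a position, never a copy of the items."""

    def __init__(self, seq):
        self._seq = seq    # a reference to the list object, not a copy
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None or self._index >= len(self._seq):
            self._seq = None  # drop the reference; stay exhausted from now on
            raise StopIteration
        item = self._seq[self._index]
        self._index += 1
        return item


lst = [1, 2, 3]
itr = ListIter(lst)
lst = ['a', 'b', 'c']  # rebinds the name; the iterator still holds the old list
result = list(itr)
print(result)          # [1, 2, 3]
```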
>>> import gc
>>> lst = [1,2,3]
>>> itr = iter(lst) # return an iterator for the list
>>> lst = ['a', 'b', 'c'] # Bind name lst to another object
>>> gc.get_referents(itr)[0]
[1, 2, 3]
As you'll notice, the iterator is still referring to the first list object.
The following reference will help you learn more about names and binding in Python:
Execution model - Naming and binding
Welcome to Python's object reference system. The variable names do not really have a deep relationship with the actual object stored in memory.
lst = [1, 2, 3]
itr = iter(lst) # iter object now points to the list pointed to by lst
print(next(itr)) # prints 1
# Both `lst` and `lst1` now refer to the same list
lst1 = lst
# `lst` now points to a new list, while `lst1` still points to the original list.
lst = ['a', 'b', 'c']
print(next(itr)) # prints 2
lst.append(4)
lst1.append(5) # here the list pointed to by `itr` is updated
for i in itr:
    print(i)  # prints 3, 5
TL;DR: Python variable names are just tags that refer to objects.
When you call iter on the list named lst, the iterator object points to the actual object, and not the name lst.
If you modify the original object by calling append, extend, pop, remove, etc., the iterator's output will be affected. But when you assign a new value to lst, lst simply starts pointing to that new object, and the original object is left unchanged.
The original object is freed once nothing refers to it any more (itr still refers to it in this case, so the original object won't be deleted yet).
http://foobarnbaz.com/2012/07/08/understanding-python-variables/
Extra:
lst1.extend([6, 7, 8])
next(itr) # raises StopIteration
This has nothing to do with object references; the iterator simply remembers internally that it has already iterated over the complete list.
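The iterator protocol requires that once __next__() has raised StopIteration, it keeps doing so on later calls. A short demonstration with the built-in list iterator, using the two-argument form of next() so the snippet does not stop on the exception:

```python
lst = [1, 2, 3]
itr = iter(lst)
print(list(itr))  # [1, 2, 3] (this exhausts the iterator)

lst.extend([4, 5])  # grow the same list object afterwards
print(next(itr, 'exhausted'))  # exhausted

# A fresh iterator over the same (now longer) list does see the new items
print(list(iter(lst)))  # [1, 2, 3, 4, 5]
```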
I see slice assignment for lists used in many places. I understand its use with (non-default) indices, but I don't understand its use like this:
a_list[:] = ['foo', 'bar']
How is that different from
a_list = ['foo', 'bar']
?
a_list = ['foo', 'bar']
Creates a new list in memory and points the name a_list at it. It is irrelevant what a_list pointed at before.
a_list[:] = ['foo', 'bar']
Calls the __setitem__ method of the a_list object with a slice as the index, and a new list created in memory as the value.
__setitem__ evaluates the slice to figure out what indexes it represents, and calls iter on the value it was passed. It then iterates over the object, setting each index within the range specified by the slice to the next value from the object. For lists, if the range specified by the slice is not the same length as the iterable, the list is resized. This allows you to do a number of interesting things, like delete sections of a list:
a_list[:] = [] # deletes all the items in the list, equivalent to 'del a_list[:]'
or inserting new values in the middle of a list:
a_list[1:1] = [1, 2, 3] # inserts the new values at index 1 in the list
However, with "extended slices", where the step is not one, the iterable must be the correct length:
>>> lst = [1, 2, 3]
>>> lst[::2] = []
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
ValueError: attempt to assign sequence of size 0 to extended slice of size 2
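A short sketch of the behaviours described above on one list: a plain slice can shrink or grow the list, an empty slice inserts, an extended slice needs a right-hand side of matching length, and any iterable (here a generator) is accepted. Throughout, id() stays the same because the list is modified in place.

```python
lst = [1, 2, 3, 4, 5]
original_id = id(lst)

lst[1:3] = ['x']             # replace two items with one: the list shrinks
print(lst)                   # [1, 'x', 4, 5]
lst[1:1] = [7, 8]            # empty slice: pure insertion at index 1
print(lst)                   # [1, 7, 8, 'x', 4, 5]
lst[::2] = ['a', 'b', 'c']   # extended slice of size 3, so 3 values needed
print(lst)                   # ['a', 7, 'b', 'x', 'c', 5]
lst[:] = (n * n for n in range(3))  # any iterable works on the right
print(lst)                   # [0, 1, 4]
print(id(lst) == original_id)  # True: still the same list object
```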
The main things that are different about slice assignment to a_list are:
a_list must already point to an object
That object is modified, instead of pointing a_list at a new object
That object must support __setitem__ with a slice index
The object on the right must support iteration
No name is pointed at the object on the right. If there are no other references to it (such as when it is a literal as in your example), it will be reference counted out of existence after the iteration is complete.
The difference is quite huge! In
a_list[:] = ['foo', 'bar']
You modify an existing list that was bound to the name a_list. On the other hand,
a_list = ['foo', 'bar']
assigns a new list to the name a_list.
Maybe this will help:
a = a_list = ['foo', 'bar']  # another name for the same list
a_list = ['x', 'y']  # reassigns the name a_list
print(a)  # still the original list
a = a_list = ['foo', 'bar']
a_list[:] = ['x', 'y']  # changes the existing list bound to a
print(a)  # a changed too since you changed the object
By assigning to a_list[:], a_list still refers to the same list object, with its contents modified. By assigning to a_list, a_list now refers to a new list object.
Check out its id:
>>> a_list = []
>>> id(a_list)
32092040
>>> a_list[:] = ['foo', 'bar']
>>> id(a_list)
32092040
>>> a_list = ['foo', 'bar']
>>> id(a_list)
35465096
As you can see, its id doesn't change with the slice assignment version.
The difference between the two can lead to quite different results, for instance when the list is a function parameter:
def foo(a_list):
    a_list[:] = ['foo', 'bar']
a = ['original']
foo(a)
print(a)
With this, a is modified as well, but if a_list = ['foo', 'bar'] were used instead, a would keep its original value.
a_list = ['foo', 'bar']
a=a_list[:] # by this you get an exact copy of a_list
print(a)
a=[1,2,3] # rebinding a does not affect a_list
print(a)
print(a_list)
I occasionally see the list slice syntax used in Python code like this:
newList = oldList[:]
Surely this is just the same as:
newList = oldList
Or am I missing something?
[:] makes a shallow copy of the list: a new list structure containing references to the original list members. This means that operations on the structure of the copy do not affect the structure of the original. However, if you mutate the list members themselves, both lists still refer to them, so the updates will show up when the members are accessed through the original.
A deep copy would make copies of all the list members as well.
The code snippet below shows a shallow copy in action.
# ================================================================
# === ShallowCopy.py =============================================
# ================================================================
#
class Foo:
    def __init__(self, data):
        self._data = data
aa = Foo('aaa')
bb = Foo('bbb')
# The initial list has two elements containing 'aaa' and 'bbb'
OldList = [aa,bb]
print(OldList[0]._data)
# The shallow copy makes a new list pointing to the old elements
NewList = OldList[:]
print(NewList[0]._data)
# Updating one of the elements through the new list sees the
# change reflected when you access that element through the
# old list.
NewList[0]._data = 'xxx'
print(OldList[0]._data)
# Updating the new list to point to something new is not reflected
# in the old list.
NewList[0] = Foo('ccc')
print(NewList[0]._data)
print(OldList[0]._data)
Running it in a Python shell gives the following transcript. We can see the
new list being made with references to the old objects, not copies of them.
One of the objects can have its state updated through the new list, and the
update can be seen when the object is accessed through the old list. Finally,
changing a reference in the new list is not reflected in the old list, as the
new list is now referring to a different object.
>>> # ================================================================
... # === ShallowCopy.py =============================================
... # ================================================================
... #
... class Foo:
...     def __init__(self, data):
...         self._data = data
...
>>> aa = Foo('aaa')
>>> bb = Foo('bbb')
>>>
>>> # The initial list has two elements containing 'aaa' and 'bbb'
... OldList = [aa,bb]
>>> print(OldList[0]._data)
aaa
>>>
>>> # The shallow copy makes a new list pointing to the old elements
... NewList = OldList[:]
>>> print(NewList[0]._data)
aaa
>>>
>>> # Updating one of the elements through the new list sees the
... # change reflected when you access that element through the
... # old list.
... NewList[0]._data = 'xxx'
>>> print(OldList[0]._data)
xxx
>>>
>>> # Updating the new list to point to something new is not reflected
... # in the old list.
... NewList[0] = Foo('ccc')
>>> print(NewList[0]._data)
ccc
>>> print(OldList[0]._data)
xxx
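For contrast, a sketch of the same Foo experiment with copy.deepcopy instead of [:]. Here the Foo objects themselves are copied, so updating an element through the new list no longer shows up in the old one.

```python
import copy


class Foo:
    def __init__(self, data):
        self._data = data


OldList = [Foo('aaa'), Foo('bbb')]
NewList = copy.deepcopy(OldList)  # copies the list AND every Foo in it

NewList[0]._data = 'xxx'
print(OldList[0]._data)          # aaa: the old list's Foo was not touched
print(NewList[0]._data)          # xxx
print(NewList[0] is OldList[0])  # False
```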
Like NXC said, Python variable names actually point to an object, and not a specific spot in memory.
newList = oldList would create two names that point to the same object; therefore, changing oldList (e.g. appending to it) would also change newList.
However, when you do newList = oldList[:], it "slices" the list, and creates a new list. The default values for [:] are 0 and the end of the list, so it copies everything. Therefore, it creates a new list with all the data contained in the first one, and the two lists can be altered independently (although, being a shallow copy, they still share the same element objects).
Since this has already been answered, I'll just add a simple demonstration:
>>> a = [1, 2, 3, 4]
>>> b = a
>>> c = a[:]
>>> b[2] = 10
>>> c[3] = 20
>>> a
[1, 2, 10, 4]
>>> b
[1, 2, 10, 4]
>>> c
[1, 2, 3, 20]
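The slice is not the only spelling of this shallow copy. A quick check that the common alternatives all give a list that is equal to the original but is a different object:

```python
import copy

a = [1, 2, 3, 4]
copies = [a[:], list(a), a.copy(), copy.copy(a)]
for c in copies:
    print(c == a, c is a)  # True False for each spelling
```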
Never think that 'a = b' in Python means 'copy b to a'. If there are variables on both sides, you can't really know that. Instead, think of it as 'give b the additional name a'.
If b is an immutable object (like a number, tuple or string), then yes, the effect is the same as getting a copy. But that's only because immutables (which maybe should have been called read-only, unchangeable or WORM) can never be changed in place, so sharing one object is indistinguishable from having a copy.
If b is mutable, you always have to do something extra to be sure you have a true copy. Always. With lists, it's as simple as a slice: a = b[:] (a shallow copy).
Mutability is also the reason that this:
def myfunction(mylist=[]):
    pass
... doesn't quite do what you think it does.
If you're from a C-background: what's left of the '=' is a pointer, always. All variables are pointers, always. If you put variables in a list: a = [b, c], you've put pointers to the values pointed to by b and c in a list pointed to by a. If you then set a[0] = d, the pointer in position 0 is now pointing to whatever d points to.
See also the copy-module: http://docs.python.org/library/copy.html
Shallow copy: (creates a new list object whose elements reference the same items)
a = ['one','two','three']
b = a[:]
b[1] = 2
print(id(a), a)  # Output: 1077248300 ['one', 'two', 'three']
print(id(b), b)  # Output: 1077248908 ['one', 2, 'three']
Plain assignment, not a copy: (copies only the object reference, so both names share one list)
a = ['one','two','three']
b = a
b[1] = 2
print(id(a), a)  # Output: 1077248300 ['one', 2, 'three']
print(id(b), b)  # Output: 1077248300 ['one', 2, 'three']
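The three cases can be told apart with the identity operator is, on the list itself and on its elements. A sketch with nested lists, so that the difference between a shallow and a deep copy becomes visible:

```python
import copy

a = [['x'], ['y']]
alias = a                # plain assignment: no copy at all
shallow = a[:]           # new outer list, same inner lists
deep = copy.deepcopy(a)  # new outer list AND new inner lists

print(alias is a)          # True
print(shallow is a)        # False
print(shallow[0] is a[0])  # True: the elements are shared
print(deep[0] is a[0])     # False

shallow[0].append('!')     # mutate a shared inner list
print(a)                   # [['x', '!'], ['y']]
print(deep)                # [['x'], ['y']]
```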