After I apply an operation to a list, I would like to get access to both the modified list and the original one.
Somehow I am not able to.
In the following code snippet, I define two functions with which I modify the original list.
Afterwards, I get my values from a class and apply the transformation.
def get_min_by_col(li, col): # get minimum from list
    return min(li, key=lambda x: x[col - 1])[col - 1]
def hashCluster(coords): # transform to origin
    min_row = get_min_by_col(coords,0)
    min_col = get_min_by_col(coords,1)
    for pix in coords:
        pix[1] = pix[1] - min_row
        pix[0] = pix[0] - min_col
    return (coords)
pixCoords = hashCoords = originalPixCoords = [] # making sure they are empty
for j in dm.getPixelsForCluster(dm.clusters[i]):
    pixCoords.append([j['m_column'], j['m_row']]) # getting some values from a class -- ex: [[613, 265], [613, 266]] or [[615, 341], [615, 342], [616, 341], [616, 342]]
originalPixCoords = pixCoords.copy() # just to be safe, I make a copy of the original list
print ('Original : ', originalPixCoords)
hashCoords = hashCluster(pixCoords) # apply transformation
print ('Modified : ', hashCoords)
print ('Original : ', originalPixCoords) # should get the original list
Some results [Jupyter Notebook]:
Original : [[607, 268]]
Modified : [[0, 0]]
Original : [[0, 0]]
Original : [[602, 264], [603, 264]]
Modified : [[0, 0], [1, 0]]
Original : [[0, 0], [1, 0]]
Original : [[613, 265], [613, 266]]
Modified : [[0, 0], [0, 1]]
Original : [[0, 0], [0, 1]]
Is the function hashCluster able to modify the new list as well? Even after the .copy()?
What am I doing wrong? My goal is to have access to both the original and modified lists, with as few operations and copies of lists as possible (since I am looping over a very large document).
You have a list of lists, and you are modifying the inner lists. pixCoords.copy() creates a shallow copy: a new outer list whose elements are references to the same inner lists. pixCoords and originalPixCoords are therefore two separate outer lists that point to the same mutable inner lists, so a change made to an inner list through one is visible through the other. There are two ways to handle this situation, each with its own pros and cons.
The knee-jerk method that most users seem to have is to make a deep copy:
import copy
originalPixCoords = copy.deepcopy(pixCoords)
I would argue that this method is the less pythonic and more error-prone approach. A better solution would be to make hashCluster actually return a new list. By doing that, you make it treat the input as immutable and eliminate the problem entirely. I consider this more pythonic because it reduces the maintenance burden. Also, by convention, Python functions that return a value create a new object without modifying the input, while in-place operations generally return None.
def hashCluster(coords):
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]
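A quick check of the non-mutating version, run on one of the sample clusters from the question. get_min_by_col is copied from the question unchanged; the input values are only illustrative, and this is a sketch rather than a drop-in replacement for the full loop.

```python
def get_min_by_col(li, col):
    # Same helper as in the question: col 0 reads the last entry of each
    # pair (x[-1]) and col 1 reads the first (x[0]).
    return min(li, key=lambda x: x[col - 1])[col - 1]

def hashCluster(coords):
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    # Build brand-new inner lists instead of writing into the existing ones.
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]

pixCoords = [[613, 265], [613, 266]]
hashCoords = hashCluster(pixCoords)

print('Modified :', hashCoords)   # [[0, 0], [0, 1]]
print('Original :', pixCoords)    # [[613, 265], [613, 266]]
```

Because the function never writes into coords, no copy of the input is needed at all, which also addresses the concern about extra copies in a large loop.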
Use a deep copy:
import copy
originalPixCoords = copy.deepcopy(pixCoords)
What you're using is a shallow copy. It means you created a new list whose elements still point to the old objects. If those objects get modified, your new list will reflect the updates, since both lists refer to the same objects.
>>> # Shallow Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> mynewlist = mylist.copy()
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'new value'}]
>>> # Now Deep Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> from copy import deepcopy
>>> mynewlist = deepcopy(mylist)
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'original'}]
Another similar question: What is the difference between shallow copy, deepcopy and normal assignment operation?
Setting multiple variables equal to the same value makes every name refer to the same object, which is the Python equivalent of several pointers to one location.
Check this out
a = b = [1,2,3]
a == b # True
a is b # True (same memory location)
b[1] = 3
print(b) # [1,3,3]
print(a) #[1,3,3]
Right now, you are creating shallow copies. If you need both copies (with different values and data history), you can simply assign the variables in the following manner:
import copy
data = [1, 2, 3]
original = data
original_copy = copy.deepcopy(data)
original_copy == original == data # True
original_copy is original # False
original_copy[0] = 4
original_copy == original # False
Related
I have written the python code in the following form
temp=[]
x=[1,2,3]
for i in range(4):
    temp=temp+[x]
temp[1][1]=x[1]+1
print temp[0]
print temp[1]
Here, I wanted to change the value of just temp[1][1], but the value of temp[0][1] also gets changed. Is there a way of changing just one value? I created a new list and tried to add it to temp, but that does not seem to work either.
Update:
Thanks, but it did not seem to work in my case (which was a multi-dimensional array). I have the code as follows:
tempList=[]
for i in range(openList[0].hx):
    tempList=tempList+[copy.copy(abc)]
tempList[0][0][0]=123
print sudokuList
Here abc is a two dimensional list. Modifying the value of tempList[0][0][0] changes the value of tempList[1][0][0] and so on.
That's because you are adding the same x to the list each time, so all of the items are references to one object, and once you change one of them you have actually changed all of them. To get rid of this problem, you can use a list comprehension to define the temp list:
temp=[[1,2,3] for _ in range(4)]
temp[1][1]=7
print temp[0]
print temp[1]
result :
[1, 2, 3]
[1, 7, 3]
This is actually a common error for beginners to Python: How to clone or copy a list?
When you add x to temp four times, you're creating a temp that contains the same x four times.
So temp[0], temp[1], temp[2] and temp[3] all point to the same x you declared before the loop.
Just make a copy when adding:
temp=[]
x=[1,2,3]
for i in range(4):
    temp.append(x[:])
temp[1][1]=x[1]+1
print temp[0]
print temp[1]
You can see this with the id function, which returns a different value for each distinct live object:
>>> temp=[]
>>> x=[1,2,3]
>>> for i in range(4):
...     temp=temp+[x]
...
>>> id(temp[0]), id(temp[1])
(4301992880, 4301992880) # they're the same
>>> temp=[]
>>> x=[1,2,3]
>>> for i in range(4):
...     temp=temp+[x[:]]
...
>>> id(temp[0]), id(temp[1])
(4301992088, 4302183024) # now they are not
Try the following. The x in the for loop is a reference to the original x, not a copy. Because of this, changing any element shows up in every entry of temp. So you need to make a copy, as in the following snippet.
temp=[]
x=[1,2,3]
for i in range(4):
    temp=temp+[x[:]]
temp[1][1]=x[1]+1
print temp[0]
print temp[1]
----EDIT----
As per your comment, use copy.deepcopy to copy the list. deepcopy would recursively copy all the referenced elements inside the list. Check copy.deepcopy. So the code looks like:-
import copy
temp=[]
x=[1,2,3]
for i in range(4):
    x_copy = copy.deepcopy(x)
    # do something with x_copy; use it in place of x in your code.
    # will work for 1D, 2D, or any higher-order lists.
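For the two-dimensional case from the update, a minimal sketch might look like the following. The contents of abc and the repeat count of 3 are placeholder assumptions, since the question does not show them.

```python
import copy

abc = [[1, 2, 3], [4, 5, 6]]  # placeholder 2D list (assumed values)

# Each entry is an independent deep copy, so the nested rows are not shared.
tempList = [copy.deepcopy(abc) for _ in range(3)]

tempList[0][0][0] = 123

print(tempList[0])  # [[123, 2, 3], [4, 5, 6]]
print(tempList[1])  # [[1, 2, 3], [4, 5, 6]]
print(abc)          # [[1, 2, 3], [4, 5, 6]]
```

With copy.copy(abc) instead, only the outer list would be new and every entry would still share the same rows, which is the behaviour described in the update.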
See my code in python 3.4. I can get around it fine. It bugs me a little. I'm guessing it's something to do with foo2 resetting a rather than treating it as list 1.
def foo1(a):
    a.append(3) ### add element 3 to end of list
    return()
def foo2(a):
    a=a+[3] #### add element 3 to end of list
    return()
list1=[1,2]
foo1(list1)
print(list1) ### shows [1,2,3]
list1=[1,2]
foo2(list1)
print(list1) #### shows [1,2]
In foo2 you do not mutate the original list referred to by a - instead, you create a new list from list1 and [3], and bind the result which is a new list to the local name a. So list1 is not changed at all.
There is a difference between + on the one hand and append and += on the other:
>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312
>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720
As you can see, append and += have the same result; they add the item to the list, without producing a new list. Using + adds the two lists and produces a new list.
In the first example, you're using a method that modifies a in-place. In the second example, you're making a new a that replaces the old a but without modifying the old a - that's usually what happens when you use the = to assign a new value. One exception is when you use slicing notation on the left-hand side: a[:] = a + [3] would work as your first example did.
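The slice-assignment exception can be checked with a short sketch. The names foo3 and foo4 are hypothetical and only serve to contrast the two forms of assignment.

```python
def foo3(a):
    # Slice assignment replaces the contents of the existing list object,
    # so the caller sees the change even though a + [3] builds a new list.
    a[:] = a + [3]

def foo4(a):
    # Plain assignment only rebinds the local name a to a new list.
    a = a + [3]
    return a

list1 = [1, 2]
before = id(list1)
foo3(list1)
print(list1)                # [1, 2, 3]
print(id(list1) == before)  # True: still the same list object

list2 = [1, 2]
result = foo4(list2)
print(list2, result)        # [1, 2] [1, 2, 3]
```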
I want to pass a list into a function by value.
By default, lists and other complex objects are passed to a function by reference.
Here is one solution:
def add_at_rank(ad, rank):
    result_ = copy.copy(ad)
    # .. do something with result_
    return result_
Can this be written shorter?
In other words, I don't want to change ad.
You can use [:], but for list containing lists(or other mutable objects) you should go for copy.deepcopy():
lis[:] is equivalent to list(lis) or copy.copy(lis), and returns a shallow copy of the list.
In [33]: def func(lis):
   ....:     print id(lis)
   ....:
In [34]: lis = [1,2,3]
In [35]: id(lis)
Out[35]: 158354604
In [36]: func(lis[:])
158065836
When to use deepcopy():
In [41]: lis = [range(3), list('abc')]
In [42]: id(lis)
Out[42]: 158066124
In [44]: lis1=lis[:]
In [45]: id(lis1)
Out[45]: 158499244 # different than lis, but the inner lists are still same
In [46]: [id(x) for x in lis1] == [id(y) for y in lis]
Out[46]: True
In [47]: lis2 = copy.deepcopy(lis)
In [48]: [id(x) for x in lis2] == [id(y) for y in lis]
Out[48]: False
This might be an interesting use case for a decorator function. Something like this:
import copy

def pass_by_value(f):
    def _f(*args, **kwargs):
        args_copied = copy.deepcopy(args)
        kwargs_copied = copy.deepcopy(kwargs)
        return f(*args_copied, **kwargs_copied)
    return _f
pass_by_value takes a function f as input and creates a new function _f that deep-copies all its parameters and then passes them to the original function f.
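A Python 3 variant of the same idea, as a sketch: functools.wraps is an addition that is not part of the decorator above, used here so the wrapped function keeps its name and docstring. The sample function returns its modified arguments instead of printing them, purely so the effect can be checked.

```python
import copy
import functools

def pass_by_value(f):
    @functools.wraps(f)  # keep f's __name__ and docstring on the wrapper
    def _f(*args, **kwargs):
        # Deep-copy every positional and keyword argument before the call.
        return f(*copy.deepcopy(args), **copy.deepcopy(kwargs))
    return _f

@pass_by_value
def add_at_rank(ad, rank):
    ad.append(4)
    rank[3] = "bar"
    return ad, rank

a, r = [1, 2, 3], {1: "foo"}
inside_ad, inside_rank = add_at_rank(a, r)

print("inside function", inside_ad, inside_rank)
print("outside function", a, r)
```

Deep-copying every argument on every call can be expensive for large inputs, so this trades speed for safety.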
Usage:
@pass_by_value
def add_at_rank(ad, rank):
    ad.append(4)
    rank[3] = "bar"
    print "inside function", ad, rank
a, r = [1,2,3], {1: "foo"}
add_at_rank(a, r)
print "outside function", a, r
Output:
"inside function [1, 2, 3, 4] {1: 'foo', 3: 'bar'}"
"outside function [1, 2, 3] {1: 'foo'}"
A shallow copy is usually good enough, and potentially much faster than a deep copy.
You can take advantage of this if the modifications you are making to result_ are not mutating the items/attributes it contains.
For a simple example if you have a chessboard
board = [[' ']*8 for x in range(8)]
You could make a shallow copy
board2 = copy.copy(board)
It's safe to append/insert/pop/delete/replace items in board2, but not to modify the lists it contains. If you want to modify one of the contained lists, you must create a new list and replace the existing one:
row = list(board2[2])
row[3] = 'K'
board2[2] = row
It's a little more work, but a lot more efficient in time and storage
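Put together, the chessboard steps above can be run as one sketch that also checks which rows are shared afterwards:

```python
import copy

board = [[' '] * 8 for _ in range(8)]
board2 = copy.copy(board)   # new outer list, same eight row objects

row = list(board2[2])       # copy only the row that is about to change
row[3] = 'K'
board2[2] = row             # swap the new row into the copy

print(repr(board[2][3]))        # ' '  (original row untouched)
print(repr(board2[2][3]))       # 'K'
print(board2[0] is board[0])    # True: unchanged rows are still shared
print(board2[2] is board[2])    # False: only the edited row was duplicated
```

Only one row of eight is duplicated, which is where the saving in time and storage comes from.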
If ad is a list, you can simply call your function as add_at_rank(ad + [], rank).
This creates a new list instance every time you call the function, with a value equal to ad. Like ad[:], it is a shallow copy.
>>> ad == ad + []
True
>>> ad is ad + []
False
Pure pythonic :)
Simplified version of my code:
sequence = [['WT_1', 'AAAAAAAA'], ['WT_2', 'BBBBBBB']]
def speciate(sequence):
    lineage_1 = []
    lineage_2 = []
    for i in sequence:
        lineage_1.append(i)
    for k in sequence:
        lineage_2.append(k)
    lineage_1[0][0] = 'L1_A'
    lineage_1[1][0] = 'L1_B'
    lineage_2[0][0] = 'L2_A'
    lineage_2[1][0] = 'L2_B'
    print lineage_1
    print lineage_2
speciate(sequence)
outputs:
[['L2_A', 'AAAAAAAA'], ['L2_B','BBBBBBB']]
[['L2_A','AAAAAAAA'], ['L2_B','BBBBBBB']]
when I would expect to get this:
[['L1_A', 'AAAAAAAA'], ['L1_B','BBBBBBB']]
[['L2_A','AAAAAAAA'], ['L2_B','BBBBBBB']]
Does anybody know what the problem is?
You have to make a deep copy (or shallow copy suffices in this case) when you append. Else lineage_1[0][0] and lineage_2[0][0] reference the same object.
from copy import deepcopy
for i in sequence:
    lineage_1.append(deepcopy(i))
for k in sequence:
    lineage_2.append(deepcopy(k))
See also: http://docs.python.org/library/copy.html
You are appending list objects in your for-loops -- the same list objects (sequence[0] and sequence[1]) go into both lineage lists.
So when you modify the first element of that list:
lineage_1[0][0] = 'L1_A'
lineage_1[1][0] = 'L1_B'
lineage_2[0][0] = 'L2_A'
lineage_2[1][0] = 'L2_B'
you're seeing it show up as modified in both the lineage_X lists, because both contain references to the same lists that are in sequence.
Do something like:
import copy
for i in sequence:
    lineage_1.append(copy.copy(i))
for k in sequence:
    lineage_2.append(copy.copy(k))
this will make copies of the sublists of sequence so that you don't have this aliasing issue. (If the real code has deeper nesting, you can use copy.deepcopy instead of copy.copy.)
Consider this simple example:
>>> aa = [1, 2, 3]
>>> bb = aa
>>> bb[0] = 999
>>> aa
[999, 2, 3]
What happened here?
"Names" like aa and bb simply reference the list, the same list. Hence when you change the list through bb, aa sees it as well. Using id shows this in action:
>>> id(aa)
32343984
>>> id(bb)
32343984
Now, this is exactly what happens in your code:
for i in sequence:
    lineage_1.append(i)
for k in sequence:
    lineage_2.append(k)
You append references to the same lists to lineage_1 and lineage_2.
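One way to avoid the shared references, sketched in Python 3: copy each inner list while building the lineages. Returning the two lists instead of printing them inside the function is a change made here so the result can be inspected; the rest follows the question's code.

```python
sequence = [['WT_1', 'AAAAAAAA'], ['WT_2', 'BBBBBBB']]

def speciate(sequence):
    # list(item) makes a new inner list per entry; a shallow copy is enough
    # here because the inner lists only hold strings.
    lineage_1 = [list(item) for item in sequence]
    lineage_2 = [list(item) for item in sequence]
    lineage_1[0][0] = 'L1_A'
    lineage_1[1][0] = 'L1_B'
    lineage_2[0][0] = 'L2_A'
    lineage_2[1][0] = 'L2_B'
    return lineage_1, lineage_2

lineage_1, lineage_2 = speciate(sequence)
print(lineage_1)  # [['L1_A', 'AAAAAAAA'], ['L1_B', 'BBBBBBB']]
print(lineage_2)  # [['L2_A', 'AAAAAAAA'], ['L2_B', 'BBBBBBB']]
print(sequence)   # [['WT_1', 'AAAAAAAA'], ['WT_2', 'BBBBBBB']]
```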
I occasionally see the list slice syntax used in Python code like this:
newList = oldList[:]
Surely this is just the same as:
newList = oldList
Or am I missing something?
[:] shallow-copies the list, making a copy of the list structure containing references to the original list members. This means that operations on the copy do not affect the structure of the original. However, if you do something to the list members, both lists still refer to them, so the updates will show up if the members are accessed through the original.
A Deep Copy would make copies of all the list members as well.
The code snippet below shows a shallow copy in action.
# ================================================================
# === ShallowCopy.py =============================================
# ================================================================
#
class Foo:
    def __init__(self, data):
        self._data = data
aa = Foo ('aaa')
bb = Foo ('bbb')
# The initial list has two elements containing 'aaa' and 'bbb'
OldList = [aa,bb]
print OldList[0]._data
# The shallow copy makes a new list pointing to the old elements
NewList = OldList[:]
print NewList[0]._data
# Updating one of the elements through the new list sees the
# change reflected when you access that element through the
# old list.
NewList[0]._data = 'xxx'
print OldList[0]._data
# Updating the new list to point to something new is not reflected
# in the old list.
NewList[0] = Foo ('ccc')
print NewList[0]._data
print OldList[0]._data
Running it in a Python shell gives the following transcript. We can see the
new list being made with references to the old objects. One of the objects can
have its state updated through the new list, and the update can be seen when
the object is accessed through the old list. Finally, changing a reference in
the new list can be seen not to affect the old list, as the new list is now
referring to a different object.
>>> # ================================================================
... # === ShallowCopy.py =============================================
... # ================================================================
... #
... class Foo:
...     def __init__(self, data):
...         self._data = data
...
>>> aa = Foo ('aaa')
>>> bb = Foo ('bbb')
>>>
>>> # The initial list has two elements containing 'aaa' and 'bbb'
... OldList = [aa,bb]
>>> print OldList[0]._data
aaa
>>>
>>> # The shallow copy makes a new list pointing to the old elements
... NewList = OldList[:]
>>> print NewList[0]._data
aaa
>>>
>>> # Updating one of the elements through the new list sees the
... # change reflected when you access that element through the
... # old list.
... NewList[0]._data = 'xxx'
>>> print OldList[0]._data
xxx
>>>
>>> # Updating the new list to point to something new is not reflected
... # in the old list.
... NewList[0] = Foo ('ccc')
>>> print NewList[0]._data
ccc
>>> print OldList[0]._data
xxx
Like NXC said, Python variable names actually point to an object, and not a specific spot in memory.
newList = oldList creates two variables that point to the same object, so changing oldList also changes newList.
However, when you do newList = oldList[:], it "slices" the list and creates a new list. The defaults for [:] are 0 and the end of the list, so it copies everything. It therefore creates a new list with the same elements as the first one, and items can be added, removed, or replaced in either list without changing the other. The elements themselves are still shared, so mutating a nested mutable element is visible through both lists.
As it has already been answered, I'll simply add a simple demonstration:
>>> a = [1, 2, 3, 4]
>>> b = a
>>> c = a[:]
>>> b[2] = 10
>>> c[3] = 20
>>> a
[1, 2, 10, 4]
>>> b
[1, 2, 10, 4]
>>> c
[1, 2, 3, 20]
Never think that 'a = b' in Python means 'copy b to a'. If there are variables on both sides, you can't really know that. Instead, think of it as 'give b the additional name a'.
If b is an immutable object (like a number, a string, or a tuple of immutables), then the effect is the same as if you had a copy. No copy is actually made; both names refer to the same object, but because an immutable (which maybe should have been called read only, unchangeable or WORM) can never change, you cannot tell the difference.
If b is mutable, you always have to do something extra to be sure you have a true copy. Always. With flat lists, it's as simple as a slice: a = b[:]. For nested lists a slice copies only the outer list, and you need copy.deepcopy.
Mutability is also the reason that this:
def myfunction(mylist=[]):
    pass
... doesn't quite do what you think it does.
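A short sketch of what goes wrong with a mutable default argument: the default list is created once, when the function is defined, and then shared by every call that omits the argument. The function names append_shared and append_fresh are hypothetical.

```python
def append_shared(item, mylist=[]):
    # The same default list object is reused on every call.
    mylist.append(item)
    return mylist

def append_fresh(item, mylist=None):
    # Common idiom: use None as the default and create the list per call.
    if mylist is None:
        mylist = []
    mylist.append(item)
    return mylist

first = append_shared(1)
second = append_shared(2)
print(second)            # [1, 2]
print(first is second)   # True: both calls returned the one shared list

print(append_fresh(1))   # [1]
print(append_fresh(2))   # [2]
```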
If you're from a C-background: what's left of the '=' is a pointer, always. All variables are pointers, always. If you put variables in a list: a = [b, c], you've put pointers to the values pointed to by b and c in a list pointed to by a. If you then set a[0] = d, the pointer in position 0 is now pointing to whatever d points to.
See also the copy-module: http://docs.python.org/library/copy.html
Shallow Copy: (creates a new list that holds references to the same elements)
a = ['one','two','three']
b = a[:]
b[1] = 2
print id(a), a #Output: 1077248300 ['one', 'two', 'three']
print id(b), b #Output: 1077248908 ['one', 2, 'three']
Assignment, not a deep copy: (copies only the object reference, so both names refer to the same list)
a = ['one','two','three']
b = a
b[1] = 2
print id(a), a #Output: 1077248300 ['one', 2, 'three']
print id(b), b #Output: 1077248300 ['one', 2, 'three']
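For comparison, a Python 3 sketch that puts plain assignment, a shallow copy, and a real deep copy (copy.deepcopy) side by side on a nested list. The variable names are illustrative only.

```python
import copy

a = [['one'], ['two']]
alias = a                 # assignment: a second name for the same list
shallow = a[:]            # new outer list, shared inner lists
deep = copy.deepcopy(a)   # new outer list and new inner lists

a[0].append('x')          # mutate an inner list
a.append(['three'])       # mutate the outer list

print(alias)    # [['one', 'x'], ['two'], ['three']]
print(shallow)  # [['one', 'x'], ['two']]
print(deep)     # [['one'], ['two']]
```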