Preventing reference re-use during deepcopy - python

Consider the following example:
from copy import deepcopy
item = [0]
orig = [item, item]
copy = deepcopy(orig)
orig[0][0] = 1
print(f"{orig=} {copy=}")
copy[0][0] = 2
print(f"{orig=} {copy=}")
The first print outputs what I would expect because the same reference is duplicated in the list.
orig=[[1], [1]] copy=[[0], [0]]
However, the second print surprised me.
orig=[[1], [1]] copy=[[2], [2]]
I would have expected the deepcopy to end up with two independent inner lists in the copy. Instead it preserves the original's structure of a single list referenced twice. I'm guessing that's alluded to in this part of the docs:
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
I see that the deepcopy function has a memo argument. Is there anything interesting that could be done with this argument to prevent the duplicated reference, such that the final output would become:
orig=[[1], [1]] copy=[[2], [0]]

If your whole point is to copy data that could come from JSON, i.e. list, dict, string, numbers, bool, then you can trivially implement your own function:
def copy_jsonlike(data):
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        return data
It has the added bonus of probably being faster than copy.deepcopy.
Or, your original solution, json.loads(json.dumps(data)) isn't a bad idea either.
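For instance, applied to the snippet from the question (a quick sketch using the copy_jsonlike helper defined above), every occurrence of item becomes an independent list, so you get the output you were after:
item = [0]
orig = [item, item]
copy = copy_jsonlike(orig)
copy[0][0] = 2
print(f"{orig=} {copy=}")  # orig=[[0], [0]] copy=[[2], [0]]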

Huh, seems like this was easier to do than I thought, but I'm 90% sure it's evil. If someone posts a better answer or explains why this is totally awful, I'll remove it.
Implement a dict that only pretends to set a value. deepcopy records each object it copies in the memo (keyed by the original's id) and re-uses that entry whenever it meets the same object again, so a memo that silently drops every write forces a fresh copy each time. Then the example returns separate copies of the same reference.
class NoMemo(dict):
    def __setitem__(self, key, value):
        return value
...
copy = deepcopy(orig, memo=NoMemo())
...
Prints:
orig=[[1], [1]] copy=[[0], [0]]
orig=[[1], [1]] copy=[[2], [0]]
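A quick way to confirm the duplicated reference is gone (a small check using the same names as the question):
print(copy[0] is copy[1])  # False with NoMemo, True with a plain deepcopy
print(orig[0] is orig[1])  # True -- the original still shares one inner list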

Related

Python using type() to deepcopy

I was working on a leetcode question and ran into a problem where I'd have to deepcopy a list. I found a solution that used type() like such:
orignallist=[1,2,3]
deepcopylist=type(orignallist)(orignallist)
Sure enough, it works and deepcopylist is a deepcopy but how on earth is this working? Python's type() documentation doesn't make any mention of this and I also don't understand how parentheses work with the second (orignallist) added in.
First off, it's not a deep copy. You've made a shallow copy, exactly equivalent to what list(orignallist) would produce (it doesn't matter, because all the values contained in your example list are immutable types, specifically int, but if they weren't, the distinction between deep and shallow copies would be important).
Second, all type(orignallist) is doing is extracting the class that the object bound to orignallist is an instance of, in this case, list. It's determined at runtime, so if orignallist were actually a set, it would get set, but right here it's getting list. After that, there's nothing special going on: it's just constructing a new instance of that class, passing orignallist as the argument to the constructor. If you want to see what it's doing, you can do it piecemeal:
>>> orignallist=[1,2,3]
>>> type_of_orignallist = type(orignallist)
>>> type_of_orignallist is list # It's just another alias to list
True
>>> type_of_orignallist(orignallist) # Since it's an alias of list, calling it makes a new list
[1, 2, 3]
In any event, the correct way to deep copy any object in Python is the copy.deepcopy routine:
>>> import copy
>>> lst_of_lst = [[]] # List with mutable element to demonstrate difference between shallow and deep copy
>>> shallow_copy = type(lst_of_lst)(lst_of_lst) # Or lst_of_lst[:], or lst_of_lst.copy(), or list(lst_of_lst)
>>> deep_copy = copy.deepcopy(lst_of_lst)
>>> lst_of_lst[0].append(1)
>>> lst_of_lst is shallow_copy # We copied the outer list structure
False
>>> lst_of_lst
[[1]]
>>> shallow_copy # Oops, shallow, not deep
[[1]]
>>> lst_of_lst[0] is shallow_copy[0] # Because we didn't copy the inner list
True
>>> deep_copy # Does what it says on the tin
[[]]
>>> lst_of_lst is deep_copy
False
>>> lst_of_lst[0] is deep_copy[0] # Yep, it recursively deepcopied so the inner list differs
False

Initialize a dictionary where each item is a list of empty unique lists

I'm terribly sorry if this was already asked, but while I could find something similar I didn't find my specific issue. I have Python 3.7.4 - 64 bit.
Basically I want to initialize a dictionary where each item is a list of empty lists. The problem is that, the way I'm doing it now, every single empty sub-list across the different items' lists ends up being the same object, even though I am assigning a copy of the list to each item.
As you can see in the code below, each sub-list in empty_list_of_lists is a different object. Then I assign the items to the dictionary as a copy of empty_list_of_lists. When I call my_dict['a'] is my_dict['b'] I get an expected False, but when I call my_dict['a'][0] is my_dict['b'][0] I get True, which puzzles me because empty_list_of_lists[0] is empty_list_of_lists[1] returns False and I don't get the logic. How should I go about that?
Here is my code:
empty_list_of_lists = [[] for i in range(5)]
print(empty_list_of_lists[0] is empty_list_of_lists[1]) # returns False --> expected
dict1 = {'a': empty_list_of_lists.copy(), 'b': empty_list_of_lists.copy()}
print(dict1['a'] is dict1['b']) # returns False --> expected
print(dict1['a'][0] is dict1['b'][0]) # returns True --> What?
you can use:
dict1 = {'a': [[] for _ in range(5)], 'b': [[] for _ in range(5)]}
or you can use copy.deepcopy
import copy
dict1 = {'a': copy.deepcopy(empty_list_of_lists), 'b': copy.deepcopy(empty_list_of_lists)}
you can read more about shallow and deep copy operations here
in your code you are using a shallow copy, but what you need is a deep copy; from the above docs:
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
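You can see the difference directly on your own variables (a small sketch reusing the names from the question):
import copy
empty_list_of_lists = [[] for i in range(5)]
shallow = empty_list_of_lists.copy()         # new outer list, same inner lists
deep = copy.deepcopy(empty_list_of_lists)    # new outer list and new inner lists
print(shallow[0] is empty_list_of_lists[0])  # True  -- shared inner list
print(deep[0] is empty_list_of_lists[0])     # False -- independent inner list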

How do I change an element in a list, and keep a copy of the original list?

I've searched around and tried a lot of stuff but I can't get this to work. I think the problem is something to do with how Python list names point to the list, rather than being the actual list, but I still can't figure it out. The situation is this (it's a list of dictionaries):
list_original = [dictionary1, dictionary2, dictionary3]
dictionary2_modified = dictionarymodifier(dictionary2) # some function that modifies the chosen element
list_modified = [i for i in list_original] # makes copy of original list
for i, n in enumerate(list_original):
    if i == 1:
        list_modified[1] = dictionary2_modified # replaces element with modified version
return list_original, list_modified
And many similar things, but I always either get two of the original list or two of the new list! I add that I'm using python 2.4 and don't have a choice in that.
Many thanks for any help
Mutable vs. Immutable
You need to know the difference between mutable and immutable objects. Both dictionaries and lists in Python are mutable, which means that if two names refer to the same object and you modify it through one of them, the change is visible through the other.
In addition, a variable of a mutable type (like list or dict) can contain immutable elements (e.g. str), and the other way around: a variable of an immutable type (e.g. tuple) can contain mutable elements (such as list or dict).
Example for mutability
So, this shows the mutability using the example of list:
>>> a = [1, 2, 3, 4]
>>> b = a
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> a[2] = 'x'
>>> a
[1, 2, 'x', 4]
>>> b
[1, 2, 'x', 4]
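The same sharing happens with a dict (a small sketch to mirror the list example above):
>>> d1 = {'x': 1}
>>> d2 = d1
>>> d1['x'] = 99
>>> d2
{'x': 99}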
How to obtain a copy of list or dict
To obtain a copy of list, you simply can do this instead:
new_list = old_list[:] # the slicing at the end just takes the whole list
In case of dict this is generally sufficient:
new_dict = old_dict.copy()
Nested lists / dicts
However, while lists / dicts that are flat or contain only immutable elements can be copied the way I showed, to obtain a copy of more complex nested data structures you need to do something more...
In such case very helpful may be the copy module with its deepcopy function. Documentation of copy module says more about its purpose:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other. This module provides generic shallow and deep copy operations (explained below).
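Applied to your situation, a minimal sketch (the placeholder dicts here are mine, standing in for your real dictionaries):
import copy

dictionary1, dictionary2, dictionary3 = {'a': 1}, {'b': 2}, {'c': 3}  # hypothetical stand-ins
list_original = [dictionary1, dictionary2, dictionary3]
list_modified = copy.deepcopy(list_original)  # independent copies of every dict
list_modified[1]['b'] = 99                    # modify only the copy
print(list_original[1])                       # {'b': 2} -- original untouched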
Is your dictionarymodifier actually mutating dictionary2 in place? If so, the way you build your list is irrelevant.
Simply using list_modified = list(list_original) works fine to create a shallow copy of the list, which you can then modify to your heart's content, but only if you don't modify the items in the original list (which you can't if they're immutable built-in things like numbers or strings, so beginners often mistake this for a deep copy).
If you really need to copy the list, you can use copy.deepcopy to do so.
You need to create a copy of the list.
copy=list(original)
original[0] = None

"Deep copy" nested list without using the deepcopy function

I am trying to copy the nested list a, but do not know how to do it without using the copy.deepcopy function.
a = [[1, 2], [3, 4]]
I used:
b = a[:]
and
b = a[:][:]
But they both turn out to be shallow copies.
Any hints?
My entry to simulate copy.deepcopy:
def deepcopy(obj):
    if isinstance(obj, dict):
        return {deepcopy(key): deepcopy(value) for key, value in obj.items()}
    if hasattr(obj, '__iter__'):
        return type(obj)(deepcopy(item) for item in obj)
    return obj
The strategy: iterate across each element of the passed-in object, recursively descending into elements that are also iterable and making new objects of their same type.
I make no claim whatsoever that this is comprehensive or without fault [1] (don't pass in an object that references itself!), but it should get you started.
[1] Truly! The point here is to demonstrate, not cover every possible eventuality. The source to copy.deepcopy is 50 lines long and it doesn't handle everything.
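For example, a quick check of the sketch above with the nested list from the question:
a = [[1, 2], [3, 4]]
b = deepcopy(a)  # the home-grown version above, not copy.deepcopy
b[0][0] = 99
print(a)         # [[1, 2], [3, 4]] -- unchanged
print(b)         # [[99, 2], [3, 4]]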
You can use a list comprehension if there's but a single level of nesting.
b = [x[:] for x in a]
This is a complete cheat - but will work for lists of "primitives" - lists, dicts, strings, numbers:
def cheat_copy(nested_content):
    return eval(repr(nested_content))
There are strong security implications to consider for this - and it will not be particularly fast. Using json.dumps and loads will be more secure.
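A sketch of that safer alternative, assuming the data only holds JSON-serializable types (the helper name json_copy is mine):
import json

def json_copy(nested_content):
    # round-trip through JSON text; every container comes back as a new object
    return json.loads(json.dumps(nested_content))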
I found a way to do it using recursion.
def deep_copy(nested_content):
    if not isinstance(nested_content, list):
        return nested_content
    else:
        holder = []
        for sub_content in nested_content:
            holder.append(deep_copy(sub_content))
        return holder
For the recursive version, you have to keep track of a secondary list and return each time.

python list.pop() modifies original list (not just copy)

Situation: After making a copy of the original list I use pop to modify said copy. As it turns out, the original list gets affected by the change.
Even after checking that the original list and the copy are not the same object, popping an element of the copy will pop the same element in the original.
See below for an example of the script. Thanks in advance for your help.
l = [['1412898', 'Jack', 'headache med', '8ET-500'],
     ['1423859', 'Sonny', 'prostate med', '8ET-800'],
     ['1413836', 'Paco', 'headache med', '8ET-500']]

class App(object):
    def __init__(self, info):
        self.fp_rows = info

    def sortbyauditor(self):
        self.fp_rows_copy = self.fp_rows[:]
        print self.fp_rows is self.fp_rows_copy
        for i in self.fp_rows_copy:
            i.pop(1)
        print self.fp_rows_copy
        print self.fp_rows

app = App(l)
app.sortbyauditor()
some_list[:] is only a shallow copy. You seem to need a deep copy
from copy import deepcopy
copy = deepcopy(some_list)
Edit
To understand why "one objects affects the other" take a look at the id of each list:
original = [[1, 2], [3, 4]]
shallow = original[:]
deep = deepcopy(original)
print([id(l) for l in original])
# [2122937089096, 2122937087880]
print([id(l) for l in shallow])
# [2122937089096, 2122937087880]
print([id(l) for l in deep])
# [2122937088968, 2122937089672]
You can see that the ids of the lists in original are the same as the ids in shallow. That means the nested lists are the exact same objects. When you modify one nested list the changes are also in the other list.
The ids for deep are different. Those are genuinely new lists, so changing them does not affect the original list.
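Putting that back into your class, a minimal sketch of the fixed method (only the copy line changes):
from copy import deepcopy

def sortbyauditor(self):
    # deep copy: the nested rows are now independent of self.fp_rows
    self.fp_rows_copy = deepcopy(self.fp_rows)
    for i in self.fp_rows_copy:
        i.pop(1)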
