How to clone or copy a set in Python? - python

For copying a list: shallow_copy_of_list = old_list[:].
For copying a dict: shallow_copy_of_dict = dict(old_dict).
But for a set, I was worried that a similar thing wouldn't work, because saying new_set = set(old_set) would give a set of a set?
But it does work. So I'm posting the question and answer here for reference. In case anyone else has the same confusion.

Both of these will give a duplicate of a set:
shallow_copy_of_set = set(old_set)
Or:
shallow_copy_of_set = old_set.copy() #Which is more readable.
The reason that the first way above doesn't give a set of a set, is that the proper syntax for that would be set([old_set]). Which wouldn't work, because sets can't be elements in other sets, because they are unhashable by virtue of being mutable. However, this isn't true for frozensets, so e.g. frozenset(frozenset(frozenset([1,2,3]))) == frozenset([1, 2, 3]).
So a rule of thumb for replicating any of instance of the basic data structures in Python (lists, dict, set, frozenset, string):
a2 = list(a) #a is a list
b2 = set(b) #b is a set
c2 = dict(c) #c is a dict
d2 = frozenset(d) #d is a frozenset
e2 = str(e) #e is a string
#All of the above give a (shallow) copy.
So, if x is either of those types, then
shallow_copy_of_x = type(x)(x) #Highly unreadable! But economical.
Note that only dict, set and frozenset have the built-in copy() method. It would probably be a good idea that lists and strings had a copy() method too, for uniformity and readability. But they don't, at least in Python 2.7.3 which I'm testing with.

Besides the type(x)(x) hack, you can import copy module to make either shallow copy or deep copy:
In [29]: d={1: [2,3]}
In [30]: sd=copy.copy(d)
...: sd[1][0]=321
...: print d
{1: [321, 3]}
In [31]: dd=copy.deepcopy(d)
...: dd[1][0]=987
...: print dd, d
{1: [987, 3]} {1: [321, 3]}
From the docstring:
Definition: copy.copy(x)
Docstring:
Shallow copy operation on arbitrary Python objects.

Related

Deep copy of dictionary created with comprehension not working

Could you please help me understand why does deepcopy not work for all the elements in the dictionary from the example below?
import copy
a = [{'id':1, 'list':[1,2,3], 'num':3}, {'id':2,' list':[4,5,6], 'num':65}]
b = {i['id']:copy.deepcopy(i) for i in a}
In [1]: print(id(a) == id(b))
Out[1]: False
In [2]: print(id(a[0]) == id(b[1]))
Out[2]: False
In [3]: print(id(a[0]['list']) == id(b[1]['list']))
Out[3]: False
In [4]: print(id(a[0]['num']) == id(b[1]['num']))
Out[4]: True
In particular, the values associated to the 'num' key are the same while those for the 'list' key seem to have been copied successfully with deepcopy. I'm guessing it has to do with the data type of the value being stored, could someone please point me in the right direction?
Thanks!
This has nothing to do with the dict comprehension but, as you suggested, with the data type:
>>> import copy
>>> x = 1
>>> copy.deepcopy(x) is x
True
>>> x = [1]
>>> copy.deepcopy(x) is x
False
The distinction made by #mengban is correct: you have mutable and immutable objects (that depends on the type of the object). Typical examples of immutable objects are: integers (0, 1, 2...), floats (3.14159), but also strings ("foo") and tuples ((1, 3)). Typical examples of mutable objects are: lists ([1, 2, 3]) or dictionaries ({'a': 1, 'b': 2}).
Basically, the deepcopy of an immutable object returns the object itself: no actual copy is performed (there's a little trick with tuples: I'll explain it later):
>>> x = "foo"
>>> copy.deepcopy(x) is x
True
>>> x = (1, 2)
>>> copy.deepcopy(x) is x
True
And deepcopy of mutable objects creates a new instance of the object having the same elements.
This the right behavior because when you have acquired a deep copy o2 of an object o, the contract is that this is your copy. No operation performed on o should be able to modify o2. If o is immutable, this is guaranteed for free. But if o is mutable, then you need to create a new instance, having the same content (this implies a recursive deep copy).
Now what's the matter with tuples?
>>> o = ([1], [2])
>>> copy.deepcopy(o) is o
False
Even if the tuple itself is immutable, maybe one of its elements could be mutable. If I give you a reference o2 to the value of o (ie o2 = o), you can write o2[0].append(10) and my object o is modified. Hence the deepcopy function looks for mutable objects in the tuple, and decides whether an actual copy is necessary or not.
Bonus: have a look at the deepcopy implementation. The _deepcopy_dispatch maps types to the actual copier:
_deepcopy_dispatch = d = {}
...
d[int] = _deepcopy_atomic
d[float] = _deepcopy_atomic
d[bool] = _deepcopy_atomic
...
d[str] = _deepcopy_atomic
...
d[list] = _deepcopy_list
...
d[tuple] = _deepcopy_tuple
...
d[dict] = _deepcopy_dict
...
While _deepcopy_atomic simply returns the value, _deepcopy_list, _deepcopy_tuple, _deepcopy_dict... perform usually an in-depth copy.
You can check the _deepcopy_tuple function to understand the process. Basically, deep copy every element until an actual copy is made. If a copy was made, create a new tuple of deep copies. Else return the initial tuple.
If you don't want a reference in your dictionary in your new dictionary you can do the following:
new_dictionary = json.loads(json.dumps(old_dictionary))
There is a big difference between mutable and immutable types in python.
In general, variable types in Python include lists, dictionaries, and collections. Immutable types include strings, int, float, and tuples.
Re-assigning a variable of an immutable type is actually re-creating an object of an immutable type and re-pointing the original variable to the newly created object (a new memory address is opened up), if no other variables refer to the original object (That is, the reference count is 0), the original object will be recycled.

python dictionary: how does appending of items work? [duplicate]

This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

python initialize nested dictionary with keys and ambiguous behavior of dict.fromkeys class method [duplicate]

This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

evaluation of list comprehensions in python

In trying to use a list comprehension to make a list given a conditional, I see the following:
In [1]: mydicts = [{'foo':'val1'},{'foo':''}]
In [2]: mylist = [d for d in mydicts if d['foo']]
In [3]: mylist
Out[3]: [{'foo': 'val1'}]
In [4]: mydicts[1]['foo'] = 'val2'
In [5]: mydicts
Out[5]: [{'foo': 'val1'}, {'foo': 'val2'}]
In [6]: mylist
Out[6]: [{'foo': 'val1'}]
I've been reading the docs to try and understand this but have come up with nothing so far, so I'll ask my question here: why is it that mylist never includes {'foo': 'val2'} even though the reference in the list comprehension points to mydict, which by In [6] contains {'foo': 'val2'}? Is this because Python eagerly evaluates list comprehensions? Or is the lazy/eager dichotomy totally irrelevant to this?
There's no lazy evaluation of lists in Python. List comprehensions simply create a new list. If you want "lazy" evaluation, use a generator expression instead.
my_generator_expression = (d for d in mydicts if d['foo']) # note parentheses
mydicts[1]['foo'] = 'val2'
print(my_generator_expression) # >>> <generator object <genexpr> at 0x00000000>
for d in my_generator_expression:
print(d) # >>> {'foo': 'val1'}
# >>> {'foo': 'val2'}
Note that generators differ from lists in several important ways. Perhaps the most notable is that once you iterate over them, they are exhausted, so they're best to use if you only need the data they contain once.
I think you're a bit confused about what list comprehensions do.
When you do this:
[d for d in mydicts if d['foo']]
That evaluates to a new list. So, when you do this:
mylist = [d for d in mydicts if d['foo']]
You're assigning that list as the value of mylist. You can see this very easily:
assert type(mylist) == list
You're not assigning "a list comprehension" that gets reevaluated every time to mylist. There are no magic values in Python that get reevaluated every time. (You can fake them by, e.g., creating a class with a #property, but that's not really an exception; it's the expression myobj.myprop that's being reevaluated, not myprop itself.)
In fact, mylist = [d for d in mydicts if d['foo']] is basically the same mylist = [1, 2, 3].* In both cases, you're creating a new list, and assigning it to mylist. You wouldn't expect the second one to re-evaluate [1, 2, 3] each time (otherwise, doing mylist[0] = 0 wouldn't do much good, because as soon as you try to view mylist you'd be getting a new, pristine list!). The same is true here.
* In Python 3.x, they aren't just basically the same; they're both just different types of list displays. In 2.x, it's a bit more murky, and they just happen to both evaluate to new list objects.
mylist contains the result of a previous list comprehension evaluation, it won't magically updated just because you update a variable that was used for its computation.

Dictionary creation with fromkeys and mutable objects. A surprise [duplicate]

This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

Categories

Resources