This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
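The "pass in a function" alternative mentioned above can be sketched as a small helper. The name fromkeys_with_factory is hypothetical (nothing like it exists in the standard library); it only illustrates calling a factory once per key instead of reusing one object:

```python
def fromkeys_with_factory(keys, factory):
    """Hypothetical helper: build a dict, calling factory() once per key."""
    return {key: factory() for key in keys}


shared = dict.fromkeys(range(2), [])              # one list, referenced twice
separate = fromkeys_with_factory(range(2), list)  # a new list for each key

shared[0].append(1)
separate[0].append(1)

print(shared)    # {0: [1], 1: [1]}
print(separate)  # {0: [1], 1: []}
```

Under the hood this is just a dict comprehension, which is why the comprehension is usually the simpler choice.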
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to evaluate an arbitrary expression for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment d[k] = [] in a for loop. Each iteration creates a new list and assigns it to a key.
Use the form of the dict constructor suggested by @Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator that again produces a new list for each key-value pair as dict() consumes it.
Use a collections.defaultdict instead of a regular dict:
import collections
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can therefore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
    d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
    d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.
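To make the "more controlled" point about setdefault concrete, here is a small sketch (the key names are made up for illustration). A plain lookup on a defaultdict inserts the missing key, while a regular dict only gains keys where you explicitly call setdefault:

```python
import collections

dd = collections.defaultdict(list)
dd["seen"].append(42)
_ = dd["typo"]            # a mere lookup inserts the missing key
print(sorted(dd))         # ['seen', 'typo']

plain = {}
plain.setdefault("seen", []).append(42)
print(plain.get("typo"))  # None, and nothing is inserted
print(sorted(plain))      # ['seen']
```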
Given a dictionary my_dict, we apply list(my_dict) and my_dict.keys() in the following code
my_dict = {'1': 1, '2': 2, '3': 3, '4': 4}
list_keys = list(my_dict)
view_keys = my_dict.keys()
list_from_view = list(my_dict.keys())
print("list_keys : ", list_keys)
print("view_keys : ", view_keys)
print("list_from_view", list_from_view)
Results:
list_keys : ['1', '2', '3', '4']
view_keys : dict_keys(['1', '2', '3', '4'])
list_from_view ['1', '2', '3', '4']
What are the differences between using list(my_dict), my_dict.keys(), and list(my_dict.keys()), especially:
list(my_dict) vs my_dict.keys()
list(my_dict) vs list(my_dict.keys()) (what is the best (fast) way to get a list of keys)
Thanks.
A Python list is an iterable, but not all iterables are lists...
Let us examine your expressions:
1. list_keys = list(my_dict): here you use my_dict as an iterable over the keys and build a new list from it. Long story short, you have copied the keys into a list. From that point on, you can change the list or the initial dict without affecting the other object.
2. view_keys = my_dict.keys(): here you get a dict_keys view on the dictionary. It is a non-modifiable iterable that can be used to access the keys of the dictionary. If you add an item to the dictionary, you will see it immediately in the view, but you can neither add a new element to view_keys nor change or remove one.
3. list_from_view = list(my_dict.keys()): here you access the view on the keys and iterate over it to build a list. In the end, it is exactly the same as the first way: you get an independent list.
Which one is best? It depends.
As I have already said, 1 and 3 give equivalent lists. 1 is probably more Pythonic because it uses the fact that a Python dictionary is implicitly an iterable over its keys. 3 is probably easier to understand for new Python users because it explicitly references an operation on keys.
2 is a completely different animal because instead of having an independent list object, you have a view on the initial dictionary that will follow its changes.
Now for the question:
what is the best (fast) way to get a list of keys
2 will not return a list of keys, because a list has append and remove methods that a view lacks
1 and 3 should be seen as equivalent from a performance point of view, and I have already mentioned readability, which is the most important quality of Python code
The question that you should have asked:
what is the more pythonic way to iterate over the list of keys of a dictionary?
Without a doubt: for key in my_dict. There is no need to convert it to a list, and a view is seldom necessary.
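As a rough illustration of how a view differs from a list, the sketch below reuses my_dict from the question. It relies only on documented behavior of dict views: they have no append method, they support set-like operations, and they follow changes to the dictionary.

```python
my_dict = {'1': 1, '2': 2, '3': 3, '4': 4}
view_keys = my_dict.keys()

print(hasattr(view_keys, 'append'))  # False: a view is not a list
print(view_keys & {'1', '9'})        # {'1'}: views support set operations
print('2' in view_keys)              # True

my_dict['5'] = 5
print(len(view_keys))                # 5: the view follows the dict
```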
The major difference in these is between two types, namely list and dict_keys.
list copies all the keys that exist at that moment and stores them in a new list object.
A dict_keys object, on the other hand, provides you with a view on the dictionary's keys.
Difference between these is shown in the following:
d = {1: 2, 3: 4}
a = d.keys()
b = list(d)
a
# dict_keys([1, 3])
b
# [1, 3]
d[5] = 6
a
# dict_keys([1, 3, 5])
b
# [1, 3]
In conclusion, a dict_keys object will show you the updates to your dict as soon as they are introduced, while the list will stay the same.
If you make changes to the list, those changes will not be reflected in the dict; on the other hand, you cannot make changes to a dict_keys object directly.
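One practical consequence of this difference, sketched here with a dictionary invented for the example: deleting entries while iterating over the live view raises a RuntimeError in CPython, whereas iterating over a list snapshot of the keys is safe.

```python
d = {1: 2, 3: 4, 5: 6}
error = None
try:
    for key in d.keys():      # live view: the dict changes size mid-loop
        if key > 1:
            del d[key]
except RuntimeError as exc:
    error = str(exc)
print(error is not None)      # True

d = {1: 2, 3: 4, 5: 6}
for key in list(d):           # snapshot: independent of later changes
    if key > 1:
        del d[key]
print(d)                      # {1: 2}
```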
I'd have assumed the results of purge and purge2 would be the same in the following code (remove duplicate elements, keeping the first occurrences and their order):
def purge(a):
    l = []
    return (l := [x for x in a if x not in l])

def purge2(a):
    d = {}
    return list(d := {x: None for x in a if x not in d})

t = [2,5,3,7,2,6,2,5,2,1,7]
print(purge(t), purge2(t))
But it looks like with dict comprehensions, unlike with lists, the value of d is built incrementally. Is this what's actually happening? Do I correctly infer the semantics of dict comprehensions from this sample code and their difference from list comprehensions? Does it work only with comprehensions, or also with other right-hand sides referring to the dictionary being assigned to (e.g. comprehensions nested inside other expressions, something involving iterators, comprehensions of types other than dict)? Where is this specified, and where can the full semantics be consulted? Or is it just an undocumented behaviour of the implementation, not to be relied upon?
There's nothing "incremental" going on here. The walrus operator doesn't assign to the variable until the dictionary comprehension completes. if x not in d is referring to the original empty dictionary, not the dictionary that you're building with the comprehension, just as the version with the list comprehension is referring to the original l.
The reason the duplicates are filtered out is simply that dictionary keys are always unique. Storing a key that is already present does not add a second entry; it only overwrites the value, and the key keeps its original position. It's the same as if you'd written:
return {2: None, 2: None}
you'll just get {2: None}.
So your function can be simplified to
def purge2(a):
    return list({x: None for x in a})
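A quick way to check this explanation, assuming Python 3.8 or later for the walrus operator: purge from the question returns its input unchanged, because l is still the empty list while the comprehension runs. The sketch also shows dict.fromkeys as another way to spell the deduplication, and what happens to the value when a key repeats.

```python
t = [2, 5, 3, 7, 2, 6, 2, 5, 2, 1, 7]


def purge(a):
    l = []
    # `l` is rebound only after the comprehension has finished,
    # so the filter always tests against the empty list.
    return (l := [x for x in a if x not in l])


print(purge(t) == t)           # True: no duplicates were removed
print(list(dict.fromkeys(t)))  # [2, 5, 3, 7, 6, 1]

# A repeated key keeps its first position but takes the last value.
print({x: i for i, x in enumerate([2, 5, 2])})  # {2: 2, 5: 1}
```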
In trying to use a list comprehension to make a list given a conditional, I see the following:
In [1]: mydicts = [{'foo':'val1'},{'foo':''}]
In [2]: mylist = [d for d in mydicts if d['foo']]
In [3]: mylist
Out[3]: [{'foo': 'val1'}]
In [4]: mydicts[1]['foo'] = 'val2'
In [5]: mydicts
Out[5]: [{'foo': 'val1'}, {'foo': 'val2'}]
In [6]: mylist
Out[6]: [{'foo': 'val1'}]
I've been reading the docs to try and understand this but have come up with nothing so far, so I'll ask my question here: why is it that mylist never includes {'foo': 'val2'} even though the reference in the list comprehension points to mydicts, which by In [6] contains {'foo': 'val2'}? Is this because Python eagerly evaluates list comprehensions? Or is the lazy/eager dichotomy totally irrelevant to this?
There's no lazy evaluation of lists in Python. List comprehensions simply create a new list. If you want "lazy" evaluation, use a generator expression instead.
my_generator_expression = (d for d in mydicts if d['foo']) # note parentheses
mydicts[1]['foo'] = 'val2'
print(my_generator_expression) # >>> <generator object <genexpr> at 0x00000000>
for d in my_generator_expression:
    print(d) # >>> {'foo': 'val1'}
             # >>> {'foo': 'val2'}
Note that generators differ from lists in several important ways. Perhaps the most notable is that once you iterate over them, they are exhausted, so they're best to use if you only need the data they contain once.
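The exhaustion point can be seen in a short sketch that reuses mydicts from the question (the names gen, first_pass and second_pass are just for illustration):

```python
mydicts = [{'foo': 'val1'}, {'foo': ''}]
gen = (d for d in mydicts if d['foo'])  # nothing is evaluated yet
mydicts[1]['foo'] = 'val2'

first_pass = list(gen)   # the filter runs now and sees the updated dict
second_pass = list(gen)  # the generator is already exhausted

print(first_pass)   # [{'foo': 'val1'}, {'foo': 'val2'}]
print(second_pass)  # []
```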
I think you're a bit confused about what list comprehensions do.
When you do this:
[d for d in mydicts if d['foo']]
That evaluates to a new list. So, when you do this:
mylist = [d for d in mydicts if d['foo']]
You're assigning that list as the value of mylist. You can see this very easily:
assert type(mylist) == list
You're not assigning "a list comprehension" that gets reevaluated every time to mylist. There are no magic values in Python that get reevaluated every time. (You can fake them by, e.g., creating a class with a @property, but that's not really an exception; it's the expression myobj.myprop that's being reevaluated, not myprop itself.)
In fact, mylist = [d for d in mydicts if d['foo']] is basically the same as mylist = [1, 2, 3].* In both cases, you're creating a new list, and assigning it to mylist. You wouldn't expect the second one to re-evaluate [1, 2, 3] each time (otherwise, doing mylist[0] = 0 wouldn't do much good, because as soon as you try to view mylist you'd be getting a new, pristine list!). The same is true here.
* In Python 3.x, they aren't just basically the same; they're both just different types of list displays. In 2.x, it's a bit more murky, and they just happen to both evaluate to new list objects.
mylist contains the result of a previous list comprehension evaluation; it won't magically update just because you update a variable that was used to compute it.
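If you do want the filter re-applied whenever you look at it, one option is to wrap the comprehension in a function and call it each time. This is only a sketch, and non_empty is a made-up name:

```python
mydicts = [{'foo': 'val1'}, {'foo': ''}]


def non_empty():
    """Hypothetical helper: rebuild the filtered list on every call."""
    return [d for d in mydicts if d['foo']]


mylist = non_empty()        # evaluated once, right here
mydicts[1]['foo'] = 'val2'

print(mylist)                   # [{'foo': 'val1'}]
print(non_empty())              # [{'foo': 'val1'}, {'foo': 'val2'}]
print(mylist[0] is mydicts[0])  # True: the list holds the same dict objects
```

The last line shows that the new list still refers to the original dict objects, so changes to a dict that was already selected remain visible through mylist; only the selection itself is fixed.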