python: union keys from multiple dictionary?

I have 5 dictionaries and I want a union of their keys.
alldict = [dict1, dict2, dict3, dict4, dict5]
I tried
allkey = reduce(lambda x, y: set(x.keys()).union(y.keys()), alldict)
but it gave me an error
AttributeError: 'set' object has no attribute 'keys'
Am I doing it wrong? I am using a normal for loop for now, but I wonder why the above code didn't work.

I think @chuck already answered the question of why it doesn't work, but a simpler way to do this is to remember that the union method can take multiple arguments:
allkey = set().union(*alldict)
does what you want without any loops or lambdas.
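This works because iterating over a dict yields its keys, so each dictionary can be handed to union as-is. A minimal sketch, using three made-up dictionaries in place of the ones from the question (their contents are an assumption):

```python
# Hypothetical sample data; the question's dict1..dict5 are not shown.
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
dict3 = {"d": 5}
alldict = [dict1, dict2, dict3]

# set.union accepts any number of iterables; iterating a dict yields its keys.
allkey = set().union(*alldict)
print(sorted(allkey))  # ['a', 'b', 'c', 'd']
```

Starting from set() also means an empty alldict simply gives an empty set.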

Your solution works for the first two elements in the list, but then dict1 and dict2 got reduced into a set and that set is put into your lambda as the x. So now x does not have the method keys() anymore.
The solution is to make x be a set from the very beginning by initializing the reduction with an empty set (which happens to be the neutral element of the union).
Try it with an initializer:
allkey = reduce(lambda x, y: x.union(y.keys()), alldict, set())
An alternative without any lambdas would be:
allkey = reduce(set.union, map(set, map(dict.keys, alldict)))
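To see the failure and the fix side by side, here is a small sketch with made-up dictionaries (the data is an assumption; the two lambdas are the ones from the question and from this answer):

```python
from functools import reduce  # a built-in on Python 2, in functools on Python 3

# Hypothetical sample data.
alldict = [{"a": 1}, {"b": 2}, {"c": 3}]

# Without an initializer the first call receives two dicts, but the second
# call receives the set returned by the first call as x, and a set has no keys().
try:
    reduce(lambda x, y: set(x.keys()).union(y.keys()), alldict)
    failed = False
except AttributeError:
    failed = True

# With set() as the initializer, x is a set on every call.
allkey = reduce(lambda x, y: x.union(y.keys()), alldict, set())
print(failed, sorted(allkey))  # True ['a', 'b', 'c']
```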

A simple strategy for non-functional neurons (pun intended):
allkey = []
for dictio in alldict:
    for key in dictio:
        allkey.append(key)
allkey = set(allkey)
We can convert this code to a much shorter form using set comprehensions:
allkey = {key for dictio in alldict for key in dictio}
This one-liner is still very readable in comparison with the conventional for loop.
The key to converting a nested loop to a list or set comprehension is to keep the for clauses in the same order as the loops, so the inner loop (the one that varies faster) comes last (that is, for key in dictio).

set().union(dict1.keys(),dict2.keys()...)
I tried it with the list and it didn't work for me, so I'm just putting this up here for anyone who needs it.

Just one more way, 'cause what the hay:
a={}; [ a.update(b) for b in alldict ] and a.keys()
or the slightly-more-mysterious
reduce(lambda a, b: a.update(b) or a, alldict, {}).keys()
(I'm bummed that there's no built-in function equivalent to
def f(a, b):
    r = {}
    r.update(a)
    r.update(b)
    return r
is there?)
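For what it's worth, newer Python versions cover this case. As far as I know, dictionary unpacking inside a dict display works from Python 3.5 on, and builds a new dict without touching the inputs. A small sketch with made-up dictionaries:

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

# Builds a new dict; on duplicate keys the right-hand value wins, as with update().
merged = {**a, **b}
print(merged)  # {'x': 1, 'y': 20, 'z': 30}

# On Python 3.9 and later, a | b gives the same result.
```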

If you only want to union keys of 2 dicts you could use operator |.
Quote from docs:
Return a new set with elements from the set and all others.
Example:
all_keys = (dict1.keys() | dict2.keys())
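Note that this relies on Python 3, where keys() returns a set-like view (on Python 2.7 the equivalent is viewkeys()). The operator can also be chained for more than two dicts. A sketch with made-up data:

```python
# Hypothetical sample data.
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
dict3 = {"c": 5, "d": 6}

# Each | between two key views returns a plain set, so the chain keeps working.
all_keys = dict1.keys() | dict2.keys() | dict3.keys()
print(sorted(all_keys))  # ['a', 'b', 'c', 'd']
```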

Related

Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts and adds each dict with an equal ObjectID to a list under that key.
The code below works, but I am trying to convert it to a dictionary comprehension.
# group together subblocks if they have equal ObjectID
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)

Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
from collections import defaultdict
output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.
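To check that the two approaches agree, here is a sketch on a few made-up rows (only the OBJECTID key comes from the question; the name field is an assumption):

```python
from collections import defaultdict

# Hypothetical rows.
subblkDBF = [
    {"OBJECTID": 1, "name": "a"},
    {"OBJECTID": 2, "name": "b"},
    {"OBJECTID": 1, "name": "c"},
]

output = defaultdict(list)
for row in subblkDBF:
    # A missing key is created with an empty list on first access.
    output[row["OBJECTID"]].append(row)

# The quadratic one-liner produces the same grouping, just more slowly.
quadratic = {k: [d for d in subblkDBF if d["OBJECTID"] == k]
             for k in set(d["OBJECTID"] for d in subblkDBF)}

print(output == quadratic)  # True
```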

If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than above or the loop (O(nlogn) instead of O(n)).
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
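The sorting requirement is easy to miss, because groupby on unsorted input raises no error; it silently loses rows once the groups are put into a dict. A sketch with made-up rows (the name field is an assumption):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where the two OBJECTID 1 entries are not adjacent.
rows = [
    {"OBJECTID": 1, "name": "a"},
    {"OBJECTID": 2, "name": "b"},
    {"OBJECTID": 1, "name": "c"},
]
key = itemgetter("OBJECTID")

# Unsorted: groupby yields key 1 twice, and the later group overwrites the earlier one.
unsorted_result = {k: list(g) for k, g in groupby(rows, key=key)}

# Sorted first: each key forms a single run, so nothing is lost.
sorted_result = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

print([r["name"] for r in unsorted_result[1]])  # ['c']
print([r["name"] for r in sorted_result[1]])    # ['a', 'c']
```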

You can add an else branch to skip the redundant append for new keys and improve performance slightly:
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)

Are dict comprehensions evaluated incrementally in Python?

I'd have assumed the results of purge and purge2 would be the same in the following code (remove duplicate elements, keeping the first occurrences and their order):
def purge(a):
    l = []
    return (l := [x for x in a if x not in l])

def purge2(a):
    d = {}
    return list(d := {x: None for x in a if x not in d})

t = [2,5,3,7,2,6,2,5,2,1,7]
print(purge(t), purge2(t))
But it looks like with dict comprehensions, unlike with lists, the value of d is built incrementally. Is this what's actually happening? Do I correctly infer the semantics of dict comprehensions from this sample code and their difference from list comprehensions? Does it work only with comprehensions, or also with other right-hand sides referring to the dictionary being assigned to (e.g. comprehensions nested inside other expressions, something involving iterators, comprehensions of types other than dict)? Where is it specified and full semantics can be consulted? Or is it just an undocumented behaviour of the implementation, not to be relied upon?

There's nothing "incremental" going on here. The walrus operator doesn't assign to the variable until the dictionary comprehension completes. if x not in d is referring to the original empty dictionary, not the dictionary that you're building with the comprehension, just as the version with the list comprehension is referring to the original l.
The reason the duplicates are filtered out is simply that dictionary keys are always unique. Storing a key that already exists does not add a second entry; it only replaces the value, and the key keeps its original position. It's the same as if you'd written:
return {2: None, 2: None}
you'll just get {2: None}.
So your function can be simplified to
def purge2(a):
    return list({x: None for x in a})
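This can be checked directly. The sketch below reruns the two functions from the question (the walrus operator needs Python 3.8 or later) and adds dict.fromkeys as a further spelling, which is my own addition rather than something from the question:

```python
def purge(a):
    l = []
    # `l` still names the empty list while the comprehension runs.
    return (l := [x for x in a if x not in l])

def purge2(a):
    d = {}
    # `d` is likewise empty throughout; the dict itself collapses repeated keys.
    return list(d := {x: None for x in a if x not in d})

t = [2, 5, 3, 7, 2, 6, 2, 5, 2, 1, 7]
print(purge(t) == t)            # True: nothing was filtered
print(purge2(t))                # [2, 5, 3, 7, 6, 1]
print(list(dict.fromkeys(t)))   # [2, 5, 3, 7, 6, 1]
```

The ordering of the last two results depends on dicts preserving insertion order, which is guaranteed from Python 3.7.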

python: flat list of dict values

I have a list of dicts like so:
a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
I am trying to get a flat set of the values under the 'list' key, like {1,2,3,4,5}. What's the quickest way?

You can write a loop like:
result = set()
for row in a:
    result.update(row['list'])
which I think will work reasonably fast.
Or you can simply use set comprehension and that will result in the following one-liner:
result = {x for row in a for x in row['list']}
In case not all elements contain a 'list' key, you can use .get(..) with an empty tuple as the default (an empty tuple is cheap to create, so the fallback adds very little overhead):
result = {x for row in a for x in row.get('list',())}

It is not clear what your definition of "quickest" is, but whether it is speed or number of lines I would use a combination of itertools and a generator.
>>> import itertools
>>> a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
>>> b = set(itertools.chain.from_iterable(x['list'] for x in a if 'list' in x))
Note that I have added a guard against any elements that may not contain a 'list' key; you can omit that if you know this will always be true.
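For comparison, here are the guarded variants side by side, plus a set().union(*...) spelling that is my own addition. The third element without a 'list' key is made up to exercise the guard:

```python
import itertools

# The question's data plus one hypothetical element lacking the 'list' key.
a = [{'list': [1, 2, 3]}, {'list': [1, 4, 5]}, {'other': [9]}]

by_comprehension = {x for row in a for x in row.get('list', ())}
by_chain = set(itertools.chain.from_iterable(row['list'] for row in a if 'list' in row))
# union accepts any number of iterables, so the lists can be unpacked into it.
by_union = set().union(*(row.get('list', ()) for row in a))

print(by_comprehension == by_chain == by_union)  # True
```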

A flat set can also be built easily with reduce.
All you need is the initializer, the third argument of the reduce function.
reduce(
    lambda _set, _dict, key='list': _set.update(
        _dict.get(key) or set()) or _set,
    a,
    set())
The above code works on both Python 2 and Python 3, provided you import reduce with from functools import reduce.

Python sorting a list of dictionaries with if statement

Given a list of dictionaries like this:
x = [
{'name':'a', 'student': 1 , 'age':19},
{'name':'b', 'student': 0 , 'age':10}
]
I want to sort it by age only if student is equal to 1. Can I somehow put that if in the following statement?
sortedlist = sorted(x, key=lambda k: k['age'])
Thanks,

If you use itemgetter + a generator, instead of a lambda + list comp, you get the best performance I have found so far. This was tested on a dicts list of 10k elements. Almost a 30% speed increase over list comp + lambda. Also, if you can safely assume 'student' is always a valid key and access it directly, you again gain more speed over having to use d.get('student', 0) == 1
from operator import itemgetter
sorted((d for d in x if d['student']==1), key=itemgetter('age'))
Note about lambda vs itemgetter: The reason itemgetter is faster (and I am mostly sure about this) is because the lookup is done on the C side of code. Whereas when you use a lambda you are doing it on the python side which is slower.
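A quick run on the question's data shows the result. The third row is made up so that the sort has something to reorder, and no timing is attempted here:

```python
from operator import itemgetter

x = [
    {'name': 'a', 'student': 1, 'age': 19},
    {'name': 'b', 'student': 0, 'age': 10},
    {'name': 'c', 'student': 1, 'age': 12},  # hypothetical extra row
]

# Filter lazily with a generator expression, then sort what is left by age.
students_by_age = sorted((d for d in x if d['student'] == 1), key=itemgetter('age'))
print([d['name'] for d in students_by_age])  # ['c', 'a']
```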

If you are just throwing out the values you can do something like this:
sorted([d for d in x if d.get('student', 0) == 1], key=itemgetter('age'))
The lambda function that you were using is a very common operation and can be replaced with itemgetter.

In the case where you want to throw out the entries whose student value is not 1:
sortedlist = sorted([d for d in x if d['student']==1], key=lambda k:k['age'])

Comparing multiple dictionaries in Python

I'm new to Python and am running into a problem I can't google my way out of. I've built a GUI using wxPython and ObjectiveListView. In its very center, the GUI has a list control displaying data in X rows (the data is loaded by the user) and in five columns.
When the user selects multiple entries from the list control (pressing CTRL or shift while clicking), the ObjectiveListView module gives me a list of dictionaries, the dictionaries containing the data in the rows of the list control. This is exactly what I want, good!
The returned list looks something like this:
print MyList
[{'id':1023, 'type':'Purchase', 'date':'23.8.2008', 'sum':'-21,90', 'target':'Apple Store'}, {'id':1024, 'type':'Purchase', 'date':'24.8.2008', 'sum':'-21,90', 'target':'Apple Store'}, {'id':23, 'type':'Purchase', 'date':'2.8.2008', 'sum':'-21,90', 'target':'Apple Store'}]
All the dictionaries have the same keys, but the values change. The 'id' value is unique. Here the problems start. I want to get the common values for all the items the user selected. In the above list they would be 'sum':'-21,90' and 'target':'Apple Store'.
I don't know how to properly compare the dicts in the list. One big problem is that I don't know beforehand how many dicts the list contains, since it's decided by the user.
I have a vague idea that list comprehensions would be the way to go, but I only know how to compare two lists with list comprehensions, not n lists. Any help would be appreciated.

>>> mysets = (set(x.items()) for x in MyList)
>>> reduce(lambda a,b: a.intersection(b), mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
First, I've created a generator that will convert the list of dicts into an iterable sequence of sets of key,value pairs. You could use a list comprehension here but this way doesn't convert your entire list into yet another list, useful if you don't know how big it will be.
Then I've used reduce to apply a function that finds the common values between each set. It finds the intersection of set 1 & set 2, which is itself a set, then the intersection of that set & set 3 etc. The mysets generator will happily feed each set on demand to the reduce function until its done.
Note that reduce is no longer a built-in in Python 3; it is available there as functools.reduce.
You could of course make it a one-liner by replacing mysets in the reduce with the generator expression, but that reduces the readability IMO. In practice I'd probably even go one step further and break the lambda out into its own line as well:
>>> mysets = (set(x.items()) for x in MyList)
>>> find_common = lambda a,b: a.intersection(b)
>>> reduce(find_common, mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
And if you need the end result to be a dict, just wrap it like so (the generator is recreated here, because the previous reduce call has already consumed mysets):
>>> dict(reduce(find_common, (set(x.items()) for x in MyList)))
{'sum': '-21,90', 'type': 'Purchase', 'target': 'Apple Store'}
dict can accept any iterable of key,value pairs, such as the set of tuples returned at the end.
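The session above is from Python 2. A rough Python 3 equivalent, using the list from the question, could look like this; note that the approach assumes every value is hashable, since the items are put into sets:

```python
from functools import reduce

MyList = [
    {'id': 1023, 'type': 'Purchase', 'date': '23.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 1024, 'type': 'Purchase', 'date': '24.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 23, 'type': 'Purchase', 'date': '2.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
]

# A generator can be consumed only once, so it is built right where it is used.
common = dict(reduce(lambda a, b: a & b, (set(d.items()) for d in MyList)))
print(common == {'sum': '-21,90', 'type': 'Purchase', 'target': 'Apple Store'})  # True
```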

My answer is identical to Matthew Trevor's, except for one difference:
>>> mysets = (set(x.items()) for x in MyList)
>>> reduce(set.intersection, mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
Here I use set.intersection instead of creating a new lambda. In my opinion this is more readable, as this intuitively reads as "reduce is reducing this list using the set intersection operator." This should also be much faster, as set.intersection is a built-in C function.
To fully answer your question, you can extract the values using a list comprehension:
>>> mysets = (set(x.items()) for x in MyList)
>>> result = reduce(set.intersection, mysets)
>>> values = [r[1] for r in result]
>>> values
['-21,90', 'Purchase', 'Apple Store']
This would end up on one line for me, but that's entirely up to you:
>>> [r[1] for r in reduce(set.intersection, (set(x.items()) for x in MyList))]
['-21,90', 'Purchase', 'Apple Store']
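Since set.intersection accepts several sets at once, reduce can be dropped entirely, at the cost of building all the sets up front. A sketch on the question's data (the values are sorted only to get a stable order, because sets are unordered):

```python
MyList = [
    {'id': 1023, 'type': 'Purchase', 'date': '23.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 1024, 'type': 'Purchase', 'date': '24.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 23, 'type': 'Purchase', 'date': '2.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
]

item_sets = [set(d.items()) for d in MyList]
# Unpacking passes the first set as the receiver and the rest as arguments;
# this needs at least one dict in MyList.
common = set.intersection(*item_sets)
values = sorted(v for _, v in common)
print(values)  # ['-21,90', 'Apple Store', 'Purchase']
```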

First, we need a function to compute intersection of two dictionaries:
def IntersectDicts(d1, d2):
    # keep the items of d1 that appear in d2 with an equal value
    return {k: v for k, v in d1.items() if k in d2 and d2[k] == v}
Then we can use it to process any number of dictionaries:
result = reduce(IntersectDicts, MyList)
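One advantage of comparing dictionaries pairwise is that the values do not have to be hashable, unlike with the set-based answers. The sketch below uses a comprehension-based helper named intersect_dicts (a hypothetical name, written so that it runs on Python 3) and made-up rows with list values:

```python
from functools import reduce

def intersect_dicts(d1, d2):
    # Keep the pairs of d1 whose key exists in d2 with an equal value.
    return {k: v for k, v in d1.items() if k in d2 and d2[k] == v}

# Hypothetical rows; the list values could not be stored in a set of items.
rows = [
    {'id': 1, 'tags': ['x', 'y'], 'target': 'Apple Store'},
    {'id': 2, 'tags': ['x', 'y'], 'target': 'Apple Store'},
    {'id': 3, 'tags': ['x', 'y'], 'target': 'Apple Store'},
]

result = reduce(intersect_dicts, rows)
print(result)  # {'tags': ['x', 'y'], 'target': 'Apple Store'}
```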

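Since you're only looking for the common set, you can compare each value in the first dictionary with the value under the same key in all the other dictionaries, keeping a key only if every comparison matches:
common = {}
for k in MyList[0]:
    for i in range(1, len(MyList)):
        if MyList[0][k] != MyList[i][k]:
            break  # one mismatch is enough to rule this key out
    else:
        # no break occurred, so every dictionary agrees on this key
        common[k] = MyList[0][k]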
>>> common
{'sum': '-21,90', 'type': 'Purchase', 'target': 'Apple Store'}

Sorry, yes, 'type':'Purchase' is also one of the common values. I should have logged in to edit the question.
