Find only common key-value pairs of several dicts: dict intersection - python

I have 0 or more dicts in a list:
>>> dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
I want to create a new dict that contains only keys that are in all the above dicts, and only if the values are all the same:
>>> dict_intersection(*dicts)
{"a": 3}
I feel that there should be an elegant way of writing dict_intersection, but I'm only coming up with inelegant and/or inefficient solutions myself.

>>> dict(set.intersection(*(set(d.iteritems()) for d in dicts)))
{'a': 3}
Note: This solution requires the dictionary values to be hashable, in addition to the keys.
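In Python 3, iteritems() no longer exists. Here is a minimal sketch of the same set-based idea using items(). It assumes that calling it with no dicts should return an empty dict: the question allows zero dicts, and set.intersection() with no sets raises TypeError.

```python
def dict_intersection(*dicts):
    """Return the key/value pairs present, with equal values, in every dict.

    Values must be hashable because each dict's items are turned into a set.
    """
    if not dicts:
        # Assumption: no input dicts means no common pairs.
        return {}
    return dict(set.intersection(*(set(d.items()) for d in dicts)))


dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
print(dict_intersection(*dicts))  # {'a': 3}
```

With an unhashable value, such as dict_intersection({'x': [1]}, {'x': [1]}), building the sets fails with TypeError: unhashable type: 'list'.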

Since the key/value pairs must already be in the first dict, you can iterate over this dict's items.
dict(pair for pair in dicts[0].items()
     if all((pair in d.items() for d in dicts[1:])))
Looks less elegant than interjay's answer, but works without the restriction of hashable values.
Edit: Changed the all expression to a generator expression for speed improvement
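Here is a Python 3 sketch of the same approach. It relies on how a membership test on a dict's items() view works: it looks up the key and then compares the stored value with ==. That means values do not need to be hashable, and it avoids the linear scan that pair in d.items() performs on Python 2's list. As before, returning {} for no input is an assumption.

```python
def dict_intersection(*dicts):
    """Key/value pairs that appear, with equal values, in every dict.

    Values only need to support ==; they do not have to be hashable.
    """
    if not dicts:
        return {}  # assumption: no dicts -> empty result
    first, rest = dicts[0], dicts[1:]
    # (k, v) in d.items() looks up k in d and compares the stored value to v
    return {k: v for k, v in first.items()
            if all((k, v) in d.items() for d in rest)}


dicts = [dict(a=[1], b=89), dict(a=[1], b=42), dict(a=[1], c=0)]
print(dict_intersection(*dicts))  # {'a': [1]}
```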

How's this?
from functools import reduce
def intersect_two_dicts(d1, d2):
    return {k: v for k, v in d1.iteritems() if k in d2 and d1[k] == d2[k]}
def intersect_dicts(list_of_dicts):
    return reduce(intersect_two_dicts, list_of_dicts)
# Tests
dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
print(intersect_two_dicts(dicts[0], dicts[1]))
print(intersect_dicts(dicts))
Edit(1): I'm not sure which of these is fastest. The set.intersection solutions are certainly the most elegant (short one-liners!), but I would be interested to see some benchmarking.
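One rough way to benchmark the candidates is the standard timeit module, sketched here in Python 3 (items() instead of iteritems()). The three function names are made up for this comparison. Timings depend on the machine and the data, and three tiny dicts say little about large inputs, so no numbers are claimed; substitute realistic data before drawing conclusions.

```python
import timeit
from functools import reduce

dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]


def by_sets(dicts):
    # set-based one-liner (values must be hashable)
    return dict(set.intersection(*(set(d.items()) for d in dicts)))


def by_first_dict(dicts):
    # filter the first dict's items against the others
    return {k: v for k, v in dicts[0].items()
            if all((k, v) in d.items() for d in dicts[1:])}


def by_reduce(dicts):
    # pairwise intersection folded over the list
    def two(d1, d2):
        return {k: v for k, v in d1.items() if k in d2 and d2[k] == v}
    return reduce(two, dicts)


for fn in (by_sets, by_first_dict, by_reduce):
    assert fn(dicts) == {'a': 3}  # all candidates must agree before timing
    seconds = timeit.timeit(lambda: fn(dicts), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 calls")
```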
Edit(2): Bonus: get the (key, value) pairs that are common to at least two of the dictionaries, together with the number of dictionaries each pair occurs in:
import collections
import itertools
{k: count for k, count in
    collections.Counter(itertools.chain(*[d.iteritems() for d in dicts])).iteritems()
    if count > 1}
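A Python 3 version of the bonus might look like this. It uses chain.from_iterable instead of unpacking a list. Values must still be hashable, since the (key, value) pairs are counted. Each key is unique within a dict, so a pair's count is the number of dicts that contain it.

```python
from collections import Counter
from itertools import chain

dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]

# Count in how many dicts each (key, value) pair occurs.
pair_counts = Counter(chain.from_iterable(d.items() for d in dicts))

# Pairs shared by at least two dicts, mapped to their number of occurrences.
shared = {pair: n for pair, n in pair_counts.items() if n > 1}
print(shared)  # {('a', 3): 3, ('b', 89): 2}

# Or as a plain key -> value dict.
shared_items = {k: v for (k, v), n in pair_counts.items() if n > 1}
print(shared_items)  # {'a': 3, 'b': 89}
```

In the plain key -> value form, if one key is shared with two different values (each in at least two dicts), only the last one survives.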

>>> dicts
[{'a': 3, 'b': 89, 'd': 2}, {'a': 3, 'c': 99, 'b': 89}, {'a': 3, 'c': 33, 'b': 42}]
>>> sets = (set(d.iteritems()) for d in dicts)
>>> dict_intersection = dict(set.intersection(*sets))
>>> dict_intersection
{'a': 3}

A slightly more hands-on approach: take the list of keys for each dictionary, sort each list, and then proceed as if you were merging them (keep an index for each list, and advance the one with the lowest key). Whenever all of the indices point to the same key, check the values for equality; either way, advance all indices.
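Here is a Python 3 sketch of that merge-style walk. It assumes all keys can be compared with < (for example, all strings), and the function name is made up. When the current keys differ, every list sitting on the smallest key is advanced. That is safe because a list whose current key is already larger cannot contain the smaller key further on. Sorting is likely to dominate the cost, so this mainly illustrates the idea rather than beating the hash-based versions.

```python
def dict_intersection_merge(*dicts):
    """Common key/value pairs found by walking sorted key lists in step."""
    if not dicts:
        return {}
    key_lists = [sorted(d) for d in dicts]  # one sorted key list per dict
    idx = [0] * len(dicts)                  # current position in each list
    result = {}
    while all(i < len(keys) for i, keys in zip(idx, key_lists)):
        current = [keys[i] for i, keys in zip(idx, key_lists)]
        if all(k == current[0] for k in current):
            # Same key everywhere: keep it only if all values are equal.
            key = current[0]
            value = dicts[0][key]
            if all(d[key] == value for d in dicts[1:]):
                result[key] = value
            idx = [i + 1 for i in idx]      # either way, advance all indices
        else:
            # Advance every list whose current key is the smallest one.
            lowest = min(current)
            idx = [i + 1 if k == lowest else i for i, k in zip(idx, current)]
    return result


dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
print(dict_intersection_merge(*dicts))  # {'a': 3}
```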

Related

How to merge two or more dict into one dict with retaining multiple values of same key as list?

I have two or more dictionaries, and I'd like to merge them into one, retaining multiple values of the same key as a list. I am not able to share the original code, so please help me with the following example.
Input:
a= {'a':1, 'b': 2}
b= {'aa':4, 'b': 6}
c= {'aa':3, 'c': 8}
Output:
c= {'a':1,'aa':[3,4],'b': [2,6], 'c': 8}
I suggest you read up on defaultdict: it lets you provide a factory function that initializes missing keys, i.e. if a key is looked up but not found, it creates a value by calling the factory with no arguments (factory_method()) and stores it under the missing key. See this example; it might make things clearer:
from collections import defaultdict
a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}
stuff = [a, b, c]
# our factory method is the list-constructor `list`,
# so whenever we look up a value that doesn't exist, a list is created;
# we can always be sure that we have list-values
store = defaultdict(list)
for s in stuff:
    for k, v in s.items():
        # since we know that our value is always a list, we can safely append
        store[k].append(v)
print(store)
This has the "downside" of creating one-element lists for single occurrences of values, but maybe you are able to work around that.
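One way to work around the one-element lists, sketched here without defaultdict, is to collect the values with dict.setdefault and then unwrap the keys that occurred only once. The function name merge_dicts is made up. Values keep the input order, so 'aa' comes out as [4, 3] rather than the [3, 4] shown in the question. Also, if the original values can themselves be lists, you can no longer tell merged values from original ones.

```python
def merge_dicts(*dicts):
    """Merge dicts, collecting the values of repeated keys into lists."""
    merged = {}
    for d in dicts:
        for k, v in d.items():
            # setdefault inserts an empty list the first time k is seen
            merged.setdefault(k, []).append(v)
    # unwrap keys that occurred only once, to match the requested output
    return {k: vals[0] if len(vals) == 1 else vals for k, vals in merged.items()}


a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}
print(merge_dicts(a, b, c))  # {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8}
```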
The following should resolve your issue; I hope it works for you:
from collections import defaultdict
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c={'aa':3, 'c': 8}
dd = defaultdict(list)
for d in (a,b,c):
    for key, value in d.items():
        dd[key].append(value)
print(dd)
Use defaultdict to automatically create a dictionary entry with an empty list. To process all source dictionaries in a single loop, use itertools.chain. The main loop just appends the value of the current item to the list under the current key.
As you wrote, for keys that have only one item, you have to generate a working dictionary (using a dictionary comprehension) limited to the items whose value (a list) contains only one element. The value of each such item should be just the first (and only) number from the source list. Then use this dictionary to update d.
So the whole script can be surprisingly short:
from collections import defaultdict
from itertools import chain
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c = {'aa':3, 'c': 8}
d = defaultdict(list)
for k, v in chain(a.items(), b.items(), c.items()):
    d[k].append(v)
d.update({ k: v[0] for k, v in d.items() if len(v) == 1 })
As you can see, the actual processing code is contained in only 4 (last) lines.
If you print d, the result is:
defaultdict(list, {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8})

How to check if there are equal values associated to a key in Dictionary ,Python?

I explain better my problem:
I have a dictionary made by:
d = {'name': (values), (values), (values), 'name2': (values), (values), ...etc}
so the values are tuples.
I want to check whether some of the tuples associated with a key are the same.
>>> d={'a':3 , 'b':5, 'c':1, 'a':3, 'b':5}
>>> d
{'a': 3, 'c': 1, 'b': 5}
>>>
Dictionaries can not have duplicate keys.
This is not a valid Python dictionary; you can't have duplicate keys to begin with...
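A dict cannot hold the same key twice, so the usual way to keep several tuples per key is to use a list (or another container) as the value. Assuming that layout (the data below is made up), here is a sketch that reports which tuples repeat under each key:

```python
from collections import Counter

# Hypothetical data: each key maps to a list of tuples.
d = {
    'name': [(1, 2), (3, 4), (1, 2)],
    'name2': [(5, 6), (7, 8)],
}


def duplicated_tuples(mapping):
    """Return {key: [tuples occurring more than once under that key]}."""
    result = {}
    for key, tuples in mapping.items():
        # Counter works here because tuples of hashable items are hashable
        dups = [t for t, n in Counter(tuples).items() if n > 1]
        if dups:
            result[key] = dups
    return result


print(duplicated_tuples(d))  # {'name': [(1, 2)]}
```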

Updating a dictionary

I have created three dictionaries: dict1, dict2, and dict3. I want to update dict1 with dict2 first, and then update the resulting dictionary with dict3. I am not sure why the counts are not adding up.
import os
def wordcount_directory(directory):
    dict = {}
    filelist=[os.path.join(directory,f) for f in os.listdir(directory)]
    dicts=[wordcount_file(file) for file in filelist]
    dict1=dicts[0]
    dict2=dicts[1]
    dict3=dicts[2]
    for k,v in dict1.iteritems():
        if k in dict2.keys():
            dict1[k]+=1
        else:
            dict1[k]=v
    for k1,v1 in dict1.iteritems():
        if k1 in dict3.keys():
            dict1[k1]+=1
        else:
            dict1[k1]=v1
    return dict1
print wordcount_directory("C:\\Users\\Phil2040\\Desktop\\Word_count")
Maybe I am not understanding your question correctly, but are you trying to add all the values from each of the dictionaries together into one final dictionary? If so:
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 5, 'c': 1, 'd': 9}
dict3 = {'d': 1, 'e': 7}
def add_dict(to_dict, from_dict):
    for key, value in from_dict.iteritems():
        to_dict[key] = to_dict.get(key, 0) + value
result = dict(dict1)
add_dict(result, dict2)
add_dict(result, dict3)
print result
This yields: {'a': 1, 'c': 4, 'b': 7, 'e': 7, 'd': 10}
It would be really helpful to post what the expected outcome should be for your question.
EDIT:
For an arbitrary number of dictionaries:
result = dict(dicts[0])
for dict_sum in dicts[1:]:
    add_dict(result, dict_sum)
print(result)
If you really want to fix the code from your original question in the format it is in:
You are using dict1[k]+=1 when you should be performing dict1[k]+=dict2.get(k, 0).
The introduction of get removes the need to check for its existence with an if statement.
You need to iterate through dict2 and dict3 to introduce new keys from them into dict1.
(not really a problem, but worth mentioning) In the if statement that checks whether the key is in the dictionary, it is recommended to simplify the check to if k in dict2: (see this post for more details)
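Putting those fixes together, here is a Python 3 sketch of the function from the question. It assumes the asker's wordcount_file(path) helper (not shown in the question) exists and returns a {word: count} dict. It also loops over however many files the directory contains instead of hard-coding three. The helper name add_counts is made up.

```python
import os


def add_counts(totals, counts):
    """Add each word count from `counts` into `totals`, in place."""
    for word, n in counts.items():               # iterate the dict being merged in
        totals[word] = totals.get(word, 0) + n   # add its count, not 1; no if needed
    return totals


def wordcount_directory(directory):
    totals = {}
    for name in os.listdir(directory):
        # wordcount_file is the asker's helper, assumed to return {word: count}
        add_counts(totals, wordcount_file(os.path.join(directory, name)))
    return totals
```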
With collections.Counter, the handy standard-library class found by @DisplacedAussie, the answer can be simplified even further:
from collections import Counter
print(Counter(dict1) + Counter(dict2) + Counter(dict3))
The result yields: Counter({'d': 10, 'b': 7, 'e': 7, 'c': 4, 'a': 1})
The Counter object is a sub-class of dict, so it can be used in the same way as a standard dict.
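For any number of dicts, one option is Counter.update, which adds counts from a mapping instead of replacing them (sketched below). Unlike the + operator, which keeps only positive counts, update() also keeps zero and negative totals. That difference only matters if your counts can be non-positive.

```python
from collections import Counter

dicts = [{'a': 1, 'b': 2, 'c': 3}, {'b': 5, 'c': 1, 'd': 9}, {'d': 1, 'e': 7}]

total = Counter()
for d in dicts:
    total.update(d)  # adds d's counts to the running totals

print(dict(total))  # {'a': 1, 'b': 7, 'c': 4, 'd': 10, 'e': 7}
```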
Hmmm, here is a simple function that might help:
def dictsum(dict1, dict2):
    '''Modify dict1 to accumulate new sums from dict2
    '''
    k1 = set(dict1.keys())
    k2 = set(dict2.keys())
    for i in k1 & k2:
        dict1[i] += dict2[i]
    for i in k2 - k1:
        dict1[i] = dict2[i]
    return None
For the keys in the intersection, update each one by adding the second value to the existing one; then, for the keys only in dict2 (the difference), add those key/value pairs.
With that defined, you'd simply call:
dictsum(dict1, dict2)
dictsum(dict1, dict3)
... and be happy.
(I will note that functions that modify the contents of dictionaries in this fashion are not all that common. I'm returning None explicitly to follow the convention established by the list.sort() method: functions that modify a container in place, in Python, do not normally return the container or a copy of it.)
If I understand your question correctly, you are iterating on the wrong dictionary. You want to iterate over dict2 and update dict1 with matching keys or add non-matching keys to dict1.
If so, here's how you need to update the for loops:
for k, v in dict2.iteritems():  # Iterate over dict2
    if k in dict1.keys():
        dict1[k] += v  # Update dict1 for matching keys (add dict2's count)
    else:
        dict1[k] = v  # Add non-matching keys to dict1
for k1, v1 in dict3.iteritems():  # Iterate over dict3
    if k1 in dict1.keys():
        dict1[k1] += v1  # Update dict1 for matching keys (add dict3's count)
    else:
        dict1[k1] = v1  # Add non-matching keys to dict1
I assume that wordcount_file(file) returns a dict of the words found in file, with each key being a word and the associated value being the count for that word. If so, your updating algorithm is wrong. You should do something like this:
keys1 = dict1.keys()
for k, v in dict2.iteritems():
    if k in keys1:
        dict1[k] += v
    else:
        dict1[k] = v
If there's a lot of data in these dicts you can make the key lookup faster by storing the keys in a set:
keys1 = set(dict1.keys())
You should probably put that code into a function, so you don't need to duplicate the code when you want to update dict1 with the data in dict3.
You should take a look at collections.Counter, a subclass of dict that supports counting; using Counters would simplify this task considerably. But if this is an assignment (or you're using Python 2.6 or older) you may not be able to use Counters.

Merge/join lists of dictionaries based on a common value in Python

I have two lists of dictionaries (returned as Django querysets). Each dictionary has an ID value. I'd like to merge the two into a single list of dictionaries, based on the ID value.
For example:
list_a = [{'user__name': u'Joe', 'user__id': 1},
          {'user__name': u'Bob', 'user__id': 3}]
list_b = [{'hours_worked': 25, 'user__id': 3},
          {'hours_worked': 40, 'user__id': 1}]
and I want a function to yield:
list_c = [{'user__name': u'Joe', 'user__id': 1, 'hours_worked': 40},
          {'user__name': u'Bob', 'user__id': 3, 'hours_worked': 25}]
Additional points to note:
The IDs in the lists may not be in the same order (as with the example above).
The lists will probably have the same number of elements, but I want to account for the option if they're not but keeping all the values from list_a (essentially list_a OUTER JOIN list_b USING user__id).
I've tried doing this in SQL but it's not possible since some of the values are aggregates based on some exclusions.
It's safe to assume there will only be at most one dictionary with the same user__id in each list due to the database queries used.
Many thanks for your time.
I'd use itertools.groupby to group the elements:
import itertools
lst = sorted(itertools.chain(list_a, list_b), key=lambda x: x['user__id'])
list_c = []
for k, v in itertools.groupby(lst, key=lambda x: x['user__id']):
    d = {}
    for dct in v:
        d.update(dct)
    list_c.append(d)
    # could also do (instead of the inner loop):
    # list_c.append(dict(itertools.chain.from_iterable(dct.items() for dct in v)))
    # although that might be a little harder to read.
If you have an aversion to lambda functions, you can always use operator.itemgetter('user__id') instead (it's probably slightly more efficient, too).
To demystify lambda/itemgetter a little bit, note that:
def foo(x):
    return x['user__id']
is the same thing* as either of the following:
foo = operator.itemgetter('user__id')
foo = lambda x: x['user__id']
*There are a few differences, but they're not important for this problem
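Here is a quick check that the three spellings of the key function behave the same on one of the question's rows. The names foo_def, foo_itemgetter and foo_lambda are just for illustration. One of the differences alluded to: itemgetter can take several keys, in which case it returns a tuple.

```python
import operator


def foo_def(x):
    return x['user__id']


foo_itemgetter = operator.itemgetter('user__id')
foo_lambda = lambda x: x['user__id']

row = {'user__name': 'Joe', 'user__id': 1}
print(foo_def(row), foo_itemgetter(row), foo_lambda(row))  # 1 1 1

# With several keys, itemgetter returns a tuple of the looked-up values.
print(operator.itemgetter('user__id', 'user__name')(row))  # (1, 'Joe')
```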
from collections import defaultdict
from itertools import chain
list_a = [{'user__name': u'Joe', 'user__id': 1},
          {'user__name': u'Bob', 'user__id': 3}]
list_b = [{'hours_worked': 25, 'user__id': 3},
          {'hours_worked': 40, 'user__id': 1}]
collector = defaultdict(dict)
for collectible in chain(list_a, list_b):
    collector[collectible['user__id']].update(collectible.iteritems())
list_c = list(collector.itervalues())
As you can see, this just uses another dict to merge the existing dicts. The trick with defaultdict is that it takes out the drudgery of creating a dict for a new entry.
There is no need to group or sort these inputs. The dict takes care of all of that.
A truly bulletproof solution would catch the potential key error in case the input does not have a 'user__id' key, or use a default value to collect up all of the dicts without such a key.
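Both answers above produce a full outer join: a user__id that appears only in list_b also ends up in list_c. For the list_a-preserving (left outer) join described in the question, a Python 3 sketch could index list_b by ID first. The function name and the choice to let list_b's fields win on conflicts are assumptions. Rows without the key are passed through unmerged, in the spirit of the point above about missing keys.

```python
def left_outer_join(left, right, key='user__id'):
    """Merge each dict in `left` with the `right` dict that has the same key.

    Every row of `left` is kept; rows of `right` whose key never appears in
    `left` are dropped. On conflicting fields, the value from `right` wins.
    """
    # index the right-hand rows by key (skip rows that lack the key)
    right_by_key = {row[key]: row for row in right if key in row}
    joined = []
    for row in left:
        merged = dict(row)  # copy so the input dicts are not modified
        if key in row:
            merged.update(right_by_key.get(row[key], {}))
        joined.append(merged)
    return joined


list_a = [{'user__name': 'Joe', 'user__id': 1},
          {'user__name': 'Bob', 'user__id': 3}]
list_b = [{'hours_worked': 25, 'user__id': 3},
          {'hours_worked': 40, 'user__id': 1}]
print(left_outer_join(list_a, list_b))
```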

Convert sets to frozensets as values of a dictionary

I have a dictionary that is built as part of the initialization of my object. I know that it will not change during the lifetime of the object. The dictionary maps keys to sets. I want to convert all the values from sets to frozensets, to make sure they do not get changed. Currently I do that like this:
for key in self.my_dict.iterkeys():
    self.my_dict[key] = frozenset(self.my_dict[key])
Is there a simpler way to achieve this? I cannot build the frozensets right away, because I do not know how many items will be in each set until I have built the complete dictionary.
Given, for instance,
>>> d = {'a': set([1, 2]), 'b': set([3, 4])}
>>> d
{'a': set([1, 2]), 'b': set([3, 4])}
You can do the conversion in place as
>>> d.update((k, frozenset(v)) for k, v in d.iteritems())
With the result
>>> d
{'a': frozenset([1, 2]), 'b': frozenset([3, 4])}
If you have to do it in-place, probably this is the simplest way (almost the same as you posted):
for key, value in self.my_dict.iteritems():
    self.my_dict[key] = frozenset(value)
This is a variant which builds a temporary dict:
self.my_dict = dict((key, frozenset(value))
                    for key, value in self.my_dict.iteritems())
In Python 3, you could use a dictionary comprehension:
d = {k: frozenset(v) for k, v in d.items()}
In Python 2, though, I don't know that there's anything shorter -- this at least feels less "redundant":
for k, v in d.iteritems():
    d[k] = frozenset(v)
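In the situation from the question, the set sizes are unknown until the dict is complete. One option is to build with defaultdict(set) and convert once at the end of __init__. This is a minimal Python 3 sketch; the class name Index and its input format (an iterable of (key, value) pairs) are hypothetical. Only the values are frozen here; the dict itself can still be modified.

```python
from collections import defaultdict


class Index:
    """Hypothetical object whose key -> set mapping is fixed after __init__."""

    def __init__(self, pairs):
        building = defaultdict(set)
        for key, value in pairs:
            building[key].add(value)  # sets can still grow while loading
        # Freeze once everything is loaded; frozensets have no add/remove.
        self.my_dict = {k: frozenset(v) for k, v in building.items()}


idx = Index([('a', 1), ('a', 2), ('b', 3)])
print(idx.my_dict)  # {'a': frozenset({1, 2}), 'b': frozenset({3})}
```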
