How do I merge dictionaries together, using weights? - python

d1 = {'weight':1, 'data': { 'apples': 8, 'oranges': 7 } }
d2 = {'weight':3, 'data': { 'apples': 4, 'bananas': 3 } }
all_dictionaries = [d1, d2, ... ]
def mergeDictionariesWithWeight(all_dictionaries)
How do I merge these dictionaries together? (Where keys overlap, multiply each value by its weight.)
The function would return:
{ 'apples': 5, 'oranges': 7, 'bananas': 3 }
Apples is 5 because 8 * .25 + 4 * .75 = 5
Edit: I just wrote one that takes the average, something like this. But of course it's really different from what I want to do, because I stick everything in a list and just divide by the length.
def averageDictionaries(dlist):  # function name assumed; the original snippet omitted the def line
    result = {}
    keymap = {}
    for the_dict in dlist:
        for (k, v) in the_dict.items():
            if k not in keymap:
                keymap[k] = []
            keymap[k].append(v)
    for (k, v) in keymap.items():
        average = sum(int(x) for x in v) / float(len(v))
        result[k] = float(average)
    return result

>>> from collections import defaultdict
>>> d=defaultdict(lambda:(0,0))
>>> for D in all_dictionaries:
...     weight = D['weight']
...     for k,v in D['data'].items():
...         d[k]=d[k][0]+weight*v,d[k][1]+weight
...
>>> dict((k,v[0]/v[1]) for k,v in d.items())
{'apples': 5, 'oranges': 7, 'bananas': 3}
If you need a floating-point result:
>>> dict((k,1.*v[0]/v[1]) for k,v in d.items())
{'apples': 5.0, 'oranges': 7.0, 'bananas': 3.0}
Notes about defaultdict
Often you see defaultdict(int) or defaultdict(list), maybe even defaultdict(set). The argument to defaultdict must be callable with no parameters. Whenever a key is found to be missing, this callable is invoked and its result is stored as the value, i.e. calling it returns the default value for the dictionary.
For example:
>>> d=defaultdict(int)
>>> d[1]
0
>>> d['foo']
0
This is often used for counting things because int() returns 0. If you want the default value to be 1 instead of 0, it's trickier because you can't pass a parameter to int; all you need, though, is a callable that returns 1. A lambda function does this without much fuss.
>>> d=defaultdict(lambda:1)
>>> d[1]
1
>>> d['foo']
1
In this answer, I want to keep track of the weighted total, and the total of the weights. I can do this by using a 2-tuple as the default value.
>>> d=defaultdict(lambda:(0,0))
>>> d[1]
(0, 0)
>>> d['foo']
(0, 0)
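The interactive session above can be collected into a single function. This is only a minimal sketch: the name weighted_merge is hypothetical and not part of the original answer, and it assumes every input dictionary has 'weight' and 'data' keys and that the weights for each key do not sum to zero.

```python
from collections import defaultdict

def weighted_merge(all_dictionaries):
    # Hypothetical helper name; assumes each item has 'weight' and 'data'.
    # Each key maps to a pair: (weighted total, total weight).
    acc = defaultdict(lambda: (0, 0))
    for d in all_dictionaries:
        weight = d['weight']
        for k, v in d['data'].items():
            total, weight_sum = acc[k]
            acc[k] = (total + weight * v, weight_sum + weight)
    # True division, so the results are floats in Python 3.
    return {k: total / weight_sum for k, (total, weight_sum) in acc.items()}

d1 = {'weight': 1, 'data': {'apples': 8, 'oranges': 7}}
d2 = {'weight': 3, 'data': {'apples': 4, 'bananas': 3}}
print(weighted_merge([d1, d2]))
# {'apples': 5.0, 'oranges': 7.0, 'bananas': 3.0}
```

Unpacking the pair into total and weight_sum avoids the d[k][0] and d[k][1] indexing, which some readers find harder to follow.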

Here's a solution that first gathers the items into a list using a temporary dict, and then computes the final weighted dict. It can probably be done without a temporary, but this is easy to understand.
from collections import defaultdict
def mergeDictionariesWithWeight(dlist):
    tmp = defaultdict(list)
    for d in dlist:
        weight = d['weight']
        for k, v in d['data'].items():
            tmp[k].append((weight, v))
    r = {}
    for k, v in tmp.items():
        # If there's just one item, ignore the weight
        if len(v) == 1:
            r[k] = v[0][1]
        else:
            total_weight = sum((x[0] for x in v), 0.0)
            r[k] = sum(x[1] * x[0]/total_weight for x in v)
    return r
Returns: {'apples': 5.0, 'oranges': 7, 'bananas': 3} (because 8 * .25 + 4 * .75 = 5.0)

Try this:
def mergeDictionariesWithWeight(all_dictionaries):
    weightSum = {}         # total weight seen for each key
    weightDictionary = {}  # weighted sum of the values for each key
    for dictionary in all_dictionaries:
        weight = dictionary['weight']
        data = dictionary['data']
        # accumulate the weighted value and the weight for each key in data
        for (k, v) in data.items():
            weightDictionary[k] = weightDictionary.get(k, 0) + weight * v
            weightSum[k] = weightSum.get(k, 0) + weight
    # normalize the results by dividing by each key's weight sum
    for (key, value) in weightDictionary.items():
        weightDictionary[key] = value / float(weightSum[key])
    return weightDictionary
d1 = {'weight':1, 'data': { 'apples': 8, 'oranges': 7 } }
d2 = {'weight':3, 'data': { 'apples': 4, 'bananas': 3 } }
all_dictionaries = [d1, d2]
mergeDictionariesWithWeight(all_dictionaries)

from collections import defaultdict
def merge_dictionaries_with_weight(all_dictionaries):
    totals = defaultdict(int)
    result = defaultdict(int)
    for each in all_dictionaries:
        weight = float(each['weight'])
        for key, value in each['data'].items():
            totals[key] += weight
            result[key] += weight * value
    for key, total in totals.items():
        result[key] /= total
    return result

Algorithmically indistinguishable from gnibbler's, but somehow the generator expression pleases me.
>>> from collections import defaultdict
>>> weights, values = defaultdict(int), defaultdict(int)
>>> key_weight_value = ((key, d['weight'], value)
...                     for d in all_dictionaries
...                     for key, value in d['data'].items())
>>> for k, w, v in key_weight_value:
...     weights[k], values[k] = weights[k] + w, values[k] + w * v
...
>>> dict((k, values[k] * 1.0 / weights[k]) for k in weights)
{'apples': 5.0, 'oranges': 7.0, 'bananas': 3.0}

Related

How do I check to see if values in a dict are the exact same?

I currently have a dictionary d whose keys are strings and whose values are themselves dictionaries.
Across the inner dictionaries in d, how can I find the key/value pairs that are the same in ALL of them?
Example Dictionary:
zybook, zybooks, and zybookz are keys. There can be more than three keys, but I only show three here. The values of d are themselves dicts of the form {file name: number}.
d = {
    "zybook": {
        "noodle.json": 5,
        "testing.json": 1,
        "none.json": 5
    },
    "zybooks": {
        "noodle.json": 5,
        "ok.json": 1
    },
    "zybookz": {
        "noodle.json": 5
    }
}
Expected Output:
Because {"noodle.json": 5} appears in zybook, zybooks, and zybookz alike, the output should be another dictionary containing the pairs common to all three:
{"noodle.json": 5}
My attempt:
I honestly don't know how to approach this.
d = {"zybook": {"noodle.json": 5, "testing.json": 1, "none.json": 5},
     "zybooks": {"noodle.json": 5, "ok.json": 1},
     "zybookz": {"noodle.json": 5}
     }
for key, value in d.items():
    for k, v in value.items():
        if
from functools import reduce
sets = (set(val.items()) for val in d.values())
desired = dict(reduce(set.intersection, sets))
print(desired)
# {'noodle.json': 5}
We first form a set out of the (file_name, num) pairs of each inner dictionary. Then reduce walks through the sets cumulatively, intersecting each one with the running result, which leaves only the pairs common to all of them. Lastly, the result is converted back to a dict.
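To make the reduction step concrete, here is a small sketch that performs the same intersection with an explicit loop instead of reduce. The variable names inner and common are illustrative, and the sketch assumes d is non-empty and that the inner values are hashable (as the numbers here are).

```python
d = {
    "zybook": {"noodle.json": 5, "testing.json": 1, "none.json": 5},
    "zybooks": {"noodle.json": 5, "ok.json": 1},
    "zybookz": {"noodle.json": 5},
}

# Start from the first inner dict's (file, number) pairs ...
inner = iter(d.values())
common = set(next(inner).items())
# ... and keep only the pairs that also appear in every later dict.
for val in inner:
    common &= set(val.items())

print(dict(common))  # {'noodle.json': 5}
```

Because whole (key, value) pairs are intersected, a file name that appears everywhere but with different numbers is not reported as a match.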
Try this:
from collections import Counter
res = {z[0]: z[1] for z, count in Counter([(k, v) for x in d for k, v in d[x].items()]).items() if count == len(d)}
Using only built-in Python methods:
new = []
for v in d.values():
    new += list(v.items())
# [('noodle.json', 5), ('testing.json', 1), ('none.json', 5), ('noodle.json', 5), ('ok.json', 1), ('noodle.json', 5)]
cnt_dict = {v: new.count(v) for v in new}
# {('noodle.json', 5): 3, ('testing.json', 1): 1, ('none.json', 5): 1, ('ok.json', 1): 1}
# keep only the pairs that occur in every inner dict
d2 = {k[0]: k[1] for k, v in cnt_dict.items() if v == len(d)}
print(d2)
# {'noodle.json': 5}

Python: Different types of dictionary key data

Given a dictionary like the one below:
dic = {1:10, 2:20, 3:30, 'A': 10, 'B': 20, 'C':30}
How can I calculate the mean values of int keys and string keys separately?
I think (?) the OP wants the mean of the values of integer keyed items and also a separate mean of the string keyed items. Here is an alternate option:
dic = {1:10, 2:20, 3:30, 'A': 10, 'B': 20, 'C':30}
int_keyed_values = [v for k,v in dic.items() if type(k) is int]
str_keyed_values = [v for k,v in dic.items() if type(k) is str]
int_mean = sum(int_keyed_values)/len(int_keyed_values)
str_mean = sum(str_keyed_values)/len(str_keyed_values)
What do the values of the string keys mean? If they are single-character keys, I assume you want to average the ASCII values of the keys. If that is the case, here is code to do that:
dic = {1:10, 2:20, 3:30, 'A': 10, 'B': 20, 'C':30}
total_int = 0
total_str = 0
count_int = 0
count_str = 0
for keys, values in dic.items():
    if type(keys) is int:  # checking the key is int
        count_int += 1
        total_int += keys
    elif type(keys) is str:
        count_str += 1
        total_str += ord(keys)
print(total_int / count_int)  # will print int avg
print(total_str / count_str)  # will print str avg
Here is a less common way:
from operator import itemgetter
from statistics import mean
from itertools import groupby
dic = {1:10, 2:20, 3:30, 'A': 10, 'B': 20, 'C':30}
[mean(itemgetter(*g)(dic)) for _, g in groupby(dic, key=lambda k: isinstance(k, int))]
# [20, 20]
or
{k: mean(itemgetter(*g)(dic)) for k, g in groupby(dic, key=lambda i: type(i))}
# {int: 20, str: 20}
This may carry some overhead, but it fits the problem well, and the three helpers are interesting :-)
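One caveat with groupby is that it only merges consecutive items that share a grouping key, so the snippets above rely on all int keys being adjacent in the dictionary. The following is a sketch of one way to handle interleaved keys by sorting on the grouping key first; the helper type_name and the reordered sample data are assumptions made for illustration.

```python
from itertools import groupby
from statistics import mean

# Same data as above, but with int and str keys interleaved.
dic = {1: 10, 'A': 10, 2: 20, 'B': 20, 3: 30, 'C': 30}

def type_name(key):
    # Group keys by the name of their type, e.g. 'int' or 'str'.
    return type(key).__name__

# Sorting by the grouping key first makes equal groups adjacent.
ordered = sorted(dic, key=type_name)
means = {name: mean(dic[k] for k in group)
         for name, group in groupby(ordered, key=type_name)}
print(means)  # {'int': 20, 'str': 20}
```

Sorting on the type name rather than on the keys themselves avoids comparing an int with a str, which raises TypeError in Python 3.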
You need to check whether each key is an int and compute the means:
dic = {1:10, 2:20, 3:30, 'A': 10, 'B': 20, 'C':30}
total_int = 0
count_str = 0
total_str = 0
count_int = 0
for keys, values in dic.items():
    if type(keys) is int:
        count_int = count_int + 1
        total_int = total_int + values
        print(values)
    elif type(keys) is str:
        count_str = count_str + 1
        total_str = total_str + values
print(total_int / count_int)
print(total_str / count_str)

Number of different values associated with a key in a list of dicts

Given a list of dictionaries (each of which has the same keys), I want the total number of different values with which a given key is associated.
For li = [{1:2,2:3},{1:2,2:4}] the expected output is {1:1,2:2}
I came up with the following piece of code. Is there a better way of doing this?
counts = {}
values = {}
for i in li:
    for key, item in i.items():
        try:
            # skip values already seen for this key
            if item in values[key]:
                continue
        except KeyError:
            pass
        try:
            counts[key] += 1
        except KeyError:
            counts[key] = 1
        try:
            values[key].append(item)
        except KeyError:
            values[key] = [item]
Something like this is probably more direct:
from collections import defaultdict
counts = defaultdict(set)
for mydict in li:
    for k, v in mydict.items():
        counts[k].add(v)
That takes care of collecting the distinct values. To display the counts the way you want them, this would get you there:
print(dict((k, len(v)) for k, v in counts.items()))
# prints {1: 1, 2: 2}
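When every dictionary really does share the same keys, as the question states, the same counts can also be written as a single comprehension. This is a sketch under that assumption; it would raise KeyError if some dictionary lacked a key present in the first one, and it needs the values to be hashable.

```python
li = [{1: 2, 2: 3}, {1: 2, 2: 4}]

# For each key of the first dict, count the distinct values across all dicts.
# Assumes every dict in li has the same keys, as stated in the question.
counts = {k: len({d[k] for d in li}) for k in li[0]}
print(counts)  # {1: 1, 2: 2}
```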
Here is yet another alternative:
from collections import defaultdict
counts = defaultdict(int)
for k, v in set(pair for d in li for pair in d.items()):
    counts[k] += 1
And the result:
>>> counts
defaultdict(<type 'int'>, {1: 1, 2: 2})
You could do something like this:
from functools import reduce  # reduce is no longer a builtin in Python 3
li = [{1:2,2:3},{1:2,2:4}]
def makesets(x, y):
    for k, v in x.items():
        v.add(y[k])
    return x
distinctValues = reduce(makesets, li, dict((k, set()) for k in li[0].keys()))
counts = dict((k, len(v)) for k, v in distinctValues.items())
print(counts)
When I run this it prints:
{1: 1, 2: 2}
which is the desired result.
counts = {}
values = {}
for i in li:
    for key, item in i.items():
        if key not in values:
            values[key] = set()
        values[key].add(item)
for key in values:
    counts[key] = len(values[key])
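Flattening the items into one collection, in case the dicts do not always have the same keys:
li=[{1: 2, 2: 3}, {1: 2, 2: 4}, {1: 3}]
dic={}
# the set drops repeated (key, value) pairs so each distinct value counts once
for i, j in set(item for sublist in li for item in sublist.items()):
    dic[i] = dic[i] + 1 if i in dic else 1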

What's the most pythonic way to merge 2 dictionaries, but make the values the average values?

d1 = { 'apples': 2, 'oranges':5 }
d2 = { 'apples': 1, 'bananas': 3 }
result_dict = { 'apples': 1.5, 'oranges': 5, 'bananas': 3 }
What's the best way to do this?
Here is one way:
result = dict(d2)
for k in d1:
    if k in result:
        result[k] = (result[k] + d1[k]) / 2.0
    else:
        result[k] = d1[k]
This would work for any number of dictionaries:
dicts = ({"a": 5},{"b": 2, "a": 10}, {"a": 15, "b": 4})
keys = set()
averaged = {}
for d in dicts:
    keys.update(d.keys())
for key in keys:
    values = [d[key] for d in dicts if key in d]
    averaged[key] = float(sum(values)) / len(values)
print(averaged)
# {'a': 10.0, 'b': 3.0}
Update: @mhyfritz showed how you could reduce 3 lines to one!
dicts = ({"a": 5},{"b": 2, "a": 10}, {"a": 15, "b": 4})
averaged = {}
keys = set().union(*dicts)
for key in keys:
    values = [d[key] for d in dicts if key in d]
    averaged[key] = float(sum(values)) / len(values)
print(averaged)
Your question was for the most 'Pythonic' way.
I think for a problem like this, the Pythonic way is one that is very clear. There are many ways to implement the solution to this problem! If you really do have only 2 dicts then the solutions that assume this are great because they are much simpler (and easier to read and maintain as a result). However, it's often a good idea to have the general solution because it means you won't need to duplicate the bulk of the logic for other cases where you have 3 dictionaries, for example.
As an addendum, phant0m's answer is nice because it uses a lot of Python's features to make the solution readable. We see a list comprehension:
[d[key] for d in dicts if key in d]
Use of Python's very useful set type:
keys = set()
keys.update(d.keys())
And generally, good use of Python's type methods and globals:
d.keys()
keys.update( ... )
keys.update
len(values)
Thinking of and implementing an algorithm to solve this problem is one thing, but making it this elegant and readable by utilising the power of the language is what most people would deem 'Pythonic'.
(I would use phant0m's solution)
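The general approach discussed above could be packaged as a reusable function along these lines. This is only a sketch: the name average_dicts is hypothetical, and it assumes the values are numbers. A key that is missing from some dictionaries is averaged over the dictionaries that do contain it.

```python
def average_dicts(*dicts):
    # Hypothetical helper, not taken from the answers above.
    # Collect every key that appears in at least one dict.
    keys = set().union(*dicts)
    averaged = {}
    for key in keys:
        # Only dicts that actually contain the key contribute to its average.
        values = [d[key] for d in dicts if key in d]
        averaged[key] = sum(values) / len(values)
    return averaged

d1 = {'apples': 2, 'oranges': 5}
d2 = {'apples': 1, 'bananas': 3}
result = average_dicts(d1, d2)
print(result == {'apples': 1.5, 'oranges': 5.0, 'bananas': 3.0})  # True
```

Taking *dicts means the same function covers the two-dictionary case and any larger number of dictionaries without duplicating logic.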
Yet another way:
result = dict(d1)
for (k,v) in d2.items():
    result[k] = (result.get(k,v) + v) / 2.0
A Counter and some Generators are useful in this situation
General Case:
>>> d1 = { 'apples': 2, 'oranges':5 }
>>> d2 = { 'apples': 1, 'bananas': 3 }
>>> all_d=[d1,d2]
>>> from collections import Counter
>>> counts=Counter(k for d in all_d for k in d)
>>> counts
Counter({'apples': 2, 'oranges': 1, 'bananas': 1})
>>> s=lambda k: sum((d.get(k,0) for d in all_d))
>>> result_set=dict(((k,1.0*s(k)/counts[k]) for k in counts.keys()))
>>> result_set
{'apples': 1.5, 'oranges': 5.0, 'bananas': 3.0}
d1 = { 'apples': 2, 'oranges':5 }
d2 = { 'apples': 1, 'bananas': 3, 'oranges':0 }
dicts = [d1, d2]
result_dict = {}
for d in dicts:  # renamed from dict to avoid shadowing the builtin
    for key, value in d.items():
        if key in result_dict:
            result_dict[key].append(value)
        else:
            result_dict[key] = [value]
for key, values in result_dict.items():
    result_dict[key] = float(sum(values)) / len(values)
print(result_dict)

Comparing dictionaries in Python

Given two dictionaries, d1 and d2, and an integer l, I want to find all keys k in d1 such that either d2[k]<l or k not in d2. I want to output the keys and the corresponding values in d2, except that if d2 does not contain the key, I want to print 0. For instance, if d1 is
a: 1
b: 1
c: 1
d: 1
and d2 is
a: 90
b: 89
x: 45
d: 90
and l is 90, the output would be (possibly in a different order)
b 89
c 0
What is the best way to do this in Python? I am just starting to learn the language, and so far this is what I have:
for k in d1.keys():
    if k not in d2:
        print(k, 0)
    else:
        if d2[k]<l:
            print(k, d2[k])
This works of course (unless I have a typo), but it seems to me that there would be a more pythonic way of doing it.
Yours is actually fine -- you could simplify it to
for k in d1:
    if d2.get(k, 0) < l:
        print(k, d2.get(k, 0))
which is (to me) pythonic, and is pretty much a direct "translation" into code of your description.
If you want to avoid the double lookup, you could do
for k in d1:
    val = d2.get(k, 0)
    if val < l:
        print(k, val)
You can simplify this by using a defaultdict. Calling __getitem__ on a defaultdict with a missing key returns the "default" value.
from collections import defaultdict
d = defaultdict(int)
print(d['this key does not exist'])  # will print 0
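One difference worth knowing when choosing between the two approaches: indexing a defaultdict with a missing key also stores the default in the dictionary, whereas dict.get leaves the dictionary untouched. A short sketch, with sample data and variable names chosen only for illustration:

```python
from collections import defaultdict

d2 = {'a': 90, 'b': 89}

# dict.get returns the fallback without touching the dictionary.
plain = dict(d2)
print(plain.get('c', 0), 'c' in plain)  # 0 False

# Indexing a defaultdict calls the factory and stores the result.
dd = defaultdict(int, d2)
print(dd['c'], 'c' in dd)  # 0 True
```

If the dictionary is later iterated or its length is checked, those silently inserted keys will show up, so get is the safer choice for read-only lookups.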
Another thing you could change is the call to keys(): it is unnecessary, because a dictionary is directly iterable (it implements __iter__). It would be preferable to simply write:
for k in d1:
Here is a compact version, but yours is perfectly OK:
from collections import defaultdict
d1 = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
d2 = {'a': 90, 'b': 89, 'x': 45, 'd': 90}
l = 90
# The default (==0) is a substitute for the condition "not in d2"
# As daniel suggested, it would be better if d2 itself was a defaultdict
d3 = defaultdict(int, d2)
print([ (k, d3[k]) for k in d1 if d3[k] < l ])
Output (in Python 3.7+, where dicts keep insertion order):
[('b', 89), ('c', 0)]
Yours is good enough but here's one that is a little simpler:
for k in d1:
    val = d2.get(k, 0)
    if val < l:
        print(k, val)
