Aggregate dictionary values with different function - python

I have a dictionary that looks like this:
dict = {'A': [1,5,6,7],
'B':[1,8,8]}
I want to group by key and aggregate the values with different functions, e.g. mean or standard deviation.
Mean:
result = {'A':4.75, 'B': 5.67}
etc
Thanks

Using a dictionary comprehension and functions from statistics:
from statistics import mean, stdev
d = {'A': [1,5,6,7], 'B':[1,8,8]}
d_mean = {k:round(mean(v), 2) for k,v in d.items()}
# {'A': 4.75, 'B': 5.67}
d_std = {k:round(stdev(v), 2) for k,v in d.items()}
# {'A': 2.63, 'B': 4.04}
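If several aggregations are needed at once, the same comprehension can be driven by a mapping of names to functions. This is only a minimal sketch: the names `aggregations` and `results` are illustrative and do not come from the question.

```python
from statistics import mean, stdev

d = {'A': [1, 5, 6, 7], 'B': [1, 8, 8]}

# Hypothetical mapping of a label to the function that computes it.
aggregations = {'mean': mean, 'std': stdev}

# One inner dictionary per aggregation, keyed like the original dictionary.
results = {
    name: {k: round(func(v), 2) for k, v in d.items()}
    for name, func in aggregations.items()
}
print(results['mean'])  # {'A': 4.75, 'B': 5.67}
print(results['std'])   # {'A': 2.63, 'B': 4.04}
```

Adding another statistic (for example `statistics.median`) then only requires one more entry in `aggregations`.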


How can I merge multiple dictionaries and add the values of the same key? (Python) [duplicate]

Suppose I have the following dictionaries:
dict1 = {'a': 10, 'b': 8, 'c':3}
dict2 = {'c': 4}
dict3 = {'e':9, 'a':3}
I'm trying to merge them in a way such that the new (combinational) dictionary contains all the keys and all the values of the same key are added together. For instance, in this case, my desired output looks like:
dict = {'a': 13, 'b': 8, 'c':7, 'e':9}
It looks like the update() method doesn't work since some values are overwritten. I also tried ChainMap and encountered the same issue. How can I merge multiple dictionaries and add the values of the same key? Thanks a lot :)
Here's a dictionary comprehension to achieve this using itertools.chain.from_iterable() and sum(). Here, I am creating a set of keys from all three dicts. Then I iterate over this set inside the dictionary comprehension to get the sum of values per key.
>>> from itertools import chain
>>> dict1 = {'a': 10, 'b': 8, 'c':3}
>>> dict2 = {'c': 4}
>>> dict3 = {'e':9, 'a':3}
>>> my_dicts = dict1, dict2, dict3
>>> {k: sum(dd.get(k, 0) for dd in my_dicts) for k in set(chain.from_iterable(d.keys() for d in my_dicts))}
{'a': 13, 'e': 9, 'b': 8, 'c': 7}
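The one-liner above can be easier to follow when split into its two steps. The sketch below does the same thing; the intermediate names `all_keys` and `totals` are illustrative, and the keys are sorted only to make the printed order predictable.

```python
from itertools import chain

dict1 = {'a': 10, 'b': 8, 'c': 3}
dict2 = {'c': 4}
dict3 = {'e': 9, 'a': 3}
my_dicts = (dict1, dict2, dict3)

# Step 1: collect every key that appears in any dictionary.
# Iterating over a dict yields its keys, so calling .keys() is optional.
all_keys = set(chain.from_iterable(my_dicts))

# Step 2: for each key, add up its value in every dictionary,
# treating a missing key as 0.
totals = {k: sum(d.get(k, 0) for d in my_dicts) for k in sorted(all_keys)}
print(totals)  # {'a': 13, 'b': 8, 'c': 7, 'e': 9}
```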
This code below should do the trick:
dict1 = {'a': 10, 'b': 8, 'c':3}
dict2 = {'c': 4}
dict3 = {'e':9, 'a':3}
multiple_dict = [dict1, dict2, dict3]
final_dict = {}
for d in multiple_dict:
    for key, value in d.items():
        if key in final_dict:
            final_dict[key] += value
        else:
            final_dict[key] = value
print(final_dict)
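An alternative worth considering is collections.Counter, whose update() method adds values instead of overwriting them. This is a sketch that assumes all values are numbers; the variable name `totals` is illustrative.

```python
from collections import Counter

dict1 = {'a': 10, 'b': 8, 'c': 3}
dict2 = {'c': 4}
dict3 = {'e': 9, 'a': 3}

totals = Counter()
for d in (dict1, dict2, dict3):
    # Unlike dict.update, Counter.update adds to the existing value.
    totals.update(d)

final_dict = dict(totals)  # convert back to a plain dictionary
print(final_dict)  # {'a': 13, 'b': 8, 'c': 7, 'e': 9}
```

Using update() rather than the + operator is deliberate here: adding two Counter objects with + discards keys whose total is zero or negative.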

How to merge two or more dict into one dict with retaining multiple values of same key as list?

I have two or more dictionaries that I would like to merge into one, keeping multiple values of the same key as a list. I am not able to share the original code, so please help me with the following example.
Input:
a= {'a':1, 'b': 2}
b= {'aa':4, 'b': 6}
c= {'aa':3, 'c': 8}
Output:
c= {'a':1,'aa':[3,4],'b': [2,6], 'c': 8}
I suggest you read up on defaultdict: it lets you provide a factory function that initializes missing keys, i.e. if a key is looked up but not found, it creates a value by calling the factory with no arguments and stores it under that key. See this example, it might make things clearer:
from collections import defaultdict
a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}
stuff = [a, b, c]
# our factory method is the list-constructor `list`,
# so whenever we look up a value that doesn't exist, a list is created;
# we can always be sure that we have list-values
store = defaultdict(list)
for s in stuff:
    for k, v in s.items():
        # since we know that our value is always a list, we can safely append
        store[k].append(v)
print(store)
This has the "downside" of creating one-element lists for single occurrences of values, but maybe you are able to work around that.
The following should resolve your issue:
from collections import defaultdict
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c={'aa':3, 'c': 8}
dd = defaultdict(list)
for d in (a,b,c):
    for key, value in d.items():
        dd[key].append(value)
print(dd)
Use defaultdict to automatically create a dictionary entry with an empty list.
To process all source dictionaries in a single loop, use itertools.chain.
The main loop just appends the value from the current item to the list under
the current key.
As you wrote, a key that holds only one item should map to the bare value
rather than a list. To handle that, build a work dictionary (using a dictionary
comprehension) limited to the items whose value (a list) contains only one element.
The value of each such item should be the first (and only) number
from the source list.
Then use this dictionary to update d.
So the whole script can be surprisingly short, as below:
from collections import defaultdict
from itertools import chain
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c = {'aa':3, 'c': 8}
d = defaultdict(list)
for k, v in chain(a.items(), b.items(), c.items()):
    d[k].append(v)
d.update({ k: v[0] for k, v in d.items() if len(v) == 1 })
As you can see, the actual processing code is contained in only the last 4 lines.
If you print d, the result is:
defaultdict(list, {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8})
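If a plain dict is preferred as the result, the same idea can be wrapped in a small function. This is a sketch only; the function name `merge_keep_duplicates` is made up for illustration, and it assumes the input values are not lists themselves (otherwise a single list value could not be told apart from collected duplicates).

```python
def merge_keep_duplicates(*dicts):
    """Merge dicts; keys seen once keep their value, repeated keys get a list."""
    collected = {}
    for d in dicts:
        for key, value in d.items():
            # setdefault creates the list on first sight of a key
            collected.setdefault(key, []).append(value)
    # unwrap one-element lists so single values stay scalars
    return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}


a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}
merged = merge_keep_duplicates(a, b, c)
print(merged)  # {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8}
```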

Python convert dict of lists to dict of sets?

I have:
myDict = {'a': [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
I want:
myDict = {'a': set([1,2,3]), 'b':set([4,5,6]), 'c':set([7,8,9])}
Is there a one-liner I can use to do this rather than looping through it and converting the type of the values?
You'll have to loop anyway:
{key: set(value) for key, value in myDict.items()}
You can also do this, since keys() and values() iterate in matching order as long as the dictionary is not modified in between:
dict(zip(myDict.keys(), map(set, myDict.values())))
This can be done with map by mapping the values to type set
myDict = dict(map(lambda x: (x[0], set(x[1])), myDict.items()))
Or with either version of dictionary comprehension as well
myDict = {k: set(v) for k, v in myDict.items()}
myDict = {k: set(myDict[k]) for k in myDict}
You can use comprehension for it:
Basically, loop through the key-value pairs and create set out of each value for the corresponding key.
>>> myDict = {'a': [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
>>> myDict = {k: set(v) for k, v in myDict.items()}
>>> myDict
{'a': {1, 2, 3}, 'b': {4, 5, 6}, 'c': {8, 9, 7}}
You can't do it without looping anyway, but you can have the looping done in one line, with the following code:
myDict = {k:set(v) for k, v in myDict.items()}
This is basically traversing each item in your dictionary and converting the lists to sets and combining the key(str):value(set) pairs to a new dictionary and assigning it back to myDict variable.

Python - Find average in dict elements

I have a list of dicts like:
dict = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
I need to get the average for each distinct key. The result should look like:
avg = [{'a':1.5, 'b':3.5, 'c':5}]
I can get the sum for each key, but I can't figure out how to count the occurrences of each key in order to get the average.
This can be easily done with pandas:
>>> import pandas
>>> df = pandas.DataFrame([{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}])
>>> df.mean()
a 1.5
b 3.5
c 5.0
dtype: float64
If you need a dictionary as result:
>>> dict(df.mean())
{'a': 1.5, 'b': 3.5, 'c': 5.0}
You could create an intermediate dictionary that collects all encountered values as lists:
dct = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
from collections import defaultdict
intermediate = defaultdict(list)
for subdict in dct:
    for key, value in subdict.items():
        intermediate[key].append(value)
# intermediate is now: defaultdict(list, {'a': [2, 1], 'b': [3, 4], 'c': [5]})
And finally calculate the average by dividing the sum of each list by the length of each list:
for key, value in intermediate.items():
    print(key, sum(value)/len(value))
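which prints (on Python 3.7+, where dictionaries keep insertion order; older versions may print the keys in a different order):
a 1.5
b 3.5
c 5.0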
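To end up with a dictionary rather than printed lines, the final loop could be swapped for a comprehension. The sketch below uses statistics.mean in place of sum/len, which is a substitution on my part rather than what the answer does; the name `averages` is illustrative.

```python
from collections import defaultdict
from statistics import mean

dct = [{'a': 2, 'b': 3}, {'b': 4}, {'a': 1, 'c': 5}]

# Collect all values seen for each key.
intermediate = defaultdict(list)
for subdict in dct:
    for key, value in subdict.items():
        intermediate[key].append(value)

# Average each list of values.
averages = {key: mean(values) for key, values in intermediate.items()}
print(averages)
```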
You can use a for loop with a counter and then divide the sum of each by the counter.
Also it is weird you are calling the array/list a dict...
I'd suggest something like this:
Create a new dict:
letter_count = {}
- Loop over the current dicts
- Add the letter to letter_count if it doesn't exist yet
- If it does exist, add the item's value to the stored value (+= number) and increase the counter for that letter by one
- Once the for loop is done, divide each value by its counter
- Return the new dict letter_count
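The steps above could be sketched roughly as follows. This is one possible reading of the description: it keeps the running sums and the per-key counters in two separate dictionaries (`totals` and `counts`, both names assumed for illustration) rather than in a single letter_count dict.

```python
dicts = [{'a': 2, 'b': 3}, {'b': 4}, {'a': 1, 'c': 5}]

totals = {}  # running sum per key
counts = {}  # number of occurrences per key

for d in dicts:
    for key, value in d.items():
        # .get(key, 0) covers both the "new key" and "existing key" cases
        totals[key] = totals.get(key, 0) + value
        counts[key] = counts.get(key, 0) + 1

# Divide each sum by the number of times its key was seen.
averages = {key: totals[key] / counts[key] for key in totals}
print(averages)  # {'a': 1.5, 'b': 3.5, 'c': 5.0}
```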
I thought of adding a unique answer using PyFunctional
from functional import seq
l = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
a = (seq(l)
    # convert dictionary to list
    .map(lambda d: seq(d).map(lambda k: (k, d[k])))
    .flatten()
    # append 1 for counter (tuple-unpacking lambdas are not valid in Python 3)
    .map(lambda kv: (kv[0], (kv[1], 1)))
    # sum of values, and counts
    .reduce_by_key(lambda a, b: (a[0]+b[0], a[1]+b[1]))
    # average
    .map(lambda kvc: (kvc[0], float(kvc[1][0])/kvc[1][1]))
    # convert to dict
    .to_dict()
)
print(a)
Output
{'a': 1.5, 'c': 5.0, 'b': 3.5}

How can I find dict keys for matching values in two dicts?

I have two dictionaries mapping IDs to values. For simplicity, let's say those are the dictionaries:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
As named, the dictionaries are not symmetrical.
I would like to get a dictionary of keys from d_source and d_target whose values match. The resulting dictionary would have the d_source keys as its own keys, and the matching d_target keys as each key's value (in either a list, tuple or set format).
The expected return value for the above example is the following dictionary:
{'a': ('1', 'A'),
'b': ('B',),
'c': ('C',),
'3': ('C',)}
There are two somewhat similar questions, but those solutions can't be easily applied to my question.
Some characteristics of the data:
Source would usually be smaller than target: roughly a few thousand sources (tops) and an order of magnitude more targets.
Duplicate values within the same dict (both d_source and d_target) are not too likely.
Matches are expected to be found for (a rough estimate) no more than 50% of the d_source items.
All keys are integers.
What is the best (performance wise) solution to this problem?
Modeling the data into other datatypes for improved performance is totally OK, even when using third-party libraries (I'm thinking numpy).
All the other answers take O(n^2) time, which isn't very good, so I thought I'd answer myself.
I use 2(source_len) + 2(dict_count)(dict_len) memory and run in O(2n) time, i.e. linear time, which I believe is the best you can get here.
Here you go:
from collections import defaultdict
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def merge_dicts(source_dict, *rest):
    flipped_rest = defaultdict(list)
    for d in rest:
        # note: popitem() removes entries, so each dict in rest ends up empty
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = merge_dicts(d_source, d_target)
By the way, I'm using a tuple in order not to link the resulting lists together.
As you've added specifications for the data, here's a closer matching solution:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def second_merge_dicts(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.
    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = second_merge_dicts(d_source, d_target)
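One thing to be aware of: both functions above consume their inputs, because popitem() removes entries as it goes, so d_target is empty after the call. If the target dictionaries are still needed afterwards, a variant that only reads them might look like the sketch below. The function name `match_keys_by_value` is made up for illustration, and it trades the memory saving of popitem() for leaving the inputs intact.

```python
from collections import defaultdict


def match_keys_by_value(source, *targets):
    """Map each source key to a tuple of target keys that share its value."""
    wanted = set(source.values())
    keys_by_value = defaultdict(list)
    for target in targets:
        # iterate with items() instead of popitem() so targets are not emptied
        for key, value in target.items():
            if value in wanted:
                keys_by_value[value].append(key)
    return {k: tuple(keys_by_value.get(v, ())) for k, v in source.items()}


d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
matches = match_keys_by_value(d_source, d_target)
print(matches)
# {'a': ('A', '1'), 'b': ('B',), 'c': ('C',), '3': ('C',)}
```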
from collections import defaultdict
from pprint import pprint
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
d_result = defaultdict(list)
{d_result[a].append(b) for a in d_source for b in d_target if d_source[a] == d_target[b]}
pprint(d_result)
Output:
{'3': ['C'],
'a': ['A', '1'],
'b': ['B'],
'c': ['C']}
Timing results:
from collections import defaultdict
from copy import deepcopy
from random import randint
from timeit import timeit
def Craig_match(source, target):
    result = defaultdict(list)
    {result[a].append(b) for a in source for b in target if source[a] == target[b]}
    return result
def Bharel_match(source_dict, *rest):
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
def modified_Bharel_match(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.
    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
# generate source, target such that:
# a) ~10% duplicate values in source and target
# b) 2000 unique source keys, 20000 unique target keys
# c) a little less than 50% matches source value to target value
# d) numeric keys and values
source = {}
for k in range(2000):
    source[k] = randint(0, 1800)
target = {}
for k in range(20000):
    if k < 1000:
        target[k] = randint(0, 2000)
    else:
        target[k] = randint(2000, 19000)
best_time = {}
approaches = ('Craig', 'Bharel', 'modified_Bharel')
for a in approaches:
    best_time[a] = None
for _ in range(3):
    for approach in approaches:
        test_source = deepcopy(source)
        test_target = deepcopy(target)
        statement = 'd=' + approach + '_match(test_source,test_target)'
        setup = 'from __main__ import test_source, test_target, ' + approach + '_match'
        t = timeit(stmt=statement, setup=setup, number=1)
        if not best_time[approach] or (t < best_time[approach]):
            best_time[approach] = t
for approach in approaches:
    print(approach, ':', '%0.5f' % best_time[approach])
Output:
Craig : 7.29259
Bharel : 0.01587
modified_Bharel : 0.00682
Here is another solution. There are a lot of ways to do this:
for key1 in d1:
    for key2 in d2:
        if d1[key1] == d2[key2]:
            pass  # do stuff with the matching keys here
Note that you can use any name for key1 and key2.
This may be "cheating" in some regards, but if you are looking for keys with matching values regardless of the case of the keys, then you might be able to do:
aa = {'a': 1, 'b': 2, 'c':3}
bb = {'A': 1, 'B': 2, 'd': 3}
bbl = {k.lower():v for k,v in bb.items()}
# items() views support set operations such as & (intersection) in Python 3
result = {k:k.upper() for k,v in aa.items() & bbl.items()}
print( result )
Output:
{'a': 'A', 'b': 'B'}
The bbl declaration changes the bb keys into lowercase (it could be either aa, or bb).
* I only tested this on my phone, so just throwing this idea out there I suppose... Also, you've changed your question radically since I began composing my answer, so you get what you get.
It is up to you to determine the best solution. Here is a solution:
def dicts_to_tuples(*dicts):
    result = {}
    for d in dicts:
        for k,v in d.items():
            result.setdefault(v, []).append(k)
    return [tuple(v) for v in result.values() if len(v) > 1]
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'A': 1, 'B': 2}
print(dicts_to_tuples(d1, d2))
