Sum values of similar keys inside nested dictionaries in Python

I have a nested dictionary like this:
data = {
    "2010": {
        'A': 2,
        'B': 3,
        'C': 5,
        'D': -18,
    },
    "2011": {
        'A': 1,
        'B': 2,
        'C': 3,
        'D': 1,
    },
    "2012": {
        'A': 1,
        'B': 2,
        'C': 4,
        'D': 2
    }
}
In my case, I need to sum the values of matching keys across every year, from 2010 through 2012.
So the result I expect looks like this:
data = {'A': 4, 'B': 7, 'C': 12, 'D': -15}

You can use collections.Counter() (works only for positive values!):
In [17]: from collections import Counter
In [18]: sum((Counter(d) for d in data.values()), Counter())
Out[18]: Counter({'C': 12, 'B': 7, 'A': 4, 'D': 3})
Note that, per the Python documentation, Counter is designed only for use cases with positive values:
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
So if you want to get a comprehensive result you can do the summation manually. The collections.defaultdict() is a good way for getting around this problem:
In [28]: from collections import defaultdict
In [29]: d = defaultdict(int)
In [30]: for sub in data.values():
   ....:     for i, j in sub.items():
   ....:         d[i] += j
   ....:
In [31]: d
Out[31]: defaultdict(<class 'int'>, {'D': -15, 'A': 4, 'C': 12, 'B': 7})
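The same one-pass summation can also be written with a plain dict comprehension over the union of all inner keys; a minimal sketch (written so it still works if some years omit a key):

```python
data = {
    "2010": {'A': 2, 'B': 3, 'C': 5, 'D': -18},
    "2011": {'A': 1, 'B': 2, 'C': 3, 'D': 1},
    "2012": {'A': 1, 'B': 2, 'C': 4, 'D': 2},
}

# Union of every key that appears in any year.
keys = set().union(*data.values())

# Sum each key across all years; a missing key counts as 0.
totals = {k: sum(year.get(k, 0) for year in data.values()) for k in keys}
print(totals)  # {'A': 4, 'B': 7, 'C': 12, 'D': -15} (key order may vary)
```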

Try this (in Python 3, reduce() lives in functools, and dict.iteritems() is gone; use items()):
from functools import reduce
reduce(lambda x, y: {k: v + y[k] for k, v in x.items()}, data.values())
Result
{'A': 4, 'B': 7, 'C': 12, 'D': -15}
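Note that this reduce() raises a KeyError as soon as one inner dict is missing a key that another has. A sketch of a more tolerant variant (the merge helper is my own illustration, not from the answer):

```python
from functools import reduce

data = {
    "2010": {'A': 2, 'B': 3, 'C': 5, 'D': -18},
    "2011": {'A': 1, 'B': 2, 'C': 3, 'D': 1},
    "2012": {'A': 1, 'B': 2, 'C': 4, 'D': 2},
}

def merge(acc, d):
    # Add d's values into acc, treating a key absent on either side as 0.
    return {k: acc.get(k, 0) + d.get(k, 0) for k in acc.keys() | d.keys()}

result = reduce(merge, data.values(), {})
print(result)  # {'A': 4, 'B': 7, 'C': 12, 'D': -15} (key order may vary)
```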


Combine two dicts and replace missing values [duplicate]

I am looking to combine two dictionaries by grouping elements that share common keys, but I would also like to account for keys that are not shared between the two dictionaries. For instance, given the following two dictionaries:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
The intended code would output
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[0,44] ,'e':[5,0]}
Or some array like:
df = [[a,1,11] , [b,2,22] , [c,3,33] , [d,0,44] , [e,5,0]]
The fact that I used 0 specifically to denote an entry not existing is not important per se. Just any character to denote the missing value.
I have tried using the following code
from collections import defaultdict

df = defaultdict(list)
for d in (d1, d2):
    for key, value in d.items():
        df[key].append(value)
But get the following result:
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[44] ,'e':[5]}
This does not tell me which dict was missing the entry.
I could go back and look through both of them, but I was looking for a more elegant solution.
You can use a dict comprehension like so:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
res = {k: [d1.get(k, 0), d2.get(k, 0)] for k in set(d1).union(d2)}
print(res)
Another solution:
d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"a": 11, "b": 22, "c": 33, "d": 44}
df = [[k, d1.get(k, 0), d2.get(k, 0)] for k in sorted(d1.keys() | d2.keys())]
print(df)
Prints:
[['a', 1, 11], ['b', 2, 22], ['c', 3, 33], ['d', 0, 44], ['e', 5, 0]]
If you do not want sorted results, leave the sorted() out.
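Both answers extend naturally to more than two dictionaries; a sketch (the third dict and variable names are illustrative):

```python
d1 = {'a': 1, 'b': 2, 'c': 3, 'e': 5}
d2 = {'a': 11, 'b': 22, 'c': 33, 'd': 44}
d3 = {'a': 100, 'd': 400}

dicts = [d1, d2, d3]
all_keys = set().union(*dicts)

# One list entry per input dict, with 0 marking a missing key.
merged = {k: [d.get(k, 0) for d in dicts] for k in sorted(all_keys)}
print(merged)
# {'a': [1, 11, 100], 'b': [2, 22, 0], 'c': [3, 33, 0], 'd': [0, 44, 400], 'e': [5, 0, 0]}
```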

Prefer a key by max-value in dictionary?

You can get the key with the max value in a dictionary with max(d, key=d.get).
The question: when two or more keys share the max value, how can you set a preferred key?
I found a way to do this by prepending the key with a number.
Is there a better way?
In [56]: d = {'1a' : 5, '2b' : 1, '3c' : 5 }
In [57]: max(d, key=d.get)
Out[57]: '1a'
In [58]: d = {'4a' : 5, '2b' : 1, '3c' : 5 }
In [59]: max(d, key=d.get)
Out[59]: '3c'
The function given in the key argument can return a tuple. The second element of the tuple is used to break ties when several keys share the maximum first element. With that, you can express whatever preference you want, for example with two dictionaries:
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference = {'a': 1, 'b': 2, 'c': 3}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'c'
d_preference = {'a': 3, 'b': 2, 'c': 1}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'a'
This is a similar idea to #AxelPuig's solution. But, instead of relying on an auxiliary dictionary each time you wish to retrieve an item with max or min value, you can perform a single sort and utilise collections.OrderedDict:
from collections import OrderedDict
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference1 = {'a': 1, 'b': 2, 'c': 3}
d_preference2 = {'a': 3, 'b': 2, 'c': 1}
d1 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference1[x[0]]))
d2 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference2[x[0]]))
max(d1, key=d.get) # c
max(d2, key=d.get) # a
Since OrderedDict is a subclass of dict, there's generally no need to convert to a regular dict. If you are using Python 3.7+, you can use the regular dict constructor, since dictionaries are insertion ordered.
As noted on the docs for max:
If multiple items are maximal, the function returns the first one
encountered.
A slight variation on #AxelPuig's answer. You fix an order of keys in a priorities list and take the max with key=d.get.
d = {"1a": 5, "2b": 1, "3c": 5}
priorities = list(d.keys())
print(max(priorities, key=d.get))
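To see the tie-breaking in action: max() returns the first maximal item it encounters, so reordering the priorities list changes which of the tied keys wins (a small illustration):

```python
d = {"1a": 5, "2b": 1, "3c": 5}  # "1a" and "3c" tie at value 5

# max() returns the first maximal item encountered, so the order of the
# sequence being scanned decides the winner among tied keys.
first = max(["1a", "2b", "3c"], key=d.get)
second = max(["3c", "2b", "1a"], key=d.get)
print(first, second)  # 1a 3c
```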

How to find sum of dictionaries in a pandas DataFrame across all rows?

I have a DataFrame
df = pd.DataFrame({'keywords': [{'a': 3, 'b': 4, 'c': 5}, {'c':1, 'd':2}, {'a':5, 'c':21, 'd':4}, {'b':2, 'c':1, 'g':1, 'h':1, 'i':1}]})
I want to sum the elements across all rows, producing the result below, without using iterrows:
a: 8
b: 6
c: 28
d: 6
g: 1
h: 1
i: 1
note: no element occurs twice in a single row in the original DataFrame.
Using collections.Counter, you can sum an iterable of Counter objects. Since Counter is a subclass of dict, you can then feed to pd.DataFrame.from_dict.
from collections import Counter
counts = sum(map(Counter, df['keywords']), Counter())
res = pd.DataFrame.from_dict(counts, orient='index')
print(res)
    0
a   8
b   6
c  28
d   6
g   1
h   1
i   1
Not sure how this compares in terms of optimization with #jpp's answer, but I'll give it a shot.
# What we're starting out with
df = pd.DataFrame({'keywords': [{'a': 3, 'b': 4, 'c': 5}, {'c':1, 'd':2}, {'a':5, 'c':21, 'd':4}, {'b':2, 'c':1, 'g':1, 'h':1, 'i':1}]})
# Turns the array of dictionaries into a DataFrame
values_df = pd.DataFrame(df["keywords"].values.tolist())
# Sums up the individual keys
sums = {key:values_df[key].sum() for key in values_df.columns}
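If you'd rather avoid the intermediate DataFrame entirely, the same totals can be computed in plain Python with a defaultdict; a sketch over the same list of row dicts:

```python
from collections import defaultdict

rows = [{'a': 3, 'b': 4, 'c': 5}, {'c': 1, 'd': 2},
        {'a': 5, 'c': 21, 'd': 4}, {'b': 2, 'c': 1, 'g': 1, 'h': 1, 'i': 1}]

totals = defaultdict(int)
for row in rows:          # one dict per DataFrame row
    for key, value in row.items():
        totals[key] += value

print(dict(totals))  # {'a': 8, 'b': 6, 'c': 28, 'd': 6, 'g': 1, 'h': 1, 'i': 1}
```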

Python: Sum values in a dictionary based on condition

I have a dictionary of key:value pairs.
The values are integers, and I would like to sum the values that satisfy a condition, say all values > 0.
I've tried a few variations, but nothing seems to work, unfortunately.
Try using the values() method on the dictionary (which returns a view object in Python 3.x), iterating through each value and summing those that meet the condition (here, greater than 0):
In [1]: d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
In [2]: sum(v for v in d.values() if v > 0)
Out[2]: 23
>>> a = {'a' : 5, 'b': 8}
>>> sum(value for _, value in a.items() if value > 0)
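The same pattern works for any predicate, including conditions on the keys; for example (conditions chosen purely for illustration):

```python
d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}

# Sum only the even, positive values.
even_positive = sum(v for v in d.values() if v > 0 and v % 2 == 0)
print(even_positive)  # 22

# Sum values whose key starts with 't'.
t_keys = sum(v for k, v in d.items() if k.startswith('t'))
print(t_keys)  # 22
```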

How to count each group of values in a dictionary in Python 3?

I have a dictionary with multiple values under multiple keys. I do NOT want a single sum of all the values; I want the sum for each key.
The file is tab-delimited. The identifier, Btarg, is a combination of two of the columns, and there are multiple values for each identifier.
Here is a test file:
Pattern  Item     Abundance
1        Ant      2
2        Dog      10
3        Giraffe  15
1        Ant      4
2        Dog      5
Here are the expected results:
Pattern1Ant, 6
Pattern2Dog, 15
Pattern3Giraffe, 15
This is what I have so far:
for line in K:
    if "pattern" in line:
        find = line
        Bsplit = find.split("\t")
        Buid = Bsplit[0]
        Borg = Bsplit[1]
        Bnum = Bsplit[2]
        Btarg = Buid[:-1] + "//" + Borg
        if Btarg not in dict1:
            dict1[Btarg] = []
        dict1[Btarg].append(Bnum)
# The following used to work in Python 2:
# for key in dict1.iterkeys():
#     dict1[key] = sum(dict1[key])
# print(dict1)
How do I make this work in Python 3 without the error message "Unsupported operand type(s) for +: 'int' and 'list'?
Thanks in advance!
Use collections.Counter.
From the documentation:
>>> c = Counter('gallahad')
>>> c
Counter({'a': 3, 'l': 2, 'h': 1, 'g': 1, 'd': 1})
Responding to your comment, now I think I know what you want, although I don't know what structure you have your data in. I will take for granted that you can organize your data like this:
In [41]: d
Out[41]: [{'Ant': 2}, {'Dog': 10}, {'Giraffe': 15}, {'Ant': 4}, {'Dog': 5}]
First create a defaultdict
from collections import defaultdict
a = defaultdict(int)
Then start counting (in Python 3, dict views are not indexable, so iterate over items() instead of calling keys()[0]):
In [42]: for each in d:
   ....:     for k, v in each.items():
   ....:         a[k] += v
Result:
In [43]: a
Out[43]: defaultdict(<class 'int'>, {'Ant': 6, 'Giraffe': 15, 'Dog': 15})
UPDATE 2
Supposing you can get your data in this format:
In [20]: d
Out[20]: [{'Ant': [2, 4]}, {'Dog': [10, 5]}, {'Giraffe': [15]}]
In [21]: from collections import defaultdict
In [22]: a = defaultdict(int)
In [23]: for each in d:
   ....:     for k, values in each.items():
   ....:         a[k] = sum(values)
   ....:
In [24]: a
Out[24]: defaultdict(<class 'int'>, {'Ant': 6, 'Giraffe': 15, 'Dog': 15})
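Putting it together for the original tab-delimited file: the whole thing can be done in one pass with a defaultdict, converting the abundance to int as it is read so no list of strings is ever summed. A sketch (the sample rows and the Pattern<id><Item> key format are taken from the question; the column order Pattern, Item, Abundance is assumed):

```python
from collections import defaultdict
from io import StringIO

# Stand-in for the real file, using the question's sample rows.
K = StringIO(
    "Pattern\tItem\tAbundance\n"
    "1\tAnt\t2\n"
    "2\tDog\t10\n"
    "3\tGiraffe\t15\n"
    "1\tAnt\t4\n"
    "2\tDog\t5\n"
)

totals = defaultdict(int)
next(K)  # skip the header row
for line in K:
    uid, org, num = line.rstrip("\n").split("\t")
    totals["Pattern" + uid + org] += int(num)  # int() so values sum as numbers

print(dict(totals))  # {'Pattern1Ant': 6, 'Pattern2Dog': 15, 'Pattern3Giraffe': 15}
```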
