Python: Sum values in a dictionary based on condition

I have a dictionary of key: value pairs where the values are integers.
I would like to get the sum of the values that satisfy a condition, e.g. all values > 0.
I've tried a few variations, but nothing seems to work.

Try using the values method on the dictionary (which returns a view object in Python 3.x), iterating through each value and summing it if it is greater than 0 (or whatever your condition is):
In [1]: d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
In [2]: sum(v for v in d.values() if v > 0)
Out[2]: 23
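If you need the same pattern for several conditions, one option is to wrap it in a small helper. This is only a sketch: sum_if and its predicate parameter are names made up for this example, not something built into Python.

```python
def sum_if(mapping, predicate):
    """Sum the values of `mapping` for which `predicate(value)` is true.

    `sum_if` is a hypothetical helper name used only for this illustration.
    """
    return sum(value for value in mapping.values() if predicate(value))


d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
positive_total = sum_if(d, lambda v: v > 0)   # adds 1, 2 and 20
negative_total = sum_if(d, lambda v: v < 0)   # only -4 matches
print(positive_total, negative_total)
```

Passing the condition as a function keeps the summing logic in one place while the condition itself stays easy to swap.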

Alternatively, iterate over the items and ignore the keys:
>>> a = {'a' : 5, 'b': 8}
>>> sum(value for _, value in a.items() if value > 0)
13

Related

How to return a list of all values corresponding to keys greater than x in the dictionary [duplicate]

I need to use a for loop to return the list of values in a dictionary that are greater than x.
d = {}
for key in d():
    if key > x:
        return(d(key))
>>> d = dict(a=1, b=10, c=30, d=2)
>>> d
{'a': 1, 'c': 30, 'b': 10, 'd': 2}
>>> d = dict((k, v) for k, v in d.items() if v >= 10)
>>> d
{'c': 30, 'b': 10}
>>> values_list = list(d.values())
>>> values_list
[30, 10]
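The title asks about keys greater than x, while the snippet above compares the values. If the keys really are what should be compared, a minimal sketch could look like the following. The dictionary contents here are made up for illustration and assume the keys are numbers.

```python
# Hypothetical data: numeric keys, arbitrary values.
d = {1: "a", 5: "b", 10: "c", 20: "d"}
x = 5

# Keep the value whenever its key is greater than x.
values_for_large_keys = [value for key, value in d.items() if key > x]
print(values_for_large_keys)
```

Iterating over d.items() gives access to both the key and the value, so the condition can use either one.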
We keep a greater_than_x list and append each value of the dictionary d that is bigger than the given x.
x = 20
greater_than_x = []
d = {"a": 10, "b": 20, "c": 30}
for value in d.values():
    if value > x:
        greater_than_x.append(value)
print(greater_than_x)
>[30]
One-liner applying the same logic:
x = 20
d = {"a": 10, "b": 20, "c": 30}
greater_than_x = [value for value in d.values() if value > x]
print(greater_than_x)
>[30]

Separating values into dictionaries and counting them in Python

I’m trying to take a list of numbers, separate the values into dictionaries based on the int and float types, and then count the number of their occurrences. I’m having issues with the logic.
With the ideal output looking like so:
'int' : [1 : 3, 2 : 4, 5 : 1, 6 : 2],
'float' : [1.0 : 2, 2.3 : 4, 3.4 : 4]
This is what I have so far, and I keep pounding my head:
values = [1, 2.0, 3.0, 1, 1, 3, 4.0, 2, 1.0]
types = {
    'int': [],
    'float': []
}
for obj in values:
    if isinstance(obj, int):
        types['int'].append(obj)
    elif isinstance(obj, float):
        types['float'].append(obj)
for v in types:
    if v not in types['int']:
        counts = 0
        counts[v] += 1
    elif v not in types['float']:
        counts = 0
        counts[v] += 1
print(counts)
The first half seems fine, but the second half can be improved. If you only need the number of values per type, try:
for k, v in types.items():
    print(k, len(v))
Sample input:
values = [1.1,2,3.1]
Gives this output:
int 1
float 2
How about something like this?
It uses the type function to derive the keys of the resulting dictionary ('int' and 'float') without having to hard-code those strings.
numbers = [1,1,1,2.1,2.1,2.2, 3,3.1]
def tally_by_type(numbers):
    result = {}
    for v in numbers:
        k = type(v).__name__
        result[k] = result.get(k, {})
        result[k][v] = result[k].get(v,0)
        result[k][v] = result[k][v] + 1
    return result
tally_by_type(numbers)
{'int': {1: 3, 3: 1}, 'float': {2.1: 2, 2.2: 1, 3.1: 1}}
Interestingly, this also works if you have strings in there:
tally_by_type([1,2,3,3,3,'a','a','b'])
{'int': {1: 1, 2: 1, 3: 3}, 'str': {'a': 2, 'b': 1}}
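The same tally can probably be written more compactly with the standard library. The sketch below swaps the manual get() bookkeeping for collections.defaultdict and collections.Counter; tally_by_type_counter is a name invented for this example.

```python
from collections import Counter, defaultdict


def tally_by_type_counter(values):
    """Count occurrences of each value, grouped by the value's type name.

    Hypothetical variant of tally_by_type that relies on Counter.
    """
    tallies = defaultdict(Counter)          # one Counter per type name
    for value in values:
        tallies[type(value).__name__][value] += 1
    # Convert to plain dicts so the result prints like the original answer.
    return {name: dict(counter) for name, counter in tallies.items()}


result = tally_by_type_counter([1, 2.0, 3.0, 1, 1, 3, 4.0, 2, 1.0])
print(result)
```

Because each type gets its own Counter, 1 and 1.0 are tallied separately even though they compare equal.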
You can try something like this (the list is named values here so that it does not shadow the built-in list):
ints = {}
floats = {}
values = [3.5, 27, 87.8, 1.02, 66]
for val in values:
    if isinstance(val, int):
        ints[str(val)] = val
    elif isinstance(val, float):
        floats[str(val)] = val
print("ints dictionary\n", ints, "number of instances", len(ints))
print("floats dictionary\n", floats, "number of instances", len(floats))
which prints:
ints dictionary
{'27': 27, '66': 66} number of instances 2
floats dictionary
{'3.5': 3.5, '87.8': 87.8, '1.02': 1.02} number of instances 3
I was not sure which dictionary keys you wanted to use, so I assumed you don't really need them.

Prefer a key by max-value in dictionary?

You can get the key with the max value in a dictionary this way: max(d, key=d.get).
The question is: when two or more keys share the max value, how can you set a preferred key?
I found a way to do this by prepending a number to the key.
Is there a better way?
In [56]: d = {'1a' : 5, '2b' : 1, '3c' : 5 }
In [57]: max(d, key=d.get)
Out[57]: '1a'
In [58]: d = {'4a' : 5, '2b' : 1, '3c' : 5 }
In [59]: max(d, key=d.get)
Out[59]: '3c'
The function given in the key argument can return a tuple. Tuples are compared element by element, so the second element is only used to break ties between several maximums of the first element. With that, you can use whatever tie-breaking rule you want, for example with two dictionaries:
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference = {'a': 1, 'b': 2, 'c': 3}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'c'
d_preference = {'a': 3, 'b': 2, 'c': 1}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'a'
This is a similar idea to @AxelPuig's solution. But, instead of relying on an auxiliary dictionary each time you wish to retrieve an item with max or min value, you can perform a single sort and utilise collections.OrderedDict:
from collections import OrderedDict
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference1 = {'a': 1, 'b': 2, 'c': 3}
d_preference2 = {'a': 3, 'b': 2, 'c': 1}
d1 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference1[x[0]]))
d2 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference2[x[0]]))
max(d1, key=d.get) # c
max(d2, key=d.get) # a
Since OrderedDict is a subclass of dict, there's generally no need to convert to a regular dict. If you are using Python 3.7+, you can use the regular dict constructor, since dictionaries are insertion ordered.
As noted in the docs for max:
If multiple items are maximal, the function returns the first one encountered.
A slight variation on @AxelPuig's answer: fix an order of keys in a priorities list and take the max with key=d.get.
d = {"1a": 5, "2b": 1, "3c": 5}
priorities = list(d.keys())
print(max(priorities, key=d.get))
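The tuple-key idea and the priority-list idea can be combined. The following is a sketch under the assumption that every key of d appears in the preference list; max_key_with_preference and preferred_order are names invented here.

```python
def max_key_with_preference(d, preferred_order):
    """Return the key with the largest value; ties go to the key that
    appears earliest in `preferred_order` (hypothetical helper)."""
    # Map each key to its position; a KeyError below means a key of d
    # is missing from preferred_order.
    rank = {key: position for position, key in enumerate(preferred_order)}
    # Larger value wins first; on a tie, the smaller rank wins, hence the minus.
    return max(d, key=lambda key: (d[key], -rank[key]))


d = {'a': 5, 'b': 1, 'c': 5}
prefer_c = max_key_with_preference(d, ['c', 'a', 'b'])
prefer_a = max_key_with_preference(d, ['a', 'b', 'c'])
print(prefer_c, prefer_a)
```

This avoids renaming the keys and does not depend on the iteration order of the dictionary.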

Sum values of similar keys inside two nested dictionary in python

I have a nested dictionary like this:
data = {
    "2010": {
        'A': 2,
        'B': 3,
        'C': 5,
        'D': -18,
    },
    "2011": {
        'A': 1,
        'B': 2,
        'C': 3,
        'D': 1,
    },
    "2012": {
        'A': 1,
        'B': 2,
        'C': 4,
        'D': 2
    }
}
In my case, I need to sum all values that share the same key across every year, from 2010 to 2012.
So the result I expect is:
data = {'A':4,'B':7, 'C':12, 'D':-15}
You can use collections.Counter() (works only for positive values!):
In [17]: from collections import Counter
In [18]: sum((Counter(d) for d in data.values()), Counter())
Out[18]: Counter({'C': 12, 'B': 7, 'A': 4, 'D': 3})
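Note that D comes out as 3 instead of -15, because adding Counters drops any result that is not positive. According to the Python documentation, Counter's multiset methods are designed only for use cases with positive values: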
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
So if you want a result that also keeps the negative totals, you can do the summation manually. collections.defaultdict() is a good way around this problem:
In [28]: from collections import defaultdict
In [29]: d = defaultdict(int)
In [30]: for sub in data.values():
   ....:     for i, j in sub.items():
   ....:         d[i] += j
   ....:
In [31]: d
Out[31]: defaultdict(<class 'int'>, {'D': -15, 'A': 4, 'C': 12, 'B': 7})
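Try this (in Python 3, reduce must be imported from functools and iteritems() becomes items(); it assumes every year has the same keys):
from functools import reduce
reduce(lambda x, y: dict((k, v + y[k]) for k, v in x.items()), data.values())
Result
{'A': 4, 'B': 7, 'C': 12, 'D': -15}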
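Another option worth considering is Counter.update(), which adds counts in place and, unlike the + operator on Counters, keeps zero and negative totals. A minimal sketch using the data from the question (the names totals and yearly are just illustrative):

```python
from collections import Counter

data = {
    "2010": {'A': 2, 'B': 3, 'C': 5, 'D': -18},
    "2011": {'A': 1, 'B': 2, 'C': 3, 'D': 1},
    "2012": {'A': 1, 'B': 2, 'C': 4, 'D': 2},
}

totals = Counter()
for yearly in data.values():
    totals.update(yearly)   # adds the counts in place, negatives included

result = dict(totals)
print(result)
```

This also copes with years that lack some of the keys, since a missing key is simply treated as a count of 0.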

Dictionary Containing list data, filter based on value in list

I have test data which is gathered based on multiple inputs, and results in a single output. I'm currently storing this data in a dictionary whose keys are my parameter/ results labels, and whose values are the test conditions and results. I would like to be able to filter the data so I can generate plots based on isolated conditions.
In my example below, my test conditions would be 'a' and 'b', and the result of the experiment would be 'c'. I want to filter my data so I get a dictionary with the same key, value structure and only my filtered results. However my current dictionary comprehension returns an empty dictionary. Any advice to get the desired result?
Current Code:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered_data = {k:v for k,v in data.iteritems() if v in data['b'] >= 20}
Desired Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Current Result:
{}
Also, is this dictionary of lists a good schema to store data of this type, given that I'm going to want to filter the results, or is there a better way to accomplish this?
Use this:
filtered_data = {k: [x for i, x in enumerate(v) if data['b'][i] >= 20] for k, v in data.items()}
Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Consider using the pandas module for this type of work.
import pandas as pd
df = pd.DataFrame(data)
df = df[df["b"] >= 20]
print(df)
It appears like this will give you what you want. You are using the dictionary key to represent the column name and the values are just rows in a given column, so it is amenable to using a dataframe.
Result:
   a   b    c
3  0  20  2.3
4  1  20  2.9
5  2  20  3.4
Are all of the dictionary's value lists in matching order? If so, you could look at whichever list you want to filter by, say 'b' in this case, find the indices of the values you want, and then pick the same indices from the other lists in the dictionary.
For example:
matching_indices = []
for i, value in enumerate(data['b']):
    if value >= 20:
        matching_indices.append(i)
new_dict = {}
for key in data:
    new_dict[key] = [data[key][i] for i in matching_indices]
You could probably write a dictionary comprehension for it if you wanted. Hopefully this is clear.
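One way such a comprehension might look is sketched below. It builds a boolean mask from the 'b' list and applies it to every list with itertools.compress. It assumes all lists have the same length and order; mask and column are names chosen for this example.

```python
from itertools import compress

data = {'a': [0, 1, 2, 0, 1, 2],
        'b': [10, 10, 10, 20, 20, 20],
        'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

# True for every row whose 'b' entry satisfies the condition.
mask = [b >= 20 for b in data['b']]

# compress() keeps the items of each list where the mask is true.
filtered_data = {key: list(compress(column, mask)) for key, column in data.items()}
print(filtered_data)
```

Computing the mask once keeps the condition separate from the column selection, so the same mask can be reused for plotting different columns.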
You can turn this into a function, which gives it more flexibility. Your current logic means that datasets a and c are neglected because they contain no values greater than or equal to 20:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filter_vals = ['a', 'b']
new_d = {}
for k, v in data.items():
    if k in filter_vals:
        new_d[k] = [i for i in v if i >= 20]
print(new_d)
Now, I'm not a big fan of many if statements, but something like this is straightforward and can be called many times:
def my_filter(operator, condition, filter_vals, my_dict):
    new_d = {}
    for k, v in my_dict.items():
        if k in filter_vals:
            if operator == '>':
                new_d[k] = [i for i in v if i > condition]
            elif operator == '<':
                new_d[k] = [i for i in v if i < condition]
            elif operator == '<=':
                new_d[k] = [i for i in v if i <= condition]
            elif operator == '>=':
                new_d[k] = [i for i in v if i >= condition]
    return new_d
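If the chain of if statements bothers you, one possible alternative is to look the comparison up in a dictionary of functions from the standard operator module. This is a sketch, not a drop-in replacement: COMPARISONS and filter_columns are names invented for this example.

```python
import operator

# Map each supported symbol to the matching comparison function.
COMPARISONS = {
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
}


def filter_columns(symbol, threshold, columns, table):
    """Filter the lists stored under `columns` in `table`, keeping the
    items for which `item <symbol> threshold` holds (hypothetical helper)."""
    compare = COMPARISONS[symbol]   # raises KeyError for an unknown symbol
    return {key: [item for item in values if compare(item, threshold)]
            for key, values in table.items() if key in columns}


data = {'a': [0, 1, 2, 0, 1, 2],
        'b': [10, 10, 10, 20, 20, 20],
        'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered = filter_columns('>=', 20, ['a', 'b'], data)
print(filtered)
```

Like my_filter, this filters each list on its own values, so 'a' comes back empty here; it does not keep rows aligned across lists.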
I agree with the pandas approach above.
If for some reason you hate pandas or are an old-school computer scientist, tuples are a good way to store relational data. In your example, the a, b, and c lists are columns rather than rows. For tuples, you would want to store the rows as:
data = {'a':(0,10,1.3),'b':(1,10,1.9),'c':(2,10,2.3),'d':(0,20,2.3),'e':(1,20,2.9),'f':(2,20,3.4)}
where the tuples are stored in the (condition1, condition2, outcome) format you described and you can call a single test or filter a set as you describe. From there you can get a filtered set of results as follows:
filtered_data = {k:v for k,v in data.items() if v[1]>=20}
which returns:
{'d': (0, 20, 2.3), 'e': (1, 20, 2.9), 'f': (2, 20, 3.4)}
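If you already have the column-oriented dictionary from the question, the rows can be built with zip rather than typed by hand. The sketch below uses collections.namedtuple so that fields can be read by name; Trial is a hypothetical name, and the code assumes all three lists have the same length.

```python
from collections import namedtuple

# Hypothetical row type: one test with two conditions and one outcome.
Trial = namedtuple('Trial', ['a', 'b', 'c'])

columns = {'a': [0, 1, 2, 0, 1, 2],
           'b': [10, 10, 10, 20, 20, 20],
           'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

# zip pairs up the i-th entry of every column into one row.
rows = [Trial(*values) for values in zip(columns['a'], columns['b'], columns['c'])]
filtered_rows = [row for row in rows if row.b >= 20]

# Turn the filtered rows back into columns, e.g. for plotting.
filtered_columns = {field: [getattr(row, field) for row in filtered_rows]
                    for field in Trial._fields}
print(filtered_columns)
```

Filtering whole rows keeps the conditions and the outcome of each test together, which is what the per-column filters above do not guarantee.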
