Python reduce sum tuple - python

I have an input that will vary in size.
data = [(("101","A"),5), (("105","C"),12), (("101", "B"),4)]
I'm looking for an output that groups by key[0], keeps all items of key[1], and sums up the values.
output = [(("101", "A", "B"),9), (("105", "C"),12)]
I've tried.
my_dict = dict(data)
final_values = {}
for k,v in my_dict.items():
    key1 = k[0]
    key2 = k[1]
    if key1 not in final_values:
        final_values[key1] = []
    final_values[key1].append(key2)
    final_values[key1].append(v)
Which returns.
{'101': ['A', 5, 'B', 4], '105': ['C', 12]}
I'd like to get the sum of the numbers in the list.

for k in final_values:
    print('%s: sum is %d' % (k, sum([x for x in final_values[k] if type(x) is int])))

You can try using a collections.defaultdict() to group the items, then flattening the results at the end:
from collections import defaultdict
from operator import itemgetter
data = [(("101","A"),5), (("105","C"),12), (("101", "B"),4)]
d = defaultdict(list)
for (x, y), z in data:
    d[x].append((y, z))
result = [
    ((k, *tuple(map(itemgetter(0), v))), sum(map(itemgetter(1), v)))
    for k, v in d.items()
]
print(result)
# [(('101', 'A', 'B'), 9), (('105', 'C'), 12)]
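The same grouping can also be done in a single pass with a plain dict that keeps a list of labels and a running total per key. This is only a minimal sketch: the function name group_and_sum is hypothetical, and it assumes Python 3.7+ so that the dict keeps keys in order of first appearance.

```python
def group_and_sum(pairs):
    """Group ((key, label), value) items by key, collecting labels and summing values."""
    groups = {}  # key -> [list of labels, running total]
    for (key, label), value in pairs:
        entry = groups.setdefault(key, [[], 0])
        entry[0].append(label)
        entry[1] += value
    # Flatten each group into ((key, label1, label2, ...), total)
    return [((key, *labels), total) for key, (labels, total) in groups.items()]


data = [(("101", "A"), 5), (("105", "C"), 12), (("101", "B"), 4)]
result = group_and_sum(data)
print(result)
# [(('101', 'A', 'B'), 9), (('105', 'C'), 12)]
```

Keeping the total as a running sum avoids a second pass over each group, at the cost of a slightly less declarative loop.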

Related

Summing values in dictionary who stores list of tuples

I have a list of tuples named lista and values in this list looks like:
[('A', 1234),('A', 9876),('B',6574),('B',9562), etc]
Next I create defaultdict(list) where I store my tuple and I get:
([('A', [1234, 9876]),('B',[6574, 9562]), etc])
To create this I wrote:
# list
lista = []
for w in data:
    if self.getAmountOfProceededInSpecificYear(w.status,w.year,w.district):
        lista.append((w.district, w.amount))
# dict
passed_dict = defaultdict(list)
for k,v in lista:
    passed_dict[k].append(v)
Now I want to sum up values for each key and get:
'A', 11110
Does anybody know how to sum up these values?
You can use defaultdict(int) instead of defaultdict(list):
from collections import defaultdict
lista = [('A', 1234),('A', 9876),('B',6574),('B',9562)]
passed_dict = defaultdict(int)
for k,v in lista:
    passed_dict[k] += v
print(dict(passed_dict))
# {'A': 11110, 'B': 16136}
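You can use itemgetter and groupby as suggested here. Note that groupby only merges adjacent items, so lista must already be sorted by key (as it is in this example).
If you want a list as an output: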
from itertools import groupby
from operator import itemgetter
lista = [('A', 1234),('A', 9876),('B',6574),('B',9562)]
passed_dict = [(k, sum(list(zip(*v))[1])) for k, v in groupby(lista, itemgetter(0))]
# [('A', 11110), ('B', 16136)]
If you want a dictionary as an output
passed_dict = {k: sum(list(zip(*v))[1]) for k, v in groupby(lista, itemgetter(0))}
# {'A': 11110, 'B': 16136}
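If the tuples do not arrive sorted by key, the groupby approach needs a sort first. Here is a minimal sketch of that; the helper name sum_by_key and the sample list unsorted_lista are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sum_by_key(pairs):
    """Sum the second element of each pair, grouped by the first element."""
    # groupby only merges adjacent equal keys, so sort by key first
    ordered = sorted(pairs, key=itemgetter(0))
    return {k: sum(v for _, v in group)
            for k, group in groupby(ordered, key=itemgetter(0))}


unsorted_lista = [('B', 6574), ('A', 1234), ('B', 9562), ('A', 9876)]
totals = sum_by_key(unsorted_lista)
print(totals)
# {'A': 11110, 'B': 16136}
```

The sort makes this O(n log n), so for large unsorted inputs the defaultdict(int) loop is likely the simpler choice.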

weighted counting in python

I want to count the instances of X in a list, similar to
How can I count the occurrences of a list item in Python?
but taking into account a weight for each instance.
For example,
L = [(a,4), (a,1), (b,1), (b,1)]
the function weighted_count() should return something like
[(a,5), (b,2)]
Edited to add: my a, b will be integers.
You can still use Counter:
from collections import Counter
c = Counter()
for k,v in L:
    c.update({k:v})
print(c)
The following will give you a dictionary of all the letters in the array and their corresponding counts
counts = {}
for value in L:
    if value[0] in counts:
        counts[value[0]] += value[1]
    else:
        counts[value[0]] = value[1]
Alternatively, if you're only interested in one specific value, you can filter the list for that value, map the remaining items to their weights, and sum them.
def countOf(x,L):
    filteredL = list(filter(lambda value: value[0] == x,L))
    return sum(list(map(lambda value: value[1], filteredL)))
>>> import itertools
>>> L = [ ('a',4), ('a',1), ('b',1), ('b',1) ]
>>> [(k, sum(amt for _,amt in v)) for k,v in itertools.groupby(sorted(L), key=lambda tup: tup[0])]
[('a', 5), ('b', 2)]
defaultdict will do:
from collections import defaultdict
L = [('a',4), ('a',1), ('b',1), ('b',1)]
res = defaultdict(int)
for k, v in L:
    res[k] += v
print(list(res.items()))
prints:
[('b', 2), ('a', 5)]
Group the items by the first element of each tuple using groupby from itertools (the list must already be sorted by that element):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L = [('a',4), ('a',1), ('b',1), ('b',1)]
>>> L_new = []
>>> for k,v in groupby(L,key=itemgetter(0)):
...     L_new.append((k,sum(map(itemgetter(1), v))))
...
>>> L_new
[('a', 5), ('b', 2)]
>>> L_new = [(k,sum(map(itemgetter(1), v))) for k,v in groupby(L, key=itemgetter(0))]  # for those fond of list comprehensions and one-liners
>>> L_new
[('a', 5), ('b', 2)]
Tested in both Python2 & Python3
Use the dictionary's get method.
>>> d = {}
>>> for item in L:
...     d[item[0]] = d.get(item[0], 0) + item[1]
...
>>> d
{'a': 5, 'b': 2}
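To get the list-of-tuples shape the question asks for, the Counter idea can be wrapped in a small function. This is just a sketch: the integer values chosen for a and b are placeholders, and the ordering of the result relies on Python 3.7+ keeping Counter keys in insertion order.

```python
from collections import Counter


def weighted_count(pairs):
    """Return [(item, total_weight), ...] in order of first appearance."""
    totals = Counter()
    for item, weight in pairs:
        totals[item] += weight  # missing keys start at 0
    return list(totals.items())


a, b = 10, 20  # placeholder integer items
L = [(a, 4), (a, 1), (b, 1), (b, 1)]
print(weighted_count(L))
# [(10, 5), (20, 2)]
```

Unlike the groupby variants, this does not need the input to be sorted.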

in Python, dictionary sort by value, but only return key

I'm using Python 3.3.1 (newbie).
I have a dictionary with integer keys and integer values.
I need to sort this dictionary and return a list of the keys whose values fall below a threshold (say 't').
So far I have:
list_integer = sorted(dict_int_int.items(), key=lambda x: x[1] )
This sorts the dictionary by value. Everything is fine so far, but how do I limit the values to be below 't' and then return ONLY the keys?
Thanks in advance.
try this:
[key for key,value in sorted(dic.items() ,key=lambda x : x[1]) if value < threshold]
or use operator.itemgetter:
>>> from operator import itemgetter
>>> [key for key,value in sorted(dic.items() ,key= itemgetter(1) ) if value < threshold]
Try this
list_integer = filter(lambda x: x[1] < t, dict_int_int.items())
list_integer = [x[0] for x in sorted(list_integer, key=lambda x: x[1])]
Let's start with what you have:
list_integer = sorted(dict_int_int.items(), key=lambda x: x[1])
This gives you a list of key-value pairs, sorted by values (which makes the name a bit misleading).
Now let's limit the values to be below 't':
list_small = ((k, v) for k, v in list_integer if v < t)
(You could also write that as filter if you prefer.)
And now, let's ONLY return the keys:
list_keys = [k for k, v in list_small]
Of course you can combine any two of these steps, or even combine all three (in which case you end up with Ashwini Chaudhary's answer).
Let's go through these step by step to make sure they work:
>>> dict_int_int = {'a': 1.0, 'b': 0.5, 'c': 0.25, 'd': 0.10 }
>>> t = 0.75
>>> list_integer = sorted(dict_int_int.items(), key=lambda x: x[1])
>>> list_integer
[('d', 0.1), ('c', 0.25), ('b', 0.5), ('a', 1.0)]
>>> list_small = [(k, v) for k, v in list_integer if v < t]
>>> list_small
[('d', 0.1), ('c', 0.25), ('b', 0.5)]
>>> list_keys = [k for k, v in list_small]
>>> list_keys
['d', 'c', 'b']
(Notice that I changed the generator expression list_small into a list comprehension. This is because we need to print out its values, and then use them again. A generator expression only lets you use its values once.)
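Combining all three steps gives something like the sketch below. The function name keys_below and the sample dictionary are invented for illustration; the data uses integer keys and values as in the question.

```python
def keys_below(mapping, threshold):
    """Return the keys whose values are below threshold, ordered by ascending value."""
    by_value = sorted(mapping.items(), key=lambda kv: kv[1])
    return [k for k, v in by_value if v < threshold]


dict_int_int = {1: 40, 2: 5, 3: 25, 4: 10}
print(keys_below(dict_int_int, 30))
# [2, 4, 3]
```

Filtering before sorting would give the same result and sort fewer items, which may matter for large dictionaries.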

Concatenate strings by groups python

I would like to concatenate a list of strings into new strings grouped over values in a list. Here is an example of what I mean:
Input
key = ['1','2','2','3']
data = ['a','b','c','d']
Result
newkey = ['1','2','3']
newdata = ['a','b c','d']
I understand how to join text. But I don't know how to iterate correctly over the values of the list to aggregate the strings that are common to the same key value.
Any help or suggestions appreciated. Thanks.
from collections import defaultdict
d = defaultdict(list)
for k, v in zip(key, data):
    d[k].append(v)
print([(k, ' '.join(v)) for k, v in d.items()])
Output (on Python 3.7+, where dicts preserve insertion order):
[('1', 'a'), ('2', 'b c'), ('3', 'd')]
And how to get new lists:
newkey, newvalue = list(d), [' '.join(v) for v in d.values()]
And with saved order:
newkey, newvalue = zip(*[(k, ' '.join(d.pop(k))) for k in key if k in d])
Use the itertools.groupby() function to combine elements; zip will let you group two input lists into two output lists:
import itertools
import operator
newkey, newdata = [], []
# the loop variable is named k so that it does not overwrite the input list key
for k, items in itertools.groupby(zip(key, data), key=operator.itemgetter(0)):
    # k is the grouped key, items an iterable of (key, data) pairs
    newkey.append(k)
    newdata.append(' '.join(d for _, d in items))
You can turn this into a list comprehension with a bit more zip() magic:
from itertools import groupby
from operator import itemgetter
newkey, newdata = zip(*[(k, ' '.join(d for _, d in it)) for k, it in groupby(zip(key, data), key=itemgetter(0))])
Note that this does require the input to be sorted; groupby only groups elements based on the consecutive keys being the same. On the other hand, it does preserve that initial sorted order.
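To make the sorting requirement concrete, here is a small sketch with a deliberately unsorted input (the reordered key and data lists are made up for the demonstration). It shows the split groups you get without sorting, and one way to fix it.

```python
from itertools import groupby
from operator import itemgetter

key = ['2', '1', '2', '3']   # '2' appears twice, but not adjacently
data = ['b', 'a', 'c', 'd']

# Without sorting, groupby starts a new group every time the key changes
unsorted_groups = [(k, ' '.join(d for _, d in grp))
                   for k, grp in groupby(zip(key, data), key=itemgetter(0))]
print(unsorted_groups)
# [('2', 'b'), ('1', 'a'), ('2', 'c'), ('3', 'd')]

# sorted() is stable, so values keep their relative order within each key
pairs = sorted(zip(key, data), key=itemgetter(0))
sorted_groups = [(k, ' '.join(d for _, d in grp))
                 for k, grp in groupby(pairs, key=itemgetter(0))]
print(sorted_groups)
# [('1', 'a'), ('2', 'b c'), ('3', 'd')]
```

Sorting changes the order of the keys in the output, so if first-appearance order matters, the defaultdict approach is probably the better fit.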
you can use itertools.groupby() on zip(key,data):
In [128]: from itertools import *
In [129]: from operator import *
In [133]: lis=[(k," ".join(x[1] for x in g)) for k,g in groupby(zip(key,data),key=itemgetter(0))]
In [134]: newkey,newdata=zip(*lis)
In [135]: newkey
Out[135]: ('1', '2', '3')
In [136]: newdata
Out[136]: ('a', 'b c', 'd')
If you don't feel like importing collections, you can always use a regular dictionary.
key = ['1','2','2','3']
data = ['a','b','c','d']
newkeydata = {}
for k,d in zip(key,data):
    # setdefault returns the list stored under k, creating it first if needed
    newkeydata.setdefault(k, []).append(d)
Just for the sake of variety, here is a solution that works without any imports and without dictionaries (it assumes equal keys are adjacent, as in the example):
def group_vals(keys, vals):
    new_keys, new_vals = [], []
    for k, v in zip(keys, vals):
        if new_keys and new_keys[-1] == k:
            # same key as the previous item: extend the current group
            new_vals[-1] += ' ' + v
        else:
            # key changed: start a new group
            new_keys.append(k)
            new_vals.append(v)
    return new_keys, new_vals
group_vals([1,2,2,3], ['a','b','c','d'])
# --> ([1, 2, 3], ['a', 'b c', 'd'])
But I know that it's quite ugly and probably not as performant as the other solutions. Just for demonstration purposes. :)

pythonic way to filter a dictionary of arrays

I have a dictionary that looks something like this:
d = { 'a':['a','b','c','d'],
'b':['a','b','c','d'],
'c':['a','b','c','d'],
'd':['a','b','c','d'], }
I would like to reduce this dictionary into a new one that contains 2 keys randomly selected from the full set of keys, and also only contains values that correspond to those random keys.
Here is the code I wrote that works, but I feel like there is probably a more pythonic way to do it, any suggestions?
import random
d = { 'a':['a','b','c','d'],
'b':['a','b','c','d'],
'c':['a','b','c','d'],
'd':['a','b','c','d'], }
new_d = {}
r = list(d.keys())  # list() is needed on Python 3, where keys() returns a view
random.shuffle(r)
r = r[:2]
r_dict = dict( (k,True) for k in r)
for k in r_dict:
    a = tuple(d[k])
    new_a = []
    for item in a:
        if item in r_dict:
            new_a.append(item)
    new_d[k] = new_a
"new_d" now holds the filtered dictionary, for example:
{'a': ['a', 'b'], 'b': ['a', 'b']}
If 'a' and 'b' are the two random keys.
Building on FM's, with the underused set type:
>>> ks = set(random.sample(list(d), 2))
>>> dict((k, list(ks & set(d[k]))) for k in ks)
{'a': ['a', 'c'], 'c': ['a', 'c']}
How about the following:
import random
rk = random.sample(list(d), 2)  # random.sample needs a sequence on Python 3.11+
new_d = {}
for k in rk:
    new_d[k] = list(set(d[k]).intersection(rk))
ks = set(random.sample(list(d), 2))
nd = dict( (k, [v for v in d[k] if v in ks]) for k in ks )
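On Python 3 the same idea fits in a dict comprehension. The sketch below is one possible version: the function name sample_and_filter and its n and rng parameters are hypothetical, and it iterates over d rather than over the set so that keys and list items keep their original order.

```python
import random


def sample_and_filter(d, n=2, rng=random):
    """Pick n random keys and keep only those keys and the matching list items."""
    chosen = set(rng.sample(list(d), n))  # list(d): random.sample needs a sequence
    # Iterate over d (not the set) so keys and items keep their original order
    return {k: [item for item in d[k] if item in chosen]
            for k in d if k in chosen}


d = {'a': ['a', 'b', 'c', 'd'],
     'b': ['a', 'b', 'c', 'd'],
     'c': ['a', 'b', 'c', 'd'],
     'd': ['a', 'b', 'c', 'd']}
new_d = sample_and_filter(d)
print(new_d)
# e.g. {'a': ['a', 'b'], 'b': ['a', 'b']} if 'a' and 'b' are chosen
```

Passing a seeded random.Random instance as rng makes the selection reproducible, which can help when testing.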
