Sum numbers by letter in list of tuples - python

I have a list of tuples:
[ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
I am trying to sum up all numbers that have the same letter. I.e. I want to output
[('A', 150), ('B', 70), ('C',10)]
I have tried using set to get the unique values but then when I try and compare the first elements to the set I get
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Any quick solutions to match the numbers by letter?

Here is a one(and a half?)-liner: group by letter (for which you need to sort before), then take the sum of the second entries of your tuples.
from itertools import groupby
from operator import itemgetter
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
res = [(k, sum(map(itemgetter(1), g)))
       for k, g in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))]
print(res)
# => [('A', 150), ('B', 70), ('C', 10)]
The above is O(n log n) — sorting is the most expensive operation. If your input list is truly large, you might be better served by the following O(n) approach:
from collections import defaultdict
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
d = defaultdict(int)
for letter, value in data:
    d[letter] += value
res = list(d.items())
print(res)
# => [('A', 150), ('B', 70), ('C', 10)] on Python 3.7+ (the order is arbitrary on older versions)
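The sort in the first snippet matters because itertools.groupby only merges adjacent items with equal keys. A small sketch, using the same sample data, of what happens with and without sorting (the names unsorted_groups and sorted_groups are just for illustration):

```python
from itertools import groupby
from operator import itemgetter

data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]

# Without sorting, every run of equal keys becomes its own group,
# so 'A' and 'B' each show up twice.
unsorted_groups = [(k, sum(v for _, v in g))
                   for k, g in groupby(data, key=itemgetter(0))]
print(unsorted_groups)
# [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]

# Sorting by the letter first puts equal keys next to each other.
sorted_groups = [(k, sum(v for _, v in g))
                 for k, g in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))]
print(sorted_groups)
# [('A', 150), ('B', 70), ('C', 10)]
```

The defaultdict version has no such requirement, which is why it can skip the sort.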

>>> from collections import Counter
>>> items = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
>>> c = Counter()
>>> for k, num in items:
...     c[k] += num
...
>>> list(c.items())
[('A', 150), ('B', 70), ('C', 10)]
A less efficient (but nicer-looking) one-liner, which only works for non-negative integer values:
>>> list(Counter(k for k, num in items for i in range(num)).items())
[('A', 150), ('B', 70), ('C', 10)]

How about this (assuming a is the name of the list of tuples you have provided):
letters_to_numbers = {}
for i in a:
    if i[0] in letters_to_numbers:
        letters_to_numbers[i[0]] += i[1]
    else:
        letters_to_numbers[i[0]] = i[1]
b = list(letters_to_numbers.items())
The elements of the resulting list b follow the order in which each letter first appears on Python 3.7+; on older versions they are in no particular order.
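If a specific order is needed, it can be imposed explicitly on the result. A minimal sketch, assuming the same kind of input with the letters deliberately shuffled (the names by_first_seen, by_letter and by_total are only illustrative):

```python
a = [('B', 50), ('A', 100), ('A', 50), ('B', 20), ('C', 10)]

letters_to_numbers = {}
for letter, number in a:
    # dict.get supplies 0 the first time a letter is seen
    letters_to_numbers[letter] = letters_to_numbers.get(letter, 0) + number

# Insertion order (guaranteed for plain dicts from Python 3.7 on)
by_first_seen = list(letters_to_numbers.items())
# Alphabetical by letter
by_letter = sorted(letters_to_numbers.items())
# Largest total first
by_total = sorted(letters_to_numbers.items(), key=lambda pair: pair[1], reverse=True)

print(by_first_seen)  # [('B', 70), ('A', 150), ('C', 10)]
print(by_letter)      # [('A', 150), ('B', 70), ('C', 10)]
print(by_total)       # [('A', 150), ('B', 70), ('C', 10)]
```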

To achieve this, first create a dictionary to accumulate the totals, then convert the dict to a list of tuples with .items(). Sample code:
my_list = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
my_dict = {}
for key, val in my_list:
    if key in my_dict:
        my_dict[key] += val
    else:
        my_dict[key] = val
list(my_dict.items())
# Output: [('A', 150), ('B', 70), ('C', 10)]

What is generating the list of tuples? Is it you? If so, why not use a defaultdict(list) and append each value under the right letter while the data is being produced? Then you can simply sum each list. See the example below.
>>> from collections import defaultdict
>>> val_store = defaultdict(list)
>>> # next lines are me simulating the creation of the tuple
>>> val_store['A'].append(10)
>>> val_store['B'].append(20)
>>> val_store['C'].append(30)
>>> val_store
defaultdict(<class 'list'>, {'C': [30], 'A': [10], 'B': [20]})
>>> val_store['A'].append(10)
>>> val_store['C'].append(30)
>>> val_store['B'].append(20)
>>> val_store
defaultdict(<class 'list'>, {'C': [30, 30], 'A': [10, 10], 'B': [20, 20]})
>>> for val in val_store:
...     print(val, sum(val_store[val]))
...
C 60
A 20
B 40
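To get from the per-letter lists back to the list-of-tuples shape the question asks for, something like the following sketch could be used. The input list here only simulates the appends from the session above, and totals is an illustrative name:

```python
from collections import defaultdict

val_store = defaultdict(list)
# Simulate values arriving one at a time, as in the session above.
for letter, value in [('A', 10), ('B', 20), ('C', 30), ('A', 10), ('C', 30), ('B', 20)]:
    val_store[letter].append(value)

# Collapse each list to its sum and sort by letter for a stable order.
totals = sorted((letter, sum(values)) for letter, values in val_store.items())
print(totals)  # [('A', 20), ('B', 40), ('C', 60)]
```

Keeping the individual values costs more memory than a running total, but it leaves room for other aggregates (min, max, mean) later.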

Try this:
a = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
letters = {s[0] for s in a}
new_a = []
for l in letters:
    nums = [s[1] for s in a if s[0] == l]
    new_a.append((l, sum(nums)))
print(new_a)
Results (the order may vary, since sets are unordered):
[('A', 150), ('C', 10), ('B', 70)]

A simpler approach
x = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
y = {}
for _tuple in x:
    if _tuple[0] in y:
        y[_tuple[0]] += _tuple[1]
    else:
        y[_tuple[0]] = _tuple[1]
print([(k, v) for k, v in y.items()])

A one liner:
>>> from functools import reduce  # reduce is no longer a builtin in Python 3
>>> x = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
>>> list({
...     k: reduce(lambda u, v: u + v, [y[1] for y in x if y[0] == k])
...     for k in [y[0] for y in x]
... }.items())
[('A', 150), ('B', 70), ('C', 10)]

Related

Creating Python defaultdict using nested list of tuples

The scenario is that I have a 2-D list. Each item of an inner list is a tuple (a key, value pair), and a key may repeat anywhere in the list. I want to build a defaultdict on the fly so that, in the end, it maps each key to the cumulative sum of all of that key's values in the 2-D list.
In code:
from collections import defaultdict
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
finalDict = defaultdict(int)
for eachItem in listOfItems:
    for key, val in eachItem:
        finalDict[key] += val
print(finalDict)
This gives me what I want: defaultdict(<class 'int'>, {'a': 7, 'b': 5, 'c': 0, 'd': 5}). However, I am looking for a more 'Pythonic' way using comprehensions, so I tried the following:
finalDict = defaultdict(int)
finalDict = {key : finalDict[key]+val for eachItem in listOfItems for key, val in eachItem}
print(finalDict)
But the output is: {'a': 6, 'b': 2, 'c': 0, 'd': 5}. What am I doing wrong? Or is it that, with a comprehension, the dictionary is not created and modified on the fly?
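Yes, the dictionary built by a comprehension can't be read or updated on the fly while it is being built. Anyway, this task might be better suited to collections.Counter() with .update() calls (note that dict(eachItem) keeps only the last value if a key repeats within a single inner list):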
>>> from collections import Counter
>>> c = Counter()
>>> for eachItem in listOfItems:
... c.update(dict(eachItem))
...
>>> c
Counter({'a': 7, 'b': 5, 'd': 5, 'c': 0})
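To see why the comprehension from the question loses values, it can help to look at what the lookups return while it runs. A small sketch, using separate names (lookup and attempt, both illustrative) so the two dictionaries stay distinguishable:

```python
from collections import defaultdict

listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]

lookup = defaultdict(int)
# Same shape as the comprehension in the question: nothing is ever
# written back to `lookup`, so lookup[key] is 0 every time and a later
# (key, val) pair simply overwrites an earlier one in the new dict.
attempt = {key: lookup[key] + val for eachItem in listOfItems for key, val in eachItem}

print(attempt)       # {'a': 6, 'b': 2, 'c': 0, 'd': 5}
print(dict(lookup))  # {'a': 0, 'b': 0, 'c': 0, 'd': 0}
```

The second line of output shows that the defaultdict only ever received the default value 0 for each key it was asked about.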
This happens because nothing is ever assigned to finalDict inside the dict comprehension, so every lookup returns 0.
The comprehension then rebinds finalDict to a plain dict, which literally changes its type.
As far as I know, you cannot assign values to a dict from inside a dict comprehension.
Here is one way to get the dictionary you want:
from functools import reduce
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
list_dict = [{key: val} for eachItem in listOfItems for key, val in eachItem]
def sum_dict(x, y):
    return {k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)}
print(reduce(sum_dict, list_dict))
Simple solution without using additional modules:
inp_list = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
l = [item for sublist in inp_list for item in sublist] # flatten the list
sums = [(key, sum([b for (a,b) in l if a == key])) for key in dict(l)]
print(sums)
This tries to use Python's built-in tools instead of coding the functionality by hand.
The long, explained solution:
from itertools import chain, groupby
from operator import itemgetter
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
# flatten the list of lists into a single iterable
flatten_list = chain(*listOfItems)
# group all elements by key, e.g. 'a', 'b', etc.
first = itemgetter(0)
groupedByKey = groupby(sorted(flatten_list, key=first), key=first)
# sum the values in each group
summed_by_key = ((k, sum(item[1] for item in tups_to_sum)) for k, tups_to_sum in groupedByKey)
# create a dict
d = dict(summed_by_key)
print(d) # {'a': 7, 'b': 5, 'c': 0, 'd': 5}
An (almost) one-line solution:
from itertools import chain, groupby
from operator import itemgetter
first = itemgetter(0)
d = dict((k, sum(item[1] for item in tups_to_sum)) for k, tups_to_sum in groupby(sorted(chain(*listOfItems), key=first), key=first))
print(d) # {'a': 7, 'b': 5, 'c': 0, 'd': 5}

how to sort a dictionary and get only first and last element

I have written a program that counts how often each letter occurs in a string.
Input: AAAABBBBBCCDEEEEEEEEEEFFF
I want the output to contain only the letters that occur the most and the fewest times, along with their counts.
import sys
seq = sys.argv[1]
count = {}
for i in seq:
    if i in count:
        count[i] += 1
    else:
        count[i] = 1
for i in sorted(count, key=count.get, reverse=True):
    print(i, count[i])
Actual Output:
E:10, B:5, A:4, F:3, C:2, D:1
Expected Output:
E: 10 , D: 1
You can use collections.Counter to count the letters:
>>> import operator, collections
>>> counter = collections.Counter('AAAABBBBBCCDEEEEEEEEEEFFF')
>>> counter
Counter({'E': 10, 'B': 5, 'A': 4, 'F': 3, 'C': 2, 'D': 1})
>>> sorted_counter = sorted(counter.items(), key=operator.itemgetter(1), reverse=True)
>>> sorted_counter
[('E', 10), ('B', 5), ('A', 4), ('F', 3), ('C', 2), ('D', 1)]
>>> print(sorted_counter[-1])
('D', 1)
>>> print(sorted_counter[0])
('E', 10)
You're pretty much there. There's just no reason for you to be iterating over the whole sorted dictionary.
sorted_count = sorted(count, key=count.get, reverse=True)
print(sorted_count[0])
print(sorted_count[-1])
Alternatively:
print(min(count, key=count.get))
print(max(count, key=count.get))
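Both ideas can be combined with Counter.most_common(), which returns the (letter, count) pairs already sorted from most to least frequent. A short sketch that prints the result in the format the question asks for (ranked, most and least are illustrative names):

```python
from collections import Counter

seq = 'AAAABBBBBCCDEEEEEEEEEEFFF'

# most_common() with no argument returns every (letter, count) pair,
# ordered from the highest count to the lowest.
ranked = Counter(seq).most_common()
most, least = ranked[0], ranked[-1]

print('{}: {} , {}: {}'.format(most[0], most[1], least[0], least[1]))
# E: 10 , D: 1
```

If several letters tie for first or last place, only one of them is reported.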

Reducing a list of repeated tuples and counting repeats

I was wondering how to reduce a list of tuples like the following:
[('a','b'),('b','a'),('c','d')]
to the following:
[('a','b'),('c','d')]
And also count the number of times an element repeats and return a list that associates the count with the tuple. From this example, that list would be [2, 1]
Thanks!
I've tried:
l = [('a','b'),('c','d')]
counts_from_list = [len(list(group)) for group in itertools.groupby(my_list)]
zip(set(l), counts_from_list)
Use a Counter, sorting the items first to make sure ('a', 'b') and ('b', 'a') are the "same".
>>> data = [('a','b'),('b','a'),('c','d')]
>>> data = [tuple(sorted(x)) for x in data]
>>> from collections import Counter
>>> c = Counter(data)
>>> c
Counter({('a', 'b'): 2, ('c', 'd'): 1})
Accessing the data:
>>> list(c.keys())
[('a', 'b'), ('c', 'd')]
>>> list(c.values())
[2, 1]
>>> list(c.items())
[(('a', 'b'), 2), (('c', 'd'), 1)]
Use sets to compare tuples regardless of the order of their items:
my_list = [('a','b'),('b','a'),('c','d')]
my_list = list(map(set, my_list))
And instead of
counts_from_list = [len(list(group)) for group in itertools.groupby(my_list)]
you should use
counts_from_list = [len(list(group)) for key, group in itertools.groupby(my_list)]
because itertools.groupby(my_list) yields (key, group) pairs (and only groups equal items that are adjacent).
But I would advise you to use collections.Counter:
collections.Counter(map(frozenset, [('a','b'),('b','a'),('c','d')])).values()
A plain set is not appropriate as a Counter key because it is not hashable, which is why frozenset is used here.
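A fuller sketch of the frozenset idea, producing both outputs the question asks for. It assumes Python 3.7+ (dicts keep insertion order) and keeps the first-seen orientation of each pair; representatives, reduced and repeat_counts are illustrative names:

```python
from collections import Counter

pairs = [('a', 'b'), ('b', 'a'), ('c', 'd')]

# frozenset is hashable and ignores order, so ('a', 'b') and ('b', 'a')
# count as the same key.
counts = Counter(frozenset(p) for p in pairs)

# Remember the first tuple seen for each unordered pair.
representatives = {}
for p in pairs:
    representatives.setdefault(frozenset(p), p)

reduced = list(representatives.values())
repeat_counts = [counts[key] for key in representatives]

print(reduced)        # [('a', 'b'), ('c', 'd')]
print(repeat_counts)  # [2, 1]
```

One caveat: a frozenset collapses a pair such as ('a', 'a') to a single element, so if such pairs can occur, tuple(sorted(p)) is the safer key.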

how to write a function to add the integer of corresponding letter in python?

How do I write a function that adds up the integers belonging to each letter in Python?
For example:
L=[('a',3),('b',4),('c',5),('a',2),('c',2),('b',1)]
How can I solve it by just looping over the items in L?
I guess the clearest way is just to loop through and add them up.
>>> L=[('a',3),('b',4),('c',5),('a',2),('c',2),('b',1)]
>>> import collections
>>> d=collections.defaultdict(int)
>>> for key,n in L:
... d[key] += n
...
>>> sorted(d.items())
[('a', 5), ('b', 5), ('c', 7)]
You can use a dictionary for this and add up the values of repeated keys, like this:
totals = {}
for i in L:
    if i[0] in totals:
        totals[i[0]] += i[1]
    else:
        totals[i[0]] = i[1]
list(totals.items())
The output will be: [('a', 5), ('b', 5), ('c', 7)]
You can try defining a function like this:
def sorting(L):
    dit = {}
    result = []
    for l in L:
        dit[l[0]] = 0
    for key, item in dit.items():
        for ll in L:
            if key == ll[0]:
                dit[key] += ll[1]
    for key, item in dit.items():
        result.append((key, item))
    return sorted(result)
You will see the result:
>>> sorting(L)
[('a', 5), ('b', 5), ('c', 7)]
Here's the obligatory one-line itertools solution:
>>> import itertools
>>> [
... (k, sum(g[1] for g in group))
... for k, group in itertools.groupby(sorted(L), key=lambda x: x[0])
... ]
[('a', 5), ('b', 5), ('c', 7)]
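Since the question asks for a function that uses just one loop over L, here is a minimal sketch of that shape. The name sum_by_letter is hypothetical, and sorting the result is a choice made here for a predictable order, not something the question requires:

```python
def sum_by_letter(pairs):
    """Return (letter, total) tuples, sorted by letter."""
    totals = {}
    for letter, number in pairs:
        # dict.get supplies 0 the first time a letter is seen
        totals[letter] = totals.get(letter, 0) + number
    return sorted(totals.items())


L = [('a', 3), ('b', 4), ('c', 5), ('a', 2), ('c', 2), ('b', 1)]
print(sum_by_letter(L))  # [('a', 5), ('b', 5), ('c', 7)]
```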

Combine two lists: aggregate values that have similar keys

I have two or more lists, something like this:
listX = [('A', 1, 10), ('B', 2, 20), ('C', 3, 30), ('D', 4, 30)]
listY = [('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20),
         ('A', 6, 60), ('D', 7, 70)]
I want to combine all the elements of listX + listY, but merge the entries whose first element is duplicated.
For example, the keys of ('A', 1, 10) and ('D', 4, 30) from listX also appear in listY, so the result should be:
result = [('A', 7, 70), ('B', 2, 20), ('C', 3, 30), ('D', 11, 100),
          ('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20)]
('A', 7, 70) is obtained by adding ('A', 1, 10) and ('A', 6, 60) together.
Could anybody help me solve this problem?
Thanks.
This is pretty easy if you use a dictionary.
combined = {}
for item in listX + listY:
    key = item[0]
    if key in combined:
        combined[key][0] += item[1]
        combined[key][1] += item[2]
    else:
        combined[key] = [item[1], item[2]]
result = [(key, value[0], value[1]) for key, value in combined.items()]
You appear to be using lists like a dictionary. Any reason you're using lists instead of dictionaries?
My understanding of this garbled question is that you want to add up the values in tuples whose first element is the same.
I'd do something like this:
counter = dict(
    (a[0], [a[1], a[2]])  # lists rather than tuples, so the totals can be updated in place
    for a in listX
)
for key, v1, v2 in listY:
    if key not in counter:
        counter[key] = [0, 0]
    counter[key][0] += v1
    counter[key][1] += v2
result = [(key, value[0], value[1]) for key, value in counter.items()]
I'd say use a dictionary:
result = {}
for eachlist in (listX, listY):
    for item in eachlist:
        # keeps only the first tuple seen for each key; the values are not summed here
        if item[0] not in result:
            result[item[0]] = item
It's always tricky to do data manipulation if the data is in a structure that doesn't represent it well. Consider using better data structures.
Use a dictionary and its 'get' method.
d = {}
for x in (listX + listY):
    y = d.get(x[0], (0, 0, 0))
    d[x[0]] = (x[0], x[1] + y[1], x[2] + y[2])
list(d.values())
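Because the question mentions "two lists or more", the same accumulation can be wrapped so that it accepts any number of lists. A sketch under that assumption; combine is a hypothetical helper name, and sorting the output is only done here to reproduce the order shown in the question:

```python
from itertools import chain


def combine(*lists):
    """Sum the two numeric fields of (key, n1, n2) tuples across all lists."""
    totals = {}
    for key, first, second in chain(*lists):
        # setdefault returns the existing running totals or installs [0, 0]
        running = totals.setdefault(key, [0, 0])
        running[0] += first
        running[1] += second
    return sorted((key, a, b) for key, (a, b) in totals.items())


listX = [('A', 1, 10), ('B', 2, 20), ('C', 3, 30), ('D', 4, 30)]
listY = [('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20),
         ('A', 6, 60), ('D', 7, 70)]

result = combine(listX, listY)
print(result)
# [('A', 7, 70), ('B', 2, 20), ('C', 3, 30), ('D', 11, 100),
#  ('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20)]
```

Uppercase keys sort before lowercase ones in Python's default string ordering, which is why the output matches the layout in the question.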
