adding lists of tuples and sorting them - python

I have the following lists of tuples:
mylist=[('a', 3), ('b', 2), ('c', 8)]
mylist2=[('a', 3), ('b', 5), ('c', 20), ('d', 5)]
Is there a way I can sum all values that share the same name and sort them in Python? Something like:
[('c', 28), ('b', 7), ('a', 6), ('d', 5)]

If I were you, I would have done it like:
>>> mylist=[('a', 3), ('b', 2), ('c', 8)]
>>> mylist2=[('a', 3), ('b', 5), ('c', 20), ('d', 5)]
# Step 1: Convert the list of tuples to `dict`
>>> dict_1, dict_2 = dict(mylist), dict(mylist2)
# Step 2: get set of all keys
>>> all_keys = set(dict_1) | set(dict_2)
# Step 3: Get `sum` of value for each key
>>> sum_list = [(k, dict_1.get(k, 0) + dict_2.get(k, 0)) for k in all_keys]
And then sort the list as:
>>> from operator import itemgetter
# Step 4: Sort in descending order based on value at index 1
>>> sorted(sum_list, key=itemgetter(1), reverse=True)
[('c', 28), ('b', 7), ('a', 6), ('d', 5)]
Note: This assumes that the keys (the element at index 0 of each tuple) are unique within each list, because dict() keeps only the last value for a repeated key.
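If a name can repeat within a single list, the dict() conversion above would silently drop values. A minimal sketch that sidesteps this, assuming collections.Counter is acceptable here (the helper name sum_and_sort is made up for illustration):

```python
from collections import Counter

def sum_and_sort(*pair_lists):
    """Sum the values per name across any number of (name, value) lists."""
    totals = Counter()
    for pairs in pair_lists:
        for name, value in pairs:
            totals[name] += value
    # Sort by the summed value, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

mylist = [('a', 3), ('b', 2), ('c', 8)]
mylist2 = [('a', 3), ('b', 5), ('c', 20), ('d', 5)]
result = sum_and_sort(mylist, mylist2)
print(result)  # [('c', 28), ('b', 7), ('a', 6), ('d', 5)]
```

Because every pair is added individually, repeated names inside one list are summed rather than overwritten.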

Related

Remove elements from tuple array that have same value in first index position of each element

Let's say I have a list:
t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
There are two tuples with 'a' as the first element, and two tuples with 'c' as the first element. I want to only keep the first instance of each, so I end up with:
t = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
How can I achieve that?
You can use a dictionary to help you filter the duplicate keys:
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> d = {}
>>> for x, y in t:
...     if x not in d:
...         d[x] = y
...
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> t = list(d.items())
>>> t
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
@MrGeek's answer is good, but if you do not want to use a dictionary, you could build a new list while tracking the keys already seen (removing items from t while iterating over it would skip elements):
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> already_seen = []
>>> result = []
>>> for e in t:
...     if e[0] not in already_seen:
...         already_seen.append(e[0])
...         result.append(e)
...
>>> result
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
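@gold_cy's comment is the easiest way:
You can use itertools.groupby to group your data, with the key parameter grouping by the first element of each tuple. Note that groupby only groups consecutive items, so tuples sharing a first element must be adjacent in t (as they are here).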
import itertools as it
t = [list(my_iterator)[0] for g, my_iterator in it.groupby(t, key=lambda x: x[0])]
Output:
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
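To illustrate the adjacency requirement, here is a small sketch comparing groupby with a seen-set loop on input where the duplicate key is not adjacent. The function names and the scattered list are made up for the example:

```python
from itertools import groupby

def first_per_key_groupby(pairs):
    # groupby starts a new group whenever the key changes,
    # so it only merges duplicates that sit next to each other.
    return [next(group) for _, group in groupby(pairs, key=lambda p: p[0])]

def first_per_key_seen(pairs):
    # Tracks keys in a set, so position in the list does not matter.
    seen = set()
    result = []
    for pair in pairs:
        if pair[0] not in seen:
            seen.add(pair[0])
            result.append(pair)
    return result

scattered = [('a', 1), ('b', 2), ('a', 6)]
groupby_result = first_per_key_groupby(scattered)
seen_result = first_per_key_seen(scattered)
print(groupby_result)  # [('a', 1), ('b', 2), ('a', 6)]
print(seen_result)     # [('a', 1), ('b', 2)]
```

If the order of the result does not matter, sorting by the first element before calling groupby would also work.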

Sort a list of tuples by value, using another list to break ties

I have a list:
test1 = ["a","b","c","d","e","f","g","h","i"]
And a list of tuples:
test2 = [("c",1),("g",1),("b",1),("e",1),("g",1),("d",10),("a",10)]
I need to sort test2 by the values:
[val for (key, val) in test2]
and, when the values are equal, order those tuples by the position of their key in test1:
test3 = [("b",1),("c",1),("e",1),("f",1),("g",1),("a",10),("d",10)]
sorted accepts an optional key parameter. Each item is passed to the key function, and the function's return value is used for comparison instead of the item itself.
>>> test1 = ["a","b","c","d","e","f","g","h","i"]
>>> test2 = [("c",1),("g",1),("b",1),("e",1),("g",1),("d",10),("a",10)]
>>> sorted(test2, key=lambda x: (x[1], test1.index(x[0])))
[('b', 1), ('c', 1), ('e', 1), ('g', 1), ('g', 1), ('a', 10), ('d', 10)]
With the key function above, the order is by the number first, then by position in test1.
Use a dict mapping each string in test1 to its index, so that ties are broken by the index and each lookup is O(1):
test1 = ["a","b","c","d","e","f","g","h","i"]
inds = dict(zip(test1, range(len(test1))))
test2 = [("c",1),("g",1),("b",1),("e",1),("g",1),("d",10),("a",10)]
print(sorted(test2,key=lambda x: (x[1], inds[x[0]])))
Output:
[('b', 1), ('c', 1), ('e', 1), ('g', 1), ('g', 1), ('a', 10), ('d', 10)]
If you actually wanted the strings to be in sorted order you could just use the string itself, using itemgetter instead of a lambda:
test2 = [("c", 1), ("g", 1), ("b", 1), ("e", 1), ("g", 1), ("d", 10), ("a", 10)]
from operator import itemgetter
print(sorted(test2, key=itemgetter(1, 0)))
[('b', 1), ('c', 1), ('e', 1), ('g', 1), ('g', 1), ('a', 10), ('d', 10)]
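The difference between sorting by position in a reference list and sorting alphabetically only shows up when the reference list is not itself alphabetical. A small sketch with a deliberately shuffled reference list (the names order and rank are illustrative, not from the question):

```python
# Hypothetical reference order, deliberately not alphabetical.
order = ["g", "e", "c", "b", "a", "d"]
# Map each name to its position so each tie-break lookup is a dict access.
rank = {name: position for position, name in enumerate(order)}

test2 = [("c", 1), ("g", 1), ("b", 1), ("e", 1), ("g", 1), ("d", 10), ("a", 10)]
by_rank = sorted(test2, key=lambda pair: (pair[1], rank[pair[0]]))
print(by_rank)
# [('g', 1), ('g', 1), ('e', 1), ('c', 1), ('b', 1), ('a', 10), ('d', 10)]
```

A name in test2 that is missing from the reference list would raise KeyError here; rank.get(name, len(rank)) is one way to push such names to the end of their tie group.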

How to sum values of tuples that have same name in Python

I have the following list of tuples, each holding two values:
mylist=[(3, 'a'), (2, 'b'), (4, 'a'), (5, 'c'), (2, 'a'), (1, 'b')]
Is there a way to sum all values that share the same name?
Something like:
(9, 'a'), (3, 'b'), (5, 'c')
I tried iterating over the tuples with a for loop but can't get what I want.
Thank you
Use a dict (or defaultdict) to aggregate over your tuples:
from collections import defaultdict
mylist=[(3, 'a'), (2, 'b'), (4, 'a'), (5, 'c'), (2, 'a'), (1, 'b')]
sums = defaultdict(int)
for i, k in mylist:
    sums[k] += i
print(list(sums.items()))
# output (Python 3.7+ keeps insertion order): [('a', 9), ('b', 3), ('c', 5)]
# Or if you want the key/value order reversed and sorted by key
print([(v, k) for (k, v) in sorted(sums.items())])
# output: [(9, 'a'), (3, 'b'), (5, 'c')]
You can use itertools.groupby (after sorting by the second value of each tuple) to create groups. Then for each group, sum the first element in each tuple, then create a tuple per group in a list comprehension.
>>> import itertools
>>> [(sum(i[0] for i in group), key) for key, group in itertools.groupby(sorted(mylist, key = lambda i: i[1]), lambda i: i[1])]
[(9, 'a'), (3, 'b'), (5, 'c')]
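The one-liner above packs three steps together. Unrolled, it might look like the following sketch (the variable names by_name, ordered and totals are chosen here for readability and are not from the answer):

```python
from itertools import groupby
from operator import itemgetter

mylist = [(3, 'a'), (2, 'b'), (4, 'a'), (5, 'c'), (2, 'a'), (1, 'b')]
by_name = itemgetter(1)

# Step 1: sort by name so that equal names become adjacent,
# which groupby needs in order to put them in one group.
ordered = sorted(mylist, key=by_name)

# Step 2 and 3: walk the groups and sum the first element of each tuple.
totals = []
for name, group in groupby(ordered, key=by_name):
    totals.append((sum(value for value, _ in group), name))

print(totals)  # [(9, 'a'), (3, 'b'), (5, 'c')]
```

Skipping the sort would produce one entry per run of adjacent equal names rather than one per name.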
A simple approach without any dependencies:
mylist = [(3, 'a'), (2, 'b'), (4, 'a'), (5, 'c'), (2, 'a'), (1, 'b')]
sums = {}
for a, b in mylist:
    if b not in sums:
        sums[b] = a
    else:
        sums[b] += a
print(list(sums.items()))
You can create a dict which has sum of individual keys.
>>> my_dict = {}
>>> for item in mylist:
...     my_dict[item[1]] = my_dict.get(item[1], 0) + item[0]
...
>>> [(my_dict[k], k) for k in my_dict]
[(9, 'a'), (3, 'b'), (5, 'c')]

How to sort list of tuples by several keys

I am doing an exercise on Python and lists with one problem:
I have a list of tuples sorted by second key:
[('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
I need to sort it by the second value numerically (highest first) and then by the first value alphabetically, so that it looks like:
[('a', 3), ('d', 3), ('f', 3), ('b', 2), ('c', 2)]
When I used the sorted function I got:
[('a', 3), ('b', 2), ('c', 2), ('d', 3), ('f', 3)]
It sorted by first element (and I lost arrangement of second). I also tried to use key:
def getKey(item):
    return item[0]
a = (sorted(lis, key=getKey))
And it didn't help me either.
When you have a list of tuples, the key parameter of the sort method lets you specify which element of each tuple pair to sort by. Each of the examples below sorts on a single criterion, either numerical or alphabetical order.
If you want to sort by increasing numerical order:
alist = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
alist.sort(key=lambda pair: pair[1])
If you want to sort by decreasing numerical order:
alist = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
alist.sort(key = lambda pair: pair[1], reverse=True)
If you want to sort by alphabetical order:
alist = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
alist.sort()
Reverse alphabetical order:
alist = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
alist.sort(reverse = True)
Each of these calls sorts on one criterion only. To sort by both the number and the letter at once, the key function has to return a tuple:
l = [('f', 4), ('b', 4), ('a', 4), ('c', 3), ('k', 1)]
l.sort(key=lambda x:(-x[1],x[0]))
print(l)
[('a', 4), ('b', 4), ('f', 4), ('c', 3), ('k', 1)]
The key function returns a tuple of two sort keys. -x[1] negates the number, so the numbers sort from highest to lowest; ties are then broken by x[0], which sorts in its natural order, i.e. a-z.
def getKey(item):
    return item[0]
This returns the first element of the tuple, so the list will be sorted by the first tuple element. You want to sort by the second element (highest first), then by the first, so the key should return both, with the number negated. Your key function would then need to be:
def getKey(item):
    return -item[1], item[0]
Making your final call:
>>> sorted(lis, key=getKey)
[('a', 3), ('d', 3), ('f', 3), ('b', 2), ('c', 2)]
The sort() method is stable. Call it twice, first for the secondary key (alphabetically), then for the primary key (the number):
>>> lst = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
>>> lst.sort()
>>> lst.sort(key=lambda kv: kv[1], reverse=True)
>>> lst
[('a', 3), ('d', 3), ('f', 3), ('b', 2), ('c', 2)]
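Negating the key only works for numbers, whereas the two-pass approach above also covers keys that cannot be negated, such as strings sorted in descending order. A sketch of a small helper generalizing it, assuming each criterion is a tuple index plus a reverse flag (the name multisort and the specs format are illustrative):

```python
from operator import itemgetter

def multisort(items, specs):
    """Sort by several criteria, given as (index, reverse) from most to least significant."""
    result = list(items)
    # Apply the least significant criterion first; because list.sort is
    # stable, each later pass keeps the order set by earlier ones among ties.
    for index, reverse in reversed(specs):
        result.sort(key=itemgetter(index), reverse=reverse)
    return result

pairs = [('f', 3), ('a', 3), ('d', 3), ('b', 2), ('c', 2)]
num_desc_alpha_asc = multisort(pairs, [(1, True), (0, False)])
num_desc_alpha_desc = multisort(pairs, [(1, True), (0, True)])
print(num_desc_alpha_asc)   # [('a', 3), ('d', 3), ('f', 3), ('b', 2), ('c', 2)]
print(num_desc_alpha_desc)  # [('f', 3), ('d', 3), ('a', 3), ('c', 2), ('b', 2)]
```

The second call shows the case a negated key cannot express directly: both the number and the letter in descending order.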

Python list sort by size of group

I have a group of items that are labeled like item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
I want to sort them by the size of their group, e.g., label 3 has size 3 and label 2 has size 2 in the above example.
I tried using a combination of groupby and sorted, but it didn't work.
In [162]: sil = sorted(item_labels, key=op.itemgetter(1))
In [163]: sil
Out[163]: [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
In [164]: g = itt.groupby(sil, key=op.itemgetter(1))
In [165]: for k, v in g:
   .....:     print(k, list(v))
   .....:
1 [('c', 1)]
2 [('b', 2), ('e', 2)]
3 [('a', 3), ('d', 3), ('f', 3)]
In [166]: sg = sorted(g, key=lambda x: len(list(x[1])))
In [167]: sg
Out[167]: [] # not sure why I got an empty list here
I can always write some tedious for-loop to do this, but I would rather find something more elegant. Any suggestions? If there are libraries that are useful, I would be happy to use them, e.g., pandas or scipy.
In python2.7 and above, use Counter:
from collections import Counter
c = Counter(y for _, y in item_labels)
item_labels.sort(key=lambda t : c[t[1]])
In python2.6, for our purpose, this Counter constructor can be implemented using defaultdict (as suggested by @perreal) this way:
from collections import defaultdict
def Counter(x):
    d = defaultdict(int)
    for v in x:
        d[v] += 1
    return d
Since we are working with numbers only, and assuming the numbers are as low as those in your example, we can actually use a list (which will be compatible with even older versions of Python):
def Counter(x):
    lst = list(x)
    d = [0] * (max(lst) + 1)
    for v in lst:
        d[v] += 1
    return d
Without Counter, you can simply do this:
item_labels.sort(key=lambda t : len([x[1] for x in item_labels if x[1]==t[1] ]))
It is slower (it rescans the whole list for every item, so it is quadratic in the list length), but reasonable over short lists.
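One thing to watch with a key based on size alone: when two groups have the same size, their items keep their original relative order and can end up interleaved. A sketch that adds the label as a tie-breaker, assuming you want each group kept together (the function name sort_by_group_size and the tied list are made up for the example):

```python
from collections import Counter

def sort_by_group_size(pairs):
    # Count how many items carry each label.
    sizes = Counter(label for _, label in pairs)
    # Sort by group size first, then by label so equal-sized groups stay together.
    return sorted(pairs, key=lambda pair: (sizes[pair[1]], pair[1]))

item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
result = sort_by_group_size(item_labels)
print(result)
# [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

# Two groups of size 2: without the tie-breaker the order would stay a, b, c, d.
tied = [('a', 3), ('b', 2), ('c', 3), ('d', 2)]
tied_result = sort_by_group_size(tied)
print(tied_result)
# [('b', 2), ('d', 2), ('a', 3), ('c', 3)]
```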
The reason you've got an empty list is that g is an iterator. You can only iterate over it once.
from collections import defaultdict
import operator
l=[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
d=defaultdict(int)
for p in l:
    d[p[1]] += 1
# sort the (label, count) pairs by count, then emit the items carrying each label
print([p for i in sorted(d.items(), key=operator.itemgetter(1))
       for p in l if p[1] == i[0]])
itertools.groupby returns an iterator, so this for loop: for k, v in g: actually consumed that iterator.
>>> it = iter([1,2,3])
>>> for x in it:pass
>>> list(it) #iterator already consumed by the for-loop
[]
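The same effect can be reproduced with groupby itself. In this sketch (variable names are illustrative) the groups are turned into lists during the first pass, since the sub-iterators that groupby hands out are only usable while the parent iterator is still positioned on that group:

```python
from itertools import groupby
from operator import itemgetter

item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
by_label = itemgetter(1)
ordered = sorted(item_labels, key=by_label)

grouped = groupby(ordered, key=by_label)
# Materialize every group as a list while walking the iterator the first time.
groups = [(label, list(members)) for label, members in grouped]
# The groupby iterator is now exhausted, so a second pass yields nothing.
leftover = list(grouped)

# The materialized groups can be sorted as often as needed, here largest first.
by_size = sorted(groups, key=lambda entry: len(entry[1]), reverse=True)
print(leftover)                          # []
print([label for label, _ in by_size])   # [3, 2, 1]
```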
code:
>>> lis = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
>>> from operator import itemgetter
>>> from itertools import groupby
>>> lis.sort(key = itemgetter(1) )
>>> new_lis = [list(v) for k,v in groupby(lis, key = itemgetter(1) )]
>>> new_lis.sort(key = len)
>>> new_lis
[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]
To get a flattened list use itertools.chain:
>>> from itertools import chain
>>> list( chain.from_iterable(new_lis))
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
Same as @perreal's and @Elazar's answers, but with better names:
from collections import defaultdict
size = defaultdict(int)
for _, group_id in item_labels:
    size[group_id] += 1
# tuple parameters in a lambda are Python 2 only, so index into the pair instead
item_labels.sort(key=lambda pair: size[pair[1]])
print(item_labels)
# -> [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
Here is another way:
example=[('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
out={}
for t in example:
    out.setdefault(t[1], []).append(t)
print(sorted(out.values(), key=len))
Prints:
[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]
If you want a flat list:
print([l for s in sorted(out.values(), key=len) for l in s])
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
