Reducing a list of repeated tuples and counting repeats - python

I was wondering how to reduce a list of tuples like the following:
[('a','b'),('b','a'),('c','d')]
to the following:
[('a','b'),('c','d')]
And also count the number of times an element repeats and return a list that associates the count with the tuple. From this example, that list would be [2, 1]
Thanks!
I've tried:
l = [('a','b'),('c','d')]
counts_from_list = [len(list(group)) for group in itertools.groupby(my_list)]
zip(set(l), counts_from_list)

Use a Counter, sorting the items first to make sure ('a', 'b') and ('b', 'a') are the "same".
>>> data = [('a','b'),('b','a'),('c','d')]
>>> data = [tuple(sorted(x)) for x in data]
>>> from collections import Counter
>>> c = Counter(data)
>>> c
Counter({('a', 'b'): 2, ('c', 'd'): 1})
Accessing the data
>>> c.keys()
[('a', 'b'), ('c', 'd')]
>>> c.values()
[2, 1]
>>> c.items()
[(('a', 'b'), 2), (('c', 'd'), 1)]
>>>
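For Python 3 readers, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name count_unordered is hypothetical, and it assumes the items inside each tuple can be compared with each other so that sorted() works.

```python
from collections import Counter

def count_unordered(pairs):
    """Count tuples while treating ('a', 'b') and ('b', 'a') as equal."""
    # Sorting each tuple gives every ordering the same canonical form.
    counts = Counter(tuple(sorted(p)) for p in pairs)
    # Counter is a dict subclass, so on Python 3.7+ keys and values
    # come back in first-seen order and line up with each other.
    return list(counts.keys()), list(counts.values())

unique, counts = count_unordered([('a', 'b'), ('b', 'a'), ('c', 'd')])
print(unique)  # [('a', 'b'), ('c', 'd')]
print(counts)  # [2, 1]
```

Wrapping the views in list() is needed on Python 3, where keys() and values() no longer return lists.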

Use set to compare tuples regardless of the order of their items.
my_list = [('a','b'),('b','a'),('c','d')]
my_list = map(set, my_list)
and instead of
counts_from_list = [len(list(group)) for group in itertools.groupby(my_list)]
you should use
counts_from_list = [len(list(group)) for key, group in itertools.groupby(my_list)]
because itertools.groupby(my_list) yields (key, group) pairs.
However, I would advise you to use collections.Counter:
collections.Counter(map(frozenset, [('a','b'),('b','a'),('c','d')])).values()
A plain set is not appropriate as a Counter key because it is not hashable; frozenset is.
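To see the hashability point in practice, here is a small sketch (Python 3 syntax assumed; the variable names are made up for illustration) that tries both set and frozenset as Counter keys:

```python
from collections import Counter

data = [('a', 'b'), ('b', 'a'), ('c', 'd')]

# A plain set is mutable and therefore unhashable, so Counter rejects it.
try:
    Counter(map(set, data))
    set_error = None
except TypeError as exc:
    set_error = exc
print("set keys failed:", set_error)

# frozenset is immutable and hashable, and still ignores element order.
counts = Counter(map(frozenset, data))
print(list(counts.values()))  # [2, 1]
```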

Related

Count number of pairs in list disregarding order

For example, if I have the following script:
import collections
lst = [['a','b'],['b','a'],['c','d'],['c','d'],['d','c']]
print([(a, b, v) for (a, b),v in collections.Counter(map(tuple,lst)).items()])
I get as output:
[('a', 'b', 1), ('b', 'a', 1), ('c', 'd', 2), ('d', 'c', 1)]
Can I adapt my code to yield the following output:
[('a', 'b', 2), ('c', 'd', 3)]
So a function that doesn't include the order of the pairs?
Use a data structure that doesn't care about order. In this case you'll need frozenset instead of a regular set because Counter requires its keys to be hashable. Basically, it's a simple substitution: replace tuple in your original code with frozenset:
print([(a, b, v) for (a, b),v in collections.Counter(map(frozenset,lst)).items()])
Output:
[('a', 'b', 2), ('d', 'c', 3)]
You could just sort each element in the list before counting, like so:
import collections
lst = [['a','b'],['b','a'],['c','d'],['c','d'],['d','c']]
sorted_lst = [sorted(x) for x in lst]
print([(a, b, v) for (a, b),v in collections.Counter(map(tuple,sorted_lst)).items()])
Output:
[('a', 'b', 2), ('c', 'd', 3)]
Sorting each pair before counting solves the problem.
import collections
lst = [['a','b'],['b','a'],['c','d'],['c','d'],['d','c']]
sort_list = [sorted(x) for x in lst]
print([(a, b, v) for (a, b),v in collections.Counter(map(tuple,sort_list)).items()])
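You could sort each pair with a key function, group the pairs with itertools.groupby, and count the elements in each group. Note that groupby only merges adjacent items, so this works here because equal pairs already sit next to each other; for arbitrary input, sort lst by the same key first.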
import itertools as it
lst = [['a','b'],['b','a'],['c','d'],['c','d'],['d','c']]
output = [(*group,sum(1 for i in elements)) for group,elements in it.groupby(lst,key=lambda x:sorted(x))]
print(output)
OUTPUT
[('a', 'b', 2), ('c', 'd', 3)]
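Since groupby only groups neighbouring items, a version that does not depend on the input order could look like the sketch below. The name count_pairs_grouped is hypothetical, and the sample list is deliberately shuffled so that equal pairs are not adjacent.

```python
from itertools import groupby

def count_pairs_grouped(pairs):
    # Canonical form: each pair sorted, so ['b', 'a'] becomes ('a', 'b').
    canonical = sorted(tuple(sorted(p)) for p in pairs)
    # After the outer sort, equal pairs are adjacent, which groupby requires.
    return [(*key, sum(1 for _ in group)) for key, group in groupby(canonical)]

lst = [['a', 'b'], ['c', 'd'], ['b', 'a'], ['d', 'c'], ['c', 'd']]
result = count_pairs_grouped(lst)
print(result)  # [('a', 'b', 2), ('c', 'd', 3)]
```

The extra sort makes this O(n log n), so the Counter-based answers are usually the simpler choice.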

Find tuple in list with same first item and return another list

I have a list like this in Python:
[('a', 'b'), ('a', 'c'),('d','f')]
and I want to join the items that share the same first item, with a result like this:
[('a', 'b', 'c'),('d','f')]
Here is one way to do it. For efficiency, we build a dict with the first value as key. We keep the values in the order in which they appear (and the tuples in their original order as well, if you use Python >= 3.7 - otherwise you will have to use a collections.OrderedDict)
def join_by_first(sequences):
    out = {}
    for seq in sequences:
        try:
            out[seq[0]].extend(seq[1:])
        except KeyError:
            out[seq[0]] = list(seq)
    return [tuple(values) for values in out.values()]
join_by_first([('a', 'b'), ('a', 'c'),('d','f')])
# [('a', 'b', 'c'), ('d', 'f')]
You cannot edit tuples - they are immutable. You can use lists and convert everything back to tuples afterwards:
data = [('a', 'b'), ('a', 'c'),('d','f')]
new_data = []
for d in data:                                   # loop over your data
    if new_data and new_data[-1][0] == d[0]:     # last entry has the same first item:
        new_data[-1].extend(d[1:])               # extend it with the remaining items
    else:
        new_data.append(list(d))                 # otherwise start a new entry
print(new_data)                                  # all are lists
new_data = [tuple(x) for x in new_data]
print(new_data)                                  # all are tuples again
Output:
[['a', 'b', 'c'], ['d', 'f']] # all are lists
[('a', 'b', 'c'), ('d', 'f')] # all are tuples again
See Immutable vs Mutable types
I feel like the simplest solution is to build a dictionary in which:
keys are the first items in the tuples
values are lists containing all second items from the tuples
Once we have that we can then build the output list:
from collections import defaultdict
def merge(pairs):
    mapping = defaultdict(list)
    for k, v in pairs:
        mapping[k].append(v)
    return [(k, *v) for k, v in mapping.items()]
pairs = [('a', 'b'), ('a', 'c'),('d','f')]
print(merge(pairs))
This outputs:
[('a', 'b', 'c'), ('d', 'f')]
This solution runs in O(n), since each item from pairs is visited only twice: once to build the mapping and once to build the output.
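The loop above unpacks exactly two items per tuple. If some tuples are longer, a variant using extended unpacking might look like this sketch (Python 3 only; merge_by_first and the sample data are hypothetical, not from the question):

```python
from collections import defaultdict

def merge_by_first(seqs):
    merged = defaultdict(list)
    for first, *rest in seqs:
        # Collect everything after the first item under that first item.
        merged[first].extend(rest)
    # Dicts keep insertion order on Python 3.7+, so first-seen order is kept.
    return [(first, *rest) for first, rest in merged.items()]

data = [('a', 'b'), ('d', 'f', 'g'), ('a', 'c')]
result = merge_by_first(data)
print(result)  # [('a', 'b', 'c'), ('d', 'f', 'g')]
```

Unlike the loop that only compares against the previous entry, this also merges tuples whose matching first items are not adjacent.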

Converting a list of tuple of 2+ elements to a dictionary

I have a list of tuples which have more than 2 elements, where the first element of each tuple is a number which is unique across all the tuples. How can I convert that list of tuples into a dictionary where each key is the 1st element of a tuple?
I know I can use dict(mylist) but it works only for 2-element tuples.
Use CoryKramer's answer if you are using Python 2. If you are using Python 3, avoid the explicit indexing in favor of Extended Iterable Unpacking:
>>> lst = [(1, 'a', 'b'), (2, 'c', 'd')]
>>> {key:tuple(rest) for key, *rest in lst}
{1: ('a', 'b'), 2: ('c', 'd')}
You can use a dict comprehension
>>> l = [(1, 'a', 'b'),
...      (2, 'c', 'd')]
>>> {i[0]: i[1:] for i in l}
{1: ('a', 'b'), 2: ('c', 'd')}
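Both comprehensions silently keep the last tuple if a key ever repeats. The question says the first elements are unique, but if you would rather have that checked, a sketch along these lines could work (tuples_to_dict is a hypothetical helper, Python 3 syntax):

```python
def tuples_to_dict(rows):
    result = {}
    for key, *rest in rows:
        # Fail loudly instead of overwriting an earlier entry.
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = tuple(rest)
    return result

lookup = tuples_to_dict([(1, 'a', 'b'), (2, 'c', 'd')])
print(lookup)  # {1: ('a', 'b'), 2: ('c', 'd')}
```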

Sum numbers by letter in list of tuples

I have a list of tuples:
[ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
I am trying to sum up all numbers that have the same letter. I.e. I want to output
[('A', 150), ('B', 70), ('C',10)]
I have tried using set to get the unique values but then when I try and compare the first elements to the set I get
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Any quick solutions to match the numbers by letter?
Here is a one(and a half?)-liner: group by letter (for which you need to sort before), then take the sum of the second entries of your tuples.
from itertools import groupby
from operator import itemgetter
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
res = [(k, sum(map(itemgetter(1), g)))
       for k, g in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))]
print(res)
# => [('A', 150), ('B', 70), ('C', 10)]
The above is O(n log n) — sorting is the most expensive operation. If your input list is truly large, you might be better served by the following O(n) approach:
from collections import defaultdict
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
d = defaultdict(int)
for letter, value in data:
    d[letter] += value
res = list(d.items())
print(res)
# => [('B', 70), ('C', 10), ('A', 150)]  (dict order is arbitrary before Python 3.7; newer versions give insertion order)
>>> from collections import Counter
>>> items = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
>>> c = Counter()
>>> for k, num in items:
...     c[k] += num
...
>>> c.items()
[('A', 150), ('C', 10), ('B', 70)]
Less efficient (but nicer looking) one liner version:
>>> Counter(k for k, num in items for i in range(num)).items()
[('A', 150), ('C', 10), ('B', 70)]
How about this: (assuming a is the name of the list of tuples you have provided)
letters_to_numbers = {}
for i in a:
    if i[0] in letters_to_numbers:
        letters_to_numbers[i[0]] += i[1]
    else:
        letters_to_numbers[i[0]] = i[1]
b = letters_to_numbers.items()
The elements of the resulting list b will be in no particular order.
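The ordering remark applies to older interpreters. On Python 3.7 and later, dicts keep insertion order, so a plain-dict version returns the letters in first-seen order. A minimal sketch, with sum_by_letter as a hypothetical name:

```python
def sum_by_letter(pairs):
    totals = {}
    for letter, value in pairs:
        # dict.get supplies 0 the first time a letter is seen.
        totals[letter] = totals.get(letter, 0) + value
    # items() is a view on Python 3, so wrap it in list().
    return list(totals.items())

data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
result = sum_by_letter(data)
print(result)  # [('A', 150), ('B', 70), ('C', 10)]
```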
To achieve this, first create a dictionary to accumulate the sums, then convert the dict to a list of tuples using .items(). Sample code:
my_list = [ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
my_dict = {}
for key, val in my_list:
    if key in my_dict:
        my_dict[key] += val
    else:
        my_dict[key] = val
my_dict.items()
# Output: [('A', 150), ('C', 10), ('B', 70)]
What is generating the list of tuples? Is it you? If so, why not try a defaultdict(list) to append the values to the right letter at the time of making the list of tuples. Then you can simply sum them. See example below.
>>> from collections import defaultdict
>>> val_store = defaultdict(list)
>>> # next lines are me simulating the creation of the tuple
>>> val_store['A'].append(10)
>>> val_store['B'].append(20)
>>> val_store['C'].append(30)
>>> val_store
defaultdict(<class 'list'>, {'C': [30], 'A': [10], 'B': [20]})
>>> val_store['A'].append(10)
>>> val_store['C'].append(30)
>>> val_store['B'].append(20)
>>> val_store
defaultdict(<class 'list'>, {'C': [30, 30], 'A': [10, 10], 'B': [20, 20]})
>>> for val in val_store:
...     print(val, sum(val_store[val]))
...
C 60
A 20
B 40
Try this:
a = [('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
letters = set([s[0] for s in a])
new_a = []
for l in letters:
    nums = [s[1] for s in a if s[0] == l]
    new_a.append((l, sum(nums)))
print(new_a)
Results:
[('A', 150), ('C', 10), ('B', 70)]
A simpler approach
x = [('A',100),('B',50),('A',50),('B',20),('C',10)]
y = {}
for _tuple in x:
    if _tuple[0] in y:
        y[_tuple[0]] += _tuple[1]
    else:
        y[_tuple[0]] = _tuple[1]
print(list(y.items()))
A one liner:
>>> x = [ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
>>> {
... k: reduce(lambda u, v: u + v, [y[1] for y in x if y[0] == k])
... for k in [y[0] for y in x]
... }.items()
[('A', 150), ('C', 10), ('B', 70)]

Removing Python List Items that Contain 2 of the Same Elements

I have a list myList, which contains items of the form
myList = [('a','b',3), ('b','a',3), ('c','d',1), ('d','c',1), ('e','f',4)]
The first and second items are equal and so are the third and fourth, although their first and second elements are swapped. I would like to keep only one of each so that the final list looks like this:
a,b,3
c,d,1
e,f,4
Use sets and frozensets to remove equal elements:
>>> mySet = [frozenset(x) for x in myList]
>>> [tuple(x) for x in set(mySet)]
[('a', 3, 'b'), (4, 'e', 'f'), (1, 'c', 'd')]
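The result can then be sorted however you'd like. Note that a frozenset does not remember the original position of its elements, which is why the numbers move around in the output above.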
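Take each tuple in myList, convert it to a list and apply sorted(). This gives a list of sorted inner lists; converting those back to tuples and passing them through set() removes the duplicates. (This relies on Python 2, where str and int can be ordered against each other; on Python 3, sorted() raises TypeError for such mixed tuples.)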
myList = [('a','b',3), ('b','a',3), ('c','d',1), ('d','c',1), ('e','f',4)]
sorted_inner_list = [sorted(list(element)) for element in myList]
output = list(set(map(tuple,sorted_inner_list)))
If you want to keep the order of the tuples, and always keep the first tuple when there are duplicates, you can do:
>>> sets = [ frozenset(x) for x in myList ]
>>> filtered = [ myList[i] for i in range(len(myList)) if set(myList[i]) not in sets[:i] ]
>>> filtered
[('a', 'b', 3), ('c', 'd', 1), ('e', 'f', 4)]
If you prefer not to use another variable:
filtered = [ myList[i] for i in range(len(myList))
             if set(myList[i]) not in [ frozenset(x) for x in myList ][:i] ]
You can use this to keep the tuples in order within the list and eliminate the duplicates by using set:
>>> myList = [('a','b',3), ('b','a',3), ('c','d',1), ('d','c',1), ('e','f',4)]
>>> _ = lambda item: ([str,int].index(type(item)), item)
>>> sorted(set([tuple(sorted(i, key = _)) for i in myList]), key=lambda x: x[0])
Output:
[('a', 'b', 3), ('c', 'd', 1), ('e', 'f', 4)]
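For longer lists, the slice-based check above rescans earlier items for every element. A single-pass alternative with a set of already-seen keys might look like the sketch below. It assumes every item is a 3-tuple whose first two elements are interchangeable and whose third is a value that must match exactly; dedupe_unordered is a hypothetical name.

```python
def dedupe_unordered(rows):
    seen = set()
    kept = []
    for a, b, weight in rows:
        # The frozenset ignores the order of a and b; weight stays positional.
        key = (frozenset((a, b)), weight)
        if key not in seen:
            seen.add(key)
            kept.append((a, b, weight))
    return kept

my_list = [('a', 'b', 3), ('b', 'a', 3), ('c', 'd', 1), ('d', 'c', 1), ('e', 'f', 4)]
result = dedupe_unordered(my_list)
print(result)  # [('a', 'b', 3), ('c', 'd', 1), ('e', 'f', 4)]
```

This keeps the first occurrence of each pair in its original order and works on Python 3, because nothing has to compare a str with an int.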
