How to group objects (tuples) based on having consecutive attribute value? - python

I found a similar question here:
How to group a list of tuples/objects by similar index/attribute in python?
which talks about grouping a list of tuples by similar attributes. I have a list of objects; each object has a 'day' attribute, and I want to group the objects based on whether their 'day' values are consecutive.
e.g.
input = [('a',12),('b',13),('c',15),('d',16),('e',17)]
output:
[[('a',12),('b',13)],[('c',15),('d',16),('e',17)]]

You can do the following:
from itertools import groupby, count
from operator import itemgetter
data = [('a', 12), ('b', 13), ('c', 15), ('c', 16), ('c', 17)]
def key(i, cursor=count(0)):
    """Generate the same key for consecutive numbers"""
    return i[1] - next(cursor)
ordered = sorted(data, key=itemgetter(1))
result = [list(group) for _, group in groupby(ordered, key=key)]
print(result)
Output
[[('a', 12), ('b', 13)], [('c', 15), ('c', 16), ('c', 17)]]
The above is based on an old example found in the Python 2.6 documentation.
To better illustrate what is happening, consider the following example:
lst = [12, 13, 15, 16, 17]
print([v - i for i, v in enumerate(lst)])
The generated keys are:
[12, 12, 13, 13, 13]
As can be seen, each run of consecutive values shares the same key.
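In the key function above, the default argument cursor=count(0) is evaluated once, when the function is defined. Every call therefore shares the same counter, and that counter keeps advancing across separate uses. Grouping still works, because the shift is the same for every key, but this is easy to trip over. The following is a minimal alternative sketch, assuming the same (label, day) tuples. It pairs each sorted item with its position via enumerate and uses value minus position as the key, which is the same trick the keys list above illustrates:

```python
from itertools import groupby

data = [('a', 12), ('b', 13), ('c', 15), ('d', 16), ('e', 17)]

# Sort by the numeric attribute so consecutive values sit next to each other.
ordered = sorted(data, key=lambda item: item[1])

# pair is (position, item); for a run of consecutive values,
# value - position stays constant, so groupby() sees one key per run.
result = [
    [item for _, item in group]
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1][1] - pair[0])
]
print(result)  # [[('a', 12), ('b', 13)], [('c', 15), ('d', 16), ('e', 17)]]
```

Because enumerate() restarts at 0 on every call, this version has no state that outlives a single grouping.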

Related

How do I print out the elements that is in both tuples?

a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))
#wanted_result = (('we', 3), ('b', 4), ('we', 23), ('b', 2))
How can I get the tuples whose string appears in both a and b, as in wanted_result above?
I would prefer a list comprehension with a filter, if that is possible.
You can use set intersection:
keys = dict(a).keys() & dict(b)
tuple(t for t in a + b if t[0] in keys)
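The expression above walks a + b, so the tuples from a come first. If you need exactly the order of wanted_result in the question (tuples from b first, then a), a minimal variant under the same two-element-tuple assumption simply concatenates b + a instead:

```python
a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))

# Strings that appear as the first element in both a and b.
common = {s for s, _ in a} & {s for s, _ in b}

# Walk b first, then a, to match the order of wanted_result.
result = tuple(t for t in b + a if t[0] in common)
print(result)  # (('we', 3), ('b', 4), ('we', 23), ('b', 2))
```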
You can build the set of strings that appear as the first element of the tuples in both a and b. Then use a list comprehension to extract the tuples whose first element is in this common set:
a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))
common = set(next(zip(*a))) & set(next(zip(*b)))
result = [t for t in a+b if t[0] in common]
[('we', 23), ('b', 2), ('we', 3), ('b', 4)]
You can also do something similar using the Counter class from collections, by keeping the tuples whose string has a count greater than 1 (this assumes no string is repeated within a or within b):
from collections import Counter
common = Counter(next(zip(*a,*b)))
result = [(s,n) for (s,n) in a+b if common[s]>1]
If you want a single list comprehension, and given that your tuples have exactly two values, you can pair each tuple with a dictionary formed from the other and use the dictionary as a filter mechanism:
result = [t for d,tl in [(dict(b),a),(dict(a),b)] for t in tl if t[0] in d]
Adding two list comprehensions (i.e. concatenating lists):
print([bi for bi in b if any(bi[0]==i[0] for i in a)] +
[ai for ai in a if any(ai[0]==i[0] for i in b)])
# Output: [('we', 3), ('b', 4), ('we', 23), ('b', 2)]
Explanation
[bi for bi in b if any(bi[0]==i[0] for i in a)]
# Take the tuples from b whose first element equals one of the
# first elements of a
[ai for ai in a if any(ai[0]==i[0] for i in b)]
# Similarly, take the tuples from a whose first element equals one of the
# first elements of b
Another variation with sets:
filtered_keys = set(k for k, v in a) & set(k for k, v in b)
res = tuple((k, v) for k, v in [*a, *b] if k in filtered_keys)
# (('we', 23), ('b', 2), ('we', 3), ('b', 4))

How to group list of tuples?

Note: I know how to do this with an explicit for loop, of course, but I am looking for a more readable solution.
If possible, I'd like to solve this with built-in functionality. The best-case scenario is something like
result = [ *groupby logic* ]
Assuming the following list:
import numpy as np
np.random.seed(42)
N = 10
my_tuples = list(zip(np.random.choice(list('ABC'), size=N),
np.random.choice(range(100), size=N)))
where my_tuples is
[('C', 74),
('A', 74),
('C', 87),
('C', 99),
('A', 23),
('A', 2),
('C', 21),
('B', 52),
('C', 1),
('C', 87)]
How can I group the indices (integer value at index 1 of each tuple) by the labels A, B and C using groupby from itertools?
If I do something like this:
from itertools import groupby
#..
[(k,*v) for k, v in dict(groupby(my_tuples, lambda x: x[0])).items()]
I see that this delivers the wrong result.
The desired outcome should be
{
'A': [74, 23, 2],
# ..
}
The simplest solution is probably not to use groupby at all.
from collections import defaultdict
d = defaultdict(list)
for k, v in my_tuples:
    d[k].append(v)
The reason I wouldn't use groupby is because groupby(iterable) groups items in iterable that are adjacent. So to get all of the 'C' values together, you would first have to sort your list. Unless you have some reason to use groupby, it's unnecessary.
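The adjacency behaviour described above is easy to see by listing the keys groupby() produces on the unsorted data. A label that appears in several runs shows up several times, which is why dict(groupby(...)) cannot collect all of its values. This sketch hard-codes the my_tuples values shown in the question so that it does not depend on NumPy:

```python
from collections import defaultdict
from itertools import groupby

my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]

# groupby() starts a new group every time the key changes,
# so 'C' and 'A' appear several times on unsorted input.
run_keys = [k for k, _ in groupby(my_tuples, key=lambda t: t[0])]
print(run_keys)  # ['C', 'A', 'C', 'A', 'C', 'B', 'C']

# defaultdict collects every value under its label in a single pass.
d = defaultdict(list)
for k, v in my_tuples:
    d[k].append(v)
print(dict(d))
```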
You should use collections.defaultdict for an O(n) solution; see @PatrickHaugh's answer.
Using itertools.groupby requires sorting before grouping, incurring O(n log n) complexity:
from itertools import groupby
from operator import itemgetter
sorter = sorted(my_tuples, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))
res = {k: list(map(itemgetter(1), v)) for k, v in grouper}
print(res)
{'A': [74, 23, 2],
'B': [52],
'C': [74, 87, 99, 21, 1, 87]}

Concatenate tuples in list of tuples

I have a list of tuples that looks something like this:
tuples = [('a', 10, 11), ('b', 13, 14), ('a', 1, 2)]
Is there a way I can join them based on the first element of every tuple, so that each tuple contains 5 elements? I know for a fact there are never more than 2 tuples per letter, i.e. no more than 2 'a's or 2 'b's in the entire list. The other requirement is that it must work on Python 2.6. I can't figure out the logic. Any help is greatly appreciated.
Desired Output:
tuples = [('a', 10, 11, 1, 2), ('b', 13, 14, 0, 0)]
I have tried creating a new list of first elements and adding the other elements to it, but then I only have a list, not a list of tuples.
EDIT to provide previously tried code:
Create new lists:
templist, resultlist = [], []
Populate templist with the first element of every tuple:
for i in tuples:
    templist.append(i[0])
elemlist = list(set(templist))
for i in elemlist:
    for j in tuples:
        if i == j[0]:
            resultlist.append((i, j[1], j[2]))
This just returns the same list of tuples. How can I hold onto it and append every j[1], j[2] that corresponds to the correct j[0]?
Assuming there are only one or two of every letter in the list as stated:
import itertools
tuples = [('a', 10, 11), ('b', 13, 14), ('a', 1, 2)]
result = []
key = lambda t: t[0]
for letter, items in itertools.groupby(sorted(tuples, key=key), key):
    items = list(items)
    if len(items) == 1:
        result.append(items[0] + (0, 0))
    else:
        result.append(items[0] + items[1][1:])
print(result)
Output:
[('a', 10, 11, 1, 2), ('b', 13, 14, 0, 0)]
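The following is an alternative sketch that does not sort. It merges tuples in a plain dict keyed on the letter, keeps the letters in order of first appearance, and pads each result with zeros to five elements. As in the question, it assumes at most two tuples per letter. It avoids dict comprehensions and collections.OrderedDict, which only arrived in Python 2.7, so the same syntax should also run on Python 2.6:

```python
tuples = [('a', 10, 11), ('b', 13, 14), ('a', 1, 2)]

merged = {}   # letter -> merged tuple so far
order = []    # letters in order of first appearance
for t in tuples:
    key = t[0]
    if key not in merged:
        merged[key] = t
        order.append(key)
    else:
        # Append everything except the repeated letter.
        merged[key] = merged[key] + t[1:]

result = []
for key in order:
    t = merged[key]
    # Pad with zeros up to 5 elements for letters seen only once.
    result.append(t + (0,) * (5 - len(t)))
print(result)  # [('a', 10, 11, 1, 2), ('b', 13, 14, 0, 0)]
```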

python dictionary sorted by list

I have a list
category = ['Toy','Cloth','Food','Auto']
I also have a dictionary, where A, B, C, ... are item names; the first element in each list is the category and the second is the price:
inventory = {'A':['Food', 5], 'B':['Food', 6],
'C':['Auto', 5], 'D':['Cloth', 14],
'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
I would like this sorted first by the order of the categories in the list and then, within each category, by price (highest first), so that the result looks like this:
inventory_sorted = {'G':['Toy',20],'E':['Toy',19], 'H':['Toy',11], 'D':['Cloth', 14],
'F':['Cloth', 13], 'B':['Food', 6],'A':['Food', 5],'C':['Auto', 5],}
Could you please offer me a two-step process, where the first step sorts by the list's category order and the second sorts (in reverse) by price with the category order preserved? If you are using a lambda, please offer a bit of narrative so that I can understand it better. I am new to lambda expressions. Thank you so much.
You cannot sort a Python dict object in place, and before Python 3.7 dicts did not preserve any order at all. At most, you can produce a sorted sequence of (key, value) pairs. You could then feed those pairs to a collections.OrderedDict() object if you want a mapping that keeps the order (on Python 3.7+, a plain dict built from the pairs keeps it too).
Convert your category order to a mapping to get an order, then use that in a sort key together with the price. Since you want your prices sorted in descending order, you need to return the negative price:
cat_order = {cat: i for i, cat in enumerate(category)}
inventory_sorted = sorted(inventory.items(),
key=lambda i: (cat_order[i[1][0]], -i[1][1]))
The i argument is passed each key-value pair; i[1] is then the value, and i[1][0] the category, i[1][1] the price.
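Since you asked for some narrative on the lambda, the following spells out the same key as an ordinary named function. The name sort_key is only illustrative. The function unpacks each (key, [category, price]) pair and returns a tuple. Python compares tuples element by element, so the category position decides first and the negated price breaks ties:

```python
category = ['Toy', 'Cloth', 'Food', 'Auto']
inventory = {'A': ['Food', 5], 'B': ['Food', 6],
             'C': ['Auto', 5], 'D': ['Cloth', 14],
             'E': ['Toy', 19], 'F': ['Cloth', 13], 'G': ['Toy', 20], 'H': ['Toy', 11]}

# Position of each category in the list: {'Toy': 0, 'Cloth': 1, 'Food': 2, 'Auto': 3}
cat_order = {cat: i for i, cat in enumerate(category)}

def sort_key(item):
    """Same logic as the lambda: item is one (key, [category, price]) pair."""
    name, (cat, price) = item
    # Category position first; negated price so higher prices sort earlier.
    return (cat_order[cat], -price)

inventory_sorted = sorted(inventory.items(), key=sort_key)
print([name for name, _ in inventory_sorted])  # ['G', 'E', 'H', 'D', 'F', 'B', 'A', 'C']
```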
This produces key-value pairs in the specified order:
>>> category = ['Toy','Cloth','Food','Auto']
>>> inventory = {'A':['Food', 5], 'B':['Food', 6],
... 'C':['Auto', 5], 'D':['Cloth', 14],
... 'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
>>> cat_order = {cat: i for i, cat in enumerate(category)}
>>> sorted(inventory.items(), key=lambda i: (cat_order[i[1][0]], -i[1][1]))
[('G', ['Toy', 20]), ('E', ['Toy', 19]), ('H', ['Toy', 11]), ('D', ['Cloth', 14]), ('F', ['Cloth', 13]), ('B', ['Food', 6]), ('A', ['Food', 5]), ('C', ['Auto', 5])]
>>> from pprint import pprint
>>> pprint(_)
[('G', ['Toy', 20]),
('E', ['Toy', 19]),
('H', ['Toy', 11]),
('D', ['Cloth', 14]),
('F', ['Cloth', 13]),
('B', ['Food', 6]),
('A', ['Food', 5]),
('C', ['Auto', 5])]
An OrderedDict() object directly accepts this sequence:
>>> from collections import OrderedDict
>>> OrderedDict(sorted(inventory.items(), key=lambda i: (cat_order[i[1][0]], -i[1][1])))
OrderedDict([('G', ['Toy', 20]), ('E', ['Toy', 19]), ('H', ['Toy', 11]), ('D', ['Cloth', 14]), ('F', ['Cloth', 13]), ('B', ['Food', 6]), ('A', ['Food', 5]), ('C', ['Auto', 5])])
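On Python 3.7 and later, a plain dict is guaranteed to preserve insertion order. Building one from the same sorted pairs therefore keeps the sorted order, as in this sketch:

```python
category = ['Toy', 'Cloth', 'Food', 'Auto']
inventory = {'A': ['Food', 5], 'B': ['Food', 6],
             'C': ['Auto', 5], 'D': ['Cloth', 14],
             'E': ['Toy', 19], 'F': ['Cloth', 13], 'G': ['Toy', 20], 'H': ['Toy', 11]}
cat_order = {cat: i for i, cat in enumerate(category)}

# Requires Python 3.7+: the new dict remembers the order its keys were inserted in.
inventory_sorted = dict(sorted(inventory.items(),
                               key=lambda i: (cat_order[i[1][0]], -i[1][1])))
print(list(inventory_sorted))  # ['G', 'E', 'H', 'D', 'F', 'B', 'A', 'C']
```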
You can kind of get this with the following:
sorted(inventory.items(), key=lambda t: category.index(t[1][0]))
This works because:
sorted() turns the (key, value) pairs from inventory.items() into a list of tuples, which can retain an order;
the key function orders each pair based on where t[1][0] appears in your category list; and
t will be something like ('G', ['Toy', 20]), so t[1] is ['Toy', 20] and t[1][0] is 'Toy'.
Note that this sorts by category only: items within a category keep the dictionary's iteration order rather than being ordered by price.
But before Python 3.7 (where dicts became guaranteed to preserve insertion order), you cannot go back to a standard dict from this (even though it would be very easy), because you would lose your ordering again, rendering the sort pointless. So you will either have to work with the data in this format, or use something like collections.OrderedDict as already mentioned.
Another completely different way of doing this, which is rather powerful, is to:
use a Python class to define the data structure,
store the data in a list, and
sort the list with key=attrgetter('attribute').
Here's a snippet of example code:
from operator import attrgetter
class Item:
    def __init__(self, label, category, number):
        self.label = label
        self.category = category
        self.number = number
    def __repr__(self):
        return "Item(%s,%s,%d)" % (self.label, self.category, self.number)
    def __str__(self):
        return "%s: %s,%d" % (self.label, self.category, self.number)
categories = ['Toy', 'Cloth', 'Food', 'Auto']
inventory = []
inventory.append(Item("A", "Food", 5))
inventory.append(Item("B", "Food", 6))
inventory.append(Item("C", "Auto", 5))
inventory.append(Item("D", "Cloth", 14))
inventory.append(Item("E", "Toy", 19))
inventory.append(Item("F", "Cloth", 13))
inventory.append(Item("G", "Toy", 20))
inventory.append(Item("H", "Toy", 11))
# Secondary key first: number, highest first
inventory.sort(key=attrgetter('number'), reverse=True)
# Primary key last: position in categories (attrgetter('category') would sort alphabetically)
inventory.sort(key=lambda item: categories.index(item.category))
The advantage of this is that Python's sort is stable: items with equal keys keep the order from the previous sort. Calling it twice (as I've done above) therefore sorts by category primarily and by number secondarily. You can chain as many sort keys as you want this way, sorting by the least significant key first.
You can also add whatever other information you want to your Items.
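The two-pass approach relies on sort stability. The following sketch checks that two stable passes give the same order as a single sort on a tuple key. For brevity it uses plain (label, category, number) tuples instead of the Item class, together with the category list from the question:

```python
categories = ['Toy', 'Cloth', 'Food', 'Auto']
items = [('A', 'Food', 5), ('B', 'Food', 6), ('C', 'Auto', 5), ('D', 'Cloth', 14),
         ('E', 'Toy', 19), ('F', 'Cloth', 13), ('G', 'Toy', 20), ('H', 'Toy', 11)]

# Pass 1: secondary key (number, descending). reverse=True keeps the sort stable.
two_pass = sorted(items, key=lambda t: t[2], reverse=True)
# Pass 2: primary key (category position); ties keep their pass-1 order.
two_pass = sorted(two_pass, key=lambda t: categories.index(t[1]))

# Equivalent single pass with a tuple key.
one_pass = sorted(items, key=lambda t: (categories.index(t[1]), -t[2]))

print([t[0] for t in two_pass])  # ['G', 'E', 'H', 'D', 'F', 'B', 'A', 'C']
print(two_pass == one_pass)      # True
```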
categories = ['Toy','Cloth','Food','Auto']
inventory = {'A':['Food', 5], 'B':['Food', 6],
'C':['Auto', 5], 'D':['Cloth', 14],
'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
from collections import OrderedDict
inventory_sorted = OrderedDict()
for category in categories:
    same_category = [(key, num) for key, (cat, num) in inventory.items() if cat == category]
    for (key, num) in sorted(same_category, key=lambda (_, num): num, reverse=True):
        inventory_sorted[key] = [category, num]
for key, value in inventory_sorted.items():
    print key, value
Since dictionaries are unordered (in Python 2, which this code uses, and in general before Python 3.7), we use an OrderedDict to achieve your goal.
How it works:
same_category is a simple list comprehension that keeps only the items whose category matches the current loop category; it builds a list of (key, num) tuples.
Then we sort this new list by number. The key=lambda (_, num): num part unpacks each tuple, discards the key into _, and returns num (tuple parameters like this only work in Python 2). We reverse the sort so high numbers come first.
Finally, we add each [category, num] pair to the OrderedDict inventory_sorted under its key.
Result:
G ['Toy', 20]
E ['Toy', 19]
H ['Toy', 11]
D ['Cloth', 14]
F ['Cloth', 13]
B ['Food', 6]
A ['Food', 5]
C ['Auto', 5]
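The tuple-parameter lambda and the print statement above are Python 2 only. The following is a Python 3 sketch of the same loop, with the lambda indexing into the pair instead of unpacking it in its signature:

```python
from collections import OrderedDict

categories = ['Toy', 'Cloth', 'Food', 'Auto']
inventory = {'A': ['Food', 5], 'B': ['Food', 6],
             'C': ['Auto', 5], 'D': ['Cloth', 14],
             'E': ['Toy', 19], 'F': ['Cloth', 13], 'G': ['Toy', 20], 'H': ['Toy', 11]}

inventory_sorted = OrderedDict()
for category in categories:
    # (key, num) pairs belonging to the current category.
    same_category = [(key, num) for key, (cat, num) in inventory.items() if cat == category]
    # pair[1] is num; reverse=True puts the highest numbers first.
    for key, num in sorted(same_category, key=lambda pair: pair[1], reverse=True):
        inventory_sorted[key] = [category, num]

for key, value in inventory_sorted.items():
    print(key, value)
```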

Python accessing slices of a list in order

In Python 2.7, I have a list named data, with some tuples, indexed by the first attribute, e.g.
data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
('B', 1, 2, 3), ('B', 10, 20, 30),
('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]
There is a function f() that will work on any slice of data with the first index matching, e.g.
f([x[1:] for x in data[:3]])
I want to call f (in proper sequence) on each slice of the array (group of tuples with the same first index) and compile the list of resulting values in a list.
I'm just starting with Python. Here is my solution, is there a better (faster or more elegant) way to do this?
slices = [x for x in xrange(len(data)) if x == 0 or data[x][0] != data[x-1][0]] + [len(data)]
result = [f(data[start:end]) for start, end in zip(slices[:-1], slices[1:])]
Thank you.
If you want to group on the first item of each tuple, you can do so with itertools.groupby():
from itertools import groupby
from operator import itemgetter
[f(list(g)) for k, g in groupby(data, key=itemgetter(0))]
itemgetter(0) returns the first element of each tuple, and groupby() yields one (key, group) pair for each run of tuples sharing that value. Looping over each individual g then gives you just the 'A' tuples, then just the 'B' tuples, and so on. Note that groupby() only merges adjacent items, which works here because data is already ordered by its first element.
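Here is a runnable sketch of the groupby() approach. The question does not show f(), so the f below is purely hypothetical: it sums the first remaining column of each group. The label is stripped with x[1:] so that f() receives the same shape as in the question's f([x[1:] for x in data[:3]]):

```python
from itertools import groupby
from operator import itemgetter

data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
        ('B', 1, 2, 3), ('B', 10, 20, 30),
        ('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]

def f(rows):
    """Hypothetical stand-in for the question's f(): sums the first column."""
    return sum(row[0] for row in rows)

# One call to f() per run of tuples sharing the same first element.
result = [f([x[1:] for x in g]) for _, g in groupby(data, key=itemgetter(0))]
print(result)  # [111, 11, 116]
```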
