Note: I know how I can do this of course in an explicit for loop but I am looking for a solution that is a bit more readable.
If possible, I'd like to solve this by using some of the built-in functionalities. Best case scenario is something like
result = [ *groupby logic* ]
Assuming the following list:
import numpy as np
np.random.seed(42)
N = 10
my_tuples = list(zip(np.random.choice(list('ABC'), size=N),
                     np.random.choice(range(100), size=N)))
where my_tuples is
[('C', 74),
('A', 74),
('C', 87),
('C', 99),
('A', 23),
('A', 2),
('C', 21),
('B', 52),
('C', 1),
('C', 87)]
How can I group the indices (integer value at index 1 of each tuple) by the labels A, B and C using groupby from itertools?
If I do something like this:
from itertools import groupby
#..
[(k,*v) for k, v in dict(groupby(my_tuples, lambda x: x[0])).items()]
I see that this delivers the wrong result.
The desired outcome should be
{
'A': [74, 23, 2],
# ..
}
The simplest solution is probably not to use groupby at all.
from collections import defaultdict
d = defaultdict(list)
for k, v in my_tuples:
    d[k].append(v)
I wouldn't use groupby here because groupby(iterable) only groups items that are adjacent in iterable. To get all of the 'C' values together, you would first have to sort your list. Unless you have some other reason to use groupby, it's unnecessary.
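To make the adjacency point concrete, here is a minimal sketch using the literal list printed in the question (so it does not depend on the NumPy seed). The names runs and grouped are only illustrative.

```python
from collections import defaultdict
from itertools import groupby

my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]

# groupby on the unsorted list starts a new group at every label change,
# so the same label shows up in several separate runs.
runs = [(label, [value for _, value in group])
        for label, group in groupby(my_tuples, key=lambda t: t[0])]

# defaultdict collects every value under its label in a single pass,
# regardless of where the label appears in the list.
d = defaultdict(list)
for label, value in my_tuples:
    d[label].append(value)
grouped = dict(d)

print(runs)
print(grouped)
```

Passing the groupby object straight to dict(), as in the question, keeps only the last run per label, and by then that run's iterator has typically been exhausted.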
You should use collections.defaultdict for an O(n) solution; see @PatrickHaugh's answer.
Using itertools.groupby requires sorting before grouping, incurring O(n log n) complexity:
from itertools import groupby
from operator import itemgetter
sorter = sorted(my_tuples, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))
res = {k: list(map(itemgetter(1), v)) for k, v in grouper}
print(res)
{'A': [74, 23, 2],
'B': [52],
'C': [74, 87, 99, 21, 1, 87]}
Related
I found a similar question here:
How to group a list of tuples/objects by similar index/attribute in python?
which talks about grouping a list of tuples by similar attributes. I have a list of objects; the objects have a 'day' attribute and I want to group these objects based on if they have consecutive 'day' values.
e.g.
input = [('a',12),('b',13),('c',15),('d',16),('e',17)]
output:
[[('a',12),('b',13)],[('c',15),('d',16),('e',17)]]
You can do the following:
from itertools import groupby, count
from operator import itemgetter
data = [('a', 12), ('b', 13), ('c', 15), ('c', 16), ('c', 17)]
def key(i, cursor=count(0)):
    """Generate the same key for consecutive numbers"""
    return i[1] - next(cursor)
ordered = sorted(data, key=itemgetter(1))
result = [list(group) for _, group in groupby(ordered, key=key)]
print(result)
Output
[[('a', 12), ('b', 13)], [('c', 15), ('c', 16), ('c', 17)]]
The above is based on an old example found in the Python 2.6 documentation.
To better illustrate what is happening, consider the following example:
lst = [12, 13, 15, 16, 17]
print([v - i for i, v in enumerate(lst)])
The generated keys are:
[12, 12, 13, 13, 13]
As can be seen, each run of consecutive numbers shares the same key.
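One caveat with the key function above: its counter lives in a default argument, so it keeps counting across calls and the function gives correct keys only for a single pass. A minimal sketch of a reusable variant, assuming the same (label, day) tuples, pairs each item with its position via enumerate instead. The name group_consecutive is hypothetical.

```python
from itertools import groupby

def group_consecutive(pairs):
    """Group (label, number) tuples into runs of consecutive numbers."""
    ordered = sorted(pairs, key=lambda p: p[1])
    # value - position is constant within a run of consecutive numbers
    keyed = groupby(enumerate(ordered), key=lambda ip: ip[1][1] - ip[0])
    return [[pair for _, pair in run] for _, run in keyed]

data = [('a', 12), ('b', 13), ('c', 15), ('d', 16), ('e', 17)]
first = group_consecutive(data)
second = group_consecutive(data)  # safe to call again: no shared counter
print(first)
```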
I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. Now if I want to get unique combinations, I can do the frozenset on my list as follows:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
I know that a set is an unordered data structure and that this is normal, but I want to preserve the order of the elements so that I can afterwards insert them into a pandas DataFrame. The DataFrame will look like this:
  Name  Marks1  Marks2
0    a       1       2
1    b       3       4
2    x       5       6
Instead of operating on the set of frozensets directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Basically this would solve the issue when you use key=frozenset:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.
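If copying the recipe feels heavy, the same idea can be sketched with a plain dict, assuming Python 3.7+ where dicts keep insertion order. The names first_seen and unique are only illustrative.

```python
x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]

# Map each order-insensitive key to the first tuple that produced it.
# setdefault leaves an existing entry untouched, so later duplicates are ignored.
first_seen = {}
for t in x:
    first_seen.setdefault(frozenset(t), t)

unique = list(first_seen.values())
print(unique)
```

As with key=frozenset in the recipe, repeated values inside one tuple collapse, so ('a', 1, 1) and ('a', 1) would be treated as the same combination.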
No ordering with frozensets. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:
y = set()
lst = []
for i in x:
    t = tuple(sorted(i, key=str))
    if t not in y:
        y.add(t)
        lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first entry gets preserved.
There are some quite useful functions in NumPy which can help you to solve this problem.
import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'],
dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
You can do it simply by using zip to maintain the order in the frozenset.
Give this a try:
l = ['col1','col2','col3','col4']
>>> frozenset(l)
--> frozenset({'col2', 'col4', 'col3', 'col1'})
>>> frozenset(zip(*zip(l)))
--> frozenset({('col1', 'col2', 'col3', 'col4')})
Taking an example from the question asked:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> frozenset(zip(*zip(x)))
--> frozenset({(('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1))})
I have a list
category = ['Toy','Cloth','Food','Auto']
I also have a dictionary, where the keys A, B, C... are item names, the first element in each list is the category, and the second is the price.
inventory = {'A':['Food', 5], 'B':['Food', 6],
'C':['Auto', 5], 'D':['Cloth', 14],
'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
I would like this to be sorted first by the order of the category in the list, then secondarily, I would like them to be ordered by the price (while category order maintained) such that the result looks like this...
inventory_sorted = {'G':['Toy',20],'E':['Toy',19], 'H':['Toy',11], 'D':['Cloth', 14],
'F':['Cloth', 13], 'B':['Food', 6],'A':['Food', 5],'C':['Auto', 5],}
Could you please offer me a two-step process, where the first step sorts by the category order in the list and the second sorts by price in descending order while preserving the category sorting? If you are using a lambda, please offer me a bit of narrative so that I can understand it better. I am new to lambda expressions. Thank you so much.
You cannot sort a Python dict object in place, and before Python 3.7 dicts did not preserve any order. At most, you can produce a sorted sequence of (key, value) pairs. You could then feed those pairs to a collections.OrderedDict() object if you want to have a mapping that includes the order.
Convert your category order to a mapping to get an order, then use that in a sort key together with the price. Since you want your prices sorted in descending order, you need to return the negative price:
cat_order = {cat: i for i, cat in enumerate(category)}
inventory_sorted = sorted(inventory.items(),
                          key=lambda i: (cat_order[i[1][0]], -i[1][1]))
The i argument is passed each key-value pair; i[1] is then the value, and i[1][0] the category, i[1][1] the price.
This produces key-value pairs in the specified order:
>>> category = ['Toy','Cloth','Food','Auto']
>>> inventory = {'A':['Food', 5], 'B':['Food', 6],
... 'C':['Auto', 5], 'D':['Cloth', 14],
... 'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
>>> cat_order = {cat: i for i, cat in enumerate(category)}
>>> sorted(inventory.items(), key=lambda i: (cat_order[i[1][0]], -i[1][1]))
[('G', ['Toy', 20]), ('E', ['Toy', 19]), ('H', ['Toy', 11]), ('D', ['Cloth', 14]), ('F', ['Cloth', 13]), ('B', ['Food', 6]), ('A', ['Food', 5]), ('C', ['Auto', 5])]
>>> from pprint import pprint
>>> pprint(_)
[('G', ['Toy', 20]),
('E', ['Toy', 19]),
('H', ['Toy', 11]),
('D', ['Cloth', 14]),
('F', ['Cloth', 13]),
('B', ['Food', 6]),
('A', ['Food', 5]),
('C', ['Auto', 5])]
An OrderedDict() object directly accepts this sequence:
>>> from collections import OrderedDict
>>> OrderedDict(sorted(inventory.items(), key=lambda i: (cat_order[i[1][0]], -i[1][1])))
OrderedDict([('G', ['Toy', 20]), ('E', ['Toy', 19]), ('H', ['Toy', 11]), ('D', ['Cloth', 14]), ('F', ['Cloth', 13]), ('B', ['Food', 6]), ('A', ['Food', 5]), ('C', ['Auto', 5])])
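Since the question asked for a two-step process with some narrative, here is a sketch that splits the tuple key into two passes and uses named functions instead of lambdas. It relies on sorted() being stable and assumes Python 3.7+ for the final dict. The helper names price and category_rank are hypothetical.

```python
category = ['Toy', 'Cloth', 'Food', 'Auto']
inventory = {'A': ['Food', 5], 'B': ['Food', 6],
             'C': ['Auto', 5], 'D': ['Cloth', 14],
             'E': ['Toy', 19], 'F': ['Cloth', 13], 'G': ['Toy', 20], 'H': ['Toy', 11]}

cat_order = {cat: i for i, cat in enumerate(category)}

def price(item):
    # item is a (name, [category, price]) pair from inventory.items()
    _, (_, item_price) = item
    return item_price

def category_rank(item):
    _, (cat, _) = item
    return cat_order[cat]

# Step 1: sort by the secondary key (price, highest first).
by_price = sorted(inventory.items(), key=price, reverse=True)
# Step 2: sort by the primary key (category order). Because the sort is
# stable, items in the same category keep their price order from step 1.
by_category = sorted(by_price, key=category_rank)

inventory_sorted = dict(by_category)  # insertion-ordered on Python 3.7+
print(inventory_sorted)
```

The single sort with a tuple key gives the same order in one pass. The two-pass form is mainly easier to read, and it also works when a key cannot simply be negated.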
You can kind of get this with the following:
sorted(inventory.items(), key=lambda t: category.index(t[1][0]))
This works because:
inventory.items() turns your dict into a sequence of (key, value) tuples, which can retain an order,
the key function orders based on where t[1][0] appears in your category list, and
t will be something like ('G', ['Toy', 20]), so t[1] is ['Toy', 20] and t[1][0] is 'Toy'.
But you cannot go back to a standard dict from this (even though it would be very easy) because you would lose your ordering again, rendering the sort pointless. So you will either have to work with the data in this format, or use something like collections.OrderedDict as already mentioned.
Another, completely different way of doing this, which is rather powerful, is to:
use a Python class to make the data structure,
store the data in a list, and
sort the list with key=attrgetter('variable').
Here's a snippet of example code:
from operator import attrgetter

class Item:
    def __init__(self, label, category, number):
        self.label = label
        self.category = category
        self.number = number
    def __repr__(self):
        return "Item(%s,%s,%d)" % (self.label, self.category, self.number)
    def __str__(self):
        return "%s: %s,%d" % (self.label, self.category, self.number)

inventory = []
inventory.append(Item("A", "Food", 5))
inventory.append(Item("B", "Food", 6))
inventory.append(Item("C", "Auto", 5))
inventory.append(Item("D", "Cloth", 14))
inventory.append(Item("E", "Toy", 19))
inventory.append(Item("F", "Cloth", 13))
inventory.append(Item("G", "Toy", 20))
inventory.append(Item("H", "Toy", 11))
# Secondary key first: highest number first
inventory.sort(key=attrgetter('number'), reverse=True)
# Primary key last: category, in alphabetical order
inventory.sort(key=attrgetter('category'))
The advantage of this is that Python's sort is stable: items that compare equal keep their order from the previous sort, so calling it twice (as I've done above) sorts by category primarily and by number secondarily. You could do this for as many sort keys as you want.
You can also add whatever other information you want to your Items.
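Note that attrgetter('category') orders the categories alphabetically (Auto, Cloth, Food, Toy), not in the order of the question's category list. A minimal sketch of the same two-pass idea with a rank lookup for the primary key follows. It uses a dataclass in place of the hand-written class, which is a swapped-in convenience and assumes Python 3.7+. The names rank and labels are only illustrative.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Item:
    label: str
    category: str
    number: int

category = ['Toy', 'Cloth', 'Food', 'Auto']
rank = {cat: i for i, cat in enumerate(category)}

items = [Item("A", "Food", 5), Item("B", "Food", 6), Item("C", "Auto", 5),
         Item("D", "Cloth", 14), Item("E", "Toy", 19), Item("F", "Cloth", 13),
         Item("G", "Toy", 20), Item("H", "Toy", 11)]

# Secondary key first (number, descending), then the primary key
# (position of the category in the list); the stable sort keeps the
# number order within each category.
items.sort(key=attrgetter('number'), reverse=True)
items.sort(key=lambda item: rank[item.category])

labels = [item.label for item in items]
print(labels)
```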
categories = ['Toy','Cloth','Food','Auto']
inventory = {'A':['Food', 5], 'B':['Food', 6],
             'C':['Auto', 5], 'D':['Cloth', 14],
             'E':['Toy',19], 'F':['Cloth', 13], 'G':['Toy',20], 'H':['Toy',11]}
from collections import OrderedDict
inventory_sorted = OrderedDict()
for category in categories:
    same_category = [(key, num) for key, (cat, num) in inventory.items() if cat == category]
    for key, num in sorted(same_category, key=lambda pair: pair[1], reverse=True):
        inventory_sorted[key] = [category, num]
for key, value in inventory_sorted.items():
    print(key, value)
Since dictionaries are unordered, we will be using OrderedDict to achieve your goal.
How it works:
same_category is a simple list comprehension that keeps the items whose category matches the current loop category; it forms a list of (key, num) tuples.
Then we sort this new list by the number. The part that does this is key=lambda pair: pair[1], which takes each (key, num) tuple and returns the num to sort by; reverse=True puts high numbers first.
Then we add each [category, num] pair to the OrderedDict inventory_sorted under its key.
Result:
G ['Toy', 20]
E ['Toy', 19]
H ['Toy', 11]
D ['Cloth', 14]
F ['Cloth', 13]
B ['Food', 6]
A ['Food', 5]
C ['Auto', 5]
In the code below, the first for loop sorts the dictionary by value, and it works very well.
import operator
myExample = {'Item1': 3867, 'Item2': 20, 'Item3': 400, 'Item4': 100, 'Item5': 2870,
'Item6': 22, 'Item7': 528, 'Item8': 114}
for w in sorted(myExample, key=myExample.get, reverse=False):
    print w, myExample[w]
print ("\n")
for w in sorted(myExample, key=operator.itemgetter(0)):
    print w, myExample[w]
But in another thread I was advised to use the operator.itemgetter(index) method for sorting, for efficiency reasons. However, the second for loop never works in my case.
Perhaps I should go through the documentation first and this is what I get:
>>> itemgetter(1)('ABCDEFG')
'B'
>>> itemgetter(1,3,5)('ABCDEFG')
('B', 'D', 'F')
>>> itemgetter(slice(2,None))('ABCDEFG')
'CDEFG'
The example is simple, but to be honest, I don't know how to link it back to the dictionary case. How should I use the index inside itemgetter, and what impact do different indexes have? I tried every index from 0 to 4 and none of them gave me an ascending sort; from index 5 onwards an error occurs.
The documentation has an example for the tuple case, but it doesn't work for the dictionary.
>>> inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)]
>>> getcount = itemgetter(1)
>>> map(getcount, inventory)
[3, 2, 5, 1]
>>> sorted(inventory, key=getcount)
[('orange', 1), ('banana', 2), ('apple', 3), ('pear', 5)]
Back to the original question: I still hope to understand how to use the index inside itemgetter and what it does in different cases, such as a tuple vs. a dictionary vs. a list vs. a plain string vs. tuples inside a list.
Please advise.
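A minimal sketch of what seems to be going on, using the dictionary from the question. Iterating over a dict yields its keys, so itemgetter(0) is applied to strings such as 'Item1' and returns their first character. The variable names first_chars, by_value and by_key are only illustrative.

```python
from operator import itemgetter

myExample = {'Item1': 3867, 'Item2': 20, 'Item3': 400, 'Item4': 100, 'Item5': 2870,
             'Item6': 22, 'Item7': 528, 'Item8': 114}

# sorted(myExample, key=itemgetter(0)) compares the first character of each
# key; every key starts with 'I', so nothing gets reordered.
first_chars = [itemgetter(0)(name) for name in myExample]

# To sort by value, iterate over (key, value) pairs instead: in each pair,
# index 0 is the key and index 1 is the value.
by_value = sorted(myExample.items(), key=itemgetter(1))
by_key = sorted(myExample.items(), key=itemgetter(0))

print(first_chars)
print(by_value)
```

The keys have five characters, so indexes 0 to 4 each pick out one character of the key and index 5 raises IndexError, which matches the behaviour described in the question.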
In Python 2.7, I have a list named data, with some tuples, indexed by the first attribute, e.g.
data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
('B', 1, 2, 3), ('B', 10, 20, 30),
('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]
There is a function f() that will work on any slice of data with the first index matching, e.g.
f([x[1:] for x in data[:3]])
I want to call f (in proper sequence) on each slice of the array (group of tuples with the same first index) and compile the list of resulting values in a list.
I'm just starting with Python. Here is my solution, is there a better (faster or more elegant) way to do this?
slices = [x for x in xrange(len(data)) if data[x][0] != data[x-1][0]]
result = [f(data[start:end]) for start, end in zip(slices[:-1], slices[1:])]
Thank you.
If you want to group on the first item of each tuple, you can do so with itertools.groupby():
from itertools import groupby
from operator import itemgetter
[f(list(g)) for k, g in groupby(data, key=itemgetter(0))]
itemgetter(0) returns the first element of each tuple, and groupby() uses that value to yield one (key, group) pair per run of consecutive tuples that share it. Looping over each individual g then gives you the tuples whose first element is 'A', then 'B', etc.
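As a runnable sketch of the approach above: f below is a hypothetical stand-in (it just sums every number in the group), and the label is stripped from each row to match the f([x[1:] ...]) call in the question. This relies on data already being ordered by its first element, since groupby() only merges adjacent items.

```python
from itertools import groupby
from operator import itemgetter

data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
        ('B', 1, 2, 3), ('B', 10, 20, 30),
        ('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]

def f(rows):
    # Hypothetical placeholder for the question's f(): total of all values.
    return sum(sum(row) for row in rows)

# One call to f per group, with the leading label removed from each tuple.
result = [f([row[1:] for row in group])
          for _, group in groupby(data, key=itemgetter(0))]
print(result)
```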