Group list by values [duplicate] - python

Let's say I have a list like this:
mylist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
How can I most elegantly group this to get this list output in Python:
[["A", "C"], ["B"], ["D", "E"]]
So the values are grouped by the second value, but the order is preserved.

values = set(map(lambda x:x[1], mylist))
newlist = [[y[0] for y in mylist if y[1]==x] for x in values]

from operator import itemgetter
from itertools import groupby
lki = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
lki.sort(key=itemgetter(1))
glo = [[x for x,y in g]
       for k,g in groupby(lki,key=itemgetter(1))]
print(glo)
EDIT
Another solution that needs no import, is more readable, keeps the order, and is 22% shorter than the preceding one:
oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
newlist, dicpos = [],{}
for val,k in oldlist:
    if k in dicpos:
        newlist[dicpos[k]].append(val)
    else:
        newlist.append([val])
        dicpos[k] = len(dicpos)
print(newlist)

Howard's answer is concise and elegant, but it's also O(n^2) in the worst case. For large lists with large numbers of grouping key values, you'll want to sort the list first and then use itertools.groupby:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> seq = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> seq.sort(key = itemgetter(1))
>>> groups = groupby(seq, itemgetter(1))
>>> [[item[0] for item in data] for (key, data) in groups]
[['A', 'C'], ['B'], ['D', 'E']]
Edit:
I changed this after seeing eyequem's answer: itemgetter(1) is nicer than lambda x: x[1].
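The sort-then-group approach can be wrapped in a small helper. This is only a minimal sketch: the name group_sorted is made up for illustration, and it uses sorted() instead of list.sort() so the caller's list is left untouched. Since sorted() is stable, items keep their relative order inside each group, but the groups come out in key order rather than in order of first appearance.

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(pairs):
    """Group the first elements of (value, key) pairs by key (hypothetical helper)."""
    # sorted() returns a new list and is stable, so the input is not modified.
    ordered = sorted(pairs, key=itemgetter(1))
    return [[value for value, _ in group]
            for _, group in groupby(ordered, key=itemgetter(1))]


seq = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
print(group_sorted(seq))
```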

>>> import collections
>>> L1 = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> D1 = collections.defaultdict(list)
>>> for element in L1:
...     D1[element[1]].append(element[0])
...
>>> L2 = list(D1.values())
>>> print(L2)
[['A', 'C'], ['B'], ['D', 'E']]
>>>

I don't know about elegant, but it's certainly doable:
oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
# change into: list = [["A", "C"], ["B"], ["D", "E"]]
order=[]
dic=dict()
for value,key in oldlist:
    try:
        dic[key].append(value)
    except KeyError:
        order.append(key)
        dic[key]=[value]
newlist=list(map(dic.get, order))
print(newlist)
This preserves the order of the first occurrence of each key, as well as the order of items for each key. It requires the key to be hashable, but does not otherwise assign meaning to it.
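On newer interpreters the separate order list may not be needed. The sketch below assumes Python 3.7+, where a plain dict keeps keys in insertion order; group_in_order is a made-up name for illustration.

```python
def group_in_order(pairs):
    """Group values by key, keeping keys in order of first occurrence."""
    groups = {}  # assumption: Python 3.7+, where dicts preserve insertion order
    for value, key in pairs:
        # setdefault creates the list the first time a key is seen.
        groups.setdefault(key, []).append(value)
    return list(groups.values())


oldlist = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
print(group_in_order(oldlist))
```

As with the version above, the keys only need to be hashable.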

# Assumes the keys are small non-negative integers.
max_key = max(key for (item, key) in mylist)
newlist = [[] for i in range(max_key+1)]
for item,key in mylist:
    newlist[key].append(item)
You can do it in a single list comprehension, perhaps more elegant but O(n**2):
[[item for (item,key) in mylist if key==i] for i in range(max(key for (item,key) in mylist)+1)]

>>> from functools import reduce
>>> xs = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> xs.sort(key=lambda x: x[1])
>>> reduce(lambda l, x: (l.append([x]) if l[-1][0][1] != x[1] else l[-1].append(x)) or l, xs[1:], [[xs[0]]]) if xs else []
[[['A', 0], ['C', 0]], [['B', 1]], [['D', 2], ['E', 2]]]
Basically, if the list is sorted, it is possible to reduce by looking at the last group constructed by the previous steps - you can tell if you need to start a new group, or modify an existing group. The ... or l bit is a trick that enables us to use lambda in Python. (append returns None. It is always better to return something more useful than None, but, alas, such is Python.)
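For comparison, the same "look at the last group" idea can be written as an ordinary loop. This is a sketch rather than a replacement for the one-liner; group_adjacent is a hypothetical name, and like the reduce version it keeps the whole [value, key] pairs and expects the input to be sorted by key already.

```python
def group_adjacent(sorted_pairs):
    """Group consecutive pairs that share the same key (second element)."""
    groups = []
    for pair in sorted_pairs:
        if groups and groups[-1][0][1] == pair[1]:
            # Same key as the last group: extend that group.
            groups[-1].append(pair)
        else:
            # First pair, or a new key: start a new group.
            groups.append([pair])
    return groups


xs = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
print(group_adjacent(sorted(xs, key=lambda x: x[1])))
```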

If you are using the convtools library, which provides a lot of data processing primitives and generates ad hoc code under the hood, then:
from convtools import conversion as c
my_list = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
# store the converter somewhere because this is where code generation
# takes place
converter = (
    c.group_by(c.item(1))
    .aggregate(c.ReduceFuncs.Array(c.item(0)))
    .gen_converter()
)
assert converter(my_list) == [["A", "C"], ["B"], ["D", "E"]]

An answer inspired by @Howard's answer.
from operator import itemgetter
from typing import Any, Iterable, List, Tuple

def group_by(nested_iterables: Iterable[Iterable], key_index: int) \
        -> List[Tuple[Any, Iterable[Any]]]:
    """ Groups elements nested in <nested_iterables> based on their <key_index>_th element.
    Behaves similarly to itertools.groupby when the input to the itertools function is sorted.
    E.g. If <nested_iterables> = [(1, 2), (2, 3), (5, 2), (9, 3)] and
    <key_index> = 1, we will return [(2, [(1, 2), (5, 2)]), (3, [(2, 3), (9,3)])].
    Returns:
    A list of (group_key, values) tuples where <values> is a list of the iterables in
    <nested_iterables> that all have their <key_index>_th element equal to <group_key>.
    """
    group_keys = set(map(itemgetter(key_index), nested_iterables))
    return [(key, list(filter(lambda x: x[key_index] == key, nested_iterables)))
            for key in group_keys]

Related

Sorting list based on order of substrings in another list

I have two lists of strings.
list_one = ["c11", "a78", "67b"]
list_two = ["a", "b", "c"]
What is the shortest way of sorting list_one using strings from list_two to get the following output?
["a78", "67b", "c11"]
Edit 1:
There is a similar question, Sorting list based on values from another list?, but there the asker already has the list of required indexes for the result, while here I have just the list of substrings.
Edit 2:
Since the example lists above might not be fully representative, here is another case.
list_one is ["1.cde.png", "1.abc.png", "1.bcd.png"]
list_two is ["abc", "bcd", "cde"].
The output is supposed to be [ "1.abc.png", "1.bcd.png", "1.cde.png"]
If, for example, list_one is shorter than list_two, it should still work:
list_one is ["1.cde.png", "1.abc.png"]
list_two is ["abc", "bcd", "cde"]
The output is supposed to be [ "1.abc.png", "1.cde.png"]
key = {next((s for s in list_one if v in s), None): i for i, v in enumerate(list_two)}
print(sorted(list_one, key=key.get))
This outputs:
['a78', '67b', 'c11']
Try this
list_one = ["c11", "a78", "67b"]
list_two = ["a", "b", "c"]
[x for y in list_two for x in list_one if y in x]
Output :
["a78", "67b", "c11"]
Assuming that each item in list_one contains exactly one of the characters from list_two, and that you know the class of those characters, e.g. letters, you can extract those using a regex and build a dictionary mapping the characters to the element. Then, just look up the correct element for each character.
>>> import re
>>> list_one = ["c11", "a78", "67b"]
>>> list_two = ["a", "b", "c"]
>>> d = {re.search("[a-z]", s).group(): s for s in list_one}
>>> list(map(d.get, list_two))
['a78', '67b', 'c11']
>>> [d[c] for c in list_two]
['a78', '67b', 'c11']
Unlike the other approaches posted so far, which all seem to be O(n²), this is only O(n).
Of course, the approach can be generalized to e.g. more than one character, or characters in specific positions of the first string, but it will always require some pattern and knowledge about that pattern. E.g., for your more recent example:
>>> list_one = ["1.cde.png", "1.abc.png", "1.bcd.png"]
>>> list_two = ["abc", "cde"]
>>> d = {re.search(r"\.(\w+)\.", s).group(1): s for s in list_one}
>>> d = {s.split(".")[1]: s for s in list_one} # alternatively without re
>>> [d[c] for c in list_two if c in d]
['1.abc.png', '1.cde.png']
>>> sorted(list_one, key=lambda x: [i for i,e in enumerate(list_two) if e in x][0])
['a78', '67b', 'c11']
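If no fixed pattern is known, the key-function idea can be made a little more forgiving. The following is a sketch under assumptions of my own: sort_by_substrings is a made-up helper, and items that contain none of the substrings are simply sorted to the end instead of raising an error. It still checks every substring for every item, so it is not the O(n) variant.

```python
def sort_by_substrings(items, substrings):
    """Sort items by the position of the first substring each one contains."""
    def rank(item):
        # Index of the first matching substring; unmatched items get the
        # largest rank and therefore end up last (an assumption of this sketch).
        return next((i for i, sub in enumerate(substrings) if sub in item),
                    len(substrings))
    return sorted(items, key=rank)


print(sort_by_substrings(["c11", "a78", "67b"], ["a", "b", "c"]))
print(sort_by_substrings(["1.cde.png", "1.abc.png"], ["abc", "bcd", "cde"]))
```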

Use list of nested indices to access list element

How can a list of indices (called "indlst"), such as [[1,0], [3,1,2]] which corresponds to elements [1][0] and [3][1][2] of a given list (called "lst"), be used to access their respective elements? For example, given
indlst = [[1,0], [3,1,2]]
lst = ["a", ["b","c"], "d", ["e", ["f", "g", "h"]]]
(required output) = [lst[1][0],lst[3][1][2]]
The output should correspond to ["b","h"]. I have no idea where to start, let alone find an efficient way to do it (as I don't think parsing strings is the most pythonic way to go about it).
EDIT: I should mention that the nested level of the indices is variable, so while [1,0] has two elements in it, [3,1,2] has three, and so forth. (examples changed accordingly).
Recursion can grab arbitrary/deeply indexed items from nested lists:
indlst = [[1,0], [3,1,2]]
lst = ["a", ["b","c"], "d", ["e", ["f", "g", "h"]]]
#(required output) = [lst[1][0],lst[3][1][2]]
def nested_lookup(nlst, idexs):
    if len(idexs) == 1:
        return nlst[idexs[0]]
    return nested_lookup(nlst[idexs[0]], idexs[1::])
reqout = [nested_lookup(lst, i) for i in indlst]
print(reqout)
dindx = [[2], [3, 0], [0], [2], [3, 1, 2], [3, 0], [0], [2]]
reqout = [nested_lookup(lst, i) for i in dindx]
print(reqout)
['b', 'h']
['d', 'e', 'a', 'd', 'h', 'e', 'a', 'd']
I also found that arbitrary extra zero indices are fine here, since indexing a one-character string with [0] returns the same string:
lst[1][0][0]
Out[36]: 'b'
lst[3][1][2]
Out[37]: 'h'
lst[3][1][2][0][0]
Out[38]: 'h'
So if you actually know the maximum nesting depth, you can pad each (variable-length, shorter) index list to that fixed length by overwriting its values into a dictionary primed with zeros, using the dictionary .update() method.
Then hard-code the indices into the nested list; any "extra" zero-valued indices are ignored.
Below, hard-coded to depth 4:
def fix_depth_nested_lookup(nlst, idexs):
    reqout = []
    for i in idexs:
        ind = dict.fromkeys(range(4), 0)
        ind.update(dict(enumerate(i)))
        reqout.append(nlst[ind[0]][ind[1]][ind[2]][ind[3]])
    return reqout
print(fix_depth_nested_lookup(lst, indlst))
['b', 'h']
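A word of caution on the zero-padding trick: it depends on the leaves being one-character strings. The short check below uses made-up data (words and numbers are not from the question) to show what happens otherwise.

```python
lst = ["a", ["b", "c"], "d", ["e", ["f", "g", "h"]]]

# A one-character string is its own first element, so extra [0] lookups are harmless.
assert lst[1][0][0][0] == "b"

# Hypothetical data with longer strings: the padded lookup silently truncates.
words = ["alpha", ["beta", "gamma"]]
print(words[1][0])     # the intended element
print(words[1][0][0])  # padded lookup returns only its first character

# Hypothetical data with non-string leaves: the padded lookup raises.
numbers = [1, [2, 3]]
try:
    numbers[1][0][0]
except TypeError as exc:
    print("padding fails for integers:", exc)
```

So for general data the recursive or loop-based lookups are the safer choice.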
You can try this code block:
required_output = []
for i,j in indlst:
    required_output.append(lst[i][j])
You can just iterate through and collect the value.
>>> for i,j in indlst:
...     print(lst[i][j])
...
b
f
Or, you can use a simple list comprehension to form a list from those values.
>>> [lst[i][j] for i,j in indlst]
['b', 'f']
Edit:
For variable length, you can do the following:
>>> for i in indlst:
...     temp = lst
...     for j in i:
...         temp = temp[j]
...     print(temp)
...
b
h
You can form a list with functools.reduce and a list comprehension.
>>> from functools import reduce
>>> [reduce(lambda temp, x: temp[x], i,lst) for i in indlst]
['b', 'h']
N.B. this is a python3 solution. For python2, you can just ignore the import statement.
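The lambda in the reduce call does the same job as operator.getitem from the standard library, so the lookup can also be sketched like this (lookup_all is a made-up name for illustration):

```python
from functools import reduce
from operator import getitem


def lookup_all(nested, paths):
    """Follow each index path into nested and collect the elements found."""
    # reduce applies getitem step by step: nested[p0][p1]...[pn]
    return [reduce(getitem, path, nested) for path in paths]


indlst = [[1, 0], [3, 1, 2]]
lst = ["a", ["b", "c"], "d", ["e", ["f", "g", "h"]]]
print(lookup_all(lst, indlst))
```

An empty path simply returns the whole structure, because reduce then yields its initial value.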

Permutations of several lists in python efficiently

I'm trying to write a Python script that will generate random permutations of several lists without repeating,
e.g. for [a,b] and [c,d]:
a,c
b,c
a,d
b,d
I can generate every permutation using the following, however the result is somewhat non random:
for r in itertools.product(list1, list2):
    target.write("%s,%s" % (r[0], r[1]))
Does anyone know a way I can implement this such that I can extract only 2 permutations, and they will be completely random but ensure that they will never be repeated?
You can use random.choice():
>>> from itertools import product
>>> import random
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['d', 'e', 'f']
>>> prod = tuple(product(l1, l2))
>>>
>>> random.choice(prod)
('c', 'e')
>>> random.choice(prod)
('a', 'f')
>>> random.choice(prod)
('c', 'd')
Or simply use a nested list comprehension for creating the products:
>>> lst = [(i, j) for j in l2 for i in l1]
If you don't want to produce duplicate items, you can build a set from your product, which has no specified order, and then simply pop items from it (note that pop() returns an arbitrary element, which is not the same as a uniformly random one):
>>> prod = set(product(l1, l2))
>>>
>>> prod.pop()
('c', 'f')
>>> prod.pop()
('a', 'f')
>>> prod.pop()
('a', 'd')
Or use shuffle in order to shuffle the iterable, as @ayhan has suggested in his answer.
You can use random.shuffle then pop to make sure the results will not be repeated:
import itertools
import random
list1 = ["a", "b"]
list2 = ["c", "d"]
p = list(itertools.product(list1, list2))
random.shuffle(p)
e1 = p.pop()
e2 = p.pop()
list(itertools.product()) is not efficient as it generates and stores all of them. If you have big lists you can generate one at a time and check whether they are duplicated:
import random
s = set()
list1 = ["a", "b"]
list2 = ["c", "d"]
while True:
    r = (random.choice(list1), random.choice(list2))
    if r not in s:
        # target is the output file from the question
        target.write("%s,%s" % (r[0], r[1]))
        s.add(r)
        break
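Another way to avoid both storing the whole product and retrying on duplicates is to sample distinct positions in the product and decode each one into a pair. This is a sketch for the two-list case only; sample_pairs is a hypothetical helper, and it relies on random.sample accepting a range without building it in memory.

```python
import random


def sample_pairs(list1, list2, k):
    """Return k distinct (x, y) pairs drawn from the product of two lists."""
    total = len(list1) * len(list2)
    # random.sample picks k distinct indices; it raises ValueError if k > total.
    picks = random.sample(range(total), k)
    # Decode each flat index into a (row, column) position in the product.
    return [(list1[i // len(list2)], list2[i % len(list2)]) for i in picks]


list1 = ["a", "b"]
list2 = ["c", "d"]
print(sample_pairs(list1, list2, 2))
```

Each flat index maps to exactly one pair, so distinct indices guarantee distinct positions; the pairs themselves are distinct as long as the input lists contain no repeated items.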

How to write a function to rearrange a list according to the dictionary of index

How to write a function to rearrange a list according to the dictionary of index in python?
for example,
L=[('b',3),('a',2),('c',1)]
dict_index={'a':0,'b':1,'c':2}
I want a list of :
[2,3,1]
where 2 comes from 'a', 3 from 'b' and 1 from 'c'; only the numbers in L are rearranged, according to dict_index.
Try this (edited with simpler solution):
L=[('b',3),('a',2),('c',1)]
dict_index={'a':0,'b':1,'c':2}
# Creates a new list with a "slot" for each letter.
result_list = [0] * len(dict_index)
for letter, value in L:
    # Assigns the value to the correct slot based on the letter.
    result_list[dict_index[letter]] = value
print(result_list) # prints [2, 3, 1]
sorted and the .sort() method of lists take a key parameter:
>>> L=[('b',3),('a',2),('c',1)]
>>> dict_index={'a':0,'b':1,'c':2}
>>> sorted(L, key=lambda x: dict_index[x[0]])
[('a', 2), ('b', 3), ('c', 1)]
and so
>>> [x[1] for x in sorted(L, key=lambda x: dict_index[x[0]])]
[2, 3, 1]
should do it. For a more interesting example -- yours happens to match alphabetical order with the numerical order, so it's hard to see that it's really working -- we can shuffle dict_index a bit:
>>> dict_index={'a':0,'b':2,'c':1}
>>> sorted(L, key=lambda x: dict_index[x[0]])
[('a', 2), ('c', 1), ('b', 3)]
Using list comprehensions:
def index_sort(L, dict_index):
    res = [(dict_index[i], j) for (i, j) in L] #Substitute in the index
    res = sorted(res, key=lambda entry: entry[0]) #Sort by index
    res = [j for (i, j) in res] #Just take the value
    return res

How to split a list into subsets based on a pattern?

I'm doing this but it feels this can be achieved with much less code. It is Python after all. Starting with a list, I split that list into subsets based on a string prefix.
# Splitting a list into subsets
# expected outcome:
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
def func(l, newlist=[], index=0):
    newlist.append([i for i in l if i.startswith('sub_%s' % index)])
    # create a new list without the items in newlist
    l = [i for i in l if i not in newlist[index]]
    if len(l):
        index += 1
        func(l, newlist, index)
func(mylist)
You could use itertools.groupby:
>>> import itertools
>>> mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
>>> for k,v in itertools.groupby(mylist,key=lambda x:x[:5]):
...     print(k, list(v))
...
sub_0 ['sub_0_a', 'sub_0_b']
sub_1 ['sub_1_a', 'sub_1_b']
or exactly as you specified it:
>>> [list(v) for k,v in itertools.groupby(mylist,key=lambda x:x[:5])]
[['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Of course, the common caveats apply (Make sure your list is sorted with the same key you're using to group), and you might need a slightly more complicated key function for real world data...
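To make the sorting caveat concrete, here is a small sketch. The unsorted list and the prefix helper are my own illustrations, and the helper assumes names of the form <word>_<number>_<suffix>; unlike x[:5] it also copes with multi-digit numbers.

```python
from itertools import groupby


def prefix(name):
    # "sub_0_a" -> "sub_0": drop everything after the last underscore.
    return name.rsplit("_", 1)[0]


unsorted = ['sub_0_a', 'sub_1_a', 'sub_0_b', 'sub_1_b']

# groupby only merges adjacent items, so unsorted input gives fragmented groups.
fragmented = [list(v) for k, v in groupby(unsorted, key=prefix)]

# Sorting with the same key first brings equal prefixes together.
grouped = [list(v) for k, v in groupby(sorted(unsorted, key=prefix), key=prefix)]

print(fragmented)
print(grouped)
```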
In [28]: mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
In [29]: lis=[]
In [30]: for x in mylist:
   ....:     i=x.split("_")[1]
   ....:     try:
   ....:         lis[int(i)].append(x)
   ....:     except:
   ....:         lis.append([])
   ....:         lis[-1].append(x)
   ....:
In [31]: lis
Out[31]: [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Use itertools' groupby:
from itertools import groupby
def get_field_sub(x): return x.split('_')[1]
mylist = sorted(mylist, key=get_field_sub)
[ (x, list(y)) for x, y in groupby(mylist, get_field_sub)]
