Shuffle groups of sublists in Python


I want to shuffle this list:
[[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
But I need the groups identified by the sublists' second elements to stay together, so the shuffled list could look like this:
[[6, 'B'], [3, 'B'], [7, 'F'], [1, 'A'], [2, 'A'], [4, 'C'], [5, 'C']]
Here all the 'B', 'F', 'A', and 'C' sublists stay together.
I'm guessing using a combination of shuffle and groupby would do the trick, but I don't know where to start with this. Any idea would be appreciated!

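Group adjacent sublists by their second element, shuffle the groups, then flatten the result:
import itertools, operator, random
items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
# groupby only merges adjacent items with equal keys, so each group must already be contiguous
groups = [list(g) for _, g in itertools.groupby(items, operator.itemgetter(1))]
random.shuffle(groups)  # shuffles the list of groups in place
shuffled = [item for group in groups for item in group]
print(shuffled)
Prints, for example: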
[[4, 'C'], [5, 'C'], [1, 'A'], [2, 'A'], [7, 'F'], [6, 'B'], [3, 'B']]
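
One caveat with the groupby approach: itertools.groupby only merges adjacent items with equal keys, so it relies on each group already being contiguous in the input, as it is in the question. Below is a minimal sketch that wraps the same steps in a function and checks the result. The name shuffle_adjacent_groups and the rng parameter are assumptions made for illustration, not part of the answer.

```python
import itertools
import operator
import random


def shuffle_adjacent_groups(items, rng=random):
    """Shuffle runs of adjacent sublists that share the same second element."""
    # groupby yields one group per run of equal keys, in input order.
    groups = [list(g) for _, g in itertools.groupby(items, key=operator.itemgetter(1))]
    rng.shuffle(groups)  # reorder the groups, not the items inside them
    return [item for group in groups for item in group]


items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
shuffled = shuffle_adjacent_groups(items, rng=random.Random(0))

# Each key should still form exactly one contiguous run after shuffling.
runs = [key for key, _ in itertools.groupby(shuffled, key=operator.itemgetter(1))]
print(shuffled)
print(len(runs) == len(set(runs)))
```

Passing a seeded random.Random instance makes a run reproducible, which can help when testing.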

Give each group a random number and sort by that. The sublists of a group stay together, in their original order, because Python's sort is stable.
Update years later: using a defaultdict looks nicer and generates only one random number per group, not one for every element:
from random import random
from collections import defaultdict
r = defaultdict(random)
items.sort(key=lambda item: r[item[1]])
Squeezed into a one-liner:
items.sort(key=lambda i, r=defaultdict(random): r[i[1]])
Back to original answer:
items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
import random
r = {b: random.random() for a, b in items}
items.sort(key=lambda item: r[item[1]])
print(items)
Prints for example:
[[6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F'], [1, 'A'], [2, 'A']]
The two lines could be combined, so that no extra variable is left around afterwards:
items.sort(key=lambda item, r={b: random.random() for a, b in items}: r[item[1]])
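
Unlike groupby, the random-key sort also gathers group members that are not adjacent in the input, and stability keeps each group's members in their original relative order. Here is a small sketch to illustrate this. The scattered input and the variable names are made up for the example.

```python
import random
from collections import defaultdict

# Hypothetical input where the 'A' and 'B' sublists are interleaved.
scattered = [[1, 'A'], [6, 'B'], [2, 'A'], [3, 'B'], [7, 'F']]

# One random sort key per group label, created lazily on first lookup.
group_key = defaultdict(random.random)
result = sorted(scattered, key=lambda item: group_key[item[1]])

# Stable sorting keeps [1, 'A'] before [2, 'A'] and [6, 'B'] before [3, 'B'].
print(result)
```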

You can group with a dict, without needing to sort, then shuffle the groups and flatten them into a single list:
from collections import defaultdict
from random import shuffle
from itertools import chain
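def shuffle_groups(l):
    d = defaultdict(list)
    for v, k in l:
        # Note: this stores [k, v], i.e. swapped relative to the input [v, k];
        # append [v, k] instead to keep the original sublist layout.
        d[k].append([k, v])
    vals = list(d.values())
    shuffle(vals)
    return chain(*vals)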
Output:
In [9]: list(shuffle_groups(l))
Out[9]: [['A', 1], ['A', 2], ['F', 7], ['B', 6], ['B', 3], ['C', 4], ['C', 5]]
In [10]: list(shuffle_groups(l))
Out[10]: [['C', 4], ['C', 5], ['B', 6], ['B', 3], ['A', 1], ['A', 2], ['F', 7]]
In [11]: list(shuffle_groups(l))
Out[11]: [['F', 7], ['B', 6], ['B', 3], ['A', 1], ['A', 2], ['C', 4], ['C', 5]]
Some timings:
In [5]: l =[choice(l) for _ in range(100000)]
In [6]: timeit _groupy(l)
10 loops, best of 3: 139 ms per loop
In [7]: timeit shuffle_groups(l)
10 loops, best of 3: 27.1 ms per loop
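
Note that shuffle_groups emits each sublist as [k, v], reversed relative to the input. A possible variant keeps the original sublists untouched and returns a list directly. shuffle_groups_keep_order is a hypothetical name, and this is a sketch rather than the answerer's code.

```python
from collections import defaultdict
from itertools import chain
from random import shuffle


def shuffle_groups_keep_order(items):
    """Group sublists by their second element, then shuffle the groups."""
    groups = defaultdict(list)
    for item in items:
        groups[item[1]].append(item)  # keep the sublist itself, unmodified
    shuffled_groups = list(groups.values())
    shuffle(shuffled_groups)
    return list(chain.from_iterable(shuffled_groups))


items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
print(shuffle_groups_keep_order(items))
```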

Related

How to take elements from two lists and combine them in a third one (python)?

Let's say I have:
list_a = [1, 2, 3, 4, 5]
list_b = ['a', 'b', 'c']
And I expect the outcome to be something like this, so I can easily access it later:
list_c = [['a', 1], ['a', 2], ['a', 3], ...]
What's the easiest way to do that?
The two lists have different lengths.
I need every letter in list_b to be paired with each of the five numbers, basically all possible combinations, because I need to easily access, e.g., ['c', 4] later on.
I tried just to append list_a and list_b to list_c but it obviously didn't go as planned.
I can't use builtin functions such as zip, itertools, etc.

Use a list comprehension with two for clauses:
list_c = [[b, a] for b in list_b for a in list_a]
Output: [['a', 1], ['a', 2], ['a', 3], ['a', 4], ['a', 5], ['b', 1], ['b', 2], ['b', 3], ['b', 4], ['b', 5], ['c', 1], ['c', 2], ['c', 3], ['c', 4], ['c', 5]]
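
The comprehension is equivalent to two nested for loops, which may be easier to follow when zip and itertools are off limits. A sketch of that expanded form:

```python
list_a = [1, 2, 3, 4, 5]
list_b = ['a', 'b', 'c']

list_c = []
for b in list_b:          # outer loop: one pass per letter
    for a in list_a:      # inner loop: pair the letter with every number
        list_c.append([b, a])

print(len(list_c))        # 3 letters * 5 numbers = 15 pairs
print(['c', 4] in list_c)
```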

Slicing 2D Python List

Let's say I have a list:
list = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
and I would like to get something like:
newList = [[2, 3, 4],
['b', 'c', 'd'],
[8, 7, 6]]
hence I tried going with this solution
print(list[0:][1:])
But I get this output
[['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
Therefore I tried
print(list[1:][0:])
but I get precisely the same result.
I did some research and experimented with this, but without any result.

You want every element from index 1 to the end of each row in your matrix.
mylist = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
new_list = [row[1:] for row in mylist]

Let me explain what these two lines actually do:
print(list[0:][1:])
print(list[1:][0:])
First, note that Python indices start at 0, i.e. [1, 2, 3] has a 0th, a 1st, and a 2nd element.
[0:] means "take the elements starting at index 0", which gives you a copy of the list; [1:] means "take the elements starting at index 1", which gives you the list without its 0th element. Therefore both lines are equivalent to each other and to
print(list[1:])
You can get the desired output using a comprehension or map, as follows:
list1 = [[1, 2, 3, 4], ['a', 'b', 'c', 'd'], [9, 8, 7, 6]]
list2 = list(map(lambda x:x[1:],list1))
print(list2)
output
[[2, 3, 4], ['b', 'c', 'd'], [8, 7, 6]]
lambda here is a nameless function. Note that the comprehension is more readable, but map might be easier to digest if you have worked with a language that has a similar feature, e.g. JavaScript's map.
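
To make the slicing behaviour concrete, here is a short sketch showing that chaining two slices keeps operating on the outer list, while slicing each row drops the first column. The variable names are illustrative.

```python
rows = [[1, 2, 3, 4],
        ['a', 'b', 'c', 'd'],
        [9, 8, 7, 6]]

# rows[0:] is a copy of the whole outer list, so the second slice
# still operates on the outer list and drops the first row.
chained = rows[0:][1:]
print(chained == rows[1:])

# To drop the first column instead, slice each row individually.
without_first_column = [row[1:] for row in rows]
print(without_first_column)
```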

First: don't name your list "list", because that shadows the built-in list type!
a = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
b = [x[1:] for x in a]
print(b)
[[2, 3, 4], ['b', 'c', 'd'], [8, 7, 6]]

Convert DataFrame into multi-dimensional array with the column names of DataFrame

Below is the DataFrame I want to work with:
df = pd.DataFrame({'A': [1,1,1],
'B': [2,2,3],
'C': [4,5,4]})
Each row of df forms a unique key. The objective is to create the following list of multi-dimensional arrays:
parameter = [[['A', 1],['B', 2], ['C', 4]],
[['A', 1],['B', 2], ['C', 5]],
[['A', 1],['B', 3], ['C', 4]]]
The problem is related to this question, where I have to iterate over the parameters, but instead of providing them to my function manually, I have to put all the parameters from the rows of df into a list.

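You could use the following list comprehension, which zips the values of each row with the columns of the dataframe. Note that it yields [value, column] pairs; swap the arguments of the inner zip to get [column, value] as in the question:
from itertools import repeat
[list(map(list, zip(row, cols))) for row, cols in zip(df.values.tolist(), repeat(df.columns))]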
[[[1, 'A'], [2, 'B'], [4, 'C']],
[[1, 'A'], [2, 'B'], [5, 'C']],
[[1, 'A'], [3, 'B'], [4, 'C']]]
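
The output above pairs each value with its column as [value, column], whereas the question asks for [column, value]. Here is a sketch of the requested order using plain lists. The names columns and rows are stand-ins that, with pandas, would presumably come from list(df.columns) and df.values.tolist().

```python
# Stand-ins for list(df.columns) and df.values.tolist() (assumed, to avoid
# requiring pandas for this sketch).
columns = ['A', 'B', 'C']
rows = [[1, 2, 4],
        [1, 2, 5],
        [1, 3, 4]]

# Pair every column name with the value in the same position of each row.
parameter = [[[col, value] for col, value in zip(columns, row)] for row in rows]
print(parameter)
```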

Sort two lists of lists by index of inner list [duplicate]

This question already has answers here:
Sorting list based on values from another list
(20 answers)
Closed 5 years ago.
Assume I want to sort a list of lists as explained here:
>>> from operator import itemgetter
>>> L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> sorted(L, key=itemgetter(2))
[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]
(Or with lambda.) Now I have a second list which I want to sort in the same order, so I need the new order of the indices. sorted() or .sort() do not return indices. How can I do that?
Actually in my case both lists contain numpy arrays. But the numpy sort/argsort aren't intuitive for that case either.

If I understood you correctly, you want to order B in the example below, based on a sorting rule you apply on L. Take a look at this:
L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
B = ['a', 'b', 'c']
result = [i for _, i in sorted(zip(L, B), key=lambda x: x[0][2])]
print(result) # ['c', 'a', 'b']
# that corresponds to [[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

If I understand correctly, you want to know how the list has been rearranged. i.e. where is the 0th element after sorting, etc.
If so, you are one step away:
L2 = [L.index(x) for x in sorted(L, key=itemgetter(2))]
which gives:
[2, 0, 1]
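As tobias points out, this is needlessly complex compared to the following (in Python 3, wrap the map in list() to get a list rather than a lazy iterator):
list(map(itemgetter(0), sorted(enumerate(L), key=lambda x: x[1][2])))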
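
Note that L.index(x) rescans the list for every element and returns the first match, so it is quadratic and can give wrong indices when L contains equal sublists. Below is a sketch of an index-based alternative. The helper name argsort is an assumption, chosen by analogy with NumPy.

```python
def argsort(seq, key=None):
    """Return the indices that would sort seq, like numpy.argsort for lists."""
    if key is None:
        return sorted(range(len(seq)), key=seq.__getitem__)
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
B = ['a', 'b', 'c']

order = argsort(L, key=lambda row: row[2])
print(order)                   # [2, 0, 1]
print([B[i] for i in order])   # ['c', 'a', 'b']
```

The same order list can then be used to rearrange any number of parallel lists.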

NumPy
Setup:
import numpy as np
L = np.array([[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']])
S = np.array(['a', 'b', 'c'])
Solution:
print(S[L[:, 2].argsort()])
Output:
['c' 'a' 'b']
Just Python
You could combine both lists, sort them together, and separate them again.
>>> L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> S = ['a', 'b', 'c']
>>> L, S = zip(*sorted(zip(L, S), key=lambda x: x[0][2]))
>>> L
([9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't'])
>>> S
('c', 'a', 'b')
I guess you could do something similar in NumPy as well...

Python Sort Lambda

Sort Key Lambda Parameters
I do not understand how the lambda parameters work; the [-e[0], e[1]] portion is especially confusing. I have removed the excessive printing code and all other unnecessary code from my question. What does -e[0] achieve, and what does e[1] achieve?
data.sort(key=lambda e: [-e[0], e[1]])  # --> anonymous function
print("This is the data sort after the lambda filter but NOT -e %s" % data)
[in] 'aeeccccbbbbwwzzzwww'
[out] This is the data before the sort [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]
[out] This is the data sort before the lambda filter [[1, 'a'], [2, 'e'], [3, 'z'], [4, 'b'], [4, 'c'], [5, 'w']]
[out] This is the data sort after the lambda filter but NOT -e [[1, 'a'], [2, 'e'], [3, 'z'], [4, 'b'], [4, 'c'], [5, 'w']]
[out] This is the data sort after the lambda filter [[5, 'w'], [4, 'b'], [4, 'c'], [3, 'z'], [2, 'e'], [1, 'a']]
[out] w 5
[out] b 4
[out] c 4

>>> l = [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]
>>> l.sort()
Normal sort: the nested lists are compared by their first element, and ties are broken by the second element.
>>> l.sort(key=lambda e: [e[0], e[1]])
Equivalent to l.sort().
>>> l.sort(key=lambda e: [-e[0], e[1]])
This sorts the list in descending order of the first element of each nested list, because negating a number reverses its ordering (-e[0] gives -2, -4, -1, ...), and breaks ties in normal ascending order of the second element (e[1] gives 'e', 'c', 'a', ...). For example, [4, 'b'] comes before [4, 'c'].
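
To see why this works, it can help to print the key computed for each element, since the sort compares those keys rather than the elements themselves. A small sketch using the data from the question:

```python
data = [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]

# Show the key computed for each element: sorting compares these keys.
for e in data:
    print(e, '->', [-e[0], e[1]])

result = sorted(data, key=lambda e: [-e[0], e[1]])
print(result)
# [[5, 'w'], [4, 'b'], [4, 'c'], [3, 'z'], [2, 'e'], [1, 'a']]
```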
