Python - order list of lists by multiple column indexes

Let's say I have the following list a in Python:
[['a',6,'aa'],
 ['d',7,'bb'],
 ['c',1,'cc'],
 ['a',4,'dd'],
 ['d',2,'ee']]
and I want to sort its elements to obtain the following result:
[['a',4,'dd'],
 ['a',6,'aa'],
 ['c',1,'cc'],
 ['d',2,'ee'],
 ['d',7,'bb']]
That is, I want to sort it by two columns: the first one (the most important) and then the second one (the less important). This is probably a duplicate question, but I haven't been able to find the solution...

The following sorts the list by the first element, then by the second element:
>>> sorted(a, key=lambda x:(x[0], x[1]))
[['a', 4, 'dd'], ['a', 6, 'aa'], ['c', 1, 'cc'], ['d', 2, 'ee'], ['d', 7, 'bb']]
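Because Python's sort is stable (items with equal keys keep their relative order), the same ordering can also be built in two passes: sort by the less important column first, then by the more important one. A minimal sketch using the list a from the question:

```python
a = [['a', 6, 'aa'], ['d', 7, 'bb'], ['c', 1, 'cc'], ['a', 4, 'dd'], ['d', 2, 'ee']]

# Pass 1: order by the less important column (index 1).
by_second = sorted(a, key=lambda row: row[1])
# Pass 2: order by the most important column (index 0). Stability keeps
# rows with the same first column in the order produced by pass 1.
result = sorted(by_second, key=lambda row: row[0])
print(result)
```

The single tuple key is usually simpler; the two-pass form is mainly handy when the passes need different settings, such as reverse=True on only one of them.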

If you don't mind the third column also acting as a tie-breaker (here it never comes into play, because no two rows share both of the first two values), you can use a plain sort and get the same result:
>>> sorted(a)
[['a', 4, 'dd'], ['a', 6, 'aa'], ['c', 1, 'cc'], ['d', 2, 'ee'], ['d', 7, 'bb']]
This works because lists are compared element by element, left to right, i.e. in lexicographical order.
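A small sketch of that comparison rule on rows from the question: the first unequal pair of elements decides, and later columns are only consulted on a tie.

```python
# 'a' == 'a', so the second column decides: 4 < 6.
first = ['a', 4, 'dd'] < ['a', 6, 'aa']
# 'a' < 'c' already decides; 6 and 1 are never compared.
second = ['a', 6, 'aa'] < ['c', 1, 'cc']
# Tie on the first two columns, so the third acts as tie-breaker: 'zz' > 'dd'.
third = ['a', 4, 'zz'] < ['a', 4, 'dd']
print(first, second, third)
```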
If you want to order by the columns in some other order, use operator.itemgetter, which is generally faster than a lambda as the key function.
>>> import operator
>>> sorted(a, key=operator.itemgetter(1, 0)) # order by column 1 first, then 0.
[['c', 1, 'cc'], ['d', 2, 'ee'], ['a', 4, 'dd'], ['a', 6, 'aa'], ['d', 7, 'bb']]
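To see why this works, note that itemgetter called with several indexes returns a callable that produces a tuple, which is then compared just like the tuple built by the lambda. A short sketch:

```python
from operator import itemgetter

get_key = itemgetter(1, 0)        # picks column 1, then column 0, as a tuple
print(get_key(['a', 6, 'aa']))    # (6, 'a')

a = [['a', 6, 'aa'], ['d', 7, 'bb'], ['c', 1, 'cc'], ['a', 4, 'dd'], ['d', 2, 'ee']]
result = sorted(a, key=get_key)
print(result)
```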

You can simply use sorted, because it compares the sublists element by element, from the first element to the last:
a = [['a',6,'aa'],['d',7,'bb'],['c',1,'cc'],['a',4,'dd'],['d',2,'ee']]
sorted(a)
[['a', 4, 'dd'],
['a', 6, 'aa'],
['c', 1, 'cc'],
['d', 2, 'ee'],
['d', 7, 'bb']]

Related

Fetching elements from multiple columns of lists in a dataframe

I have a dataframe like this:
A B C
[1,2,3] ['a','b','c'] ['aa', 'bb', 'cc']
[4,5,6] ['d','e','f'] ['dd', 'ee', 'ff']
[7,8,9] ['g','h','i'] ['gg', 'hh', 'ii']
I would like to combine the values from these columns as follows:
[[[1,'a', 'aa'], [2,'b','bb'], [3, 'c', 'cc']], [[4,'d','dd'], [5,'e', 'ee'], [6,'f','ff']], [[7,'g','gg'], [8,'h','hh'], [9,'i','ii']]]
My idea was to convert each column to a list like this (which gives a list of lists):
first = df['A'].values.tolist() # similarly for other columns
Then I would zip all the lists, iterate through them, take the corresponding values from each list, and build a new list in the output format. But I am sure there are better solutions than mine. Can anyone help me with this?
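For comparison, the zip-based plan from the question fits in one nested comprehension. This is a sketch that assumes the column lists look like what df['A'].values.tolist() (and likewise for B and C) would return; they are written out by hand so it runs without pandas, and col_a, col_b, col_c are illustrative names.

```python
# Column lists as df[col].values.tolist() would return them (hand-written here).
col_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col_b = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
col_c = [['aa', 'bb', 'cc'], ['dd', 'ee', 'ff'], ['gg', 'hh', 'ii']]

result = [
    # zip(*row) groups the i-th items of the row's three lists
    [list(items) for items in zip(*row)]
    # zip(col_a, col_b, col_c) walks the DataFrame row by row
    for row in zip(col_a, col_b, col_c)
]
print(result)
```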
If I understand correctly (IIUC), you can use explode with groupby:
pd.concat([df[[x]].explode(x) for x in df.columns],axis=1)\
.apply(lambda x : x.tolist(),axis=1).groupby(level=0).agg(list).tolist()
Output:
[[[1, 'a', 'aa'], [2, 'b', 'bb'], [3, 'c', 'cc']],
[[4, 'd', 'dd'], [5, 'e', 'ee'], [6, 'f', 'ff']],
[[7, 'g', 'gg'], [8, 'h', 'hh'], [9, 'i', 'ii']]]
A one-line solution with apply (note that it produces tuples rather than lists):
df.apply(lambda x: list(zip(*x.to_list())), axis=1).to_list()
Output:
[[(1, 'a', 'aa'), (2, 'b', 'bb'), (3, 'c', 'cc')],
[(4, 'd', 'dd'), (5, 'e', 'ee'), (6, 'f', 'ff')],
[(7, 'g', 'gg'), (8, 'h', 'hh'), (9, 'i', 'ii')]]

Slicing flat list into multi-level nested list efficiently

For example, I have a flat list:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G']
I want to transform it into a 4-level nested list:
[[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 'A'], ['B', 'C']], [['D', 'E'], ['F', 'G']]]]
Is there a way to do it without creating a separate variable for every level? What is the most memory- and performance-efficient way?
UPDATE:
Also, is there a way to do it in a non-symmetrical fashion?
[[[[1, 2, 3], 4], [[5, 6, 7], 8]], [[[9, 'A', 'B'], 'C'], [['D', 'E', 'F'], 'G']]]
Note that your first list has 15 elements instead of 16. Also, what should A be? Is it a constant you've defined somewhere else? I'll just assume it's a string: 'A'.
If you work with NumPy arrays, you can simply reshape your array:
import numpy as np
r = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
r.reshape(2,2,2,2)
It outputs:
array([[[['1', '2'],
         ['3', '4']],
        [['5', '6'],
         ['7', '8']]],
       [[['9', 'A'],
         ['B', 'C']],
        [['D', 'E'],
         ['F', 'G']]]], dtype='<U11')
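This should be very efficient because reshape doesn't need to copy the data here: the result is still the same flat buffer, only viewed with a different shape. Note, however, that mixing integers and strings makes NumPy convert every element to a string, which is why the output shows '1' instead of 1.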
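If you'd rather stay with plain Python lists (which also keeps 1 as an int rather than '1'), a small recursive helper can nest a flat list into any regular shape without one variable per level. This is a sketch; nest is a hypothetical helper name, and it does not check that the number of items matches the shape.

```python
def nest(flat, shape):
    """Hypothetical helper: nest the items of `flat` into lists of a regular shape."""
    items = iter(flat)

    def build(dims):
        if not dims:                      # innermost level: take the next item
            return next(items)
        # one sub-list per index of the current dimension
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(tuple(shape))

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G']
result = nest(data, (2, 2, 2, 2))
print(result)
```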
NumPy doesn't support irregular (ragged) shapes, so for the non-symmetrical version you'll have to work with standard Python lists:
i = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
l1 = []
for _ in range(2):          # two top-level groups
    l2 = []
    for _ in range(2):      # two blocks per group
        l3 = []
        l4 = []
        for _ in range(3):  # three items in the inner sub-list...
            l4.append(next(i))
        l3.append(l4)
        l3.append(next(i))  # ...followed by one single item
        l2.append(l3)
    l1.append(l2)
print(l1)
# [[[[1, 2, 3], 4], [[5, 6, 7], 8]], [[[9, 'A', 'B'], 'C'], [['D', 'E', 'F'], 'G']]]
As you said, you'll have to define a temporary variable for each level. I guess you could use list comprehensions, but they wouldn't be pretty.
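A recursive alternative avoids one variable per level even for the irregular shape: describe the desired shape as a template list and fill its leaves, in order, from an iterator. This is only a sketch; fill, block and template are hypothetical names, and any non-list value in the template (None here) simply marks a slot to fill.

```python
def fill(template, items):
    """Hypothetical helper: copy `template`, replacing each non-list leaf with next(items)."""
    if isinstance(template, list):
        return [fill(part, items) for part in template]
    return next(items)

# One block: a sub-list of three slots followed by a single slot.
block = [[None, None, None], None]
template = [[block, block], [block, block]]   # reusing `block` is fine: fill builds new lists

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G']
result = fill(template, iter(data))
print(result)
```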

Appending to a list of lists sequentially

I have two lists of lists:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
I want my output to look like this:
my_list = [[1,2,3,4,'a','b','c'], [5,6,7,8,'d','e','f']]
I wrote the following code to do this, but I end up with extra nested lists in my result.
my_list = map(list, (zip(my_list, my_list2)))
This produces:
[[[1, 2, 3, 4], ['a', 'b', 'c']], [[5, 6, 7, 8], ['d', 'e', 'f']]]
Is there a way to remove the redundant lists?
Using zip is the right approach. You just need to concatenate the two lists inside each tuple that zip produces.
>>> my_list = [[1,2,3,4], [5,6,7,8]]
>>> my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
>>> [x+y for x,y in zip(my_list, my_list2)]
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
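The comprehension builds new lists. If you want to modify my_list in place instead (appending, as the title puts it), a sketch using list.extend inside the same zip loop:

```python
my_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Each pair from zip is (inner list of my_list, matching list of my_list2);
# extend() appends the second list's items to the first, in place.
for target, extra in zip(my_list, my_list2):
    target.extend(extra)

print(my_list)
```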
You can use zip in a list comprehension:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
new_list = [i+b for i, b in zip(my_list, my_list2)]
As an alternative, you can also use map with sum and a lambda function (though the list comprehension approach from the other answers is better):
>>> list(map(lambda x: sum(x, []), zip(my_list, my_list2)))
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]

Sorting lists based on a particular element - Python

How do I sort a list of lists based on the first element of the lists in Python?
>>> list01 = (['a','b','c'],['b','a','d'],['d','e','c'],['a','f','d'])
>>> list(map(sorted, list01))
[['a', 'b', 'c'], ['a', 'b', 'd'], ['c', 'd', 'e'], ['a', 'd', 'f']]
>>> sorted(map(sorted, list01))
[['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'd', 'f'], ['c', 'd', 'e']]
Python's sorted() accepts a key function that determines what to sort by.
If you want to sort by the first element in each sublist, you can use the following:
>>> lst = [[2, 3], [1, 2]]
>>> sorted(lst, key=lambda x: x[0])
[[1, 2], [2, 3]]
For more information on sorted(), please see the official docs.
from operator import itemgetter
sorted(list01, key=itemgetter(0))
>>> sorted(list01, key=lambda l: l[0])
[['a', 'b', 'c'], ['a', 'f', 'd'], ['b', 'a', 'd'], ['d', 'e', 'c']]
Is this what you mean?
Apart from passing a key function to sorted (as shown in earlier answers), in Python 2 you can also pass it a cmp (comparison) function:
sorted(list01, cmp=lambda b, a: cmp(b[0], a[0]))
The output of the above expression is the same as with the key function.
However, the cmp argument was removed from sorted in Python 3 (https://docs.python.org/3.3/library/functions.html#sorted), so a key function is the only choice there.
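If you already have an old-style comparison function, Python 3's functools.cmp_to_key wraps it into a key function. A sketch, where compare_first is a hypothetical comparison function mirroring the cmp lambda above (Python 3 has no built-in cmp, so the sign is computed by hand):

```python
from functools import cmp_to_key

list01 = (['a', 'b', 'c'], ['b', 'a', 'd'], ['d', 'e', 'c'], ['a', 'f', 'd'])

def compare_first(x, y):
    """Hypothetical cmp-style function: negative, zero or positive by first element."""
    return (x[0] > y[0]) - (x[0] < y[0])

result = sorted(list01, key=cmp_to_key(compare_first))
print(result)
```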

How do I keep the index of the duplicate element unchanged

Here is an input list:
['a', 'b', 'b', 'c', 'c', 'd']
The output I expect should be:
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]
I tried using map() (Python 2):
>>> map(lambda (index, word): [index, word], enumerate(['a', 'b', 'b', 'c', 'c', 'd']))
[[0, 'a'], [1, 'b'], [2, 'b'], [3, 'c'], [4, 'c'], [5, 'd']]
How can I get the expected result?
EDIT: This is not necessarily a sorted list; the index should increase only when a new element is encountered.
>>> import itertools
>>> seq = ['a', 'b', 'b', 'c', 'c', 'd']
>>> [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]
[
[i, x]
for i, (value, group) in enumerate(itertools.groupby(['a', 'b', 'b', 'c', 'c', 'd']))
for x in group
]
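One caveat with groupby: it only groups consecutive equal items, so a value that reappears later in an unsorted list gets a new index. A short sketch with a hypothetical input where 'a' comes back at the end:

```python
import itertools

seq = ['a', 'b', 'b', 'c', 'c', 'd', 'a']   # 'a' appears again at the end
result = [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
print(result)   # the final 'a' is numbered 4, not 0
```

If a repeated value should keep the index of its first appearance, the dictionary-based answers are the better fit.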
It sounds like you want to rank the terms based on a lexicographical ordering.
input = ['a', 'b', 'b', 'c', 'c', 'd']
mapping = { v:i for (i, v) in enumerate(sorted(set(input))) }
[ [mapping[v], v] for v in input ]
Note that this works for unsorted inputs as well.
If, as your amendment suggests, you want to number items based on order of first appearance, a different approach is in order. The following is short and sweet, albeit offensively hacky:
[ [d.setdefault(v, len(d)), v] for d in [{}] for v in input ]
When the list is sorted, use groupby (see jamylak's answer); when it isn't, just iterate over the list and check whether you've already seen each letter:
a = ['a', 'b', 'b', 'c', 'c', 'd']
result = []
d = {}
n = 0
for k in a:
    if k not in d:
        d[k] = n
        n += 1
    result.append([d[k], k])
This runs in O(n) time, since each dictionary lookup and insertion takes O(1) time on average.
Example output for an unsorted list (here the input has an extra 'a' at the end):
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd'], [0, 'a']]
As you can see, the items keep the same order as in the input list.
Sorting the list first would add O(n log n) time.
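The same loop can be wrapped in a reusable function and run on the unsorted input to reproduce the example output. A sketch; number_by_first_appearance is a hypothetical name, and it uses len(seen) as the next index instead of a separate counter n, which gives the same numbering.

```python
def number_by_first_appearance(seq):
    """Hypothetical helper: pair each item with the index of its value's first appearance."""
    seen = {}                        # value -> index assigned on first appearance
    result = []
    for item in seq:
        if item not in seen:
            seen[item] = len(seen)   # next unused index
        result.append([seen[item], item])
    return result

print(number_by_first_appearance(['a', 'b', 'b', 'c', 'c', 'd', 'a']))
```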
