Sort list with multiple criteria in python

Currently I'm trying to sort a list of files which were made of version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the list's sort method in combination with a lambda expression. The lambda expression should first remove the .py extension and then split the string on the dots, then cast every number to an integer and sort by those.
I know how I would do this in C#, but I have no idea how to do it in Python. One problem is: how can I sort on multiple criteria? And how do I embed a lambda expression that does this?
Can anyone help me?
Thank you very much!

You can use the key argument of the sorted function:
filenames = [
'1.0.0.0.py',
'0.0.0.0.py',
'1.1.0.0.py'
]
# Python 3: wrap map() in list() so the keys can be compared with each other
print(sorted(filenames, key=lambda f: list(map(int, f.split('.')[:-1]))))
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.

Have your key function return a list of items. The sort is lexicographic in that case.
l = [ '1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py',]
s = sorted(l, key = lambda x: [int(y) for y in x.replace('.py','').split('.')])
print(s)

# read list in from memory and store as variable file_list
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
The lambda takes a filename and splits it on periods into a list. It then drops the last element, which is the file extension, and applies int to every remaining element. sorted then orders the filenames by these lists, comparing the first elements first, with ties broken by later elements.
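To see why the integer conversion matters, here is a minimal Python 3 sketch. The helper name version_key and the extra file names are made up for illustration; they are not from the question. It compares plain string sorting with sorting on a list of integers.

```python
def version_key(filename):
    """Turn '1.10.0.0.py' into [1, 10, 0, 0] so versions compare numerically."""
    parts = filename.split('.')[:-1]   # drop the trailing 'py' extension
    return [int(p) for p in parts]     # a real list, so Python 3 can compare keys


# Hypothetical file names, chosen so that string order and numeric order differ.
files = ['1.10.0.0.py', '1.2.0.0.py', '0.0.0.0.py', '10.0.0.0.py']

by_string = sorted(files)                    # character-by-character comparison
by_version = sorted(files, key=version_key)  # element-by-element integer comparison

print(by_string)
print(by_version)
```

With string comparison '1.10.0.0.py' lands before '1.2.0.0.py'; with the integer lists, 2 is compared against 10 and the order comes out as a version order.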

Related

Ordering links in python

I have a list containing some links: ["http://link1.rar", "http://link1.rev","http://link2.rar","http://link2.rev"]
Is there a way to sort them, in order to look like:
["http://link1.rar", "http://link2.rar", "http://link1.rev", "http://link2.rev"]
I've tried with this:
def order(x):
    if "rar" not in x:
        return x
    else:
        return ""
new_links = sorted(links, key=order)
But in this way, rev links are sorted from the highest.
You want to sort according to multiple criteria: first, the file extension; then, the whole string.
The usual trick to sort according to multiple criteria is to use a tuple as the key. Tuples are sorted in lexicographical order, which means the first criterion is compared first; and in case of a tie, the second criterion is compared.
For instance, the following key returns a tuple such as (False, 'http://link1.rar') or (True, 'http://link1.rev'):
new_links = sorted(links, key=lambda x: ('rar' not in x, x))
Alternatively, you could use str.rsplit to split the string on the last '.' and get a list such as ['http://link1', 'rev']. Since you want the extension to be the first criterion, use slicing [::-1] to reverse the order of the list:
new_links = sorted(links, key=lambda x: x.rsplit('.', maxsplit=1)[::-1])
Note that using rsplit('.', maxsplit=1) to split on the last '.' is a bit of a hack. If the filename contains a double extension, such as '.tar.gz', only the last extension will be isolated.
One last point to consider: numbers in strings. As long as your strings contain only single digits, such as '1' or '2', sorting the strings will work as you expect. However, if you have a 'link10' somewhere in there, the order might not be the one you expect: lexicographically, 'link10' comes before 'link2', because the first character of '10' is '1'. In that case, I refer you to this related question: Is there a built in function for string natural sort?
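As a rough illustration of the 'link10' problem, here is one possible natural-sort key in Python 3. The name natural_key and the sample links are assumptions made for this sketch; it is not the accepted solution from the related question, just one common approach built on re.split.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit runs; digit runs become ints."""
    parts = re.split(r'(\d+)', text)
    # With a capturing group, odd positions are always the digit runs,
    # so every key has the same str/int layout and keys stay comparable.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


# Hypothetical links, including a two-digit number.
links = ['http://link10.rar', 'http://link2.rev', 'http://link2.rar', 'http://link1.rev']

# First criterion: rar before rev. Second criterion: natural order of the whole string.
ordered = sorted(links, key=lambda x: ('rar' not in x, natural_key(x)))
print(ordered)
```

Here 'http://link2.rar' sorts before 'http://link10.rar' because the digit runs are compared as the integers 2 and 10 rather than as the strings '2' and '10'.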

Splitting and sorting a list based on substring

I'm trying to take a list from an array and split the strings so that I can sort the list by the last series of 6 digits (for instance '042126'). To do this I would split on '.', take the second-to-last piece of the split [-2], and then sort matchfiles[1] by this substring.
The files should end up sorted like:
erl1.041905, erl1.041907, erl2.041908, erl1.041909, erl2.041910, etc.
Two questions: how do I specify an unlimited number of splits per string (in case of longer names using additional '.')? I am using 4 splits, but that may not always hold. Alternatively, how would I split just two times, working backwards?
More importantly, I am returned an error: 'list' object is not callable. What am I doing wrong?
Thanks
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[09302] ]
matchtry = sorted(matchfiles[1], key = [i.split('.', 4)[-2] for i in
matchfiles[1]])
The key argument expects a function, but you gave it a list, hence the error 'list' object is not callable.
You should use split('.')[-2], which always takes the second-to-last element.
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[9302] ]
matchtry = sorted(matchfiles[1], key=lambda x: x.split('.')[-2])
print(matchtry)
# ['blue.2017-09-05t15-15-07.erl1.041905.png', 'blue.2017-09-05t15-15-11.erl1.041907.png',
#  'blue.2017-09-05t14-21-35.erl2.041908.png', 'blue.2017-09-05t15-15-14.erl1.041909.png',
#  'blue.2017-09-05t14-21-38.erl2.041910.png', 'blue.2017-09-05t14-21-41.erl2.041912.png',
#  'blue.2017-09-05t14-21-45.erl2.041914.png']
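The question also asked how to split only twice, working backwards. A small sketch of that, assuming Python 3: str.rsplit accepts a maxsplit argument and starts cutting from the right. The helper name serial_of and the third file name (with extra dots) are hypothetical.

```python
def serial_of(fname):
    """Return the second-to-last dot-separated field, e.g. '041905'."""
    # rsplit with maxsplit=2 cuts at most twice, starting from the right:
    # 'blue.2017-09-05t15-15-07.erl1.041905.png'
    #   -> ['blue.2017-09-05t15-15-07.erl1', '041905', 'png']
    return fname.rsplit('.', 2)[-2]


names = [
    'blue.2017-09-05t15-15-14.erl1.041909.png',
    'blue.2017-09-05t14-21-35.erl2.041908.png',
    'extra.dots.in.name.2017-09-05t15-15-07.erl1.041905.png',  # hypothetical longer name
]

ordered = sorted(names, key=serial_of)
for name in ordered:
    print(name)
```

Either form works for any number of dots in the name: split('.')[-2] splits everywhere and indexes from the end, while rsplit('.', 2)[-2] only does the two splits that are needed.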
The key parameter to sorted requires a function. [i.split('.', 4)[-2] for i in matchfiles[1]] is a list, not a function. The expected function acts on a single element from the list, so you need a function that takes a string, splits it on the '.' character, and returns the second last column, possibly converted to an integer.
Also, Python does not allow decimal integer literals to begin with a zero, so you must change [09302] to [9302]. (A leading 0 signals a non-decimal number. In Python 2, 0427 is octal 427, but in Python 3 an octal number must be prefixed with 0o instead. 09302 is invalid in both versions, since an octal number cannot contain the digit 9.)
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[9302] ]
matchtry = sorted(matchfiles[1], key=lambda name: int(name.split('.')[-2]))
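The leading-zero behaviour described in this answer can be checked directly. This is a small Python 3 sketch; the variable names are made up, and compile() is used only so that the invalid literal can be tested without breaking the script itself.

```python
# Python 3 spelling of an octal literal: the 0o prefix is required.
octal_value = 0o427          # 4*64 + 2*8 + 7
print(octal_value)           # 279

# A digit string with a leading zero is fine when parsed at run time:
# int() reads it as base 10 by default.
parsed = int('09302')
print(parsed)                # 9302

# The literal 09302 in source code is rejected by the Python 3 parser.
try:
    compile('[09302]', '<example>', 'eval')
    literal_compiles = True
except SyntaxError:
    literal_compiles = False
print(literal_compiles)      # False
```

This is also why int(name.split('.')[-2]) in the key function has no trouble with fields such as '041905': the restriction applies to literals in source code, not to strings converted with int().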
Remember that the key argument to sorted takes each element of your iterable (list in your case) and converts it to some value. The values of each element after being transformed by key determine the sort order. So a simple way to get this to work every time is to define a function that takes one element and converts it to something that's easy to sort:
import os

def fname_to_value(fname):
    name, ext = os.path.splitext(fname)  # remove extension
    number = name.split('.')[-1]  # get the last set of stuff after the last '.'
    return number  # no int conversion needed: the numbers are zero-padded to equal width, so string comparison orders them correctly
So now you have a function converting the filename to a sortable value. Simply supply this to sorted as the key argument and you're done.
matchtry = sorted(matchfiles[1], key = fname_to_value)
for match in matchtry:
    print(match)
result:
blue.2017-09-05t15-15-07.erl1.041905.png
blue.2017-09-05t15-15-11.erl1.041907.png
blue.2017-09-05t14-21-35.erl2.041908.png
blue.2017-09-05t15-15-14.erl1.041909.png
blue.2017-09-05t14-21-38.erl2.041910.png
blue.2017-09-05t14-21-41.erl2.041912.png
blue.2017-09-05t14-21-45.erl2.041914.png
You can then process the resulting list as needed.
Yes, the issue is your key. You can use a lambda expression: https://en.wikipedia.org/wiki/Anonymous_function#Python
Imagine this as a mathematical map. The key being used to sort needs a function, so you define a lambda like:
lambda curr: curr.split('.')[-2]
This gives each current object in the list the name "curr" and applies the expression following the :.
So in your case this should do the thing:
matchtry = sorted(matchfiles[1], key=lambda curr: curr.split('.')[-2])

Sorting a list key parameter

I have a problem sorting a list. My goal is to write a function that sorts a list of files based on their extension. For example, given:
["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
the desired output is:
["a.c","x.c","a.py","b.py","bar.txt","foo.txt"]
I fail when I try to build a key parameter; I can't come up with the algorithm. I tried to split() every file name first, like this:
def sort_file(lst):
    second_list = []
    for x in lst:
        t = x.split(".")
        second_list.append(t[1])
    second_list.sort()
But I just don't know what to do now. How can I use this sorted second_list as a key parameter so that I can sort the files by their extension?
I fail when I tried to make a key parameter
The key argument takes a function (a callable, rather) that, given a list item as input, returns the object to compare against. In your case, x.split(".")[1] is the object to compare against. Take a look at Python's wiki entry on sorting in this fashion.
Something like the below should work for you.
>>> a = ["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
>>> sorted(a, key=lambda x: x.rsplit(".", 1)[1])
['a.c', 'x.c', 'a.py', 'b.py', 'bar.txt', 'foo.txt']
As @TanveerAlam says, using rsplit(..) is better because you want the split to be done from the right.
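The answer above relies on the input already being in name order within each extension (sorted is stable). If that is not guaranteed, one option is a tuple key of extension and name. The sketch below assumes Python 3 and uses os.path.splitext from the standard library; the helper name ext_then_name and the shuffled input are made up for illustration.

```python
import os.path


def ext_then_name(fname):
    """Key: (extension, full name), so ties within an extension sort by name."""
    ext = os.path.splitext(fname)[1]   # 'a.py' -> '.py'; a name without a dot -> ''
    return (ext, fname)


# Same files as in the question, deliberately shuffled.
files = ["foo.txt", "x.c", "b.py", "a.py", "bar.txt", "a.c"]

result = sorted(files, key=ext_then_name)
print(result)
```

Unlike x.rsplit(".", 1)[1], splitext does not raise an IndexError for a name with no dot at all; it returns an empty extension, so such names sort first.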

How to sort a list of lists (non integers)?

I have a list of lists that looks like this:
[['10.2100', '0.93956088E+01'],
['11.1100', '0.96414905E+01'],
['12.1100', '0.98638361E+01'],
['14.1100', '0.12764182E+02'],
['16.1100', '0.16235739E+02'],
['18.1100', '0.11399972E+02'],
['20.1100', '0.76444933E+01'],
['25.1100', '0.37823686E+01'],
['30.1100', '0.23552237E+01'],...]
(here it looks as if it is already ordered, but some of the rest of the elements not included here to avoid a huge list, are not in order)
I want to sort it by the first element of each pair. I have seen several very similar questions, but in all of them the examples use integers. I don't know if that is why, when I use list.sort(key=lambda x: x[0]), sorted(), or the version with operator.itemgetter(0), I get the following:
[['10.2100', '0.93956088E+01'],
['100.1100', '0.33752517E+00'],
['11.1100', '0.96414905E+01'],
['110.1100', '0.25774972E+00'],
['12.1100', '0.98638361E+01'],
['14.1100', '0.12764182E+02'],
['14.6100', '0.14123326E+02'],
['15.1100', '0.15451733E+02'],
['16.1100', '0.16235739E+02'],
['16.6100', '0.15351242E+02'],
['17.1100', '0.14040859E+02'],
['18.1100', '0.11399972E+02'], ...]
Apparently it is sorting by the first character appearing in the first element of each pair.
Is there a way of using list.sort or sorted() to order these pairs with respect to the first element?
Don't use list as a variable name!
some_list.sort(key=lambda x: float(x[0]))
This will convert the first element to a float and compare it numerically instead of alphabetically.
(Note that the cast to float is only for comparing; the item is still a string in the list.)
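A short Python 3 sketch of the difference, using four of the rows from the question (the variable names are made up). Sorting on the raw string reproduces the unwanted order; sorting on float(row[0]) gives the numeric order, and the rows themselves are left unchanged.

```python
rows = [
    ['100.1100', '0.33752517E+00'],
    ['11.1100', '0.96414905E+01'],
    ['10.2100', '0.93956088E+01'],
    ['110.1100', '0.25774972E+00'],
]

# Key is the string itself: compared character by character.
as_text = sorted(rows, key=lambda row: row[0])

# Key is the numeric value: the conversion is used only for comparison.
as_number = sorted(rows, key=lambda row: float(row[0]))

print([row[0] for row in as_text])
print([row[0] for row in as_number])
```

sorted returns a new list; to sort in place, as in the answer, call rows.sort(key=lambda row: float(row[0])) instead.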

Python, Sorting

name|num|num|num|num
name|num|num|num|num
name|num|num|num|num
How can I sort this list by the field I need (2, 3, 4, or 5)?
Sorry for my English.
Update
Input:
str|10|20
str|1|30
Sort by first field (1,10):
str|1|30
str|10|20
Sort by second field(20,30):
str|10|20
str|1|30
I would use the operator module function "itemgetter" instead of the lambda functions. That is faster and allows multiple levels of sorting.
from operator import itemgetter
data = (line.split('|') for line in input.split('\n'))
sort_index = 1
sorted(data, key=itemgetter(sort_index))
You can sort on a specific key, which tells the sort function how to evaluate the entries to be sorted -- that is, how we decide which of two entries is bigger. In this case, we'll first split up each string by the pipe, using split (for example, "a|b|c".split("|") returns ["a", "b", "c"]) and then grab whichever entry you want.
To sort on the first "num" field:
sorted(lines, key=lambda line: line.split("|")[1])
where lines is a list of the lines as you mention in the question. To sort on a different field, just change the number in brackets.
Assuming you start with a list of strings, start by splitting each row into a list:
data = [line.split('|') for line in input]
Then sort by whatever index you want:
sort_index = 1
sorted_data = sorted(data, key=lambda line: int(line[sort_index]))
The Python sorting guide has a lot more information.
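Putting the pieces from these answers together, here is a hedged Python 3 sketch that reproduces the two orderings from the question's update. It assumes every field after the name is an integer; the parse helper and the variable names are made up for illustration.

```python
from operator import itemgetter

lines = ['str|10|20', 'str|1|30']


def parse(line):
    """Split on '|' and convert every field after the name to int (assumed numeric)."""
    name, *numbers = line.split('|')
    return [name] + [int(n) for n in numbers]


rows = [parse(line) for line in lines]

by_first_num = sorted(rows, key=itemgetter(1))    # compares 10 and 1 as numbers
by_second_num = sorted(rows, key=itemgetter(2))   # compares 20 and 30 as numbers
by_both = sorted(rows, key=itemgetter(2, 1))      # tuple key: field 2, then field 1

# Convert the rows back to the original text form.
back_to_text = ['|'.join(map(str, row)) for row in by_first_num]
print(back_to_text)
```

Converting once while parsing means itemgetter can be used directly as the key, and passing several indexes to itemgetter gives the multi-level sort mentioned in the first answer.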
