Looking to sort a set of .csv numeric values column-wise. Note that the number of columns varies. For example, using Python:
print(sorted(['9,11', '70,10', '10,8,1','10,70']))
produces
['10,70', '10,8,1', '70,10', '9,11']
while the desired result is
['9,11', '10,8,1', '10,70', '70,10']
First, sort by the first column, then by the second, etc.
Obviously this can be done, but can this be done elegantly?
It can be done more elegantly by using the key argument of sorted:
data = [
'9,11',
'70,10',
'10,8,1',
'10,70'
]
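# list(...) is needed in Python 3, where map returns a lazy iterator that cannot be compared
print(sorted(data, key=lambda s: list(map(int, s.split(',')))))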
Result:
['9,11', '10,8,1', '10,70', '70,10']
With the above code, we convert each string in our list to a list of integers and use that list as the sort key. Lists compare element by element, so the strings are ordered by the first column, then by the second, and so on.
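The key works because Python compares lists and tuples lexicographically: the first unequal pair of elements decides, and a sequence that is a prefix of another sorts first. A small sketch that checks this directly; `column_key` is a hypothetical helper name, and it returns a tuple instead of a list, which compares the same way.

```python
# Hypothetical helper: turn '10,8,1' into (10, 8, 1) for use as a sort key.
def column_key(s):
    return tuple(int(part) for part in s.split(','))

data = ['9,11', '70,10', '10,8,1', '10,70']

# Tuples compare element by element; the first unequal pair decides.
assert column_key('10,8,1') < column_key('10,70')  # 8 < 70 at index 1
# A tuple that is a prefix of another sorts first.
assert (10,) < (10, 8)

print(sorted(data, key=column_key))
# ['9,11', '10,8,1', '10,70', '70,10']
```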
If you don't mind third-party modules, you can use natsort, which provides the function natsorted, designed to be a drop-in replacement for sorted.
>>> import natsort
>>> natsort.natsorted(['9,11', '70,10', '10,8,1','10,70'])
['9,11', '10,8,1', '10,70', '70,10']
Full disclosure, I am the package's author.
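If adding a dependency is not an option, a rough approximation of a natural sort key can be built with the standard library's re module. This is a simplified sketch, not natsort's algorithm: it only handles runs of non-negative integers (no signs, decimals, or locale rules), and `natural_key` is a hypothetical name.

```python
import re

def natural_key(s):
    # re.split with a capturing group keeps the digit runs in the result.
    # The result always alternates text, digits, text, ..., so odd indices
    # are digit runs (converted to int) and even indices are the text between.
    parts = re.split(r'(\d+)', s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

print(sorted(['9,11', '70,10', '10,8,1', '10,70'], key=natural_key))
# ['9,11', '10,8,1', '10,70', '70,10']
```

Because text and numbers always sit at the same positions, two keys never compare a str against an int.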
Related
Is there a default Python function that can separate groups of numbers without using a conventional loop?
inputArray=["slide_0000_00.jpg", "slide_0000_01.jpg","slide_0000_02.jpg","slide_0001_01.jpg","slide_0001_02.jpg","slide_0002_01.jpg"]
resultArray=[["slide_0000_00.jpg", "slide_0000_01.jpg", "slide_0000_02.jpg"],["slide_0001_01.jpg", "slide_0001_02.jpg"], ["slide_0002_01.jpg"]]
Use itertools.groupby to group consecutive items by their middle part:
inputArray=["slide_0000_00.jpg",
"slide_0000_01.jpg",
"slide_0000_02.jpg",
"slide_0001_01.jpg",
"slide_0001_02.jpg",
"slide_0002_01.jpg"]
import itertools
result = [list(g) for _, g in itertools.groupby(inputArray, key=lambda x: x.split("_")[1])]
which gives:
>>> result
[['slide_0000_00.jpg', 'slide_0000_01.jpg', 'slide_0000_02.jpg'],
['slide_0001_01.jpg', 'slide_0001_02.jpg'],
['slide_0002_01.jpg']]
Note that groupby only groups consecutive items: if items with the same middle part are not adjacent, the grouping won't work unless you sort the list first (a plain sort would work here, but it costs O(n log n), which isn't satisfactory). A classic alternative in that case is to use collections.defaultdict(list):
import collections
d = collections.defaultdict(list)
for x in inputArray:
    d[x.split("_")[1]].append(x)
result = list(d.values())
The result is identical, although the order of the groups can vary with the Python version: dictionaries preserve insertion order as a CPython implementation detail in 3.6, and as a language guarantee from Python 3.7 on.
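For the case where the groups are not adjacent, this sketch shows the "sort first" option: sorting by the same key function before calling groupby. The shuffled list and the `middle` helper are made up for illustration. Sorting costs O(n log n), whereas the defaultdict version is a single O(n) pass.

```python
import itertools

def middle(name):
    # 'slide_0001_02.jpg' -> '0001'
    return name.split("_")[1]

# Same files as above, but the groups are no longer adjacent.
shuffled = ["slide_0001_01.jpg", "slide_0000_00.jpg", "slide_0002_01.jpg",
            "slide_0000_01.jpg", "slide_0001_02.jpg", "slide_0000_02.jpg"]

# Without sorting, groupby starts a new group every time the key changes.
print(len([list(g) for _, g in itertools.groupby(shuffled, key=middle)]))  # 6

# Sorting by the same key first makes equal keys adjacent.
# sorted is stable, so files keep their relative order inside each group.
result = [list(g) for _, g in itertools.groupby(sorted(shuffled, key=middle), key=middle)]
print(result)
```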
Currently I'm trying to sort a list of files whose names are version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the list's sort method with a lambda expression. The lambda expression should first remove the .py extension and then split the string at the dots, then cast every part to an integer and sort by those numbers.
I know how I would do this in C#, but I have no idea how to do it in Python. One problem is: how can I sort by multiple criteria? And how do I embed the lambda expression that does this?
Can anyone help me?
Thank you very much!
You can use the key argument of sorted function:
filenames = [
'1.0.0.0.py',
'0.0.0.0.py',
'1.1.0.0.py'
]
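# list(...) is needed in Python 3, where map returns a lazy iterator that cannot be compared
print(sorted(filenames, key=lambda f: list(map(int, f.split('.')[:-1]))))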
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
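Converting to integers matters as soon as a component reaches two digits: as plain strings, '1.10.0.0.py' sorts before '1.9.0.0.py'. A sketch of the same kind of key that uses os.path.splitext to drop the extension instead of slicing off the last split part; the file names and `version_key` are illustrative assumptions.

```python
import os

def version_key(filename):
    # '1.10.0.0.py' -> ('1.10.0.0', '.py') -> (1, 10, 0, 0)
    stem, _ext = os.path.splitext(filename)
    return tuple(int(part) for part in stem.split('.'))

filenames = ['1.9.0.0.py', '1.10.0.0.py', '0.0.0.0.py']

print(sorted(filenames))                   # plain string sort
# ['0.0.0.0.py', '1.10.0.0.py', '1.9.0.0.py']
print(sorted(filenames, key=version_key))  # numeric sort
# ['0.0.0.0.py', '1.9.0.0.py', '1.10.0.0.py']
```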
Have your key function return a list of items. The sort is lexicographic in that case.
l = ['1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py']
s = sorted(l, key=lambda x: [int(y) for y in x.replace('.py', '').split('.')])
print(s)
# read the list in from memory and store it as the variable file_list
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
Our lambda function first takes the filename and splits it into a list at each period. Then we keep every element except the last one, which is the file extension, and apply int to each of them. sorted uses the resulting list of integers as the key: it compares the keys element by element, starting with the first and breaking ties with later elements.
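If the criteria need different directions, one option with numeric keys is to negate the component that should run in reverse. A sketch, assuming you want the highest major version first but ascending order within each major version; the list and that ordering requirement are made up for illustration.

```python
file_list = ['1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py', '0.2.0.0.py']

def major_desc_key(name):
    parts = [int(p) for p in name.split('.')[:-1]]  # drop the 'py' part
    # Negate the major number so larger majors sort first; the remaining
    # components still sort ascending.
    return (-parts[0], *parts[1:])

print(sorted(file_list, key=major_desc_key))
# ['1.0.0.0.py', '1.1.0.0.py', '0.0.0.0.py', '0.2.0.0.py']
```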
I have a dictionary such as below.
d = {
'0:0:7': '19734',
'0:0:0': '4278',
'0:0:21': '19959',
'0:0:14': '9445',
'0:0:28': '14205',
'0:0:35': '3254'
}
Now I want to sort it by key, in time order.
Dictionaries are not sorted; if you want to print one or iterate through it in sorted order, you should convert it to a sorted list first, e.g.:
def parseTime(s):
    return tuple(int(x) for x in s.split(':'))
# d.items() yields (key, value) pairs, so parse only the key part
sorted_dict = sorted(d.items(), key=lambda item: parseTime(item[0]))
# or
for t in sorted(d, key=parseTime):
    pass
Note that this means you cannot use the d['0:0:7'] syntax on sorted_dict, since it is a list of (key, value) pairs.
Passing a key function to sorted tells Python which value to compare for each item. Standard string comparison will not sort these times correctly: for example, '0:0:14' sorts before '0:0:7' as a string.
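Since Python 3.7 a regular dict keeps insertion order, so if you want to keep the d['0:0:7'] lookup syntax you can rebuild a dict from the sorted items. A sketch under that assumption (Python 3.7+); it repeats the parseTime helper so it runs on its own, and `by_time` is a hypothetical name.

```python
def parseTime(s):
    return tuple(int(x) for x in s.split(':'))

d = {'0:0:7': '19734', '0:0:0': '4278', '0:0:21': '19959',
     '0:0:14': '9445', '0:0:28': '14205', '0:0:35': '3254'}

# Plain string order puts '0:0:14' before '0:0:7'.
print(sorted(d))
# ['0:0:0', '0:0:14', '0:0:21', '0:0:28', '0:0:35', '0:0:7']

# Python 3.7+: dict preserves insertion order, so this dict iterates in time order.
by_time = dict(sorted(d.items(), key=lambda item: parseTime(item[0])))
print(list(by_time))
# ['0:0:0', '0:0:7', '0:0:14', '0:0:21', '0:0:28', '0:0:35']
print(by_time['0:0:7'])  # prints 19734
```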
Before Python 3.7, dictionaries had no guaranteed order, and even now they keep insertion order rather than sorted order. There is collections.OrderedDict, which retains insertion order, but if you want to work through the keys of a standard dictionary in order you can just do:
for k in sorted(d):
    print(k, d[k])
In your case, the problem is that your time strings won't sort correctly as strings. You need to pad the fields with leading zeros, e.g. "00:00:07", or interpret them as actual time objects, which sort correctly. This function may be useful:
def padded(s, c=":"):
    return c.join("{0:02d}".format(int(i)) for i in s.split(c))
You can use this as a key for sorted if you really want to retain the current format in your output:
for k in sorted(d, key=padded):
    print(k, d[k])
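The "actual time objects" option can be sketched with datetime.timedelta, assuming the fields are hours:minutes:seconds (the question does not say). timedelta values compare by duration, so they work directly as a sort key; `to_timedelta` is a hypothetical helper.

```python
from datetime import timedelta

def to_timedelta(s):
    # Assumes 'H:M:S', e.g. '0:0:14' -> 14 seconds.
    hours, minutes, seconds = (int(x) for x in s.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

d = {'0:0:7': '19734', '0:0:0': '4278', '0:0:21': '19959',
     '0:0:14': '9445', '0:0:28': '14205', '0:0:35': '3254'}

# Keys are printed in time order, in their original format.
for k in sorted(d, key=to_timedelta):
    print(k, d[k])
```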
Have a look at collections.OrderedDict.
Suppose I have a list of dates in the string format, 'YYYYMMDD.' How do I sort the list in regular and reverse order?
For that particular format, you can just sort them as strings
>>> sorted(['20100405','20121209','19990606'])
['19990606', '20100405', '20121209']
>>> sorted(['20100405','20121209','19990606'], reverse=True)
['20121209', '20100405', '19990606']
This works because in that format the digits are ordered from most significant to least significant.
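A quick way to convince yourself: parsing each string with datetime.strptime and the '%Y%m%d' format gives the same order as the plain string sort. This is only a check with the sample dates; for this format the string sort is all you need.

```python
from datetime import datetime

dates = ['20100405', '20121209', '19990606']

by_string = sorted(dates)
by_date = sorted(dates, key=lambda s: datetime.strptime(s, '%Y%m%d'))

print(by_string == by_date)  # True
```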
These are the two ways:
print(sorted(my_list))
print(sorted(my_list, reverse=True))
The whole reason people use dates in YYYYMMDD format is so that lexicographic (string) sorting will accomplish a date sort.
Strings sort naturally. Use list.sort (in-place) or built-in sorted (copying).
Both accept a boolean parameter named reverse, which defaults to False; set it to True for reverse order.
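A short sketch of the difference: sorted returns a new list and leaves its input alone, while list.sort reorders the list in place and returns None. The dates are the sample values from above.

```python
dates = ['20100405', '20121209', '19990606']

newest_first = sorted(dates, reverse=True)  # new list; dates is unchanged
print(newest_first)  # ['20121209', '20100405', '19990606']
print(dates)         # ['20100405', '20121209', '19990606']

result = dates.sort()  # sorts dates itself
print(result)  # None
print(dates)   # ['19990606', '20100405', '20121209']
```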
name|num|num|num|num
name|num|num|num|num
name|num|num|num|num
How can I sort this list by a chosen field (2, 3, 4, or 5)?
Sorry for my English.
Update
Input:
str|10|20
str|1|30
Sort by first field (1,10):
str|1|30
str|10|20
Sort by second field(20,30):
str|10|20
str|1|30
I would use the operator module function itemgetter instead of a lambda function. It is faster and allows multiple levels of sorting.
from operator import itemgetter
# text holds the whole input as one string; note the fields are compared as strings
data = (line.split('|') for line in text.split('\n'))
sort_index = 1
sorted_data = sorted(data, key=itemgetter(sort_index))
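Passing several indices to itemgetter gives the multiple levels of sorting mentioned above, because the key becomes a tuple. Note that the split fields are still strings, so '10' sorts before '9'. A sketch with made-up rows; the int conversion in the second key is an addition for numeric fields.

```python
from operator import itemgetter

text = "str|10|20\nstr|9|30\nstr|10|5"
rows = [line.split('|') for line in text.split('\n')]

# Two levels: field 1, then field 2, both compared as strings.
as_strings = sorted(rows, key=itemgetter(1, 2))
print(as_strings)
# [['str', '10', '20'], ['str', '10', '5'], ['str', '9', '30']]

# The same two levels, compared as integers.
as_numbers = sorted(rows, key=lambda r: (int(r[1]), int(r[2])))
print(as_numbers)
# [['str', '9', '30'], ['str', '10', '5'], ['str', '10', '20']]
```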
You can sort on a specific key, which tells the sort function how to evaluate the entries to be sorted -- that is, how we decide which of two entries is bigger. In this case, we'll first split up each string by the pipe, using split (for example, "a|b|c".split("|") returns ["a", "b", "c"]) and then grab whichever entry you want.
To sort on the first "num" field:
sorted(lines, key=lambda line: line.split("|")[1])
where lines is a list of the lines as you mention in the question. To sort on a different field, just change the number in brackets.
Assuming you start with a list of strings called lines, start by splitting each row into a list:
data = [line.split('|') for line in lines]
Then sort by whatever index you want:
sort_index = 1
sorted_data = sorted(data, key=lambda line: int(line[sort_index]))
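Putting those steps together with the sample rows from the question: this sketch assumes the lines arrive as a list of strings (typed in directly here), and `sort_by_field` is a hypothetical helper whose index 1 is the first number after the name.

```python
def sort_by_field(lines, index):
    # Split each row on '|' and compare the chosen field numerically.
    rows = [line.split('|') for line in lines]
    rows.sort(key=lambda row: int(row[index]))
    # Join the fields back into the original 'str|num|num' form.
    return ['|'.join(row) for row in rows]

lines = ['str|10|20', 'str|1|30']

print(sort_by_field(lines, 1))  # ['str|1|30', 'str|10|20']
print(sort_by_field(lines, 2))  # ['str|10|20', 'str|1|30']
```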
The Python sorting guide has a lot more information.