I have a list containing some links: ["http://link1.rar", "http://link1.rev","http://link2.rar","http://link2.rev"]
Is there a way to sort them, in order to look like:
["http://link1.rar", "http://link2.rar", "http://link1.rev", "http://link2.rev"]
I've tried with this:
def order(x):
    if "rar" not in x:
        return x
    else:
        return ""
new_links = sorted(links, key=order)
But in this way, rev links are sorted from the highest.
You want to sort according to multiple criteria: first, the file extension; then, the whole string.
The usual trick to sort according to multiple criteria is to use a tuple as the key. Tuples are sorted in lexicographical order, which means the first criterion is compared first; and in case of a tie, the second criterion is compared.
For instance, the following key returns a tuple such as (False, 'http://link1.rar') or (True, 'http://link1.rev'):
new_links = sorted(links, key=lambda x: ('rar' not in x, x))
Alternatively, you could use str.rsplit to split the string on the last '.' and get a tuple such as ('http://link1', 'rev'). Since you want the extension to be the first criterion, use slicing [::-1] to reverse the order of the tuple:
new_links = sorted(links, key=lambda x: x.rsplit('.', maxsplit=1)[::-1])
Note that using rsplit('.', maxsplit=1) to split on the last '.' is a bit of a hack. If the filename contains a double extension, such as '.tar.gz', only the last extension will be isolated.
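To check both keys against the sample list from the question, here is a minimal sketch. The names by_flag and by_ext are only illustrative and do not come from the original answer.

```python
links = ["http://link1.rar", "http://link1.rev",
         "http://link2.rar", "http://link2.rev"]

# Key 1: (is it NOT a rar link?, full string). False sorts before True,
# so rar links come first, and ties are broken by the string itself.
by_flag = sorted(links, key=lambda x: ('rar' not in x, x))

# Key 2: split on the last '.', then reverse so the extension comes first,
# e.g. ['rar', 'http://link1'].
by_ext = sorted(links, key=lambda x: x.rsplit('.', maxsplit=1)[::-1])

print(by_flag)
print(by_ext)
```

For this particular input both keys should give the same order; they can differ on other inputs, for example when 'rar' appears somewhere other than the extension.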
One last point to consider: numbers in strings. As long as your strings contain only single digits, such as '1' or '2', sorting the strings will work as you expect. However, if you have a 'link10' somewhere in there, the order might not be the one you expect: lexicographically, 'link10' comes before 'link2', because the first character of '10' is '1'. In that case, I refer you to this related question: Is there a built in function for string natural sort?
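As a rough sketch of what a natural sort key can look like (this is one common approach, not necessarily the one recommended in the linked question), you can split each string into digit and non-digit runs and compare the digit runs as integers. The helper name natural_key and the sample links are made up for illustration.

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. 'link10.rar' -> ['link', '10', '.rar'].
    # Digit runs always land at odd indices, so string and int parts
    # line up position by position when two keys are compared.
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r'(\d+)', text)]

names = ['link10.rar', 'link2.rar', 'link1.rar']  # hypothetical sample data

print(sorted(names))                   # plain lexicographic order
print(sorted(names, key=natural_key))  # numeric-aware order
```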
I have a list containing sentences with numbers in them. I want to sort them numerically, not alphabetically, but when I use the sort function it sorts the elements alphabetically. How do I fix this?
num = ['ryan has 8 apples','charles has 16 bananas','mylah has 3 watermelons']
num.sort()
print(num)
output:
['charles.....','mylah.....','ryan......']
as you can see, it is sorted alphabetically but that is not my expected output
the dots represent the rest of the sentence
expected result:
['mylah has 3 watermelons','ryan has 8 apples','charles has 16 bananas']
here's the expected output where the elements are sorted numerically and not alphabetically
You need to pass a key function to the sort that splits each string and parses the number as an int. If you wish, the key can also break ties alphabetically for strings that contain the same number.
sorted(num, key=lambda x: (int(x.split()[2]), x))
or
num.sort(key=lambda x: (int(x.split()[2]), x))
You need to use key= to let sort know which value you want to assign each element for sorting.
If the strings are always that structured, you can use:
num = ['ryan has 8 apples','charles has 16 bananas','mylah has 3 watermelons']
num.sort(key=lambda x: int(x.split(' ')[2]))
print(num)
If they are more complex, take a look at regular expressions.
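A minimal regular-expression sketch, assuming you want to sort by the first run of digits wherever it appears in the sentence. The helper name first_number, the extra fourth sentence, and the choice to push digit-free strings to the end are all assumptions made for illustration.

```python
import re

def first_number(sentence):
    # Find the first run of digits anywhere in the string.
    match = re.search(r'\d+', sentence)
    # Strings without any digits sort last (an arbitrary choice).
    return int(match.group()) if match else float('inf')

num = ['ryan has 8 apples',
       'charles has 16 bananas',
       'mylah has 3 watermelons',
       'in total sam ate 12 pears']  # made-up sentence with a different shape

sorted_num = sorted(num, key=first_number)
print(sorted_num)
```

Unlike split()[2], this does not depend on the number being the third word.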
The solutions so far focus on a solution specific to the structure of your strings but would fail for slightly different string structures. If we extract only the digit parts of the string and then convert to an int, we'll get a value that we can pass to the sort function's key parameter:
num.sort(
key=lambda s: int(''.join(c for c in s if c.isdigit()))
)
The key parameter lets you specify some alternative aspect of your datum to sort by, in this case, a value provided by the lambda function.
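One caveat worth checking before relying on the digit-joining key: it concatenates every digit in the string, so two separate numbers are merged into one, and a string with no digits raises ValueError. A small sketch (the function name digits_key and the second sample sentence are made up):

```python
def digits_key(s):
    # Concatenate every digit character in the string, then parse the result.
    return int(''.join(c for c in s if c.isdigit()))

single = digits_key('ryan has 8 apples')     # one number in the string
merged = digits_key('room 2 has 16 apples')  # the digits of 2 and 16 are joined

print(single, merged)

try:
    digits_key('no numbers here')
except ValueError as exc:
    # int('') is not a valid literal, so digit-free strings fail.
    print('ValueError:', exc)
```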
I need to output data in a file in the following format: year-month,val. it should be sorted on year-month
for example:
2016-1,5
2016-7,1
2016-9,3
2016-11,4
2016-12,2
But, I am getting:
2016-1,5
2016-11,4
2016-12,2
2016-7,1
2016-9,3
the code is as follows:
for k,v in sorted(dictD.items()):
    drow = [k,v]
    writer.writerow(drow)
How to get the desired output?
Split the date at the hyphen and convert it to a tuple of numbers rather than strings.
for row in sorted(dictD.items(), key=lambda x: tuple(map(int, x[0].split('-')))):
    writer.writerow(row)
x is the (key, value) tuple returned by items(), so x[0] is the key, which is a date like '2016-1'. split splits this into the list ['2016', '1'], and tuple(map(int, ...)) converts that to a tuple of integers (2016, 1). Using this as the sort key will order them numerically instead of lexicographically. (In Python 3, map returns an iterator, which cannot be compared, hence the tuple() call; the lambda(x) parameter syntax is also Python 2 only.)
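Here is a self-contained sketch of the numeric-tuple key written for Python 3. The in-memory io.StringIO buffer stands in for the asker's output file, and the helper name year_month_key is an assumption made for this example.

```python
import csv
import io

dictD = {'2016-1': 5, '2016-11': 4, '2016-12': 2, '2016-7': 1, '2016-9': 3}

def year_month_key(item):
    # item is a (key, value) pair; turn '2016-11' into (2016, 11).
    return tuple(int(part) for part in item[0].split('-'))

buffer = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buffer, lineterminator='\n')
for row in sorted(dictD.items(), key=year_month_key):
    writer.writerow(row)

output = buffer.getvalue()
print(output)
```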
This is not the most direct approach, but I couldn't make it simpler: you could change the format of the month so that it is zero-padded, like this:
dictD = {'2016-1':5, '2017-7':1,'2016-9':3, '2016-11':4, '2016-12':2}
formatedKey = [list(dictD)[i].split('-')[0]+'-'+'{:02d}'.format(int(list(dictD)[i].split('-')[1])) \
               for i in range(len(list(dictD)))]
dictD2 = dict(zip(formatedKey, list(dictD.values())))
for k,v in sorted(dictD2.items()):
    drow = [k,v]
    print(drow)
I didn't use the writer, but I hope this helps.
Assuming your dictionary is keyed by the YYYY-MM string and the value is the number after the comma, you can add a key argument to your sorted() call.
The key func could be:
lambda item: item[0][:5] + ('0' if len(item[0]) < 7 else '') + item[0][5:]
So your sorted call goes from:
sorted(dictD.items())
to:
sorted(dictD.items(), key=lambda item: <the rest from above>)
This leaves sorting by strings, but by adding the leading zero to the one-digit month, things come out as you want.
As a side note, you can pass a named function in as the key. You're not limited to using a lambda call.
When you pass things into sorted() without specifying a key function, the default ordering is used. Strings are compared character by character, and tuples are compared element by element, starting with the first. Your .items() call produces a sequence of (key, value) tuples, so the tuples get sorted by the dict keys as strings, ignoring any numeric meaning. Once a leading zero is added to the one-digit months, the dates sort properly as strings. The lambda does just that: it inserts the extra '0' when necessary so that the sort gives the desired results.
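The same zero-padding idea written as a named function might look like the sketch below. The function name padded_key is made up, and the sketch assumes every key has the form 'YYYY-M' or 'YYYY-MM'.

```python
dictD = {'2016-1': 5, '2016-11': 4, '2016-12': 2, '2016-7': 1, '2016-9': 3}

def padded_key(item):
    # item is a (key, value) pair; pad the month to two digits,
    # e.g. '2016-7' -> '2016-07', so plain string comparison works.
    year, month = item[0].split('-')
    return f"{year}-{int(month):02d}"

rows = sorted(dictD.items(), key=padded_key)
print(rows)
```

Note that only the sort key is padded; the keys stored in the dictionary, and therefore the rows that are written out, keep their original form.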
I'm trying to take a list from an array and sort its strings by the last series of 6 digits (for instance '042126'). To do this I would split by '.', take the second-to-last piece of each string with [-2], and then sort matchfiles[1] by this substring.
The files should end up sorted like:
erl1.041905, erl1.041907, erl2.041908, erl1.041909, erl2.041910, etc.
Two questions: how do I specify an unlimited number of splits per string (in case of longer names using additional '.')? I am using 4 splits, but this may not always hold. Otherwise, how would I split just two times, working backwards?
More importantly, I am returned an error: 'list' object is not callable. What am I doing wrong?
Thanks
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[09302] ]
matchtry = sorted(matchfiles[1], key = [i.split('.', 4)[-2] for i in
matchfiles[1]])
The key argument expects a function, but you gave it a list, hence the error 'list' object is not callable.
You should use split('.')[-2] which always takes the second to last element.
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[9302] ]
matchtry = sorted(matchfiles[1], key=lambda x: x.rsplit('.')[-2])
print(matchtry)
# ['blue.2017-09-05t15-15-07.erl1.041905.png', 'blue.2017-09-05t15-15-11.erl1.041907.png',
#  'blue.2017-09-05t14-21-35.erl2.041908.png', 'blue.2017-09-05t15-15-14.erl1.041909.png',
#  'blue.2017-09-05t14-21-38.erl2.041910.png', 'blue.2017-09-05t14-21-41.erl2.041912.png',
#  'blue.2017-09-05t14-21-45.erl2.041914.png']
The key parameter to sorted requires a function. [i.split('.', 4)[-2] for i in matchfiles[1]] is a list, not a function. The expected function acts on a single element from the list, so you need a function that takes a string, splits it on the '.' character, and returns the second last column, possibly converted to an integer.
Also, Python does not allow integers to begin with a zero, so you must change that [09302] to [9302]. (Beginning with 0 signifies that the number will be non-decimal. In Python 2, 0427 would be 427 octal, but in Python 3, octal number must be preceded by 0o instead. 09302 is invalid in both versions, as an octal number cannot contain 9.)
matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
'blue.2017-09-05t15-15-11.erl1.041907.png',
'blue.2017-09-05t15-15-14.erl1.041909.png',
'blue.2017-09-05t14-21-35.erl2.041908.png',
'blue.2017-09-05t14-21-38.erl2.041910.png',
'blue.2017-09-05t14-21-41.erl2.041912.png',
'blue.2017-09-05t14-21-45.erl2.041914.png'],
[9302] ]
matchtry = sorted(matchfiles[1], key=lambda s: int(s.split('.')[-2]))  # parameter renamed from str to avoid shadowing the built-in
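On the question of splitting only twice, working backwards: str.rsplit accepts a maximum number of splits and counts them from the right. A minimal sketch, using file names from the question (the names files and number_key are made up for illustration):

```python
files = ['blue.2017-09-05t15-15-07.erl1.041905.png',
         'blue.2017-09-05t15-15-11.erl1.041907.png',
         'blue.2017-09-05t15-15-14.erl1.041909.png',
         'blue.2017-09-05t14-21-35.erl2.041908.png',
         'blue.2017-09-05t14-21-38.erl2.041910.png']

def number_key(fname):
    # Split at most twice, starting from the right:
    # 'a.b.c.041905.png' -> ['a.b.c', '041905', 'png']
    parts = fname.rsplit('.', 2)
    return int(parts[-2])

matchtry = sorted(files, key=number_key)
for name in matchtry:
    print(name)
```

Calling split('.') with no limit also works, since [-2] picks the second-to-last piece however many dots the name contains.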
Remember that the key argument to sorted takes each element of your iterable (list in your case) and converts it to some value. The values of each element after being transformed by key determine the sort order. So a simple way to get this to work every time is to define a function that takes one element and converts it to something that's easy to sort:
import os

def fname_to_value(fname):
    name, ext = os.path.splitext(fname)  # remove extension
    number = name.split('.')[-1]  # get the last set of stuff after the last '.'
    return number  # string comparison works here because the numbers are zero-padded to the same width
So now you have a function converting the filename to a sortable value. Simply supply this to sorted as the key argument and you're done.
matchtry = sorted(matchfiles[1], key = fname_to_value)
for match in matchtry:
    print(match)
result:
blue.2017-09-05t15-15-07.erl1.041905.png
blue.2017-09-05t15-15-11.erl1.041907.png
blue.2017-09-05t14-21-35.erl2.041908.png
blue.2017-09-05t15-15-14.erl1.041909.png
blue.2017-09-05t14-21-38.erl2.041910.png
blue.2017-09-05t14-21-41.erl2.041912.png
blue.2017-09-05t14-21-45.erl2.041914.png
You can then process the resulting list as needed.
Yes, the issue is your key. You can use a lambda expression: https://en.wikipedia.org/wiki/Anonymous_function#Python
Imagine this as a mathematical map. The key being used to sort needs a function, so you define a lambda like:
lambda curr: curr.split('.')[-2]
This gives each current object in the list the name "curr" and applies the expression following the :.
So in your case this should do the thing:
matchtry = sorted(matchfiles[1], key=lambda curr: curr.split('.')[-2])
Currently I'm trying to sort a list of files which were made of version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the sort method of the list in combination with a lambda expression. The lambda expression should first remove the .py extension and then split the string by the dots, then cast every number to an integer and sort by them.
I know how I would do this in c#, but I have no idea how to do this with python. One problem is, how can I sort over multiple criteria? And how to embed the lambda-expression doing this?
Can anyone help me?
Thank you very much!
You can use the key argument of sorted function:
filenames = [
    '1.0.0.0.py',
    '0.0.0.0.py',
    '1.1.0.0.py'
]
# In Python 3, map returns an iterator, so wrap it in list() to get a comparable key.
print(sorted(filenames, key=lambda f: list(map(int, f.split('.')[:-1]))))
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
Have your key function return a list of items. The sort is lexicographic in that case.
l = [ '1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py',]
s = sorted(l, key = lambda x: [int(y) for y in x.replace('.py','').split('.')])
print(s)
# read list in from memory and store as variable file_list
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
Our lambda function first takes our filename, splits it into an array delimited by periods. Then we take all of the elements of the list, minus the last element, which is our file extension. Then we apply the 'int' function to every element of the list. The returned list is then sorted by the 'sorted' function according to the elements of the list, starting at the first with ties broken by later elements in the list.
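A Python 3 sketch of the same idea as a named function, using an input where numeric and string ordering actually differ. The function name version_key and the 1.10 and 1.2 file names are made up for this example.

```python
def version_key(filename):
    # Drop the extension: '1.10.0.0.py' -> '1.10.0.0'
    stem = filename.rsplit('.', 1)[0]
    # Compare the components as integers: (1, 10, 0, 0)
    return tuple(int(part) for part in stem.split('.'))

files = ['1.10.0.0.py', '1.2.0.0.py', '0.0.0.0.py', '1.1.0.0.py']

print(sorted(files))                   # string order puts 1.10 before 1.2
print(sorted(files, key=version_key))  # numeric order puts 1.2 before 1.10
```

Returning a tuple is what gives the "multiple criteria" behaviour: the first component is compared first, and later components only break ties.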
I have a list of lists that looks like this:
[['10.2100', '0.93956088E+01'],
['11.1100', '0.96414905E+01'],
['12.1100', '0.98638361E+01'],
['14.1100', '0.12764182E+02'],
['16.1100', '0.16235739E+02'],
['18.1100', '0.11399972E+02'],
['20.1100', '0.76444933E+01'],
['25.1100', '0.37823686E+01'],
['30.1100', '0.23552237E+01'],...]
(here it looks as if it is already ordered, but some of the rest of the elements not included here to avoid a huge list, are not in order)
I want to sort it by the first element of each pair. I have seen several very similar questions, but in all of them the examples use integers. I don't know if that is why, when I use list.sort(key=lambda x: x[0]), sorted(), or the version with operator.itemgetter(0), I get the following:
[['10.2100', '0.93956088E+01'],
['100.1100', '0.33752517E+00'],
['11.1100', '0.96414905E+01'],
['110.1100', '0.25774972E+00'],
['12.1100', '0.98638361E+01'],
['14.1100', '0.12764182E+02'],
['14.6100', '0.14123326E+02'],
['15.1100', '0.15451733E+02'],
['16.1100', '0.16235739E+02'],
['16.6100', '0.15351242E+02'],
['17.1100', '0.14040859E+02'],
['18.1100', '0.11399972E+02'], ...]
Apparently it is sorting by the first character of the first element of each pair.
Is there a way of using list.sort or sorted() to order these pairs with respect to the first element?
Don't use list as a variable name!
some_list.sort(key=lambda x: float(x[0]))
This will convert the first element to a float and compare it numerically instead of alphabetically.
(Note that the cast to float is only for comparing; the item is still a string in the list.)
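A small runnable sketch of the float key, using a few of the pairs from the question (the variable names pairs and by_first are illustrative):

```python
pairs = [['10.2100', '0.93956088E+01'],
         ['100.1100', '0.33752517E+00'],
         ['11.1100', '0.96414905E+01'],
         ['14.6100', '0.14123326E+02']]

# Compare the first element of each pair as a number, not as text.
by_first = sorted(pairs, key=lambda pair: float(pair[0]))

for pair in by_first:
    print(pair)

# The key is only used for comparison; the stored values stay strings.
print(type(by_first[0][0]))
```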