sorting list of a sentence and number - python

I have checked several of the answers on how to sort lists in Python, but I can't figure this one out.
Let's say I have a list like this:
['Today is a good day,1', 'yesterday was a strange day,2', 'feeling hopeful,3']
Is there a way to sort by the number after each sentence?
I am trying to learn this stuff on my own, so I tried stuff like:
def sortMyList(string):
    return len(string)-1
sortedList = sorted(MyList, key=sortMyList())
But of course it doesn't work because sortMyList expects one parameter.

Since no one has commented on your coding attempts so far:
def sortMyList(string):
    return len(string)-1
sortedList = sorted(MyList, key=sortMyList())
You are on your way, but there are a few issues. First, the key argument expects a function. That function should be sortMyList. sortMyList() would be the result of calling a function - and besides, your function has a parameter (as it should), so calling it with no arguments wouldn't work. Just refer to the function itself.
sortedList = sorted(MyList, key=sortMyList)
Next, you need to tell sorted what is actually being compared when you compare two strings. len(string)-1 gets the length of the string and subtracts one. This would have the effect of sorting the strings by their length, which isn't what you're looking for. You want the character in the string at that index, so sorted will look at all those characters to form a basis for comparison.
def sortMyList(string):
    return string[len(string)-1]
Next, you can use a negative index instead of calculating the length of the string, to directly get the last character:
def sortMyList(string):
    return string[-1]
Next, we'd like to handle multi-digit numbers. It looks like there's a comma right before the number, so we'll split on that, starting from the right (in case the sentence itself has a comma). We only need the first split, so we'll specify a maxsplit of 1:
def sortMyList(string):
    return string.rsplit(',', maxsplit=1)[1]
This will run into a problem: these "numbers" are actually still strings, so when you compare them, it will do so alphabetically, putting "10" before "2" and so on. To fix this, turn the number into an integer before returning it:
def sortMyList(string):
    return int(string.rsplit(',', maxsplit=1)[1])
Putting it all together:
def sortMyList(string):
    return int(string.rsplit(',', maxsplit=1)[1])
sortedList = sorted(MyList, key=sortMyList)
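A self-contained run of the final version, assuming MyList holds the sample data plus one hypothetical extra entry that has both a comma inside the sentence and a two-digit number, to exercise the rsplit and int steps:

```python
def sortMyList(string):
    # Split once from the right, so commas inside the sentence are left alone,
    # then convert the trailing number to an int so that 10 sorts after 2.
    return int(string.rsplit(',', maxsplit=1)[1])


MyList = [
    'feeling hopeful,3',
    'Today is a good day,1',
    'well, that was odd,10',  # hypothetical entry: comma in the sentence
    'yesterday was a strange day,2',
]

sortedList = sorted(MyList, key=sortMyList)
print(sortedList)
# ['Today is a good day,1', 'yesterday was a strange day,2', 'feeling hopeful,3', 'well, that was odd,10']
```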

You can do this:
>>> sorted(l, key=lambda x : int(x.split(',')[-1]))
['Today is a good day,1', 'yesterday was a strange day,2', 'feeling hopeful,3']
>>>
This also works if the numbers in your strings have more than one digit:
>>> l = ['Today is a good day,12', 'yesterday was a strange day,21', 'feeling hopeful,23']
>>> sorted(l, key=lambda x : int(x.split(',')[-1]))
['Today is a good day,12', 'yesterday was a strange day,21', 'feeling hopeful,23'] # still works
>>> sorted(l, key=lambda x : x[-1])
['yesterday was a strange day,21', 'Today is a good day,12', 'feeling hopeful,23'] # doesn't work in this scenario

This worked for me:
sorted(myList, key=lambda x: x[-1])
If you need to go into double digits:
sorted(myList, key=lambda x: int(x.split(',')[1]))

Related

Sort python list based on substring in separate list

I have two lists:
list_a = ['Grapes/testfile.csv','Apples/testfile.csv','Pears/testfile.csv','Pears/testfile2.csv']
ref_list = ['Pears','Grapes','Apples']
I need to use ref_list list to order list_a.
More context: list_a will always have the string from ref_list before the /, but the length of ref_list will never match that of list_a. Also, I don't want to order reverse alphabetically.
Expected Output:
ordered_list = ['Pears/testfile.csv','Pears/testfile2.csv','Grapes/testfile.csv','Apples/testfile.csv']
I've tried many variations, referencing SO, but I can't get this to work. I just can't work out a way to reference the first list. Here is my attempt, which obviously doesn't work because it doesn't reference ref_list, but my logic is to use the string method startswith().
Something like this?
ordered_list = sorted(list_a, key = lambda x: x.startswith())
Use split() to extract the word before /.
Then use index() to get the position of the starting word in ref_list, and use that for the sorting key.
ordered_list = sorted(list_a, key = lambda x: ref_list.index(x.split('/')[0]))
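ref_list.index() scans ref_list on every call, so each key lookup costs time proportional to len(ref_list). If ref_list is long, one option (a sketch, not taken from the answers here; the name rank is just illustrative) is to build a dictionary of positions once. Like index(), it fails loudly (KeyError instead of ValueError) if a prefix is missing from ref_list.

```python
list_a = ['Grapes/testfile.csv', 'Apples/testfile.csv',
          'Pears/testfile.csv', 'Pears/testfile2.csv']
ref_list = ['Pears', 'Grapes', 'Apples']

# Map each reference name to its position once, instead of calling
# ref_list.index() for every element being sorted.
rank = {name: pos for pos, name in enumerate(ref_list)}

# sorted() is stable, so files that share a prefix keep their original order.
ordered_list = sorted(list_a, key=lambda path: rank[path.split('/')[0]])
print(ordered_list)
# ['Pears/testfile.csv', 'Pears/testfile2.csv', 'Grapes/testfile.csv', 'Apples/testfile.csv']
```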
This answer may not be the most elegant, but it works:
sorted_list = list()
for key in ref_list:
    sorted_list += [sorted_value for sorted_value in list_a
                    if sorted_value.startswith(key)]

python sort list by substr

I would like to sort a list by a substr of the contents.
Imagine the following list; I would like it to be sorted by the number after the '-':
>>> lst = ['ABC-789','DEF-123','GHI-456']
>>> sorted(lst,key=lambda x=lst.split('-') x[1])
This gives me:
sorted(lst, key=lambda x=lst.split('-');x[1])
^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
How can I achieve this?
This should work:
sorted(lst, key=lambda x: int(x.split('-')[1]))
Output:
['DEF-123', 'GHI-456', 'ABC-789']
Consider this corrected version:
lst = ['ABC-789','DEF-123','GHI-456']
lst = sorted(lst,key=lambda x: int(x.split('-')[1]))
print(lst) # ['DEF-123', 'GHI-456', 'ABC-789']
You had two issues here. First, your lambda syntax was off: you want lambda x: <expr in x>. Second, since you want to sort numerically, you should also cast the string to the right of the hyphen to an integer after extracting it. Coincidentally, you could get away without the cast here, because all the numbers have the same text width (3 digits). But if the numbers are not all the same width, sorting by text might not give a numerical sort.
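A quick check of the width issue described above, using hypothetical values whose numbers have different widths:

```python
lst = ['ABC-9', 'DEF-10', 'GHI-100']

# Comparing text: '1' < '9', so '10' and '100' sort before '9'.
by_text = sorted(lst, key=lambda x: x.split('-')[1])

# Comparing integers gives the numerical order.
by_number = sorted(lst, key=lambda x: int(x.split('-')[1]))

print(by_text)    # ['DEF-10', 'GHI-100', 'ABC-9']
print(by_number)  # ['ABC-9', 'DEF-10', 'GHI-100']
```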
lst = ['ABC-789','DEF-123','GHI-456']
lst.sort(key=lambda x: int(x.split('-')[1]))
print(lst)
Corrected the lambda and split syntax, and cast the number to int so that it sorts numerically. I also used the list.sort method instead of sorted, in case the original list itself needs to be sorted and changed.

add letter to string at specific position in loop (problem with sort) [duplicate]

This question already has answers here:
Is there a built in function for string natural sort?
(23 answers)
Closed 3 years ago.
I have a problem with sorting. I want to sort a list like
['asd_1qwer', 'asd_14qwer', 'asd_26qwer', 'asd_5qwer']
I found out that I need to add zeros to 1 and 5:
['asd_01qwer', 'asd_05qwer', 'asd_14qwer', 'asd_26qwer']
I don't know how to add it at the right position, because asd is not static.
list = ['asd_14qwer','asd_5qwer','asd_26qwer','asd_1qwer']
list.sort()
for i in list:
    tempo = i.split('_')[-1].split('qwer')[0]
    if len(tempo) == 1:
        i[:4] + '0' + i[4:]
Edit
I need to add a 0 to 1-9, and qwer is constant across all labels.
Actually, if your goal is to sort the list according to the numerical part of the strings, you don't need to zero-pad the numerical parts; you just need to provide a key function to sort() that extracts the numeric part as an integer:
l = ['asd_14qwer','asd_5qwer','asd_26qwer','asd_1qwer']
l.sort(key=lambda x: int(x.split('_')[-1].rstrip('qwer')))
Please note that this code does not depend on the characters preceding _, only on the fact that the numerical part is between _ and qwer.
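One caveat on the snippet above: str.rstrip('qwer') removes any trailing run of the characters q, w, e and r, not the literal suffix "qwer". That is harmless here because digits are never stripped, but on Python 3.9+ str.removesuffix() states the intent more precisely. A sketch, assuming every label really ends in qwer:

```python
l = ['asd_14qwer', 'asd_5qwer', 'asd_26qwer', 'asd_1qwer']

# removesuffix (Python 3.9+) drops exactly one trailing 'qwer', if present.
l.sort(key=lambda x: int(x.split('_')[-1].removesuffix('qwer')))
print(l)  # ['asd_1qwer', 'asd_5qwer', 'asd_14qwer', 'asd_26qwer']

# rstrip treats its argument as a set of characters, removesuffix does not:
print('12rewq'.rstrip('qwer'))        # 12
print('12rewq'.removesuffix('qwer'))  # 12rewq
```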
You can sort also without adding zeros:
list = ['asd_14qwer','asd_5qwer','asd_26qwer','asd_1qwer']
list.sort(key=lambda i: int(i[(i.index('_') + 1):-4]))
print(list)
Output:
['asd_1qwer', 'asd_5qwer', 'asd_14qwer', 'asd_26qwer']
You can use:
my_list.sort(key=lambda x: int(x[4:][:-4]))  # assumes a 4-character prefix and suffix
Or you can use a regular expression:
import re
my_list.sort(key=lambda x: int(re.search(r'\d+', x).group()))
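for i in range(len(list)):
    if len(list[i]) == 9:
        list[i] = list[i][:4] + '0' + list[i][4:]
This will add the zeros at the required places in the list, assuming every label has a four-character prefix such as asd_ and the suffix qwer, so that only the single-digit labels are 9 characters long.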
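If the goal really is to zero-pad, as the question originally asked, without relying on a fixed prefix length, one option is a regular-expression substitution that pads every run of digits. This is a sketch; the helper name pad_numbers and its width parameter are hypothetical, and width=2 assumes no label has more than two digits:

```python
import re

labels = ['asd_14qwer', 'asd_5qwer', 'asd_26qwer', 'asd_1qwer']

def pad_numbers(label, width=2):
    # Replace each run of digits with the same digits left-padded with zeros.
    return re.sub(r'\d+', lambda m: m.group().zfill(width), label)

padded = sorted(pad_numbers(s) for s in labels)
print(padded)  # ['asd_01qwer', 'asd_05qwer', 'asd_14qwer', 'asd_26qwer']
```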
A 'natural sort', perhaps:
import re

def natsort(lst):
    """Natural sort: compare runs of digits as integers."""
    lst = [str(i) for i in lst]
    convert = lambda text: int(text) if text.isdigit() else text
    a_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(lst, key=a_key)

lst = ['asd_1qwer', 'asd_14qwer', 'asd_26qwer', 'asd_5qwer']
print(natsort(lst))  # ['asd_1qwer', 'asd_5qwer', 'asd_14qwer', 'asd_26qwer']
Please don't shadow built-in names; I renamed list to my_list below. :)
Also, based on your answers in the comments, your approach was mostly correct, but you don't need to pad with zeros if you're sorting by the numbers alone: you just need to parse that part as a number! No matter the length (3, 21, or 111), it will then sort correctly.
The sort method has a key parameter where you can set what should be used to sort the elements of the list; that's where we put our snippet that extracts and parses the number:
my_list = ['asd_14qwer','asd_5qwer','asd_26qwer','asd_1qwer']
my_list.sort(key=lambda word: int(word.split('_')[-1].split('qwer')[0]))
As you can see, the snippet is similar to what you tried - I just wrapped it in the int call. :)
Result:
['asd_1qwer', 'asd_5qwer', 'asd_14qwer', 'asd_26qwer']

How to get few specific strings from a list of string?

I have a list of strings and a function getVowel.
This function returns the number of vowels present in the string.
Here is the sample code.
s = "hello ,this is a string"
no = getVowel(s)
lis = []
lis.append(s)
Suppose I have n strings in the list lis.
How can I get the top 3 strings with the maximum number of vowels?
sorted(lis, key=lambda x:getVowel(x), reverse=True)[:3]
Something like that. BTW, per Python code conventions the correct name of the function should be get_vowel.
Based on this answer:
https://stackoverflow.com/a/9887456/4671300
from heapq import nlargest
results = nlargest(3, lis, key=getVowel)
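Since getVowel itself isn't shown, here is a runnable sketch with a hypothetical stand-in that counts a, e, i, o and u case-insensitively (the sample strings are made up too). heapq.nlargest returns the results largest first, and the heapq documentation notes that it performs best for small n; for large n, sorted() is the better choice.

```python
from heapq import nlargest

def getVowel(s):
    # Hypothetical stand-in: count vowels, ignoring case.
    return sum(1 for ch in s.lower() if ch in 'aeiou')

lis = ["hello ,this is a string", "sky", "education", "audio", "rhythm"]

top3 = nlargest(3, lis, key=getVowel)
print(top3)  # ['hello ,this is a string', 'education', 'audio']
```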
Assuming your function getVowel is indeed working, try the following:
sorted(lis, key=lambda x: getVowel(x), reverse=True)[:3]
sorted documentation : https://docs.python.org/3.6/library/functions.html?highlight=sorted#sorted
Alternatively, you can sort in ascending order and take the last three elements from the sorted list instead of passing reverse=True.
To do this, just index with [-3:] like so (note that the three strings then come out in ascending order, with the most vowels last):
sorted(lis, key=lambda i: getVowel(i))[-3:]

python sort strings with digits at the end

What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4:
>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print list
['asdf111', 'asdf123', 'asdf1234', 'asdf124']
It should put the 1234 one at the end. Is there an easy way to do this?
is there an easy way to do this?
Yes
You can use the natsort module.
>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
Full disclosure, I am the package's author.
is there an easy way to do this?
No
It's perfectly unclear what the real rules are. The "some have 3 digits and some have 4" isn't really a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?
import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))
That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.
>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.
The algorithm for a natural sort is approximately as follows:
1. Split each value into alphabetical "chunks" and numerical "chunks".
2. Sort by the first chunk of each value:
   - if the chunk is alphabetical, sort it as usual;
   - if the chunk is numerical, sort by the numerical value represented.
3. Take the values that have the same first chunk and sort them by the second chunk.
4. And so on.
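The steps above can be sketched as a key function (this is one way to do it, not Ned's code; natural_key is a hypothetical name). re.split() with a capturing group always puts text chunks at even positions and digit chunks at odd positions, so when two keys are compared chunk by chunk, a string is always compared with a string and an int with an int:

```python
import re

def natural_key(value):
    # The capturing group keeps the digit runs: 'asdf1234' -> ['asdf', '1234', ''].
    chunks = re.split(r'([0-9]+)', value)
    # Even indexes hold text, odd indexes hold digit runs; convert the latter.
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(chunks)]

data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
print(sorted(data, key=natural_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Python compares the returned lists element by element, which is exactly "sort by the first chunk, then by the second chunk, and so on".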
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(key=lambda x: int(x[4:]))
# Python 2 only (cmp= was removed in Python 3): l.sort(cmp=lambda x, y: cmp(int(x[4:]), int(y[4:])))
You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.
sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:])))
Without the lambda and conditional expression that's
def key(s):
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))
sorted(list_, key=key)
This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
The issue is that the sorting here is alphabetical, since the items are strings: they are compared character by character, and the first differing character decides the order.
>>> 'a1234' < 'a124'  # positionally, '3' is less than '4'
True
>>>
You will need to do a numeric sort to get the desired output.
>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>>
L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))
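Rather than splitting each line myself, I ask Python to do it for me with re.findall():
import re
import sys

def SortKey(line):
    result = []
    for part in re.findall(r'\D+|\d+', line):
        try:
            result.append(int(part, 10))
        except (TypeError, ValueError) as _:
            result.append(part)
    return result

# Python 3 form of the original Python 2 print statement.
# Note: in Python 3, if one line starts with a digit and another with a
# non-digit, sorted() ends up comparing int with str and raises TypeError.
print(''.join(sorted(sys.stdin.readlines(), key=SortKey)), end='')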
