I have a problem sorting a list. I'm trying to write a function that sorts a list of files by their extension. For example, given:
["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
the desired output is:
["a.c","x.c","a.py","b.py","bar.txt","foo.txt"]
I got stuck when I tried to build a key parameter; I can't work out the algorithm. I tried to split() every file name first, like this:
def sort_file(lst):
    second_list = []
    for x in lst:
        t = x.split(".")
        second_list.append(t[1])
    second_list.sort()
But I don't know what to do next. How can I use this sorted second_list as a key parameter so that I can sort the files by their extension?
I fail when I tried to make a key parameter
The key argument takes a function (any callable, really) that, given a list item as input, returns the object to compare against. In your case, x.split(".")[1] is the object to compare against. Take a look at Python's wiki entry on sorting in this fashion.
Something like the below should work for you.
>>> a = ["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
>>> sorted(a, key=lambda x: x.rsplit(".", 1)[1])
['a.c', 'x.c', 'a.py', 'b.py', 'bar.txt', 'foo.txt']
As @TanveerAlam says, using rsplit(..) is better because you want the split to be done from the right, so that a name containing several dots still yields its final extension.
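To see why the direction of the split matters, here is a minimal sketch. The multi-dot file names are made up for illustration, and os.path.splitext is shown only as a standard-library alternative, not as something the answer above proposed.

```python
import os.path

# Hypothetical sample: two of these names contain more than one dot
files = ["a.c", "archive.tar.gz", "a.py", "notes.v2.txt", "x.c"]

# split(".")[1] takes the text after the FIRST dot, which is not the extension here
first_dot = [f.split(".")[1] for f in files]

# rsplit(".", 1)[1] takes the text after the LAST dot, i.e. the real extension
last_dot = [f.rsplit(".", 1)[1] for f in files]

# Sort by extension first, then by the full name so ties are ordered predictably
by_ext = sorted(files, key=lambda f: (f.rsplit(".", 1)[1], f))

# os.path.splitext returns (root, ext) and also copes with names that have no dot
by_ext_splitext = sorted(files, key=lambda f: (os.path.splitext(f)[1], f))

print(by_ext)
```

Returning a tuple from the key function is what gives the secondary ordering by name within each extension.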
Say I have a dictionary and a list that contains the dictionary's keys. Is there a way to sort the list based on the dictionary's values?
I have been trying this:
trial_dict = {'*':4, '-':2, '+':3, '/':5}
trial_list = ['-','-','+','/','+','-','*']
I went to use:
sorted(trial_list, key=trial_dict.values())
And got:
TypeError: 'list' object is not callable
Then I tried to create a function that could be called with trial_dict.get():
def sort_help(x):
    if isinstance(x, dict):
        for i in x:
            return x[i]
sorted(trial_list, key=trial_dict.get(sort_help(trial_dict)))
I don't think the sort_help function is having any effect on the sort, though. I'm not sure whether using trial_dict.get() is the correct way to go about this either.
Yes, dict.get is the correct (or at least the simplest) way. Pass the method itself, without calling it:
sorted(trial_list, key=trial_dict.get)
As Mark Amery commented, the equivalent explicit lambda:
sorted(trial_list, key=lambda x: trial_dict[x])
might be better, for at least two reasons:
the sort expression is visible and immediately editable
it doesn't suppress errors (when the list contains something that is not in the dict).
The key argument in the sorted builtin function (or the sort method of lists) has to be a function that maps members of the list you're sorting to the values you want to sort by. So you want this:
sorted(trial_list, key=lambda x: trial_dict[x])
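As a small sketch of the two variants discussed above, using the question's own data: both produce the same order, and the explicit lookup raises a KeyError as soon as the list contains a symbol that is missing from the dictionary. The extra '%' symbol is an assumption added purely to trigger that case.

```python
trial_dict = {'*': 4, '-': 2, '+': 3, '/': 5}
trial_list = ['-', '-', '+', '/', '+', '-', '*']

# The key must be a callable: pass the bound method itself, do not call it
by_get = sorted(trial_list, key=trial_dict.get)

# Equivalent explicit lookup
by_lookup = sorted(trial_list, key=lambda x: trial_dict[x])

# '%' is not in trial_dict, so the explicit lookup fails loudly with KeyError
try:
    sorted(trial_list + ['%'], key=lambda x: trial_dict[x])
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print(by_get)
print(missing_key)
```

With trial_dict.get the missing symbol would map to None instead of raising a KeyError, which is the error-hiding behaviour the comment above warns about.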
Currently I'm trying to sort a list of files whose names are version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the sort method of the list in combination with a lambda expression. The lambda expression should first remove the .py extension and then split the string at the dots, then cast every number to an integer and sort by those.
I know how I would do this in C#, but I have no idea how to do it in Python. One problem is: how can I sort on multiple criteria? And how do I embed the lambda expression that does this?
Can anyone help me?
Thank you very much!
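You can use the key argument of the sorted function:
filenames = [
    '1.0.0.0.py',
    '0.0.0.0.py',
    '1.1.0.0.py'
]
# Build a list rather than a lazy map object so the keys are comparable on Python 3 too
print(sorted(filenames, key=lambda f: [int(part) for part in f.split('.')[:-1]]))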
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
Have your key function return a list of items. The sort is lexicographic in that case.
l = ['1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py']
s = sorted(l, key=lambda x: [int(y) for y in x.replace('.py', '').split('.')])
print(s)
# read the list in from memory and store it as variable file_list
# list(...) keeps the key comparable on Python 3, where map returns a lazy iterator
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
Our lambda function first takes the filename and splits it at the periods into a list. Then we take all of the elements of that list except the last one, which is the file extension, and apply the 'int' function to each of them. The 'sorted' function then orders the filenames by comparing these lists element by element, starting at the first, with ties broken by later elements.
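The difference between comparing the names as text and comparing them number by number only shows up once a component reaches two digits. A minimal sketch, with made-up file names and a hypothetical helper called version_key:

```python
# Made-up sample that includes a two-digit major version
filenames = ['10.0.0.0.py', '2.0.0.0.py', '1.1.0.0.py', '1.0.0.0.py']

def version_key(filename):
    # Hypothetical helper: drop the trailing 'py' part, convert the rest to ints
    return tuple(int(part) for part in filename.split('.')[:-1])

# Plain string sort compares character by character, so '10...' lands before '2...'
as_text = sorted(filenames)

# Tuples of ints compare element by element, which gives the numeric order
as_versions = sorted(filenames, key=version_key)

print(as_text)
print(as_versions)
```

Returning a tuple (or list) is also the general answer to sorting on multiple criteria: each position in the tuple is one criterion, in order of priority.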
I'm working on a project for CS1, and I am close to cracking it, but this part of the code has stumped me! The object of the project is to create a list of the top 20 names in any given year by referencing a file with thousands of names in it. Each line in each file contains the name, the gender, and how many times the name occurs. The file is separated by gender (female names in order of their occurrences, followed by male names in order of their occurrences). I have gotten the code to the point where each entry is an instance of a class, stored in a list (so this list is a long list of entry objects). Here is the code I have up to this point.
class entry():
    __slots__ = ('name' , 'sex' , 'occ')
def mkEntry( name, sex, occ ):
    dat = entry()
    dat.name = name
    dat.sex = sex
    dat.occ = occ
    return dat
##test = mkEntry('Mary', 'F', '7065')
##print(test.name, test.sex, test.occ)
def readFile(fileName):
    fullset = []
    for line in open(fileName):
        val = line.split(",")
        sett = mkEntry(val[0] , val[1] , int(val[2]))
        fullset.append(sett)
    return fullset
fullset = readFile("names/yob1880.txt")
print(fullset)
What I am wondering at this point is whether I can sort this list using sort() or another function, but by occurrences (dat.occ in each entry). The end result would be a list sorted independently of gender, and I could then print the first entries in the list, as they should be what I am seeking. Is it possible to sort the list like this?
Yes, you can sort lists of objects using sort(). sort() takes a function as an optional argument, key. The key function is applied to each element in the list, and the results are what get compared. For example, if you wanted to sort a list of integers by their absolute value, you could do the following:
>>> a = [-5, 4, 6, -2, 3, 1]
>>> a.sort(key=abs)
>>> a
[1, -2, 3, 4, -5, 6]
In your case, you need a custom key that will extract the number of occurrences for each object, e.g.
def get_occ(d): return d.occ
fullset.sort(key=get_occ)
(You could also do this using an anonymous function: fullset.sort(key=lambda d: d.occ).) Then you just need to extract the top 20 elements from this list.
Note that by default sort orders the elements in ascending order, so the most frequent names end up last. To put them first, reverse the order, e.g. fullset.sort(key=get_occ, reverse=True)
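Putting the two steps together (sort descending, then slice), a minimal sketch might look like this. The Entry class mirrors the question's entry class but adds a constructor for brevity, and every record except Mary's count is invented for illustration.

```python
class Entry:
    # Stand-in for the question's entry class, with a constructor added for brevity
    __slots__ = ('name', 'sex', 'occ')

    def __init__(self, name, sex, occ):
        self.name = name
        self.sex = sex
        self.occ = occ

# Illustrative data only: female names first, then male names, as in the file layout
fullset = [
    Entry('Mary', 'F', 7065),
    Entry('Anna', 'F', 2600),
    Entry('John', 'M', 9000),
    Entry('Will', 'M', 8000),
]

# sort() works in place and returns None; reverse=True puts the largest occ first
fullset.sort(key=lambda d: d.occ, reverse=True)

# Slice off the top N (3 here; the project would use 20)
top = [(d.name, d.occ) for d in fullset[:3]]
print(top)
```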
This sorts the list by using the occ property in descending order:
fullset.sort(key=lambda x: x.occ, reverse=True)
You mean you want to sort the list only by occ? sort() has a parameter named key, which you can use like this:
fullset.sort(key=lambda x: x.occ)
I think you just want to sort on the value of the 'occ' attribute of each object, right? You just need to use the key keyword argument to any of the various ordering functions that Python has available. For example
getocc = lambda entry: entry.occ
sorted(fullset, key=getocc)
# or, for in-place sorting
fullset.sort(key=getocc)
Or, since some may find it more Pythonic, use operator.attrgetter instead of a custom lambda:
import operator
getocc = operator.attrgetter('occ')
sorted(fullset, key=getocc)
But it sounds like the list is pretty big. If you only want the first few entries in the list, sorting may be an unnecessarily expensive operation. For example, if you only want the first value you can get that in O(N) time:
min(fullset, key=getocc) # Same getocc as above
If you want the first three, say, you can use a heap instead of sorting.
import heapq
heapq.nsmallest(3, fullset, key=getocc)
A heap is a useful data structure for getting a slice of ordered elements from a list without sorting the whole list. The above is equivalent to sorted(fullset, key=getocc)[:3], but faster if the list is large.
Hopefully it's obvious you can get the three largest with heapq.nlargest and the same arguments. Likewise you can reverse any of the sorts or replace min with max.
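For the top-N case the question actually asks about, heapq.nlargest is the relevant call. A minimal sketch, assuming a namedtuple as a stand-in for the question's entry class and invented counts apart from Mary's:

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# namedtuple used here only as a compact stand-in for the entry class
Entry = namedtuple('Entry', 'name sex occ')

# Illustrative data only
fullset = [
    Entry('Mary', 'F', 7065),
    Entry('Anna', 'F', 2600),
    Entry('John', 'M', 9000),
    Entry('Will', 'M', 8000),
]

getocc = attrgetter('occ')

# The two entries with the highest occ, without sorting the whole list
top_two = heapq.nlargest(2, fullset, key=getocc)

# Same result via a full sort, for comparison
top_two_sorted = sorted(fullset, key=getocc, reverse=True)[:2]

# max is the single-item equivalent
most_common = max(fullset, key=getocc)

print([e.name for e in top_two])
```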
name|num|num|num|num
name|num|num|num|num
name|num|num|num|num
How can I sort this list by a field of my choice (2, 3, 4 or 5)?
Sorry for my English.
Update
Input:
str|10|20
str|1|30
Sort by first field (1,10):
str|1|30
str|10|20
Sort by second field(20,30):
str|10|20
str|1|30
I would use the operator module function "itemgetter" instead of the lambda functions. That is faster and allows multiple levels of sorting.
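from operator import itemgetter
# `input` here is the whole text as a single string (it shadows the builtin of that name)
data = (line.split('|') for line in input.split('\n'))
sort_index = 1
# The fields are still strings at this point, so they compare lexicographically
sorted(data, key=itemgetter(sort_index))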
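The "multiple levels" part comes from passing several indexes to itemgetter, which then returns a tuple. A minimal sketch, with made-up rows and the numeric fields converted to int first so that they compare as numbers rather than as text:

```python
from operator import itemgetter

# Made-up input in the question's name|num|num layout
text = "b|2|30\na|10|20\nc|2|5"

rows = []
for line in text.split('\n'):
    name, *nums = line.split('|')
    # Convert the numeric fields so 10 sorts after 2
    rows.append([name] + [int(n) for n in nums])

# itemgetter(1, 2) yields (row[1], row[2]): sort by field 1, break ties with field 2
by_two_levels = sorted(rows, key=itemgetter(1, 2))

print(by_two_levels)
```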
You can sort on a specific key, which tells the sort function how to evaluate the entries to be sorted -- that is, how we decide which of two entries is bigger. In this case, we'll first split up each string by the pipe, using split (for example, "a|b|c".split("|") returns ["a", "b", "c"]) and then grab whichever entry you want.
To sort on the first "num" field:
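sorted(lines, key=lambda line: line.split("|")[1])
where lines is a list of the lines as you mention in the question. To sort on a different field, just change the number in brackets. Note that the fields are compared as strings here; wrap the expression in int(...) if you need numeric order.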
Assuming you start with a list of strings, start by splitting each row into a list:
data = [line.split('|') for line in input]
Then sort by whatever index you want:
sort_index = 1
sorted_data = sorted(data, key=lambda line: int(line[sort_index]))
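Applied to the sample from the question's update, the approach could be wrapped up as follows. The helper name sort_by_field is hypothetical, and it returns the original lines rather than the split lists, which is a choice made for this sketch.

```python
# Sample taken from the question's update
lines = ["str|10|20", "str|1|30"]

def sort_by_field(lines, index):
    # Hypothetical helper: order the raw lines by the integer value of one field
    return sorted(lines, key=lambda line: int(line.split('|')[index]))

by_first_num = sort_by_field(lines, 1)    # compares 10 and 1
by_second_num = sort_by_field(lines, 2)   # compares 20 and 30

print(by_first_num)
print(by_second_num)
```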
The Python sorting guide has a lot more information.