name|num|num|num|num
name|num|num|num|num
name|num|num|num|num
How can I sort this list by whichever field I need (2, 3, 4 or 5)?
Sorry for my English.
Update
Input:
str|10|20
str|1|30
Sort by first field (1,10):
str|1|30
str|10|20
Sort by second field(20,30):
str|10|20
str|1|30
I would use the operator module function "itemgetter" instead of the lambda functions. That is faster and allows multiple levels of sorting.
from operator import itemgetter
data = (line.split('|') for line in input.split('\n'))
sort_index = 1
sorted(data, key=itemgetter(sort_index))
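As a rough sketch of the multi-level sorting mentioned above: itemgetter accepts several indices and then returns a tuple, so rows are ordered by the first index with ties broken by the next. The sample text and the names raw and rows are made up for illustration, and the numeric fields are converted to int first on the assumption that a numeric order is wanted.

```python
from operator import itemgetter

# Hypothetical sample input in the name|num|num format from the question.
raw = "b|10|20\na|1|30\nc|10|5"

# Split each line on the pipe and convert the numeric fields to int.
rows = []
for line in raw.split("\n"):
    name, *nums = line.split("|")
    rows.append([name] + [int(n) for n in nums])

# Sort by field 1 only (sorted() is stable, so ties keep their input order).
by_first = sorted(rows, key=itemgetter(1))

# Sort by field 1, breaking ties with field 2.
by_first_then_second = sorted(rows, key=itemgetter(1, 2))

print(by_first)
print(by_first_then_second)
```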
You can sort on a specific key, which tells the sort function how to evaluate the entries to be sorted -- that is, how we decide which of two entries is bigger. In this case, we'll first split up each string by the pipe, using split (for example, "a|b|c".split("|") returns ["a", "b", "c"]) and then grab whichever entry you want.
To sort on the first "num" field:
sorted(lines, key=lambda line: line.split("|")[1])
where lines is a list of the lines as you mention in the question. To sort on a different field, just change the number in brackets.
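One thing to watch with a key like this is that split returns strings, so the fields are compared as text. The sketch below, using made-up sample lines, contrasts that with a key that converts the field to int; which of the two is wanted depends on the data.

```python
# Made-up sample lines in the question's format.
lines = ["str|9|20", "str|10|30", "str|1|25"]

# The key is a string here, so "10" sorts before "9".
as_text = sorted(lines, key=lambda line: line.split("|")[1])

# The key is an int here, so the order is numeric.
as_numbers = sorted(lines, key=lambda line: int(line.split("|")[1]))

print(as_text)
print(as_numbers)
```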
Assuming you start with a list of strings, start by splitting each row into a list:
data = [line.split('|') for line in input]
Then sort by whatever index you want:
sort_index = 1
sorted_data = sorted(data, key=lambda line: int(line[sort_index]))
The Python sorting guide has a lot more information.
Related
I need to write data to a file in the following format: year-month,val. It should be sorted by year-month.
for example:
2016-1,5
2016-7,1
2016-9,3
2016-11,4
2016-12,2
But, I am getting:
2016-1,5
2016-11,4
2016-12,2
2016-7,1
2016-9,3
the code is as follows:
for k,v in sorted(dictD.items()):
    drow = [k,v]
    writer.writerow(drow)
How to get the desired output?
Split the date at the hyphen and convert it to a tuple of numbers rather than strings.
for row in sorted(dictD.items(), key=lambda x: tuple(map(int, x[0].split('-')))):
    writer.writerow(row)
x is the (key, value) tuple returned by items(), so x[0] is the key, which is a date like '2016-1'. split turns this into the list ['2016', '1'], and map(int, ...) converts each piece to an integer; wrapping the result in tuple() gives (2016, 1). Using this as the sort key orders the rows numerically instead of lexicographically.
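Put together as a runnable sketch, the numeric key might look like this. The function name year_month_key is hypothetical, and io.StringIO stands in for the real output file, which the question does not show.

```python
import csv
import io

# Sample data from the question; dictD is the asker's name for the dict.
dictD = {"2016-1": 5, "2016-11": 4, "2016-12": 2, "2016-7": 1, "2016-9": 3}

def year_month_key(item):
    """Turn ('2016-7', 1) into (2016, 7) so months compare as numbers."""
    year, month = item[0].split("-")
    return int(year), int(month)

# io.StringIO is an assumption standing in for the real file object.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for row in sorted(dictD.items(), key=year_month_key):
    writer.writerow(row)

print(buffer.getvalue())
```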
This isn't a direct fix, but I couldn't make it any simpler: you could zero-pad the month in the keys, like this:
dictD = {'2016-1':5, '2017-7':1,'2016-9':3, '2016-11':4, '2016-12':2}
formatedKey = ['{}-{:02d}'.format(year, int(month))
               for year, month in (key.split('-') for key in dictD)]
dictD2 = dict(zip(formatedKey, dictD.values()))
for k,v in sorted(dictD2.items()):
    drow = [k,v]
    print(drow)
I didn't use the writer, but I hope this helps.
Assuming your dictionary is keyed by the YYYY-MM string and the value is the number after the comma, you can add a key argument to your sorted() call.
The key func could be:
lambda item: item[0][:5] + ('0' if len(item[0]) < 7 else '') + item[0][5:]
So your sorted call goes from:
sorted(dictD.items())
to:
sorted(dictD.items(), key=lambda item: <the rest from above>)
This leaves sorting by strings, but by adding the leading zero to the one-digit month, things come out as you want.
As a side note, you can pass a named function in as the key. You're not limited to using a lambda call.
When you pass things into sorted() without specifying a key function, a default sort order is used. Iterating over a dict yields its keys, so a dict is sorted by its keys (here, as strings), and tuples are sorted element by element, starting with the first. In your case, .items() produces a sequence of (key, value) tuples, so the tuples get sorted by the dict keys as strings, ignoring any numeric value they might represent. Once the one-digit months are padded with a leading zero, the dates sort properly as strings. The lambda does just that: it adds the extra '0' where necessary so that the sort produces the desired result.
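For the side note about named functions, a sketch along these lines may be easier to read than the lambda. The name padded_key is made up, and it uses str.zfill in place of the slicing shown earlier; it assumes every key has the form year-month.

```python
# Sample data from the question.
dictD = {"2016-1": 5, "2016-11": 4, "2016-12": 2, "2016-7": 1, "2016-9": 3}

def padded_key(item):
    """Return the dict key with its month zero-padded, e.g. '2016-7' -> '2016-07'."""
    year, month = item[0].split("-")
    return year + "-" + month.zfill(2)

# The original keys are kept; only the comparison uses the padded form.
ordered = sorted(dictD.items(), key=padded_key)
print(ordered)
```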
Currently I'm trying to sort a list of files whose names are version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the sort method of the list in combination with a lambda expression. The lambda expression should first remove the .py extension and then split the string on the dots, then cast every number to an integer and sort by them.
I know how I would do this in C#, but I have no idea how to do it in Python. One problem is: how can I sort by multiple criteria? And how do I embed the lambda expression that does this?
Can anyone help me?
Thank you very much!
You can use the key argument of sorted function:
filenames = [
    '1.0.0.0.py',
    '0.0.0.0.py',
    '1.1.0.0.py'
]
# The key returns a list of ints; a bare map object cannot be compared in Python 3.
print(sorted(filenames, key=lambda f: list(map(int, f.split('.')[:-1]))))
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
Have your key function return a list of items. The sort is lexicographic in that case.
l = [ '1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py',]
s = sorted(l, key = lambda x: [int(y) for y in x.replace('.py','').split('.')])
print(s)
# read list in from memory and store as variable file_list
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
Our lambda function first takes the filename and splits it on periods into a list. Then we take all of the elements of that list except the last one, which is the file extension, and apply int to each of them. sorted then orders the filenames by these lists of integers, comparing the first elements and breaking ties with later elements.
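The same idea as a named key function, as a sketch: version_key is a hypothetical name, os.path.splitext is used here to drop the extension, and the sample list adds a made-up 1.10.0.0.py to show that 10 sorts after 2 once the parts are integers.

```python
import os

def version_key(filename):
    """Map '1.10.0.0.py' to [1, 10, 0, 0]."""
    stem, _extension = os.path.splitext(filename)
    return [int(part) for part in stem.split(".")]

# Sample names; 1.10.0.0.py and 1.2.0.0.py are made up for illustration.
filenames = ["1.10.0.0.py", "1.2.0.0.py", "0.0.0.0.py", "1.1.0.0.py"]

# sorted() returns a new list; filenames.sort(key=version_key) would sort in place.
ordered = sorted(filenames, key=version_key)
print(ordered)
```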
I have a problem sorting a list. I'm trying to write a function that will sort a list of files based on their extension. For example, given:
["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
desired output is;
["a.c","x.c","a.py","b.py","bar.txt","foo.txt"]
I failed when I tried to make a key parameter; I can't work out the algorithm. I tried to split() every file name first, like this:
def sort_file(lst):
    second_list = []
    for x in lst:
        t = x.split(".")
        second_list.append(t[1])
    second_list.sort()
But I just don't know what to do now. How can I use this sorted second_list as a key parameter so that I can sort the files based on their extension?
I fail when I tried to make a key parameter
The key argument takes a function (or rather, any callable) that, given a list item as input, returns the object to compare against. In your case, x.split(".")[1] is the object to compare against. Take a look at Python's wiki entry on sorting in this fashion.
Something like the below should work for you.
>>> a = ["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
>>> sorted(a, key=lambda x: x.rsplit(".", 1)[1])
['a.c', 'x.c', 'a.py', 'b.py', 'bar.txt', 'foo.txt']
As @TanveerAlam says, using rsplit(..) is better because you'd want the split to be done from the right.
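To illustrate why splitting from the right matters, here is a small sketch. The file name archive.tar.gz is a made-up addition to the question's list, extension_key is a hypothetical name, and os.path.splitext is used as one possible alternative to rsplit. The key also includes the full name, on the assumption that files should be alphabetical within each extension.

```python
import os

# Splitting from the left picks the wrong piece when a name has several dots.
left = "archive.tar.gz".split(".")[1]
right = "archive.tar.gz".rsplit(".", 1)[1]

def extension_key(filename):
    """Sort by extension first, then by the full name within each extension."""
    return os.path.splitext(filename)[1], filename

# The question's list, shuffled, plus the made-up archive.tar.gz.
files = ["bar.txt", "x.c", "b.py", "archive.tar.gz", "a.py", "foo.txt", "a.c"]
by_extension = sorted(files, key=extension_key)

print(left, right)
print(by_extension)
```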
Working on a project for CS1, and I am close to cracking it, but this part of the code has stumped me! The object of the project is to create a list of the top 20 names in any given year by referencing a file with thousands of names in it. Each line in each file contains the name, gender, and how many times it occurs. The file is separated by gender (so female names in order of their occurrences, followed by male names in order of their occurrences). I have gotten the code to a point where each entry is contained within a class in a list (so this list is a long list of memory entries). Here is the code I have up to this point.
class entry():
    __slots__ = ('name' , 'sex' , 'occ')
def mkEntry( name, sex, occ ):
    dat = entry()
    dat.name = name
    dat.sex = sex
    dat.occ = occ
    return dat
##test = mkEntry('Mary', 'F', '7065')
##print(test.name, test.sex, test.occ)
def readFile(fileName):
    fullset = []
    for line in open(fileName):
        val = line.split(",")
        sett = mkEntry(val[0] , val[1] , int(val[2]))
        fullset.append(sett)
    return fullset
fullset = readFile("names/yob1880.txt")
print(fullset)
What I am wondering at this point is whether I can sort this list with sort() or another function, ordering it by occurrences (dat.occ in each entry). The result would be a list sorted independently of gender, and I could then print the first entries in the list, which should be the ones I am looking for. Is it possible to sort the list like this?
Yes, you can sort lists of objects using sort(). sort() takes a function as an optional argument key. The key function is applied to each element in the list before making the comparisons. For example, if you wanted to sort a list of integers by their absolute value, you could do the following:
>>> a = [-5, 4, 6, -2, 3, 1]
>>> a.sort(key=abs)
>>> a
[1, -2, 3, 4, -5, 6]
In your case, you need a custom key that will extract the number of occurrences for each object, e.g.
def get_occ(d): return d.occ
fullset.sort(key=get_occ)
(you could also do this using an anonymous function: fullset.sort(key=lambda d: d.occ)). Then you just need to extract the top 20 elements from this list.
Note that sort() works in place and puts the elements in ascending order by default; pass reverse=True to get descending order, e.g. fullset.sort(key=get_occ, reverse=True)
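Extracting the top entries after sorting could look roughly like the sketch below. The Entry class is a stand-in for the question's entry class with a constructor added, the records are made up, and TOP_N is set to 2 only to keep the sample short (the assignment asks for 20).

```python
class Entry:
    """Stand-in for the question's entry class, with a constructor added."""
    __slots__ = ("name", "sex", "occ")

    def __init__(self, name, sex, occ):
        self.name = name
        self.sex = sex
        self.occ = occ

# Made-up sample records; the real list would come from readFile().
fullset = [
    Entry("Ann", "F", 120),
    Entry("Bob", "M", 340),
    Entry("Cat", "F", 95),
    Entry("Dan", "M", 210),
]

TOP_N = 2  # hypothetical; use 20 for the actual assignment

# sorted() returns a new list, so fullset itself keeps its file order.
top = sorted(fullset, key=lambda d: d.occ, reverse=True)[:TOP_N]

for d in top:
    print(d.name, d.sex, d.occ)
```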
This sorts the list by using the occ property in descending order:
fullset.sort(key=lambda x: x.occ, reverse=True)
You mean you want to sort the list only by occ? sort() has a parameter named key, which you can use like this:
fullset.sort(key=lambda x: x.occ)
I think you just want to sort on the value of the 'occ' attribute of each object, right? You just need to use the key keyword argument to any of the various ordering functions that Python has available. For example
getocc = lambda entry: entry.occ
sorted(fullset, key=getocc)
# or, for in-place sorting
fullset.sort(key=getocc)
or perhaps some may think it's more pythonic to use operator.attrgetter instead of a custom lambda:
import operator
getocc = operator.attrgetter('occ')
sorted(fullset, key=getocc)
But it sounds like the list is pretty big. If you only want the first few entries in the list, sorting may be an unnecessarily expensive operation. For example, if you only want the first value you can get that in O(N) time:
min(fullset, key=getocc) # Same getocc as above
If you want the first three, say, you can use a heap instead of sorting.
import heapq
heapq.nsmallest(3, fullset, key=getocc)
A heap is a useful data structure for getting a slice of ordered elements from a list without sorting the whole list. The above is equivalent to sorted(fullset, key=getocc)[:3], but faster if the list is large.
Hopefully it's obvious you can get the three largest with heapq.nlargest and the same arguments. Likewise you can reverse any of the sorts or replace min with max.
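A small sketch of the largest-first variants, under the assumption that the most frequent names are wanted. The namedtuple Entry and its records are made up and stand in for the question's class; heapq.nlargest, max and operator.attrgetter are standard library functions.

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# Made-up stand-in for the question's entry objects.
Entry = namedtuple("Entry", ["name", "sex", "occ"])
fullset = [
    Entry("Ann", "F", 120),
    Entry("Bob", "M", 340),
    Entry("Cat", "F", 95),
    Entry("Dan", "M", 210),
    Entry("Eve", "F", 180),
]

getocc = attrgetter("occ")

# The three most frequent entries, without sorting the whole list.
top_three = heapq.nlargest(3, fullset, key=getocc)

# The same result via a full sort, for comparison.
same_result = sorted(fullset, key=getocc, reverse=True)[:3]

# The single most frequent entry.
most_common = max(fullset, key=getocc)

print([e.name for e in top_three], most_common.name)
```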
I am trying to sort a dictionary of tuples where the second item contains the dates to be sorted.
The dictionary looks something like this:
time_founded: {Soonr: 2005-5-1, SpePharm: 2006-9-1, and so on...}
Right now I am trying to sort the dates like this:
dict = sortedLists[category]
sortedtime = sorted(dict.iteritems(), key=lambda d: map(int, d.split('-')))
But I am getting an error because it is trying to sort the tuples (Soonr: 2005-5-1) instead of just the date.
How can I update the sorting parameters to tell it to only look at the date and not the whole tuple?
Try this:
sortedtime = sorted(dict.iteritems(), key=lambda d: map(int, d[1].split('-')))
The only difference is the [1] which selects out the value portion of the item.
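For readers on Python 3, where iteritems() no longer exists and a map object cannot be compared, a sketch of the same idea might look like this. It assumes the values are strings of the form 'YYYY-M-D'; the third company and the name founded_key are made up for illustration.

```python
# Assumed shape of the data: company name -> date string.
time_founded = {
    "Soonr": "2005-5-1",
    "SpePharm": "2006-9-1",
    "Example Co": "2005-12-1",  # hypothetical entry
}

def founded_key(item):
    """Turn ('Soonr', '2005-5-1') into (2005, 5, 1)."""
    _name, date_text = item
    return tuple(int(part) for part in date_text.split("-"))

# items() replaces iteritems() in Python 3.
sortedtime = sorted(time_founded.items(), key=founded_key)
print(sortedtime)
```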
Not sure if you have control over the data structure, but if you do, consider making the date you need to sort by the first element of each item. Python compares tuples element by element, starting with the first, or it can sort by a key you define.
I'd recommend you make it a list of tuples, each holding the date and a dictionary, to simplify things:
>>> dates = [('2006-9-1', {'Soonr': '2005-5-1'}),
...          ('2006-8-9', {'Soonr': '2005-8-28'})]
>>> dates.sort()  # sorts by the first element of each tuple, compared as a string
>>> dates
[('2006-8-9', {'Soonr': '2005-8-28'}), ('2006-9-1', {'Soonr': '2005-5-1'})]
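One caveat with this layout is that the dates are still compared as strings, so an unpadded two-digit month sorts before a one-digit one. A possible workaround, sketched below, is to parse the date in the key with datetime.strptime. The '2006-10-1' entry is made up to show the difference.

```python
from datetime import datetime

dates = [
    ("2006-10-1", {"Soonr": "2005-5-1"}),  # hypothetical entry
    ("2006-9-1", {"Soonr": "2005-5-1"}),
    ("2006-8-9", {"Soonr": "2005-8-28"}),
]

# Default tuple comparison: the date strings are compared as text.
as_text = sorted(dates)

# Parsing the first element makes the comparison chronological.
as_dates = sorted(dates, key=lambda pair: datetime.strptime(pair[0], "%Y-%m-%d"))

print([pair[0] for pair in as_text])
print([pair[0] for pair in as_dates])
```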