I'm working on a project for CS1 and am close to cracking it, but this part of the code has stumped me. The goal of the project is to list the top 20 names in any given year by reading a file that contains thousands of names. Each line in each file contains the name, the gender, and how many times the name occurs. The file is separated by gender (female names in order of their occurrences, followed by male names in order of their occurrences). I have gotten the code to the point where each entry is stored as an instance of a class inside a list (so this list is a long list of memory entries). Here is the code I have so far.
class entry():
    __slots__ = ('name', 'sex', 'occ')
def mkEntry(name, sex, occ):
    dat = entry()
    dat.name = name
    dat.sex = sex
    dat.occ = occ
    return dat
##test = mkEntry('Mary', 'F', '7065')
##print(test.name, test.sex, test.occ)
def readFile(fileName):
    fullset = []
    for line in open(fileName):
        val = line.split(",")
        sett = mkEntry(val[0], val[1], int(val[2]))
        fullset.append(sett)
    return fullset
fullset = readFile("names/yob1880.txt")
print(fullset)
What I am wondering is whether I can sort this list with sort() or another function, using the occurrences (dat.occ in each entry) as the sort criterion. The result would be a list sorted independently of gender, and I could then print the first entries in the list, which should be the ones I am looking for. Is it possible to sort the list like this?
Yes, you can sort lists of objects using sort(). sort() takes a function as an optional argument key. The key function is applied to each element in the list before making the comparisons. For example, if you wanted to sort a list of integers by their absolute value, you could do the following
>>> a = [-5, 4, 6, -2, 3, 1]
>>> a.sort(key=abs)
>>> a
[1, -2, 3, 4, -5, 6]
In your case, you need a custom key that will extract the number of occurrences for each object, e.g.
def get_occ(d): return d.occ
fullset.sort(key=get_occ)
(you could also do this using an anonymous function: fullset.sort(key=lambda d: d.occ)). Then you just need to extract the top 20 elements from this list.
Note that sort() works in place (it returns None) and orders elements in ascending order by default. To get descending order, pass reverse=True, e.g. fullset.sort(key=get_occ, reverse=True)
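Putting these pieces together, the top-N step could be sketched roughly as follows. The Entry class, the top_n helper, and the sample rows are made up for illustration and are not the asker's real code or data.

```python
# Hypothetical stand-in for the asker's entry class.
class Entry:
    __slots__ = ('name', 'sex', 'occ')

    def __init__(self, name, sex, occ):
        self.name = name
        self.sex = sex
        self.occ = occ


def top_n(entries, n):
    """Return the n entries with the highest occ, largest first."""
    # sorted() builds a new list and leaves the input list untouched.
    return sorted(entries, key=lambda e: e.occ, reverse=True)[:n]


# Made-up sample rows: female names first, then male names.
sample = [
    Entry('Mary', 'F', 7065),
    Entry('Anna', 'F', 2604),
    Entry('John', 'M', 9655),
    Entry('William', 'M', 9532),
]

top_two = top_n(sample, 2)
print([(e.name, e.occ) for e in top_two])
```

For the project itself, the same call with n=20 on the full list would give the top 20 regardless of gender.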
This sorts the list by using the occ property in descending order:
fullset.sort(key=lambda x: x.occ, reverse=True)
Do you mean you want to sort the list only by occ? sort() has a parameter named key, which you can use like this:
fullset.sort(key=lambda x: x.occ)
I think you just want to sort on the value of the 'occ' attribute of each object, right? You just need to use the key keyword argument to any of the various ordering functions that Python has available. For example
getocc = lambda entry: entry.occ
sorted(fullset, key=getocc)
# or, for in-place sorting
fullset.sort(key=getocc)
or perhaps some may think it's more pythonic to use operator.attrgetter instead of a custom lambda:
import operator
getocc = operator.attrgetter('occ')
sorted(fullset, key=getocc)
But it sounds like the list is pretty big. If you only want the first few entries in the list, sorting may be an unnecessarily expensive operation. For example, if you only want the first value you can get that in O(N) time:
min(fullset, key=getocc) # Same getocc as above
If you want the first three, say, you can use a heap instead of sorting.
import heapq
heapq.nsmallest(3, fullset, key=getocc)
A heap is a useful data structure for getting a slice of ordered elements from a list without sorting the whole list. The above is equivalent to sorted(fullset, key=getocc)[:3], but faster if the list is large.
Hopefully it's obvious you can get the three largest with heapq.nlargest and the same arguments. Likewise you can reverse any of the sorts or replace min with max.
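As a rough illustration of the heapq.nlargest variant, here is a small self-contained sketch. The Entry namedtuple and the sample counts are assumptions made up for the example; the asker's objects would work the same way as long as they expose an occ attribute.

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type standing in for the asker's entry objects.
Entry = namedtuple('Entry', ['name', 'sex', 'occ'])

entries = [
    Entry('Mary', 'F', 7065),
    Entry('Anna', 'F', 2604),
    Entry('Emma', 'F', 2003),
    Entry('John', 'M', 9655),
    Entry('William', 'M', 9532),
]

get_occ = attrgetter('occ')

# The three most frequent names, without sorting the whole list.
top_three = heapq.nlargest(3, entries, key=get_occ)

# The full-sort version, shown only for comparison.
by_sorting = sorted(entries, key=get_occ, reverse=True)[:3]

print([e.name for e in top_three])
```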
I want to loop through my set in order. I know sets are unordered, so I have to sort the set into a list, and I have done that numerically. Now I want to sort it in the order in which the numbers were first added to the set. Is this possible after the set has been created, or will I need to store the numbers in something other than a set? The order needs to match the order in which the numbers were first added.
The easiest way, short of augmenting the inserted elements with some sort of ordering, is probably to use the keys of an OrderedDict to store and retrieve the elements of your set, with dummy values mapped to these keys.
from collections import OrderedDict
seq = [6, 7, 4, 3, 2, 1, 5, 0]
my_set = OrderedDict()
for elt in seq:
    my_set[elt] = True
You can now iterate over or retrieve the keys in the order you inserted them. You get the same properties as a set, i.e. uniqueness, fast insertion, retrieval, and membership tests; what you don't get are set-specific operations such as symmetric difference.
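On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. The sample sequence below is made up and deliberately contains repeats.

```python
# Sample input with repeated values (made up for illustration).
seq = [6, 7, 4, 3, 6, 2, 7, 1]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
ordered_unique = dict.fromkeys(seq)

# Iterating over the dict yields the keys in first-seen order.
first_seen_order = list(ordered_unique)
print(first_seen_order)

# Membership tests are still fast, as with a set.
print(4 in ordered_unique)
```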
No, a set is not ordered. There is no way of finding out in which order the elements were appended to the set.
However, you could use a list for that. Every time you add an element to the set append it to the list as well and then you know the order.
But beware: a set contains each element only once, whereas a list can contain the same element multiple times. I am not sure how this affects your use of this feature.
You could work around it by only appending to the list if the element is not yet in the list.
s = set()
l = []
elem = 1
if elem not in l:
    l.append(elem)
s.add(elem)
print(s)
print(l)
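One possible refinement, sketched under the assumption that the list can grow large: test membership against the set rather than the list, because a set lookup does not scan every element. The helper name add_in_order is made up for this example.

```python
def add_in_order(elem, seen, order):
    """Record elem once, remembering the order of first insertion."""
    # Membership is checked against the set, which avoids scanning the list.
    if elem not in seen:
        seen.add(elem)
        order.append(elem)


seen = set()   # fast membership tests
order = []     # insertion order of the unique elements

for value in [3, 1, 3, 2, 1]:
    add_in_order(value, seen, order)

print(order)
print(seen)
```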
Currently I'm trying to sort a list of files which were made of version numbers. For example:
0.0.0.0.py
1.0.0.0.py
1.1.0.0.py
They are all stored in a list. My idea was to use the sort method of the list in combination with a lambda expression. The lambda expression should first remove the .py extension and then split the string at the dots, then cast every number to an integer and sort by those integers.
I know how I would do this in C#, but I have no idea how to do it in Python. One problem is: how can I sort by multiple criteria? And how do I embed the lambda expression that does this?
Can anyone help me?
Thank you very much!
You can use the key argument of sorted function:
filenames = [
'1.0.0.0.py',
'0.0.0.0.py',
'1.1.0.0.py'
]
print(sorted(filenames, key=lambda f: [int(part) for part in f.split('.')[:-1]]))
Result:
['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
Have your key function return a list of items. The sort is lexicographic in that case.
l = [ '1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py',]
s = sorted(l, key = lambda x: [int(y) for y in x.replace('.py','').split('.')])
print(s)
# read list in from memory and store as variable file_list
sorted(file_list, key=lambda x: list(map(int, x.split('.')[:-1])))
In case you're wondering what is going on here:
Our lambda function first takes our filename, splits it into an array delimited by periods. Then we take all of the elements of the list, minus the last element, which is our file extension. Then we apply the 'int' function to every element of the list. The returned list is then sorted by the 'sorted' function according to the elements of the list, starting at the first with ties broken by later elements in the list.
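The same key can be written as a named function, which may be easier to read and test than a one-line lambda. This is only a sketch: version_key is a made-up name, and it assumes every filename consists of dot-separated integers followed by a single extension.

```python
def version_key(filename):
    """Turn '1.10.0.0.py' into (1, 10, 0, 0) for comparison."""
    # Drop the extension by splitting once from the right.
    stem = filename.rsplit('.', 1)[0]
    # Tuples compare element by element, so later parts break ties.
    return tuple(int(part) for part in stem.split('.'))


filenames = ['1.10.0.0.py', '1.2.0.0.py', '0.0.0.0.py', '1.1.0.0.py']

ordered = sorted(filenames, key=version_key)
print(ordered)
```

Comparing integers rather than strings matters once a component reaches two digits: as plain text, '1.10.0.0.py' sorts before '1.2.0.0.py'.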
I have a problem sorting a list. My goal is to write a function that sorts a list of files based on their extension. For example, given:
["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
the desired output is:
["a.c","x.c","a.py","b.py","bar.txt","foo.txt"]
I failed when I tried to write a key parameter; I can't work out the algorithm. I tried to split() every file name first, like this:
def sort_file(lst):
    second_list = []
    for x in lst:
        t = x.split(".")
        second_list.append(t[1])
    second_list.sort()
But I don't know what to do next. How can I use this sorted second_list as a key parameter so that I can sort the files based on their extension?
I fail when I tried to make a key parameter
The key argument takes a function (or rather, any callable) that returns the object to compare against when given a list item as input. In your case, x.split(".")[1] is the object to compare against. Take a look at Python's wiki entry on sorting in this fashion.
Something like the below should work for you.
>>> a = ["a.c","a.py","b.py","bar.txt","foo.txt","x.c"]
>>> sorted(a, key=lambda x: x.rsplit(".", 1)[1])
['a.c', 'x.c', 'a.py', 'b.py', 'bar.txt', 'foo.txt']
As @TanveerAlam says, using rsplit(..) is better because you want the split to be done from the right, so that a name with several dots, such as archive.tar.gz, is still keyed on its last extension.
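An alternative sketch uses os.path.splitext from the standard library, which splits off the last extension for you. The function name extension_key is made up, and using the rest of the name as a tie-breaker is an assumption about the desired order within each extension.

```python
import os.path


def extension_key(filename):
    """Sort by extension first, then by the rest of the name."""
    # splitext('bar.txt') returns ('bar', '.txt').
    root, ext = os.path.splitext(filename)
    return (ext, root)


files = ["foo.txt", "b.py", "x.c", "a.py", "bar.txt", "a.c"]

ordered = sorted(files, key=extension_key)
print(ordered)
```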
def mkEntry(file1):
    for line in file1:
        lst = (line.rstrip().split(","))
        print("Old", lst)
        print(type(lst))
        tuple(lst)
        print(type(lst)) #still showing type='list'
        sorted(lst, key=operator.itemgetter(1, 2))
def main():
    openFile = 'yob' + input("Enter the year <Do NOT include 'yob' or .'txt' : ") + '.txt'
    file1 = open(openFile)
    mkEntry(file1)
main()
TextFile:
Emma,F,20791
Tom,M,1658
Anthony,M,985
Lisa,F,88976
Ben,M,6989
Shelly,F,8975
and I get this output:
IndexError: string index out of range
I am trying to convert lst from a list to a tuple so that I will be able to order the rows from F to M and from the smallest number to the largest. Around line 7, it's still printing type list instead of type tuple. I don't know why it's doing that.
print(type(lst))
tuple(lst)
print(type(lst)) #still showing type='list'
You're not changing what lst refers to. You create a new tuple with tuple(lst) and immediately throw it away because you don't assign it to anything. You can do:
lst = tuple(lst)
Note that this will not fix your program. Notice that your sort operation is happening once per line of your file, which is not what you want. Try collecting each line into one sequence of tuples and then doing the sort.
Firstly, you are not saving the tuple you created anywhere:
tup = tuple(lst)
Secondly, there is no point in making it a tuple before sorting it - in fact, a list could be sorted in place as it's mutable, while a tuple would need another copy (although that's fairly cheap, the items it contains aren't copied).
Thirdly, the IndexError has nothing to do with whether it's a list or tuple, nor whether it is sorted. It most likely comes from the itemgetter, because there's a list item that doesn't have three entries in turn - for instance, the strings "F" or "M".
Fourthly, the sort you're doing, but not saving anywhere, is done on each individual line, not the table of data. Considering this means you're comparing a name, a number, and a gender, I rather doubt it's what you intended.
It's completely unclear why you're trying to convert data types, and the code doesn't match the structure of the data. How about moving back to the overview plan and sorting out what you want done? It could well be something like Python's csv module could help considerably.
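As one possible reading of what the asker wants (all rows ordered by gender, then by count), here is a sketch that collects every line first and sorts once at the end. It uses the csv module and reads the sample data from an in-memory string so that it runs without a file; read_rows is a made-up helper name.

```python
import csv
import io
from operator import itemgetter

# The sample data from the question, held in memory instead of a file.
text = """Emma,F,20791
Tom,M,1658
Anthony,M,985
Lisa,F,88976
Ben,M,6989
Shelly,F,8975
"""


def read_rows(lines):
    """Parse 'name,sex,count' lines into (name, sex, int_count) tuples."""
    rows = []
    for name, sex, count in csv.reader(lines):
        # Convert the count so it sorts numerically, not as text.
        rows.append((name, sex, int(count)))
    return rows


rows = read_rows(io.StringIO(text))

# Sort the whole table once: by sex (index 1), then by count (index 2).
rows.sort(key=itemgetter(1, 2))

for row in rows:
    print(row)
```

With a real file, the StringIO object would be replaced by a file opened with open(path, newline='').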
Apologies for my title not being the best. Here is what I am trying to accomplish:
I have a list:
list1 = [a0_something, a2_something, a1_something, a4_something, a3_something]
I have another list whose entries are tuples that include a name, such as:
list2 = [(x1,y1,z1,'bob'),(x2,y2,z2,'alex')...]
The name in the 0th entry of the second list corresponds to a0_something, and the name in the 1st entry of the second list corresponds to a1_something. Basically, the second list is in the right order but the first list isn't.
The program I am working with has a setName function. I would like to do this:
a0_something.setName(list2[0][4])
and so on with a loop.
So that I can really just say
for i in range(len(list1)):
    a(i)_something.setName(list2[i][4])
Is there any way I can refer to the number in a#_something so that I can iterate with a loop?
No.
Variable names have no meaning in run-time. (Unless you're doing introspection, which I guarantee you is something you should not be doing.)
Use a proper list such that:
lst = [a0_val, a1_val, a2_val, a3_val, a4_val]
and then address it by lst[0].
Alternatively, if those names have meanings, use a dict where:
dct = {
    'a0' : a0_val,
    'a1' : a1_val,
    # ...
}
and use it with dct['a0'].
The enumerate function lets you get the value and the index of the current item. So, for your example, you could do:
for i, asomething in enumerate(list1):
    asomething.setName(list2[i][3])
Since each tuple in list2 has length 4, its final element is at index 3 (you could also use -1).
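A variant with zip avoids the index altogether. This is a sketch only: the Thing class is a made-up stand-in for whatever objects the asker's program provides, the coordinates are invented, and it assumes list1 has already been put in the same order as list2.

```python
# Hypothetical stand-in for the objects that have a setName method.
class Thing:
    def __init__(self):
        self.name = None

    def setName(self, name):
        self.name = name


# Assumed to be in matching order already.
list1 = [Thing(), Thing()]
list2 = [(1.0, 2.0, 3.0, 'bob'), (4.0, 5.0, 6.0, 'alex')]

# zip pairs the i-th object with the i-th tuple.
for thing, record in zip(list1, list2):
    # The name is the last element of each tuple.
    thing.setName(record[-1])

print([thing.name for thing in list1])
```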