I am trying to make my program sort the recorded scores that are in a CSV file. My plan is to read the CSV file into a list, bubble-sort the list, then overwrite the CSV file with the new list. However, I have encountered a logic error: when I sort the list, the result is [[], ['190'], ['200'], ['250'], ['350'], ['90']].
If anyone could help it would be much appreciated. Here is my code for my read and my bubble sort.
import csv

def bubbleSort(scores):
    for length in range(len(scores) - 1, 0, -1):
        for i in range(length):
            if scores[i] > scores[i + 1]:
                temp = scores[i]
                scores[i] = scores[i + 1]
                scores[i + 1] = temp
with open("rec_Scores.csv", newline="") as csvfile:
    r = csv.reader(csvfile)
    scores = list(r)

bubbleSort(scores)
print(scores)
This is my first time implementing a sort in python so any help would be great, thanks.
You are comparing strings instead of integers. Use int(scores[i]) to convert the string to an integer.
Upon further inspection, it looks like you are storing your numbers in a list of lists. In that case, to access the first number we must use scores[0][0], the second number would be scores[1][0], and so on. The first index increases by increments of one, so we can use int(scores[i][0]).
The second index stays at 0 because you are only storing a single value in each inner list.
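Putting both points together, the fix could look like this (a sketch that assumes each row holds a single score and that empty rows have been filtered out):

```python
def bubbleSort(scores):
    for length in range(len(scores) - 1, 0, -1):
        for i in range(length):
            # compare the first field of each row as an integer, not a string
            if int(scores[i][0]) > int(scores[i + 1][0]):
                scores[i], scores[i + 1] = scores[i + 1], scores[i]

rows = [['250'], ['90'], ['350'], ['190'], ['200']]
bubbleSort(rows)
print(rows)  # [['90'], ['190'], ['200'], ['250'], ['350']]
```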
It appears you are using strings in your scores list. If you want this sort to work correctly you need to convert your values to integers:
int(str_num)
Where str_num is your string value.
This sort should work just fine after you do this conversion.
Also, you can use the built-in timsort to sort your numbers by calling
scores.sort()
Then you don't have to worry about implementing your own algorithm.
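For a list of single-element rows like yours, the built-in sort needs a key to compare numerically; here's a sketch (filtering out the empty row first is an assumption about a blank line in your file):

```python
scores = [[], ['190'], ['200'], ['250'], ['350'], ['90']]
scores = [row for row in scores if row]     # drop empty rows
scores.sort(key=lambda row: int(row[0]))    # compare as integers, not strings
print(scores)  # [['90'], ['190'], ['200'], ['250'], ['350']]
```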
You could try this:
import csv

with open("rec_Scores.csv", newline="") as csvfile:
    r = csv.reader(csvfile)
    scores = [int(row[0]) for row in r if row]  # skip empty rows
print(sorted(scores))
scores = list(r)
intscores = [int(row[0]) for row in scores if row]
intscores.sort()
This should do it.
How to display the top 2 rows of highest difference from a text file in python
For example here is a text file:
Mazda 64333 53333
Merce 74321 54322
BMW 52211 31432
The expected output would be
Merce 74321 54322
BMW 52211 31432
I tried multiple codes but only managed to display the actual difference and not the whole row.
Would this work for you?
from operator import itemgetter

with open("x.txt", "r+") as data:
    data = [i.split() for i in data.readlines()]

top = sorted([[row[0], int(row[1]) - int(row[2])] for row in data],
             key=itemgetter(1), reverse=True)
print(top)
print(top[:2])
[['BMW', 20779], ['Merce', 19999], ['Mazda', 11000]]
[['BMW', 20779], ['Merce', 19999]]
So, at a glance, this might seem slightly complicated, but it's really not!
Let's break down each step of the following program:
from operator import itemgetter

with open("x.txt", "r+") as data:
    data = [i.split() for i in data.readlines()]

top = sorted([[row[0], int(row[1]) - int(row[2])] for row in data],
             key=itemgetter(1), reverse=True)
Now, let's first note that operator is a built-in module, not an external import like requests, and itemgetter is a pretty straightforward function.
with open("x.txt", "r+") as data should be pretty straightforward as well: all it does is open the text file with reading permissions and store that file object in data.
We then use our first list comprehension, which might look new to you:
data = [i.split() for i in data.readlines()]
All this does is go through each line, for example car 123 122, and split it by spaces into a list like ["car", "123", "122"].
Now if you look closely at the product of that, there's something wrong. The last 2 elements (which need to be integers to find the difference) are strings! That's why we use the next list comprehension to change that.
top = sorted([[row[0], int(row[1])-int(row[2])]for row in data],key=itemgetter(1), reverse=True)
This is a bit more complicated... but all it's really doing is sorting a simple list comprehension.
It goes through each value in data and gets the differences! Let's see how it does that.
As you know, our data looks something like [["car", "123", "122"], ["car1", "1234", "1223"]] right now. So we can access the integer values of ["car", "123", "122"] with [1] and [2]; with this knowledge we can loop through the data and get the difference of those values when they are cast to integers. E.g. int(row[1])-int(row[2]) for ["car", "123", "122"] would return 1 (the difference).
With this knowledge, we can build a new list whose entries contain the car's name row[0] and the difference int(row[1])-int(row[2]), written as [row[0], int(row[1])-int(row[2])] in the comprehension. Using row as the loop variable over data, we can easily form this. Here's that list comprehension by itself:
[[row[0], int(row[1])-int(row[2])] for row in data]
Finally, we have arrived at the last piece of this little program: the sorted() function! sorted() will return a sorted list based on the key you give it (and you can use reverse=True to put the greatest values first). It's really not that hard to understand when it's abbreviated as follows:
sorted([the list comprehension],key=itemgetter(1), reverse=True)
So while you might know that, yes, it's sorting that list comprehension we made and listing the biggest values first, you might not know how it's doing the sorting! To know that, we need to look at the key.
itemgetter() is a function from the operator module. All you really need to know about it is that it fetches the element at index 1 of each inner list, and the sort uses that as its key. Recall that each element of our data looks like ["car", difference] (difference is a placeholder for the actual integer difference). Since we want the greatest differences, it makes sense to sort by them, right?
Using itemgetter(1), it will sort by index 1: the difference! And that pretty much sums it up :)
We store all of that in the variable top and then print the first two elements with print(top[:2]).
I hope this helped!
Create a dict that contains the differences for each row, with the car brand as the key.
Then you can sort the dict.items() by value and return the top 2.
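A sketch of that approach, assuming the same whitespace-separated format as in the question (the lines list stands in for the file):

```python
lines = ["Mazda 64333 53333", "Merce 74321 54322", "BMW 52211 31432"]

diffs = {}
for line in lines:
    brand, a, b = line.split()
    diffs[brand] = int(a) - int(b)  # difference keyed by brand

# sort the items by value, largest difference first, and keep the top 2
top2 = sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top2)  # [('BMW', 20779), ('Merce', 19999)]
```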
I am very new to Python and this problem hopefully has a simple solution that I have not understood yet. In this problem I am not allowed to use numpy or pandas.
The situation is that I have imported a list of lists from a csv-file with the code below:
import csv

my_list = []
with open("my_file.csv", "r") as file:
    csv_reader = csv.reader(file, delimiter=";")
    for rad in csv_reader:
        my_list.append(rad)
This results in a list of lists, where each element is a string. What I would like to do is to convert a certain set of the elements in each list to a double to do calculations with. My guess is that I need to loop over each list and float() each single element one by one, since float() does not work on lists. However, I cannot come up with a solution that works.
I have seen the solution
[float(i) for i in my_list]
for single lists, but do not know how to apply this to a list of lists. Especially since I do not want to convert every item in each list.
Very grateful for any help.
You can do it with a nested list comprehension:
[[float(i) if i.isnumeric() else i for i in inner_list] for inner_list in my_list]
Note that str.isnumeric() is only True for plain digit strings, so values containing a decimal point or a minus sign will be left as strings.
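If some of the fields do contain a decimal point or a minus sign, a small try/except helper is more robust; a sketch:

```python
def to_float(s):
    # fall back to the original string when it isn't a number
    try:
        return float(s)
    except ValueError:
        return s

my_list = [["Anna", "3.14", "-2"], ["Bob", "10", "x"]]
converted = [[to_float(i) for i in row] for row in my_list]
print(converted)  # [['Anna', 3.14, -2.0], ['Bob', 10.0, 'x']]
```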
I would like to filter records from a big file (a list of lists, 10M+ lines) based on given ids.
selected_id = list()  # 70k+ elements
for line in in_fp:  # input file: 10M+ lines
    id = line.split()[0]  # id (str type), such as '10000872820081804'
    if id in selected_id:
        out_fp.write(line)
The above code is time consuming. An idea comes to mind: store selected_id as a dict instead of a list.
Any better solutions?
You've got a few issues, though only the first is really nasty:
(By far the biggest cost in all likelihood) Checking for membership in a list is O(n); for a 70K element list, that's a lot of work. Make it a set/frozenset and lookup is typically O(1), saving thousands of comparisons. If the types are unhashable, you can pre-sort selected_id and use the bisect module to do lookups in O(log n) time, which would still be a multiple-order-of-magnitude speedup for such a large list.
If your lines are large, with several runs of whitespace, splitting at all points wastes time; you can specify maxsplit to only split enough to get the ID
If the IDs are always integer values it may be worth the time to make selected_id store int instead of str and convert on read so the lookup comparisons run a little faster (this would require testing). This probably won't make a major difference, so I'll omit it from the example.
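The bisect fallback mentioned in the first point could look like this (a sketch; the IDs here are made up):

```python
import bisect

selected_id = sorted(['10000872820081804', '10000872820081999', '20000872820081804'])

def contains(sorted_ids, x):
    # bisect_left finds the insertion point in O(log n); check it's an exact match
    i = bisect.bisect_left(sorted_ids, x)
    return i < len(sorted_ids) and sorted_ids[i] == x

print(contains(selected_id, '10000872820081999'))  # True
print(contains(selected_id, '999'))                # False
```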
Combining all suggestions:
selected_id = frozenset(... Your original list of 70k+ str elements ...)
for line in in_fp:  # input file: 10M+ lines
    id, _ = line.split(None, 1)  # id (str type), such as '10000872820081804'
    if id in selected_id:
        out_fp.write(line)
You could even convert the for loop to a single call with a generator expression (though it gets a little overly compact) which pushes more work to the C layer in CPython, reducing Python byte code execution overhead:
out_fp.writelines(x for x in in_fp if x.split(None, 1)[0] in selected_id)
First off, in order to get the first column from your lines, you can read your file using the csv module with the proper delimiter, then use the zip() function (itertools.izip() in Python 2) together with next() to get the first column, and pass the result to set() to keep only the unique values.
import csv

with open('file_name') as f:
    spam_reader = csv.reader(f, delimiter=' ')
    unique_ids = set(next(zip(*spam_reader)))
If you want to preserve the order you can use collections.OrderedDict():
import csv
from collections import OrderedDict

with open('file_name') as f:
    spam_reader = csv.reader(f, delimiter=' ')
    unique_ids = OrderedDict.fromkeys(next(zip(*spam_reader)))
I'm trying to learn Python and am working on making an external merge sort using an input file with ints. I'm using heapq.merge, and my code almost works, but it seems to be sorting my lines as strings instead of ints. If I try to convert to ints, writelines won't accept the data. Can anyone help me find an alternative? Additionally, am I correct in thinking this will allow me to sort a file bigger than memory (given adequate disk space)?
import itertools
from itertools import islice
import tempfile
import heapq

# converts heapq.merge to ints
# def merge(*temp_files):
#     return heapq.merge(*[itertools.imap(int, s) for s in temp_files])

with open(r"path\to\input", "r") as f:
    temp_file = tempfile.TemporaryFile()
    temp_files = []
    elements = []
    while True:
        elements = list(islice(f, 1000))
        if not elements:
            break
        elements.sort(key=int)
        temp_files.append(elements)
        temp_file.writelines(elements)
        temp_file.flush()
        temp_file.seek(0)
        with open(r"path\to\output", "w") as output_file:
            output_file.writelines(heapq.merge(*temp_files))
Your elements are read by default as strings, you have to do something like:
elements = list(islice(f, 1000))
elements = [int(elem) for elem in elements]
so that they would be interpreted as integers instead.
That would also mean that you need to convert them back to strings (with their newlines restored) when writing, e.g.:
temp_file.writelines(str(elem) + "\n" for elem in elements)
Apart from that, you would need to convert your elements again to int for the final merging. In your case, you probably want to uncomment your merge method (and then convert the result back to strings again, same way as above).
Your code doesn't make much sense to me (temp_files.append(elements)? Merging inside the loop?), but here's a way to merge files sorting numerically:
import heapq

files = open('a.txt'), open('b.txt')
with open('merged.txt', 'w') as out:
    out.writelines(map('{}\n'.format,
                       heapq.merge(*(map(int, f)
                                     for f in files))))
First, map(int, f) turns each file's lines into ints. Then those get merged with heapq.merge. Then map('{}\n'.format, ...) turns each of the integers back into a string with a newline. Then writelines writes those lines. In other words, you were already close; you just had to convert the ints back to strings before writing them.
A different way to write it (might be clearer for some):
import heapq

files = open('a.txt'), open('b.txt')
with open('merged.txt', 'w') as out:
    int_streams = (map(int, f) for f in files)
    int_stream = heapq.merge(*int_streams)
    line_stream = map('{}\n'.format, int_stream)
    out.writelines(line_stream)
And in any case, do use itertools.imap if you're using Python 2 as otherwise it'll read the whole files into memory at once. In Python 3, you can just use the normal map.
And yes, if you do it right, this will allow you to sort gigantic files with very little memory.
You are doing a K-way merge inside the loop, which adds a lot of runtime complexity. Better to store the file handles in a separate list and perform a single K-way merge at the end.
You also don't have to strip the newline and add it back; just sort based on the number:
sorted(temp_files, key=lambda no: int(no.strip()))
Rest of things are fine.
https://github.com/melvilgit/external-Merge-Sort/blob/master/README.md
import operator

def mkEntry(file1):
    for line in file1:
        lst = line.rstrip().split(",")
        print("Old", lst)
        print(type(lst))
        tuple(lst)
        print(type(lst))  # still showing type='list'
        sorted(lst, key=operator.itemgetter(1, 2))

def main():
    openFile = 'yob' + input("Enter the year <Do NOT include 'yob' or '.txt'> : ") + '.txt'
    file1 = open(openFile)
    mkEntry(file1)

main()
TextFile:
Emma,F,20791
Tom,M,1658
Anthony,M,985
Lisa,F,88976
Ben,M,6989
Shelly,F,8975
and I get this output:
IndexError: string index out of range
I am trying to convert lst from a list to a tuple so I will be able to order the F entries before the M entries and the numbers from smallest to largest. Around line 7, it's still printing type list instead of type tuple, and I don't know why.
print(type(lst))
tuple(lst)
print(type(lst)) #still showing type='list'
You're not changing what lst refers to. You create a new tuple with tuple(lst) and immediately throw it away because you don't assign it to anything. You can do:
lst = tuple(lst)
Note that this will not fix your program. Notice that your sort operation is happening once per line of your file, which is not what you want. Try collecting each line into one sequence of tuples and then doing the sort.
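A sketch of that restructuring, assuming the name,gender,count format shown in the question (the lines list stands in for the file):

```python
import operator

lines = ["Emma,F,20791", "Tom,M,1658", "Anthony,M,985"]

rows = []
for line in lines:
    name, gender, count = line.rstrip().split(",")
    rows.append((name, gender, int(count)))  # one tuple per line

# sort the whole table once: by gender, then by count
rows.sort(key=operator.itemgetter(1, 2))
print(rows)  # [('Emma', 'F', 20791), ('Anthony', 'M', 985), ('Tom', 'M', 1658)]
```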
Firstly, you are not saving the tuple you created anywhere:
tup = tuple(lst)
Secondly, there is no point in making it a tuple before sorting it - in fact, a list could be sorted in place as it's mutable, while a tuple would need another copy (although that's fairly cheap, the items it contains aren't copied).
Thirdly, the IndexError has nothing to do with whether it's a list or tuple, nor whether it is sorted. It most likely comes from the itemgetter, because there's a list item that doesn't have three entries in turn - for instance, the strings "F" or "M".
Fourthly, the sort you're doing, but not saving anywhere, is done on each individual line, not the table of data. Considering this means you're comparing a name, a number, and a gender, I rather doubt it's what you intended.
It's completely unclear why you're trying to convert data types, and the code doesn't match the structure of the data. How about going back to the overview plan and sorting out what you want done? It could well be that something like Python's csv module would help considerably.
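For example, a csv-based version might look like this (a sketch; io.StringIO stands in for the open file, and the sort key is a guess at the intended ordering):

```python
import csv
import io

data = io.StringIO("Emma,F,20791\nTom,M,1658\nAnthony,M,985\n")
rows = [(name, gender, int(count)) for name, gender, count in csv.reader(data)]
rows.sort(key=lambda r: (r[1], r[2]))  # gender first, then count ascending
print(rows)  # [('Emma', 'F', 20791), ('Anthony', 'M', 985), ('Tom', 'M', 1658)]
```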