Sorry if this has already been answered before; the searches I have done have not been helpful.
I have a file that stores data as such:
name,number
(Although perhaps not relevant to the question, I will have to add entries to this file. I know how to do this.)
My question is for the pythonic(?) way of analyzing the data and sorting it in ascending order. So if the file was:
alex,30
bob,20
and I have to add the entry
carol, 25
The file should be rewritten as
bob,20
carol,25
alex,30
My first attempt was to store the entire file as a string (via read()), split it into a list of lines, split each line on the comma, and then build a separate list of scores and sort that. But this doesn't seem right, and it fails because I have no way to get "back" to the entries once I only have the sorted scores.
I am unable to use libraries for this program.
Edit:
My first attempt I did not test because all it manages to do is sort a list of the scores; I don't know of a way to get the "entries" back.
file = open("scores.txt" , "r")
data = file.read()
list_data = data.split()
data.append([name,score])
for i in range(len(list_data)):
    list_scores = list_scores.append(list_data[i][1])
list_scores = sorted(list_scores)
As you can see, this gives me an ascending list of scores, but I do not know where to go from here in order to sort the list of name, score entries.
You will just have to write the sorted entries back to some file, using some basic string formatting:
with open('scores.txt') as f_in, open('file_out.txt', 'w') as f_out:
    entries = [(x, int(y)) for x, y in (line.strip().split(',') for line in f_in)]
    entries.append(('carol', 25))
    entries.sort(key=lambda e: e[1])
    for x, y in entries:
        f_out.write('{},{}\n'.format(x, y))
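If you want scores.txt itself to end up sorted, as in the question, rather than producing a second file, one minimal sketch (no libraries, file name taken from the question) is to read all the entries first and then reopen the same file for writing:

# Read every entry into memory first, since the file is about to be overwritten.
with open('scores.txt') as f:
    entries = [(name, int(score)) for name, score in
               (line.strip().split(',') for line in f if line.strip())]

entries.append(('carol', 25))        # the new entry from the question
entries.sort(key=lambda e: e[1])     # ascending by score

# Reopen the same file in write mode (this truncates it) and write back.
with open('scores.txt', 'w') as f:
    for name, score in entries:
        f.write('{},{}\n'.format(name, score))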
I'm going to assume you're capable of putting your data into a .csv file in the following format:
Name,Number
John,20
Jane,25
Then you can use csv.DictReader to read each row into a dictionary, with something like:
with open('name_age.csv') as csvfile:
    reader = csv.DictReader(csvfile)
and write to it using
with open('name_age.csv', 'a', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['Name', 'Number'])
    writer.writerow({'Name': 'Carol', 'Number': 25})
You can then sort the rows using Python's built-in sorted() with a key function.
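For instance, a minimal sketch of that sorting step, assuming the Name/Number header shown above and that name_age.csv already exists:

import csv

with open('name_age.csv', newline='') as csvfile:
    rows = list(csv.DictReader(csvfile))   # one dict per row, keyed by the header

# Cast Number to int so '25' doesn't sort lexicographically as a string.
rows.sort(key=lambda row: int(row['Number']))

for row in rows:
    print(row['Name'], row['Number'])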
This is a function that will take a filename and sort the file for you:
def sort_file(filename):
    f = open(filename, 'r')
    text = f.read()
    f.close()
    lines = [i.split(',') for i in text.splitlines()]
    lines.sort(key=lambda x: int(x[1]))   # sort numerically by the score column
    lines = [','.join(i) for i in lines]
    text = '\n'.join(lines)
    f = open(filename, 'w')
    f.write(text)
    f.close()
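For example, with scores.txt in the name,number format from the question (this is just a usage sketch):

sort_file('scores.txt')   # rewrites scores.txt in place, ordered by score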
Related
I have an ASCII file with the following columns :
ID, val1, val2, val3
where ID is a row_number but not sorted. I want to write a new ascii file with the same columns with sorted ID (from smaller to larger).
How I could do that in python?
In fact, this file has been produced by concatenating 2 ASCII files using the following code:
import os.path
maindir1 = "/home/d01/"
maindir2 = "/home/d02/"
outdir = "/home/final/"
pols = ["F1", "F2", "F3"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for ipol in pols:
    for imonth in months:
        for kk in range(1, 7):
            template_args = {"ipol": ipol, "imonth": imonth, "kk": kk}
            filename = "{ipol}_{imonth}_0{kk}_L1.txt".format(**template_args)
            out_name = os.path.join(outdir, filename)
            in_names = [os.path.join(maindir1, filename), os.path.join(maindir2, filename)]
            with open(out_name, "w") as out_file:
                for in_name in in_names:
                    with open(in_name, "r") as in_file:
                        out_file.write(in_file.read())
How could I modify the above code so that it writes the final file sorted by the first column?
Assuming Comma Separated Values
I think you're talking about a Comma Separated Values (CSV) file. The character encoding is probably ASCII. If this is true, you'll have an input like this:
id,val1,val2,val3
3,a,b,c
1,a,b,c
2,a,b,c
Python has a good standard library for this: csv.
import csv
with open("in.csv") as f:
    reader = csv.reader(f)
We import the csv library first, then open the file using a context manager. Basically, it's a nice way to open the file, do stuff (inside the with block) and then have it closed automatically.
The csv.reader method takes the file pointer f as an argument. This reader can be iterated and represents the contents of your file. If you cast it to a list, you get a list of lists. The first item in the list of lists is the header, which you want to save, and the rest is the contents:
contents = list(reader)
header = contents[0]
rows = contents[1:]
You then want to sort the rows. But sorting a list of lists might not do what you expect. You need to write a function that helps you find the key to use to perform the sorting:
lambda line: line[0]
This means for every line (which we expect to be a list), the key is equal to the first member of the list. If you prefer not to use lambdas, you can also define a function:
def get_key(line):
    return line[0]
get_key is identical to the lambda.
Combine this all together to get:
new_file = sorted(rows, key=lambda line: line[0])
If you didn't use the lambda, that's:
new_file = sorted(rows, key=get_key)
To write it to a file, you can use the csv library again. Remember to first write the header then the rest of the contents:
with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
All together, the code looks like this:
import csv

with open("in.txt") as f:
    reader = csv.reader(f)
    contents = list(reader)

header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: line[0])

with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
Assuming Custom
If the file is custom and definitely has the spaces in the header like you described (almost like a CSV) or you don't want to use the csv library, you can extract rows like this:
contents = [row.replace(" ", "").split(",") for row in f.readlines()]
If, for instance, it's space-delimited instead of comma-delimited, you would use this:
contents = [row.split() for row in f.readlines()]
You can write rows like this:
with open("out.csv", "w") as f:
    f.write(", ".join(header) + "\n")
    for row in new_file:
        f.write(", ".join(row) + "\n")
Putting it all together:
with open("in.txt") as f:
    contents = [row.replace(" ", "").split(",") for row in f.readlines()]

header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: line[0])

with open("out.csv", "w") as f:
    f.write(", ".join(header) + "\n")
    for row in new_file:
        f.write(", ".join(row) + "\n")
Hope that helps!
EDIT: This would perform a lexicographical sort on the first column, which is probably not what you want. If you can guarantee that all of the first column (aside from the header) are integers, you can just cast them from str to int:
lambda line: line[0]
...becomes:
lambda line: int(line[0])
...with full code:
import csv

with open("in.txt") as f:
    reader = csv.reader(f)
    contents = list(reader)

header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: int(line[0]))

with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
So, you need to sort the CSV data you have in ascending order based on the ID.
You can use this function to do it:
def Sort(sub_li):
    sub_li.sort(key=lambda x: x[0])
    return sub_li
x[0] sorts by the ID (the first column); you can change the index according to your use case.
I took the input as:
a = [["1a", 122323, 1000, 0],
     ["6a", 12323213, 24, 2],
     ["3a", 1233, 1, 3]]
So, using the above function I got the output as
[['1a', 122323, 1000, 0],
['3a', 1233, 1, 3],
['6a', 12323213, 24, 2]]
I hope this will help.
Here's my code:
f = open("cities.txt", 'wb')
pickle.dump(city_list, f)
f.close()
I know that normally, to print a list vertically (one item per line), you do this inside a print statement: print(*city_list, sep='\n'). I want to know if there's a way to do this when creating the pickle file, so that when you open it, you see a vertical list without having to do anything else. For example, when I open the file:
fh = open("cities.txt", 'rb')
x = pickle.load(fh)
print(x)
I want the output to be a vertical list without me having to add a sep='\n' to the print statement.
Once you have loaded your pickled data, it has been converted to a regular Python list already. What you are asking is then: How can I print the items of a list, one item per line?
The answer is simply to do this:
for item in x:
    print(item)
If instead you want the output file to be more easily readable by a human, you should encode your data in a format other than what Python's pickling module offers.
Using CSV:
import csv
city_list = [
    ('Montreal', 'Canada'),
    ('Belmopan', 'Belize'),
    ('Monaco', 'Monaco'),
]

with open('cities.txt', 'w', newline='') as file:
    writer = csv.writer(file)
    for city, country in city_list:
        writer.writerow([city, country])
This will result in cities.txt containing the following:
Montreal,Canada
Belmopan,Belize
Monaco,Monaco
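And if you later want the data back as a list of tuples, a minimal sketch of the reverse direction, assuming the same cities.txt, is:

import csv

with open('cities.txt', newline='') as file:
    city_list = [tuple(row) for row in csv.reader(file) if row]

print(city_list)
# [('Montreal', 'Canada'), ('Belmopan', 'Belize'), ('Monaco', 'Monaco')]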
I'm trying to read a text file containing a list of user IDs and convert those IDs into email addresses by appending the #wherever.com ending to the IDs. Then I want to write those email addresses to a new file separated by commas.
textFile = open("userID.txt", "r")
identList = textFile.read().split(", ")
print identList
textFile.close()
emailString = "#wherever.com, "
newList = [x + emailString for x in identList]
writeFile = open("userEmail.txt", "w")
writeFile.writelines(newList)
writeFile.close()
I'm using python 3.x for Mac. This isn't working at all. I'm not sure if it is reading the initial file at all. It is certainly not writing to the new file. Can someone suggest where the program is failing to work?
Something like the following should work:
with open('userID.txt', 'r') as f_input, open('emails.txt', 'w') as f_output:
    emails = ["{}#wherever.com".format(line.strip()) for line in f_input]
    f_output.write(", ".join(emails))
So if you had a userID.txt file containing the following names, with one name per line:
fred
wilma
You would get a one line output file as follows:
fred#wherever.com, wilma#wherever.com
You could do it like this, also using context managers for reading and writing, because then you don't need to worry about closing the file:
identList = []
with open('userID.txt', 'r') as f:
    identList = f.read().rstrip().split(',')

outString = ','.join(['{0}#wherever.com'.format(x) for x in identList])

with open('userEmail.txt', 'w') as f:
    f.write(outString)
The conversion to a string was done with join, which here joins the list elements built in the comprehension, with commas between them.
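As a quick illustration of that join step, with made-up IDs:

identList = ['fred', 'wilma']
outString = ','.join(['{0}#wherever.com'.format(x) for x in identList])
print(outString)   # fred#wherever.com,wilma#wherever.com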
I was given a file, myInput01.txt, with one name per line in random order, and I need to sort the names in ascending order and output them, one name per line, to a file named myOutput01.txt.
myhandle = open('myInput01.txt', 'r')
aLine = myhandle.readlines()
sorted(aLine)
aLine = myOutput01.txt
print myOutput01.txt
For future visitors, the easiest and most concise way of doing this in Python (assuming a sort isn't going to blow your system memory) is:
with open('myInput01.txt') as fin, open('myOutput01.txt', 'w') as fout:
    fout.writelines(sorted(fin))
So, this part is ok:
myhandle = open('myInput01.txt', 'r')
aLine = myhandle.readlines()
You open a file (get a file handler in myhandle) and read its lines into aLine.
Now, there's a problem with:
sorted(aLine)
The sorted function doesn't do anything to the aLine argument. It returns a sorted new list. So either you use aLine.sort() to sort in place, or you assign the output of the sorted function to another variable:
sorted_lines = sorted(aLine)
Take a look at this sorting tutorial.
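For instance, a quick illustration of the difference, using a small made-up list:

aLine = ['carol\n', 'alex\n', 'bob\n']

sorted_lines = sorted(aLine)   # returns a new sorted list; aLine is unchanged
aLine.sort()                   # sorts aLine in place and returns None

print(sorted_lines)   # ['alex\n', 'bob\n', 'carol\n']
print(aLine)          # ['alex\n', 'bob\n', 'carol\n']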
Also, these two lines are very problematic:
aLine = myOutput01.txt
print myOutput01.txt
You're overwriting your aLine variable with something called myOutput01.txt, which is unknown to the script (what is it? where is it defined?). You need to proceed in a way similar to how you read the file: open a file handle and write to the file using that handle as a reference.
You need:
mywritehandle = open('myOutput01.txt', 'w')
mywritehandle.writelines(sorted_lines)
mywritehandle.close()
Or, to avoid having to call close() explicitly:
with open('myOutput01.txt', 'w') as mywritehandle:
    mywritehandle.writelines(sorted_lines)
You should familiarize yourself with file objects and be aware that myOutput01.txt is very different from "myOutput01.txt".
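To make that last point concrete, here is a tiny made-up example:

print("myOutput01.txt")   # fine: just a string literal
print(myOutput01.txt)     # NameError: name 'myOutput01' is not defined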
outputFile = open('myOutput01.txt', 'w')
inputFile = open('myInput01.txt', 'r')
content = inputFile.readlines()
for name in sorted(content):
    outputFile.write(name.rstrip('\n') + '\n')   # lines from readlines() already end in '\n'
inputFile.close()
outputFile.close()
I have about 40 million lines of text to parse through and I want to treat each line as a split string and then ask for multiple slices (or subscripts, whatever they are called) using a list of numbers I generate in a method.
# ...
other_file = open('output.txt','w')
list = [1, 4, 5, 7, ...]
for line in open(input_file):
    other_file.write(line.split(',')[i for i in list])
The subscript can't take the generator I have shown, but I want to pull multiple entries from the split line without having to iterate through the list for every line.
I apologize, I know this is a simple answer but I just can't think of it. It's so late!
The csv module can help you:
import csv
reader = csv.reader(open(input_file, 'r'))
writer = csv.writer(open(output_file, 'w'))
fields = (1,4,5,7,...)
for row in reader:
    writer.writerow([row[i] for i in fields])
For further improvement, open the files with context managers.
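A minimal sketch of that suggestion, keeping the same approach (the column indices and file names here are just placeholders):

import csv

fields = (1, 4, 5, 7)   # placeholder column indices

with open('input.txt', newline='') as f_in, open('output.txt', 'w', newline='') as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    for row in reader:
        writer.writerow([row[i] for i in fields])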
Don't use list as a variable name - remember there is a builtin called list
other_file = open('output.txt','w')
lst = [1,4,5,7,...]
for line in open(input_file):
    fields = line.split(',')
    other_file.write(",".join(fields[i] for i in lst) + "\n")
For further improvement use context managers to open/close the files for you
from operator import itemgetter
from csv import reader, writer
fields = 1,4,5,7
row_filter = itemgetter(*fields)
with open('inp.txt', 'r') as inp:
    with open('out.txt', 'w') as out:
        writer(out).writerows(map(row_filter, reader(inp)))