python split and reverse text file

I have a text file which stores data like name : score e.g.:
bob : 10
fred : 3
george : 5
However, I want to make it so it says
10 : bob
3 : fred
5 : george
What would the code be to flip it like that?
Would I need to separate them first by removing the colon as I have managed this through this code?
file = open("Class 3.txt", "r")
t4 = (file.read())
test =''.join(t4.split(':')[0:10])
print (test)
How would I finish it and make it say the reverse?

This code handles fractional scores (e.g. 9.5), and doesn't care whether there are extra spaces around the : delimiter. It should be much easier to maintain than your current code.
Class 3.txt:
bob : 10
fred : 3
george : 5
Code:
class_num = input('Which class (1, 2, or 3)? ')
score_sort = input('Sort by name or score? ').lower().startswith('s')
with open("Class " + class_num + ".txt", "r") as f:
    scores = {name.strip(): float(score) for
              name, score in (line.strip().split(':') for line in f)}
if score_sort:
    for name in sorted(scores, key=scores.get, reverse=True):
        print(scores.get(name), ':', name)
else:
    for name in sorted(scores):
        print(name, ':', scores.get(name))
Input:
3
scores
Output:
10.0 : bob
5.0 : george
3.0 : fred
Input:
3
name
Output:
bob : 10.0
fred : 3.0
george : 5.0

First, this is going to be a lot harder to do whole-file-at-once than line-at-a-time.
But, either way, you obviously can't just split(':') and then ''.join(…). All that's going to do is replace colons with nothing. You obviously need ':'.join(…) to put the colons back in.
And meanwhile, you have to swap the values around on each side of each colon.
So, here's a function that takes just one line, and swaps the sides:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right, left))
But you'll notice there are a few problems here. The left has a space before the colon; the right has a space after the colon, and a newline at the end. How are you going to deal with that?
The simplest way is to just strip out all the whitespace on both sides, then add back in the whitespace you want:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right.strip() + ' ', ' ' + left.strip())) + '\n'
But a smarter idea is to treat the space around the colon as part of the delimiter. (The newline, you'll still need to handle manually.)
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip())) + '\n'
But if you think about it, do you really need to add the newline back on? If you're just going to pass it to print, the answer is obviously no. So:
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip()))
Anyway, once you're happy with this function, you just write a loop that calls it once for each line. For example:
with open("Class 3.txt", "r") as file:
    for line in file:
        swapped_line = swap_sides(line)
        print(swapped_line)
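As a rough, self-contained sketch of the same idea: the version below splits on a bare colon and strips both sides, so it tolerates missing or extra spaces around the delimiter. The `io.StringIO` object and its sample contents are stand-ins for the real "Class 3.txt", and the skip rule for lines without a colon is an assumption, not something the question specifies.

```python
import io

def swap_sides(line, delimiter=":"):
    """Turn 'name : score' into 'score : name', ignoring extra whitespace."""
    # maxsplit=1 keeps any later colons inside the right-hand part
    left, right = line.split(delimiter, 1)
    return right.strip() + " : " + left.strip()

# io.StringIO stands in for open("Class 3.txt") so the sketch runs anywhere
sample_file = io.StringIO("bob : 10\nfred:3\n\ngeorge  :  5")

swapped = []
for line in sample_file:
    if ":" not in line:
        continue  # assumption: silently skip blank or malformed lines
    swapped.append(swap_sides(line))

print("\n".join(swapped))
```

With a real file, replacing `sample_file` with `open("Class 3.txt")` inside a `with` block should behave the same way.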

Let's learn how to reverse a single line:
line = 'bob : 10'
line.partition(' : ') # ('bob', ' : ', '10')
''.join(reversed(line.partition(' : '))) # '10 : bob'
Now, combine with reading lines from a file:
for line in open('Class 3.txt').read().splitlines():
    print(''.join(reversed(line.partition(' : '))))
Update
I am re-writing the code to read the file, line by line:
with open('Class 3.txt') as input_file:
    for line in input_file:
        line = line.strip()
        print(''.join(reversed(line.partition(' : '))))
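One property of `str.partition` worth knowing for this approach: it always returns a 3-tuple, and when the separator is absent the last two items are empty strings. So a line without " : " passes through unchanged instead of raising an error. A small sketch (the helper name `flip` is made up for illustration):

```python
def flip(line, sep=" : "):
    # partition always returns (before, sep, after); if sep is not found,
    # it returns (line, '', ''), so the reversed join gives the line back
    return "".join(reversed(line.partition(sep)))

print(flip("bob : 10"))      # 10 : bob
print(flip("no delimiter"))  # no delimiter
```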

Related

how to replace (or delete) a part of string from txt file in python

I am very new to Python (and programming in general) and here is my issue. I would like to replace (or delete) a part of a string in a txt file which contains hundreds or thousands of lines. Each line starts with the very same string, which I want to delete.
I have not found a method to delete it, so I tried to replace it with an empty string, but for some reason it doesn't work.
Here is what I have written:
file = "C:/Users/experimental/Desktop/testfile siera.txt"
siera_log = open(file)
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
for each_line in siera_log:
    new_line = each_line.replace("text_to_replace", " ")
    print(new_line)
When I print it to check whether it worked, I can see that the lines are as they were before. No change was made.
Can anyone help me find out why?

each line starts with the very same string which i want to delete.
The problem is you're passing a string "text_to_replace" rather than the variable text_to_replace.
But, for this specific problem, you could just remove the first n characters from each line:
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
n = len(text_to_replace)
for each_line in siera_log:
    new_line = each_line[n:]
    print(new_line)
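Slicing off the first n characters assumes every line really does start with that string. A slightly more defensive sketch checks with `startswith` first. The helper name `strip_prefix` and the two sample lines are invented for illustration.

```python
def strip_prefix(line, prefix):
    # Only slice when the line actually starts with the prefix;
    # otherwise return the line untouched
    if line.startswith(prefix):
        return line[len(prefix):]
    return line

text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
# hypothetical log lines: one with the prefix, one without
lines = [text_to_replace + " entry 1\n", "unrelated line\n"]
cleaned = [strip_prefix(line, text_to_replace) for line in lines]
print(cleaned)
```

On Python 3.9 or later, `line.removeprefix(text_to_replace)` does the same check-and-slice in one call.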

If you quote a variable it becomes a string literal and won't be evaluated as a variable.
Change your line for replacement to:
new_line = each_line.replace(text_to_replace, " ")

Python File Reading & Writing

So I need to write a program that reads a text file, and copies its contents to another file. I then have to add a column at the end of the text file, and populate that column with an int that is calculated using the function calc_bill. I can get it to copy the contents of the original file to the new one, but I cannot seem to get my program to read in the ints necessary for calc_bill to run.
Any help would be greatly appreciated.
Here are the first 3 lines of the text file I am reading from:
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
It is copying the file exactly as it is supposed to to the new file. What is not working is writing the bill_amount (calc_bill)/ billVal(main) to the new file in a new column. Here is the expected output to the new file:
CustomerID Title FirstName MiddleName LastName Customer Type Company Name Start Reading End Reading BillVal
1 Mr. Orlando N. Gee Residential 297780 302555 some number
2 Mr. Keith NULL Harris Residential 274964 278126 some number
And here is my code:
def main():
    file_in = open("water_supplies.txt", "r")
    file_in.readline()
    file_out = input("Please enter a file name for the output:")
    output_file = open(file_out, 'w')
    lines = file_in.readlines()
    for line in lines:
        lines = [line.split('\t')]
        #output_file.write(str(lines)+ "\n")
        billVal = 0
        c_type = line[5]
        start = int(line[7])
        end = int(line[8])
        billVal = calc_bill(c_type, start, end)
        output_file.write(str(lines)+ "\t" + str(billVal) + "\n")
def calc_bill(customer_type, start_reading, end_reading):
    price_per_gallon = 0
    if customer_type == "Residential":
        price_per_gallon = .012
    elif customer_type == "Commercial":
        price_per_gallon = .011
    elif customer_type == "Industrial":
        price_per_gallon = .01
    if start_reading >= end_reading:
        print("Error: please try again")
    else:
        reading = end_reading - start_reading
        bill_amount = reading * price_per_gallon
        return bill_amount
main()

There are the issues mentioned above, but here is a small change to your main() method that works correctly.
def main():
    file_in = open("water_supplies.txt", "r")
    # skip the headers in the input file, and save for output
    headers = file_in.readline()
    # changed to raw_input to not require quotes
    file_out = raw_input("Please enter a file name for the output: ")
    output_file = open(file_out, 'w')
    # write the headers back into output file
    output_file.write(headers)
    lines = file_in.readlines()
    for line in lines:
        # renamed variable here to split
        split = line.split('\t')
        bill_val = 0
        c_type = split[5]
        start = int(split[6])
        end = int(split[7])
        bill_val = calc_bill(c_type, start, end)
        # line is already a string, don't need to cast it
        # added rstrip() to remove trailing newline
        output_file.write(line.rstrip() + "\t" + str(bill_val) + "\n")
Note that the line variable in your loop includes the trailing newline, so you will need to strip that off as well if you're going to write it to the output file as-is. Your start and end indices were off by 1 as well, so I changed to split[6] and split[7].
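It is a good idea to not require the user to include the quotes for the filename, so keep that in mind as well. In Python 2, an easy way is to just use raw_input instead of input; in Python 3, raw_input no longer exists and input already returns the typed text as a plain string.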
Sample input file (from OP):
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
$ python test.py
Please enter a file name for the output:test.out
Output (test.out):
1 Mr. Orlando N. Gee Residential 297780 302555 57.3
2 Mr. Keith NULL Harris Residential 274964 278126 37.944

There are a couple things. The inconsistent spacing in your column names makes counting the actual columns a bit confusing, but I believe there are 9 column names there. However, each of your rows of data have only 8 elements, so it looks like you've got an extra column name (maybe "CompanyName"). So get rid of that, or fix the data.
Then your "start" and "end" variables are pointing to indexes 7 and 8, respectively. However, since there are only 8 elements in the row, I think the indexes should be 6 and 7.
Another problem could be that inside your for-loop through "lines", you set "lines" to the elements in that line. I would suggest renaming the second "lines" variable inside the for-loop to something else, like "elements".
Aside from that, I'd just caution you about naming consistency. Some of your column names are camel-case and others have spaces. Some of your variables are separated by underscores and others are camel-case.
Hopefully that helps. Let me know if you have any other questions.

You have two errors in handling your variables, both in the same line:
lines = [line.split()]
You put this into your lines variable, which is the entire file contents. You just lost the rest of your input data.
You made a new list-of-list from the return of split.
Try this line:
line = line.split()
I got reasonable output with that change, once I make a couple of assumptions about your placement of tabs.
Also, consider not overwriting a variable with a different data semantic; it confuses the usage. For instance:
for record in lines:
    line = record.split()
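Pulling the suggestions from these answers together, here is one possible sketch using the csv module. It assumes the file is tab-separated and that the customer type and the two readings are the last three fields of every row. The header names "Start Reading" and "End Reading" are taken from the expected output in the question, while the `RATES` table, the `add_bill_column` helper, and returning `None` for bad readings are illustrative choices rather than the asker's design. `io.StringIO` stands in for the real files.

```python
import csv
import io

# price per gallon by customer type (values from the question's calc_bill)
RATES = {"Residential": 0.012, "Commercial": 0.011, "Industrial": 0.01}

def calc_bill(customer_type, start_reading, end_reading):
    # Assumption: return None for readings that do not increase
    if start_reading >= end_reading:
        return None
    return (end_reading - start_reading) * RATES.get(customer_type, 0)

def add_bill_column(infile, outfile):
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    header = next(reader)                      # copy the header row
    writer.writerow(header + ["BillVal"])      # and add the new column name
    for record in reader:
        # index from the end so a varying number of leading fields is fine
        c_type, start, end = record[-3], int(record[-2]), int(record[-1])
        writer.writerow(record + [calc_bill(c_type, start, end)])

sample = io.StringIO(
    "CustomerID\tTitle\tFirstName\tMiddleName\tLastName\t"
    "Customer Type\tStart Reading\tEnd Reading\n"
    "1\tMr.\tOrlando\tN.\tGee\tResidential\t297780\t302555\n"
)
result = io.StringIO()
add_bill_column(sample, result)
print(result.getvalue())
```

Each record stays a list from start to finish, so there is no need to convert it to a string or strip the newline by hand.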

Read only one line every time, with or without an "\n" at the end

I have a file which is filled like this:
Samsung CLP 680/ CLX6260 + CLT-C506S/ELS + CLT-M506S/ELS + CLT-Y506S/ELS + 39.50
Xerox Phaser 6000/6010/6015 + 106R01627 + 106R01628 + 106R01629 + 8.43
Xerox DocuPrint 6110/6110mfp + 106R01206 + 106R01204 + 106R01205 + 7.60
Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20
When I read it with:
for line in excelRead:
    title=line.split("+")
    title=[lines.strip()for lines in title]
Sometimes there is an "\n" at the end of the line, and sometimes there is not. If the line ends with \n, splitting gives me 5 elements; if not, it gives 9 and so on, until it finds an "\n", I guess.
So, the question is: how do I read only one line of the file each time, and obtain 5 elements every time, with or without an "\n" at the end? I can't check the whole file for whether or not there is an "\n" at the end of each line.
Thanks

You might consider using the csv module to parse this, and placing into a dict by model:
import csv
data = {}
with open('/tmp/excel.csv') as f:
    for line in csv.reader(f, delimiter='+', skipinitialspace=True):
        data[line[0].strip()] = [e.strip() for e in line[1:]]
print(data)
# Output (wrapped here for readability):
# {'Samsung CLP 680/ CLX6260': ['CLT-C506S/ELS', 'CLT-M506S/ELS', 'CLT-Y506S/ELS', '39.50'],
#  'Xerox Phaser 6000/6010/6015': ['106R01627', '106R01628', '106R01629', '8.43'],
#  'Xerox DocuPrint 6110/6110mfp': ['106R01206', '106R01204', '106R01205', '7.60'],
#  'Xerox Phaser 6121/6121D': ['106R01466', '106R01467', '106R01468', '18.20']}

for line in excelRead:
    title = [x.strip() for x in line.rstrip('\n').split('+')]
It's better to avoid making one variable (title) mean two different things. Rather than give it a different name in your second line, I just removed the line entirely and put the split inside the list comprehension.
Instead of feeding line straight into split, I first rstrip the \n (which removes that character from the end, if it is there).
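To see that this gives 5 fields whether or not the line ends in a newline, here is a small sketch. The helper name `parse_row` is made up, and `io.StringIO` with two of the question's lines stands in for the real file; the last line deliberately has no trailing newline.

```python
import io

def parse_row(line):
    # rstrip('\n') does nothing when the line has no trailing newline
    return [field.strip() for field in line.rstrip("\n").split("+")]

sample = io.StringIO(
    "Xerox Phaser 6000/6010/6015 + 106R01627 + 106R01628 + 106R01629 + 8.43\n"
    "Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20"
)
rows = [parse_row(line) for line in sample]
for row in rows:
    print(len(row), row)
```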

When \n is missing, this will split title[4] to give two titles:
import re
data = []
with open('aa.txt') as excelRead:
    for line in excelRead:
        title=line.split("+")
        title=[lines.strip()for lines in title]
        while len(title) > 5:
            one = re.sub(r'(\d+\.\d+)', '', title[4])
            five = title[4].replace(one, '')
            title1 = title[:4] + [five]
            title = [one] + title[5:]
            data.append(title1)
        data.append(title)
for item in data:
    print(item)
You could easily make data a dictionary instead of a list.

Return the average mark for all student in that Section

I know it was asked already, but the answers are super unclear.
The first requirement is to open a file (sadly I have no idea how to do that).
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think I can target that element, since the fields are separated by whitespace?
Here is an excerpt of the file, showing line structure
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
elements = elements + 1
if elements = 3
I know it has something to do with readlines() and the targeting the section code.

marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps

Open and writing to files
strip method
Something like this?
data = {}
with open(filename) as f:  # open the file
    for line in f:  # go through the file line by line
        # split() with no arguments splits on any whitespace and drops empty fields
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        # the number of name words varies, so unpack from both ends
        student_n, *names, section_code, midterm_grade = fields
        # build a dict: key is a section code, value is a list of student records
        data.setdefault(section_code, []).append(
            [student_n, ' '.join(names), float(midterm_grade)])
# compute the average grade for each section
for section, students in data.items():  # items() yields (key, value) pairs
    average = sum(s[2] for s in students) / len(students)
    print('Section: ' + section + ' Average grade: ' + str(average))

more or less:
infile = open('grade_file.txt', 'r')
score = 0
n = 0
for line in infile.readlines():
    score += float(line.rstrip().split()[-1])
    n += 1
avg = score / n
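The loop above averages every mark in the file. Since the names vary in length but the section code is always the second-to-last field, a per-section average can index from the end of each line. A minimal sketch, with a made-up function name and sample lines copied from the question's excerpt:

```python
from collections import defaultdict

def section_averages(lines):
    totals = defaultdict(lambda: [0.0, 0])  # section -> [sum of grades, count]
    for line in lines:
        fields = line.rstrip().split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        # names vary in length, so count from the right-hand end
        section, grade = fields[-2], float(fields[-1])
        totals[section][0] += grade
        totals[section][1] += 1
    return {section: s / n for section, (s, n) in totals.items()}

sample = [
    "987654322 Xu Carolyn L0101 19.5\n",
    "233432555 Jones Billy Andrew L5101 16.0\n",
    "555432345 Patel Amrit L0101 13.5\n",
    "777998713 Van Ryan Sarah Jane L5101 20\n",
]
averages = section_averages(sample)
print(averages)  # {'L0101': 16.5, 'L5101': 18.0}
```

Passing an open file object instead of `sample` works the same way, because iterating over a file yields its lines.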

Why are there extra blank lines in my python program output?

I'm not particularly experienced with python, so may be doing something silly below. I have the following program:
import os
import re
import linecache
LINENUMBER = 2
angles_file = open("d:/UserData/Robin Wilson/AlteredData/ncaveo/16-June/scan1_high/000/angles.txt")
lines = angles_file.readlines()
for line in lines:
    splitted_line = line.split(";")
    DN = float(linecache.getline(splitted_line[0], LINENUMBER))
    Zenith = splitted_line[2]
    output_file = open("d:/UserData/Robin Wilson/AlteredData/ncaveo/16-June/scan1_high/000/DNandZenith.txt", "a")
    output_file.write("0\t" + str(DN) + "\t" + Zenith + "\n")
    #print >> output_file, str(DN) + "\t" + Zenith
    #print DN, Zenith
output_file.close()
When I look at the output to the file I get the following:
0 105.5 0.0
0 104.125 18.0
0 104.0 36.0
0 104.625 54.0
0 104.25 72.0
0 104.0 90.0
0 104.75 108.0
0 104.125 126.0
0 104.875 144.0
0 104.375 162.0
0 104.125 180.0
Which is the right numbers, it just has blank lines between each line. I've tried and tried to remove them, but I can't seem to. What am I doing wrong?
Robin

For a GENERAL solution, remove the trailing newline from your INPUT:
splitted_line = line.rstrip("\n").split(";")
Removing the extraneous newline from your output "works" in this case but it's a kludge.
ALSO: (1) it's not a good idea to open your output file in the middle of a loop; do it once, otherwise you are just wasting resources. With a long enough loop, you will run out of file handles and crash (2) It's not a good idea to hard-wire file names like that, especially hidden in the middle of your script; try to make your scripts reusable.
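A sketch of both points together: the output file is opened once, outside the loop, and the newline is stripped from the input value before a single "\n" is added back. The rows and the temporary directory are placeholders so the example runs anywhere; in the real script the rows would come from angles.txt and linecache.

```python
import os
import tempfile

# (DN, Zenith) pairs; Zenith still carries the newline it had in the input file
rows = [(105.5, "0.0\n"), (104.125, "18.0\n")]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "DNandZenith.txt")  # placeholder location
    # open the output once, before the loop; 'with' closes it afterwards
    with open(out_path, "a") as output_file:
        for dn, zenith in rows:
            output_file.write("0\t" + str(dn) + "\t" + zenith.rstrip("\n") + "\n")
    with open(out_path) as check:
        written = check.read()

print(written)
```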

Change this:
output_file.write("0\t" + str(DN) + "\t" + Zenith + "\n")
to this:
output_file.write("0\t" + str(DN) + "\t" + Zenith)
The Zenith string already contains the trailing \n from the original file when you read it in.

Alternative solution (handy if you are processing lines from file) is to strip the whitespace:
Zenith = Zenith.strip()

EDIT: See comments for details, but there's definitely a better way. [:-1] isn't the best choice, no matter how cool it looks. Use line.rstrip('\n') instead.
The problem is that, unlike file_text.split('\n'), file.readlines() does not remove the \n from the end of each line of input. My default pattern for parsing lines of text goes like this:
with open(filename) as f:
    for line in f.readlines():
        parse_line(line[:-1]) # funny face trims the '\n'
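The reason rstrip('\n') is the safer choice shows up on the last line of a file that does not end with a newline: [:-1] removes a real character there, while rstrip('\n') leaves the line alone. The two sample strings below are made up to show the difference.

```python
normal_line = "0;1;162.0\n"
last_line = "0;1;180.0"  # final line of a file with no trailing newline

sliced = [normal_line[:-1], last_line[:-1]]
stripped = [normal_line.rstrip("\n"), last_line.rstrip("\n")]

print(sliced)    # ['0;1;162.0', '0;1;180.']
print(stripped)  # ['0;1;162.0', '0;1;180.0']
```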

If you want to make sure there's no whitespace on any of your tokens (not just the first and last), try this:
splitted_line = map (str.strip, line.split (';'))
