I have the following python code whose purpose is to remove blank lines from an input text file. It should return an output file with all blank lines removed but it doesn't. What's the bug? Thank you!
import sys
def main():
    inputFileName = sys.argv[1]
    outputFileName = sys.argv[2]
    inputFile = open(inputFileName, "r")
    outputFile = open(inputFileName, "w")
    for line in inputFile:
        if "\n" in line:
            removeBlank = line.replace("\n", "")
            outputFile.write(removeBlank)
        else:
            outputFile.write(line)
    inputFile.close()
    outputFile.close()
main()
Your code has several problems, especially the condition you use to check for an empty line. Others have rightly pointed out some of them.
Here is a solution that should work and generate an output file with no empty lines.
import sys

def main():
    inputFileName = sys.argv[1]
    outputFileName = sys.argv[2]
    # Write to outputFileName; opening inputFileName with "w" would truncate the input
    with open(inputFileName) as inputFile, open(outputFileName, "w") as outputFile:
        for line in inputFile:
            if line.strip() != '':
                outputFile.write(line)

if __name__ == '__main__':
    main()
At present your code appears to truncate its input file immediately after opening it. At best this might give differing results on different platforms. On some platforms the file might be empty. I presume that opening the input file for writing was a typo.
A better way to approach this problem is to use a generator. Also, the correct test for an empty line is line == '\n', not '\n' in line, which will be true for all returned lines except perhaps the last.
def noblanks(file):
    for line in file:
        if line != '\n':
            yield line
You can use this like so:
with open(inputFileName, "r") as inf, open(outputFileName, 'w') as outf:
    for line in noblanks(inf):
        outf.write(line)
The context managers in the with statement will ensure that your files are properly closed without further action on your part.
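Note that the line != '\n' test keeps lines that contain only spaces or tabs. If those should count as blank too, a strip-based variant of the generator is one option. The sketch below is only an illustration: the name non_blank_lines is made up here, and io.StringIO stands in for an opened input file.

```python
import io


def non_blank_lines(lines):
    """Yield only the lines that contain something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line


# io.StringIO stands in for an opened input file in this sketch.
sample = io.StringIO("first\n\n   \nsecond\n\nthird")
cleaned = list(non_blank_lines(sample))
print(cleaned)
```

Because the generator accepts any iterable of lines, the same function works on a real file object inside a with block.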
Related
I am working on an NLP project and have extracted the text from a PDF using PyPDF2. I then removed the blank lines. My output is shown on the console, but I want to write the same data, which is stored in my variable (file), to the text file.
Below is the code which is removing the blank lines from a text file.
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file=line
        print(file)
Output on Console:
Eclipse,
Visual Studio 2012,
Arduino IDE,
Java
,
HTML,
CSS
2013
Excel
.
Now I want the same data in my text file (resume1.txt). I have tried three methods, but each of them writes only a single dot to resume1.txt. That dot is the last item printed on the console.
Method 1:
with open("resume1.txt", "w") as out_file:
    out_file.write(file)
Method 2:
print(file, file=open("resume1.txt", 'w'))
Method 3:
pathlib.Path('resume1.txt').write_text(file)
Could you please be kind to assist me in populating the text file. Thank you for your cooperation.
First of all, note that you are writing to the same file you read from, so the old data is lost; I don't know if you want that. Other than that, every time you write using those methods, you overwrite the data you previously wrote to the output file. So, if you want to use these methods, you must write just once (write all the data in one go).
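To illustrate the overwriting behaviour, here is a small sketch. It uses a throwaway temporary file and a made-up list of words rather than the real resume1.txt. Opening with mode "w" for each item leaves only the last item in the file, while opening once and writing everything keeps all of it.

```python
import os
import tempfile

# A throwaway file, so the sketch does not touch any real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
words = ["Eclipse,", "Java", "."]  # hypothetical sample data

for word in words:
    # Mode "w" truncates the file on every open, so earlier writes are lost.
    with open(path, "w") as out_file:
        out_file.write(word)

with open(path) as in_file:
    overwritten = in_file.read()

# Opening the file once and writing every item keeps all the data.
with open(path, "w") as out_file:
    for word in words:
        out_file.write(word + "\n")

with open(path) as in_file:
    complete = in_file.read()

print(repr(overwritten))
print(repr(complete))
```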
SOLUTIONS
Using method 1:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
with open("resume1.txt", "w") as out_file:
    out_file.write(to_save)
Using method 2:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
print(to_save, file=open("resume1.txt", 'w'))
Using method 3:
import pathlib
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
pathlib.Path('resume1.txt').write_text(to_save)
In these 3 methods I have used to_save = '\n'.join(to_file) because I'm assuming you want to separate the lines from each other with an EOL. If I'm wrong, you can use ''.join(to_file) if you want no separator at all, or ' '.join(to_file) if you want all the lines joined into a single one.
Other method
You can also do this by writing to another file, say 'output.txt'.
out_file = open('output.txt', 'w')
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        out_file.write(file)
        out_file.write('\n') # EOL
out_file.close()
Also, you can do this (I prefer this):
with open('output.txt', 'w') as out_file:
    for line in open('resume1.txt'):
        line = line.rstrip()
        if line != '':
            file = line
            print(file)
            out_file.write(file)
            out_file.write('\n') # EOL
First post on stack, so excuse the format
new_line = ""
for line in open('resume1.txt', "r"):
    for char in line:
        if char != " ":
            new_line += char
print(new_line)
with open('resume1.txt', "w") as f:
    f.write(new_line)
I'm trying to write code that rewrites a specific line of a .txt file.
I can write to the line I want, but I can't erase the previous text on that line.
Here is my code:
(I'm trying a couple of things)
def writeline(file,n_line, text):
    f=open(file,'r+')
    count=0
    for line in f:
        count=count+1
        if count==n_line :
            f.write(line.replace(str(line),text))
            #f.write('\r'+text)
You can use this code to make a test file for testing:
with open('writetest.txt','w') as f:
    f.write('1 \n2 \n3 \n4 \n5')
writeline('writetest.txt',4,'This is the fourth line')
Edit: For some reason, if I use 'if count==5:' the code runs OK (even if it doesn't erase the previous text), but if I use 'if count==n_line: ', the file ends up with a lot of garbage.
The answers work, but I would like to know what the problems with my code are, and why I can't read and write at the same time. Thanks!
You are reading from the file and also writing to it. Don't do that. Instead, you should write to a NamedTemporaryFile and then rename it over the original file after you finish writing and close it.
Or if the size of the file is guaranteed to be small, you can use readlines() to read all of it, then close the file, modify the line you want, and write it back out:
def editline(file,n_line,text):
    with open(file) as infile:
        lines = infile.readlines()
    lines[n_line - 1] = text+' \n'  # n_line is 1-based, as in the question
    with open(file, 'w') as outfile:
        outfile.writelines(lines)
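The NamedTemporaryFile-and-rename approach suggested above might look roughly like the following. This is a minimal sketch, not a definitive implementation. The function name replace_line_via_tempfile is made up for illustration, n_line is assumed to be 1-based as in the question, and os.replace requires Python 3.3 or later.

```python
import os
import tempfile


def replace_line_via_tempfile(path, n_line, text):
    """Rewrite line number n_line (1-based) of path by way of a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False keeps the temporary file after closing so it can be renamed.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        with open(path) as src:
            for number, line in enumerate(src, start=1):
                tmp.write(text + "\n" if number == n_line else line)
    # Rename the finished temporary file over the original.
    os.replace(tmp.name, path)


# Demo on a throwaway copy of the question's test file.
demo_path = os.path.join(tempfile.mkdtemp(), "writetest.txt")
with open(demo_path, "w") as f:
    f.write("1 \n2 \n3 \n4 \n5")

replace_line_via_tempfile(demo_path, 4, "This is the fourth line")

with open(demo_path) as f:
    result = f.read()
print(repr(result))
```

Creating the temporary file in the same directory as the target keeps the final rename on one filesystem, so the original is only replaced once the new content is completely written.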
Use a temporary file:
import os
import shutil

def writeline(filename, n_line, text):
    tmp_filename = filename + ".tmp"
    count = 0
    with open(tmp_filename, 'wt') as tmp:
        with open(filename, 'rt') as src:
            for line in src:
                count += 1
                if count == n_line:
                    line = text + '\n'  # replace the whole line
                tmp.write(line)
    # copy the finished temporary file over the original, then clean up
    shutil.copy(tmp_filename, filename)
    os.remove(tmp_filename)

def create_test(fname):
    with open(fname,'w') as f:
        f.write('1 \n2 \n3 \n4 \n5')

if __name__ == "__main__":
    create_test('writetest.txt')
    writeline('writetest.txt', 4, 'This is the fourth line')
I'm writing a Python script that reads lines from an input file and writes each unique line (one not already in the output file) to an output file. Somehow, my script always appends the first line of the input file to the output file, even if the same line is already there. I can't figure out why this happens.
Does anyone know why, and how do I fix it?
Thanks,
import os
input_file= 'input.txt'
output_file = 'output.txt'
fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
        print line1
        for line2 in fo:
            print line2
            if line2 == line1:
                flag = True
                print('Found Match!!')
                break
        if flag == False:
            fo.write(line1)
        elif flag == True:
            flag == False
        fo.seek(0)
fo.close()
fi.close()
When you open a file in append mode, the file object position is at the end of the file. So the first time through, when it reaches for line2 in fo:, there aren't any more lines in fo, so that block is skipped, and flag is still False, so that first line is written to the output file. After that, you do fo.seek(0), so you are checking against the entire file for subsequent lines.
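The effect of the initial position can be checked with a short sketch, assuming Python 3 and using a throwaway file in place of the real output.txt:

```python
import os
import tempfile

# Throwaway stand-in for the existing output file.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\n")

with open(path, "a+") as fo:
    # In append mode the position starts at the end, so there is nothing to read.
    seen_before_seek = list(fo)
    # After rewinding, the existing lines become visible.
    fo.seek(0)
    seen_after_seek = list(fo)

print(seen_before_seek)
print(seen_after_seek)
```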
The answer by kmacinnis is right on as to why your code isn't working; you need to use mode 'r+' instead of 'a+', or else put fo.seek(0) at the beginning of the for loop instead of the end.
That said, there's a much better way to do this than reading the entire output file for every line of the input file.
def ensure_file_ends_with_newline(handle):
    position = handle.tell()
    # Python 2 only: Python 3 text files reject non-zero end-relative seeks
    handle.seek(-1, 2)
    handle_end = handle.read(1)
    if handle_end != '\n':
        handle.write('\n')
    handle.seek(position)

input_filepath = 'input.txt'
output_filepath = 'output.txt'
with open(input_filepath, 'r') as infile, open(output_filepath, 'r+') as outfile:
    ensure_file_ends_with_newline(outfile)
    written = set(outfile)
    for line in infile:
        if line not in written:
            outfile.write(line)
            written.add(line)
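The snippet above relies on handle.seek(-1, 2), which Python 3 rejects for text-mode files. A rough Python 3 variant of the same set-based idea could look like this. The function name append_unique_lines and the sample file contents are made up for illustration, and lines are compared without their trailing newline.

```python
import os
import tempfile


def append_unique_lines(input_path, output_path):
    """Append lines from input_path that output_path does not already contain."""
    with open(output_path, "a+") as outfile:
        outfile.seek(0)  # append mode starts at the end, so rewind before reading
        lines = outfile.readlines()
        written = {line.rstrip("\n") for line in lines}
        # Make sure new lines do not get glued onto an unterminated last line.
        if lines and not lines[-1].endswith("\n"):
            outfile.write("\n")
        with open(input_path) as infile:
            for line in infile:
                key = line.rstrip("\n")
                if key not in written:
                    outfile.write(key + "\n")
                    written.add(key)


# Demo with throwaway files.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.txt")
output_path = os.path.join(workdir, "output.txt")
with open(input_path, "w") as f:
    f.write("a\nb\na\nc\n")
with open(output_path, "w") as f:
    f.write("b\nz")

append_unique_lines(input_path, output_path)

with open(output_path) as f:
    result = f.read()
print(repr(result))
```

Each input line is checked against a set, so the output file is read only once instead of once per input line.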
Your flag was never reset to False.
flag == False is a comparison.
flag = False is an assignment.
Use the latter.
import os
input_file= 'input.txt'
output_file = 'output.txt'
fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
        #print line1
        for line2 in fo:
            #print line2
            if line2 == line1:
                flag = True
                print('Found Match!!')
                print (line1,line2)
                break
        if flag == False:
            fo.write(line1)
        elif flag == True:
            flag = False
        fo.seek(0)
When opening and reading a single file, I get an error even after adding the close() call. The code is below:
infilename = "Rate.txt"
infile = open(infilename, "r").readlines()
firstLine = infile.pop(0) #removes the header(first line)
infile = infile[:-1]#removes the last line
for line in infile:
    a = line.split()
    CheckNumeric = a[4]
    CheckNumeric1 = a[5]
    strfield = a[3]
infile.close()
By doing infile = open(infilename, "r").readlines() you have actually assigned infile to be a list, rather than an open file object. The garbage collector should sweep up your open file and close it for you, but a better way to handle this would be to use a with block:
infilename = "Rate.txt"
with open(infilename, "r") as infile:
    line_list = infile.readlines()
firstLine = line_list.pop(0) #removes the header(first line)
line_list = line_list[:-1]#removes the last line
for line in line_list:
    a = line.split()
    CheckNumeric = a[4]
    CheckNumeric1 = a[5]
    strfield = a[3]
In the code above, everything that is indented within the with block will execute while the file is open. Once the block ends the file is automatically closed.
The value stored in the infile variable is not a file object; it is a list, because you called the readlines method.
By doing
infile = open(infilename, "r").readlines()
you have read the lines of the file and assigned the resulting list to infile, but you haven't assigned the file itself to a variable.
If you want to explicitly close the file:
someFile = open(infilename, "r")
infile = someFile.readlines()
...
someFile.close()
or use with, which closes the file automatically:
with open(infilename, "r") as someFile:
    infile = someFile.readlines()
    ...
print("the file here is closed")
infile = open(infilename, "r")
# here infile is a file object (on which you can call close())
infile = open(infilename, "r").readlines()
# here infile is a list object, because readlines() returns a list
That's all.
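A quick way to confirm this is to inspect the objects directly. The sketch below uses a throwaway file with made-up contents in place of Rate.txt:

```python
import os
import tempfile

# Throwaway stand-in for the data file.
path = os.path.join(tempfile.mkdtemp(), "Rate.txt")
with open(path, "w") as f:
    f.write("header\nrow 1\nrow 2\nfooter\n")

handle = open(path, "r")          # a file object
lines = handle.readlines()        # a plain list of strings

is_list = isinstance(lines, list)
list_has_close = hasattr(lines, "close")  # lists have no close() method
handle.close()                    # close() belongs to the file object

print(is_list, list_has_close, handle.closed)
```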
As @Ffisegydd mentioned above, make use of the with statement, introduced in Python 2.5. It automatically closes the file for you after the nested code block. Even if an exception occurs inside the block, the file is closed before the exception is caught, which is pretty handy.
For more info, check out the documentation on context managers:
https://docs.python.org/2/library/contextlib.html
I use context managers myself to get some level of maintainability.
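The exception behaviour can be checked with a few lines. In this sketch a throwaway file is used and the ValueError is raised on purpose, purely to simulate a failure during processing:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Rate.txt")
with open(path, "w") as f:
    f.write("1 2 3\n")

try:
    with open(path) as infile:
        # Simulated failure while the file is still open.
        raise ValueError("simulated failure while processing")
except ValueError:
    # By the time the exception is caught, the with block has closed the file.
    closed_after_error = infile.closed

print(closed_after_error)
```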
I would use this more memory efficient code:
infilename = "Rate.txt"
with open (infilename) as f:
    next(f) # Skip header
    dat = None
    for line in f:
        if dat: # Skip last line
            _, _, _, strfield, CheckNumeric, CheckNumeric1 = dat.split()
        dat = line
I am trying to replace text in a text file by reading each line, testing it, then writing if it needs to be updated. I DO NOT want to save as a new file, as my script already backs up the files first and operates on the backups.
Here is what I have so far... I get fpath from os.walk() and I guarantee that the pathmatch var returns correctly:
fpath = os.path.join(thisdir, filename)
with open(fpath, 'r+') as f:
    for line in f.readlines():
        if '<a href="' in line:
            for test in filelist:
                pathmatch = file_match(line, test)
                if pathmatch is not None:
                    repstring = filelist[test] + pathmatch
                    print 'old line:', line
                    line = line.replace(test, repstring)
                    print 'new line:', line
                    f.write(line)
But what ends up happening is that I only get a few lines (updated correctly, mind you, but repeated from earlier in the file) corrected. I think this is a scoping issue, afaict.
*Also: I would like to know how to only replace the text upon the first instance of the match, for ex., I don't want to match the display text, only the underlying href.
First, you want to write the line whether it matches the pattern or not. Otherwise, you're writing out only the matched lines.
Second, between reading the lines and writing the results, you'll need to either truncate the file (you can call f.seek(0) and then f.truncate()), or close the original and reopen it. Picking the former, I'd end up with something like:
fpath = os.path.join(thisdir, filename)
with open(fpath, 'r+') as f:
    lines = f.readlines()
    f.seek(0)
    f.truncate()
    for line in lines:
        if '<a href="' in line:
            for test in filelist:
                pathmatch = file_match(line, test)
                if pathmatch is not None:
                    repstring = filelist[test] + pathmatch
                    line = line.replace(test, repstring)
        f.write(line)
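The question also asks how to replace only the first match on a line, so that the href changes but the display text does not. One option is the optional count argument of str.replace. The sketch below uses a made-up line and made-up paths, and it assumes the href is the first occurrence of the matched text on the line.

```python
# Hypothetical sample line: the same path appears in the href and in the display text.
line = '<a href="docs/page.html">docs/page.html</a>\n'
old = "docs/page.html"
new = "archive/docs/page.html"

# The third argument limits the number of replacements, counted from the left.
first_only = line.replace(old, new, 1)
print(first_only)
```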
Open the file for read and copy all of the lines into memory. Close the file.
Apply your transformations on the lines in memory.
Open the file for write and write out all the lines of text in memory.
with open(filename, "r") as f:
    # build a list, not a generator, so the lines are read before the file closes
    lines = [line.rstrip() for line in f]
altered_lines = [some_func(line) if regex.match(line) else line for line in lines]
with open(filename, "w") as f:
    f.write('\n'.join(altered_lines) + '\n')
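The three steps above can be sketched end to end as follows. The pattern and the transform function are stand-ins invented for this example (the original leaves regex and some_func undefined), and a throwaway file is used so nothing real is overwritten.

```python
import os
import re
import tempfile

pattern = re.compile(r"^TODO\b")  # hypothetical pattern, for illustration only


def transform(line):
    """Hypothetical transformation: upper-case the matching line."""
    return line.upper()


def rewrite_in_memory(filename):
    # 1. Read every line into memory, then close the file.
    with open(filename) as f:
        lines = [line.rstrip("\n") for line in f]
    # 2. Apply the transformation to the lines held in memory.
    altered = [transform(line) if pattern.match(line) else line for line in lines]
    # 3. Reopen the file for writing and write all lines back.
    with open(filename, "w") as f:
        f.write("\n".join(altered) + "\n")


demo_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(demo_path, "w") as f:
    f.write("keep me\nTODO fix this\n")

rewrite_in_memory(demo_path)

with open(demo_path) as f:
    result = f.read()
print(repr(result))
```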
A (relatively) safe way to replace a line in a file.
#!/usr/bin/python
# defensive programming style
# function to replace a line in a file
# and not destroy data in case of error
import os
from tempfile import NamedTemporaryFile

def replace_line(filepath, oldline, newline):
    """
    replace a line in a temporary file,
    then copy it over into the
    original file if everything goes well
    """
    # quick parameter checks
    assert os.path.exists(filepath)
    assert oldline and isinstance(oldline, str)  # is not empty and is a string
    assert newline and isinstance(newline, str)
    replaced = False
    written = False
    try:
        with open(filepath, 'r+') as f:  # open for read/write -- alias to f
            lines = f.readlines()  # get all lines in file
            if oldline in lines:  # otherwise line not found in file, do nothing
                # temp file opened in text mode for writing and reading back
                with NamedTemporaryFile(mode='w+', delete=True) as tmpfile:
                    for line in lines:  # process each line
                        if line == oldline:  # find the line we want
                            tmpfile.write(newline)  # replace it
                            replaced = True
                        else:
                            tmpfile.write(line)  # write other lines unchanged
                    tmpfile.seek(0)  # rewind before reading the temp file back
                    f.seek(0)  # beginning of file
                    f.truncate()  # empties out original file
                    for tmpline in tmpfile:
                        f.write(tmpline)  # writes each line to original file
                    written = True
                # tmpfile auto deleted on leaving its with block
    except IOError as ioe:  # if something bad happened
        print("ERROR", ioe)
        return False
    return replaced and written  # replacement happened with no errors = True
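(Note: this replaces entire lines only, and it replaces every line in the file that matches. Lines are compared including their line ending, so oldline must include the trailing newline.)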