Write to file without redundant lines in Python

I'm writing a Python script that reads lines from an input file and writes each line to an output file only if the same line is not already there. Somehow, my script always appends the first line of the input file to the output file, even if that line is already in the output file. I can't figure out why this happens.
Does anyone know why, and how I can fix it?
Thanks,
import os
input_file= 'input.txt'
output_file = 'output.txt'
fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
        print line1
        for line2 in fo:
            print line2
            if line2 == line1:
                flag = True
                print('Found Match!!')
                break
        if flag == False:
            fo.write(line1)
        elif flag == True:
            flag == False
        fo.seek(0)
fo.close()
fi.close()

When you open a file in append mode, the file object's position is at the end of the file. So the first time through, when execution reaches for line2 in fo:, there are no more lines to read from fo, so that block is skipped and flag is still False. As a result, the first line is written to the output file. After that, you call fo.seek(0), so subsequent lines are checked against the entire file.
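The position behaviour described above can be checked with a short sketch. It assumes Python 3 and uses a throwaway temporary file; the file contents and variable names are illustrative only.

```python
import os
import tempfile

# Create a throwaway file with two lines (temporary path, for illustration only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

with open(path, "a+") as fo:
    before_rewind = list(fo)   # position starts at the end, so nothing is read
    fo.seek(0)                 # move back to the start of the file
    after_rewind = list(fo)    # now every existing line is visible

os.remove(path)
print(before_rewind)
print(after_rewind)
```

The first list comes back empty because iteration starts at the end of the file; only after seek(0) do the existing lines show up.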

The answer by kmacinnis is right on as to why your code isn't working; you need to use mode 'r+' instead of 'a+', or else put fo.seek(0) at the beginning of the for loop instead of the end.
That said, there's a much better way to do this than reading the entire output file for every line of the input file.
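def ensure_file_ends_with_newline(handle):
    # Remember the current position so it can be restored afterwards.
    position = handle.tell()
    # Look at the last character. Note: this end-relative seek works on
    # Python 2 file objects; Python 3 text-mode files reject it.
    handle.seek(-1, 2)
    handle_end = handle.read(1)
    if handle_end != '\n':
        handle.write('\n')
    handle.seek(position)

input_filepath = 'input.txt'
output_filepath = 'output.txt'
with open(input_filepath, 'r') as infile, open(output_filepath, 'r+') as outfile:
    ensure_file_ends_with_newline(outfile)
    # Read the existing output lines once; set membership tests are fast.
    written = set(outfile)
    for line in infile:
        if line not in written:
            outfile.write(line)
            written.add(line)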
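For readers on Python 3, the same set-based idea might look like the sketch below. It is not the answer's original code: the function name append_unique_lines is hypothetical, and it compares lines with the trailing newline stripped so that a last line without a newline still matches.

```python
import os


def append_unique_lines(input_path, output_path):
    """Append each line of input_path to output_path unless it is already there."""
    seen = set()
    needs_newline = False
    # Load the existing output lines once, so each later lookup is a set test.
    if os.path.exists(output_path):
        with open(output_path) as existing:
            for line in existing:
                seen.add(line.rstrip("\n"))
                needs_newline = not line.endswith("\n")

    with open(input_path) as infile, open(output_path, "a") as outfile:
        if needs_newline:
            outfile.write("\n")  # terminate a last line that had no newline
        for line in infile:
            text = line.rstrip("\n")
            if text not in seen:
                outfile.write(text + "\n")
                seen.add(text)
```

Opening the output a second time in plain append mode avoids mixing reads and writes on one handle, at the cost of opening the file twice.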

Your flag was never reset to False.
flag == False is an equality comparison.
flag = False is an assignment.
Use the latter.
import os
input_file= 'input.txt'
output_file = 'output.txt'
fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
        #print line1
        for line2 in fo:
            #print line2
            if line2 == line1:
                flag = True
                print('Found Match!!')
                print (line1,line2)
                break
        if flag == False:
            fo.write(line1)
        elif flag == True:
            flag = False
        fo.seek(0)

Related

My file keeps on being closed after .write()

Here is this block of code I'm trying to finish:
    elif parameter == 'statistics':
        outfile.write(stats(infile))
        for line in infile:
            outfile.write(line)
Essentially, I am trying to write the statistics of the file into the new file that is being copied. The statistics part works: when I open the new file, the statistics are there. However, with the two outfile.write calls, the file seems to close after the first one, so only the statistics go in and not the rest of the original file's content.
The error that I am getting is this:
ValueError: I/O operation on closed file.
I am unsure why the file is closing.
EDIT: Here is the whole code, as requested
def copy_file():
    infile_name = input("Please enter the name of the file to copy: ")
    infile = open(infile_name, 'r', encoding='utf8')
    parameter = input("Please enter a parameter(line numbers, Gutenberg trim, statistics, none): ")
    outfile_name = input("Please enter the name of the new copy: ")
    outfile = open(outfile_name, 'w', encoding='utf8')
    counter = 1
    if parameter == 'line numbers':
        for line in infile:
            outfile.write(f' {counter:6}: {line}')
            counter += 1
    elif parameter == 'Gutenberg trim':
        copyStart = False
        for line in infile:
            #print(line.strip())
            if '*** START' in line.strip():
                copyStart = True
                continue
            elif '*** END' in line.strip():
                copyStart = False
                break
            if copyStart == True:
                outfile.write(line)
    elif parameter == 'statistics':
        outfile.write(stats(infile))
        for line in infile:
            outfile.write(line)
    else:
        for line in infile:
            outfile.write(line)
    infile.close()
    outfile.close()
copy_file()
EDIT2: So sorry for not including it. Here is the stats function:
def stats(text) -> str:
    with text as infile:
        totallines = 0
        emplines = 0
        characters = 0
        for line in infile:
            totallines += 1
            characters += len(line)
            if len(line.strip()) == 0:
                emplines += 1
        lines = totallines - emplines
        totalaveChars = characters/totallines
        nonempaveChars = characters/lines
        result = (f'{totallines:5} lines in list \n'
                  f'{emplines:5} empty lines in list \n'
                  f'{totalaveChars:5.1f} average characters per line \n'
                  f'{nonempaveChars:5.1f} average chars per non-empty line')
    return result
print(stats(open('ASH.txt', 'r', encoding='utf8')))
Here is the result from stats:
13052 lines in list
2666 empty lines in list
44.6 average characters per line
56.0 average chars per non-empty line
The issue is in the stats function. The with statement closes the file bound to the local name text, which is your infile, as soon as the block exits!
def stats(text) -> str:
    totallines = 0
    emplines = 0
    characters = 0
    for line in text:
        totallines += 1
        characters += len(line)
        if len(line.strip()) == 0:
            emplines += 1
    lines = totallines - emplines
    totalaveChars = characters/totallines
    nonempaveChars = characters/lines
    result = (f'{totallines:5} lines in list \n'
              f'{emplines:5} empty lines in list \n'
              f'{totalaveChars:5.1f} average characters per line \n'
              f'{nonempaveChars:5.1f} average chars per non-empty line')
    return result
In your main program, you pass infile, which is an already open file object, to stats. You do not need to wrap it in a with statement inside stats. Moreover, with closes the file when its block ends, so in your main code infile is already closed once the call to stats returns.
Try the following:
def copy_file():
    infile_name = input("Please enter the name of the file to copy: ")
    parameter = input("Please enter a parameter(line numbers, Gutenberg trim, statistics, none): ")
    outfile_name = input("Please enter the name of the new copy: ")
    counter = 1
    with open(infile_name, 'r', encoding='utf8') as infile:
        with open(outfile_name, 'w', encoding='utf8') as outfile:
            if parameter == 'line numbers':
                for line in infile:
                    outfile.write(f' {counter:6}: {line}')
                    counter += 1
            elif parameter == 'Gutenberg trim':
                copyStart = False
                for line in infile:
                    #print(line.strip())
                    if '*** START' in line.strip():
                        copyStart = True
                        continue
                    elif '*** END' in line.strip():
                        copyStart = False
                        break
                    if copyStart == True:
                        outfile.write(line)
            elif parameter == 'statistics':
                outfile.write(stats(infile))
                # stats() has read infile to the end; rewind before copying
                infile.seek(0)
                for line in infile:
                    outfile.write(line)
            else:
                for line in infile:
                    outfile.write(line)
copy_file()
Opening the files with with open(filename, 'r') as file: closes them automatically when the block ends, and not before.
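The closing behaviour discussed above can be reproduced without touching the disk. The sketch below uses io.StringIO as a stand-in for an open file; the two helper names are hypothetical and exist only for this illustration.

```python
import io


def count_lines_closing(handle):
    """Counts lines, but the with block closes the caller's handle on exit."""
    with handle as stream:
        return sum(1 for _ in stream)


def count_lines(handle):
    """Counts lines and leaves the handle open; the caller owns its lifetime."""
    return sum(1 for _ in handle)


first = io.StringIO("a\nb\n")
count_lines_closing(first)
print(first.closed)    # the helper's with block closed it

second = io.StringIO("a\nb\n")
count_lines(second)
print(second.closed)   # still open, but positioned at the end
second.seek(0)         # rewind before reading it again
```

A function that receives an open file generally should not close it; the code that opened the file is in the best position to decide when it is finished.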
    elif parameter == 'statistics':
        outfile.write(stats(infile))
        for line in infile:
            outfile.write(line)
... only the statistics go in and not the rest of the content in the
original file ...
My educated guess is that the stats function consumes and possibly
closes the input stream (IS).
If stats is somehow well behaved and limits itself to consuming the
IS, one can rewind it
        ...
        infile.seek(0) # rewind the input stream
        for line in infile:
            outfile.write(line)
If, on the other hand, stats is a bit disruptive and closes
the IS altogether, one can use the .name attribute of the file object
to reopen it, like this
        ...
        for line in open(infile.name):
            outfile.write(line)
This second solution works even under the first, milder hypothesis and
works even if the code was passed the infile file object from an
outer call.
Another possibility, if you can access and modify the stats source
code, is to undo the reading performed by the function, memorizing
the current position in the input stream before any read operation
and later rewind the IS to that position
def stats(infile):
    ...
    current_pos = infile.tell()
    # do your stuff
    ...
    infile.seek(current_pos)
    return workload
For this to work, of course, the file object must not be closed
before the .seek(), either explicitly (by a .close()) or
implicitly (by leaving a with block).
If this is your situation (closed file), please remove either the
explicit infile.close() or the (unnecessary) with statement and
the rewind will work.
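A minimal, self-contained version of the tell()/seek() idea might look like this. The function name and the line-counting body are assumptions standing in for the real stats logic, and io.StringIO stands in for an open file.

```python
import io


def count_lines_preserving_position(handle):
    """Count the remaining lines in handle, then restore its read position."""
    start = handle.tell()          # remember where the caller was
    try:
        return sum(1 for _ in handle)
    finally:
        handle.seek(start)         # put the stream back where we found it


stream = io.StringIO("x\ny\nz\n")
stream.readline()                  # the caller has already consumed one line
remaining = count_lines_preserving_position(stream)
next_line = stream.readline()      # continues from where the caller left off
```

Restoring the position in a finally clause means the caller's view of the stream is unchanged even if the counting code raises.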

Function not returning an output file

I have the following python code whose purpose is to remove blank lines from an input text file. It should return an output file with all blank lines removed but it doesn't. What's the bug? Thank you!
import sys
def main():
    inputFileName = sys.argv[1]
    outputFileName = sys.argv[2]
    inputFile = open(inputFileName, "r")
    outputFile = open(inputFileName, "w")
    for line in inputFile:
        if "\n" in line:
            removeBlank = line.replace("\n", "")
            outputFile.write(removeBlank)
        else:
            outputFile.write(line)
    inputFile.close()
    outputFile.close()
main()
Your code has several problems, especially the condition you use to check for an empty line. Others have rightly pointed out some of them.
Here is a solution that should work and generate the output file with no empty lines.
import sys
def main():
    inputFileName = sys.argv[1]
    outputFileName = sys.argv[2]
    # Write to outputFileName; opening the input file with "w" would truncate it.
    with open(inputFileName) as inputFile, open(outputFileName, "w") as outputFile:
        for line in inputFile.readlines():
            if line.strip() != '':
                outputFile.write(line)
if __name__ == '__main__':
    main()
At present your code appears to truncate its input file immediately after opening it. At best this might give differing results on different platforms. On some platforms the file might be empty. I presume that opening the input file for writing was a typo.
A better way to approach this problem is to use a generator. Also, the correct test for an empty line is line == '\n', not '\n' in line, which will be true for all returned lines except perhaps the last.
def noblanks(file):
    for line in file:
        if line != '\n':
            yield line
You can use this like so:
with open(inputFileName, "r") as inf, open(outputFileName, 'w') as outf:
    for line in noblanks(inf):
        outf.write(line)
The context managers in the with statement will ensure that your files are properly closed without further action on your part.
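The generator above treats only a bare newline as blank. If lines containing just spaces or tabs should also be dropped, one possible variant is sketched below; the strip_whitespace flag is a hypothetical addition, and io.StringIO stands in for the input file.

```python
import io


def noblanks(lines, strip_whitespace=False):
    """Yield the lines that are not blank.

    With strip_whitespace=True, lines containing only spaces or tabs
    are treated as blank too.
    """
    for line in lines:
        content = line.strip() if strip_whitespace else line.rstrip("\n")
        if content:
            yield line


source = io.StringIO("a\n\n  \nb\n")
strict = list(noblanks(source))            # drops only the bare newline
source.seek(0)
loose = list(noblanks(source, True))       # also drops the spaces-only line
```

Because the function only iterates over its argument, it works on any iterable of strings, not just file objects.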

Get all the lines below certain words, until there's a space [duplicate]

Let's say I have a text file with the content below:
fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
Now I need to write Python code that reads the text file and copies the contents between Start and End to another file.
I wrote the following code.
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("Start"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("End"):
        keepCurrentSet = True
inFile.close()
outFile.close()
I'm not getting the output I expected.
I'm just getting Start.
What I want to get is all the lines between Start and End,
excluding Start and End themselves.
Just in case you have multiple "Start"s and "End"s in your text file, this will copy all the data between them, excluding all the "Start"s and "End"s.
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
If the text files aren't too large, you can read the whole content of the file and then use a regular expression:
import re
with open('data.txt') as myfile:
    content = myfile.read()
# The capturing group keeps only the text between the markers,
# so Start and End themselves are excluded.
text = re.search(r'Start\n(.*?)End', content, re.DOTALL).group(1)
with open("result.txt", "w") as myfile2:
    myfile2.write(text)
I'm not a Python expert, but this code should do the job.
inFile = open("data.txt")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("End"):
        keepCurrentSet = False
    if keepCurrentSet:
        outFile.write(line)
    if line.startswith("Start"):
        keepCurrentSet = True
inFile.close()
outFile.close()
Using itertools.dropwhile, itertools.takewhile, itertools.islice:
import itertools
with open('data.txt') as f, open('result.txt', 'w') as fout:
    it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
    it = itertools.islice(it, 1, None)
    it = itertools.takewhile(lambda line: line.strip() != 'End', it)
    fout.writelines(it)
UPDATE: As inspectorG4dget commented, the above code copies only the first block. To copy multiple blocks, use the following:
import itertools
with open('data.txt', 'r') as f, open('result.txt', 'w') as fout:
    while True:
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        if next(it, None) is None: break
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))
Move the outFile.write call into the 2nd if:
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
for line in inFile:
    if line.startswith("Start"):
        buffer = ['']
    elif line.startswith("End"):
        outFile.write("".join(buffer))
        buffer = []
    elif buffer:
        buffer.append(line)
inFile.close()
outFile.close()
import re
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer1 = ""
# Read the whole file into one string
for line in inFile:
    buffer1=buffer1+(line)
# re.DOTALL lets . match newlines, so the match can span several lines
buffer1=re.findall(r"(?<=Start\n)(.*?)(?=End)", buffer1, re.DOTALL)
outFile.write("".join(buffer1))
inFile.close()
outFile.close()
I would handle it like this:
inFile = open("data.txt")
outFile = open("result.txt", "w")
data = inFile.readlines()
outFile.write("".join(data[data.index('Start\n')+1:data.index('End\n')]))
inFile.close()
outFile.close()
If you want to keep the start and end lines/keywords while extracting the lines between two strings,
below is the code snippet that I used to extract SQL statements from a shell script:
def process_lines(in_filename, out_filename, start_kw, end_kw):
    try:
        inp = open(in_filename, 'r', encoding='utf-8', errors='ignore')
        out = open(out_filename, 'w+', encoding='utf-8', errors='ignore')
    except FileNotFoundError as err:
        print(f"File {in_filename} not found", err)
        raise
    except OSError as err:
        print(f"OS error occurred trying to open {in_filename}", err)
        raise
    except Exception as err:
        print(f"Unexpected error opening {in_filename} is", repr(err))
        raise
    else:
        with inp, out:
            copy = False
            for line in inp:
                # first IF block to handle if the start and end on same line
                if line.lstrip().lower().startswith(start_kw) and line.rstrip().endswith(end_kw):
                    copy = True
                    if copy: # keep the starts with keyword
                        out.write(line)
                    copy = False
                    continue
                elif line.lstrip().lower().startswith(start_kw):
                    copy = True
                    if copy: # keep the starts with keyword
                        out.write(line)
                    continue
                elif line.rstrip().endswith(end_kw):
                    if copy: # keep the ends with keyword
                        out.write(line)
                    copy = False
                    continue
                elif copy:
                    # write
                    out.write(line)
if __name__ == '__main__':
    infile = "/Users/testuser/Downloads/testdir/BTEQ_TEST.sh"
    outfile = f"{infile}.sql"
    statement_start_list = ['database', 'create', 'insert', 'delete', 'update', 'merge', 'delete']
    statement_end = ";"
    process_lines(infile, outfile, tuple(statement_start_list), statement_end)
Files are iterators in Python, so this means you don't need to hold a "flag" variable to tell you what lines to write. You can simply use another loop when you reach the start line, and break it when you reach the end line:
with open("data.txt") as in_file, open("result.txt", 'w') as out_file:
    for line in in_file:
        if line.strip() == "Start":
            for line in in_file:
                if line.strip() == "End":
                    break
                out_file.write(line)
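The nested-loop idea works because both loops pull from the same iterator, so the inner loop resumes exactly where the outer one stopped. A sketch of it as a reusable generator follows; the name lines_between and its default markers are assumptions for illustration, and io.StringIO stands in for data.txt.

```python
import io


def lines_between(lines, start="Start", end="End"):
    """Yield the lines found between each start marker and the next end marker."""
    stream = iter(lines)           # one shared iterator drives both loops
    for line in stream:
        if line.strip() == start:
            for inner in stream:   # continues from the line after the marker
                if inner.strip() == end:
                    break
                yield inner


sample = io.StringIO(
    "fdsjhgjhg\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\n"
    "Start\nSecond block\nEnd\n"
)
extracted = list(lines_between(sample))
```

Calling iter() first matters when the input is a list rather than a file: a list would otherwise hand each for loop a fresh iterator starting from the beginning.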

Match the last word and delete the entire line

Input.txt File
12626232 : Bookmarks
1321121:
126262
Here 126262: can be any text or digits. Basically, I want to check whether the last character of a line is : (colon) and, if so, delete the entire line.
Output.txt File
12626232 : Bookmarks
My Code:
def function_example():
    fn = 'input.txt'
    f = open(fn)
    output = []
    for line in f:
        if not ":" in line:
            output.append(line)
    f.close()
    f = open(fn, 'w')
    f.writelines(output)
    f.close()
Problem: My check matches : anywhere in the line and removes the entire line, but I only want to remove the line when the : is at the end of it.
Any suggestions would be appreciated. Thanks.
I saw the following but am not sure how to use it here:
a = "abc here we go:"
print a[:-1]
I believe with this you should be able to achieve what you want.
with open(fname) as f:
    lines = f.readlines()
    for line in lines:
        if not line.strip().endswith(':'):
            print line
Here fname is the variable pointing to the file location.
You were almost there with your function. You were checking if : appears anywhere in the line, when you need to check if the line ends with it:
def function_example():
    fn = 'input.txt'
    f = open(fn)
    output = []
    for line in f:
        if not line.strip().endswith(":"):  # This is what you were missing
            output.append(line)
    f.close()
    f = open(fn, 'w')
    f.writelines(output)
    f.close()
You could also have written if not line.strip()[-1:] == ':': (note that [:-1] would give everything except the last character), but endswith() is better suited to your use case.
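For comparison, here is a small sketch of how endswith() and the two slice forms behave. The helper name ends_with_colon and the sample lines are illustrative only.

```python
def ends_with_colon(line):
    """True if the last non-whitespace character of line is a colon."""
    return line.strip().endswith(":")


samples = ["12626232 : Bookmarks\n", "1321121:\n", "126262\n", "\n"]
kept = [s for s in samples if not ends_with_colon(s)]

text = "abc:"
all_but_last = text[:-1]   # everything except the last character
last_only = text[-1:]      # only the last character
empty_last = ""[-1:]       # slicing an empty string is safe and stays empty
```

Both endswith() and the [-1:] slice cope with empty lines, whereas indexing with [-1] would raise IndexError on an empty string.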
Here is a compact way to do what you are doing above:
def function_example(infile, outfile, limiter=':'):
    ''' Filters all lines in :infile: that end in :limiter:
        and writes the remaining lines to :outfile: '''
    # "in" is a reserved word in Python, so the handles are named src and dst
    with open(infile) as src, open(outfile, 'w') as dst:
        for line in src:
            if not line.strip().endswith(limiter):
                dst.write(line)
The with statement creates a context and automatically closes files when the block ends.
To check whether the last character is :, do the following:
if line.strip().endswith(':'):
    pass  # do something here
You can use a regular expression:
import re
# Matches a ':' at the very end of the (stripped) line
regex = re.compile(r':$')
new_lines = []
file_name = "path_to_file"
with open(file_name) as _file:
    lines = _file.readlines()
    # Keep only the lines that do NOT end with ':'
    new_lines = [line for line in lines if not regex.search(line.strip())]
with open(file_name, "w") as _file:
    _file.writelines(new_lines)
