Python Splitting Text file based on a keyword

I am trying to write a Python program that reads a text file line by line and, each time it comes across a line containing the word 'SPLIT', writes the contents read so far to a new text file.
Could someone point me in the right direction for writing a new text file each time the script comes across the word 'SPLIT'? I have no problem reading a text file with Python, but I'm unsure how to split on the keyword and create an individual text file each time.
The script below works in Python 2.7.13:
file_counter = 0
done = False
with open('test.txt') as input_file:
    # with open("test"+str(file_counter)+".txt", "w") as out_file:
    while not done:
        for line in input_file:
            if "SPLIT" in line:
                done = True
                file_counter += 1
            else:
                print(line)
                out_file = open("test"+str(file_counter)+".txt", "a")
                out_file.write(line)
                #out_file.write(line.strip()+"\n")
print file_counter

You need to have two loops. One which iterates the filenames of the output files then another inside to write the input contents to the current active output until "split" is found:
out_n = 0
done = False
with open("test.txt") as in_file:
    while not done: #loop over output file names
        with open(f"out{out_n}.txt", "w") as out_file: #generate an output file name
            while not done: #loop over lines in input file and write to output file
                try:
                    line = next(in_file).strip() #strip whitespace for consistency
                except StopIteration:
                    done = True
                    break
                if "SPLIT" in line: #more robust than 'if line == "SPLIT\n":'
                    break
                else:
                    out_file.write(line + '\n') #must add back in newline because we stripped it out earlier
        out_n += 1 #increment output file name integer

for line in text.splitlines():
    if " SPLIT " in line:
        # write in new file.
        pass
To see how to write to a new file, check here:
https://www.tutorialspoint.com/python/python_files_io.htm
or
https://docs.python.org/3.6/library/functions.html#open
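
To fill in the "write in new file" step, here is a minimal sketch rather than a definitive implementation. The function name `split_on_keyword`, the `part` file prefix and the choice to drop the keyword line itself are assumptions made for illustration; they are not from the question.

```python
import os
import tempfile


def split_on_keyword(in_path, out_dir, keyword="SPLIT", prefix="part"):
    """Write the lines between keyword lines to numbered files; return the paths."""
    paths = []
    out_file = None
    try:
        with open(in_path) as in_file:
            for line in in_file:
                if keyword in line:
                    # A keyword line ends the current chunk and is not copied.
                    if out_file is not None:
                        out_file.close()
                        out_file = None
                    continue
                if out_file is None:
                    # Open the next output file lazily, so no empty files appear.
                    path = os.path.join(out_dir, f"{prefix}{len(paths)}.txt")
                    out_file = open(path, "w")
                    paths.append(path)
                out_file.write(line)
    finally:
        if out_file is not None:
            out_file.close()
    return paths


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.txt")
    with open(src, "w") as f:
        f.write("a\nb\nSPLIT\nc\nSPLIT\nd\ne\n")
    for p in split_on_keyword(src, tmp):
        print(os.path.basename(p))
```

Opening each output file only when the first line of a chunk arrives means two consecutive keyword lines do not produce an empty file.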

Related

How to remove a new line at the end of a file in python?

I'm creating a program that lets users remove users. This works, but when it removes the user at the end of the file, a newline character is left behind, which breaks the program. The following is part of the function that removes the user.
with open("users.txt", "r") as input:
    with open("temp.txt", "w") as output: # Iterate all lines from file
        for line in input:
            if not line.strip("\n").startswith(enteredUsername):
                # If line doesn't start with the username entered, then write it in temp file.
                output.write(line)
os.replace('temp.txt', 'users.txt') # Replace file with original name
This creates a temporary file containing every line that doesn't start with the given string; the temporary file is then renamed back to "users.txt". I've looked at other threads on Stack Overflow as well as other websites and nothing has worked. Is there anything I should change about this solution?
EDIT --------------------
I managed to fix this with the following code (and thanks to everyone for your suggestions!):
count = 1 # Keeps count of the number of lines
removed = False # Initially nothing has been removed
with open(r"users.txt", 'r') as fp:
    x = len(fp.readlines()) # Finds the number of lines in the file
if login(enteredUsername, enteredPassword) == True: # Checks if the username and password combination is correct
    with open("users.txt", "r") as my_input:
        with open("temp.txt", "w") as output: # Iterate all lines from file
            for line in my_input:
                if not line.strip("\n").startswith(enteredUsername): # If line doesn't start with the username entered, then write it in temp file.
                    if count == x - 1 and removed == False: # If something has not been removed, get rid of newline character
                        output.write(line[:-1])
                    else:
                        output.write(line)
                else:
                    removed = True # This only becomes true if the previous statement is false, if so, something has been 'removed'
                count +=1 # Increments the count for every line
    os.replace('temp.txt', 'users.txt') # Replace file with original name

with open("users.txt", "r") as input:
    with open("temp.txt", "w") as output: # Iterate all lines from file
        for line in input:
            if not line.strip("\n").startswith(enteredUsername):
                # If line doesn't start with the username entered, then write it in temp file.
                # output.write(line) # <-- you are writing the line that still has the newline character
                output.write(line.strip("\n")) # try this?
os.replace('temp.txt', 'users.txt') # Replace file with original name
Also, as a general tip, I would recommend not using "input" as a variable name, since it shadows Python's built-in input() function. Just letting you know, as it can cause some wacky errors that are a pain to debug (speaking from personal experience here!)
================================================================
EDIT:
I realize that writing the stripped line leaves no newline characters at all, so all the usernames end up on one line. You need a newline character after every name you write except the last one; a newline after the last name is the trailing newline that is causing your problem.
import os
with open("users.txt", "r") as my_input:
    with open("temp.txt", "w") as output: # Iterate all lines from file
        for line in my_input:
            if not line.strip("\n").startswith(enteredUsername):
                # If line doesn't start with the username entered, then write it in temp file.
                output.write(line)
os.replace('temp.txt', 'users.txt') # Replace file with original name
# https://stackoverflow.com/questions/18857352/remove-very-last-character-in-file
# remove last new line character from the file
with open("users.txt", 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()
This is admittedly a hacky way to go about it, but it should work! The last section removes the final character of the file, which is a newline character.
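
As an alternative to truncating the last byte, the rule "a newline after every name except the last" can be expressed with `str.join`. This is only a sketch: the function name `remove_user_no_trailing_newline` and the `name,password` line layout in the demonstration are assumptions, not details from the question.

```python
import os
import tempfile


def remove_user_no_trailing_newline(path, username):
    """Rewrite `path` without lines starting with `username`; return kept lines."""
    with open(path) as f:
        # splitlines() drops the line endings of every line.
        lines = f.read().splitlines()
    # Skip blank lines and lines that belong to the user being removed.
    kept = [line for line in lines if line and not line.startswith(username)]
    with open(path, "w") as f:
        # join() puts "\n" between lines only, never after the last one.
        f.write("\n".join(kept))
    return kept


# Small demonstration on a throwaway file (hypothetical "name,password" lines).
with tempfile.TemporaryDirectory() as tmp:
    users = os.path.join(tmp, "users.txt")
    with open(users, "w") as f:
        f.write("alice,pw1\nbob,pw2\ncarol,pw3\n")
    print(remove_user_no_trailing_newline(users, "carol"))
```

Like the original code, `startswith` also matches longer names that share the prefix (removing "bob" would remove "bobby" too), so comparing the full first field may be safer.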

You don't need to use a temporary file for this.
def remove_user(filename, enteredUsername):
    last = None
    with open(filename, 'r+') as users:
        lines = users.readlines()
        users.seek(0)
        for line in lines:
            if not line.startswith(enteredUsername):
                users.write(line)
                last = line
        # ensure that the last line is newline terminated
        if last and last[-1] != '\n':
            users.write('\n')
        users.truncate()

How to read a file from one line to the end and write another from that same line to the end?

print("Type: \n 1 - To read and translate from the beginning of the file\n 2 - To read from a specific line of the file")
how_to_read = str(input())
if (how_to_read == "1"):
    read_mode = "w"
    total_previous_lines = 0 #Read the file from the beginning
elif(how_to_read == "2"):
    read_mode = "a"
    with open('translated_file.xml') as last_translated_file:
        total_previous_lines = sum(1 for line in last_translated_file) - 1 #Number of lines, minus one for the last line, which is empty and must be replaced if that is the case
        print(total_previous_lines)
else:
    print("You have not chosen any of the valid options")
    read_mode = None
if(read_mode):
    with open("en-sentiment.xml", "r") as read_file:
        #I NEED TO READ "en-sentiment.xml" FROM total_previous_lines + 1 (that is, the next to the last one that already existed, to continue...)
        with open("translated_file.xml", read_mode) as write_file:
            # I NEED TO WRITE "translated_file.xml" FROM total_previous_lines + 1 (ie the next to the last one, to continue...)
            #For each line of the file that it reads, we will write the file with the write function.
            for line in read_file:
                print(repr(line))
That's my code. I'm having trouble reading the .xml files starting from line total_previous_lines, because a with open() as ..._file: block always iterates line by line from the beginning. So if the output file already exists and I open it in mode "a" to continue writing from total_previous_lines, the input file is still iterated from the start.
The same thing happens in mode "r" if I want to start reading from total_previous_lines with a value other than 0 (that is, from somewhere other than the first line).

Ignoring the code in your question...
Let's say that by some means or other you have figured out a line in your input file that you want to start reading from and that you want to copy the remainder of that input file to some other file.
start_line = 20 # for example
with open('input_file.txt') as infile:
    with open('output_file.txt', 'w') as outfile:
        for line in infile.readlines()[start_line:]:
            outfile.write(line)

Read about seek() in Python. It would be better, though, to use an XML parser such as ElementTree.
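
If the file is too large for readlines(), the leading lines can also be skipped lazily with `itertools.islice`. The sketch below assumes a hypothetical helper `copy_from_line`; its `mode` parameter mirrors the "w" / "a" choice in the question, and the file names are placeholders.

```python
import os
import tempfile
from itertools import islice


def copy_from_line(src_path, dst_path, start_line, mode="w"):
    """Copy src_path to dst_path, skipping the first `start_line` lines."""
    copied = 0
    with open(src_path) as src, open(dst_path, mode) as dst:
        # islice reads and discards the first start_line lines one at a time,
        # so the whole file is never held in memory.
        for line in islice(src, start_line, None):
            dst.write(line)
            copied += 1
    return copied


# Small demonstration: a five-line file, resumed after the first three lines.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input_file.txt")
    dst = os.path.join(tmp, "output_file.txt")
    with open(src, "w") as f:
        f.write("".join(f"line {n}\n" for n in range(5)))
    print(copy_from_line(src, dst, 3))  # number of lines copied
```

Passing `mode="a"` appends to an existing output file instead of overwriting it, which matches the "continue from where I stopped" case.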

Printing to a file via Python

Hopefully this is an easy fix. I'm trying to edit one field of a file we use for import, however when I run the following code it leaves the file blank and 0kb. Could anyone advise what I'm doing wrong?
import re #import regex so we can use the commands
name = raw_input("Enter filename:") #prompt for file name, press enter to just open test.nhi
if len(name) < 1 : name = "test.nhi"
count = 0
fhand = open(name, 'w+')
for line in fhand:
    words = line.split(',') #obtain individual words by using split
    words[34] = re.sub(r'\D', "", words[34]) #remove non-numeric chars from string using regex
    if len(words[34]) < 1 : continue # If the 34th field is blank go to the next line
    elif len(words[34]) == 2 : "{0:0>3}".format([words[34]]) #Add leading zeroes depending on the length of the field
    elif len(words[34]) == 3 : "{0:0>2}".format([words[34]])
    elif len(words[34]) == 4 : "{0:0>1}".format([words[34]])
    fhand.write(words) #write the line
fhand.close() # Close the file after the loop ends

I used the text below in 'a.txt' as input and modified your code. Please check whether it works for you.
#Initial content of a.txt
This,program,is,Java,program
This,program,is,12Python,programs
Modified code as follows:
import re
#Reading from file and updating values
fhand = open('a.txt', 'r')
tmp_list=[]
for line in fhand:
    #Split line using ','
    words = line.split(',')
    #Remove non-numeric chars from the field at index 3 using regex
    words[3] = re.sub(r'\D', "", words[3])
    #Update the field at index 3
    # If the field is blank, leave it empty
    if len(words[3]) < 1 :
        #'continue' was removed here because we still need to reconstruct the original line and write it to the file
        print("Field empty. Continue...")
    elif len(words[3]) >= 1 and len(words[3]) < 5 :
        #The format() calls in the question discard their result. zfill(5) pads words[3] with leading zeros to a width of 5.
        words[3]=words[3].zfill(5)
    #After updating the field in the words list, build the line again.
    tmp_str = ",".join(words)
    tmp_list.append(tmp_str)
fhand.close()
#Writing to same file
whand = open("a.txt",'w')
for val in tmp_list:
    whand.write(val)
whand.close()
File content after running code
This,program,is,,program
This,program,is,00012,programs

The file mode 'w+' truncates your file to 0 bytes as soon as it is opened, so you'll only be able to read back lines that you've written since.
Look at the question "Confused by python file mode "w+"" for more information.
One idea would be to read the whole file first, close it, and then re-open it to write the modified lines back.
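
The truncation is easy to check on a throwaway file. The following is a small sketch; the helper name `read_after_open` is made up for the demonstration, and `test.nhi` is just the file name from the question.

```python
import os
import tempfile


def read_after_open(path, mode):
    """Open `path` in `mode` and return whatever can be read from it."""
    with open(path, mode) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.nhi")
    with open(path, "w") as f:
        f.write("a,b,c\n")
    print(repr(read_after_open(path, "r+")))  # 'a,b,c\n' : r+ keeps the content
    print(repr(read_after_open(path, "w+")))  # ''        : w+ emptied the file
    print(os.path.getsize(path))              # 0
```

Mode 'r+' also allows reading and writing, but it leaves the existing content in place.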

Not sure which OS you're on but I think reading and writing to the same file has undefined behaviour.
I guess internally the file object holds the position (try fhand.tell() to see where it is). You could probably adjust it back and forth as you went using fhand.seek(last_read_position) but really that's asking for trouble.
Also, I'm not sure how the script would ever end as it would end up reading the stuff it had just written (in a sort of infinite loop).
Best bet is to read the entire file first:
with open(name, 'r') as f:
    lines = f.read().splitlines()
with open(name, 'w') as f:
    for l in lines:
        # ....
        f.write(something)

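For 'Printing to a file via Python' in Python 3 (or in Python 2 with from __future__ import print_function), you can pass a file opened for writing to print:
with open("test.txt", "w") as ofile:
    print("Some text...", file=ofile)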

parse blocks of text from text file using Python

I am trying to parse some text files and need to extract blocks of text. Specifically, the lines that start with "1:" and 19 lines after the text. The "1:" does not start on the same row in each file and there is only one instance of "1:". I would prefer to save the block of text and export it to a separate file. In addition, I need to preserve the formatting of the text in the original file.
Needless to say I am new to Python. I generally work with R but these files are not really compatible with R and I have about 100 to process. Any information would be appreciated.
The code that I have so far is:
tmp = open(files[0],"r")
lines = tmp.readlines()
tmp.close()
num = 0
a=0
for line in lines:
    num += 1
    if "1:" in line:
        a = num
        break
a = num is the line number where the block of text I want begins. I then want to save the next 19 lines to another file, but can't figure out how to do this. Any help would be appreciated.

Here is one option. Read all lines from your file. Iterate till you find your line and return next 19 lines. You would need to handle situations where your file doesn't contain additional 19 lines.
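# wrapped in a function (the name is only illustrative) so that 'return' is valid
def lines_after_marker(path='yourfile.txt'):
    with open(path, 'r') as fh:
        all_lines = fh.readlines()
    for count, line in enumerate(all_lines):
        if "1:" in line:
            return all_lines[count+1:count+20]
    return [] # no "1:" line found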
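
On the point about files with fewer than 19 following lines: list slicing already copes with that, because a slice that runs past the end simply returns what is there. A minimal sketch follows, with the hypothetical name `block_after_marker`; unlike the snippet above, it also keeps the "1:" line itself, which is an assumption about what the question wants.

```python
def block_after_marker(lines, marker="1:", count=19):
    """Return the marker line plus up to `count` following lines, or [] if absent."""
    for index, line in enumerate(lines):
        if line.startswith(marker):
            # Slicing past the end of a list is safe: it just returns fewer items.
            return lines[index:index + 1 + count]
    return []


sample = ["header\n", "1: start\n", "a\n", "b\n"]
print(block_after_marker(sample))            # only 3 lines are available
print(block_after_marker(["no marker\n"]))   # []
```

Because the lines keep their original "\n" endings, writing them back with `writelines` preserves the formatting of the source file.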

Could be done in a one-liner...
open(files[0]).read().split('1:', 1)[1].split('\n')[:19]
or more readable
txt = open(files[0]).read() # read the file into a big string
before, after = txt.split('1:', 1) # split the file on the first "1:"
after_lines = after.split('\n') # create lines from the after text
lines_to_save = after_lines[:19] # grab the first 19 lines after "1:"
then join the lines with a newline (and add a newline to the end) before writing it to a new file:
out_text = "1:" # add back "1:"
out_text += "\n".join(lines_to_save) # add all 19 lines with newlines between them
out_text += "\n" # add a newline at the end
open("outputfile.txt", "w").write(out_text)
To comply with best practice for reading and writing files, you should also use the with statement to ensure that the file handles are closed as soon as possible. You can create convenience functions for it:
def read_file(fname):
    "Returns contents of file with name `fname`."
    with open(fname) as fp:
        return fp.read()
def write_file(fname, txt):
    "Writes `txt` to a file named `fname`."
    with open(fname, 'w') as fp:
        fp.write(txt)
then you can replace the first line above with:
txt = read_file(files[0])
and the last line with:
write_file("outputfile.txt", out_text)

I always prefer to read the file into memory first, but sometimes that's not possible. If you want to use iteration then this will work:
def process_file(fname):
    with open(fname) as fp:
        for line in fp:
            if line.startswith('1:'):
                break
        else:
            return # no '1:' in file
        yield line # yield line containing '1:'
        for i, line in enumerate(fp):
            if i >= 19:
                break
            yield line
if __name__ == "__main__":
    with open('output.txt', 'w') as fp:
        for line in process_file('intxt.txt'):
            fp.write(line)
It uses the else: clause on a for-loop, which you don't see very often anymore but which was created for just this purpose (the else clause is executed if the for-loop doesn't break).
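
For readers who have not met for/else before, here is a tiny standalone illustration. The function name `first_index_starting_with` is made up for this example.

```python
def first_index_starting_with(lines, prefix):
    """Return the index of the first line starting with `prefix`, or None."""
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            break
    else:
        # Runs only when the loop finished without hitting `break`
        # (this includes the case where `lines` is empty).
        return None
    return index


print(first_index_starting_with(["a", "1: b", "c"], "1:"))  # 1
print(first_index_starting_with(["a", "b"], "1:"))          # None
```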

Avoid writing SAME string to a text file when updating python

I have a text file("Memory.txt") that contains the following string:
111111111
11111111
111111
1111111111
11111111111
111111111111111
1111111111111
I'm pretty new to python and also new here but I wonder if there is a way I can add another string(e.g '111111111111') to this same file (only if the string does not exist in the file).
My code is composed of two sections:
1. It reads a text file (e.g. 'Memory.txt') and selects one of the strings in the file.
2. It writes a new string to the same file (if the string does not exist in the file). I've not been able to achieve this; below is my code for this section:
with open("Memory.txt", "a+") as myfile:
    for lines in myfile.read().split():
        if 'target_string' == lines:
            continue
        else:
            lines.write('target_string')
This does not return/do anything, please could someone point in the right direction or explain to me what to do.
Thanks

You can just do:
# Open for read+write
with open("Memory.txt", "r+") as myfile:
    # A file is an iterable of lines, so this will
    # check if any of the lines in myfile equals line+"\n"
    if line+"\n" not in myfile:
        # Write it; assumes file ends in "\n" already
        myfile.write(line+"\n")
myfile.write(line+"\n") can also be written as
# Python 3
print(line, file=myfile)
# Python 2
print >>myfile, line

You need to call "write" on the file object:
with open("Memory.txt", "a+") as myfile:
    for lines in myfile.read().split():
        if 'target_string' == lines:
            continue
        else:
            myfile.write('target_string')

If I correctly understood what you want:
with open("Memory.txt", "r+") as myfile:
    # splitlines() drops the trailing "\n" so the comparison can match
    if 'target_string' not in myfile.read().splitlines():
        myfile.write('target_string\n')
1. Open file
2. Read all lines
3. Check if target string in lines
4. If no - append
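
The four steps above can be sketched as one small function. This is an illustration under assumptions, not a definitive implementation: the name `append_if_missing` is hypothetical, and the function treats "exists" as "some line equals the target exactly".

```python
import os
import tempfile


def append_if_missing(path, target):
    """Append `target` as a new line unless an identical line already exists."""
    with open(path, "a+") as f:
        f.seek(0)  # "a+" starts positioned at the end, so rewind before reading
        content = f.read()
        if target in content.splitlines():
            return False
        # Make sure the new entry starts on its own line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(target + "\n")
        return True


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    memory = os.path.join(tmp, "Memory.txt")
    with open(memory, "w") as f:
        f.write("111111111\n11111111")
    print(append_if_missing(memory, "111111"))  # True: no line equals it exactly
    print(append_if_missing(memory, "111111"))  # False: it is present now
```

Mode "a+" creates the file if it does not exist and always writes at the end, which is why only the read needs the explicit `seek(0)`.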

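I would simply set a bool to True when the string is found, and write it at the end if it was not:
fnd = False # not found yet
with open("Memory.txt", "a+") as myfile:
    myfile.seek(0) # "a+" opens positioned at the end, so rewind before reading
    for lines in myfile.read().split():
        if 'target_string' == lines:
            fnd = True # you found it
            break
    if not fnd:
        myfile.write('target_string')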
