Python search for text over multiple lines

import os
searchquery = 'word'
with open('Y:/Documents/result.txt', 'w') as f:
    for filename in os.listdir('Y:/Documents/scripts/script files'):
        with open('Y:/Documents/scripts/script files/' + filename) as currentFile:
            for line in currentFile:
                if searchquery in line:
                    start = line.find(searchquery)
                    end = line.find("R")
                    result = line[start:end]
                    print(result)
                    f.write(result + ' ' + filename[:-4] + '\n')
This works well: it searches for "word" and prints everything from there up to an "R", provided both are on the same line. However, if the "R" is on a later line, it won't print the text that comes before the "R" on that line.
For example:
this should not be printed!
this should also not be printed! "word" = 12345
6789 "R" After this R should not be printed either!
In the case above, the 6789 on line 3 is not printed by my current code, but I want it to be. How do I make Python keep going over multiple lines until it reaches the "R"?
Thanks for any help!

This is expected: the search runs one line at a time, so text on the following line is never seen. A better approach is to read the whole file into one string and search that:
import os
searchquery = 'word'
folder = 'Y:/Documents/scripts/script files/'
with open('Y:/Documents/result.txt', 'w') as f:
    for filename in os.listdir(folder):
        with open(folder + filename) as currentFile:
            content = currentFile.read()
        start = content.find(searchquery)
        if start == -1:
            continue  # searchquery does not occur in this file
        end = content.find("R", start)  # first "R" after the match, not the first "R" in the file
        if end == -1:
            end = len(content)  # no "R" follows, so keep everything to the end
        result = content[start:end].replace("\n", "")
        print(result)
        f.write(result + ' ' + filename[:-4] + '\n')
Note that this handles only a single occurrence per file. To capture multiple occurrences, you will need to loop, restarting the search after each match.
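If you need every occurrence rather than just the first, one possible approach is a regular expression with re.DOTALL, so that a match can run across line breaks. This is only a sketch: find_spans and the sample string are made up for illustration, and it works on a string you have already read from the file.

```python
import re

def find_spans(text, start_word, end_marker):
    """Return every stretch of text from start_word up to (not including) end_marker."""
    pattern = re.compile(
        re.escape(start_word) + r".*?(?=" + re.escape(end_marker) + r")",
        re.DOTALL,  # let "." match newlines so a span can cross lines
    )
    # Newlines inside a span are replaced with spaces for one-line output
    return [m.group(0).replace("\n", " ") for m in pattern.finditer(text)]

# Hypothetical file content with two occurrences
sample = 'skip this\nskip "word" = 12345\n6789 "R" tail\nword again\nR end'
spans = find_spans(sample, "word", "R")
print(spans)
```

The lazy .*? stops at the nearest end marker, so each span ends at the first "R" that follows its own "word".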

Related

How to delete the last line of my output file?

I've been writing my Python code, but it always writes the output file with a blank line at the end. Is there a way to modify my code so it doesn't write that last blank line?
def write_concordance(self, filename):
    """ Write the concordance entries to the output file(filename)
    See sample output files for format."""
    try:
        file_out = open(filename, "w")
    except FileNotFoundError:
        raise FileNotFoundError("File Not Found")
    word_lst = self.concordance_table.get_all_keys() #gets a list of all the words
    word_lst.sort() #orders it
    for i in word_lst:
        ln_num = self.concordance_table.get_value(i) #line number list
        ln_str = "" #string that will be written to file
        for c in ln_num:
            ln_str += " " + str(c) #loads line numbers as a string
        file_out.write(i + ":" + ln_str + "\n")
    file_out.close()
Output_file
Line 13 in this picture is what I need gone

Put in a check so that the new line is not added for the last element of the list:
def write_concordance(self, filename):
    """ Write the concordance entries to the output file(filename)
    See sample output files for format."""
    try:
        file_out = open(filename, "w")
    except FileNotFoundError:
        raise FileNotFoundError("File Not Found")
    word_lst = self.concordance_table.get_all_keys() #gets a list of all the words
    word_lst.sort() #orders it
    for i in word_lst:
        ln_num = self.concordance_table.get_value(i) #line number list
        ln_str = "" #string that will be written to file
        for c in ln_num:
            ln_str += " " + str(c) #loads line numbers as a string
        file_out.write(i + ":" + ln_str)
        if i != word_lst[-1]:
            file_out.write("\n")
    file_out.close()

The issue is here:
file_out.write(i + ":" + ln_str + "\n")
The \n ends every line, including the last one, with a newline.
One way to fix this is to collect the lines first and join them with newlines:
ln_strs = []
for i in word_lst:
    ln_num = self.concordance_table.get_value(i) #line number list
    ln_str = " ".join(str(c) for c in ln_num) #string that will be written to file
    ln_strs.append(f"{i}: {ln_str}")
file_out.write('\n'.join(ln_strs))
As an aside, rather than file_out = open() and file_out.close(), use with open() as file_out:. That way the file is always closed, even if an exception is raised partway through.
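Putting both suggestions together might look roughly like the sketch below. It is not the asker's class: a plain dict (table) stands in for self.concordance_table, and the file is written to a temporary directory purely for the demonstration.

```python
import os
import tempfile

def write_concordance(table, filename):
    """Write 'word: n1 n2 ...' lines, sorted by word, with no trailing newline."""
    lines = [
        f"{word}: {' '.join(str(n) for n in table[word])}"
        for word in sorted(table)
    ]
    with open(filename, "w") as file_out:  # closed automatically, even on error
        file_out.write("\n".join(lines))   # join puts "\n" only between lines

# Hypothetical data standing in for self.concordance_table
table = {"pear": [2, 7], "apple": [1, 3, 5]}
path = os.path.join(tempfile.mkdtemp(), "concordance.txt")
write_concordance(table, path)
with open(path) as f:
    written = f.read()
print(repr(written))
```

Because the newline is a separator rather than a terminator here, no special case is needed for the last word.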

Python to read txt files and delete lines that contains same part

I have tons (1000+) of txt files that look like this:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('aaa/fff`ggg' , "hhh.jjj");
What I want to achieve is to delete all lines that contain the same "aaa" part and keep only one of them (remove all duplicates).
My code so far:
import os
from collections import Counter
sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        print(line)
        cut_line = line.split("'")
        new_line = cut_line[1]
        cut_line1 = new_line.split("/")
        new_line1 = cut_line1[0]
        if new_line1 not in lines_seen:
            outfile.write(new_line1)
            lines_seen.add(new_line1)
        outfile.close()
My code is not working at all; I don't get any results.
Console report:
Line 13 in <module>
new_line = cut_line[1]
IndexError: list index out of range
Sorry for my bad writing, it's my first post so far :D
Best regards
Update:
I added
startPattern = "TextTextText"
if(startPattern in line):
to make sure I target only lines that begin with "TextTextText", but for some reason the .txt files in the destination folder contain only one line of content, "aaa".
At the end of the day, here is the fully working code:
import os
sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        if line.startswith("TextTextText"):
            try:
                cut_line = line.split("'")
                new_line = cut_line[1]
                cut_line1 = new_line.split("/")
                new_line1 = cut_line1[0]
                if new_line1 not in lines_seen:
                    outfile.write(line)
                    lines_seen.add(new_line1)
            except IndexError:
                pass
        else:
            outfile.write(line)
    outfile.close()
Thanks for the great help, guys!

Use a try-except block in the inner for loop. It keeps the program from being interrupted when a line that lacks the expected ' delimiter raises an error.
Update:
I tried the code below and it worked fine for me.
import os
sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        try:
            cut_line = line.split("'")
            new_line = cut_line[1]
            cut_line1 = new_line.split("/")
            new_line1 = cut_line1[0]
            if new_line1 not in lines_seen:
                outfile.write(line) # Replaced new_line1 with line
                lines_seen.add(new_line1)
        except IndexError: # line has no quoted part, skip it
            pass
    outfile.close() # This line was having bad indentation
Input file:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('aaa/fff`ggg' , "hhh.jjj");
TextTextText('WWW/fff`ggg' , "hhh.jjj");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
Output File:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('WWW/fff`ggg' , "hhh.jjj");

It looks like your file contains a line with no ' in it. In that case line.split("'") produces a list with a single element, for example
line = "blah blah blah"
cut_line = line.split("'")
print(cut_line) # ['blah blah blah']
so trying to get cut_line[1] results in an error, as there is only cut_line[0]. Since this piece of code is inside a loop, you can avoid the error by skipping to the next iteration with continue whenever cut_line has too few elements. Just replace:
cut_line = line.split("'")
new_line = cut_line[1]
with:
cut_line = line.split("'")
if len(cut_line) < 2:
    continue
new_line = cut_line[1]
This ignores all lines without a '.

I think a regular expression would make this easier. Here is a simplified working example using re.
import re
lines = [
    "",
    "dfdsa sadfsadf sa",
    "TextTextText('aaa/bbb`ccc' ,dsafdsafsA ",
    "TextTextText('yyy/iii`ooo' ,SDFSDFSDFSA ",
    "TextTextText('aaa/fff`ggg' ,SDFSADFSDF ",
]
lines_seen = set()
out_lines = []
for line in lines:
    # SEARCH FOR 'xxx/ TEXT in the line -----------------------------------
    re_result = re.findall(r"'[a-z]+\/", line)
    if re_result:
        print(f're_result {re_result[0]}')
        if re_result[0] not in lines_seen:
            print(f'>>> newly found {re_result[0]}')
            lines_seen.add(re_result[0])
            out_lines.append(line)
print('------------')
for line in out_lines:
    print(line)
Result
re_result 'aaa/
>>> newly found 'aaa/
re_result 'yyy/
>>> newly found 'yyy/
re_result 'aaa/
------------
TextTextText('aaa/bbb`ccc' ,dsafdsafsA
TextTextText('yyy/iii`ooo' ,SDFSDFSDFSA
You can experiment with regular expressions at regex101.com.
Try r"'.+/" to match any characters between ' and /, or r"'[a-zA-Z]+/" to match lowercase and uppercase letters between ' and /.
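The same idea can be wrapped in a small function that uses a capture group for the key. This is a sketch under a few assumptions: the names KEY_PATTERN and drop_duplicate_keys are made up, the key is taken to be whatever sits between the first ' and the next /, and lines without such a key are kept rather than dropped.

```python
import re

# Capture whatever sits between a ' and the following /
KEY_PATTERN = re.compile(r"'([^'/]+)/")

def drop_duplicate_keys(lines):
    """Keep the first line for each key; pass lines without a key through unchanged."""
    seen = set()
    kept = []
    for line in lines:
        match = KEY_PATTERN.search(line)
        if match is None:
            kept.append(line)  # nothing to compare, so keep the line
            continue
        key = match.group(1)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

sample = [
    "TextTextText('aaa/bbb`ccc')",
    "some other line",
    "TextTextText('yyy/iii`ooo')",
    "TextTextText('aaa/fff`ggg')",
]
result = drop_duplicate_keys(sample)
for line in result:
    print(line)
```

To apply it per file, read the lines with readlines(), pass them through the function, and write the result with writelines().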

Find a dot in a text file and add a newline to the file in Python?

I read from a file; whenever the code finds a ".", it should add a newline "\n" to the text and write it to the output file. I tried this code, but I still have the problem.
inp = open('rawCorpus.txt', 'r')
out = open("testFile.text", "w")
for line in iter(inp):
    l = line.split()
    if l.endswith(".")
        out.write("\n")
    s = '\n'.join(l)
print(s)
out.write(str(s))
inp.close()
out.close()

Try this (the usual way):
with open("rawCorpus.txt", 'r') as read_file:
    raw_data = read_file.readlines()
my_save_data = open("testFile.text", "a")
for lines in raw_data:
    if "." in lines:
        re_lines = lines.replace(".", ".\n")
        my_save_data.write(re_lines)
    else:
        my_save_data.write(lines)  # lines already ends with "\n"
my_save_data.close()
If your text file is not big, you can also try this:
with open("rawCorpus.txt", 'r') as read_file:
    raw_data = read_file.read()
re_data = raw_data.replace(".", ".\n")
with open("testFile.text", "w") as save_data:
    save_data.write(re_data)
UPDATE (how the new lines are displayed depends on your text viewer too: some text editors treat "\n" as a new line, while others expect "\r\n"):
Input sample:
This is a book. i love it.
This is a apple. i love it.
This is a laptop. i love it.
This is a pen. i love it.
This is a mobile. i love it.
Code:
read_lines = [line.rstrip('\n') for line in open('input.txt')]
my_save_data = open("output.txt", "a")
for lines in read_lines:
    re_make_lines = lines.split(".")
    for items in re_make_lines:
        if items.strip():  # skip the empty pieces left by split
            result = items.strip() + ".\r\n"
            my_save_data.write(result)
my_save_data.close()
Output will be:
This is a book.
i love it.
This is a apple.
i love it.
This is a laptop.
i love it.
This is a pen.
i love it.
This is a mobile.
i love it.
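For comparison, the same splitting can be sketched with a single re.sub call on a string. The function name one_sentence_per_line is made up, and the sketch assumes every "." ends a sentence, so a decimal such as 3.14 or an abbreviation would also be split.

```python
import re

def one_sentence_per_line(text):
    """Put each sentence that ends with a period on its own line."""
    # Replace a "." plus any whitespace after it with "." and one newline
    return re.sub(r"\.\s*", ".\n", text)

sample = "This is a book. i love it.\nThis is a pen. i love it.\n"
result = one_sentence_per_line(sample)
print(result)
```

Swallowing the whitespace after the dot avoids both the leading space on the next line and the doubled newline at the end of each input line.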

You are overwriting the string s on every iteration with s = '\n'.join(l).
Initialize s = '' as an empty string before the for loop and append to it on every iteration, e.g. with s += '.\n'.join(l) (short for s = s + '.\n'.join(l)). Joining with '.\n' rather than '\n' puts back the dots that split('.') removes.
This should work, although the space that followed each dot stays at the start of the next line:
inp = open('rawCorpus.txt', 'r')
out = open('testFile.text', 'w')
s = '' # empty string
for line in iter(inp):
    l = line.split('.')
    s += '.\n'.join(l) # add new lines to s
print(s)
out.write(str(s))
inp.close()
out.close()

Here is my own solution, but I still want one more newline after each ".", which this solution does not produce:
read_lines = [line.rstrip('\n') for line in open('rawCorpus.txt')]
words = []
my_save_data = open("my_saved_data.txt", "w")
for lines in read_lines:
    words.append(lines)
for word in words:
    w = word.rstrip().replace('.', '\n.')
    w = w.split()
    my_save_data.write(str("\n".join(w)))
    print("\n".join(w))
my_save_data.close()

Why doesn't this writing to file in python work?

The idea behind the following code is that if the variable crop is already contained in the .txt file, the variable quantity is added to the end of the same line as crop. This is my attempt, but it doesn't work. You really need to run it to understand, but essentially the wrong part of the list is added to, an ever-expanding series of backslashes appears, and the line breaks disappear. Does anyone know how to modify this code so it works properly?
What should be outputted:
Lettuce 77 88 100
Tomato 99
What actually is outputted:
["['\\n', 'Lettuce 77 \\n88 ', 'Tomato 88 ']100 "]
Code:
def appendA ():
    with open('alpha.txt', 'r') as file_1:
        lines = file_1.readlines()
        for line in lines:
            if crop in line:
                index = lines.index(line)
                line = str(line + quantity + ' ')
                lines [index] = line
            newlines = str(lines)
            #The idea here is that the variable quantity is added onto the end
            # of the same row as the entered crop in the .txt file.
            with open('alpha.txt', 'w') as file_3:
                file_3.write (newlines)
def appendB ():
    with open('alpha.txt', 'a') as file_2:
        file_2.write ('\n')
        file_2.write (crop + ' ')
        file_2.write (quantity + ' ')
crop = input("Which crop? ")
quantity = input("How many? ")
with open('alpha.txt', 'a') as file_0:
    if crop in open('alpha.txt').read():
        appendA ()
    else:
        appendB ()

Let's start! Your code should look something like this:
def appendA():
    with open('alpha.txt', 'r') as file_1:
        lines = []
        for line in file_1:
            if crop in line:
                line = str(line.rstrip() + ' ' + quantity + "\n")
            lines.append(line)
    #The idea here is that the variable quantity is added onto the end
    # of the same row as the entered crop in the .txt file.
    with open('alpha.txt', 'w') as file_3:
        file_3.writelines(lines)
def appendB():
    with open('alpha.txt', 'a') as file_2:
        file_2.write('\n')
        file_2.write(crop + ' ')
        file_2.write(quantity + ' ')
crop = "Whichcrop"
quantity = "+!!!+"
with open('alpha.txt') as file_0:
    if crop in file_0.read():
        print("appendA")
        appendA()
    else:
        print("appendB")
        appendB()
You also made several mistakes. In this block:
with open('alpha.txt', 'a') as file_0:
    if crop in open('alpha.txt').read():
        appendA ()
    else:
        appendB ()
the line with open('alpha.txt', 'a') as file_0: opens the file for appending, but you never use the variable file_0, so it is unnecessary.
On the next line you open the file again for the check crop in open('alpha.txt').read(), but you never close it.
You get output like this:
["['\n', 'Lettuce 77 \n88 ', 'Tomato 88 ']100 "]
because you convert the list to a string with str(lines) and pass it to write, instead of passing the list to writelines:
with open('alpha.txt', 'w') as file_3:
    file_3.write (newlines)
You also write to the file after each iteration; it is better to build the list of strings first and then write it to the file once.

newlines = str(lines) # this converts the whole list to str, so you get the list's default representation
Also, to change text in the middle of a file you have to rewrite the whole file.
You can also get rid of appendB, since you check every line anyway; the code is still not optimal in terms of performance :)
import os
def appendA(filename, crop, quantity):
    result = []
    exists = False
    with open(filename, 'r') as file_1:
        lines = file_1.readlines()
    for line in lines:
        if crop not in line:
            result.append(line)
        else:
            exists = True
            result.append(line.rstrip() + ' ' + quantity + '\n')
    if not exists:
        with open(filename, 'a') as file_2:
            file_2.write('\n' + crop + ' ' + quantity + ' ')
    else:
        tmp_file = filename + '.tmp'
        with open(tmp_file, 'w') as file_3:
            file_3.writelines(result)  # result is a list, so use writelines
        os.replace(tmp_file, filename)  # swap the rewritten file into place

"str(lines)": lines is list type, you can use ''.join(lines) to
convert it to a string.
"line in lines": "line" end with a "\n"
Indentation error: newlines = ''.join(lines) and the write that follows it belong after the loop, not inside it.
The test if crop in line is also a mistake: with crops named "AA" and "AABB", a new input "AA" matches both, so the quantity is appended to every line that contains "AA", not only to the "AA" line.
def appendA():
    with open('alpha.txt', 'r') as file_1:
        lines = file_1.readlines()
    for line in lines:
        if crop in line:
            index = lines.index(line)
            line = str(line.replace("\n", "") + ' ' + quantity + '\n')
            lines[index] = line
    newlines = ''.join(lines)
    # The idea here is that the variable quantity is added onto the end
    # of the same row as the entered crop in the .txt file.
    with open('alpha.txt', 'w') as file_3:
        file_3.write(newlines)
def appendB():
    with open('alpha.txt', 'a') as file_2:
        file_2.write("\n")
        file_2.write(crop + ' ')
        file_2.write(quantity + ' ')
crop = input("Which crop? ")
quantity = input("How many? ")
with open('alpha.txt', 'a') as file_0:
    if crop in open('alpha.txt').read():
        appendA()
    else:
        appendB()
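One way around the "AA" versus "AABB" problem is to compare the crop with the first word of each line instead of using a substring test. The sketch below is only an illustration: add_quantity is a made-up name, it assumes the crop name is a single word at the start of its line, and it writes to a temporary file rather than alpha.txt.

```python
import os
import tempfile

def add_quantity(path, crop, quantity):
    """Append quantity to the line whose first word is crop, or add a new line."""
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            lines = f.read().splitlines()
    for i, line in enumerate(lines):
        words = line.split()
        if words and words[0] == crop:  # exact match on the crop name only
            lines[i] = line.rstrip() + " " + quantity
            break
    else:
        lines.append(crop + " " + quantity)  # crop not found: start a new line
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

path = os.path.join(tempfile.mkdtemp(), "alpha.txt")
entries = [("Lettuce", "77"), ("Tomato", "99"), ("Lettuce", "88"), ("Lettuce", "100")]
for crop, quantity in entries:
    add_quantity(path, crop, quantity)
with open(path) as f:
    contents = f.read()
print(contents)
```

The for/else runs the else branch only when the loop finishes without break, which is what replaces the separate appendA/appendB decision here.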

I want to add a string in between of 2 strings in file, but in output whole text is getting appended at the end of file

Code :
fo = open("backup.txt", "r")
filedata = fo.read()
with open("backup.txt", "ab") as file :
    file.write(filedata[filedata.index('happy'):] + " appending text " + filedata[:filedata.rindex('ending')])
with open("backup.txt", "r") as file :
    print "In meddival : \n",file.read()
Expected Output :
I noticed that every now and then I need to Google fopen all over again. happy appending text ending
Actual output :
I noticed that every now and then I need to Google fopen all over again. happy endinghappy ending appending text I noticed that every now and then I need to Google fopen all over again. happy

This should fix your problem:
with open("backup.txt", "r") as fo:
    filedata = fo.read()
ix = filedata.index('ending')
new_str = ' '.join([filedata[:ix].rstrip(), 'appending text', filedata[ix:]])
with open("backup.txt", "w") as file:
    file.write(new_str)
with open("backup.txt", "r") as file:
    print("In meddival : \n" + file.read())
As you can see, I get the index where the word ending begins.
Then I use join to push the appending text in between happy and ending.
Note: the file is opened with "w" here, which overwrites the old content. With "ab", as in your code, the changed text is appended after the original instead.
There are other ways to do this.
You can split the string into words, find the index of the word 'ending', and insert 'appending text' before it:
text_list = filedata.split()
ix = text_list.index('ending')
text_list.insert(ix, 'appending text')
new_str = ' '.join(text_list)
You can also do this:
word = 'happy'
text_parts = filedata.split(word)
appending_text = ' '.join([word, 'appending text'])
new_str = appending_text.join(text_parts)

Split the file content into words, then rejoin it around the inserted text:
with open("backup.txt", "r") as fo:
    filedata = fo.read().split()
ix = filedata.index('ending')
with open("backup.txt", "w") as file:
    file.write(' '.join(filedata[:ix]) + " appending text " + ' '.join(filedata[ix:]))
with open("backup.txt", "r") as file:
    print("In meddival : \n" + file.read())
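The string handling itself can be separated from the file handling, which makes it easier to check. A minimal sketch, assuming you want the text inserted before the first occurrence of the marker; insert_before is a made-up helper name and the sample sentence is shortened from the question:

```python
def insert_before(text, marker, insertion):
    """Return text with insertion placed in front of the first occurrence of marker."""
    ix = text.find(marker)
    if ix == -1:
        return text  # marker absent: leave the text unchanged
    return text[:ix] + insertion + text[ix:]

original = "I need to Google fopen all over again. happy ending"
updated = insert_before(original, "ending", "appending text ")
print(updated)
```

To change the file, read it into a string, call the function, and write the result back with the file opened in "w" mode so the old content is replaced rather than appended to.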
