I am trying to find specific words from a text file, however my script doesn't seem to be able to match the word to what's written on a line in the text file, even though I know it matches. I've noticed there are spaces but since I am saying entry in line, shouldn't it work?
I have also tried:
if str(entry) in line:,
if str(entry) in str(line): and
if entry in str(line):
but none of them seem to work either
I can't see where I'm going wrong. Any help would be appreciated.
Here is my code
with open(address+'file_containing_data_I_want.txt') as f:
    for entry in System_data:
        print "Entry:"
        print entry
        for line in f:
            print "Start of line"
            print line
            print "End of line"
            if entry in line:
                print "Found entry in line" #This never gets printed
Using the print statements (for just the first entry) I see:
Entry:
Manufacturer
Start of line
??
End of line
Start of line
End of line
Start of line
Manufacturer=manufacturer_data
End of line
Start of line
Model=model_data
End of line
Start of line
End of line
Start of line
End of line
The text file looks like this (Note:I can't change the text file as this is the way I will be receiving it, ' indicates a blank line):
'
'
Manufacturer=manufacturer_data
Model=model_data
'
'
'
UPDATE:
Changing my script to:
with open(address+'file_containing_data_I_want.txt') as f:
    for line in f:
        print "Start of line %s" % line
        print "End of line"
        for entry in System_data:
            print "Entry: %s" % entry
            if entry in line.strip():
                print "Found entry in line"
Results in this being printed (Still no "Found entry in line"):
Entry: Manufacturer
Entry: Model
Start of line:
End of line
Entry: Manufacturer
Entry: Model
Start of line: Manufacturer=manufacturer_data
End of line
Entry: Manufacturer
Entry: Model
Start of line: Model=model_data
Entry: Manufacturer
Entry: Model
Start of line:
End of line
Entry: Manufacturer
Entry: Model
Start of line:
End of line
Changing my code to this:
for line in f:
    print "Start of line: %s" % line.strip("\r\n")
    print "End of line"
    for entry in System_data:
        print "Entry: %s" % entry.strip()
        if entry.strip() in line.strip("\r\n"):
            print "FOUND!!!!!!!!!!!!!"
Gives me this:
Start of line: ??
End of line
Entry: Manufacturer
Entry: Model
Start of line:
End of line
Entry: Manufacturer
Entry: Model
Start of line: Manufacturer=manufacturer_data
End of line
Entry: Manufacturer
Entry: Model
Start of line: Model=model_data
End of line
You read to the end of the file during the first pass of the outer loop, so the file iterator is already exhausted for every later entry. Swap the loops instead, so that each entry in System_data is checked against each line of the file:
for line in f:
    print "Start of line %s" % line
    print "End of line"
    for entry in System_data:
        print "Entry: %s" % entry
        if entry.strip() in line.strip("\r\n"):
            print "Found entry in line" #This now gets printed
Alternatively, you can keep your current loop order and call f.seek(0) before each for line in f loop to rewind the file.
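The exhaustion effect is easy to reproduce in isolation. The following is a minimal sketch in Python 3 syntax, using an in-memory io.StringIO object as a stand-in for the real file; the sample contents and variable names are made up for illustration.

```python
import io

# In-memory stand-in for the real file; the contents are illustrative only.
f = io.StringIO("Manufacturer=manufacturer_data\nModel=model_data\n")

first_pass = [line for line in f]    # consumes the whole file
second_pass = [line for line in f]   # nothing left: the position is at EOF

f.seek(0)                            # rewind to the beginning
third_pass = [line for line in f]    # the lines are available again

print(len(first_pass), len(second_pass), len(third_pass))
```

The second pass is empty for the same reason the inner loop in the question only ever runs for the first entry.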
You should strip all blanks/newlines from both the entry and lines in file. So, prefix everything with
entry = entry.strip()
and change the
if entry in line:
to
if entry in line.strip():
EDIT:
also, what Moses Koledoye says
OK, so it seems the issue was that the string contained NUL bytes (\x00) between the characters, most likely because the file is UTF-16 encoded.
This only became visible when I used print repr(line), which showed:
'\x00m\x00a\x00n\x00u\x00f\x00a\x00c\x00t\x00u\x00r\x00e\x00r\x00_\x00d\x00a\x00t\x00a\x00'
So I changed my code to the following:
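import re

with open(address+'file_containing_data_I_want.txt') as f:
    for line in f:
        # Clean the line once, before comparing it with every entry
        line = line.strip()
        # Drop everything except word characters and '=' (this removes the \x00 bytes)
        line = re.sub(r'[^\w=]', '', line)
        for entry in System_data:
            if entry in line:
                print "Found entry in line"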
This script now enters the if entry in line: branch and prints "Found entry in line".
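The \x00 bytes between the characters, together with the ?? printed for the first line, suggest that the file is UTF-16 encoded. This is an inference from the output shown, not something confirmed in the thread. If it holds, decoding the file while reading avoids the regex clean-up altogether. Here is a minimal Python 3 sketch; the temporary file, its contents, and the System_data list are made-up stand-ins.

```python
import os
import tempfile

# Hypothetical stand-ins for the real data.
System_data = ["Manufacturer", "Model"]
content = "\n\nManufacturer=manufacturer_data\nModel=model_data\n\n"

# Write a UTF-16 file (with BOM) to imitate the one described above.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write(content)

found = []
# Decoding as UTF-16 turns the bytes back into ordinary text.
with open(path, encoding="utf-16") as f:
    for line in f:
        for entry in System_data:
            if entry in line:
                found.append((entry, line.strip()))

print(found)
```

In Python 2, io.open(path, encoding='utf-16') accepts the same encoding argument and yields unicode strings.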
Related
I've been trying to add a save function to a game by having the user enter their initials, which get saved to a file with their score so that the game can later load where they were before.
At the start it asks for their initials and saves them to a file. I then want the file to be copied into a list so the score can be edited; however, the list doesn't include the most recent initials and score added to the file.
I don't know where the problem is, so I've added the file-handling code I wrote.
Names.write('~~~')
Names.write('\n')
Names.write(username_score)
Names.write('\n')
line = Names.readlines()
print(line)
with open('Names.txt','r+') as Names:
    for line_number, data in enumerate(Names, start=1):
        if username in data:
            print(f"Word '{username}' found on line {line_number}")
            break
    print('data',data)
    print('line',line)
    line.pop(1)
    line.insert(1, username_score)
    print('line 2',line)
    for i in range(len(line)):
        Names.write(line[i])
I think this is because in your first code block, you are trying to write and then read the file without closing and reopening the file. The line = Names.readlines() line should be in the second code block like so:
username_score = '200'
username = 'foo'
with open('Names.txt','w+') as Names:
    Names.write('~~~')
    Names.write('\n')
    Names.write(username_score)
    Names.write('\n')
    # Reading here would return [] because the file position is at the end after writing
with open('Names.txt','r+') as Names:
    line = Names.readlines()
    print(line)
    for line_number, data in enumerate(line, start=0):
        if username in data:
            print(f"Word '{username}' found on line {line_number}")
            break
    print('data',data)
    print('line',line)
    line.pop(1)
    line.insert(1, username_score + '\n')
    print('line 2',line)
    # Rewind and truncate so the file is overwritten instead of appended to
    Names.seek(0)
    Names.truncate()
    for i in range(len(line)):
        Names.write(line[i])
This full example should be easy enough to integrate into your code, and if you have any questions leave a comment.
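To see why reading straight after writing returns nothing, it helps to look at the file position. The sketch below (Python 3, with a throwaway temporary file whose path is made up for the example) writes two lines, reads without rewinding, and then reads again after seek(0).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Names.txt")  # throwaway file

with open(path, "w+") as names:
    names.write("~~~\n200\n")
    after_write = names.readlines()   # position is at the end: nothing to read
    names.seek(0)                     # move back to the start of the file
    after_seek = names.readlines()

print(after_write)
print(after_seek)
```

Closing and reopening the file, as in the answer above, has the same effect as seek(0) because a freshly opened file starts at position 0.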
I want to recover these four entries tagged ERROR, contained in a file:
ERROR Blablabalbalablabalbalablabalbalablabalbalablabalbalablabalbala
ERROR Tototototototototototototototototototototototototototototototot
ERROR Hihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihi
hihihihihihihihihihihihihihihihihi
ERROR Lalalalalalalalala
def getErrorWarningInfo(self, file, line, tag):
    msg = line.strip(tag)
    while True:
        nextline = file.readline()
        if (' ' in nextline):
            msg += "\n"+nextline.strip()
        break
    return [self.id, tag, msg] ,line
I recover only 3 ERROR entries, including the one that spans two lines, but I can't get the one just before it:
ERROR Tototototototototototototototototototototototototototototototot
And when I remove the file.readline() call from the function, I recover all 4 ERROR entries, but only the first line of the one that spans two lines.
Why don't you use file.read().split('ERROR') and then strip the whitespace? (:
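A rough sketch of that suggestion, in Python 3 syntax. It assumes the log contains only ERROR entries and that the word ERROR never appears inside a message. The sample text is shortened and held in memory, and the indentation of the continuation line is an assumption.

```python
import io

# In-memory copy of a shortened sample log; the continuation line is
# assumed to be indented in the real file.
log = io.StringIO(
    "ERROR Blabla\n"
    "ERROR Tototo\n"
    "ERROR Hihihi\n"
    "    hihihi\n"
    "ERROR Lalala\n"
)

# Everything between two "ERROR" tags belongs to one message.
chunks = log.read().split("ERROR")
# Collapse internal whitespace and newlines; skip the empty leading chunk.
messages = [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

print(messages)
```

This reads the whole file into memory, which is fine for small logs but less suitable for very large ones.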
Here is a prototype to get around your issue. You can adjust it for your need if you like it.
I've edited a log file to contain INFO and WARNING too
ERROR Blablabalbalablabalbalablabalbalablabalbalablabalbalablabalbala
ERROR Tototototototototototototototototototototototototototototototot
INFO tik tok
ERROR Hihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihi
hihihihihihihihihihihihihihihihihi
WARNING bip bup
ERROR Lalalalalalalalala
# types of tags you can encounter in log
TAGS = ["ERROR", "INFO", "WARNING"]
LOG = "./LOG"
with open(LOG) as log:
    lines = []
    # you want to keep previous line in case next line is continuation of it
    prev_line = ""
    for line in log.readlines():
        # Check if line contains any tags
        if any(map(line.__contains__, TAGS)):
            if prev_line:
                # If we already have a previous line, we can add it to the list.
                # The current line also contains a tag
                lines.append(prev_line)
            # Current line becomes previous
            prev_line = line
            continue
        # If there is no tag in the line, it must be continuation of previous.
        prev_line += line
    # The last entry has no following tagged line to flush it, so add it here
    if prev_line:
        lines.append(prev_line)
# Check that all lines concatenated correctly
for line in lines:
    print line
print("============ Filtering logs =============\n")
# Now you can filter your lines by log tags
def get_errors(lines):
    return [line for line in lines if "ERROR" in line]
# Only ERROR logs
for line in get_errors(lines):
    print line
Outputs
ERROR Blablabalbalablabalbalablabalbalablabalbalablabalbalablabalbala
ERROR Tototototototototototototototototototototototototototototototot
INFO tik tok
ERROR Hihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihi
hihihihihihihihihihihihihihihihihi
WARNING bip bup
ERROR Lalalalalalalalala
============ Filtering logs =============
ERROR Blablabalbalablabalbalablabalbalablabalbalablabalbalablabalbala
ERROR Tototototototototototototototototototototototototototototototot
ERROR Hihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihi
hihihihihihihihihihihihihihihihihi
ERROR Lalalalalalalalala
def parse_file(self):
    for file in self.prt_files:
        with open(os.path.join(self.path, file), "r") as f:
            try:
                for line in f:
                    if (line.startswith("ERROR")):
                        error, line = self.getErrorWarningInfo(f, line, "ERROR")
                        [...]
I've got a pretty simple python script that reads in a file, and parses it line by line.
It doesn't seem to recognize the '//' at the start of my lines. If I change it to look for '#' at the start of my lines, it doesn't find those lines either. Am I just misunderstanding this?
line = fIn.readline()
while line:
    print "line is", line
    line = line.strip()
    if line.startswith('//'):
        print "winner"
        line = fIn.readline()
The file I'm reading in looks like this:
// Feedback
"Feedback" = "Feedback";
// New strings
"File URL not reachable." = "File URL not reachable.";
And the debug line looks appropriate when it prints out:
line is // Feedback
line is "Feedback" = "Feedback";
line is
line is // New strings
line is "File URL not reachable." = "File URL not reachable.";
line is
Better version:
with open("abc") as f:
    for line in f:
        line=line.strip()
        if line and line.startswith("//"):
            print "line is",line
            print "winner"
            print next(f)
output:
line is // Feedback
winner
"Feedback" = "Feedback";
line is // New strings
winner
"File URL not reachable." = "File URL not reachable.";
You are only reading one line of your text file. Apart from the wrong indentation on the last line, the code seems to work. Make sure line = fIn.readline() is executed on every iteration of the while loop (move it one block to the left) and run your program again.
Here is what I get after fixing that one line. Is this the desired output?
line is // Feedback
winner
line is "Feedback" = "Feedback";
line is
line is // New strings
winner
line is "File URL not reachable." = "File URL not reachable.";
Edit: does this work for you?
for line in open("yourfile.txt").readlines():
    print "line is", line
    line = line.strip()
    if line.startswith('//'):
        print "winner"
try this
for line in fIn:
    print "line is", line
    line = line.strip()
    if line[0:2]=='//':
        print "winner"
    # no fIn.readline() here: the for loop already advances to the next line
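The same check can be sketched in Python 3 with the for loop doing all the reading. The in-memory io.StringIO object stands in for the strings file from the question, and the names f_in and comments are chosen for this example only.

```python
import io

# In-memory stand-in for the strings file shown in the question.
f_in = io.StringIO(
    '// Feedback\n'
    '"Feedback" = "Feedback";\n'
    '\n'
    '// New strings\n'
    '"File URL not reachable." = "File URL not reachable.";\n'
)

comments = []
for raw_line in f_in:          # the loop itself advances through the file
    line = raw_line.strip()
    if line.startswith("//"):
        comments.append(line)

print(comments)
```

Because the loop variable is reassigned on every iteration, there is no readline() call whose indentation could go wrong.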
I have a file that contains:
"Starting program and
Starting program
Loading molecule...
Initialising variables...
Starting the calculation - this could take a while!
Molecule energy = 2432.6 kcal mol-1
Calculation finished. Bye!"
import sys
import re
search_string = "Starting program"
txtlength=len(search_string)
print "txtlength",txtlength
lines = open( "C:\search.txt", "r" ).readlines()
for line in lines:
    if re.search( search_string, line ):
        print line,
    else :
        print "Not found"
I am only looking for the 2nd line of the file, but the output of this code also displays the 1st line.
You don't need regex for the example you show:
with open("C:/search.txt") as inp:
    for line in inp:
        if line.strip() == search_string:
            print line
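The difference between the two tests can be shown side by side. This is a small Python 3 sketch over an in-memory excerpt of the sample file; the variable names are chosen for the example.

```python
import io

search_string = "Starting program"
text = io.StringIO(
    "Starting program and\n"
    "Starting program\n"
    "Loading molecule...\n"
)
lines = text.readlines()

# Substring test: also matches "Starting program and".
substring_hits = [l.strip() for l in lines if search_string in l]
# Equality test on the stripped line: matches only the exact line.
exact_hits = [l.strip() for l in lines if l.strip() == search_string]

print(substring_hits)
print(exact_hits)
```

re.search behaves like the substring test here, which is why the original code also printed the first line.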
I have a long text file made of paragraphs of 6 or 7 lines each. I need to write all the seven-line paragraphs to one file and all the six-line paragraphs to another.
Or delete the 6-line (or 7-line) paragraphs.
Each paragraph is separated by a blank line (or two blank lines).
Text file example:
Firs Name Last Name
address1
Address2
Note 1
Note 2
Note3
Note 4
First Name LastName
add 1
add 2
Note2
Note3
Note4
etc...
I want to use python 3 for windows. Any help is welcome. Thanks!
As a welcome to Stack Overflow, and because I think you have by now searched further for code yourself, I propose the following code.
It checks that the paragraphs have no more than 7 lines and no fewer than 6 lines, and it warns when such paragraphs exist in the source.
You can remove all the prints to get clean code, but with them you can follow the algorithm.
I think there is no bug in it, but don't take that as 100% sure.
It isn't the only way to do it, but I chose an approach that works for all files, big or small: iterating one line at a time. Reading the entire file in one pass and then splitting it into a list of lines, or processing it with regexes, would also work; however, when a file is enormous, reading it all at once consumes a lot of memory.
with open('source.txt') as fsource,\
     open('SIX.txt','w') as six, open('SEVEN.txt','w') as seven:
    buf = []
    cnt = 0
    exceeding7paragraphs = 0
    tinyparagraphs = 0
    line = 'go'
    while line:
        line = fsource.readline()
        cnt += 1
        buf.append(line)
        if len(buf)<6 and line.rstrip('\n\r')=='':
            tinyparagraphs += 1
            print cnt,repr(line),"this line of paragraph < 6 is void,"+\
                  "\nthe treatment of all this paragraph is skipped\n"+\
                  '\n# '+str(cnt)+' '+ repr(line)+" skipped line "
            buf = []
            while line and line.rstrip('\n\r')=='':
                line = fsource.readline()
                cnt += 1
                if line=='':
                    print "line",cnt,"is '' , EOF -> the program will be stopped"
                elif line.rstrip('\n\r')=='':
                    print '#',cnt,repr(line)
                else:
                    buf.append(line)
                    print '!',cnt,repr(line),' put in void buf'
        else:
            print cnt,repr(line),' put in buf'
            if len(buf)==6:
                line = fsource.readline() # reading a potential seventh line of a paragraph
                cnt += 1
                if line.rstrip('\n\r'): # means the content of the seventh line isn't void
                    buf.append(line)
                    print cnt,repr(line),'seventh line put in buf'
                    line = fsource.readline()
                    cnt += 1
                    if line.rstrip('\n\r'): # means the content of the eighth line isn't void
                        exceeding7paragraphs += 1
                        print cnt,repr(line),"the eight line isn't void,"+\
                              "\nthe treatment of all this paragraph is skipped"+\
                              "\neighth line skipped"
                        buf = []
                        while line and line.rstrip('\n\r'):
                            line = fsource.readline()
                            cnt += 1
                            if line=='':
                                print "line",cnt,"is '' , EOF -> the program will be stopped"
                            elif line.rstrip('\n\r')=='':
                                print '\n#',cnt,repr(line)
                            else:
                                print str(cnt) + ' ' + repr(line)+' skipped line'
                    else:
                        if line=='':
                            print cnt,"line is '' , EOF -> the program will be stopped\n"
                        else: # line.rstrip('\n\r') is ''
                            print cnt,'eighth line is void',repr(line)
                        seven.write(''.join(buf) + '\n')
                        print buf,'\n',len(buf),'lines recorded in file SEVEN\n'
                        buf = []
                else:
                    print cnt,repr(line),'seventh line: void'
                    six.write(''.join(buf) + '\n')
                    print buf,'\n',len(buf),'lines recorded in file SIX'
                    buf = []
                if line=='':
                    print "line",cnt,"is '' , EOF -> the program will be stopped"
                else:
                    print '\nthe line is',cnt, repr(line)
                while line and line.rstrip('\n\r')=='':
                    line = fsource.readline()
                    cnt += 1
                    if line=='':
                        print "line",cnt,"is '' , EOF -> the program will be stopped"
                    elif line.rstrip('\n\r')=='':
                        print '#',cnt,repr(line)
                    else: # line.rstrip('\n\r') != ''
                        buf.append(line)
                        print '!',cnt,repr(line),' put in void buf'
if exceeding7paragraphs>0:
    print '\nWARNING :'+\
          '\nThere are '+str(exceeding7paragraphs)+' paragraphs whose number of lines exceeds 7.'
if tinyparagraphs>0:
    print '\nWARNING :'+\
          '\nThere are '+str(tinyparagraphs)+' paragraphs whose number of lines is less than 6.'
print '\n===================================================================='
print 'File SIX\n'
with open('SIX.txt') as six:
    print six.read()
print '===================================================================='
print 'File SEVEN\n'
with open('SEVEN.txt') as seven:
    print seven.read()
I also upvoted your question because the problem is not as easy to solve as it seems, and because being left with one post and one downvote is demoralizing for a beginner. Try to present your question better next time, as others have said.
EDIT:
Here's a simplified version for a text containing only paragraphs of exactly 6 or 7 lines, separated by exactly 1 or 2 blank lines, as stated in the problem's wording:
with open('source2.txt') as fsource,\
     open('SIX.txt','w') as six, open('SEVEN.txt','w') as seven:
    buf = []
    line = fsource.readline()
    # go to the first non empty line (a blank line is '\n', which is truthy,
    # so test the stripped content and stop at EOF)
    while line and not line.rstrip('\r\n'):
        line = fsource.readline()
    while True:
        buf.append(line) # this line is the first of a paragraph
        print '\n- first line of a paragraph',repr(line)
        for i in xrange(5):
            buf.append(fsource.readline())
        # at this point , 6 lines of a paragraph have been read
        print '-- buf 6 : ',buf
        line = fsource.readline()
        print '--- line seventh',repr(line),id(line)
        if line.rstrip('\r\n'):
            buf.append(line)
            seven.write(''.join(buf) + '\n')
            buf = []
            line = fsource.readline()
        else:
            six.write(''.join(buf) + '\n')
            buf = []
        # at this point, line is the empty line after a paragraph or EOF
        print '---- line after',repr(line),id(line)
        line = fsource.readline()
        print '----- second line after',repr(line)
        # at this point, line is an empty line after a paragraph or EOF
        # or the first line of a new paragraph
        if not line: # it is EOF
            break
        if not line.rstrip('\r\n'): # it is a second empty line
            line = fsource.readline()
            # now line is the first of a new paragraph
print '\n===================================================================='
print 'File SIX\n'
with open('SIX.txt') as six:
    print six.read()
print '===================================================================='
print 'File SEVEN\n'
with open('SEVEN.txt') as seven:
    print seven.read()
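Since the question asks for Python 3, here is an alternative sketch of the same line-by-line idea. It is not the approach used above: it relies on itertools.groupby to collect runs of non-blank lines, and the helper name paragraphs and the sample data are made up for illustration. Writing the two groups to SIX.txt and SEVEN.txt is left out to keep the sketch short.

```python
import io
from itertools import groupby

def paragraphs(lines):
    """Yield each run of consecutive non-blank lines as a list (one paragraph)."""
    for is_text, group in groupby(lines, key=lambda l: bool(l.strip())):
        if is_text:
            yield [l.rstrip("\r\n") for l in group]

# Made-up sample: a 7-line and a 6-line paragraph separated by two blank lines.
source = io.StringIO(
    "\n".join("a%d" % i for i in range(7)) + "\n\n\n" +
    "\n".join("b%d" % i for i in range(6)) + "\n"
)

by_length = {6: [], 7: []}
for para in paragraphs(source):
    if len(para) in by_length:        # paragraphs of other sizes are ignored
        by_length[len(para)].append(para)

print(len(by_length[6]), len(by_length[7]))
```

The generator still reads one line at a time, so it keeps the memory advantage described above, and any number of blank lines between paragraphs is handled the same way.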