I am attempting to loop over a series of text files I have and I want to do so by checking the value of the next line. The input from the text file looks like:
Person1
(COUNT)|key
1|************
Person2
(COUNT)|key
// and so on
Some people may have a key and others may not. I am trying to write a loop that checks for at least 3 consecutive non-blank lines before a blank line (people with keys, like the Person1 example, where each line begins with a character), and I want to print only those cases.
My current loop looks like this:
for line in input:
    if re.match(r'\S', line):
        line1 = line
        print(line1)
        if re.match(r'\S', input.next()):
            line2 = line
            print(line2)
            if re.match(r'\S', input.next()):
                line3 = line
                print(line3)
However, I cannot seem to get this loop right. It seems to print the person line three times and only sometimes print the key. Looking for any guidance available.
You can use enumerate to get the current index and be able to check the next lines too (this assumes the lines have been read into a list first). You'll need to beware of the case when you reach the end of the file, though.
lines = input.readlines()  # enumerate needs a list here so we can index ahead and use len()
for i, line in enumerate(lines):
    if i == len(lines) - 2:
        break
    next_line = lines[i + 1]
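Building on that, a rough sketch of the whole idea might look like this (people.txt is a placeholder file name, not from the question): group lines into blank-line-separated blocks and print only the blocks with at least 3 consecutive non-blank lines.
import re

with open("people.txt") as f:  # hypothetical file name
    lines = f.readlines()

group = []
for line in lines + ['\n']:  # sentinel blank line flushes the last group
    if re.match(r'\S', line):
        group.append(line)
    else:
        if len(group) >= 3:  # only print groups of 3+ consecutive non-blank lines
            print(''.join(group), end='')
        group = []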
What I'm trying to do is match a phrase in a text file, then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match in that line, but I can't get the seek() method to move up 4 lines from the line that has been matched so that I can do another regex search. All I can seem to do with seek() is search from the very end of the file or the beginning. It doesn't seem to let me just do seek(105, 1) from the line that is matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it won't always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell()  # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell()  # shows 171, which is the end of the file. I need it to be on the line that matches my search, which should be around 108. seek() only lets me search from the end or beginning of the file, not from the line that was matched.

Findmatch()
Since you've read all of it into memory at once with file.readlines(), the tell() method does indeed correctly point to the end, and you already have all your lines in a list. If you still wanted to use seek(), you'd have to read the file in line by line and record the position of each line start within the file, so that you could go back four lines.
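For illustration, here is a rough sketch of that offset-recording approach (the file name and search string are taken from the question; note that tell() cannot be mixed with for-line-in-file iteration over a text file, so readline() is used instead):
offsets = []  # byte offset of the start of each line read so far
with open("test.txt") as in_file:
    while True:
        pos = in_file.tell()
        line = in_file.readline()
        if not line:
            break
        offsets.append(pos)
        if "This is 6th line" in line:
            # jump back to the start of the line four lines earlier
            in_file.seek(offsets[max(0, len(offsets) - 1 - 4)])
            print(in_file.readline().rstrip('\n'))
            break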
For your described problem, you can first find the index of the first matching line and then do the second operation on the list slice starting four items before that.
Here's a very rough example of that (the return None isn't really needed; it's just for the sake of verbosity, clearly stating intent/expected behavior; raising an exception might just as well be desirable, depending on what the overall plan is):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None

with open("test.txt") as in_file:
    lines = in_file.readlines()

print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something else. It's also not advisable (especially since you indicate you're using 2.7) to give your variables names already used for built-ins, like file. Use in_file, for instance.
EDIT: As requested in a comment, here is a printing example. I'm adding it alongside the above, as the former seems potentially more useful for further extension. :)
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))

with open("test.txt") as in_file:
    lines = in_file.readlines()

print_relevant("This is 6th line", lines)
Note: since lines are read in with trailing newlines and print would add one of its own, I've rstrip'ed the lines before printing. Just be aware of it.
I have a file where some sentences are spread over multiple lines.
For example:
1:1 This is a simple sentence
[NEWLINE]
1:2 This line is spread over
multiple lines and it goes on
and on.
[NEWLINE]
1:3 This is a line spread over
two lines
[NEWLINE]
So I want it to look like this
1:1 This is a simple sentence
[NEWLINE]
1:2 This line is spread over multiple lines and it goes on and on.
[NEWLINE]
1:3 This is a line spread over two lines
Some sentences are spread over 2, 3, or 4 lines. If a line is followed by a line which is not a newline, they should be merged into one single line.
I would like to overwrite the given file or make a new file.
I've tried it with a while loop but without success.
input = open(file, "r")
zin = ""
lines = input.readlines()
# Makes an array with the lines
for i in lines:
    while i != "\n":
        zin += i
        .....
But this creates an infinite loop.
You should not be nesting for and while loops in your use case. What happens in your code is that a line is assigned to the variable i by the for loop, but it isn't modified by the nested while loop, so if the while condition is True, it will remain that way, and without a breaking condition you end up with an infinite loop.
A solution might look like this:
single_lines = []
current = []
for i in lines:
    i = i.strip()
    if i:
        current.append(i)
    else:
        if not current:
            continue  # treat multiple blank lines as one
        single_lines.append(' '.join(current))
        current = []
else:
    if current:
        # collect the last line if the file doesn't end with a blank line
        single_lines.append(' '.join(current))
Good ways of overwriting the input file would be to either collect all output in memory, close the file after reading it, and reopen it for writing, or to write to another file while reading the input and rename the second one over the first after closing both.
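As a rough sketch of the second approach (the file names below are placeholders, and os.replace requires Python 3.3+):
import os

with open("input.txt") as src, open("input.txt.tmp", "w") as dst:
    current = []
    for line in src:
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            dst.write(' '.join(current) + '\n\n')  # keep the blank separator line
            current = []
    if current:
        dst.write(' '.join(current) + '\n')

os.replace("input.txt.tmp", "input.txt")  # atomically overwrite the original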
I have written code that reads a file, checks whether a line contains the word table_begin, and then counts the number of lines until the line containing the word table_end.
Here is my code -
for line in read_file:
    if "table_begin" in line:
        k = read_file.index(line)
    if 'table_end' in line:
        k1 = read_file.index(line)
        break
count = k1 - k
if count < 10:
    q.write(file)
I have to run it on ~15K files, and since it's a bit slow (~1 file/sec), I was wondering if I am doing something inefficient. I was not able to find it myself, so any help would be great!
When you do read_file.index(line), you are scanning through the entire list of lines, just to get the index of the line you're already on. This is likely what's slowing you down. Instead, use enumerate() to keep track of the line number as you go:
for i, line in enumerate(read_file):
    if "table_begin" in line:
        k = i
    if "table_end" in line:
        k1 = i
        break
You are always checking for both strings in every line. In addition, index is heavy because it searches the whole list of lines, not just the current line. Using in or find will be quicker, as will checking only for table_begin until you've found it, and only for table_end after you've seen table_begin. If you aren't positive each file has table_begin and table_end in that order (and only one of each), you may need some tweaking/checks here (maybe pairing your begin/end into tuples? see the sketch after the EDIT below).
EDIT: Incorporated enumerate and switched from a while to a for loop, allowing some complexity to be removed.
def find_lines(filename):
    bookends = ["table_begin", "table_end"]
    lines = open(filename).readlines()
    for bookend in bookends:
        for ind, line in enumerate(lines):
            if bookend in line:
                yield ind
                break

for line in find_lines(r"myfile.txt"):
    print line
print "done"
Clearly, you obtain read_file from f.readlines(), which is a bad idea, because you read the whole file at once.
You can save a lot of time by:
reading the file line by line,
searching for one keyword at a time,
stopping after 10 lines.
with open('test.txt') as read_file:
    counter = 0
    for line in read_file:
        if "table_begin" in line:
            break
    for line in read_file:
        counter += 1
        if "table_end" in line or counter >= 10:
            break  # if "begin" => "end" ...
    if counter < 10:
        q.write(file)
I have a for loop iterating through my file, and based on a condition, I want to be able to read the next line in the file. I want to detect a keyword, [FOR_EACH_NAME]; once I find it, I know that names will follow, and I print each name. Basically, once I find the [FOR_EACH_NAME] keyword, how can I keep going through the lines?
Python code:
file=open("file.txt","r")
for line in file:
if "[FOR_EACH_NAME]" in line
for x in range(0,5)
if "Name" in line:
print(line)
Hi everyone, thank you for the answers. I have posted the question with much more detail about what I'm actually doing here: How to keep track of lines in a file python.
Once you've found the tag, just break out of the loop and start reading names; reading will continue from the position where you stopped:
for line in file:
    if '[FOR_EACH_NAME]' in line:
        break
else:
    raise Exception('Did not find names')  # could not resist using for-else

for _ in range(5):
    line = file.readline()
    if 'Name' in line:
        print(line)
Are the names in the lines following [FOR_EACH_NAME]? If so, you can track what to look for in an extra variable:
file=open("file.txt","r")
names = 0
for line in file:
if "[FOR_EACH_NAME]" in line
names = 5
elif names > 0:
if "Name" in line:
print(line)
names -= 1
I think this will do what you want. It will read the next 5 lines, not the next 5 names. If you want the next five names instead, indent the line ct += 1 one more level.
# Open the file
ffile = open('testfile', 'r')

# Initialize
flg = False
ct = 0

# Start iterating
for line in ffile:
    if "[FOR_EACH_NAME]" in line:
        flg = True
        ct = 0
    if "Name" in line and flg and ct < 5:
        print(line)
    ct += 1
Pretty new to Python, and I have been writing up a script to pick out certain lines of a basic log file.
Basically the function searches lines of the file and, when it finds one I want to output to a separate file, adds it to a list, then also adds the next five lines following that. This then gets output to a separate file at the end in a different function.
What I've been trying to do following that is make the loop jump ahead and continue from the last of those five lines, rather than going over them again. I thought the last line in the code would solve the problem, but unfortunately not.
Are there any recommended variations of a for loop I could use for this purpose?
def readSingleDayLogs(aDir):
    print 'Processing files in ' + str(aDir) + '\n'
    lineNumber = 0
    try:
        open_aDirFile = open(aDir)  # open the log file
        for aLine in open_aDirFile:  # total the num. lines in file
            lineNumber = lineNumber + 1
        lowerBound = 0
        for lineIDX in range(lowerBound, lineNumber):
            currentLine = linecache.getline(aDir, lineIDX)
            if (bunch of logic conditions):
                issueList.append(currentLine)
                for extraLineIDX in range(1, 6):  # loop over the next five lines of the error and append to issue list
                    extraLine = linecache.getline(aDir, lineIDX + extraLineIDX)  # get the x extra line after problem line
                    issueList.append(extraLine)
                issueList.append('\n\n')
                lowerBound = lineIDX
You should use a while loop:
line = lowerBound
while line < lineNumber:
    ...
    if conditions:
        ...
        for lineIDX in range(line, line + 6):
            ...
        line = line + 6
    else:
        line = line + 1
A for loop uses an iterator over the range, so you don't have the ability to change the loop variable and have the change take effect.
Consider using a while-loop instead. That way, you can update the line index directly.
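A minimal sketch of that idea, assuming the lines are already in a list called all_lines and that matches_issue() stands in for the "bunch of logic conditions" above:
idx = 0
while idx < len(all_lines):
    if matches_issue(all_lines[idx]):             # hypothetical predicate
        issueList.extend(all_lines[idx:idx + 6])  # the matching line plus the next five
        issueList.append('\n\n')
        idx += 6  # jump past the lines just collected
    else:
        idx += 1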
I would look at something like:
from itertools import islice

with open('somefile') as fin:
    line_count = 0
    my_lines = []
    for line in fin:
        line_count += 1
        if some_logic(line):
            my_lines.append(line)
            next_5 = list(islice(fin, 5))
            line_count += len(next_5)
            my_lines.extend(next_5)
This way, by using islice on the input, you're able to move the iterator ahead and resume after the 5 lines (perhaps fewer if near the end of the file) are exhausted.
This is based on my understanding that you can read forward through the file, identify a line, want only a fixed number of lines after that point, and then resume looping as normal. (You may not even require the line counting if that's all you're after, as it only appears to be used for getline and not for any other purpose.)
If you do indeed want to take the next 5 and still consider the following line, you can use itertools.tee to branch at the point of the faulty line, islice that branch, and let the fin iterator resume on the next line.
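A rough sketch of that tee variant, assuming some_logic() is the same placeholder predicate as above (the loop is driven with next() rather than a for loop because the iterator is rebound after each tee call):
from itertools import islice, tee

my_lines = []
with open('somefile') as fin:
    it = iter(fin)
    while True:
        line = next(it, None)
        if line is None:
            break
        if some_logic(line):
            my_lines.append(line)
            # Branch the stream: 'branch' yields the next 5 lines, while the
            # rebound 'it' resumes from the very next line, so those 5 lines
            # are still considered by the main loop for further matches.
            it, branch = tee(it)
            my_lines.extend(islice(branch, 5))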