Reading a text file from a certain point (python) - python

I'm trying to make code that can find a specific word in a file and start reading from there until it reads the same word again. In this case the word is "story". The code counts up the lines until the word, and then it starts counting again from 0 in the second loop. I have tried to use functions and global variables, but I keep getting the same number twice and I don't know why.
file = open("testing_area.txt", "r")
line_count = 0
counting = line_count
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
file.close()
Output:
6
6
Expected output:
6
3
This is the text file:
text
text
text
text
text
story
text
text
story

Code can be simplified to:
with open("testing_area.txt", "r") as file:  # Context manager preferred for file open
    first, second = None, None  # index of first and second occurrence of 'story'
    for line_count, line in enumerate(file, start=1):  # provides line index and content
        if line.startswith('story'):  # no need to check separately for blank lines
            if first is None:
                first = line_count  # first is None, so this must be the first
            else:
                second = line_count  # previously found first, so this is the second
                break  # have now found first & second
print(first, second - first)  # index of first occurrence and number of lines between first and second
# Output: 6 3

There are several issues here. The first is that, for a given file object, readlines() basically only works once. Imagine a text file open in an editor, with a cursor that starts at the beginning. readline() (singular) reads the next line, moving the cursor down one: readlines() (plural) reads all lines from the cursor's current position to the end. Once you've called it once, there are no more lines left to read. You could solve this by putting something like lines = file.readlines() up at the top, and then looping through the resulting list. (See this section in the docs for more info.)
However, you neither reset line_count to 0, nor ever set counting to anything but 0, so the loops still won't do what you intend. You want something more like this:
with open("testing_area.txt") as f:
    lines = f.readlines()
first_count = 0
for line in lines:
    if line != "\n":
        first_count += 1
        if line.startswith('story'):
            break
print(first_count)
second_count = 0
for line in lines[first_count:]:
    if line != "\n":
        second_count += 1
        if line.startswith('story'):
            break
print(second_count)
(This also uses the with keyword, which automatically closes the file even if the program encounters an exception.)
That said, you don't really need two loops in the first place. You're looping through one set of lines, so as long as you reset the line number, you can do it all at once:
line_no = 0
words_found = 0
with open('testing_area.txt') as f:
    for line in f:
        if line == '\n':
            continue
        line_no += 1
        if line.startswith('story'):
            print(line_no)
            line_no = 0
            words_found += 1
        if words_found == 2:
            break
(Using if line == '\n': continue is functionally the same as putting the rest of the loop's code inside if line != '\n':, but personally I like avoiding the extra indentation. It's mostly a matter of personal preference.)
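The cursor behaviour described in this answer can be checked without a real file. This is a minimal sketch, assuming an in-memory io.StringIO stands in for the opened file; the sample text is made up for illustration.

```python
import io

# io.StringIO behaves like an open text file, so no file on disk is needed.
fake_file = io.StringIO("text\nstory\ntext\n")

first_read = fake_file.readlines()   # reads from the cursor to the end
second_read = fake_file.readlines()  # the cursor is already at the end

fake_file.seek(0)                    # move the cursor back to the start
third_read = fake_file.readlines()

print(first_read)   # ['text\n', 'story\n', 'text\n']
print(second_read)  # []
print(third_read)   # ['text\n', 'story\n', 'text\n']
```

Calling seek(0) is an alternative to storing the lines in a list, though reading once and reusing the list avoids touching the file twice.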

As the question doesn't say that the word only needs to be counted twice, here is a solution that reads through the whole file and prints the count every time "story" is found.
# Using with to open file is preferred as file will be properly closed
with open("testing_area.txt") as f:
    line_count = 0
    for line in f:
        line_count += 1
        if line.startswith("story"):
            print(line_count)
            # reset the line_count if "story" found
            line_count = 0
Output:
6
3
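If the counts are needed as values rather than printed output, the same loop could be wrapped in a function. This is only a sketch: the name lines_between and the in-memory sample list are assumptions for illustration, not part of the answers above.

```python
def lines_between(lines, word="story"):
    """Return the line counts between successive lines that start with `word`."""
    counts = []
    line_count = 0
    for line in lines:
        line_count += 1
        if line.startswith(word):
            counts.append(line_count)
            line_count = 0  # restart the count after each match
    return counts


# Same layout as the text file in the question: 5 lines, story, 2 lines, story
sample = ["text\n"] * 5 + ["story\n"] + ["text\n"] * 2 + ["story\n"]
print(lines_between(sample))  # [6, 3]
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.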

Related

Script continues reading from file although file is finished

I am creating a script which reads from the rockyou.txt file. The problem is that when it finishes going through all the lines (1.5M), it continues reading empty lines from the file, and I need it to stop.
I can't use a simple if statement to check whether the line is empty, because the file contains single empty lines in multiple places.
Do you have any ideas how to implement this?
Code:
while line != static:
    line = f.readline()
    line = line.strip()
    counter = counter + 1
    print("Trying " + line + " Number " + str(counter))
    if line == static:
        print("Success")
        flag = 1
        break
if flag == 0:
    print("Unsuccessful")
Your code attempts to read lines until a hit is found, but it doesn’t test whether the end of the file is reached.
Rewrite your code as follows to stop at the end of the file:
found = False
for line in f:
    if line.strip() == static:
        found = True
        break
This code omits the counter, but it can be added back trivially:
found = False
for counter, line in enumerate(f, 1):
    line = line.strip()
    print(f'Trying {line} Number {counter}')
    if line == static:
        found = True
        break
If you have a single blank line, readline() will actually return "\n" rather than an empty string "". Thus it is safe to do this:
line = f.readline()
if not line:
    break
This works because bool('\n') is True, so no blank lines will be skipped.
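The difference between a blank line and the end of the file can be seen in a small experiment. This is a sketch that assumes an in-memory io.StringIO in place of the real wordlist; the sample contents are hypothetical.

```python
import io

# Hypothetical sample: two words separated by a single blank line.
f = io.StringIO("alpha\n\nbeta\n")

seen = []
while True:
    line = f.readline()
    if not line:       # readline() returns '' only at end of file
        break
    seen.append(line)  # the blank line arrives as '\n', which is truthy

print(seen)  # ['alpha\n', '\n', 'beta\n']
```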
Instead of checking for a single empty line, check for several consecutive empty lines. You can do this by keeping another counter, like this:
emptyLineCounter = 0
while True:
    line = f.readline().strip()
    if line == '':  # Because it has been stripped, there will be no extra whitespace
        emptyLineCounter += 1
        if emptyLineCounter == 2:  # Or any number of lines you want it to be
            break
    else:
        emptyLineCounter = 0  # Reset to zero if there is text in the line

Need to count how many times "AGAT" "AATG" and "TATC" repeats in .txt file that has a DNA sequence

This is my first coding class and I'm having trouble getting the counter to increase every time one of the given sequences appears in the DNA sequence.
My code so far:
agat_Counter = 0
aatg_Counter= 0
tatc_Counter= 0
DNAsample = open('DNA SEQUENCE FILE.txt', 'r');
for lines in DNAsample:
    if lines in DNAsample=='AGAT':
        agat_Counter+=1
    else:
        agat_Counter+=0
print(agat_Counter)
for lines in DNAsample:
    if lines in DNAsample=='AATG':
        aatg_Counter+=1
    else:
        aatg_Counter+=0
print(aatg_Counter)
for lines in DNAsample:
    if lines in DNAsample=='TATC':
        tatc_Counter+=0
    else:
        tatc_Counter+=0
print(tatc_Counter)
You can do this in many ways. One of the simpler ones is the following:
DNAsample = open('DNA SEQUENCE FILE.txt', 'r').read()
agat_Counter = DNAsample.count('AGAT')
aatg_Counter= DNAsample.count('AATG')
tatc_Counter= DNAsample.count('TATC')
This should work. There are two issues: the conditions in your if statements are wrong, and once you have iterated through the file once, the file pointer is at the end, so the later loops have nothing left to read. The code below iterates through the file one line at a time and compares each line to the 4-character sequences. Note that .strip() removes the trailing \n and/or \r characters from the line variable.
In general, when opening files it is best to use with open(filename, mode) as var:, as shown below. This handles closing the file once it is done and eliminates the risk of unclosed file handles.
Assumption based on original code is that the DNA SEQUENCE FILE.txt file is organized as such:
AGAT
AATG
...
agat_Counter = 0
aatg_Counter = 0
tatc_Counter = 0
with open('DNA SEQUENCE FILE.txt', 'r') as DNAsample:
    for line in DNAsample:
        strippedLine = line.strip()
        if strippedLine == 'AGAT':
            agat_Counter += 1
        elif strippedLine == 'AATG':
            aatg_Counter += 1
        elif strippedLine == 'TATC':
            tatc_Counter += 1
print(agat_Counter)
print(aatg_Counter)
print(tatc_Counter)
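Under the same assumption that the file holds one sequence per line, the three counters could also be collapsed into a single pass with collections.Counter. This is a sketch: the function name count_sequences and the sample list are illustrative, not taken from the question.

```python
from collections import Counter


def count_sequences(lines, targets=("AGAT", "AATG", "TATC")):
    """Count how many lines equal each target, ignoring surrounding whitespace."""
    counts = Counter(line.strip() for line in lines)
    # A Counter returns 0 for keys it has never seen, so missing targets are safe.
    return {target: counts[target] for target in targets}


sample = ["AGAT\n", "AATG\n", "AGAT\n", "CCCC\n"]
print(count_sequences(sample))  # {'AGAT': 2, 'AATG': 1, 'TATC': 0}
```

If the file is instead one long unbroken sequence, counting whole lines will not help, and a substring count such as str.count is the better fit.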

How to delete a line from a file by index?

Pretty self explanatory. This is my text file:
C:\Windows\Users\Public\Documents\
C:\Program Files\Text Editor\
So, I have code which prompts the user to input the number of the line he wants to delete. But how do I delete the line which corresponds to the number?
EDIT :
To the person asking code:
# Viewing presets
if pathInput.lower() == 'view':
    # Prints the lines in a numbered, vertical list.
    numLines = 1
    numbered = ''
    for i in lines:
        i = str(numLines) + '. ' + i
        numbered += i
        print (i)
        numLines += 1
    viewSelection = input('\n^ Avaiable Paths ^\nInput the preset\'s number you want to perform an action on.\n')
    for i in numbered:
        if viewSelection in i:
            viewAction = input('\nInput action for this preset.\nOptions: "Delete"')
            if viewAction.lower() == 'delete':
I simply want a way to delete a line by its number in a file.
A simple approach would be to read the lines into a list, update the list, and write it back to the same file. Something like this:
with open("file.txt", "r+") as f:
    lines = f.readlines()
    del lines[linenum]  # use linenum - 1 if linenum starts from 1
    f.seek(0)
    f.truncate()
    f.writelines(lines)
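The read, delete, and rewrite steps can be wrapped in a helper and tried on a throwaway file. This is a sketch only: the function name delete_line, the 0-based index convention, and the temporary file contents are assumptions for illustration.

```python
import os
import tempfile


def delete_line(path, index):
    """Remove the line at 0-based `index` from the file at `path`."""
    with open(path, "r+") as f:
        lines = f.readlines()
        del lines[index]   # raises IndexError if the index is out of range
        f.seek(0)          # go back to the start before rewriting
        f.truncate()       # drop the old contents
        f.writelines(lines)


# Try it on a temporary file so that no real data is touched.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first\nsecond\nthird\n")

delete_line(demo_path, 1)

with open(demo_path) as f:
    remaining = f.read()
os.remove(demo_path)

print(repr(remaining))  # 'first\nthird\n'
```

For a line number typed by the user that starts from 1, the call would be delete_line(path, number - 1).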

Deleting n number of lines after specific line of file in python

I am trying to remove a specific number of lines from a file. These lines always occur after a specific comment line. Anyways, talk is cheap, here is an example of what I have.
FILE: --
randomstuff
randomstuff2
randomstuff3
# my comment
extrastuff
randomstuff2
extrastuff2
#some other comment
randomstuff4
So, I am trying to remove the section after # my comment. Perhaps there is some way to delete a line in r+ mode?
Here is what I have so far
with open(file_name, 'a+') as f:
    for line in f:
        if line == my_comment_text:
            f.seek(len(my_comment_text)*-1, 1) # move cursor back to beginning of line
            counter = 4
        if counter > 0:
            del(line) # is there a way to do this?
I'm not exactly sure how to do this. How do I remove a specific line? I have looked at this possible dup and can't quite figure out how to do it that way either. The answer recommends that you read the file and then rewrite it. The problem is that they check for a specific line when they write. I can't do that exactly, and I don't like the idea of storing the entire file's contents in memory. That would eat up a lot of memory with a large file (since every line has to be stored, rather than one at a time).
Any ideas?
You can use the fileinput module for this and open the file in inplace=True mode to allow in-place modification:
import fileinput
counter = 0
for line in fileinput.input('inp.txt', inplace=True):
    if not counter:
        if line.startswith('# my comment'):
            counter = 4
        else:
            # with inplace=True, stdout is redirected into the file
            print(line, end='')
    else:
        counter -= 1
Edit per your comment "Or until a blank line is found":
import fileinput
ignore = False
for line in fileinput.input('inp.txt', inplace=True):
    if not ignore:
        if line.startswith('# my comment'):
            ignore = True
        else:
            # with inplace=True, stdout is redirected into the file
            print(line, end='')
    if ignore and line.isspace():
        ignore = False
You can make a small modification to your code and stream the content from one file to the other very easily.
with open(file_name, 'r') as f:
    with open(second_file_name, 'w') as w:
        counter = 0
        for line in f:
            if line == my_comment_text:
                counter = 3
            elif counter > 0:
                counter -= 1
            else:
                w.write(line)
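The streaming idea can also be written as a generator, so that only one line is held in memory at a time and the logic can be tested without files. This is a sketch under stated assumptions: the name drop_after_marker is hypothetical, it matches the marker with startswith rather than exact equality, and unlike the snippet above it keeps the marker line itself.

```python
def drop_after_marker(lines, marker, n):
    """Yield lines, dropping the n lines that follow each line starting with marker."""
    to_skip = 0
    for line in lines:
        if line.startswith(marker):
            to_skip = n
            yield line       # assumption: the marker line itself is kept
        elif to_skip > 0:
            to_skip -= 1     # silently drop this line
        else:
            yield line


sample = ["a\n", "# my comment\n", "x\n", "y\n", "z\n", "#some other comment\n", "b\n"]
result = list(drop_after_marker(sample, "# my comment", 3))
print(result)  # ['a\n', '# my comment\n', '#some other comment\n', 'b\n']
```

With two open files f and w, the output could then be written with w.writelines(drop_after_marker(f, '# my comment', 3)).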
I like the answer from @Ashwini. I was working on a solution as well, and something like this should work if you are OK with writing a new file containing the filtered lines:
def rewriteByRemovingSomeLines(inputFile, outputFile):
    unDesiredLines = []
    count = 0
    skipping = False
    fhIn = open(inputFile, 'r')
    line = fhIn.readline()
    while(line):
        if line.startswith('#I'):
            unDesiredLines.append(count)
            skipping = True
            while (skipping):
                line = fhIn.readline()
                count = count + 1
                # stop at a blank line, the next comment, or end of file
                if (not line or line == '\n' or line.startswith('#')):
                    skipping=False
                else:
                    unDesiredLines.append(count)
        count = count + 1
        line = fhIn.readline()
    fhIn.close()
    fhIn = open(inputFile, 'r')
    count = 0
    #Write the desired lines to a new file
    fhOut = open(outputFile, 'w')
    for line in fhIn:
        if not (count in unDesiredLines):
            fhOut.write(line)
        count = count + 1
    fhIn.close()
    fhOut.close()

Update iteration value in Python for loop

I'm pretty new to Python and have been writing a script to pick out certain lines of a basic log file.
Basically, the function searches the lines of the file, and when it finds one I want to output to a separate file, it adds that line to a list, then also adds the next five lines following it. The list then gets output to a separate file at the end, in a different function.
What I've been trying to do after that is jump the loop ahead so it continues from the last of those five lines, rather than going over them again. I thought the last line in the code would solve the problem, but unfortunately it doesn't.
Are there any recommended variations of a for loop I could use for this purpose?
def readSingleDayLogs(aDir):
    print 'Processing files in ' + str(aDir) + '\n'
    lineNumber = 0
    try:
        open_aDirFile = open(aDir) #open the log file
        for aLine in open_aDirFile: #total the num. lines in file
            lineNumber = lineNumber + 1
        lowerBound = 0
        for lineIDX in range(lowerBound, lineNumber):
            currentLine = linecache.getline(aDir, lineIDX)
            if (bunch of logic conditions):
                issueList.append(currentLine)
                for extraLineIDX in range(1, 6): #loop over the next five lines of the error and append to issue list
                    extraLine = linecache.getline(aDir, lineIDX+ extraLineIDX) #get the x extra line after problem line
                    issueList.append(extraLine)
                issueList.append('\n\n')
                lowerBound = lineIDX
You should use a while loop :
line = lowerBound
while line < lineNumber:
    ...
    if conditions:
        ...
        for lineIDX in range(line, line+6):
            ...
        line = line + 6
    else:
        line = line + 1
A for-loop uses an iterator over the range, so reassigning the loop variable inside the body has no effect on the next iteration.
Consider using a while-loop instead. That way, you can update the line index directly.
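The difference is easy to see side by side. This is a small sketch with made-up variable names, not code from the question:

```python
visited_for = []
for i in range(5):
    visited_for.append(i)
    i += 2  # no effect: the next value of i still comes from range()

visited_while = []
i = 0
while i < 5:
    visited_while.append(i)
    i += 2  # here the assignment does control the next iteration

print(visited_for)    # [0, 1, 2, 3, 4]
print(visited_while)  # [0, 2, 4]
```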
I would look at something like:
from itertools import islice
with open('somefile') as fin:
    line_count = 0
    my_lines = []
    for line in fin:
        line_count += 1
        if some_logic(line):
            my_lines.append(line)
            next_5 = list(islice(fin, 5))
            line_count += len(next_5)
            my_lines.extend(next_5)
This way, by using islice on the input, you're able to move the iterator ahead and resume after the 5 lines (perhaps fewer if near the end of the file) are exhausted.
This is based on if I'm understanding correctly that you can read forward through the file, identify a line, and only want a fixed number of lines after that point, then resume looping as per normal. (You may not even require the line counting if that's all you're after as it only appears to be for the getline and not any other purpose).
If you do want to take the next 5 lines but still examine the line immediately following the match, you can use itertools.tee to branch at the point of the faulty line, islice the branch, and let the fin iterator resume on the next line.
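The way islice advances the shared iterator can be checked on a plain list iterator. This is a sketch: the sample entries and the "ERROR" test are placeholders standing in for the real log lines and logic conditions.

```python
from itertools import islice

# An iterator over made-up log lines; a file object behaves the same way.
lines = iter(["ok 1", "ERROR", "d1", "d2", "d3", "d4", "d5", "ok 2"])

collected = []  # the matching line plus the five lines after it
resumed = []    # lines the outer loop saw without a match

for line in lines:
    if line == "ERROR":  # placeholder for the real conditions
        collected.append(line)
        collected.extend(islice(lines, 5))  # consumes the next five items
    else:
        resumed.append(line)

print(collected)  # ['ERROR', 'd1', 'd2', 'd3', 'd4', 'd5']
print(resumed)    # ['ok 1', 'ok 2']
```

The outer loop never sees d1 to d5, because islice has already pulled them from the same iterator.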
