Script continues reading from file although file is finished - python

I am creating a script that reads from the rockyou.txt file. The problem is that when it finishes going through all the lines (about 1.5M), it keeps reading empty lines from the file, and I need it to stop.
I can't use a simple if statement to check whether the line is empty, because the file contains several places with a single empty line.
Do you have any ideas how to implement this?
Code:
while line != static:
    line = f.readline()
    line = line.strip()
    counter = counter + 1
    print("Trying " + line + " Number " + str(counter))
    if line == static:
        print("Success")
        flag = 1
        break
if flag == 0:
    print("Unsuccessful")

Your code attempts to read lines until a hit is found, but it doesn’t test whether the end of the file is reached.
Rewrite your code as follows to stop at the end of the file:
found = False
for line in f:
    if line.strip() == static:
        found = True
        break
This code is omitting the counter, but it could be added back in trivially:
for counter, line in enumerate(f, 1):
    line = line.strip()
    print(f'Trying {line} Number {counter}')
    if line == static:
        found = True
        break
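Putting the pieces together, here is a minimal self-contained sketch. The function name search_wordlist is hypothetical; it accepts an open file object or any iterable of lines (a list is handy for testing). A for loop over a file object stops by itself at end-of-file, so no separate EOF check is needed.

```python
def search_wordlist(lines, static):
    """Return the 1-based line number of `static` in `lines`, or None.

    Hypothetical helper: `lines` can be an open file object or any
    iterable of strings. Iteration ends automatically at end-of-file.
    """
    for counter, line in enumerate(lines, 1):
        line = line.strip()
        print(f"Trying {line} Number {counter}")
        if line == static:
            print("Success")
            return counter
    # Only reached when every line was tried without a match
    print("Unsuccessful")
    return None
```

With the real file you would call it as search_wordlist(f, static) inside with open("rockyou.txt", errors="replace") as f:. The errors="replace" argument is an assumption, meant to guard against bytes that don't decode with the default encoding.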

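If the file has a single blank line, readline() will actually return "\n" rather than an empty string "". It only returns "" once the end of the file is reached. Thus it is safe to do this, as long as you check before calling strip():
line = f.readline()
if not line:
    break
This works because bool('\n') is True, so no blank line will be mistaken for the end of the file.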
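If you prefer to keep the readline() loop from the question, the only requirement is that the end-of-file test comes before strip(). A minimal sketch; read_until_eof is a hypothetical name, and io.StringIO stands in for the real file:

```python
import io


def read_until_eof(f):
    """Collect stripped lines from `f` using readline(), stopping at EOF."""
    lines = []
    while True:
        raw = f.readline()
        if not raw:                 # '' means end of file; a blank line is '\n'
            break
        lines.append(raw.strip())   # strip only after the EOF check
    return lines


# The blank line in the middle is kept; the loop stops at the real end.
print(read_until_eof(io.StringIO("abc\n\nxyz\n")))  # ['abc', '', 'xyz']
```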

Instead of checking for a single empty line, check for several consecutive empty lines. You can do this by keeping another counter, like this (note that it will also stop early if the file itself contains that many blank lines in a row):
emptyLineCounter = 0
while True:
    line = f.readline().strip()
    if line == '':  # Because it has been stripped, there will be no extra whitespace
        emptyLineCounter += 1
        if emptyLineCounter == 2:  # Or any number of lines you want it to be
            break
    else:
        emptyLineCounter = 0  # Resetting it to zero if there is text in the line
        # ... process the line here ...

Related

Reading a text file from a certain point (python)

I'm trying to make code that can find a specific word in a file and start reading from there until it reads the same word again. In this case the word is "story". The code counts up the lines until the word, and then it starts counting again from 0 in the second loop. I have tried to use functions and global variables, but I keep getting the same number twice and I don't know why.
file = open("testing_area.txt", "r")
line_count = 0
counting = line_count
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
file.close()
Output:
6
6
Expected output:
6
3
This is the text file:
text
text
text
text
text
story
text
text
story
Code can be simplified to:
with open("testing_area.txt", "r") as file:  # Context manager preferred for file open
    first, second = None, None  # index of first and second occurrence of 'story'
    for line_count, line in enumerate(file, start=1):  # provides line index and content
        if line.startswith('story'):  # no need to check separately for blank lines
            if first is None:
                first = line_count  # first is None, so this must be the first
            else:
                second = line_count  # previously found first, so this is the second
                break  # have now found first & second
print(first, second - first)  # index of first occurrence and number of lines between first and second
# Output: 6 3
There are several issues here. The first is that, for a given file object, readlines() basically only works once. Imagine a text file open in an editor, with a cursor that starts at the beginning. readline() (singular) reads the next line, moving the cursor down one: readlines() (plural) reads all lines from the cursor's current position to the end. Once you've called it once, there are no more lines left to read. You could solve this by putting something like lines = file.readlines() up at the top, and then looping through the resulting list. (See this section in the docs for more info.)
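Here is a small sketch of the cursor behaviour described above. io.StringIO stands in for the open text file, and a real file object behaves the same way in this respect:

```python
import io

f = io.StringIO("text\nstory\ntext\n")  # stands in for an open text file

first = f.readlines()    # reads from the cursor to the end
second = f.readlines()   # the cursor is already at the end
print(first)   # ['text\n', 'story\n', 'text\n']
print(second)  # []

f.seek(0)                  # move the cursor back to the start
print(len(f.readlines()))  # 3
```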
However, you neither reset line_count to 0, nor ever set counting to anything but 0, so the loops still won't do what you intend. You want something more like this:
with open("testing_area.txt") as f:
    lines = f.readlines()
first_count = 0
for line in lines:
    if line != "\n":
        first_count += 1
        if line.startswith('story'):
            break
print(first_count)
second_count = 0
for line in lines[first_count:]:
    if line != "\n":
        second_count += 1
        if line.startswith('story'):
            break
print(second_count)
(This also uses the with keyword, which automatically closes the file even if the program encounters an exception.)
That said, you don't really need two loops in the first place. You're looping through one set of lines, so as long as you reset the line number, you can do it all at once:
line_no = 0
words_found = 0
with open('testing_area.txt') as f:
    for line in f:
        if line == '\n':
            continue
        line_no += 1
        if line.startswith('story'):
            print(line_no)
            line_no = 0
            words_found += 1
            if words_found == 2:
                break
(Using if line == '\n': continue is functionally the same as putting the rest of the loop's code inside if line != '\n':, but personally I like avoiding the extra indentation. It's mostly a matter of personal preference.)
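The same single-pass idea can be wrapped in a function that works on any iterable of lines, which makes it easy to try without a file. This is a sketch; the name story_gaps and its keyword parameters are hypothetical, and it treats whitespace-only lines as blank (slightly broader than line == '\n'):

```python
def story_gaps(lines, keyword="story", max_hits=2):
    """Return the non-blank line counts up to each line starting with `keyword`.

    Counting restarts from 0 after every hit. The loop stops after
    `max_hits` hits; pass None to scan the whole input.
    """
    counts = []
    line_no = 0
    for line in lines:
        if line.strip() == "":   # skip blank (or whitespace-only) lines
            continue
        line_no += 1
        if line.startswith(keyword):
            counts.append(line_no)
            line_no = 0
            if max_hits is not None and len(counts) == max_hits:
                break
    return counts


sample = ["text\n"] * 5 + ["story\n", "text\n", "text\n", "story\n"]
print(story_gaps(sample))  # [6, 3]
```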
Since the question doesn't say that the word only needs to be counted twice, here is a solution that reads through the whole file and prints the count every time "story" is found.
# Using with to open the file is preferred, as the file will be properly closed
with open("testing_area.txt") as f:
    line_count = 0
    for line in f:
        line_count += 1
        if line.startswith("story"):
            print(line_count)
            # reset the line_count if "story" found
            line_count = 0
Output:
6
3

How to keep track of lines in a file python

I have the following file that I'm reading in with Python, and I want to keep track of whether the line is [FOR_RECORD]. At that point I have a for loop populating an output with the value of [REG_NAME] until I reach [/FOR_RECORD]. Then I want to go back to the start of the [FOR_RECORD] portion of the file to start populating with the next [REG_NAME]. How can I jump around in a Python file like this?
Input file
--
-- generated with parser version 1.09
use ieee.std_logic_arith.all;
package [PKG_FILE]_pkg is
[FOR_RECORD]
constant [REG_NAME]_offset : std_logic_vector := x"[OFFSET]";
[/FOR_RECORD]
type [REG_NAME]_type is record
[FILED_NAME] : std_logic; -- [OFFSET] :
end record [REG_NAME]_type;
Package is [PKG_FILE]
Python code
for line in input_1:
    if '[FOR_RECORD]' in line:
        # This is where I want to jump to the next line
        # So I can evaluate the contents
        # I have 4 names in reg_name[i]
        # Very important that this is nested in the if statement
        for x in range(0,4):
            if '[/FOR_RECORD]' in line:
                break
            if '[REG_NAME]' in line:
                line=line.replace('[REG_NAME]',reg_name[i]['name'])
                output.write(line)
    output.write(line)
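You can use tell() to find your position in the file and seek() to go to a specific position, but you also have to use the readline() method, because a for loop over the file reads ahead in chunks, so tell() does not report the position of the current line (in Python 3 text mode, calling tell() inside such a loop raises OSError).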
input1 = open('file')
eof = False
while True:
    while True:
        line = input1.readline()
        if line == '':
            eof = True
            break
        output.write(line)
        if '[FOR_RECORD]' in line:
            offset = input1.tell()
            break
    if eof: break
    for i in range(4):
        input1.seek(offset)
        while True:
            line = input1.readline()
            if line == '':
                eof = True
                break
            if '[/FOR_RECORD]' in line:
                break
            if '[REG_NAME]' in line:
                line = line.replace('[REG_NAME]', reg_name[i]['name'])
            output.write(line)
    if eof: break
The first inner loop finds the position of the [FOR_RECORD] line, and the for loop then iterates over the elements of reg_name, seeking back to that position each time.
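The tell()/seek() pattern can be tried in isolation. In this sketch io.StringIO stands in for the template file and the template is shortened; with real text files, tell() returns an opaque number that is only meaningful when passed back to seek():

```python
import io

# Shortened stand-in for the template file from the question
f = io.StringIO("header\n[FOR_RECORD]\nbody [REG_NAME]\n[/FOR_RECORD]\ntail\n")

offset = None
while True:
    line = f.readline()
    if line == "":               # end of file: the tag was not found
        break
    if "[FOR_RECORD]" in line:
        offset = f.tell()        # position just after the tag line
        break

seen = []
if offset is not None:
    for _ in range(2):           # replay the block twice
        f.seek(offset)           # jump back to the first line of the block
        seen.append(f.readline().strip())

print(seen)
```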
When you're iterating over a file, you can use the built-in next() function to advance the iterator (retrieve the next line). So something like this probably gets you what you need:
for line in input_1:
    if '[FOR_RECORD]' in line:
        while '[/FOR_RECORD]' not in line:
            line = next(input_1)
            # your replacement code here.
This will iterate until it finds your begin tag, then continue to consume lines one by one until it finds your close tag, at which point you'll drop back to the outer for loop.
I would use a mini state machine. If we are between a [FOR_RECORD] and [/FOR_RECORD] lines, we should do replacement, and not if outside. Code could be:
in_record = False
for line in input_1:
    if '[FOR_RECORD]' in line:
        in_record = True
    elif '[/FOR_RECORD]' in line:
        in_record = False
    elif in_record:
        if '[REG_NAME]' in line:
            for i in range(4):
                output.write(line.replace('[REG_NAME]',
                                          reg_name[i]['name']))
        else:
            output.write(line)
    else:
        output.write(line)
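The state machine above writes each [REG_NAME] line four times in a row, which is enough for a one-line block like the one in the question. If the block can span several lines and must be repeated as a whole, one variant is to buffer the block and emit it once per name when the closing tag arrives. This sketch assumes reg_names is a plain list of names (the question uses reg_name[i]['name']), drops the tag lines themselves, and uses the hypothetical name expand_records:

```python
def expand_records(lines, reg_names):
    """Repeat each [FOR_RECORD]...[/FOR_RECORD] block once per name."""
    out = []
    block = None                          # None means "not inside a block"
    for line in lines:
        if "[FOR_RECORD]" in line:
            block = []                    # start buffering the block
        elif "[/FOR_RECORD]" in line:
            for name in reg_names:        # emit the whole block per name
                out.extend(t.replace("[REG_NAME]", name) for t in (block or []))
            block = None
        elif block is not None:
            block.append(line)            # inside a block: buffer the line
        else:
            out.append(line)              # outside a block: copy through
    # Note: lines of a block that is never closed are dropped.
    return out
```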

Print lines between two patterns in python

I have a file with the following structure:
#scaffold456
ATGTCGTGTCAGTG
GTACGTGTGTGG
+
!!!!!#!!!!!!!!
!!!!!!!!!!!!
#scaffold342
ATGGTGTCGTGGTG
ACGTGGC
+
!>!>!!!!+!!!!!
!!!!!!!
I would want an output like this:
>scaffold456
ATGTCGTGTCAGTG
GTACGTGTGTGG
>scaffold342
ATGGTGTCGTGGTG
ACGTGGC
I want to achieve this in Python. I started with the following:
fastq_filename = "test_file"
fastq = open(fastq_filename) # fastq is the file object
for line in fastq:
    if line.startswith("#"):
        print line.replace("#", ">")
but I can't go on anymore as I don't know:
1. How to print lines after a certain pattern match?
2. How I should specify that I want to skip lines between + till the next # sign?
This is a more complex topic in Python that I don't know; any help and explanation would be great, thanks!
fastq_filename = "test_file"
with open(fastq_filename) as fastq:  # fastq is the file object
    canPrintLines = False  # Boolean state variable to keep track of whether we want to be printing lines or not
    for line in fastq:
        if line.startswith("#"):
            canPrintLines = True  # We have found a # so we can start printing lines
            line = line.replace("#", ">")
        elif line.startswith("+"):
            canPrintLines = False  # We have found a + so we don't want to print anymore
        if canPrintLines:
            print(line, end='')  # line already ends with '\n'
I don't know how complex your lines with the ! can get. I understand your question to mean that you want to ignore any + and # signs inside these lines.
In that case I would introduce a state variable that stores whether we are currently working on an interesting line:
interesting_line = True
for line in fastq:
    if line.strip() == '+':  # Here we check for the + sign. You might need to adapt the test.
        interesting_line = False  # We don't care from now on
    if line.startswith('#'):
        interesting_line = True
    if interesting_line:
        print(line, end='')  # Do what you want with your line, e.g. print it
As I said, you might need to check whether there are situations where my simple tests don't match, but this should give you a starting point.
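Both answers above rely on a flag. Here is the same idea as a small function over an iterable of lines, so it can be checked against the sample in the question. It is a sketch: the name fastq_to_fasta is hypothetical, and it assumes, as in the sample, that header lines start with # and that a line consisting only of + starts the quality block.

```python
def fastq_to_fasta(lines):
    """Keep header and sequence lines; drop the '+' line and quality lines."""
    out = []
    printing = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Caveat: a quality line that starts with '#' would be
            # mistaken for a header here.
            printing = True                 # a new record starts
            out.append(">" + line[1:])      # swap only the leading '#'
        elif line.strip() == "+":
            printing = False                # quality lines follow
        elif printing:
            out.append(line)
    return out
```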
This is an easy way to do it:
for line in fastq:
    if line and (line[0].isalpha() or line[0] == '#'):
        line = line.rstrip()
        print(line.replace("#", ">"))
Output:
>scaffold456
ATGTCGTGTCAGTG
GTACGTGTGTGG
>scaffold342
ATGGTGTCGTGGTG
ACGTGGC
for line in fastq:
    line = line.rstrip('\n')  # isalpha() would be False with the trailing newline
    if line.startswith("#") or line.isalpha():
        print(line.replace("#", ">"))
Find the line that starts with #, replace # with >, and print it.
Then find lines that contain only letters and print those as well.
The code below will ignore lines that start with + or !, replace # with > if a line starts with #, and write all other lines:
def format_file(path):
    new_lines = ""
    with open(path) as f:
        for line in f:
            if line.startswith("#"):
                new_lines += line.replace("#", ">")
            elif line.startswith("+"):
                pass
            elif line.startswith("!"):
                pass
            else:
                new_lines += line
    print(new_lines, end="")

format_file("test_file")
If I'm interpreting your question correctly, then I think this is what you are looking for:
import re

for line in fastq:
    line = line.replace('\n', '')
    if not line:
        continue  # skip empty lines (line[0] would fail on them)
    n = len(line)
    mat = re.match(r'([ATGC]){%d}' % n, line)
    if mat:
        print(line)
    if line[0] == '#':
        print(line.replace('#', '>'))
This uses regular expressions, which are incredibly useful. The pattern says: if a line consists only of the characters A, T, G, or C, print that line; the other if statement is the same as what you have. {%d} is filled in with n, so the pattern requires exactly n occurrences of the character class [ATGC], i.e. the whole line. If your sequences can contain letters other than A, T, G, or C, just add them between the square brackets.
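A shorter way to express "the whole line consists only of these characters" is re.fullmatch with a + quantifier, which avoids building the pattern from len(line) and cannot match an empty line. A sketch; is_sequence is a hypothetical helper name and the sample lines come from the question:

```python
import re

SEQ = re.compile(r"[ATGC]+")   # add other allowed letters inside the brackets


def is_sequence(line):
    """True if the line (without its newline) is made only of A, T, G, C."""
    return SEQ.fullmatch(line.rstrip("\n")) is not None


for line in ["#scaffold456\n", "ATGTCGTGTCAGTG\n", "!!!!!#!!!!!!!!\n", "\n"]:
    if line.startswith("#"):
        print(line.rstrip("\n").replace("#", ">", 1))
    elif is_sequence(line):
        print(line.rstrip("\n"))
```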

Counting Lines and numbering them

Another question.
This program counts and numbers every line in the code unless it contains a hash (#) or the line is empty. I got it to number every line except the ones with a hash. How can I stop it from counting empty lines?
def main():
    file_Name = input('Enter file you would like to open: ')
    infile = open(file_Name, 'r')
    contents = infile.readlines()
    line_Number = 0
    for line in contents:
        if '#' in line:
            print(line)
            if line == '' or line == '\n':
                print(line)
        else:
            line_Number += 1
            print(line_Number, line)
    infile.close()
main()
You check if line == '' or line == '\n' inside the if clause for '#' in line, where it has no chance to be True.
Basically, you need the if line == '' or line == '\n': line shifted to the left :)
Also, you can combine the two cases, since you perform the same actions:
if '#' in line or not line or line == '\n':
    print(line)
But actually, why would you need to print empty strings or '\n'?
Edit:
If other cases such as line == '\t' should be treated the same way, it's best to use Tim's advice and do: if '#' in line or not line.strip().
You can skip empty lines by adding the following to the beginning of your for loop:
if not line.strip():
    continue
In Python, the empty string evaluates to the boolean value False, so not line.strip() is True only when the line is empty or contains only whitespace. The strip() is needed because readlines() keeps the trailing newline, so a blank line is '\n' rather than ''.
The statement continue means that the code will continue at the next pass through the loop. It won't execute the code after that statement, which means your code that counts the lines is skipped.
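Putting the two answers together, here is a sketch that numbers only the lines that are neither blank nor contain #, without printing the skipped ones. The function name number_lines is hypothetical; it takes any iterable of lines, so it can be called with an open file or with infile.readlines():

```python
def number_lines(lines):
    """Return (number, text) pairs for non-blank lines without '#'."""
    numbered = []
    line_number = 0
    for line in lines:
        if "#" in line or not line.strip():
            continue                  # comment or blank: not counted
        line_number += 1
        numbered.append((line_number, line.rstrip("\n")))
    return numbered


for number, text in number_lines(["x = 1\n", "\n", "# note\n", "print(x)\n"]):
    print(number, text)
```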

str.startswith() not working as I intended

I can't see why this won't work. I am performing lstrip() on the string being passed to the function and checking whether it starts with """. For some reason, it gets caught in an infinite loop.
def find_comment(infile, line):
    line_t = line.lstrip()
    if not line_t.startswith('"""') and not line_t.startswith('#'):
        print (line, end = '')
        return line
    elif line.lstrip().startswith('"""'):
        while True:
            if line.rstrip().endswith('"""'):
                line = infile.readline()
                find_comment(infile, line)
            else:
                line = infile.readline()
    else:
        line = infile.readline()
        find_comment(infile, line)
And my output:
Enter the file name: test.txt
import re
def count_loc(infile):
Here is the top of the file I am reading in, for reference:
import re
def count_loc(infile):
    """ Receives a file and then returns the amount
    of actual lines of code by not counting commented
    or blank lines """
    loc = 0
    func_records = {}
    for line in infile:
        (...)
You haven't provided an exit path from the recursive loop. A return statement should do the trick.
(...)
        while True:
            if line.rstrip().endswith('"""'):
                line = infile.readline()
                return find_comment(infile, line)
            else:
                line = infile.readline()
while True is an infinite loop. You need to break once you're done.
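For illustration, here is a non-recursive sketch in which the while True loop ends with break as soon as the closing """ (or the end of the file) is seen. The name skip_docstring is hypothetical; it assumes the line passed in is the one that opens the docstring and that the docstring uses double quotes:

```python
import io


def skip_docstring(infile, line):
    """Consume lines until the one that closes a triple-quoted docstring.

    Returns the first line after the docstring ('' at end of file).
    """
    stripped = line.strip()
    # A one-line docstring such as """Doc.""" opens and closes on this line.
    if not (len(stripped) >= 6 and stripped.endswith('"""')):
        while True:
            line = infile.readline()
            if line == "" or line.rstrip().endswith('"""'):
                break                # closing quotes (or EOF) reached
    return infile.readline()
```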
not line_t.startswith('"""') or not line_t.startswith('#')
This expression evaluates to True no matter what string line_t denotes. Do you want 'and' instead of 'or'? Your question isn't clear to me.
if not line_t.startswith('"""') or not line_t.startswith('#'):
This if will always be satisfied -- either the line doesn't start with """, or it doesn't start with # (or both). You probably meant to use and where you used or.
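The reason the or version is always satisfied can be checked directly: by De Morgan's law, not a or not b equals not (a and b), and no line can start with both """ and # at once. A quick sketch comparing both conditions on a few sample strings:

```python
samples = ['"""docstring', "# comment", "code = 1", ""]
results = {}
for s in samples:
    t = s.lstrip()
    # Version discussed above: false only if t starts with BOTH prefixes
    with_or = not t.startswith('"""') or not t.startswith("#")
    # Intended version: true only for lines that start with neither prefix
    with_and = not t.startswith('"""') and not t.startswith("#")
    results[s] = (with_or, with_and)
    print(repr(s), with_or, with_and)
```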
As long as the triple quotes appear at the start or end of a line, the code below should work.
However, keep in mind that docstrings can start or end in the middle of a line of code.
Also, you'll need to handle triple single quotes, as well as triple-quoted strings assigned to variables, which aren't really comments.
Does this get you closer to an answer?
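def count_loc(infile):
    skipping_comments = False
    loc = 0
    for line in infile:
        stripped = line.strip()
        # Skip blank lines and one-liner comments
        if not stripped or stripped.startswith("#"):
            continue
        # Toggle multi-line comment finder: on and off
        if stripped.startswith('"""'):
            skipping_comments = not skipping_comments
            # A one-line docstring opens and closes here; a bare """ must not toggle twice
            if len(stripped) >= 6 and stripped.endswith('"""'):
                skipping_comments = not skipping_comments
            continue
        if stripped.endswith('"""'):
            skipping_comments = not skipping_comments
            continue
        if skipping_comments:
            continue
        loc += 1
        print(line, end='')
    return loc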
