Read N lines until EOF in Python 3 - python

Hi, I tried several solutions found on SO, but I am missing some info.
I want to read 4 lines at once until I hit EOF. I know how to do it in other languages, but what is the best approach in Python 3?
This is what I have; lines is always the first 4 lines and the code stops afterwards (I know, because the comprehension only gives me the first 4 elements of all_lines). I could use some kind of counter and break and so on, but that seems rather cheap to me.
if os.path.isfile(myfile):
    with open(myfile, 'r') as fo:
        all_lines = fo.readlines()
        for lines in all_lines[:4]:
            print(lines)
I want to handle 4 lines at once until I hit EOF. The file I am working with is rather short, maybe about 100 lines MAX

If you want to iterate the lines in chunks of 4, you can do something like this:
if os.path.isfile(myfile):
    with open(myfile, 'r') as fo:
        all_lines = fo.readlines()
    for i in range(0, len(all_lines), 4):
        print(all_lines[i:i+4])

Instead of reading in the whole file and then looping over the lines four at a time, you can simply read them in four at a time. Consider
def fun(myfile):
    if not os.path.isfile(myfile):
        return
    with open(myfile, 'r') as fo:
        while True:
            for line in (fo.readline() for _ in range(4)):
                if not line:
                    return
                print(line)
Here, a generator expression is used to read four lines; it is embedded in an "infinite" loop that stops when line is falsy (the empty str ''), which only happens when we have reached EOF.
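
Since the question says the file is about 100 lines max, reading it whole is fine, and one more compact sketch is possible (read_in_chunks is a made-up name, not from the answers above): itertools.zip_longest can group the line iterator into fixed-size chunks by repeating the same iterator, so each output tuple pulls four fresh lines.

```python
from itertools import zip_longest

def read_in_chunks(path, size=4):
    """Yield lists of up to `size` lines until EOF."""
    with open(path, 'r') as fo:
        # The same iterator object is repeated `size` times, so each
        # tuple consumes `size` consecutive lines; the final, shorter
        # chunk is padded with None, which we filter out.
        for chunk in zip_longest(*[iter(fo)] * size, fillvalue=None):
            yield [line for line in chunk if line is not None]
```

This keeps only one chunk in memory at a time, which also scales to files much longer than 100 lines.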

Related

For loop only returns first line of textfile

I am trying to make a program where I have to check if certain numbers are in use in a text file. The problem is that my for loop only loops through the first line, instead of every line. How can I solve this? I've already used readlines(), but that has not worked for me. This is the code, and I've got a text file with 1;, 2; and 3;, each on a separate line. Hope someone can help!
if int(keuze) == 2:
    def new_safe():
        with open('fa_kluizen.txt', 'r') as f:
            for number in f:
                return number
    print(new_safe())
My text File:
# TextFile
1;
2;
3;
You are returning too early (at the first iteration).
You can read all lines into a list while cleaning the data and then return that list.
with open('fa_kluizen.txt', 'r') as f:
    data = [line.strip() for line in f]
return data
Also, most of the time it's bad practice to create a function inside an if-statement.
Maybe you can add a little bit more information about what you want to achieve.
You are returning the first line you encounter, and by doing so, the program exits the current function and, of course, the loop.
One way to do it is:
def new_safe():
    with open('fa_kluizen.txt', 'r') as f:
        return f.read().splitlines()
which returns each line as an element of a list of strings.
Output:
['1;', '2;', '3;']
That's because of "return number". Try:
if int(keuze) == 2:
    def new_safe():
        my_list = []
        with open('fa_kluizen.txt', 'r') as f:
            for number in f:
                my_list.append(number)
        return my_list

Nested loop fail in python

Sorry for the daft newbie question, but my nested loops won't work. It returns only the first iteration. What have I missed?
I'm trying to grep for multiple strings in my main file. I think I messed up the indentation, but all the variations I try return errors.
f = open('GRCh37_genes_all_mod.txt', 'rU') # main search file
f1 = open('genes_regions_out.txt', 'a') # out file
f2 = open('gene_list.txt', 'r') # search list
for gene in f2:
    for line in f:
        if gene in line:
            print line
            f1.write(line)
You can only iterate through a file once. After the first time through f, the next time you try and run for line in f, you won't get any content.
If you want to iterate through a file's content multiple times, you can put that content into a list.
with open('GRCh37_genes_all_mod.txt', 'rU') as f:
    contents = list(f)
with open('gene_list.txt', 'r') as f:
    genes = list(f)
for gene in genes:
    for line in contents:
        ...
After the first iteration, the file pointer is at the end of the file and the iterator is exhausted (calls to next(f) will raise StopIteration).
The simplest solution for this case is to reset the file pointer using f.seek(0):
for gene in f2:
    f.seek(0)
    for line in f:
        # ...
For other iterables (that might not be 'resettable'), if you know how many 'copies' you need, you can use itertools.tee(), or, if you know the iterable is finite (some iterables are infinite) and all its content will fit in memory, you can just make a list of it as explained by Khelwood.
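
To illustrate the itertools.tee() option (a minimal sketch, not from the original answer): tee caches items consumed by one copy until the other catches up, so both iterators see the full sequence even though the underlying generator is one-shot.

```python
from itertools import tee

squares = (n * n for n in range(5))  # a one-shot generator
first, second = tee(squares, 2)      # two independent iterators over it

print(list(first))   # exhausts the underlying generator
print(list(second))  # the second copy still yields every item
```

Note the caveat: if one copy runs all the way ahead, tee buffers everything in between, so a full second pass costs about as much memory as building a list anyway.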

Python, Extracting 3 lines before and after a match

I am trying to figure out how to extract 3 lines before and after a matched word.
At the moment, my word is found. I wrote up some text to test my code. And, I figured out how to print three lines after my match.
But, I am having difficulty trying to figure out how to print three lines before the word, "secure".
Here is what I have so far:
from itertools import islice
with open("testdoc.txt", "r") as f:
    for line in f:
        if "secure" in line:
            print("".join(line))
            print("".join(islice(f, 3)))
Here is the text I created for testing:
----------------------------
This is a test to see
if i can extract information
using this code
I hope, I try,
maybe secure shell will save thee
Im adding extra lines to see my output
hoping that it comes out correctly
boy im tired, sleep is nice
until then, time will suffice
You need to buffer your lines so you can recall them. The simplest way is to just load all the lines into a list:
with open("testdoc.txt", "r") as f:
    lines = f.readlines() # read all lines into a list
for index, line in enumerate(lines): # enumerate the list and loop through it
    if "secure" in line: # check if the current line has your substring
        print(line.rstrip()) # print the current line (stripped of whitespace)
        print("".join(lines[max(0, index-3):index])) # print the three lines preceding it
But if you need maximum storage efficiency you can use a buffer to store the last 3 lines as you loop over the file line by line. A collections.deque is ideal for that.
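
A minimal sketch of that deque approach (find_with_context is a made-up name; it returns the first match plus its context rather than printing every match):

```python
from collections import deque

def find_with_context(path, needle, before=3):
    """Return the first matching line plus up to `before` preceding lines."""
    buffer = deque(maxlen=before)  # a full deque drops its oldest line automatically
    with open(path, 'r') as f:
        for line in f:
            if needle in line:
                return list(buffer) + [line]
            buffer.append(line)  # only buffer non-matching lines
    return []
```

Because maxlen is set, the deque never holds more than three lines, regardless of file size.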
I came up with this solution: just add the previous lines to a list, and delete the first one after 4 elements.
from itertools import islice
with open("testdoc.txt", "r") as f:
    linesBefore = list()
    for line in f:
        linesBefore.append(line.rstrip())
        if len(linesBefore) > 4: # keep at most 4 lines (3 before + the current one)
            linesBefore.pop(0)
        if "secure" in line:
            if len(linesBefore) == 4: # there are at least 3 lines before the match
                for i in range(3):
                    print(linesBefore[i])
            else: # there are fewer than 3 lines before the match
                print('\n'.join(linesBefore[:-1]))
            print(line.rstrip())
            print("".join(islice(f, 3)))

Improving the speed of a python script

I have an input file containing a list of strings.
I am iterating through every fourth line, starting on line two.
From each of these lines I make a new string from the first and last 6 characters, and put this in an output file only if that new string is unique.
The code I wrote to do this works, but I am working with very large deep-sequencing files, and it has been running for a day without making much progress. So I'm looking for any suggestions to make this much faster, if possible. Thanks.
def method():
    target = open(output_file, 'w')
    with open(input_file, 'r') as f:
        lineCharsList = []
        for line in f:
            # Make string from first and last 6 characters of a line
            lineChars = line[0:6] + line[145:151]
            if not (lineChars in lineCharsList):
                lineCharsList.append(lineChars)
                target.write(lineChars + '\n') # If string is unique, write to output file
            for skip in range(3): # Used to step through four lines at a time
                try:
                    check = line # Check for additional lines in file
                    next(f)
                except StopIteration:
                    break
    target.close()
Try defining lineCharsList as a set instead of a list:
lineCharsList = set()
...
lineCharsList.add(lineChars)
That'll improve the performance of the in operator. Also, if memory isn't a problem at all, you might want to accumulate all the output in a list and write it all at the end, instead of performing multiple write() operations.
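
A sketch combining both suggestions — a set for O(1) membership tests plus a single batched write at the end (dedupe_keys is a made-up name; the slice positions and the every-fourth-line step follow the question's code):

```python
def dedupe_keys(input_file, output_file):
    seen = set()            # set membership is O(1), vs O(n) for a list
    out_lines = []          # accumulate output instead of writing line by line
    with open(input_file, 'r') as f:
        for i, line in enumerate(f):
            if i % 4 != 1:  # keep every fourth line, starting on line two
                continue
            key = line[0:6] + line[145:151]
            if key not in seen:
                seen.add(key)
                out_lines.append(key + '\n')
    with open(output_file, 'w') as target:
        target.writelines(out_lines)  # one buffered write at the end
```

For a run that has been taking a day, the set is almost certainly the decisive fix; the batched write is a smaller secondary gain.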
You can use itertools.islice (https://docs.python.org/2/library/itertools.html#itertools.islice):
import itertools
def method():
    with open(input_file, 'r') as inf, open(output_file, 'w') as ouf:
        seen = set()
        for line in itertools.islice(inf, 1, None, 4): # every fourth line, starting on line two
            s = line[:6] + line.rstrip('\n')[-6:]
            if s not in seen:
                seen.add(s)
                ouf.write("{}\n".format(s))
Besides using set as Oscar suggested, you can also use islice to skip lines rather than use a for loop.
As stated in this post, islice preprocesses the iterator in C, so it should be much faster than using a plain vanilla python for loop.
Try replacing
lineChars = line[0:6]+line[145:151]
with
lineChars = ''.join([line[0:6], line[145:151]])
as it can be more efficient, depending on the circumstances.
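
If in doubt, it is easy to measure both variants with timeit (a rough sketch; the absolute numbers depend on the machine, and for only two short pieces plain + is often just as fast):

```python
import timeit

line = "x" * 151  # a dummy line of the same length as in the question

concat = timeit.timeit(lambda: line[0:6] + line[145:151], number=100_000)
joined = timeit.timeit(lambda: ''.join([line[0:6], line[145:151]]), number=100_000)

print("+ concatenation: %.4fs" % concat)
print("''.join():       %.4fs" % joined)
```

''.join() tends to win when many fragments are combined, since it avoids building a chain of intermediate strings.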

How to only read lines in a text file after a certain string?

I'd like to read to a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print line
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f: # now you are at the lines you want
                    # do work
A file object is its own iterator, so when we reach the line with 'Abstract' in it we continue our iteration from that line until we have consumed the iterator.
A simple example:
gen = (n for n in xrange(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile
for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
found_abstract = False
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                # do whatever you want
You can use itertools.dropwhile and itertools.islice here, a pseudo-example:
from itertools import dropwhile, islice
for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None): # ignore the line still with Abstract in it
            print line
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    while not 'Abstract' in next(f):
        pass
    for line in f:
        # line will now be the next line after the one that contains 'Abstract'
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print line
    else: # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f: # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f: # The file's iterator starts up again right where we left it.
    print line
The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.

Categories

Resources