Currently I read a file like this:
f = open("myfile.txt")
for line in f:
    # do something with the line
    pass
What do I need to do to start reading not at the first line, but at line X (e.g. the 5th)?
Using itertools.islice, you can specify start, stop and step if needed, and apply that to your input file:
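from itertools import islice
with open('yourfile') as fin:
    # islice indices are 0-based: a start of 5 skips the first five lines,
    # so iteration begins at the 6th line (use 4 to begin at the 5th)
    for line in islice(fin, 5, None):
        pass  # do something with the line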
An opened file object f is an iterator. Read (and throw away) the first four lines, then continue with regular reading:
with open("myfile.txt", 'r') as f:
    for i in range(4):  # use xrange in Python 2
        next(f, None)  # the None default avoids StopIteration on short files
    for line in f:
        pass  # do something with the line
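The same idea can be wrapped in a small helper. This is a minimal sketch: the name `lines_from` and its 1-based `start` parameter are illustrative assumptions, not part of either answer. `io.StringIO` stands in for a real file so that the example runs on its own.

```python
import io
from itertools import islice


def lines_from(file_obj, start):
    """Yield lines from file_obj beginning at 1-based line number `start` (start >= 1)."""
    # islice uses 0-based indices, so skip start - 1 lines lazily.
    return islice(file_obj, start - 1, None)


f = io.StringIO("one\ntwo\nthree\nfour\nfive\nsix\n")
for line in lines_from(f, 5):
    print(line, end="")  # prints "five" and then "six"
```

Because islice consumes the skipped lines without storing them, this works the same on a file of any size.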
I'm reading a log file that has several lines and writing it to a text file.
The output is written to the file as one long line: ['\Bla Bla Bla\n Bla Bla Bla\n']
In the original file it looks OK, line by line. The reading is done like this:
text_file = open('my_file.log', 'rb')
lines = text_file.readlines()
lines = lines[100:150] #line numbers range
x = open(r"LocalLogFile.txt", "a")
x.write(lines)
Any ideas how to solve it so that it looks like the original file?
There are several problems here:
Opening a text file in binary mode
Reading the entire file into memory
Trying to write a list to the new file instead of a number of individual lines.
Use itertools.islice to create a new iterator that skips the first 100 lines and then yields the next 50 lines. You can iterate over it with an ordinary for loop and write each line to the output file.
from itertools import islice
with open('my_file.log', 'r') as f:
    with open('LocalLogFile.txt', 'a') as x:
        for line in islice(f, 100, 150):
            print(line, file=x, end='')
Some might prefer a single with statement:
with open('my_file.log', 'r') as f, open('LocalLogFile.txt', 'a') as x:
    for line in islice(f, 100, 150):
        print(line, file=x, end='')
or
with open('my_file.log', 'r') as f, \
     open('LocalLogFile.txt', 'a') as x:
    for line in islice(f, 100, 150):
        print(line, file=x, end='')
Since Python 3.10, you can parenthesize multiple context managers, which lets you spread them over several lines without an explicit line continuation:
with (
    open('my_file.log', 'r') as f,
    open('LocalLogFile.txt', 'a') as x
):
    for line in islice(f, 100, 150):
        print(line, file=x, end='')
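The islice approach can be packaged as a function that takes already-open file objects. This is a sketch: `copy_line_range` is a hypothetical name. It uses `writelines`, which writes each string unchanged, in place of `print(..., end='')`. `io.StringIO` objects stand in for the log file and the output file.

```python
import io
from itertools import islice


def copy_line_range(src, dst, start, stop):
    """Copy the lines with 0-based indices start..stop-1 from src to dst."""
    # islice reads lazily, so src is never read past line `stop`.
    dst.writelines(islice(src, start, stop))


src = io.StringIO("".join(f"line {n}\n" for n in range(200)))
dst = io.StringIO()
copy_line_range(src, dst, 100, 150)
print(dst.getvalue().splitlines()[0])  # line 100
```

With real files, pass `open('my_file.log')` and `open('LocalLogFile.txt', 'a')` as `src` and `dst`.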
You'd need to write out each line, not the Python representation of a list.
Also, use with to manage files.
with open('my_file.log', 'r') as text_file:  # text mode, so lines are str, not bytes
    lines = list(text_file)  # same as readlines()
lines = lines[100:150]
with open("LocalLogFile.txt", "a") as x:
    x.write("".join(lines))
Although the other answers are valid, a proper way to do this is:
with open('my_file.log', 'r') as f1:
    # f1 is an iterator, so instead of reading the whole file
    # (which may have thousands of lines) you can skip
    # straight past the first 100 lines
    for _ in range(100):
        next(f1)
    # f1 is now positioned at the line with 0-based index 100
    with open('LocalLogFile.txt', 'a') as f2:
        for i in range(50):
            # f1.readline() reads the next line itself, so no extra next(f1) is needed
            f2.write(f1.readline())
x.write(''.join(lines))
You are currently writing it as the representation of a list, but you need to write it as a string.
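To see where the bracketed one-line output comes from, compare the `str()` of a list of lines with the joined string. This is a small illustration that uses made-up sample lines and an `io.StringIO` in place of the output file.

```python
import io

lines = ["Bla Bla Bla\n", "Bla Bla Bla\n"]

# str() of a list gives its Python representation: brackets, quotes and \n escapes.
print(str(lines))  # ['Bla Bla Bla\n', 'Bla Bla Bla\n']

out = io.StringIO()
out.write("".join(lines))  # each element already ends with "\n", so no separator is needed
print(out.getvalue(), end="")
```

`out.writelines(lines)` produces the same result as `out.write("".join(lines))`.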
I have a problem with some Python code. I want to read a .txt file. I use this code:
f = open('test.txt', 'r') # We need to re-open the file
data = f.read()
print(data)
I would like to read ONLY the first line from this .txt file. I use:
f = open('test.txt', 'r') # We need to re-open the file
data = f.readline(1)
print(data)
But only the first letter of the line shows up on the screen.
Could you help me read all the letters of the line? (I mean read the whole line of the .txt file.)
with open("file.txt") as f:
    print(f.readline())
This opens the file in a with block (which closes the file automatically when we are done with it) and reads the first line. It is roughly equivalent to:
f = open("file.txt")
print(f.readline())
f.close()
Your attempt with f.readline(1) doesn't work because the argument is the maximum number of characters to read, so it reads only the first character.
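A quick way to see how the size argument behaves is to use an in-memory file (`io.StringIO` stands in for `test.txt` here). The call stops after at most that many characters, and the next call carries on from where the previous one stopped:

```python
import io

f = io.StringIO("first line\nsecond line\n")
print(repr(f.readline(1)))  # 'f' : at most 1 character is read
print(repr(f.readline()))   # 'irst line\n' : the rest of the current line
```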
Second method:
with open("file.txt") as f:
    print(f.readlines()[0])
Or you could do the above, which reads all the lines into a list and prints only the first one.
To read the fifth line, use:
with open("file.txt") as f:
    print(f.readlines()[4])
Or:
with open("file.txt") as f:
    lines = []
    lines.append(f.readline())  # append, not +=: += would add each character separately
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    print(lines[-1])
The -1 represents the last item of the list
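Both versions above read more than they need: `readlines()` loads the whole file, and the second one keeps every line it has read. As an alternative (a swapped-in technique, not part of this answer), `itertools.islice` can skip to line N lazily. `nth_line` is a hypothetical helper name, and `io.StringIO` stands in for `file.txt`.

```python
import io
from itertools import islice


def nth_line(file_obj, n):
    """Return 1-based line n of file_obj, or None if the file is shorter."""
    # islice skips the first n - 1 lines without storing them in a list.
    return next(islice(file_obj, n - 1, None), None)


f = io.StringIO("l1\nl2\nl3\nl4\nl5\nl6\n")
print(nth_line(f, 5), end="")  # l5
```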
Learn more:
with statement
files in python
readline method
Your first try was almost there; you should have done the following:
f = open('my_file.txt', 'r')
line = f.readline()
print(line)
f.close()
A safer approach to reading a file is:
with open('my_file.txt', 'r') as f:
    print(f.readline())
Both ways will print only the first line.
Your error was passing 1 to readline, which limits the read to at most 1 character, so you get only a single character. Please refer to https://www.w3schools.com/python/ref_file_readline.asp
I tried this after your suggestions, and it works:
f = open('test.txt', 'r')
data = f.readlines()[0]  # index 0 is the first line; [1] would be the second
print(data)
Use with open(...) instead:
with open("test.txt") as file:
    line = file.readline()
print(line)
Call f.readline() without any argument.
It returns the first line as a string and moves the cursor to the second line.
The next time you call f.readline(), it returns the second line and moves the cursor to the next one, and so on.
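One detail worth adding: when the cursor reaches the end of the file, readline() returns an empty string instead of raising an error. Because a blank line still comes back as `'\n'`, the two cases can be told apart. A small demonstration with an in-memory file (`io.StringIO` stands in for `test.txt`):

```python
import io

f = io.StringIO("first\nsecond\n")
print(repr(f.readline()))  # 'first\n'
print(repr(f.readline()))  # 'second\n'
print(repr(f.readline()))  # '' : an empty string means end of file
```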
I have big files with many lines and want to read the first line first, then loop through all the lines, starting with the first line again.
I first thought that something like this would do it:
file = open("fileName", 'r')
first_line = file.readline()
DoStuff_1(first_line)
for line in file:
    DoStuff_2(line)
file.close()
But the issue with this script is that the first line passed to DoStuff_2 is the second line, not the first one. I don't have a good intuition for what kind of object file is. I think it is an iterator, but I don't really know how to deal with it. The bad solution I found is:
file = open("fileName", 'r')
first_line = file.readline()
count = 0
for line in file:
    if count == 0:
        count = 1
        DoStuff_1(first_line)
    DoStuff_2(line)
file.close()
But it is pretty clumsy, and it is a bit costly computationally because it runs an if statement on every iteration.
You could do this:
with open('fileName', 'r') as file:
    first_line = file.readline()
    DoStuff_1(first_line)
    DoStuff_2(first_line)
    # remaining lines
    for line in file:
        DoStuff_2(line)
Note that I changed your code to use with so file is automatically closed.
I'd use a generator to abstract your general control flow. Something like:
def first_and_file(file_obj):
    """
    :type file_obj: file
    :rtype: (str, __generator[str])
    """
    first_line = next(file_obj)

    def gen_rest():
        yield first_line
        yield from file_obj

    return first_line, gen_rest()
In Python 2.7, replace the yield from with:
for line in file_obj:
    yield line
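A closely related option uses `itertools.chain` to put the already-read first line back in front of the remaining iterator. This is a swapped-in technique and not part of the answer above, although it does essentially what `gen_rest` does. `io.StringIO` stands in for the real file.

```python
import io
from itertools import chain

f = io.StringIO("header\nrow 1\nrow 2\n")
first_line = next(f)  # consume the first line once
print("first:", first_line, end="")
for line in chain([first_line], f):  # first line again, then the rest of f
    print("all:", line, end="")
```

No per-line `if` check is involved. The first line is stored once and then replayed ahead of the rest of the file.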
Another answer is to just open the file twice.
with open("file.txt", "r") as r:
    Do_Stuff1(r.readline())
with open("file.txt", "r") as r:
    for line in r:
        Do_Stuff2(line)
One solution for the general case of this question is to keep track of the line number you are on. After completing an operation that requires you to go back to a previous line relative to the current one, use that line-number variable: call file.seek(0), then call file.readline() the required number of times.
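A minimal sketch of the rewind-and-read-forward idea. `reread_line` is a hypothetical helper that takes a 0-based line number, and `io.StringIO` stands in for the real file. Note that the call leaves the file positioned just after the re-read line.

```python
import io


def reread_line(file_obj, lineno):
    """Return 0-based line `lineno` by rewinding and reading forward."""
    file_obj.seek(0)  # go back to the start of the file
    line = ""
    for _ in range(lineno + 1):
        line = file_obj.readline()
    return line


f = io.StringIO("a\nb\nc\n")
f.readline()
f.readline()                    # we have now read lines 0 and 1
print(repr(reread_line(f, 0)))  # 'a\n'
```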
I'd like to read into a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f:  # now you are at the lines you want
                    pass  # do work
A file object is its own iterator, so when we reach the line with 'Abstract' in it, the inner loop continues the iteration from the following line until the iterator is exhausted.
A simple example:
gen = (n for n in range(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile

for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')  # skip the 'Abstract' line itself
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
for files in filepath:
    found_abstract = False  # reset the flag for each file
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                pass  # do whatever you want (this includes the 'Abstract' line itself)
You can use itertools.dropwhile and itertools.islice here; a pseudo-example:
from itertools import dropwhile, islice

for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None):  # ignore the line still with Abstract in
            print(line)
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    # skip lines until one contains 'Abstract'
    # (raises StopIteration if no line does)
    while 'Abstract' not in next(f):
        pass
    for line in f:
        pass  # line is now each line after the one that contains 'Abstract'
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else:  # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f:  # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f:  # The file's iterator starts up again right where we left it.
    print(line)
The reason this works is that the file object returned by open is an iterator that behaves like a generator rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file's internal position is at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the line right after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
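To try this logic without real files, it can be wrapped in a function and fed in-memory files. This is a sketch: `lines_after` is a hypothetical name, the `io.StringIO` objects stand in for the real files, and the file names are made up. The `for ... else` branch makes the "marker never found" case explicit.

```python
import io


def lines_after(file_obj, marker):
    """Return a tuple of the lines that follow the first line containing marker."""
    for line in file_obj:
        if marker in line:
            break
    else:
        return ()  # marker never found; the file is exhausted anyway
    # The loop stopped right after the marker line, so the rest is still unread.
    return tuple(file_obj)


files = {
    "paper1.txt": io.StringIO("Title\nAbstract\nBody 1\nBody 2\n"),
    "paper2.txt": io.StringIO("No marker here\n"),
}
lines = {name: lines_after(f, "Abstract") for name, f in files.items()}
print(lines["paper1.txt"])  # ('Body 1\n', 'Body 2\n')
```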
Is there an elegant way of skipping the first line of a file when using Python's fileinput module?
I have a data file with nicely formatted data, but the first line is a header. Using fileinput, I would have to include a check and discard the line if it does not seem to contain data.
The problem is that the same check would then be applied to the rest of the file.
With a plain file object from open() you can read the first line and then loop over the rest of the file. Is there a similar trick with fileinput?
Is there an elegant way to skip processing of the first line?
Example code:
import fileinput
# how to skip first line elegantly?
for line in fileinput.input(["file.dat"]):
    data = process_line(line)
    output(data)
lines = iter(fileinput.input(["file.dat"]))
next(lines)  # extract and discard first line
for line in lines:
    data = process_line(line)
    output(data)
Or use the itertools.islice approach, if you prefer:
import fileinput
import itertools
finput = fileinput.input(["file.dat"])
lines = itertools.islice(finput, 1, None)  # cuts off first line
dataset = (process_line(line) for line in lines)
results = [output(data) for data in dataset]
Since the lines flow through generators and iterators, no intermediate list of lines is built.
The fileinput module contains a bunch of handy functions, one of which seems to do exactly what you're looking for:
for line in fileinput.input(["file.dat"]):
    if not fileinput.isfirstline():
        data = process_line(line)
        output(data)
fileinput module documentation
It's right in the docs: http://docs.python.org/library/fileinput.html#fileinput.isfirstline
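One difference matters when more than one file is passed. isfirstline() is true for the first line of each file, so it drops every file's header. The islice approach above skips only the very first line of the combined stream. A small sketch with two throwaway files (the file names and contents are made up) shows the per-file behavior:

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Two small files, each starting with a header line.
    paths = []
    for name, body in [("a.dat", "header\n1\n2\n"), ("b.dat", "header\n3\n")]:
        path = os.path.join(tmpdir, name)
        with open(path, "w") as fh:
            fh.write(body)
        paths.append(path)

    data = []
    with fileinput.input(paths) as f:
        for line in f:
            if not fileinput.isfirstline():  # True on line 1 of *each* file
                data.append(line.strip())

print(data)  # ['1', '2', '3']
```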
One option is to use openhook:
The openhook, when given, must be a function that takes two arguments,
filename and mode, and returns an accordingly opened file-like object.
You cannot use inplace and openhook together.
One could create a helper function skip_header and use it as the openhook, something like:
import fileinput

files = ['file_1', 'file_2']

def skip_header(filename, mode):
    f = open(filename, mode)
    next(f)  # discard the header line
    return f

for line in fileinput.input(files=files, openhook=skip_header):
    pass  # do something with line
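Here is a self-contained run of the openhook idea using temporary files. The file contents are made up. One small deviation from the answer: `next(f, None)` keeps an empty file from raising StopIteration. Because the hook runs once per file, every file's header is dropped.

```python
import fileinput
import os
import tempfile


def skip_header(filename, mode):
    """openhook that opens a file and discards its first line."""
    f = open(filename, mode)
    next(f, None)  # default None avoids StopIteration on an empty file
    return f


with tempfile.TemporaryDirectory() as tmpdir:
    files = []
    for name, body in [("file_1", "header\nx\n"), ("file_2", "header\ny\nz\n")]:
        path = os.path.join(tmpdir, name)
        with open(path, "w") as fh:
            fh.write(body)
        files.append(path)

    with fileinput.input(files=files, openhook=skip_header) as f:
        rows = [line.rstrip("\n") for line in f]

print(rows)  # ['x', 'y', 'z']
```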
Do two loops where the first one calls break immediately.
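# 'rU' mode was removed in Python 3.11; 'r' is the default
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the line
        break
    for line in f:
        process(line)  # with inplace=True, only printed output ends up in the file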
Let's say you want to remove or empty all of the first 5 lines:
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the first 5 lines
        if f.filelineno() == 5:  # public method instead of the private f._filelineno
            break
    for line in f:
        process(line)
But if you only want to get rid of the first line, just call next() before the loop to skip it.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    next(f)
    for line in f:
        process(line)
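With `inplace=True`, whatever is printed inside the loop replaces the file's contents, so `process` has to print the lines you want to keep. The following is a runnable sketch of dropping a header in place from a temporary file (the contents are made up). Here `print(line, end="")` plays the role of the answer's `process(line)`.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")
    with open(path, "w") as fh:
        fh.write("header\nrow 1\nrow 2\n")

    # While the loop runs, print() output is redirected into the file.
    with fileinput.input(files=[path], inplace=True) as f:
        next(f, None)            # drop the first line (it is never printed)
        for line in f:
            print(line, end="")  # keep every other line unchanged

    with open(path) as fh:
        result = fh.read()

print(repr(result))  # 'row 1\nrow 2\n'
```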
with open(file) as j:  # open file as j
    for i in j.readlines()[1:]:  # start reading j from the second line
        pass  # do something with i