Is there a Pythonic one-liner to iterate over lines of a file?

90% of the time when I read a file, it ends up like this:
with open('file.txt') as f:
    for line in f:
        my_function(line)
This seems to be a very common scenario, so I thought of a shorter way. But is it safe? I mean, will the file be closed correctly, and do you see any other problems with this approach?
for line in open('file.txt'):
    my_function(line)
Edit: Thanks Eric, this seems to be the best solution. Hopefully I don't turn this into a discussion, but what do you think of this approach for the case where we want to use the line in several operations (not just as an argument to my_function):
def line_generator(filename):
    with open(filename) as f:
        for line in f:
            yield line
and then using:
for line in line_generator('groceries.txt'):
    print line
    grocery_list += [line]
Does this function have disadvantages over iterate_over_file?

If you need this often, you could always define:
def iterate_over_file(filename, func):
    with open(filename) as f:
        for line in f:
            func(line)

def my_function(line):
    print line,
Your pythonic one-liner is now:
iterate_over_file('file.txt', my_function)

Using a context manager is the best way, and that pretty much bars the way to your one-liner solution. If you naively try to create a one-liner you get:
with open('file.txt') as f: for line in f: my_function(line) # wrong code!!
which is invalid syntax.
So if you badly want a one-liner you could do
with open('file.txt') as f: [my_function(line) for line in f]
but that's bad practice, since you're building a list purely for the side effects (you don't care about the return value of my_function).
Another approach would be
with open('file.txt') as f: collections.deque((my_function(line) for line in f), maxlen=0)
so no list is built, and you force consumption of the iterator using an itertools recipe (a 0-size deque: no memory is allocated for the results either).
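For reference, this is essentially the consume recipe from the itertools documentation; keeping it around as a helper makes the intent clearer (a sketch of the recipe):

import collections
from itertools import islice

def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # feed the whole iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

With it, the one-liner reads: with open('file.txt') as f: consume(my_function(line) for line in f)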
Conclusion:
to reach the "pythonic/one-liner" goal, we sacrificed readability.
Sometimes the best approach doesn't hold in one line, period.

Building upon Eric's approach, you could also make it a bit more generic by writing a function that uses with to open the file and then simply returns it. This, however:
def with_open(filename):
    with open(filename) as f:
        return f  # won't work!
does not work, as the file f will already have been closed by with when the function returns. Instead, you can make it a generator function and yield the individual lines:
def with_open(filename):
    with open(filename) as f:
        for line in f:
            yield line
or shorter, with newer versions of Python (3.3+, which added yield from):
def with_open(filename):
    with open(filename) as f:
        yield from f
And use it like this:
for line in with_open("test.txt"):
    print(line)
or this:
nums = [int(n) for n in with_open("test.txt")]
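One caveat: if you stop iterating early, the file stays open until the generator is garbage-collected; calling close() on the generator closes it deterministically (a quick sketch):

lines = with_open("test.txt")
first = next(lines)  # the file is open while the generator is suspended
lines.close()        # raises GeneratorExit inside the generator, so the
                     # with block runs its exit code and closes the file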

Related

Return with argument inside generator

I know there's a similar question already asked, but it doesn't answer what I need, as mine is a little different.
My code:
def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            return json.load(f)
        for line in f:
            yield line.rstrip('\n')
What I want to do:
If JSON is true, it means it's reading from a JSON file and I want to return json.load(f); otherwise, I want to yield the lines of the file as a generator.
I've tried the alternative of converting the generator into json, but that got very messy, very fast, and doesn't work very well.
The first solution that came to my mind was to explicitly return a generator object which would provide the exact behavior you tried to achieve.
The problem is: if you explicitly returned a generator object like this return (line.rstrip('\n') for line in f) the file would be closed after returning and any further reading from the file would raise an exception.
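A quick sketch of that failure mode (data.txt is a placeholder):

def bad_read(filename):
    with open(filename) as f:
        # leaving the with block closes the file before
        # anyone has consumed the generator
        return (line.rstrip('\n') for line in f)

gen = bad_read('data.txt')
next(gen)  # raises ValueError: I/O operation on closed file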
You should write two functions here: one that reads a JSON file and one for a normal file. Then you can write a wrapper that decides, based on an argument, which of these two functions to call.
Or just move the iteration part into another function like this:
def iterate_file(file_name):
    with open(file_name) as fin:
        for line in fin:
            yield line.rstrip("\n")

def file_read(file_name, as_json=False):
    if as_json:
        with open(file_name) as fin:
            return json.load(fin)
    else:
        return iterate_file(file_name)
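Usage might look like this (notes.txt and config.json are placeholder file names):

import json  # file_read relies on this

# lazily iterate a plain text file
for line in file_read("notes.txt"):
    print(line)

# read a JSON file in one go
config = file_read("config.json", as_json=True)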
You could yield from the dictionary loaded with JSON, thus iterating over the key-value pairs in the dict, but this would not be your desired behaviour:
def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            yield from json.load(f).items()  # works, but differently
        for line in f:
            yield line.rstrip('\n')
It would be nice if you could just return a generator, but this will not work, since using with, the file is closed as soon as the function returns, i.e. before the generator is consumed.
def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            return json.load(f)
        else:
            return (line.rstrip('\n') for line in f)  # won't work
Alternatively, you could define another function just for yielding the lines from the file and use that in the generator:
def tFileRead(fileName, JSON=False):
    if JSON:
        with open(fileName) as f:
            return json.load(f)
    else:
        def withopen(fileName):
            with open(fileName) as f:
                yield from f
        return (line.rstrip('\n') for line in withopen(fileName))
But once you are there, you can really just use two separate functions: one for reading the file en bloc as JSON, and one for iterating over the lines...

converting string to integer from file

I am asked to read lines from one file and print every other line from that file to another file. This is what I have so far:
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
for line in my_file:
    line = int(line)
    if line % 2 != 0:
        print(line, file=out_file)
my_file.close()
out_file.close()
from itertools import islice

with open('input') as fin, open('output', 'w') as fout:
    # islice(fin, None, None, 2) yields every 2nd line, starting with the 1st
    fout.writelines(islice(fin, None, None, 2))
This is a relatively simple task - your main issue is the use of int() - you are trying to convert the entire line to a number, which then fails. What you want to do is keep track of the number of the line you are on. The easiest way to do this is with the enumerate() builtin.
with open("thisFile.txt") as in_file, open("thatFile.txt") as out_file:
for line_no, line in enumerate(in_file, start=1):
if line_no % 2:
out_file.write(line)
Note the use of the with statement here too - this automatically closes the files, and will do so even if exceptions occur; it's good practice to use it. It's also significantly more readable.
The minimal change to your program to make it do what you want, bringing in no new concepts, is this:
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
i = 1
for line in my_file:
    if i % 2 != 0:
        print(line, file=out_file)
    i += 1
my_file.close()
out_file.close()
Your mistake was simply that you thought int(line) would give you the line number. What that actually does is attempt to interpret the text of each line as an integer, which is not what you want.
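For example, the very first non-numeric line fails like this:

line = "some text\n"
number = int(line)  # raises ValueError: invalid literal for int() with base 10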
Now, a great deal of Python's virtue lies in its enormous library of helpful functions. The above works, but is not taking proper advantage of what the language has to offer. The first thing I would change is to use enumerate rather than an explicit counter, the second thing I would change is to use the write method of files rather than print, and the third thing I would change is to take advantage of the "nonzero is true" rule:
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
for i, line in enumerate(my_file, start=1):
    if i % 2:
        out_file.write(line)
my_file.close()
out_file.close()
I said that this works, but really it only works as long as nothing goes wrong. If anything throws an exception (for instance, if an I/O error occurs) the file objects don't get closed.1 The with statement tells Python to make sure that the files get closed, whether or not an exception happens; this is conveniently also less typing.
with open("thisFile.txt", "r") as my_file,
open("thatFile.txt", "w") as out_file:
for i, line in enumerate(my_file, start=1):
if i % 2:
out_file.write(line)
You may notice that this is the same as Lattyware's answer. (Jon Clement's answer is maybe a little too clever; he appears to be playing golf.)
1 In this simple piece of code, if an exception happens CPython's garbage collector will notice that the file objects are no longer accessible and will close them for you, but this isn't something to rely on as your programs get more complicated.

How to read from a file using only readline()

The code I have right now is this:
f = open(SINGLE_FILENAME)
lines = [i for i in f.readlines()]
but my professor demands that:
You may use readline(). You may not use read(), readlines() or iterate over the open file using for.
Any suggestions? Thanks.
You could use a two-argument iter() version:
lines = iter(f.readline, "")
If you need a list of lines:
lines = list(lines)
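Put together with a context manager, a complete sketch (assuming SINGLE_FILENAME is defined as in the question):

with open(SINGLE_FILENAME) as f:
    # iter(callable, sentinel) calls f.readline() repeatedly until
    # it returns the sentinel "" (readline's value at end of file)
    lines = list(iter(f.readline, ""))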
First draft:
lines = []
with open(SINGLE_FILENAME) as f:
    while True:
        line = f.readline()
        if line:
            lines.append(line)
        else:
            break
I feel fairly certain there is a better way to do it, but that does avoid iterating with for, using read, or using readlines.
You could write a generator function to keep calling readline() until the file was empty, but that doesn't really seem like a large improvement here.
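For completeness, a minimal sketch of such a generator (readline_iter is a made-up name):

def readline_iter(filename):
    # hypothetical helper: produce lines using only readline()
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:  # readline() returns "" at end of file
                return
            yield line

lines = list(readline_iter(SINGLE_FILENAME))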

Pythonic way to read file line by line?

What's the Pythonic way to go about reading files line by line of the two methods below?
with open('file', 'r') as f:
    for line in f:
        print line
or
with open('file', 'r') as f:
    for line in f.readlines():
        print line
Or is there something I'm missing?
File handles are their own iterators (specifically, they implement the iterator protocol) so
with open('file', 'r') as f:
    for line in f:
        # code
        pass
is the preferred usage. f.readlines() returns a list of lines, which means absorbing the entire file into memory - generally ill-advised, especially for large files.
It should be pointed out that I agree with the sentiment that context managers are worthwhile, and have included one in my code example.
Of the two you presented, the first is recommended practice. As pointed out in the comments, any solution (like that below) which doesn't use a context manager means that the file is left open, which is a bad idea.
Original answer, which leaves dangling file handles and so shouldn't be followed:
However, if you don't need f for any purpose other than reading the lines, you can just do:
for line in open('file', 'r'):
    print line
There's no need for the .readlines() method call.
PLUS: About the with statement
The execution behavior of the with statement is as commented below:
with open("xxx.txt",'r') as f:
// now, f is an opened file in context
for line in f:
// code with line
pass // when control exits *with*, f is closed
print f // if you print, you'll get <closed file 'xxx.txt'>

More pythonic way of skipping header lines

Is there a shorter (perhaps more pythonic) way of opening a text file and reading past the lines that start with a comment character?
In other words, a neater way of doing this
fin = open("data.txt")
line = fin.readline()
while line.startswith("#"):
line = fin.readline()
At this stage in my arc of learning Python, I find this most Pythonic:
def iscomment(s):
    return s.startswith('#')

from itertools import dropwhile
with open(filename, 'r') as f:
    for line in dropwhile(iscomment, f):
        # do something with line
        pass
to skip all of the lines at the top of the file starting with #. To skip all lines starting with #:
from itertools import ifilterfalse  # Python 3: itertools.filterfalse
with open(filename, 'r') as f:
    for line in ifilterfalse(iscomment, f):
        # do something with line
        pass
That's almost all about readability for me; functionally there's almost no difference between:
for line in ifilterfalse(iscomment, f)
and
for line in (x for x in f if not x.startswith('#'))
Breaking out the test into its own function makes the intent of the code a little clearer; it also means that if your definition of a comment changes you have one place to change it.
for line in open('data.txt'):
    if line.startswith('#'):
        continue
    # work with line
Of course, if your comment lines are only at the beginning of the file, you might use some optimisations:
from itertools import dropwhile
for line in dropwhile(lambda line: line.startswith('#'), file('data.txt')):
    pass
If you want to filter out all comment lines (not just those at the start of the file):
for line in file("data.txt"):
if not line.startswith("#"):
# process line
If you only want to skip those at the start then see ephemient's answer using itertools.dropwhile
You could use a generator function
def readlines(filename):
    fin = open(filename)
    for line in fin:
        if not line.startswith("#"):
            yield line
and use it like
for line in readlines("data.txt"):
    # do things
    pass
Depending on exactly where the files come from, you may also want to strip() the lines before the startswith() check. I once had to debug a script like that months after it was written because someone had put a couple of space characters in front of the '#'.
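Folding that into the dropwhile approach only changes the predicate (a sketch):

from itertools import dropwhile

with open('data.txt') as f:
    # tolerate whitespace before the '#'
    for line in dropwhile(lambda s: s.lstrip().startswith('#'), f):
        pass  # do something with line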
As a practical matter, if I knew I was dealing with reasonably sized text files (anything which will comfortably fit in memory) then I'd probably go with something like:
f = open("data.txt")
lines = [ x for x in f.readlines() if x[0] != "#" ]
... to snarf in the whole file and filter out all lines that begin with the octothorpe.
As others have pointed out, one might want to ignore leading whitespace occurring before the octothorpe, like so:
lines = [ x for x in f.readlines() if not x.lstrip().startswith("#") ]
I like this for its brevity.
This assumes that we want to strip out all of the comment lines.
We can also "chop" the last characters (almost always newlines) off the end of each using:
lines = [ x[:-1] for x in ... ]
... assuming that we're not worried about the infamously obscure issue of a missing final newline on the last line of the file. (The only time a line from the .readlines() or related file-like object methods might NOT end in a newline is at EOF).
In reasonably recent versions of Python one can "chomp" (only newlines) off the ends of the lines using a conditional expression like so:
lines = [ x[:-1] if x[-1]=='\n' else x for x in ... ]
... which is about as complicated as I'll go with a list comprehension for legibility's sake.
If we were worried about the possibility of an overly large file (or low memory constraints) impacting our performance or stability, and we're using a version of Python that's recent enough to support generator expressions (which are more recent additions to the language than the list comprehensions I've been using here), then we could use:
for line in (x[:-1] if x[-1]=='\n' else x
             for x in f if not x.lstrip().startswith('#')):
    # do stuff with each line
    pass
... which is at the limits of what I'd expect anyone else to parse in one line a year after the code's been checked in.
If the intent is only to skip "header" lines then I think the best approach would be:
f = open('data.txt')
for line in f:
    if line.lstrip().startswith('#'):
        continue
... and be done with it.
You could make a generator that loops over the file that skips those lines:
fin = open("data.txt")
fileiter = (l for l in fin if not l.startswith('#'))
for line in fileiter:
...
You could do something like
def drop(n, seq):
    for i, x in enumerate(seq):
        if i >= n:
            yield x
And then say
for line in drop(1, file(filename)):
    # whatever
    pass
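For what it's worth, drop(n, seq) is essentially islice(seq, n, None) from itertools, so the standard library already covers this:

from itertools import islice

for line in islice(open(filename), 1, None):
    # whatever
    pass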
I like @iWerner's generator function idea. One small change to his code and it does what the question asked for.
def readlines(filename):
    f = open(filename)
    # discard first lines that start with '#'
    for line in f:
        if not line.lstrip().startswith("#"):
            break
    yield line
    for line in f:
        yield line
and use it like
for line in readlines("data.txt"):
    # do things
    pass
But here is a different approach. This is almost very simple. The idea is that we open the file, and get a file object, which we can use as an iterator. Then we pull the lines we don't want out of the iterator, and just return the iterator. This would be ideal if we always knew how many lines to skip. The problem here is we don't know how many lines we need to skip; we just need to pull lines and look at them. And there is no way to put a line back into the iterator, once we have pulled it.
So: open the iterator, pull lines and count how many have the leading '#' character; then use the .seek() method to rewind the file, pull the correct number again, and return the iterator.
One thing I like about this: you get the actual file object back, with all its methods; you can just use this instead of open() and it will work in all cases. I renamed the function to open_my_text() to reflect this.
def open_my_text(filename):
    f = open(filename, "rt")
    # count number of lines that start with '#'
    count = 0
    for line in f:
        if not line.lstrip().startswith("#"):
            break
        count += 1
    # rewind file, and discard lines counted above
    f.seek(0)
    for _ in range(count):
        f.readline()
    # return file object with comment lines pre-skipped
    return f
Instead of f.readline() I could have used f.next() (for Python 2.x) or next(f) (for Python 3.x) but I wanted to write it so it was portable to any Python.
EDIT: Okay, I know nobody cares and I'm not getting any upvotes for this, but I have re-written my answer one last time to make it more elegant.
You can't put a line back into an iterator. But, you can open a file twice, and get two iterators; given the way file caching works, the second iterator is almost free. If we imagine a file with a megabyte of '#' lines at the top, this version would greatly outperform the previous version that calls f.seek(0).
def open_my_text(filename):
    # open the same file twice to get two file objects
    # (We are opening the file read-only so this is safe.)
    ftemp = open(filename, "rt")
    f = open(filename, "rt")
    # use ftemp to look at lines, then discard from f
    for line in ftemp:
        if not line.lstrip().startswith("#"):
            break
        f.readline()
    # return file object with comment lines pre-skipped
    return f
This version is much better than the previous version, and it still returns a full file object with all its methods.
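Either version is then used like a plain open() (a short usage sketch; data.txt is a placeholder):

f = open_my_text("data.txt")
for line in f:
    pass  # process the data lines
f.close()  # it's a real file object, so remember to close it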
