Pythonic way of looping through a binary file? [duplicate] - python

This question already has answers here:
How to read user input until EOF?
(4 answers)
Closed 6 months ago.
To read some text file, in C or Pascal, I always use the following snippets to read the data until EOF:
while not eof do begin
    readline(a);
    do_something;
end;
How can I do this simply and quickly in Python?

Loop over the file to read lines:
with open('somefile') as openfileobject:
    for line in openfileobject:
        do_something()
File objects are iterable and yield lines until EOF. Using the file object as an iterable uses a buffer to ensure performant reads.
You can do the same with stdin (no need to use raw_input()):
import sys
for line in sys.stdin:
    do_something()
To complete the picture, binary reads can be done with:
from functools import partial
with open('somefile', 'rb') as openfileobject:
    for chunk in iter(partial(openfileobject.read, 1024), b''):
        do_something()
where chunk will contain up to 1024 bytes at a time from the file, and iteration stops when openfileobject.read(1024) starts returning empty byte strings.
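As a self-contained sketch of the chunked read, the helper below wraps the iter(callable, sentinel) form in a generator. The name iter_chunks and the default chunk size are illustrative assumptions, not part of the answer above.

```python
from functools import partial


def iter_chunks(path, chunk_size=1024):
    """Yield successive blocks of at most chunk_size bytes from a binary file."""
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns the sentinel b'', which signals EOF.
        for chunk in iter(partial(f.read, chunk_size), b''):
            yield chunk
```

Every chunk except possibly the last has exactly chunk_size bytes; the last one holds whatever remains.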

You can imitate the C idiom in Python.
To read a buffer up to max_size (>0) number of bytes, you can do this:
with open(filename, 'rb') as f:
    while True:
        buf = f.read(max_size)
        if not buf:  # read() returns b'' at EOF, never 0
            break
        process(buf)
Or, a text file line by line:
# warning -- not idiomatic Python! See below...
with open(filename, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        process(line)
You need to use the while True / break construct since there is no EOF test in Python other than a read returning no data.
In C, you might have:
while ((ch != '\n') && (ch != EOF)) {
// read the next ch and add to a buffer
// ..
}
However, you cannot have this in Python:
while (line = f.readline()):
    # syntax error
because assignments are not allowed in expressions in Python (although recent versions of Python can mimic this using assignment expressions, see below).
It is certainly more idiomatic in Python to do this:
# THIS IS IDIOMATIC Python. Do this:
with open('somefile') as f:
    for line in f:
        process(line)
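Update: Since Python 3.8 you may also use assignment expressions:
while line := f.readline():
    process(line)
That works even if the line read is blank, because a blank line is still returned as '\n' (a non-empty string); the loop continues until readline() returns an empty string at EOF.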
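A minimal sketch of the assignment-expression loop, assuming Python 3.8 or later. The helper name collect_lines and the in-memory sample are made up for illustration; io.StringIO stands in for an open text file.

```python
import io


def collect_lines(stream):
    """Read every line from a text stream using an assignment expression."""
    lines = []
    # readline() returns '' only at EOF; a blank line is '\n', which is truthy.
    while line := stream.readline():
        lines.append(line)
    return lines


sample = io.StringIO("first\n\nthird\n")
print(collect_lines(sample))  # ['first\n', '\n', 'third\n']
```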

The Python idiom for opening a file and reading it line-by-line is:
with open('filename') as f:
    for line in f:
        do_something(line)
The file will be automatically closed at the end of the above code (the with construct takes care of that).
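Finally, it is worth noting that line will preserve the trailing newline. This can be easily removed using the call below (note that rstrip() with no argument strips all trailing whitespace; use rstrip('\n') to remove only the newline):
line = line.rstrip()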
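As a small illustration of what rstrip() removes, compare it with rstrip('\n') on a made-up sample string:

```python
line = "  value\t \n"

# No argument: strips every trailing whitespace character.
stripped_all = line.rstrip()
# Explicit '\n': strips only trailing newline characters.
stripped_newline = line.rstrip('\n')

print(repr(stripped_all))      # '  value'
print(repr(stripped_newline))  # '  value\t '
```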

You can use the code snippet below to read line by line until the end of the file:
line = obj.readline()
while line != '':
    # Do Something
    line = obj.readline()

While there are suggestions above for "doing it the Python way", if you really want logic based on EOF, then I suppose exception handling is the way to do it (raw_input() is Python 2; in Python 3 the equivalent is input()):
try:
    line = raw_input()
    ... whatever needs to be done in case of no EOF ...
except EOFError:
    ... whatever needs to be done in case of EOF ...
Example:
$ echo test | python -c "while True: print raw_input()"
test
Traceback (most recent call last):
File "<string>", line 1, in <module>
EOFError: EOF when reading a line
Or signal EOF by hand at a raw_input() prompt: Ctrl-Z followed by Enter on Windows, Ctrl-D on Linux.
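In Python 3, raw_input() was renamed to input(). A minimal sketch of the same EOF-driven loop, assuming Python 3 (the function name read_until_eof is only illustrative):

```python
def read_until_eof():
    """Collect lines from standard input until EOF is signalled."""
    lines = []
    while True:
        try:
            # input() strips the trailing newline from each line.
            lines.append(input())
        except EOFError:
            # Raised when input() reaches end-of-file, for example at the
            # end of a pipe or after Ctrl-D / Ctrl-Z at a terminal.
            break
    return lines
```

Run with something like `echo test | python script.py`, the loop ends cleanly instead of printing a traceback.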

In addition to @dawg's great answer, the equivalent solution using the walrus operator (Python >= 3.8):
with open(filename, 'rb') as f:
    while buf := f.read(max_size):
        process(buf)

You can use the following code snippet. readlines() reads in the whole file at once and returns a list of its lines, so it is only suitable for files that fit in memory.
lines = obj.readlines()

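How about this! Make it simple!
for line in open('myfile.txt', 'r'):
    print(line)
No need to waste extra lines. In CPython the file is usually closed as soon as the last reference to the file object goes away, so this works without the with keyword. Other Python implementations do not guarantee when that happens, so prefer with (or an explicit close()) when the file must be closed promptly.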
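For comparison, here is a short sketch showing that a with block leaves the file closed at a well-defined point. The temporary file and its contents are made up purely for the demonstration.

```python
import os
import tempfile

# Create a small throwaway file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write("one\ntwo\n")

with open(path) as f:
    for line in f:
        print(line, end='')

# The with block has exited, so the file object is already closed.
print(f.closed)  # True

os.remove(path)
```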

Related

Wrong result with a loop in a loop [duplicate]

This question already has answers here:
Why can't I call read() twice on an open file?
(7 answers)
Closed 7 months ago.
I have a problem with iterating on a file. Here's what I type on the interpreter and the result:
>>> f = open('baby1990.html', 'rU')
>>> for line in f.readlines():
...     print(line)
...
# ... all the lines from the file appear here ...
When I try to iterate on the same open file again I get nothing!
>>> for line in f.readlines():
...     print(line)
...
>>>
There is no output at all. To solve this I have to close() the file then open it again for reading! Is that normal behavior?
Yes, that is normal behavior. The first pass reads to the end of the file (picture a tape being played to its end), so there is nothing more to read until you reset the position. You can either call f.seek(0) to move back to the start of the file, or close the file and open it again, which also starts from the beginning.
If you prefer, you can use the with syntax instead, which will automatically close the file for you, e.g.:
with open('baby1990.html', 'rU') as f:
    for line in f:
        print(line)
Once this block has finished executing, the file is closed automatically, so you can run the block repeatedly and read the file again each time without closing it yourself.
As the file object reads the file, it uses a pointer to keep track of where it is. If you read part of the file, then go back to it later it will pick up where you left off. If you read the whole file, and go back to the same file object, it will be like reading an empty file because the pointer is at the end of the file and there is nothing left to read. You can use file.tell() to see where in the file the pointer is and file.seek to set the pointer. For example:
>>> file = open('myfile.txt')
>>> file.tell()
0
>>> file.readline()
'one\n'
>>> file.tell()
4L
>>> file.readline()
'2\n'
>>> file.tell()
6L
>>> file.seek(4)
>>> file.readline()
'2\n'
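The session above is from Python 2 (note the 4L long integers). A rough Python 3 equivalent is sketched below, using io.BytesIO as an in-memory stand-in for a file opened in binary mode; the sample contents are an assumption chosen to mirror the output shown.

```python
import io

# io.BytesIO behaves like a file opened with mode 'rb'.
f = io.BytesIO(b"one\n2\nthree\n")

start = f.tell()         # 0: nothing read yet
first = f.readline()     # b'one\n'
after_first = f.tell()   # 4: just past the first line
second = f.readline()    # b'2\n'
after_second = f.tell()  # 6
f.seek(4)                # jump back to the start of the second line
again = f.readline()     # b'2\n' once more

print(start, after_first, after_second, again)
```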
Also, you should know that file.readlines() reads the whole file and stores it as a list. That's useful to know because you can replace:
for line in file.readlines():
    pass  # do stuff
file.seek(0)
for line in file.readlines():
    pass  # do more stuff
with:
lines = file.readlines()
for each_line in lines:
    pass  # do stuff
for each_line in lines:
    pass  # do more stuff
You can also iterate over a file, one line at a time, without holding the whole file in memory (this can be very useful for very large files) by doing:
for line in file:
    pass  # do stuff
The file object is a buffer. When you read from the buffer, that portion that you read is consumed (the read position is shifted forward). When you read through the entire file, the read position is at the end of the file (EOF), so it returns nothing because there is nothing left to read.
If you have to reset the read position on a file object for some reason, you can do:
f.seek(0)
Of course.
That is normal and sane behaviour.
Instead of closing and re-opening, you could rewind the file.
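A minimal sketch of rewinding, with io.StringIO standing in for an open text file (the sample contents are made up):

```python
import io

f = io.StringIO("a\nb\n")  # in-memory stand-in for an open text file

first = f.readlines()   # reads everything; position is now at EOF
second = f.readlines()  # nothing left to read, so this is empty
f.seek(0)               # rewind to the start
third = f.readlines()   # the full contents again

print(first, second, third)  # ['a\n', 'b\n'] [] ['a\n', 'b\n']
```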

truncating a text file does not change the file

When a novice (like me) asks for reading/processing a text file in python he often gets answers like:
with open("input.txt", 'r') as f:
    for line in f:
        pass  # do your stuff
Now I would like to truncate everything in the file I'm reading after a special line. After modifying the example above I use:
with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r")  # for debug
        if line.rstrip("\n\r") == "CC":
            print "truncating!"  # for debug
            file.truncate()
            break
and expect it to throw away everything after the first "CC" seen. Running this code on input.txt:
AA
CC
DD
the following is printed on the console (as expected):
AA
CC
truncating!
but the file "input.txt" stays unchanged!
How can that be? What am I doing wrong?
Edit: After the operation I want the file to contain:
AA
CC
It looks like you're falling victim to a read-ahead buffer used internally by Python. From the documentation for the file.next() method:
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
The upshot is that the file's position is not where you would expect it to be when you truncate. One way around this is to use readline to loop over the file, rather than the iterator:
line = file.readline()
while line:
    ...
    line = file.readline()
In addition to glibdud's answer: truncate() accepts the size to truncate to, which defaults to the current file position. You can get the current position in your file with tell(). As he mentioned, inside a for loop the next() call interferes with methods like tell(). In the suggested while loop, however, you can truncate at the current tell() position. So the complete code would look like this:
Python 3:
with open("test.txt", 'r+') as file:
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            file.truncate(file.tell())
            break
        line = file.readline()

Reading the remaining lines of a file after searching for a word in the file

I'm trying to read the rest of a file after finding a word.
I'm trying to write a program that searches for a word in a file and then, when the word was found, it needs to do something with the remaining lines that are below / after the word.
Here's what I have so far but it's not working. Please assist. thanks.
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        nextline = file.readlines()
        for line in nextline:
            print(line)
You can't mix iteration (which basically calls the file.next method in the loop) and readlines.
To quote a great man (and file.next documentation):
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right
You can do fine with using just iteration:
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        for line in file:
            pass  # do something with the line
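An alternative sketch of the same idea using itertools.dropwhile instead of two for loops. This is a swapped-in technique, not the answerer's code, and the function name lines_after is hypothetical.

```python
from itertools import dropwhile


def lines_after(path, word):
    """Return the lines that follow the first line containing word."""
    with open(path) as f:
        # Skip lines until one contains the word...
        remaining = dropwhile(lambda line: word not in line, f)
        # ...then discard that matching line itself.
        next(remaining, None)
        return list(remaining)
```

If the word never appears, the result is an empty list. Only the file iterator is used, so no other read methods are mixed in.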

python loop won't iterate on second pass

When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
    print line
    # this works fine
however, when I run the following for a second time:
for line in f:
    print line
    # this does nothing
This does not work because you've already seeked to the end of the file the first time. You need to rewind (using .seek(0)) or re-open your file.
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which will deal with source files coming from platforms with different line endings for you. (This applies to Python 2; in Python 3, universal newlines are the default in text mode, and the 'U' flag was removed in Python 3.11.)
Use with when working with file objects, since it will clean up the handles for you even in the case of errors. Example:
with open(r"H:\Test\test.csv", "rU") as f:
    for line in f:
        ...
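To illustrate the csv module pointer, here is a minimal Python 3 sketch that reads all rows once so they can be looped over as often as needed. The function name load_rows is hypothetical.

```python
import csv


def load_rows(path):
    """Read a CSV file fully and return its rows as lists of strings."""
    # The csv module documentation recommends opening files with newline=''
    # so that the reader handles line endings itself.
    with open(path, newline='') as f:
        return list(csv.reader(f))
```

Because the result is a plain list, a second `for row in rows:` loop sees the same data again, unlike a second loop over the file object.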
You can read the data from the file into a variable, and then iterate over that data as many times as you want in your script. This is better than seeking back and forth.
f = open(r"H:\Test\test.csv", "rb")
data = f.readlines()
for line in data:
    print line
for line in data:
    print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.

Python readline() on the Mac

New to python and trying to learn the ropes of file i/o.
Working with pulling lines from a large (2 million line) file in this format:
56fr4
4543d
4343d
hirh3
I've been reading that readline() is best because it doesn't pull the whole file into memory. But when I try to read the documentation on it, it seems to be Unix only? And I'm on a Mac.
Can I use readline on the Mac without loading the whole file into memory? What would the syntax be to simply readline number 3 in the file? The examples in the docs are a bit over my head.
Edit
Here is the function to return a code:
def getCode(i):
    with open("test.txt") as file:
        for index, line in enumerate(file):
            if index == i:
                code = # what does it equal?
                break
    return code
You don't need readline:
with open("data.txt") as file:
    for line in file:
        pass  # do stuff with line
This will read the entire file line-by-line, but not all at once (so you don't need all the memory). If you want to abort reading the file, because you found the line you want, use break to terminate the loop. If you know the index of the line you want, use this:
with open("data.txt") as file:
    for index, line in enumerate(file):
        if index == 2:  # looking for third line (0-based indexes)
            # do stuff with this line
            break  # no need to go on
+1 @SpaceC0wb0y
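The same lookup can be sketched with itertools.islice, which skips lines lazily without loading the file into memory. The function name get_line is hypothetical, and returning None for a too-short file is a design assumption.

```python
from itertools import islice


def get_line(path, index):
    """Return the line at 0-based index, or None if the file is too short."""
    with open(path) as f:
        # islice skips the first `index` lines lazily, then yields one line.
        return next(islice(f, index, index + 1), None)
```

For example, get_line("data.txt", 2) would return the third line, including its trailing newline.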
You could also do:
f = open('filepath')
f.readline() # first line - let it pass
f.readline() # second line - let it pass
third_line = f.readline()
f.close()
