Wrong result with a loop in a loop [duplicate] - python

I have a problem with iterating on a file. Here's what I type on the interpreter and the result:
>>> f = open('baby1990.html', 'rU')
>>> for line in f.readlines():
...     print(line)
...
# ... all the lines from the file appear here ...
When I try to iterate on the same open file again I get nothing!
>>> for line in f.readlines():
...     print(line)
...
>>>
There is no output at all. To solve this I have to close() the file then open it again for reading! Is that normal behavior?

Yes, that is normal behavior. The first loop reads to the end of the file (picture it as playing a tape to the end), so there is nothing left to read until you reset the position. You can do that either with f.seek(0), which repositions to the start of the file, or by closing the file and opening it again, which also starts from the beginning.
If you prefer, you can use the with statement instead, which closes the file for you automatically, e.g.:
with open('baby1990.html', 'rU') as f:
    for line in f:
        print(line)
Once this block has finished executing, the file is closed, so you can run the block repeatedly and read the file from the beginning each time without closing it yourself.
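To make the rewind option concrete, here is a minimal Python 3 sketch. The helper name read_twice and the temporary sample file are made up for illustration; they are not from the question.

```python
import os
import tempfile


def read_twice(path):
    """Read all lines, try again at EOF, then rewind with seek(0) and re-read."""
    with open(path) as f:
        first = f.readlines()    # consumes the file up to EOF
        empty = f.readlines()    # position is at EOF, so this is []
        f.seek(0)                # rewind to the start of the file
        second = f.readlines()   # same lines as the first pass
    return first, empty, second


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")

first, empty, second = read_twice(tmp.name)
print(first, empty, second)
os.remove(tmp.name)
```

Reopening the file inside the function would give the same result as seek(0); seeking simply avoids the extra open() call.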

As the file object reads the file, it uses a pointer to keep track of where it is. If you read part of the file, then go back to it later it will pick up where you left off. If you read the whole file, and go back to the same file object, it will be like reading an empty file because the pointer is at the end of the file and there is nothing left to read. You can use file.tell() to see where in the file the pointer is and file.seek to set the pointer. For example:
>>> file = open('myfile.txt')
>>> file.tell()
0
>>> file.readline()
'one\n'
>>> file.tell()
4L
>>> file.readline()
'2\n'
>>> file.tell()
6L
>>> file.seek(4)
>>> file.readline()
'2\n'
Also, you should know that file.readlines() reads the whole file and stores it as a list. That's useful to know because you can replace:
for line in file.readlines():
    #do stuff
file.seek(0)
for line in file.readlines():
    #do more stuff
with:
lines = file.readlines()
for each_line in lines:
    #do stuff
for each_line in lines:
    #do more stuff
You can also iterate over a file, one line at a time, without holding the whole file in memory (this can be very useful for very large files) by doing:
for line in file:
    #do stuff
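The tell() and seek() session earlier in this answer is from Python 2 (note the 4L long integers). A rough Python 3 equivalent is sketched below. It assumes binary mode so that tell() returns plain byte offsets, and the file contents are invented to match that session.

```python
import os
import tempfile

# Hypothetical file with the same two lines as the Python 2 session.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\n2\n")

positions = []
with open(tmp.name, "rb") as f:
    positions.append(f.tell())   # 0: nothing read yet
    first = f.readline()         # b'one\n'
    positions.append(f.tell())   # 4: just past the first line
    second = f.readline()        # b'2\n'
    positions.append(f.tell())   # 6: end of file
    f.seek(4)                    # jump back to the start of line two
    again = f.readline()         # b'2\n' once more

print(positions, first, second, again)
os.remove(tmp.name)
```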

The file object is a buffer. When you read from the buffer, that portion that you read is consumed (the read position is shifted forward). When you read through the entire file, the read position is at the end of the file (EOF), so it returns nothing because there is nothing left to read.
If you have to reset the read position on a file object for some reason, you can do:
f.seek(0)

Of course.
That is normal and sane behaviour.
Instead of closing and re-opening, you could rewind the file.

Related

Writing into a file then reading it on Python 3.6.2

target=open("test.txt",'w+')
target.write('ffff')
print(target.read())
When I run the Python script above (test.txt is an empty file), it prints an empty string.
However, when reopening the file, it can read it just fine:
target=open("test.txt",'w+')
target.write('ffff')
target=open("test.txt",'r')
print(target.read())
This prints out 'ffff' as needed.
Why is this happening? Is 'target' still recognized as having no content, even though I updated it in line 2, and I have to reassign test.txt to it?
A file has a read/write position. Writing to the file puts that position at the end of the written text; reading starts from the same position.
Put that position back to the start with the seek method:
with open("test.txt",'w+') as target:
    target.write('ffff')
    target.seek(0) # to the start again
    print(target.read())
Demo:
>>> with open("test.txt",'w+') as target:
...     target.write('ffff')
...     target.seek(0) # to the start again
...     print(target.read())
...
4
0
ffff
The numbers are the return values of target.write() and target.seek(); they are the number of characters written, and the new position.
No need to close and re-open it. You just need to seek back to the file's starting point before reading it:
with open("test.txt",'w+') as f:
    f.write('ffff')
    f.seek(0)
    print(f.read())
Try flushing, then seeking the beginning of the file:
f = open(path, 'w+')
f.write('foo')
f.write('bar')
f.flush()
f.seek(0)
print(f.read())
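The effect of the seek can be checked side by side. The following is a small Python 3 sketch; write_then_read and the scratch file are hypothetical names used only here.

```python
import os
import tempfile


def write_then_read(path, text, rewind):
    """Write text in 'w+' mode, optionally seek(0), and return what read() gives."""
    with open(path, "w+") as f:
        f.write(text)
        if rewind:
            f.seek(0)   # move the shared read/write position back to the start
        return f.read()


# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

without_seek = write_then_read(path, "ffff", rewind=False)
with_seek = write_then_read(path, "ffff", rewind=True)
print(repr(without_seek), repr(with_seek))
os.remove(path)
```

Without the seek, read() starts at the position the write left behind, which is the end of the file, so it returns an empty string.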
Alternatively, close() the file after writing and open it again for reading. If you read and write through the same file object without flushing or seeking in between, the results can be inconsistent.

truncating a text file does not change the file

When a novice (like me) asks for reading/processing a text file in python he often gets answers like:
with open("input.txt", 'r') as f:
    for line in f:
        #do your stuff
Now I would like to truncate everything in the file I'm reading after a special line. After modifying the example above I use:
with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r") #for debug
        if line.rstrip("\n\r")=="CC":
            print "truncating!" #for debug
            file.truncate();
            break;
and expect it to throw away everything after the first "CC" seen. Running this code on input.txt:
AA
CC
DD
the following is printed on the console (as expected):
AA
CC
truncating!
but the file "input.txt" stays unchanged!
How can that be? What am I doing wrong?
Edit: After the operation I want the file to contain:
AA
CC
It looks like you're falling victim to a read-ahead buffer used internally by Python. From the documentation for the file.next() method:
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
The upshot is that the file's position is not where you would expect it to be when you truncate. One way around this is to use readline to loop over the file, rather than the iterator:
line = file.readline()
while line:
    ...
    line = file.readline()
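The documentation quoted above describes Python 2. In Python 3 the same conflict is more visible: at least in CPython, a text-mode file refuses tell() while a for loop is iterating over it and raises OSError, whereas a readline() loop keeps the position usable. The sketch below compares the two; the three-line sample file is invented to match the question's input.txt.

```python
import os
import tempfile

# Hypothetical three-line file matching the question's input.txt.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("AA\nCC\nDD\n")

# 1) Iterating with a for loop: tell() is refused while iteration is active.
with open(tmp.name, "r+") as f:
    try:
        for line in f:
            f.tell()
            break
        tell_in_for_loop = "ok"
    except OSError as exc:
        tell_in_for_loop = "OSError: %s" % exc

# 2) Looping with readline(): tell() reports a position after each line.
offsets = []
with open(tmp.name, "r+") as f:
    line = f.readline()
    while line:
        offsets.append(f.tell())
        line = f.readline()

print(tell_in_for_loop)
print(offsets)
os.remove(tmp.name)
```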
In addition to glibdud's answer: truncate() takes the size at which to cut the file, and you can get the current position with tell(). As he mentioned, iterating with a for loop goes through next(), whose read-ahead buffer makes tell() unreliable. In the suggested while loop, however, you can truncate at the current tell() position. The complete code would look like this:
Python 3:
with open("test.txt", 'r+') as file:
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            file.truncate(file.tell())
            break
        line = file.readline()
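The same idea can be wrapped in a reusable function. This is only a sketch: the name truncate_after is invented here, and it assumes binary mode so that tell() is a plain byte offset.

```python
import os
import tempfile


def truncate_after(path, marker):
    """Cut the file right after the first line equal to marker.

    Returns True if the marker was found, False otherwise.
    """
    with open(path, "rb+") as f:
        line = f.readline()
        while line:
            if line.rstrip(b"\r\n") == marker:
                f.truncate(f.tell())   # keep everything up to this point
                return True
            line = f.readline()
    return False


# Hypothetical input matching the question's example.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"AA\nCC\nDD\n")

found = truncate_after(tmp.name, b"CC")
with open(tmp.name, "rb") as f:
    remaining = f.read()
print(found, remaining)
os.remove(tmp.name)
```

If the marker never appears, the function returns False and leaves the file untouched.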

Can I consistently open a file in python?

I find that each time I open a file and read its contents, it automatically closes, so the next time I have to open the file again to read it. I know this may be a good idea to reduce memory usage, but for my current task I need to keep the file open.
How should I do it?
This is my current way of reading a file
f = open('some_file', 'rU')
f.read()
After you do f.read(), the file doesn't close. In fact, it doesn't close unless you explicitly call f.close() or you use it in a with block like this:
with open('some_file') as f:
    ...
In which case, the file will be closed for you when the with block ends. It will also tend to be closed in any case when the file object has no more variables associated with it (ie, when f falls out of scope or gets reassigned), but this isn't guaranteed behaviour. If none of these things happen, the file is kept open.
The problem you are most likely seeing is that calling read again gets you an empty string. This doesn't happen because the file is closed; reading from a closed file gives you an error. Rather, files keep track of where you have read up to, so that if you only read part of a file, you can then request the next part and it will start at the right place. To set the position back to the start of the file, you can use the seek method:
with open('some_file') as f:
    contents1 = f.read()
    f.seek(0)
    contents2 = f.read()
will give you contents1 and contents2 both containing the full contents of the file, rather than contents2 being empty. However, you probably don't want to do this unless the file could have changed in the meantime.
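The difference between a file that is at EOF and a file that is closed can be seen directly. Below is a short Python 3 sketch using an invented sample file.

```python
import os
import tempfile

# Hypothetical sample file created for the demonstration.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("hello\n")

f = open(tmp.name)
contents = f.read()          # reads everything
still_open = not f.closed    # read() does not close the file
at_eof = f.read()            # '' because the position is at EOF
f.close()

try:
    f.read()                 # a closed file cannot be read at all
    closed_read = "no error"
except ValueError as exc:
    closed_read = "ValueError: %s" % exc

print(still_open, repr(at_eof), closed_read)
os.remove(tmp.name)
```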
To make sure your file can be read consistently, you need to either close and reopen it or seek to 0.
while True:
    f = open(...)
    x = f.read()
    print(x)
    f.close()
or
f = open(...)
while True:
    x = f.read()
    print(x)
    f.seek(0)
...unless you are going to write a one-liner, which will close automatically.
print open('some_file', 'rU').read()
This avoids any limit on the number of open files.
Additional thought: You can also use for line in open(...): pass, again as long as you remember to get to the beginning of the file one way or another.

Why can I read lines from a file only one time?

I have a file containing a Python object as a string. I open it and do the following:
>>> file = open('gods.txt')
>>> file.readlines()
["{'brahman': 'impersonal', 'wishnu': 'personal, immortal', 'brahma': 'personal, mortal'}\n"]
But then I have a problem, because there are no longer any lines:
>>> f.readlines()
[]
>>> f.readline(0)
''
Why is this happening, and how can I keep access to the file's lines?
There's only one line in that file, and you just read it. readlines returns a list of all the lines. If you want to re-read the file, you have to do file.seek(0)
Your position in the file has moved:
f = open("/home/usr/stuff", "r")
f.tell()
# shows you're at the start of the file
l = f.readlines()
f.tell()
# now shows your file position is at the end of the file
readlines() gives you a list of contents of the file, and you can read that list over and over. It's good practice to close the file after reading it, and then use the contents you've got from the file. Don't keep trying to read the file contents over and over, you've already got it.
Save the result to a variable, or reopen the file:
lines = file.readlines()
You can store the lines list in a variable and then access it whenever you want:
file = open('gods.txt')
# store the lines list in a variable
lines = file.readlines()
# then you can iterate the list whenever you want
for line in lines:
    print(line)

Why the second time I run "readlines" on the same file nothing is returned?

>>> f = open('/tmp/version.txt', 'r')
>>> f
<open file '/tmp/version.txt', mode 'r' at 0xb788e2e0>
>>> f.readlines()
['2.3.4\n']
>>> f.readlines()
[]
>>>
I've tried this in Python's interpreter. Why does this happen?
You need to seek to the beginning of the file. Use f.seek(0) to return to the beginning:
>>> f = open('/tmp/version.txt', 'r')
>>> f
<open file '/tmp/version.txt', mode 'r' at 0xb788e2e0>
>>> f.readlines()
['2.3.4\n']
>>> f.seek(0)
>>> f.readlines()
['2.3.4\n']
>>>
Python keeps track of where you are in the file. When you're at the end, it doesn't automatically roll back over. Try f.seek(0).
The important part, which some of the other answers don't state explicitly, is that files are read with a cursor that marks the current position in the file. On the first readlines() call the cursor starts at the beginning of the file and advances all the way to the end, since all of the file's data is returned. On the second readlines() call the cursor is already at the end of the file, so it does not move and no data is returned. For educational purposes, you could write a quick bit of code that opens a file, reads a few bytes or lines, and then calls readlines(); you will see that the output of readlines() begins where the previous reads left off and continues to the end of the file.
The seek(0) call mentioned by others resets the cursor to the beginning of the file so you can start the reads over.
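The experiment described in this answer can be sketched as follows, in Python 3. The four-line sample file is made up for the demonstration.

```python
import os
import tempfile

# Hypothetical four-line file created for the demonstration.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\nline 4\n")

with open(tmp.name) as f:
    head = f.readline()         # moves the cursor past the first line
    rest = f.readlines()        # continues from the cursor to EOF
    nothing = f.readlines()     # cursor is already at EOF
    f.seek(0)                   # reset the cursor to the start
    everything = f.readlines()  # the full file again

print(head, rest, nothing, everything)
os.remove(tmp.name)
```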
In addition to seeking to the beginning of the file, you can store the lines in a variable and reuse them later if you just need them in memory. Something like this:
with open('/tmp/version.txt', 'r') as f:
    lines = f.readlines()
The with statement is enabled by default as of Python 2.6; in 2.5 you need from __future__ import with_statement.
