truncating a text file does not change the file - python

When a novice (like me) asks how to read or process a text file in Python, they often get answers like:
with open("input.txt", 'r') as f:
    for line in f:
        #do your stuff
Now I would like to truncate everything in the file I'm reading after a special line. After modifying the example above I use:
with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r") #for debug
        if line.rstrip("\n\r")=="CC":
            print "truncating!" #for debug
            file.truncate();
            break;
and expect it to throw away everything after the first "CC" seen. Running this code on input.txt:
AA
CC
DD
the following is printed on the console (as expected):
AA
CC
truncating!
but the file "input.txt" stays unchanged!
How can that be? What am I doing wrong?
Edit: After the operation I want the file to contain:
AA
CC

It looks like you're falling victim to a read-ahead buffer used internally by Python. From the documentation for the file.next() method:
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
The upshot is that the file's position is not where you would expect it to be when you truncate. One way around this is to use readline to loop over the file, rather than the iterator:
line = file.readline()
while line:
    ...
    line = file.readline()
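In Python 3 the same interaction is easier to observe, because a text file refuses to report its position while it is being iterated. The following is a minimal sketch, assuming Python 3 and an ordinary seekable text file; the helper names `tell_during_iteration` and `demo` are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def tell_during_iteration(path):
    """Call tell() right after a single next() on a text file.

    Hypothetical helper, for illustration only.
    """
    with open(path, "r") as f:
        next(f)  # the same call a for loop makes behind the scenes
        try:
            return f.tell()
        except OSError as exc:
            # Python 3 text files disable tell() while iteration is in progress.
            return "OSError: {}".format(exc)


def demo():
    # Build a throwaway three-line file so the sketch is self-contained.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("AA\nCC\nDD\n")
    try:
        return tell_during_iteration(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Looping with readline() instead leaves tell() usable, which is why the while loop above is the safer pattern when the position matters.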

In addition to glibdud's answer: truncate() takes the size at which to cut the file (it defaults to the current position), and you can get the current position in your file with tell(). As he mentioned, inside a for loop the implicit next() calls get in the way of methods like tell(). In the suggested while loop, however, you can truncate at the current tell() position. So the complete code would look like this:
Python 3:
with open("test.txt", 'r+') as file:
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            file.truncate(file.tell())
            break
        line = file.readline()
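The same idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: Python 3, a marker that occupies a whole line, and binary mode (my choice, not the answer's) so that tell() is a plain byte offset. The names `truncate_after` and `demo` are hypothetical.

```python
import os
import tempfile


def truncate_after(path, marker):
    """Cut the file right after the first line equal to `marker`.

    Returns True if the marker was found. Hypothetical helper name.
    """
    target = marker.encode("utf-8")
    # Binary mode: tell() is a byte offset and readline() keeps it accurate.
    with open(path, "r+b") as f:
        line = f.readline()
        while line:
            if line.rstrip(b"\r\n") == target:
                f.truncate(f.tell())  # drop everything after this line
                return True
            line = f.readline()
    return False


def demo():
    # Recreate the input.txt from the question in a temporary file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"AA\nCC\nDD\n")
    try:
        found = truncate_after(path, "CC")
        with open(path, "rb") as f:
            return found, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If the marker never appears, the file is left untouched and the function returns False.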

Related

Wrong result with a loop in a loop [duplicate]

This question already has answers here:
Why can't I call read() twice on an open file?
(7 answers)
Closed 7 months ago.
I have a problem with iterating on a file. Here's what I type on the interpreter and the result:
>>> f = open('baby1990.html', 'rU')
>>> for line in f.readlines():
...     print(line)
...
# ... all the lines from the file appear here ...
When I try to iterate on the same open file again I get nothing!
>>> for line in f.readlines():
...     print(line)
...
>>>
There is no output at all. To solve this I have to close() the file then open it again for reading! Is that normal behavior?
Yes, that is normal behavior. You read to the end of the file the first time (you can picture it as reading a tape), so you can't read any more from it unless you reset it, either by using f.seek(0) to reposition to the start of the file, or by closing it and opening it again, which starts from the beginning of the file.
If you prefer you can use the with syntax instead which will automatically close the file for you.
e.g.,
with open('baby1990.html', 'rU') as f:
    for line in f:
        print(line)
Once this block has finished executing, the file is automatically closed for you, so you can execute this block repeatedly without explicitly closing the file yourself and read the file over again each time.
As the file object reads the file, it uses a pointer to keep track of where it is. If you read part of the file, then go back to it later it will pick up where you left off. If you read the whole file, and go back to the same file object, it will be like reading an empty file because the pointer is at the end of the file and there is nothing left to read. You can use file.tell() to see where in the file the pointer is and file.seek to set the pointer. For example:
>>> file = open('myfile.txt')
>>> file.tell()
0
>>> file.readline()
'one\n'
>>> file.tell()
4L
>>> file.readline()
'2\n'
>>> file.tell()
6L
>>> file.seek(4)
>>> file.readline()
'2\n'
Also, you should know that file.readlines() reads the whole file and stores it as a list. That's useful to know because you can replace:
for line in file.readlines():
    #do stuff
file.seek(0)
for line in file.readlines():
    #do more stuff
with:
lines = file.readlines()
for each_line in lines:
    #do stuff
for each_line in lines:
    #do more stuff
You can also iterate over a file, one line at a time, without holding the whole file in memory (this can be very useful for very large files) by doing:
for line in file:
    #do stuff
The file object is a buffer. When you read from the buffer, that portion that you read is consumed (the read position is shifted forward). When you read through the entire file, the read position is at the end of the file (EOF), so it returns nothing because there is nothing left to read.
If you have to reset the read position on a file object for some reason, you can do:
f.seek(0)
Of course.
That is normal and sane behaviour.
Instead of closing and re-opening, you could rewind the file.
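The rewind suggested in these answers can be sketched as follows. It assumes Python 3; the function names `read_twice` and `demo` and the sample file contents are made up for illustration.

```python
import os
import tempfile


def read_twice(path):
    """Read a file, read it again without rewinding, then rewind and reread."""
    with open(path) as f:
        first = f.readlines()      # consumes the file up to EOF
        exhausted = f.readlines()  # position is at EOF, so nothing is left
        f.seek(0)                  # rewind to the start of the file
        second = f.readlines()     # the same lines are available again
    return first, exhausted, second


def demo():
    # Throwaway two-line file so the sketch is self-contained.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("one\n2\n")
    try:
        return read_twice(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```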

Python:Printing file content

I am a newbie to programming and trying to print contents of a file using the following statements but while trying to print the file contents, the output I get is empty space:-
with open('myfile.txt','a+') as myfile:
    myfile.write("hello once again 2")
    data=myfile.read()
    print(data)
The reason for that is a wrong parameter to the open function. Try replacing a+ with r+, and read with readlines:
with open('myfile.txt', 'r+') as myfile:
    myfile.write("hello once again 2")
    data = myfile.readlines() #please notice readlines
    print(data)
Here is a reason for that.
When you open a file with the 'a+' flag, it is opened for reading and writing, but the stream is positioned at the end of the file. That is why you read 'empty': there is nothing after that position.
I would advise you to work with the file in two steps: first write to it, then read it.
As for what write does: it writes the content to the file, but the content is not guaranteed to be there immediately unless you close the file or call flush explicitly. The file is closed (and therefore flushed) at the end of the context manager created by with open('myfile.txt', 'r+') as myfile. You can think of the context manager as a wrapper that makes sure the file is flushed and closed once the code under the with statement has finished.
When you write your content, the file pointer ends up at the end of the file.
To read it from the beginning, you need to reset the pointer.
Call myfile.seek(0) before myfile.read().
For more details see: https://docs.python.org/2/tutorial/inputoutput.html
f.tell() returns an integer giving the file object’s current position
in the file, measured in bytes from the beginning of the file. To
change the file object’s position, use f.seek(offset, from_what). The
position is computed from adding offset to a reference point; the
reference point is selected by the from_what argument. A from_what
value of 0 measures from the beginning of the file, 1 uses the current
file position, and 2 uses the end of the file as the reference point.
from_what can be omitted and defaults to 0, using the beginning of the
file as the reference point.
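Putting the seek(0) advice together with the original 'a+' code gives something like the sketch below. It assumes Python 3; `append_and_read` and `demo` are hypothetical names, and the initial file content is invented for the example.

```python
import os
import tempfile


def append_and_read(path, text):
    """Append `text` to the file, then read the whole file back."""
    with open(path, "a+") as f:
        f.write(text)    # 'a+' writes always go to the end of the file
        f.seek(0)        # move the read position back to the beginning
        return f.read()  # now returns the old content plus the new text


def demo():
    # Start from a file that already has one line in it.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("first line\n")
    try:
        return append_and_read(path, "hello once again 2")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Without the seek(0) call, read() starts at the end of the file and returns an empty string, which is the "empty space" described in the question.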
Since the behavior of a+ can vary among operating systems, it is probably best not to use it if you want your code to be portable.
Unless your files are huge (i.e. a significant fraction of available RAM), I would do the following.
Read your whole file into a list of lines.
with open('myfile.txt') as myfile:
    mylines = myfile.readlines()
You can now manipulate mylines as you like. Append, insert, change or delete lines as you wish.
At the end, write it all back.
with open('myfile.txt', 'w') as myfile:
    myfile.writelines(mylines)
To the best of my knowledge, this should behave the same on all Python platforms.

What are the differences among `next(f)`, `f.readline()` and `f.next()` in Python? [duplicate]

This question already has answers here:
file.tell() inconsistency
(3 answers)
Closed 7 years ago.
I process one file: skip the header (comment), process the first line, process other lines.
f = open(filename, 'r')
# skip the header
next(f)
# handle the first line
line = next(f)
process_first_line(line)
# handle other lines
for line in f:
    process_line(line)
If line = next(f) is replaced with line = f.readline(), it raises the following error:
ValueError: Mixing iteration and read methods would lose data
Therefore, I would like to know the differences among next(f), f.readline() and f.next() in Python?
Quoting official Python documentation,
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.
Basically, when the next function is called on a Python file object, it fetches a certain number of bytes from the file, processes them, and returns only the current line (the end of the current line is determined by the newline character). So the file pointer has moved: it is not at the position where the returned line ends. Calling readline at that point would therefore give inconsistent results, which is why mixing the two is not allowed.
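One way to stay clear of the problem is to use the iterator protocol for every step, so that only next() and the for loop ever touch the file. The sketch below assumes Python 3; `split_file` and `demo` are hypothetical names and the sample content is invented.

```python
import os
import tempfile


def split_file(path):
    """Return (header, first_line, remaining_lines) using iteration only."""
    with open(path) as f:
        header = next(f, None)          # skip the header; None if the file is empty
        first = next(f, None)           # the first data line, handled separately
        rest = [line for line in f]     # the loop continues where next() stopped
    return header, first, rest


def demo():
    # Throwaway file with a header and three data lines.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("# header\nfirst\nsecond\nthird\n")
    try:
        return split_file(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Passing a default to next() avoids a StopIteration when the file is shorter than expected.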

Reading the remaining lines of a file after searching for a word in the file

I'm trying to read the rest of a file after finding a word.
I'm trying to write a program that searches for a word in a file and then, when the word was found, it needs to do something with the remaining lines that are below / after the word.
Here's what I have so far but it's not working. Please assist. thanks.
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        nextline = file.readlines()
        for line in nextline:
            print(line)
You can't mix iteration (which basically calls the file.next method in the loop) and readlines.
To quote a great man (and file.next documentation):
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right
You can do fine with using just iteration:
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        for line in file:
            # do something with the line
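A self-contained version of the two-loop approach might look like the sketch below. It assumes Python 3; the names `lines_after` and `demo` and the sample text are made up for illustration.

```python
import os
import tempfile


def lines_after(path, word):
    """Return the lines that follow the first line containing `word`."""
    with open(path) as f:
        for line in f:
            if word in line:
                break  # stop at the matching line
        # The second loop resumes from the same iterator, right after the match.
        return [line.rstrip("\n") for line in f]


def demo():
    # Throwaway file with the search word on its second line.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("a\nsay Hello\nb\nc\n")
    try:
        return lines_after(path, "Hello")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If the word never occurs, the first loop exhausts the file and the function returns an empty list.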

Most efficient way to "nibble" the first line of text from a text document then resave it in python

I have a text document that I would like to repeatedly remove the first line of text from every 30 seconds or so.
I have already written (or more accurately copied) the code for the python resettable timer object that allows a function to be called every 30 seconds in a non blocking way if not asked to reset or cancel.
Resettable timer in python repeats until cancelled
(If someone could check whether the way I implemented the repeat there is OK, I would appreciate it, because my Python sometimes crashes while running it. :))
I now want to write my function to load a text file, copy all but the first line, and then rewrite it to the same text file. I think I can do it this way, but is it the most efficient?
from collections import deque

def removeLine():
    with open(path, 'rU') as file:
        lines = deque(file)
    try:
        print lines.popleft()
    except IndexError:
        print "Nothing to pop?"
    with open(path, 'w') as file:
        file.writelines(lines)
This works, but is it the best way to do it?
I'd use the fileinput module with inplace=True:
import fileinput

def removeLine():
    inputfile = fileinput.input(path, inplace=True, mode='rU')
    next(inputfile, None)  # skip a line *if present*
    for line in inputfile:
        print line,  # write out again, but without an extra newline
    inputfile.close()
inplace=True causes sys.stdout to be redirected to the open file, so we can simply 'print' the lines.
The next() call is used to skip the first line; giving it a default None suppresses the StopIteration exception for an empty file.
This makes rewriting a large file more efficient as you only need to keep the fileinput readlines buffer in memory.
I don't think a deque is needed at all, even for your solution; just use next() there too, then use list() to catch the remaining lines:
def removeLine():
    with open(path, 'rU') as file:
        next(file, None)  # skip a line *if present*
        lines = list(file)
    with open(path, 'w') as file:
        file.writelines(lines)
but this requires you to read all of the file in memory; don't do that with large files.
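For large files, another option is to stream the remainder into a temporary file and then swap it into place. This is a different technique from the fileinput answer above, sketched here under the assumptions of Python 3 and a writable directory next to the target file; `remove_first_line` and `demo` are hypothetical names.

```python
import os
import shutil
import tempfile


def remove_first_line(path):
    """Drop the first line of `path` without loading the file into memory.

    Returns the removed line as bytes (b"" if the file was empty).
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        first = src.readline()  # consume the line to discard
        # Temporary file in the same directory so os.replace stays on one filesystem.
        with tempfile.NamedTemporaryFile("wb", dir=directory, delete=False) as dst:
            shutil.copyfileobj(src, dst)  # copy the rest in fixed-size chunks
            tmp_name = dst.name
    os.replace(tmp_name, path)  # swap the shortened copy into place
    return first


def demo():
    # Throwaway three-line file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"one\ntwo\nthree\n")
    try:
        removed = remove_first_line(path)
        with open(path, "rb") as f:
            return removed, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The trade-off is extra disk space for the temporary copy in exchange for constant memory use.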
