I am trying to teach myself Python by reading the documentation, and right now I am trying to understand what it means to flush a file buffer. According to the documentation, "file.flush" does the following:
Flush the internal buffer, like stdio's fflush().
This may be a no-op on some file-like objects.
I don't know what "internal buffer" and "no-op" mean, but I think it says that flush writes data from some buffer to a file.
Hence, I ran this script, toggling the pound sign (the comment) on the line in the middle.
with open("myFile.txt", "w+") as file:
file.write("foo")
file.write("bar")
# file.flush()
file.write("baz")
file.write("quux")
However, I seem to get the same myFile.txt with and without the call to file.flush(). What effect does file.flush() have?
Python buffers writes to files. That is, file.write returns before the data is actually written to your hard drive. The main motivation of this is that a few large writes are much faster than many tiny writes, so by saving up the output of file.write until a bit has accumulated, Python can maintain good writing speeds.
file.flush forces the data to be written out at that moment. This is handy when you know that it might be a while before you have more data to write out, but you want other processes to be able to view the data you've already written. Imagine a log file that grows slowly. You don't want to have to wait ages before enough entries have built up to cause the data to be written out in one big chunk.
In either case, file.close causes the remaining data to be flushed, so "quux" in your code will be written out when file (which is a really bad name, as it shadows the builtin file constructor) is closed at the end of the with block.
Note: your OS does some buffering of its own. file.flush only pushes Python's internal buffer out to the operating system; if you need a guarantee that the data has actually reached the physical drive, follow the flush with os.fsync.
By the way, "no-op" means "no operation", as in it won't actually do anything. For example, StringIO objects manipulate strings in memory, not files on your hard drive. StringIO.flush probably just immediately returns because there's not really anything for it to do.
Buffered content is held in memory to improve performance. Flushing makes sure the content is actually written out, avoiding data loss if the program crashes. It is also useful when, for example, you want a prompt asking for user input to be printed completely on-screen before the next file operation takes place.
Related
I was looking into the tempfile options in Python, when I ran into SpooledTemporaryFile, now, the description says:
This function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds max_size, or until the file’s fileno() method is called, at which point the contents are written to disk and operation proceeds as with TemporaryFile().
I would like to understand exactly what that means. I have looked around but found no answer, so let me check whether I've got it right:
Is the written data 'buffered' in RAM until it reaches a certain threshold and then saved to disk? What is the advantage of this method over the standard approach? Is it faster? Because in the end it will have to be saved to disk anyway...
Anyway, if anyone can provide me with some insights I'd be grateful.
The data is buffered in memory before writing. The question becomes, will anything be written?
Not necessarily. TemporaryFile, by default, deletes the file after it is closed. If a spooled file never gets big enough to write to disk (and the user never tries to invoke its fileno method), the buffer will simply be flushed when it is closed, with no actual disk I/O taking place.
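A small sketch of that rollover behaviour; note that _rolled is a private CPython attribute, peeked at here only to observe when the switch to disk happens:

import tempfile

with tempfile.SpooledTemporaryFile(max_size=100, mode="w+") as spool:
    spool.write("tiny")
    print(spool._rolled)    # False: the data still lives in memory

    spool.write("x" * 200)  # total size now exceeds max_size
    print(spool._rolled)    # True: contents were rolled over to a real file on disk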
This might not even be an issue but I've got a couple of related Python questions that will hopefully help clear up a bit of debugging I've been stuck on for the past week or two.
If you call a function that returns a large object, is there a way to ignore the return value in order to save memory?
My best example is, let's say you are piping a large text file, line-by-line, to another server and when you complete the pipe, the function returns a confirmation for every successful line. If you pipe too many lines, the returned list of confirmations could potentially overrun your available memory.
for line in lines:
    connection.put(line)
response = connection.execute()
If you remove the response variable, I believe the return value is still loaded into memory so is there a way to ignore/block the return value when you don't really care about the response?
More background: I'm using the redis-python package to pipeline a large number of set-additions. My processes occasionally die with out-of-memory issues even though the file itself is not THAT big and I'm not entirely sure why. This is just my latest hypothesis.
I think the confirmation response is not big enough to overrun your memory. In Python, the more likely culprit is how you read the lines from the file: if you read them all at once, every line is held in memory, which can cost a large amount of memory.
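For example (a sketch, where big.txt and process() are hypothetical stand-ins), compare these two ways of reading:

with open("big.txt") as f:
    lines = f.readlines()   # the whole file is loaded into a list at once

with open("big.txt") as f:
    for line in f:          # lazy: only one line is in memory at a time
        process(line)       # hypothetical per-line handler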
I'm having a bit of a conceptual problem. For writing to the file "to_file", this works:
out_file = open(to_file, 'w')
out_file.write(indata)
...but this doesn't:
(open(to_file, 'w')).write(indata)
In theory, shouldn't swapping out a variable's (out_file) definition for the variable itself produce the same result? I'm confused as to why the extra step of creating the variable is necessary.
As others have pointed out, your code will actually open and write to the file. However,...
In your second, single-line code, you now have no reference to the open file. Therefore you have no way to close it or do anything else with it.
Leaving a file open is a resource leak. If your program closes right away, Python will try to close the file just before ending. But Python could possibly fail, for a variety of reasons. For example, the removable disk drive containing the file may be removed after you write to the file but before your program ends. That could make the file unreadable on the removable drive--and I have seen this happen. And if your program does not close right away, you have an extra resource hanging around that takes memory and other resources that need not be taken. If your program continues for a long time, the growing resources could slow down or stop the computer.
Even if your program will close right away, this is a bad habit to develop. You don't just want to write programs, you want to write code that will work well in a variety of situations. You may think "I will never use this code in a long-running program." Such declarations often turn out to be mistaken. Coding is difficult enough--don't make it harder for yourself. Avoid the "anti-pattern" of your second example.
There is a better pattern in Python for such things: the with statement. Read up on it and use that pattern rather than either of your two examples.
with open(to_file, 'w') as out_file:
    out_file.write(indata)
Those two lines opened the file, wrote the data to the file, then closed the file. If you want to do more with the file before it is closed, put that code in the indented section under the with statement.
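For example, a sketch with a couple of extra (purely illustrative) operations inside the with block:

with open(to_file, 'w') as out_file:
    out_file.write(indata)
    out_file.write('\n')   # the file is still open, so more writes are fine
    out_file.flush()       # flush early if another process needs to see the data
# out_file is closed here, even if an exception was raised above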
In Python 2.7, both of your provided examples will work and write to the file.
I am writing a program in which I would like to be able to view a log file before the program is complete. I have noticed that, in Python (2.7 and 3), file.write() does not save the file; file.close() does. I don't want to create a million little log files with unique names, but I would like to be able to view the updated log file before the program is finished. How can I do this?
Now, to be clear I am scripting using Ansys Workbench (trying to batch some CFX runs). Here's a link to a tutorial that shows what I'm talking about. They appear to have wrapped python, and by running the script I can send commands to the various modules. When the script is running there is no console onscreen and it appears to be eating all of the print statements, so the only way I can report what's happening is via a file. Also, I don't want to bring a console window up because eventually I will just run the program in batch mode (no interface). But the simulations take a long time to run and I can't wait for the program to finish before checking on what's happening.
You would need this:
import os
file.flush()
# Typically the flush() above would do; os.fsync() additionally makes
# sure the OS writes its own buffers out to the disk.
os.fsync(file.fileno())
Check this: http://docs.python.org/2/library/stdtypes.html#file.flush
file.flush()
Flush the internal buffer, like stdio's fflush(). This may be a no-op on some file-like objects.
Note flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
EDITED: See this question for a detailed explanation: What exactly is Python's file.flush() doing?
Does file.flush() after each write help?
Hannu
This will write the file to disk immediately:
import os
file.flush()             # flush Python's internal buffer out to the OS
os.fsync(file.fileno())  # ask the OS to write its own buffers to disk
According to the documentation https://docs.python.org/2/library/os.html#os.fsync
Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
I have a file open for writing, and a process running for days -- something is written into the file in relatively random moments. My understanding is -- until I do file.close() -- there is a chance nothing is really saved to disk. Is that true?
What if the system crashes when the main process is not finished yet? Is there a way to do a kind of commit once every... say -- 10 minutes (and I call this commit myself -- no need to run a timer)? Are file.close() and open(file, 'a') the only way, or are there better alternatives?
You should be able to use file.flush() to do this.
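A minimal sketch of such a periodic "commit", assuming a hypothetical records() generator that yields data at random moments:

import os
import time

COMMIT_INTERVAL = 600  # seconds, i.e. every 10 minutes

with open("daemon.log", "a") as f:
    last_commit = time.time()
    for record in records():          # hypothetical slow data source
        f.write(record)
        if time.time() - last_commit >= COMMIT_INTERVAL:
            f.flush()                 # push Python's buffer out to the OS
            os.fsync(f.fileno())      # force the OS to write its buffers to disk
            last_commit = time.time()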
If you don't want to kill the current process to add f.flush() (it sounds like it's been running for days already?), you should be OK. If you see the file you are writing to getting bigger, you will not lose that data...
From the Python docs:

write(str)
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
It sounds like Python's buffering system will automatically flush file objects, but it is not guaranteed when that happens.
To make sure that your data is written to disk, use file.flush() followed by os.fsync(file.fileno()).
As has already been stated, use the .flush() method to force the buffer to be written out. But avoid making a lot of flush calls, as this can actually slow your writing down (if the application relies on fast writes): you'll be forcing your filesystem to write changes that are smaller than its buffer size, which can bring you to your knees. :)
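One possible middle ground (my own suggestion, not something mentioned above) is line buffering, which flushes once per newline instead of on every write:

log = open("app.log", "a", 1)  # buffering=1 requests line buffering
log.write("one event\n")       # flushed when the newline is written
log.close()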