The description of tempfile.NamedTemporaryFile() says:
If delete is true (the default), the file is deleted as soon as it
is closed.
In some circumstances, this means that the file is not deleted after the
Python interpreter ends. For example, when running the following test under
py.test, the temporary file remains:
from __future__ import division, print_function, absolute_import
import tempfile
import unittest2 as unittest
class cache_tests(unittest.TestCase):
    def setUp(self):
        self.dbfile = tempfile.NamedTemporaryFile()

    def test_get(self):
        self.assertEqual('foo', 'foo')
In some way this makes sense, because this program never explicitly
closes the file object. The only other way for the object to get closed
would presumably be in the __del__ destructor, but here the language
reference states that "It is not guaranteed that __del__() methods are
called for objects that still exist when the interpreter exits." So
everything is consistent with the documentation so far.
However, I'm confused about the implications of this. If it is not
guaranteed that file objects are closed on interpreter exit, can it
possibly happen that some data that was successfully written to a
(buffered) file object is lost even though the program exits gracefully,
because it was still in the file object's buffer, and the file object
never got closed?
Somehow that seems very unlikely and un-pythonic to me, and the open()
documentation doesn't contain any such warnings either. So I
(tentatively) conclude that file objects are, after all, guaranteed to
be closed.
But how does this magic happen, and why can't NamedTemporaryFile() use
the same magic to ensure that the file is deleted?
Edit: Note that I am not talking about file descriptors here (that are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.
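For concreteness, here is a small hedged sketch (the file name is made up) of the kind of Python-level buffering I mean: data written to a file object may not be visible at the OS level until the object's buffer is flushed or the file is closed.

f = open("buffer_demo.txt", "w")    # hypothetical file name
f.write("hello")                    # lands in the Python file object's buffer

with open("buffer_demo.txt") as check:
    print(repr(check.read()))       # typically '' -- nothing has reached the OS yet

f.flush()                           # push the Python-level buffer down to the OS
with open("buffer_demo.txt") as check:
    print(repr(check.read()))       # now 'hello'

f.close()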
On Windows, NamedTemporaryFile uses a Windows-specific flag (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. There is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use; doing so only deletes the name, and the file's content is removed only after the file has been closed (in any way). But if we want the file name to persist until the file is closed, as NamedTemporaryFile does, then we need "magic".
We cannot use the same magic as for flushing buffered files. What happens there is that the C library handles it (in Python 2): the files are FILE objects in C, and the C library guarantees that they are flushed on normal program exit (but not if the process is killed). In Python 3, there is custom C code to achieve the same effect, but it is specific to this use case and not directly reusable.
That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can prove it with a global reference cycle that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)
As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed then. But you can call it yourself too: if your process doesn't use an unbounded number of NamedTemporaryFiles, it's simply atexit.register(my_named_temporary_file.close).
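For example, a minimal sketch of that atexit approach (the variable name is just illustrative):

import atexit
import tempfile

dbfile = tempfile.NamedTemporaryFile()

# Make sure the temporary file is closed -- and therefore deleted -- at normal
# interpreter exit, even if __del__ never runs.  (This does not help if the
# process is killed.)
atexit.register(dbfile.close)

dbfile.write(b"some data")
dbfile.flush()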
On any version of *nix, all file descriptors are closed when a process finishes; this is taken care of by the operating system, and Windows is likely the same in this respect. Without digging into the source code I can't say with 100% authority what actually happens, but likely it is this:
If delete is False, unlink() (or a function similar to it on other operating systems) is called. This means that the file will automatically be deleted when the process exits and there are no more open file descriptors. While the process is running, the file will still remain around.
If delete is True, likely the C function remove() is used. This will forcibly delete the file before the process exits.
The file buffering is handled by the operating system. If you do not close a file after you open it, you are assuming that the operating system will flush the buffer and close the file after the owner exits. This is not Python magic; this is your OS doing its thing. The __del__() method is a Python-level mechanism and is not guaranteed to be invoked.
Related
With:
import os
for file in files:
    os.remove(file)
Will each call to os.remove() block until the file is actually removed (i.e. behave synchronously), or will the loop keep iterating while the removals happen in the background?
Generally, the os module provides wrappers around system calls of the operating system. For example, on Linux the os.remove/os.unlink functions correspond to the unlink system call. These functions wait until the system call has finished.
Whether this means that the high level operation intended by the program has finished depends on the use-case.
For example, unlink merely removes the path pointing to the file content; if there are other paths for the same file (i.e. hardlinks) or processes with a file handle on it, the file content remains. Only when all references are gone is the file content eligible for removal from the filesystem (similar to reference counting). The filesystem itself may arbitrarily delay removal of the content, and distributed filesystems may have additional consistency and synchronisation constraints.
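As a small illustration (POSIX only; the file name is made up), unlink removes the name immediately, while an open handle keeps the content alive:

import os

f = open("unlink_demo.txt", "w+")   # hypothetical file name
f.write("still here\n")
f.flush()

os.remove("unlink_demo.txt")        # returns once the name is gone
print(os.path.exists("unlink_demo.txt"))    # False: the path no longer exists

f.seek(0)
print(f.read())                     # "still here": the content survives via the open handle
f.close()                           # only now is the content eligible for removal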
As a rule of thumb, if there are no special requirements then it is fine to consider the os call to be prompt and synchronous. If there are specific requirements, such as file content being completely destroyed, read up on the specific behaviour of the involved components.
In Python, if you either open a file without calling close(), or close the file but not using try-finally or the "with" statement, is this a problem? Or does it suffice as a coding practice to rely on the Python garbage-collection to close all files? For example, if one does this:
for line in open("filename"):
    # ... do stuff ...
... is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?
In your example the file isn't guaranteed to be closed before the interpreter exits. In current versions of CPython the file will be closed at the end of the for loop because CPython uses reference counting as its primary garbage collection mechanism but that's an implementation detail, not a feature of the language. Other implementations of Python aren't guaranteed to work this way. For example IronPython, PyPy, and Jython don't use reference counting and therefore won't close the file at the end of the loop.
It's bad practice to rely on CPython's garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn't use reference counting you'll need to go through all your code and make sure all your files are closed properly.
For your example use:
with open("filename") as f:
    for line in f:
        # ... do stuff ...
Some Pythons will close files automatically when they are no longer referenced, while others will not and it's up to the O/S to close files when the Python interpreter exits.
Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.
So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, CPython 3 will now emit warnings telling you that it had to close files for you if you didn't do it.
Moral: Clean up after yourself. :)
Although it is quite safe to use such a construct in this particular case, there are some caveats to generalising the practice:
you can potentially run out of file descriptors, although it is unlikely; imagine hunting down a bug like that
you may not be able to delete the file on some systems, e.g. win32
if you run anything other than CPython, you don't know when the file is closed for you
if you open the file in write or read-write mode, you don't know when the data is flushed
The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice because you might hit open file handle limit if you do not close files as soon as you finish using them. What if within that for loop of yours, you open more files and leave them lingering?
It is very important to close your file descriptor when you are going to use the file's content later in the same Python script. I realized this today after a long, hectic debugging session. The reason is that the content is only saved to the file once you close the file descriptor; until then, your changes are not reflected in the file.
Suppose you write content to a new file and then, without closing the file descriptor, use that file (not the descriptor) in a shell command that reads its content. In that situation the shell command will not see the content you expect, and if you try to debug it, the bug is not easy to find. You can also read more in my blog entry http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
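A hedged sketch of the fix (the file name and command are illustrative): close, or at least flush, the file object before handing the file name to another process.

import subprocess

f = open("data.txt", "w")           # hypothetical file name
f.write("line for the external command\n")
f.close()                           # without this, the data may still sit in Python's buffer

# The external process reads the file by name, so it only sees flushed data.
print(subprocess.check_output(["cat", "data.txt"]).decode())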
During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.
Python doesn't flush the buffer—that is, write data to the file—until it's sure you're done writing. One way to do this is to close the file.
If you write to a file without closing, the data won't make it to the target file.
Python uses the close() method to close an opened file. Once the file is closed, you cannot read from or write to it again.
If you try to access the file object after it has been closed, it will raise a ValueError.
Python closes a file automatically if its reference is reassigned to another file object, but closing the file explicitly is standard practice, as it reduces the risk of it being unwarrantedly modified.
Another way to solve this issue is the with statement.
If you open a file using a with statement, a variable is bound to the file object and can only be used within the indented block. The with statement itself calls the close() method after the indented code has executed.
Syntax:
with open('file_name.text') as file:
    # some code here
A python process is writing to a file and the file has been deleted/moved by an external process (cron job in my case).
The Python process will continue to execute without any errors (expected, since the writes go to a buffer rather than straight to the file and are only flushed after f.close()). Yet no new file is created in this case and the buffer is silently discarded (correct me if I'm wrong).
Is there any Pythonic way to handle this, other than checking whether the file exists (and creating it if not) before every write operation?
There is no "pythonic" way to do this because the question isn't about a specific language. It's an operating system question. So the answer is going to be different for MS Windows than it is for a UNIX like OS such as Linux or macOS. To do this efficiently requires using a facility such as the Linux inotify API. A simpler approach that will work on any UNIX like OS is to open the file then call os.fstat() and remember the st_ino member of the returned object. Then periodically call os.stat() on the path name and compare its st_ino value to the one you saved earlier. If it changes, or the os.stat() call fails, then you know the file name you are writing to is no longer the same file.
I know it is a good habit to use close() on a file that is no longer needed in Python. I have tried opening a large number of files without closing them (in the same Python process), and I do not see any exceptions or errors. I have tried this on both Mac and Linux. So I am just wondering whether Python is smart enough to manage file handles and close/reuse them automatically, so that we do not need to care about closing files?
thanks in advance,
Lin
Python will, in general, garbage collect objects that are no longer in use and no longer referenced. This means it's entirely possible that open file objects which the garbage collector identifies as garbage will get cleaned up and, in the process, closed. However, you should not rely on this; instead, use:
with open(...):
Example (Also best practice):
with open("file.txt", "r") as f:
    # do something with f
NB: If you don't close the file and leave "open file descriptors" around on your system, you will eventually start hitting resource limits on your system; specifically "ulimit". You will eventually start to see OS errors related to "too many open files". (Assuming Linux here, but other OS(es) will have similar behaviour).
Important: It's also good practice to close any open files you've written to, so that the data you have written is properly flushed. This helps ensure data integrity, so that files don't unexpectedly contain corrupted data because of an application crash.
It's worth noting that the important note above is the cause of many confusing situations where you write to a file, read it back, and discover it's empty; but then you close your Python program, open the file in a text editor, and realize it's not empty after all.
Demo: A good example of the kind of resource limits and errors you might hit if you don't ensure you close open file(s):
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> xs = [open("/dev/null", "r") for _ in xrange(100000)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 24] Too many open files: '/dev/null'
To add to James Mills' answer, if you need to understand the specific reason you don't see any errors:
Python defines that when a file gets garbage collected, it will be automatically closed. But Python leaves it up to the implementation how it wants to handle garbage collection. CPython, the implementation you're probably using, does it with reference counting: as soon as the last reference to an object goes away, it's immediately collected. So, all of these will appear to work in CPython:
def spam():
    for i in range(100000):
        open('spam.txt')  # collected at end of statement

def eggs():
    for i in range(100000):
        f = open('eggs.txt')  # collected next loop, when f is reassigned

def beans():
    def toast():
        f = open('toast.txt')  # collected when toast exits
    for i in range(100000):
        toast()
But many other implementations (including the other big three, PyPy, Jython, and IronPython) use smarter garbage collectors that detect garbage on the fly, without having to keep track of all the references. This makes them more efficient, better at threading, etc., but it means that it's not deterministic when an object gets collected. So the same code will not work. Or, worse, it will work in your 60 tests, and then fail as soon as you're doing a demo for your investors.
It would be a shame if you needed PyPy's speed or IronPython's .NET integration, but couldn't have it without rewriting all your code. Or if someone else wanted to use your code, but needed it to work in Jython, and had to look elsewhere.
Meanwhile, even in CPython, the interpreter doesn't collect all of its garbage at shutdown. (It's getting better, but even in 3.4 it's not perfect.) So in some cases, you're relying on the OS to close your files for you. The OS will usually flush them when it does so—but maybe not if you, e.g., had them open in a daemon thread, or you exited with os._exit, or segfaulted. (And of course definitely not if you exited by someone tripping over the power cord.)
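A minimal sketch of that failure mode (the file name is made up): data left in a Python-level buffer is simply lost when interpreter shutdown is bypassed.

import os

f = open("lost_data.txt", "w")                # hypothetical file name
f.write("this may never reach the disk\n")    # sits in the file object's buffer
os._exit(0)    # skips interpreter shutdown: no flush, no close, the data is lost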
Finally, even CPython (since 3.3, I think) has code specifically to generate warnings if you let your files be garbage collected instead of closing them. Those warnings are off by default, but people regularly propose turning them on, and one day it may happen.
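For what it's worth, in current CPython 3 you can opt into those warnings yourself with the -W flag; the exact message text varies between versions, but it looks roughly like this:

$ python3 -W default::ResourceWarning -c "open('/dev/null')"
-c:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>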
You do need to close (output) files in Python.
One example of why is to flush the output to them. If you don't properly close files and your program is killed for some reason, the left-open file can be corrupted.
In addition, there is this: Why python has limit for count of file handles?
There are two good reasons.
If your program crashes or is unexpectedly terminated, then output files may be corrupted.
It's good practice to close what you open.
It's a good idea to handle file closing. It's not the sort of thing that will give errors and exceptions: it will corrupt files, or not write what you tried to write, and so on.
The most common python interpreter, CPython, which you're probably using, does, however, try to handle file closing smartly, just in case you don't. If you open a file, and then it gets garbage collected, which generally happens when there are no longer any references to it, CPython will close the file.
So for example, if you have a function like
def write_something(fname):
    f = open(fname, 'w')
    f.write("Hi there!\n")
then Python will generally close the file at some point after the function returns.
That's not that bad for simple situations, but consider this:
def do_many_things(fname):
    # Some stuff here
    f = open(fname, 'w')
    f.write("Hi there!\n")
    # All sorts of other stuff here
    # Calls to some other functions
    # more stuff
    return something
Now you've opened the file, but it could be a long time before it is closed. On some OSes, that might mean other processes won't be able to open it. If the other stuff has an error, your message might not actually get written to the file. If you're writing quite a bit of stuff, some of it might be written, and some other parts might not; if you're editing a file, you might cause all sorts of problems. And so on.
An interesting question to consider, however, is whether, in OSes where files can be open for reading by multiple processes, there's any significant risk to opening a file for reading and not closing it.