I know it is a good habit to use close() on a file in Python once it is no longer needed. I have tried opening a large number of files without closing them (in the same Python process), and did not see any exceptions or errors. I have tried both Mac and Linux. So I am wondering whether Python is smart enough to manage file handles and close/reuse them automatically, so that we do not need to care about closing files?
thanks in advance,
Lin
Python will, in general, garbage collect objects that are no longer in use and no longer referenced. This means it is entirely possible that open file objects will be cleaned up, and probably closed, once the garbage collector gets to them. However, you should not rely on this; instead, use:
with open(...) as f:
Example (Also best practice):
with open("file.txt", "r") as f:
# do something with f
NB: If you don't close files and leave "open file descriptors" around on your system, you will eventually start hitting resource limits, specifically the limit set by "ulimit", and you will eventually start to see OS errors related to "too many open files". (Assuming Linux here, but other OSes will have similar behaviour.)
Important: It's also good practice to close any open files you've written to, so that the data you have written is properly flushed. This helps ensure data integrity, and avoids files unexpectedly containing corrupted or missing data because of an application crash.
It's worth noting that this flushing behaviour is the cause of a common source of confusion: you write to a file, read it back, and discover it's empty; but then, after you close your Python program, you open it in a text editor and realize it's not empty after all.
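A minimal sketch of that situation (assuming Python 3; the file name is made up) where the data sits in Python's buffer until close() flushes it:
f = open("demo.txt", "w")
f.write("hello\n")            # lands in Python's buffer, not necessarily on disk yet

with open("demo.txt") as r:
    print(r.read())           # often prints nothing: the buffer was never flushed

f.close()                     # close() flushes the buffer

with open("demo.txt") as r:
    print(r.read())           # now prints "hello"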
Demo: A good example of the kind of resource limits and errors you might hit if you don't ensure you close open file(s):
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> xs = [open("/dev/null", "r") for _ in xrange(100000)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 24] Too many open files: '/dev/null'
To add to James Mills' answer, if you need to understand the specific reason you don't see any errors:
Python defines that when a file gets garbage collected, it will be automatically closed. But Python leaves it up to the implementation how it wants to handle garbage collection. CPython, the implementation you're probably using, does it with reference counting: as soon as the last reference to an object goes away, it's immediately collected. So, all of these will appear to work in CPython:
def spam():
    for i in range(100000):
        open('spam.txt') # collected at end of statement

def eggs():
    for i in range(100000):
        f = open('eggs.txt') # collected next loop, when f is reassigned

def beans():
    def toast():
        f = open('toast.txt') # collected when toast exits
    for i in range(100000):
        toast()
But many other implementations (including the other big three, PyPy, Jython, and IronPython) use smarter garbage collectors that detect garbage on the fly, without having to keep track of all the references. This makes them more efficient, better at threading, etc., but it means that it's not deterministic when an object gets collected. So the same code may not work. Or, worse, it will work in your 60 tests, and then fail as soon as you're doing a demo for your investors.
It would be a shame if you needed PyPy's speed or IronPython's .NET integration, but couldn't have it without rewriting all your code. Or if someone else wanted to use your code, but needed it to work in Jython, and had to look elsewhere.
Meanwhile, even in CPython, the interpreter doesn't collect all of its garbage at shutdown. (It's getting better, but even in 3.4 it's not perfect.) So in some cases, you're relying on the OS to close your files for you. The OS will usually flush them when it does so—but maybe not if you, e.g., had them open in a daemon thread, or you exited with os._exit, or segfaulted. (And of course definitely not if you exited by someone tripping over the power cord.)
Finally, even CPython (since 3.3, I think) has code specifically to generate warnings if you let your files be garbage collected instead of closing them. Those warnings are off by default, but people regularly propose turning them on, and one day it may happen.
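If you want to see those warnings yourself, here is a rough sketch (assuming Python 3.2+; the file name and function are made up) that turns on ResourceWarning and lets a file be collected unclosed:
import gc
import warnings

warnings.simplefilter("always", ResourceWarning)   # these warnings are ignored by default

def leak():
    f = open("spam.txt", "w")   # deliberately never closed
    f.write("data")

leak()
gc.collect()   # force a collection so the warning fires even without reference counting
# expect something like: ResourceWarning: unclosed file <_io.TextIOWrapper name='spam.txt' ...>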
You do need to close (output) files in Python.
One example of why is to flush the output to them. If you don't properly close files and your program is killed for some reason, the left-open file can be corrupted.
In addition, there is this: Why python has limit for count of file handles?
There are two good reasons.
If your program crashes or is unexpectedly terminated, then output files may be corrupted.
It's good practice to close what you open.
It's a good idea to handle closing files yourself. It's not the sort of thing that will necessarily give errors and exceptions: it will corrupt files, or silently fail to write what you tried to write, and so on.
The most common python interpreter, CPython, which you're probably using, does, however, try to handle file closing smartly, just in case you don't. If you open a file, and then it gets garbage collected, which generally happens when there are no longer any references to it, CPython will close the file.
So for example, if you have a function like
def write_something(fname):
    f = open(fname, 'w')
    f.write("Hi there!\n")
then Python will generally close the file at some point after the function returns.
That's not that bad for simple situations, but consider this:
def do_many_things(fname):
    # Some stuff here
    f = open(fname, 'w')
    f.write("Hi there!\n")
    # All sorts of other stuff here
    # Calls to some other functions
    # more stuff
    return something
Now you've opened the file, but it could be a long time before it is closed. On some OSes, that might mean other processes won't be able to open it. If the other stuff has an error, your message might not actually get written to the file. If you're writing quite a bit of stuff, some of it might be written, and some other parts might not; if you're editing a file, you might cause all sorts of problems. And so on.
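One way to narrow that window, sketched here with a with block so the file is open only for the write itself (the return value is just a placeholder):
def do_many_things(fname):
    # Some stuff here
    with open(fname, 'w') as f:   # closed and flushed as soon as the block ends
        f.write("Hi there!\n")
    # All sorts of other stuff here, with the file already closed
    result = "something"          # placeholder for whatever the function computes
    return result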
An interesting question to consider, however, is whether, in OSes where files can be open for reading by multiple processes, there's any significant risk to opening a file for reading and not closing it.
Related
In Python, if you either open a file without calling close(), or close the file but not using try-finally or the "with" statement, is this a problem? Or does it suffice as a coding practice to rely on the Python garbage-collection to close all files? For example, if one does this:
for line in open("filename"):
# ... do stuff ...
... is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?
In your example the file isn't guaranteed to be closed before the interpreter exits. In current versions of CPython the file will be closed at the end of the for loop because CPython uses reference counting as its primary garbage collection mechanism but that's an implementation detail, not a feature of the language. Other implementations of Python aren't guaranteed to work this way. For example IronPython, PyPy, and Jython don't use reference counting and therefore won't close the file at the end of the loop.
It's bad practice to rely on CPython's garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn't use reference counting you'll need to go through all your code and make sure all your files are closed properly.
For your example use:
with open("filename") as f:
for line in f:
# ... do stuff ...
Some Pythons will close files automatically when they are no longer referenced, while others will not and it's up to the O/S to close files when the Python interpreter exits.
Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.
So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, CPython 3 can now warn you that it had to close files for you because you didn't do it.
Moral: Clean up after yourself. :)
Although it is quite safe to use such a construct in this particular case, there are some caveats to generalising the practice (a try/finally alternative is sketched after this list):
you can potentially run out of file descriptors; unlikely, but imagine hunting a bug like that
you may not be able to delete said file on some systems, e.g. win32
if you run anything other than CPython, you don't know when the file is closed for you
if you open the file in write or read-write mode, you don't know when the data is flushed
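If for some reason with is not an option, the same guarantee can be written out explicitly with try/finally; a minimal sketch for the construct discussed above:
f = open("filename")
try:
    for line in f:
        pass   # ... do stuff ...
finally:
    f.close()  # runs even if the loop body raises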
The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice, because you might hit the open file handle limit if you do not close files as soon as you finish using them. What if, within that for loop of yours, you open more files and leave them lingering?
It is very important to close your file descriptor when you are going to use the file's content later in the same Python script. I realized this today after a long stretch of hectic debugging. The reason is that the content is only reliably edited/removed/saved once you close the file descriptor, so that the changes are actually flushed to the file.
So suppose you write content to a new file and then, without closing the file descriptor, use that file (not the fd) in another shell command which reads its content. In this situation you will not get the contents you expect from the shell command, and if you try to debug it you won't find the bug easily. You can also read more in my blog entry http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.
Python doesn't flush the buffer—that is, write data to the file—until it's sure you're done writing. One way to do this is to close the file.
If you write to a file without closing it, the data may never make it to the target file.
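If you need the data to be visible before you are ready to close the file, flush() is the usual middle ground; a small sketch (the file name is made up):
f = open("log.txt", "w")
f.write("checkpoint\n")   # sits in Python's buffer
f.flush()                 # pushes the buffer to the OS, so other readers can see it
# ... keep writing ...
f.close()                 # close() flushes whatever is still buffered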
Python uses the close() method to close an open file. Once the file is closed, you cannot read or write data through that file object again.
If you try to use the file object after it has been closed, a ValueError is raised because the file is already closed.
Python closes the file automatically when the file object is garbage collected, for example when the variable referencing it is reassigned to another file. Still, closing the file explicitly is standard practice, as it reduces the risk of the file being unexpectedly modified or left with unflushed data.
Another way to handle this is the with statement.
If you open a file using a with statement, a variable is bound to the file object and the file is meant to be accessed only within the indented block. The with statement calls the close() method automatically after the indented code has executed, even if it raised an exception.
Syntax:
with open('file_name.text') as file:
    # some code here
I'm having an issue with a pretty simple piece of code. I have a file of 6551 lines (ASCII), and all I've managed to do so far is to read the file and print it.
a_file = open(myfile_path).readlines()
print a_file
Upon trying to print, the interpreter gets completely stuck for a few minutes.
I've tried this both in IDLE and in JetBrains PyCharm. I'm running Windows Server 2012 as my work station and Windows 7 at home. The funny thing is that this worked perfectly on the weaker Windows 7 machine back home (Q9550 and 8 GB RAM), but neither I nor the IT guy can find a solution for this on my work station (i7 on X99, 64 GB RAM, GTX 980).
Would appreciate any and all assistance.
It's not a good idea to read a whole file into memory (which is what readlines() does), precisely because you can face problems like the one you encountered.
If you want to print every line of the file, you can use the following construction:
with open(myfile_path) as input_file:
    for line in input_file:
        print line
For more complicated actions, you (or the IT guy) should refer to the documentation for the open() function and for file operations.
First, try this:
import sys

with open(myfile_path) as f:
    for line in f:
        print line
        sys.stdout.flush()
It could be a number of things causing the script to hang, such as the file being open or locked, or the output blocking. Flushing after each print will cause everything that can print to print.
Additionally, it's generally beneficial to process a file a line at a time unless you actually need all of the lines at the same time in a list. This shouldn't be the root cause here unless the line lengths are truly enormous, but it's good style not to allocate giant data structures.
Additionally, setting non-blocking mode on the file can work around issues such as the file being written to and locked (though this won't give you a stable solution; it will just stop the read from blocking). This is OS dependent and will probably hurt more than it helps.
If the issue is that the file is being written to (I've come across this a lot in Windows), another option is to copy the file being written to into a temp file and handing that copy to the script.
What you choose to do will depend greatly on whether you want to ensure you have all the data possible or whether you want to ensure the script runs immediately.
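A rough sketch of that copy-then-read approach (paths and names are made up), which avoids reading a file that another process may still be writing:
import shutil

def read_snapshot(src_path, copy_path="snapshot.tmp"):
    shutil.copyfile(src_path, copy_path)   # take a point-in-time copy
    with open(copy_path) as f:             # read the copy, not the live file
        return f.read()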
Usually when I open files I never call the close() method, and nothing bad happens. But I've been told this is bad practice. Why is that?
For the most part, not closing files is a bad idea, for the following reasons:
It puts your program in the garbage collector's hands - though the file in theory will be auto-closed, it may not be closed promptly, or at all. Python 3 and CPython generally do a pretty good job at garbage collecting, but not always, and other implementations are generally much worse about closing files for you.
It can slow down your program. Too many things open, and thus more used space in the RAM, will impact performance.
For the most part, changes to files in Python do not take effect until after the file is closed (or at least flushed), so if your script edits a file, leaves it open, and then reads it, it won't see the edits.
You could, theoretically, run in to limits of how many files you can have open.
As #sai stated below, Windows treats open files as locked, so legitimate things like AV scanners or other Python scripts may be unable to read the file.
It is sloppy programming (then again, I'm not exactly the best at remembering to close files myself!)
Found some good answers:
(1) It is a matter of good programming practice. If you don't close them yourself, Python will eventually close them for you. In some versions of Python, that might be the instant they are no longer being used; in others, it might not happen for a long time. Under some circumstances, it might not happen at all.
(2) When writing to a file, the data may not be written to disk until the file is closed. When you say "output.write(...)", the data is often cached in memory and doesn't hit the hard drive until the file is closed. The longer you keep the file open, the greater the chance that you will lose data.
(3) Since your operating system has strict limits on how many file handles can be kept open at any one instant, it is best to get into the habit of closing them when they aren't needed and not wait for "maid service" to clean up after you.
(4) Also, some operating systems (Windows, in particular) treat open files as locked and private. While you have a file open, no other program can also open it, even just to read the data. This spoils backup programs, anti-virus scanners, etc.
http://python.6.x6.nabble.com/Tutor-Why-do-you-have-to-close-files-td4341928.html
https://docs.python.org/2/tutorial/inputoutput.html
Open files use resources and may be locked, preventing other programs from using them. Anyway, it is good practice to use with when reading files, as it takes care of closing the file for you.
with open('file', 'r') as f:
    read_data = f.read()
Here's an example of something "bad" that might happen if you leave a file open.
Open a file for writing in your python interpreter, write a string to it, then open that file in a text editor. On my system, the file will be empty until I close the file handle.
The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done.
Python automatically closes a file when the reference object of a file is reassigned to another file, but it is good practice to use the close() method to close a file explicitly. Here is the link about the close() method. I hope this helps.
Closing mainly matters when you're writing to a file.
Python automatically closes files most of the time, but sometimes it won't, so you want to call it manually just in case.
I had a problem with that recently:
I was writing some data to a file in a for loop, but when I interrupted the script with ^C, a lot of data which should have been written to the file wasn't there; it looks like Python simply stopped writing it. I had opened the file before the for loop. Then I changed the code so that Python opens and closes the file on every single pass of the loop.
Basically, if you write stuff for yourself and you don't have any issues, it's fine; if you write stuff for more people than just yourself, put a close() inside the code, because someone could randomly get an error message and you should try to prevent that.
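For what it's worth, wrapping the loop in a with block gives the same protection without reopening the file on every pass: when ^C raises KeyboardInterrupt, the block is exited and close() still runs, flushing what was written so far. A small sketch (the file name is made up):
with open("results.txt", "w") as f:
    for i in range(1000000):
        f.write("line %d\n" % i)
# Even on KeyboardInterrupt, the with block closes the file and flushes its buffer.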
The description of tempfile.NamedTemporaryFile() says:
If delete is true (the default), the file is deleted as soon as it
is closed.
In some circumstances, this means that the file is not deleted after the
Python interpreter ends. For example, when running the following test under
py.test, the temporary file remains:
from __future__ import division, print_function, absolute_import

import tempfile
import unittest2 as unittest

class cache_tests(unittest.TestCase):
    def setUp(self):
        self.dbfile = tempfile.NamedTemporaryFile()

    def test_get(self):
        self.assertEqual('foo', 'foo')
In some way this makes sense, because this program never explicitly closes the file object. The only other way for the object to get closed would presumably be in the __del__ destructor, but here the language reference states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits." So everything is consistent with the documentation so far.
However, I'm confused about the implications of this. If it is not
guaranteed that file objects are closed on interpreter exit, can it
possibly happen that some data that was successfully written to a
(buffered) file object is lost even though the program exits gracefully,
because it was still in the file object's buffer, and the file object
never got closed?
Somehow that seems very unlikely and un-pythonic to me, and the open()
documentation doesn't contain any such warnings either. So I
(tentatively) conclude that file objects are, after all, guaranteed to
be closed.
But how does this magic happen, and why can't NamedTemporaryFile() use
the same magic to ensure that the file is deleted?
Edit: Note that I am not talking about file descriptors here (that are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.
On Windows, NamedTemporaryFile uses a Windows-specific extension (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. However there is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use; it only deletes the name, and the file's content is only removed after it is closed (in any way). But indeed assuming that we want the file name to persist until the file is closed, like with NamedTemporaryFile, then we need "magic".
We cannot use the same magic as for flushing buffered files. What occurs there is that the C library handles it (in Python 2): the files are FILE objects in C, and the C library guarantees that they are flushed on normal program exit (but not if the process is killed). In the case of Python 3, there is custom C code to achieve the same effect. But it's specific to this use case, not anything directly reusable.
That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can prove it with a global cycle of references that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)
As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed then. But you can call it yourself too: if your process doesn't use an unbounded number of NamedTemporaryFiles, it's simply atexit.register(my_named_temporary_file.close).
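Sketched out, the atexit suggestion from the previous paragraph looks like this (the variable name is made up):
import atexit
import tempfile

dbfile = tempfile.NamedTemporaryFile()
atexit.register(dbfile.close)   # close() runs at interpreter exit, which also deletes the file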
On any version of *nix, all file descriptors are closed when a process finishes, and this is taken care of by the operating system. Windows is likely exactly the same in this respect. Without digging in the source code, I can't say with 100% authority what actually happens, but likely what happens is:
If delete is True (the default), the file is unlinked when the file object is closed, most likely via unlink() (or a similar function on other operating systems). On POSIX the name disappears immediately, and the data itself goes away once the last open file descriptor on it is closed, which happens at the latest when the process exits.
If delete is False, the file is not deleted automatically at all: it remains on disk after the process exits, and removing it is up to you.
The closing of file descriptors is handled by the operating system. If you do not close a file after you open it, you are assuming that the operating system will flush the buffers and close the file after the owning process exits. This is not Python magic; this is your OS doing its thing. The __del__() method is a Python-level mechanism and is not guaranteed to be called at interpreter exit.
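If the concern is making sure the data actually reaches the disk, rather than just being handed to the OS, the usual belt-and-braces pattern is flush() followed by os.fsync(); a small sketch (the file name is made up):
import os

with open("important.dat", "wb") as f:
    f.write(b"payload")
    f.flush()                 # push Python's buffer to the OS
    os.fsync(f.fileno())      # ask the OS to push its buffers to the disk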