Python open() and memory leaks

When is a file closed?
# Read all data from input file
mergeData = open( "myinput.txt","r" )
allData = mergeData.read()
mergeData.close()
Can I substitute this code?
allData = open( "myinput.txt","r" ).read()
I was wondering when the file would be closed. Would it be closed once the statement is run, or does it stay open until the program exits?

CPython closes a file object automatically when the object is deleted; it is deleted when its reference count drops to zero (no more variables refer to it). So if you use mergeData in a function, as soon as the function is done, the local variables are cleaned up and the file is closed.
If you use allData = open( "myinput.txt","r" ).read(), the reference count drops to 0 the moment .read() returns, and on CPython that means the file is closed there and then.
On other implementations such as Jython or IronPython, where object lifetime is managed differently, the moment an object is actually deleted could be much later.
The best way to use a file though, is as a context manager:
with open( "myinput.txt","r" ) as mergeData:
    allData = mergeData.read()
which calls .close() on mergeData automatically. See the open() documentation and the documentation for the with statement.
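For comparison, the with form above behaves roughly like this manual try/finally (a simplified sketch; the real expansion also goes through the file object's __enter__ and __exit__ methods):
mergeData = open("myinput.txt", "r")
try:
    allData = mergeData.read()
finally:
    mergeData.close()  # runs even if read() raises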

Yes. Yes you can. There is no memory leak or anything of the sort.
The file handle will be closed soon after the file object returned by open() goes out of scope and is garbage collected.
Though if you prefer you might wish to do something like:
with open('myinput.txt') as f:
    data = f.read()
This will ensure that the file is closed as soon as you're done with it.

Related

Is file closing necessary in this situation?

if I have:
fdata = open(pathf, "r").read().splitlines()
Will the file automatically close after getting the data? If not, how can I close it, since fdata is not a handle?
Thank you
Use
with open(pathf, "r") as r:
    fdata = r.read().splitlines()
# as soon as you leave the with block, the file is auto-closed, even if an exception happens
It's not only about auto-closing, but also about closing correctly in case of exceptions.
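A quick sketch of that point, reusing pathf from the question: the file ends up closed even when the block raises.
try:
    with open(pathf, "r") as r:
        raise ValueError("something went wrong mid-read")
except ValueError:
    pass
print(r.closed)  # True - the with block closed the file on the way out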
Docs: methods of file objects
It is good practice to use the with keyword when dealing with file
objects. The advantage is that the file is properly closed after its
suite finishes, even if an exception is raised at some point. Using
with is also much shorter than writing equivalent try-finally blocks:
If you’re not using the with keyword, then you should call f.close()
to close the file and immediately free up any system resources used by
it.
If you don’t explicitly close a file, Python’s garbage collector
will eventually destroy the object and close the open file for you,
but the file may stay open for a while. Another risk is that different
Python implementations will do this clean-up at different times.
The file will be closed automatically on garbage collection or when the program exits. But since best practices matter, the better approach is to use a context manager, as shown below:
with open(pathf, "r") as f:
    fdata = f.read().splitlines()
Thank you.
If you use this:
with open(pathf, 'r') as f:
    fdata = f.read().splitlines()
Then you don't have to close your file; it is done automatically. It's always good practice to close files once you are done using them (it reduces the risk of leaking file handles, among other things).
Will the file automatically close after getting the data?
In your example, fdata is actually a list, not a file object. The file object is what is returned by open().
If you had a name bound to the file object (or if fdata were a file object), the answer would be: it depends.
If the file object has no references left, i.e. its reference count reaches 0, it will be garbage collected, and the file will be closed in the process.
If not, how can I close it, since fdata is not a handle?
You can't, as fdata is not a file object (as you mentioned) and you don't have any reference to the file object returned by open() either.
If you had a file object, you could explicitly call close() on it:
f_object.close()
Better yet, as the open is a context manager, use the with ... construct to let it close automatically upon the block end:
with open('file.txt') as f_object:
    ...
One added advantage is that the file will be closed in case of an exception too. If you are interested, check the __enter__ and __exit__ special methods of open.
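As a rough illustration of those hooks (not how you would normally write it), a file object driven as its own context manager looks something like this:
f_object = open('file.txt').__enter__()  # what `with` does on entry
try:
    data = f_object.read()
finally:
    f_object.__exit__(None, None, None)  # what `with` does on exit; this closes the file
print(f_object.closed)  # True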

File closes before async call finishes, causing IO error

I wrote a package that includes a function to upload something asynchronously. The intent is that the user can use my package, open a file, and upload it async. The problem is, depending on how the user writes their code, I get an IO error.
# EXAMPLE 1
with open("my_file", "rb") as my_file:
    package.upload(my_file)
# I/O operation on closed file error

# EXAMPLE 2
my_file = open("my_file", "rb")
package.upload(my_file)
# everything works
I understand that in the first example the file is closing immediately because the call is async. I don't know how to fix this though. I can't tell the user they can't open files in the style of example 1. Can I do something in my package.upload() implementation to prevent the file from closing?
You can use os.dup to duplicate the file descriptor and shield the async process from a close in the caller. The duplicated handle shares other characteristics of the original such as the current file position, so you are not completely shielded from bad things the caller can do.
This also limits your process to things that have file descriptors. If you stick to using the standard file calls, then a user can hand in any file-like object instead of just a file on disk.
import os

def upload(my_file):
    # duplicate the descriptor so a close() in the caller does not affect us
    my_file = os.fdopen(os.dup(my_file.fileno()))
    # ...queue for async
If you are using with to open files, the file is closed as soon as execution of the code block inside the with finishes. In your case, just pass the filename and open the file inside the asynchronous function.
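A minimal sketch of that suggestion, using a plain thread as a stand-in for whatever async mechanism the package really uses (the worker function and thread here are hypothetical, not part of the original package):
import threading

def upload(path):
    def worker():
        # the file is opened and closed inside the async task itself,
        # so the caller cannot close it too early
        with open(path, "rb") as f:
            data = f.read()
            # ... send `data` wherever it needs to go ...
    threading.Thread(target=worker).start()

upload("my_file")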

Will file be closed after "open(file_name, 'w+').write(somestr)"

I'm new to python.
I wonder if I write:
open('/tmp/xxx.txt', 'w+').write('aabb')
Will the file be still opened or closed?
In another word, what's the difference between the above and
with open('/tmp/xxx.txt', 'w+') as f:
    f.write('aabb')
The file might stay open.
Keep in mind that it will be closed automatically upon garbage collection or program termination, but it's bad practice to count on that: exceptions, frame references or even delayed GC might keep it open.
Also, you might lose data if the program terminates unexpectedly and you don't flush() it.
In Python implementations where the GC works differently (PyParallel, for example) this can become a real problem.
Even in CPython, the file might stay open if a frame holds a reference to it, for example. Try running this:
import sys

glob_list = []

def func(*args, **kwargs):
    # the tracer keeps a reference to every (frame, event, arg) it sees,
    # so frames (and whatever they reference) stay alive
    glob_list.append((args, kwargs))
    return func

sys.settrace(func)
open('/tmp/xxx.txt', 'w+').write('aabb')

File opened in a function doesn't need to be closed manually?

If I open a file in a function:
In [108]: def foo(fname):
...:     f = open(fname)
...:     print f
...:
In [109]: foo('t.py')
<open file 't.py', mode 'r' at 0x05DA1B78>
Is it better to close f manually or not? Why?
It is better to close the file when you are done with it because it is a good habit, but it isn't entirely necessary because the garbage collector will close the file for you. The reason you'd close it manually is to have more control. You don't know when the garbage collector will run.
But even better is to use the with statement introduced in python 2.5.
with open(f_name) as f:
    # do stuff with f
    # possibly throw an exception
    pass
This will close the file no matter what happens while in the scope of the with statement.
$ cat /proc/sys/fs/file-max
390957
this may break my system ( forgive me for not trying :) ):
fs = []
for i in range(390957+1):
    fs.append(open(str(i), 'w'))
for f in fs:
    f.close()
this (hopefully) won't:
for i in range(390957+1):
    with open(str(i), 'w') as f:
        pass  # do stuff
Yes, it is better to close the file manually, or better still use the with statement when dealing with files (it will automatically close the file for you even if an exception occurs). In CPython an unreferenced file object will be closed automatically when the garbage collector actually destroys it; until then, any unflushed data and resources may still hang around in memory.
From docs:
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
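A small sketch of the unflushed-data point above; the exact behaviour depends on buffering, but typically a short write is not visible on disk until close() or flush():
f = open('buffered.txt', 'w')
f.write('hello')                    # sits in the write buffer
print(open('buffered.txt').read())  # usually prints '' - nothing flushed yet
f.flush()
print(open('buffered.txt').read())  # now prints 'hello'
f.close()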
Related:
File buffer flushing and closing in Python with variable re-assign
How does python close files that have been gc'ed?

Python order of execution

I was wondering if Python has similar issues as C regarding the order of execution of certain elements of code.
For example, I know in C there are times say when it's not guaranteed that some variable is initialized before another. Or just because one line of code is above another it's not guaranteed that it is implemented before all the ones below it.
Is it the same for Python? For example, if I open a file of data, read in the data, close the file, then do other stuff, do I know for sure that the file is closed before the lines after the close are executed?
The reason I ask is that I'm trying to read in a large file of data (1.6 GB) and use a Python module specific to the work I do on the data. When I run this module I get this error message:
File "/glast01/software/ScienceTools/ScienceTools-v9r15p2-SL4/sane/v3r18p1/python/GtApp.py", line 57, in run
input, output = self.runWithOutput(print_command)
File "/glast01/software/ScienceTools/ScienceTools-v9r15p2-SL4/sane/v3r18p1/python/GtApp.py", line 77, in runWithOutput
return os.popen4(self.command(print_command))
File "/Home/eud/jmcohen/.local/lib/python2.5/os.py", line 690, in popen4
stdout, stdin = popen2.popen4(cmd, bufsize)
File "/Home/eud/jmcohen/.local/lib/python2.5/popen2.py", line 199, in popen4
inst = Popen4(cmd, bufsize)
File "/Home/eud/jmcohen/.local/lib/python2.5/popen2.py", line 125, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
>>>
Exception exceptions.AttributeError: AttributeError("Popen4 instance has no attribute 'pid'",) in <bound method Popen4.__del__ of <popen2.Popen4 instance at 0x9ee6fac>> ignored
I assume it's related to the size of the data I read in (it has 17608310 rows and 22 columns).
I thought perhaps if I closed the file I opened right after I read in the data this would help, but it didn't. This led me to thinking about the order that lines of code are executed in, hence my question.
Thanks
The only thing I can think of that may surprise some people is:
def test():
    try:
        return True
    finally:
        return False

print test()
Output:
False
finally clauses really are executed last, even if a return statement precedes them. However, this is not specific to Python.
Execution of C certainly is sequential, for actual statements. There are even rules that define the sequence points, so you can know how individual expressions evaluate.
CPython itself is written in such a way that any effects like those you mention are minimized; code always executes top to bottom barring literal evaluation during compilation, objects are GCed as soon as their refcount hits 0, etc.
Execution in the cpython vm is very linear. I do not think whatever problem you have has to do with order of execution.
One thing you should be careful about in Python but not C: exceptions can be raised everywhere, so just because you see a close() call below the corresponding open() call does not mean that call is actually reached. Use try/finally everywhere (or the with statement in new enough pythons) to make sure opened files are closed (and other kinds of resources that can be freed explicitly are freed).
If your problem is with memory usage, not some other kind of resource, debugging it can be harder. Memory cannot be freed explicitly in python. The cpython vm (which you are most likely using) does release memory as soon as the last reference to it goes away, but sometimes cannot free memory trapped in cycles with objects that have a __del__ method. If you have any __del__ methods of your own or use classes that have them this may be part of your problem.
Your actual question (the memory one, not the order of execution one) is hard to answer without seeing more code, though. It may be something obvious (or there may at least be some obvious way to reduce the amount of memory you need).
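To illustrate the __del__-and-cycles caveat above, a minimal sketch (this reflects Python 2 behaviour, where such cycles end up in gc.garbage instead of being freed):
import gc

class Leaky(object):
    def __del__(self):
        pass  # merely having __del__ is what blocks collection of the cycle

a = Leaky()
b = Leaky()
a.other = b
b.other = a  # create a reference cycle
del a, b
gc.collect()
print(gc.garbage)  # on Python 2 the two objects land here, uncollected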
"if I open a file of data, read in the data, close the file, then do other stuff do I know for sure that the file is closed before the lines after I close the file are executed??"
Closed? Yes.
Released from memory? No. There are no guarantees about when garbage collection will occur.
Further, closing a file says nothing about all the other variables you've created and the other objects you've left lying around attached to those variables.
There's no "order of operations" issue.
I'll bet that you have too many global variables with too many copies of the data.
If the data consists of columns and rows, why not use the built in file iterator to fetch one line at a time?
f = open('file.txt')
first_line = f.next()
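And if each row needs processing, a sketch of streaming the file one line at a time instead of holding all 17 million rows in memory at once (the column handling is just an assumption about the data layout):
f = open('file.txt')
for line in f:              # the file object is its own iterator
    columns = line.split()  # assuming whitespace-separated columns
    # ... process one row here, then let it go ...
f.close()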
popen2.py:
class Popen4(Popen3):
    childerr = None

    def __init__(self, cmd, bufsize=-1):
        _cleanup()
        self.cmd = cmd
        p2cread, p2cwrite = os.pipe()
        c2pread, c2pwrite = os.pipe()
        self.pid = os.fork()
        if self.pid == 0:
            # Child
            os.dup2(p2cread, 0)
            os.dup2(c2pwrite, 1)
            os.dup2(c2pwrite, 2)
            self._run_child(cmd)
        os.close(p2cread)
        self.tochild = os.fdopen(p2cwrite, 'w', bufsize)
        os.close(c2pwrite)
        self.fromchild = os.fdopen(c2pread, 'r', bufsize)
man 2 fork:
The fork() function may fail if:
[ENOMEM]
        Insufficient storage space is available.
os.popen4 eventually calls popen2.Popen4.__init__, which must fork in order to create the child process that you try to read from/write to. This underlying call is failing, likely due to resource exhaustion.
You may be using too much memory elsewhere, causing fork to attempt to use more than the RLIMIT_DATA or RLIMIT_RSS limit granted to your user. As recommended in "Python memory profiler" on Stack Overflow, Heapy can help you determine whether this is the case.
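A minimal sketch of what that looks like with Heapy, assuming the guppy package is installed:
from guppy import hpy

hp = hpy()
# ... run the code that loads the large file ...
print(hp.heap())  # prints a breakdown of the objects currently on the heap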
