Will file be closed after "open(file_name, 'w+').write(somestr)" - python

I'm new to python.
I wonder if I write:
open('/tmp/xxx.txt', 'w+').write('aabb')
Will the file still be open, or will it be closed?
In other words, what's the difference between the above and
with open('/tmp/xxx.txt', 'w+') as f:
    f.write('aabb')

The file might stay open.
Keep in mind that it will be closed automatically upon garbage collection or program termination, but it's bad practice to count on that: exceptions, frames, or even delayed GC might keep it open.
Also, you might lose data if the program terminates unexpectedly before the buffer is flushed.
In Python implementations where the GC works differently (PyParallel, for example), this might cause a big problem.
Even in CPython, the file might stay open if something holds a reference to the frame, for example. Try running this:
import sys
glob_list = []
def func(*args, **kwargs):
    glob_list.append((args, kwargs))
    return func
sys.settrace(func)
open('/tmp/xxx.txt', 'w+').write('aabb')
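For comparison, here is a minimal sketch of the with form from the question. It is written for Python 3 and uses a temporary scratch file instead of /tmp/xxx.txt; both choices are assumptions made for illustration. The point is that the file is flushed and closed at a known place, whatever the garbage collector does.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of /tmp/xxx.txt.
path = os.path.join(tempfile.mkdtemp(), "xxx.txt")

# Explicit context manager: the file is flushed and closed when the block ends.
with open(path, "w+") as f:
    f.write("aabb")

closed_after_with = f.closed  # closing did not depend on the GC

# Because the file was closed, the data can be read back reliably.
with open(path) as f:
    content = f.read()
```

With the one-liner, neither of these two facts is guaranteed at any particular point in the program.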

Related

Is it reasonable to wrap an entire main loop in a try..finally block?

I've made a map editor in Python2.7.9 for a small project and I'm looking for ways to preserve the data I edit in the event of some unhandled exception. My editor already has a method for saving out data, and my current solution is to have the main loop wrapped in a try..finally block, similar to this example:
import os, datetime #..and others.
if __name__ == '__main__':
    DataMgr = DataManager() # initializes the editor.
    save_note = None
    try:
        MainLoop() # unsurprisingly, this calls the main loop.
    except Exception as e: # I am of the impression this will catch every type of exception.
        save_note = "Exception dump: %s : %s." % (type(e).__name__, e) # A memo appended to the comments in the save file.
    finally:
        exception_fp = DataMgr.cwd + "dump_%s.kmap" % str(datetime.datetime.now())
        DataMgr.saveFile(exception_fp, memo = save_note) # saves out to a dump file using a familiar method with a note outlining what happened.
This seems like the best way to make sure that, no matter what happens, an attempt is made to preserve the editor's current state (to the extent that saveFile() is equipped to do so) in the event that it should crash. But I wonder if encapsulating my entire main loop in a try block is actually safe and efficient and good form. Is it? Are there risks or problems? Is there a better or more conventional way?
Wrapping the main loop in a try...finally block is the accepted pattern when you need something to happen no matter what. In some cases it's logging and continuing, in others it's saving everything possible and quitting.
So your code is fine.
If your file isn't that big, I would suggest reading the entire input file into memory, closing the file, and then doing your data processing on the in-memory copy. This avoids corrupting your data on disk, at the cost of potentially slowing down your runtime.
Alternatively, take a look at the atexit Python module. It lets you register one or more functions to be called automatically when the program exits.
That being said what you have should work reasonably well.
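A minimal sketch of the atexit approach, written for Python 3. The names editor_state, saved_dumps and save_on_exit are hypothetical stand-ins for the editor's real data and save method, which are not shown in the question.

```python
import atexit

# Hypothetical stand-in for the editor's state; not from the original code.
editor_state = {"tiles": [1, 2, 3]}
saved_dumps = []

def save_on_exit():
    """Record a copy of the current state; a real editor would write a file."""
    saved_dumps.append(dict(editor_state))
    return len(saved_dumps)

# Registered functions run at normal interpreter termination (including after
# an unhandled exception), but not if the process is killed by a signal that
# Python does not handle, or if it exits via os._exit().
atexit.register(save_on_exit)
```

Because of those limits, atexit complements a try..finally around the main loop rather than replacing it.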

File opened in a function doesn't need to be closed manually?

If I open a file in a function:
In [108]: def foo(fname):
     ...:     f=open(fname)
     ...:     print f
     ...:
In [109]: foo('t.py')
<open file 't.py', mode 'r' at 0x05DA1B78>
Is it better to close f manually or not? Why?
It is better to close the file when you are done with it because it is a good habit, but it isn't entirely necessary because the garbage collector will close the file for you. The reason you'd close it manually is to have more control. You don't know when the garbage collector will run.
But even better is to use the with statement introduced in python 2.5.
with open(f_name) as f:
    # do stuff with f
    # possibly throw an exception
    pass
This will close the file no matter what happens while in the scope of the with statement.
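A short sketch that checks this claim, written for Python 3. The scratch file path and the simulated ValueError are assumptions made for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file

try:
    with open(path, "w") as f:
        f.write("partial data")
        raise ValueError("simulated failure inside the with block")
except ValueError:
    pass

# The file was closed (and flushed) before the exception left the with block.
closed_after_error = f.closed
```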
$ cat /proc/sys/fs/file-max
390957
this may break my system (forgive me for not trying :) ):
fs = []
for i in range(390957+1):
    fs.append(open(str(i), 'w'))
for f in fs:
    f.close()
this (hopefully) won't:
for i in range(390957+1):
    with open(str(i), 'w') as f:
        pass # do stuff
Yes, it is better to close the file manually, or even better, use the with statement when dealing with files (it will automatically close the file for you even if an exception occurs). In CPython, an unreferenced file object will be closed automatically when the garbage collector actually destroys it; until then, any unflushed data and resources may still hang around in memory.
From docs:
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way.
Related:
File buffer flushing and closing in Python with variable re-assign
How does python close files that have been gc'ed?

Why is my file getting closed if I don't do anything with it for a while?

Original situation:
The application I'm working on at the moment will receive notification from another application when a particular file has had data added and is ready to be read. At the moment I have something like this:
class Foo(object):
    def __init__(self):
        self.myFile = open("data.txt", "r")
        self.myFile.seek(0, 2) #seeks to the end of the file
        self.mainWindow = JFrame("Foo",
                                 defaultCloseOperation = JFrame.EXIT_ON_CLOSE,
                                 size = (640, 480))
        self.btn = JButton("Check the file", actionPerformed=self.CheckFile)
        self.mainWindow.add(self.btn)
        self.mainWindow.visible = True
    def CheckFile(self, event):
        while True:
            line = self.myFile.readline()
            if not line:
                break
            print line
foo = Foo()
Eventually, CheckFile() will be triggered when a certain message is received on a socket. At the moment, I'm triggering it from a JButton.
Despite the fact that the file is not touched anywhere else in the program, and I'm not using with on the file, I keep on getting ValueError: I/O operation on closed file when I try to readline() it.
Initial Solution:
In trying to figure out when exactly the file was being closed, I changed my application code to:
foo = Foo()
while True:
    if foo.myFile.closed == True:
        print "File is closed!"
But then the problem went away! Or if I change it to:
foo = Foo()
foo.CheckFile()
then the initial CheckFile(), happening straight away, works. But then when I click the button ~5 seconds later, the exception is raised again!
After changing the infinite loop to just pass, and discovering that everything was still working, my conclusion was that initially, with nothing left to do after instantiating a Foo, the application code was ending, foo was going out of scope, and thus foo.myFile was going out of scope and the file was being closed. Despite this, swing was keeping the window open, which was then causing errors when I tried to operate on an unopened file.
Why I'm still confused:
The odd part is, if foo had gone out of scope, why then, was swing still able to hook into foo.CheckFile() at all? When I click on the JButton, shouldn't the error be that the object or method no longer exists, rather than the method being called successfully and giving an error on the file operation?
My next idea was that maybe, when the JButton attempted to call foo.CheckFile() and found that foo no longer existed, it created a new Foo, somehow skipped its __init__ and went straight to its CheckFile(). However, this doesn't seem to be the case either. If I modify Foo.__init__ to take a parameter, store that in self.myNum, and print it out in CheckFile(), the value I pass in when I instantiate the initial object is always there. This would seem to suggest that foo isn't going out of scope at all, which puts me right back where I started!!!
EDIT: Tidied question up with relevant info from comments, and deleted a lot of said comments.
* Initial, Partial Answer (Added to Question) *
I think I just figured this out. After foo = Foo(), with no code left to keep the module busy, it would appear that the object ceases to exist, despite the fact that the application is still running, with a Swing window doing stuff.
If I do this:
foo = Foo()
while True:
    pass
Then everything works as I would expect.
I'm still confused though, as to how foo.CheckFile() was being called at all. If the problem was that foo.myFile was going out of scope and being closed, then how come foo.CheckFile() was able to be called by the JButton?
Maybe someone else can provide a better answer.
I think the problem arises from memory being partitioned into two types in Java: heap and non-heap. Your class instance foo gets stored in heap memory, while its method CheckFile is loaded into the method area of non-heap memory. After your script finishes, there are no more references to foo, so it gets marked for garbage collection, while the Swing interface is still referring to CheckFile, so that gets marked as in use. I am assuming that foo.myFile is not considered static, so it is also stored in heap memory. As for the Swing interface, it's presumably still being tracked as in use as long as the window is open and being updated by the window manager.
Edit: your solution of using a while True loop is a correct one, in my opinion. Use it to monitor for events and when the window closes or the last line is read, exit the loop and let the program finish.
Edit 2: alternative solution - try having foo inherit from JFrame to make Swing keep a persistent pointer to it in its main loop as long as the window is open.
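One piece of the puzzle can be checked in plain Python: a bound method keeps a reference to its instance, which is why a stored callback such as foo.CheckFile can still be called after the name foo is gone. The sketch below is written for CPython 3 and uses hypothetical names (check_file, probe, callback). It only illustrates that reference; it does not explain why the file itself was reported as closed under Jython.

```python
import gc
import weakref

class Foo(object):
    def check_file(self):
        return "still reachable"

foo = Foo()
probe = weakref.ref(foo)    # lets us observe whether the instance is alive
callback = foo.check_file   # what a GUI toolkit would store for the button

del foo                     # the module-level name is gone...
gc.collect()
# ...but the bound method still holds the instance through its __self__.
alive_while_callback_exists = probe() is not None
result = callback()

del callback                # drop the last reference
gc.collect()
alive_after_callback_dropped = probe() is not None
```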

What is the python "with" statement designed for?

I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the file object returned by open is itself a context manager: it keeps the file open as long as execution stays inside the with statement where you used it, and closes it as soon as you leave that context, whether you left because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os
@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)
with working_directory("data/stuff"):
    # do something within data/stuff
    pass
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys
@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.iteritems():
            setattr(sys, sname, stream)
with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print statements will go to /tmp/log.txt
    print "Test entry 1"
    print "Test entry 2"
# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from contextlib import contextmanager
from tempfile import mkdtemp
from shutil import rmtree
@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)
with temporary_dir() as dirname:
    # do whatever you want
    pass
I would suggest two interesting readings:
PEP 343 The "with" Statement
Effbot Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
    data = foo_file.read()
OR
from contextlib import nested
with nested(A(), B(), C()) as (X, Y, Z):
    do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
    for line in input_file:
        output_file.write(parse(line))
OR
import threading
lock = threading.Lock()
with lock:
    # Critical section of code
    pass
3.
I don't see any Antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using the try..catch..finally statement from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out of the box. However, anyone can make resources that can be used in a with statement by implementing the well-documented protocol: PEP 0343
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
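To make the "released regardless of the outcome" part concrete, here is a rough hand-written equivalent of a with block, written for Python 3 and using threading.Lock as the resource. It is a simplified sketch of the expansion described in PEP 343, not the exact translation the interpreter performs; the names shared and manager are illustrative.

```python
import sys
import threading

lock = threading.Lock()
shared = []

# Rough equivalent of:
#     with lock:
#         shared.append(1)
manager = lock
manager.__enter__()                      # acquire the resource
try:
    shared.append(1)                     # body of the with block
except BaseException:
    # __exit__ receives the exception details; a true return value suppresses it.
    if not manager.__exit__(*sys.exc_info()):
        raise
else:
    manager.__exit__(None, None, None)   # normal exit: release the resource

released = not lock.locked()
```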
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing, and for some activities I need the Decimal library for arbitrary-precision calculations. Some parts of my code need high precision; most other parts need less.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext
with localcontext() as ctx:
    ctx.prec = 42 # Perform a high precision calculation
    s = calculate_something()
s = +s # Round the final result back to the default precision
I use this a lot with the Hypergeometric Test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
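A runnable version of the same idea, written for Python 3. The helper precise_ratio and the 1/7 example are illustrative assumptions, standing in for calculate_something(), which is not shown in the answer.

```python
from decimal import Decimal, getcontext, localcontext

def precise_ratio(numerator, denominator, digits=42):
    """Divide with a temporarily raised precision (hypothetical helper)."""
    with localcontext() as ctx:
        ctx.prec = digits                 # applies only inside this block
        result = Decimal(numerator) / Decimal(denominator)
    return result                         # the surrounding precision is restored here

ratio = precise_ratio(1, 7)   # carries 42 significant digits
rounded = +ratio              # unary plus rounds to the current context precision
```

The unary plus matters because Decimal values keep the precision they were computed with; leaving the with block restores the context but does not round existing values.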
An example of an antipattern might be using with inside a loop when it would be more efficient to have the with outside the loop. For example:
for row in lines:
    with open("outfile","a") as f:
        f.write(row)
vs
with open("outfile","a") as f:
    for row in lines:
        f.write(row)
The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement, there is an example section at the end.
... new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.
points 1, 2, and 3 being reasonably well covered:
4: it is relatively new, only available in python2.6+ (or python2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example for out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, are connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with-statement, however when using the above note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back, otherwise the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it also makes it possible to acquire a connection once and use it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)
with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)
with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)
conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
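The same commit-or-rollback behaviour can be tried without a database server by using sqlite3 from the standard library. This is a sketch for Python 3 with an in-memory database; the notes table and the simulated RuntimeError are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# Normal exit from the with block: the transaction is committed.
with conn:
    conn.execute("INSERT INTO notes VALUES ('kept')")

# Exit through an exception: the transaction is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO notes VALUES ('discarded')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# The connection is still open after both blocks, so it can be queried.
rows = conn.execute("SELECT body FROM notes").fetchall()
conn.close()  # closing remains the programmer's job
```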
In Python, the “with” statement is generally used to open a file, process the data in it, and close it without calling close() explicitly. The “with” statement also makes exception handling simpler by taking care of cleanup.
General form of with:
with open("file name", "mode") as file_var:
    processing statements
Note: there is no need to close the file by calling file_var.close().
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
    data = file.read()
open returns a file object.
Since Python 2.6, file objects have the methods __enter__ and __exit__.
with is like a for loop that calls __enter__, runs the loop body once, and then calls __exit__.
with works with any instance that has __enter__ and __exit__.
An open file holds an operating-system file handle, and on some platforms other processes may be unable to reuse the file until it is closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
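A minimal sketch of a class that works with the with statement because it defines __enter__ and __exit__, written for Python 3. The Tracked class and the simulated KeyError are hypothetical and exist only to show the order of calls.

```python
class Tracked(object):
    """Hypothetical resource that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                  # the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False                 # do not swallow exceptions

resource = Tracked()
try:
    with resource as r:
        r.events.append("body")
        raise KeyError("simulated failure")
except KeyError:
    pass

events = resource.events   # __exit__ ran even though the body raised
```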

Python order of execution

I was wondering if Python has similar issues as C regarding the order of execution of certain elements of code.
For example, I know that in C there are times when it's not guaranteed that one variable is initialized before another, or when one line of code being above another doesn't guarantee that it is executed before the ones below it.
Is it the same for Python? For example, if I open a file of data, read in the data, close the file, and then do other stuff, do I know for sure that the file is closed before the lines after the close are executed?
The reason I ask is because I'm trying to read in a large file of data (1.6GB) and use this python module specific to the work I do on the data. When I run this module I get this error message:
  File "/glast01/software/ScienceTools/ScienceTools-v9r15p2-SL4/sane/v3r18p1/python/GtApp.py", line 57, in run
    input, output = self.runWithOutput(print_command)
  File "/glast01/software/ScienceTools/ScienceTools-v9r15p2-SL4/sane/v3r18p1/python/GtApp.py", line 77, in runWithOutput
    return os.popen4(self.command(print_command))
  File "/Home/eud/jmcohen/.local/lib/python2.5/os.py", line 690, in popen4
    stdout, stdin = popen2.popen4(cmd, bufsize)
  File "/Home/eud/jmcohen/.local/lib/python2.5/popen2.py", line 199, in popen4
    inst = Popen4(cmd, bufsize)
  File "/Home/eud/jmcohen/.local/lib/python2.5/popen2.py", line 125, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
>>>
Exception exceptions.AttributeError: AttributeError("Popen4 instance has no attribute 'pid'",) in <bound method Popen4.__del__ of <popen2.Popen4 instance at 0x9ee6fac>> ignored
I assume it's related to the size of the data I read in (it has 17608310 rows and 22 columns).
I thought perhaps if I closed the file I opened right after I read in the data this would help, but it didn't. This led me to thinking about the order that lines of code are executed in, hence my question.
Thanks
The only thing I can think of that may surprise some people is:
def test():
    try:
        return True
    finally:
        return False
print test()
Output:
False
finally clauses really are executed last, even if a return statement precedes them. However, this is not specific to Python.
Execution of C certainly is sequential, for actual statements. There are even rules that define the sequence points, so you can know how individual expressions evaluate.
CPython itself is written in such a way that any effects like those you mention are minimized; code always executes top to bottom barring literal evaluation during compilation, objects are GCed as soon as their refcount hits 0, etc.
Execution in the cpython vm is very linear. I do not think whatever problem you have has to do with order of execution.
One thing you should be careful about in Python but not C: exceptions can be raised everywhere, so just because you see a close() call below the corresponding open() call does not mean that call is actually reached. Use try/finally everywhere (or the with statement in new enough pythons) to make sure opened files are closed (and other kinds of resources that can be freed explicitly are freed).
If your problem is with memory usage, not some other kind of resource, debugging it can be harder. Memory cannot be freed explicitly in python. The cpython vm (which you are most likely using) does release memory as soon as the last reference to it goes away, but sometimes cannot free memory trapped in cycles with objects that have a __del__ method. If you have any __del__ methods of your own or use classes that have them this may be part of your problem.
Your actual question (the memory one, not the order of execution one) is hard to answer without seeing more code, though. It may be something obvious (or there may at least be some obvious way to reduce the amount of memory you need).
"if I open a file of data, read in the data, close the file, then do other stuff do I know for sure that the file is closed before the lines after I close the file are executed??"
Closed yes.
Released from memory. No. No guarantees about when garbage collection will occur.
Further, closing a file says nothing about all the other variables you've created and the other objects you've left lying around attached to those variables.
There's no "order of operations" issue.
I'll bet that you have too many global variables with too many copies of the data.
If the data consists of columns and rows, why not use the built in file iterator to fetch one line at a time?
f = open('file.txt')
first_line = f.next()
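A sketch of processing a file row by row instead of loading it whole, written for Python 3 (where next(f) replaces f.next()). The sample data and the column sum are assumptions standing in for the real 1.6 GB file and its processing.

```python
import os
import tempfile

# Hypothetical sample data standing in for the large input file.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as out:
    out.write("1 2 3\n4 5 6\n7 8 9\n")

row_count = 0
first_column_total = 0.0
with open(path) as f:
    for line in f:                 # the file object yields one line at a time
        columns = line.split()
        row_count += 1
        first_column_total += float(columns[0])
```

Only the running totals are kept, so memory use stays roughly constant no matter how many rows the file has.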
popen2.py:
class Popen4(Popen3):
    childerr = None
    def __init__(self, cmd, bufsize=-1):
        _cleanup()
        self.cmd = cmd
        p2cread, p2cwrite = os.pipe()
        c2pread, c2pwrite = os.pipe()
        self.pid = os.fork()
        if self.pid == 0:
            # Child
            os.dup2(p2cread, 0)
            os.dup2(c2pwrite, 1)
            os.dup2(c2pwrite, 2)
            self._run_child(cmd)
        os.close(p2cread)
        self.tochild = os.fdopen(p2cwrite, 'w', bufsize)
        os.close(c2pwrite)
        self.fromchild = os.fdopen(c2pread, 'r', bufsize)
man 2 fork:
The fork() function may fail if:
[ENOMEM]
        Insufficient storage space is available.
os.popen4 eventually calls popen2.Popen4.__init__, which must fork in order to create the child process that you try to read from/write to. This underlying call is failing, likely due to resource exhaustion.
You may be using too much memory elsewhere, causing fork to attempt to use more than the RLIMIT_DATA or RLIMIT_RSS limit given to your user. As recommended by Python memory profiler - Stack Overflow, Heapy can help you determine whether this is the case.
