python 'with' statement, should I use contextlib.closing?

from contextlib import closing

def init_db():
    with closing(connect_db()) as db:
        with app.open_resource('schema.sql') as f:
            db.cursor().executescript(f.read())
        db.commit()
This is from Step 3 of the Flask tutorial (http://flask.pocoo.org/docs/tutorial/dbinit/#tutorial-dbinit), and I'm a little curious about line 4 of that snippet (the with closing(...) line).
Must I import and use that contextlib.closing() function?
When I learned about the with statement, many articles said that it closes the file automatically after the block, like below (same as finally: thing.close()):
with open('filename', 'w') as f:
    f.write(someString)
Even if I don't use contextlib.closing(), like below, what's the difference?
I'm on Python 2.7.6. Thank you.
def init_db():
    with connect_db() as db:
        with app.open_resource('schema.sql') as f:
            db.cursor().executescript(f.read())
        db.commit()

Yes, you should be using contextlib.closing(); your own version does something different entirely.
The with statement lets a context manager know when a block of code is entered and exited; on exit the context manager is also given access to the exception, if one occurred. File objects use this to automatically close the file when the block is exited.
The connect_db() function from the tutorial returns a sqlite3 connection object, which can indeed be used as a context manager. However, the connection.__exit__() method doesn't close the connection, it commits the transaction on a successful completion, or aborts it when there is an exception.
The contextlib.closing() context manager on the other hand, calls the connection.close() method on the connection. This is something entirely different.
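For reference, the standard library documents contextlib.closing() as being roughly equivalent to this small context manager; it does nothing but guarantee the close() call:
from contextlib import contextmanager

@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()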
So, your second snippet may work, but it does something different. The tutorial code closes the connection; your version commits a transaction. You are already calling db.commit(), so the commit on exit is actually redundant, provided no exceptions are raised.
You could use the connection as a context manager again to have the automatic transaction handling behaviour:
def init_db():
    with closing(connect_db()) as db:
        with app.open_resource('schema.sql') as f, db:
            db.cursor().executescript(f.read())
Note the , db on the second with line, ensuring that the db.__exit__() method is called when the block exits.
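If you want to see the transaction-versus-close distinction in isolation, here is a minimal, self-contained sketch (plain sqlite3 with an in-memory database, my own example rather than the tutorial's code): the with block manages the transaction, not the connection's lifetime:
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')
try:
    with conn:  # transaction scope: commits on success, rolls back on error
        conn.execute('INSERT INTO t VALUES (1)')
        raise ValueError('boom')  # force a rollback
except ValueError:
    pass
print(conn.execute('SELECT COUNT(*) FROM t').fetchone()[0])  # 0: rolled back
conn.close()  # the with block did NOT close the connection; we must do it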

The only thing the with statement does is call the __enter__ method before entering its block and the __exit__ method when leaving it.
If those methods are not defined, the with statement won't work as you may expect. I don't know the return type of connect_db, but I guess it could be many different things from different third-party libraries. So your code without closing will probably work in many (all?) cases, but you never know what connect_db might return.
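As a quick illustration of that point, an object without the protocol fails as soon as the with statement starts (a small sketch of my own, not from the question):
class NoProtocol(object):
    pass

try:
    with NoProtocol():
        pass
except (AttributeError, TypeError) as err:
    # the exact exception type varies across Python versions, but without
    # __enter__/__exit__ the with statement cannot work at all
    print(err)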

Related

Check if a database connection is busy using python

I want to create a Database class which can create cursors on demand.
It must be possible to use the cursors in parallel (two or more cursors can coexist) and, since we can only have one cursor per connection, the Database class must handle multiple connections.
For performance reasons we want to reuse connections as much as possible and avoid creating a new connection every time a cursor is created:
whenever a request is made, the class will try to find, among the open connections, the first non-busy connection and use it.
A connection is still busy as long as the cursor has not been consumed.
Here is an example of such class:
class Database:
    ...
    def get_cursor(self, query):
        selected_connection = None
        # Find a usable connection
        for con in self.connections:
            if not con.is_busy():  # <--- This is not PEP 249
                selected_connection = con
                break
        # If all connections are busy, create a new one
        if selected_connection is None:
            selected_connection = self._new_connection()
            self.connections.append(selected_connection)
        # Return a cursor on the query
        cur = selected_connection.cursor()
        cur.execute(query)
        return cur
However, looking at the PEP 249 standard I cannot find any way to check whether a connection is actually being used.
Some implementations, such as MySQL Connector, offer ways to check whether a connection still has unread content (see here), but as far as I know those are not part of PEP 249.
Is there a way to achieve what I described above with any PEP 249 compliant Python database API?
Perhaps you could use the status of the cursor to tell you if a cursor is being used. Let's say you had the following cursor:
new_cursor = new_connection.cursor()
new_cursor.execute(new_query)
and you wanted to see if that connection was available for another cursor to use. You might be able to do something like:
if new_cursor.rowcount == -1:
    another_new_cursor = new_connection.cursor()
    ...
Of course, all this really tells you is that the cursor hasn't executed anything since the last time it was closed. It could point to a cursor that is done (and therefore a connection that has been closed) or it could point to a cursor that has just been created or attached to a connection. Another option is to use a try/except block, something along the lines of:
try:
    another_new_cursor = new_connection.cursor()
except ConnectionError:  # not actually sure which error would go here, but you get the idea
    print("this connection is busy.")
Of course, you probably don't want to be spammed with printed messages but you can do whatever you want in that except block, sleep for 5 seconds, wait for some other variable to be passed, wait for user input, etc. If you are restricted to PEP 249, you are going to have to do a lot of things from scratch. Is there a reason you can't use external libraries?
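Staying within PEP 249, one portable option is to do the bookkeeping yourself. Here is a hedged sketch of a wrapper that provides the is_busy() the question assumes (PooledConnection and _TrackingCursor are invented names, not part of any DB API):
class PooledConnection(object):
    def __init__(self, raw_connection):
        self._conn = raw_connection
        self._busy = False

    def is_busy(self):
        return self._busy

    def cursor(self):
        self._busy = True  # mark busy until the cursor is closed
        return _TrackingCursor(self._conn.cursor(), self)

class _TrackingCursor(object):
    def __init__(self, cursor, owner):
        self._cursor = cursor
        self._owner = owner

    def __getattr__(self, name):
        return getattr(self._cursor, name)  # delegate the DB-API surface

    def close(self):
        self._cursor.close()
        self._owner._busy = False  # the connection is reusable again
The obvious cost is that a connection is only released when the caller remembers to close the cursor.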
EDIT: If you are willing to move outside of PEP 249, here is something that might work, but it may not be suitable for your purposes. If you make use of the mysql python library, you can take advantage of the is_connected method.
new_connection = mysql.connector.connect(host='myhost',
                                         database='myDB',
                                         user='me',
                                         password='myPassword')
# ... stuff happens ...
if new_connection.is_connected():
    pass
else:
    another_new_cursor = new_connection.cursor()
    ...

Why doesn't the MySQLdb Connection context manager close the cursor?

MySQLdb Connections have a rudimentary context manager that creates a cursor on enter, either rolls back or commits on exit, and implicitly doesn't suppress exceptions. From the Connection source:
def __enter__(self):
    if self.get_autocommit():
        self.query("BEGIN")
    return self.cursor()

def __exit__(self, exc, value, tb):
    if exc:
        self.rollback()
    else:
        self.commit()
So, does anyone know why the cursor isn't closed on exit?
At first, I assumed it was because closing the cursor didn't do anything and that cursors only had a close method in deference to the Python DB API (see the comments to this answer). However, the fact is that closing the cursor burns through the remaining result sets, if any, and disables the cursor. From the cursor source:
def close(self):
    """Close the cursor. No further queries will be possible."""
    if not self.connection: return
    while self.nextset(): pass
    self.connection = None
It would be so easy to close the cursor at exit, so I have to suppose that it hasn't been done on purpose. On the other hand, we can see that when a cursor is deleted, it is closed anyway, so I guess the garbage collector will eventually get around to it. I don't know much about garbage collection in Python.
def __del__(self):
    self.close()
    self.errorhandler = None
    self._result = None
Another guess is that there may be a situation where you want to re-use the cursor after the with block. But I can't think of any reason why you would need to do this. Can't you always finish using the cursor inside its context, and just use a separate context for the next transaction?
To be very clear, this example obviously doesn't make sense:
with conn as cursor:
    cursor.execute(select_stmt)
rows = cursor.fetchall()
It should be:
with conn as cursor:
    cursor.execute(select_stmt)
    rows = cursor.fetchall()
Nor does this example make sense:
# first transaction
with conn as cursor:
    cursor.execute(update_stmt_1)

# second transaction, reusing cursor
try:
    cursor.execute(update_stmt_2)
except:
    conn.rollback()
else:
    conn.commit()
It should just be:
# first transaction
with conn as cursor:
    cursor.execute(update_stmt_1)

# second transaction, new cursor
with conn as cursor:
    cursor.execute(update_stmt_2)
Again, what would be the harm in closing the cursor on exit, and what benefits are there to not closing it?
To answer your question directly: I cannot see any harm whatsoever in closing at the end of a with block, and I cannot say why it is not done in this case. But, as there is a dearth of activity on this question, I had a search through the code history and will throw in a few thoughts (guesses) on why close() may not be called:
There is a small chance that spinning through calls to nextset() may throw an exception - possibly this had been observed and seen as undesirable. This may be why the newer version of cursors.py contains this structure in close():
def close(self):
    """Close the cursor. No further queries will be possible."""
    if not self.connection:
        return
    self._flush()
    try:
        while self.nextset():
            pass
    except:
        pass
    self.connection = None
There is the (somewhat remote) potential that it might take some time to spin through all the remaining results doing nothing. Therefore close() may not be called to avoid doing some unnecessary iterations. Whether you think it's worth saving those clock cycles is subjective, I suppose, but you could argue along the lines of "if it's not necessary, don't do it".
Browsing the sourceforge commits, the functionality was added to the trunk by this commit in 2007 and it appears that this section of connections.py has not changed since. That's a merge based on this commit, which has the message
Add Python-2.5 support for with statement as described in http://docs.python.org/whatsnew/pep-343.html Please test
And the code you quote has never changed since.
This prompts my final thought - it's probably just a first attempt / prototype that just worked and therefore never got changed.
More modern version
You link to source for a legacy version of the connector. I note there is a more active fork of the same library here, which I link to in my comments about "newer version" in point 1.
Note that the more recent version of this module has implemented __enter__() and __exit__() within cursor itself: see here. __exit__() here does call self.close() and perhaps this provides a more standard way to use the with syntax e.g.
with conn.cursor() as c:
    pass  # Do your thing with the cursor
End notes
N.B. I guess I should add: as far as I understand garbage collection (I'm not an expert either), once there are no references to conn it will be deallocated. At this point there will be no references to the cursor object, and it will be deallocated too.
However, calling cursor.close() does not mean that the cursor will be garbage collected. It simply burns through the results and sets the connection to None. This means it can't be re-used, but it won't be garbage collected immediately. You can convince yourself of that by manually calling cursor.close() after your with block and then, say, printing some attribute of the cursor.
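A quick way to convince yourself, assuming a MySQLdb connection conn is available (my own sketch, not from the original answer):
with conn as cursor:
    cursor.execute("SELECT 1")
cursor.close()           # burns remaining results, detaches from the connection
print(cursor.rowcount)   # the object is still alive; close() is not collection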
N.B. 2 I think this is a somewhat unusual use of the with syntax as the conn object persists because it is already in the outer scope - unlike, say, the more common with open('filename') as f: where there are no objects hanging around with references after the end of the with block.

multiple execute on a single connection.cursor in django. Is it safe?

I am opening a cursor with connection.cursor(), executing a bunch of deletes, then closing the cursor. It works, but I am not sure if it has any side effects. I would appreciate any feedback.
from django.db import connection

c = connection.cursor()
try:
    c.execute('delete from table_a')
    c.execute('delete from table_b')
    ...
finally:
    c.close()
Since you are not executing these SQL statements inside a transaction, you may encounter confusing states (for example, data was deleted from table_a but not from table_b).
Django documentation says about this particular situation:
If you’re executing several custom SQL queries in a row, each one now runs in its own transaction, instead of sharing the same “automatic transaction”. If you need to enforce atomicity, you must wrap the sequence of queries in atomic().
So the results of each execute() call are committed right after it, but we want them either all to pass or all to fail, as a single set of changes.
Wrap the view with the transaction.atomic decorator:
from django.db import transaction

@transaction.atomic
def my_view(request):
    c = connection.cursor()
    try:
        c.execute('delete from table_a')
        c.execute('delete from table_b')
    finally:
        c.close()
Note that atomic() and the whole new transaction management system were introduced in Django 1.6.
If you are using Django < 1.6, apply the transaction.commit_on_success decorator instead.
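For completeness, a sketch of the pre-1.6 variant (the same view as above, just with the older decorator):
from django.db import connection, transaction

@transaction.commit_on_success
def my_view(request):
    c = connection.cursor()
    try:
        c.execute('delete from table_a')
        c.execute('delete from table_b')
    finally:
        c.close()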

Execute some code when an SQLAlchemy object's deletion is actually committed

I have a SQLAlchemy model that represents a file and thus contains the path to an actual file. Since deletion of the database row and the file should go together (so no orphaned files are left and no rows point to deleted files), I added a delete() method to my model class:
def delete(self):
    if os.path.exists(self.path):
        os.remove(self.path)
    db.session.delete(self)
This works fine but has one huge disadvantage: the file is deleted immediately, before the transaction containing the database deletion is committed.
One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.
SQLAlchemy has an after_delete event but according to the docs this is triggered when the SQL is emitted (i.e. on flush) which is too early. It also has an after_commit event but at this point everything deleted in the transaction has probably been deleted from SA.
When using SQLAlchemy in a Flask app with Flask-SQLAlchemy, it provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal, doing what I'm looking for is extremely easy:
@models_committed.connect_via(app)
def on_models_committed(sender, changes):
    for obj, change in changes:
        if change == 'delete' and hasattr(obj, '__commit_delete__'):
            obj.__commit_delete__()
With this generic function every model that needs on-delete-commit code now simply needs to have a method __commit_delete__(self) and do whatever it needs to do in that method.
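For illustration, a model using that hook might look like this (StoredFile and the db object are assumptions based on a typical Flask-SQLAlchemy setup, not code from the question):
import os

class StoredFile(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    path = db.Column(db.String(255))

    def __commit_delete__(self):
        # called by the signal handler above, only after the DELETE committed
        if os.path.exists(self.path):
            os.remove(self.path)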
It can also be done without Flask-SQLAlchemy, but in that case it needs some more code (a sketch follows the list below):
A deletion needs to be recorded when it is performed. This is done using the after_delete event.
Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
If the transaction fails or is manually rolled back, the recorded deletions also need to be cleared. This is done using the after_rollback event.
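A minimal sketch of those three steps with SQLAlchemy's event API (SomeModel, its path attribute, and Session are assumptions for illustration; the module-level list is neither thread- nor session-safe, it just shows the shape):
import os
from sqlalchemy import event

_pending_deletes = []

@event.listens_for(SomeModel, 'after_delete')
def _record_delete(mapper, connection, target):
    _pending_deletes.append(target)       # 1: record the deletion on flush

@event.listens_for(Session, 'after_commit')
def _run_deletes(session):
    for obj in _pending_deletes:          # 2: act only once COMMIT succeeded
        if os.path.exists(obj.path):
            os.remove(obj.path)
    del _pending_deletes[:]

@event.listens_for(Session, 'after_rollback')
def _clear_deletes(session):
    del _pending_deletes[:]               # 3: forget recorded deletions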
This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem:
The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.
Note: I'm almost positive this could be rewritten in a cleaner manner using SqlAlchemy's new event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)
import logging; log = logging.getLogger(__name__)

from sqlalchemy.orm.session import SessionExtension

class TrackerExtension(SessionExtension):

    def __init__(self):
        self.new = set()
        self.deleted = set()
        self.dirty = set()

    def after_flush(self, session, flush_context):
        # NOTE: requires >= SA 0.5
        self.new.update(obj for obj in session.new
                        if hasattr(obj, "after_db_new"))
        self.deleted.update(obj for obj in session.deleted
                            if hasattr(obj, "after_db_delete"))
        self.dirty.update(obj for obj in session.dirty
                          if hasattr(obj, "after_db_update"))

    def after_commit(self, session):
        # NOTE: this is rather hackneyed, in that it hides errors until
        # the end, just so it can commit as many objects as possible.
        # FIXME: could integrate this w/ twophase to make everything safer in case the methods fail.
        log.debug("after commit: new=%r deleted=%r dirty=%r",
                  self.new, self.deleted, self.dirty)
        ecount = 0
        if self.new:
            for obj in self.new:
                try:
                    obj.after_db_new(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_new: obj=%r",
                                 obj, exc_info=True)
            self.new.clear()
        if self.deleted:
            for obj in self.deleted:
                try:
                    obj.after_db_delete(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_delete: obj=%r",
                                 obj, exc_info=True)
            self.deleted.clear()
        if self.dirty:
            for obj in self.dirty:
                try:
                    obj.after_db_update(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_update: obj=%r",
                                 obj, exc_info=True)
            self.dirty.clear()
        if ecount:
            raise RuntimeError("%r object error during after_commit() ... "
                               "see traceback for more" % ecount)

    def after_rollback(self, session):
        self.new.clear()
        self.deleted.clear()
        self.dirty.clear()

# then add "extension=TrackerExtension()" to the Session constructor
This seems to be a bit challenging. I'm curious whether a SQL AFTER DELETE trigger might be the best route for this, granted it won't be DRY and I'm not sure the SQL database you are using supports it. Still, AFAIK SQLAlchemy pushes transactions to the database but doesn't really know when they have been committed, if I'm interpreting this comment correctly:
it's the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publicly to other transactions, until the database receives a COMMIT command, which is what Session.commit() sends.
taken from SQLAlchemy: What's the difference between flush() and commit()?, by the creator of SQLAlchemy ...
If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:
checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
actually deletes the file during the second commit phase.
That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.
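To make the idea concrete, here is a hedged sketch of such a filesystem "transaction" (FileDeleteTransaction is an invented name; a rename makes the first phase reversible):
import os

class FileDeleteTransaction(object):
    def __init__(self, path):
        self.path = path
        self._staged = None

    def prepare(self):
        # first phase: verify the file can be deleted, then stage it reversibly
        if not os.access(self.path, os.W_OK):
            raise IOError("cannot delete %s" % self.path)
        self._staged = self.path + '.pending-delete'
        os.rename(self.path, self._staged)

    def commit(self):
        os.remove(self._staged)             # second phase: make it permanent

    def rollback(self):
        os.rename(self._staged, self.path)  # undo the staged delete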

What is the python "with" statement designed for?

I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343.
For instance, the file object returned by open() is a context manager in itself: it lets you open a file, keep it open as long as execution is inside the context of the with statement where you used it, and close it as soon as you leave the context, no matter whether you left it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os

@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)

with working_directory("data/stuff"):
    pass  # do something within data/stuff
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys

@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.iteritems():
            setattr(sys, sname, stream)

with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print statements will go to /tmp/log.txt
    print "Test entry 1"
    print "Test entry 2"

# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from tempfile import mkdtemp
from shutil import rmtree

@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)

with temporary_dir() as dirname:
    pass  # do whatever you want
I would suggest two interesting reads:
PEP 343 The "with" Statement
effbot: Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
data = foo_file.read()
OR
from contextlib import nested
with nested(A(), B(), C()) as (X, Y, Z):
do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
for line in input_file:
output_file.write(parse(line))
OR
lock = threading.Lock()
with lock:
# Critical section of code
3.
I don't see any antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using try..catch..finally statements from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits, the resources are cleanly released regardless of the outcome of the code in the block (that is, whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out of the box. However, anyone can make resources that can be used in a with statement by implementing the well-documented protocol: PEP 0343
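As a sketch of that protocol, here is a small home-made context manager (a timer, chosen only for illustration; it is not from the thread):
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self  # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.time() - self.start
        return False  # returning False means exceptions are not suppressed

with Timer() as t:
    sum(range(1000000))
print(t.elapsed)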
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing, and for some activities I need the Decimal library for arbitrary-precision calculations. In some parts of my code I need high precision, and for most other parts I need less precision.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42  # Perform a high precision calculation
    s = calculate_something()
s = +s  # Round the final result back to the default precision
I use this a lot with the Hypergeometric Test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use with inside a loop when it would be more efficient to have the with outside the loop. For example:
for row in lines:
    with open("outfile", "a") as f:
        f.write(row)
vs
with open("outfile","a") as f:
for row in lines:
f.write(row)
The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement; there is an example section at the end.
... new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.
points 1, 2, and 3 being reasonably well covered:
4: it is relatively new, only available in python2.6+ (or python2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example for out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, are connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with-statement, however when using the above note that:
When the with-block is finished, either with an exception or without, the connection is not closed. If the with-block finishes with an exception, the transaction is rolled back; otherwise the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it allows acquiring a connection once and using it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
In Python, the with statement is generally used to open a file, process the data present in the file, and then close the file, without calling the close() method. The with statement makes exception handling simpler by providing cleanup activities.
General form of with:
with open("file name", "mode") as file_var:
    pass  # processing statements
note: there is no need to close the file by calling file_var.close()
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
data = file.read()
open returns a file object.
Since 2.6, Python file objects have the methods __enter__ and __exit__.
with is like a loop that calls __enter__, runs its block exactly once, and then calls __exit__.
with works with any instance that has __enter__ and __exit__.
A file is locked and not re-usable by other processes until it's closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
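To see those methods in action, you can call them by hand (a small illustration of my own, not from the cited article):
f = open('foo.txt', 'w')
print(f.__enter__() is f)    # True: a file's __enter__ returns the file itself
f.write('data')
f.__exit__(None, None, None)
print(f.closed)              # True: __exit__ closed the file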
