I'm building a WSGI web app and I have a MySQL database. I'm using MySQLdb, which provides cursors for executing statements and getting results. What is the standard practice for getting and closing cursors? In particular, how long should my cursors last? Should I get a new cursor for each transaction?
I believe you need to close the cursor before committing the connection. Is there any significant advantage to finding sets of transactions that don't require intermediate commits so that you don't have to get new cursors for each transaction? Is there a lot of overhead for getting new cursors, or is it just not a big deal?
Instead of asking what is standard practice, since that's often unclear and subjective, you might try looking to the module itself for guidance. In general, using the with keyword as another user suggested is a great idea, but in this specific circumstance it may not give you quite the functionality you expect.
As of version 1.2.5 of the module, MySQLdb.Connection implements the context manager protocol with the following code (github):
def __enter__(self):
    if self.get_autocommit():
        self.query("BEGIN")
    return self.cursor()

def __exit__(self, exc, value, tb):
    if exc:
        self.rollback()
    else:
        self.commit()
There are several existing Q&A about with already, or you can read Understanding Python's "with" statement, but essentially what happens is that __enter__ executes at the start of the with block, and __exit__ executes upon leaving the with block. You can use the optional syntax with EXPR as VAR to bind the object returned by __enter__ to a name if you intend to reference that object later. So, given the above implementation, here's a simple way to query your database:
connection = MySQLdb.connect(...)
with connection as cursor:      # connection.__enter__ executes at this line
    cursor.execute('select 1;')
    result = cursor.fetchall()  # connection.__exit__ executes after this line
print result                    # prints "((1L,),)"
The question now is: what are the states of the connection and the cursor after exiting the with block? The __exit__ method shown above calls only self.rollback() or self.commit(), and neither of those methods goes on to call the close() method. The cursor itself has no __exit__ method defined, and it wouldn't matter if it did, because with is only managing the connection. Therefore, both the connection and the cursor remain open after exiting the with block. This is easily confirmed by adding the following code to the above example:
try:
    cursor.execute('select 1;')
    print 'cursor is open;',
except MySQLdb.ProgrammingError:
    print 'cursor is closed;',

if connection.open:
    print 'connection is open'
else:
    print 'connection is closed'
You should see the output "cursor is open; connection is open" printed to stdout.
I believe you need to close the cursor before committing the connection.
Why? The MySQL C API, which is the basis for MySQLdb, does not implement any cursor object, as implied in the module documentation: "MySQL does not support cursors; however, cursors are easily emulated." Indeed, the MySQLdb.cursors.BaseCursor class inherits directly from object and imposes no such restriction on cursors with regard to commit/rollback. An Oracle developer had this to say:
cnx.commit() before cur.close() sounds most logical to me. Maybe you can go by the rule: "Close the cursor if you do not need it anymore." Thus commit() before closing the cursor. In the end, for Connector/Python, it does not make much difference, but for other databases it might.
I expect that's as close as you're going to get to "standard practice" on this subject.
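If you want a convention to stick to, here's a minimal sketch of "commit, then close" (connection arguments and the table are placeholders):

import MySQLdb

conn = MySQLdb.connect("host", "user", "pass", "database")
cur = conn.cursor()
cur.execute("UPDATE t SET x = 1")   # hypothetical DML statement
conn.commit()                       # commit first...
cur.close()                         # ...then close the cursor
conn.close()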
Is there any significant advantage to finding sets of transactions that don't require intermediate commits so that you don't have to get new cursors for each transaction?
I very much doubt it, and in trying to do so, you may introduce additional human error. Better to decide on a convention and stick with it.
Is there a lot of overhead for getting new cursors, or is it just not a big deal?
The overhead is negligible, and doesn't touch the database server at all; it's entirely within the implementation of MySQLdb. You can look at BaseCursor.__init__ on github if you're really curious to know what's happening when you create a new cursor.
Going back to earlier when we were discussing with, perhaps now you can understand why the MySQLdb.Connection class __enter__ and __exit__ methods give you a brand new cursor object in every with block and don't bother keeping track of it or closing it at the end of the block. It's fairly lightweight and exists purely for your convenience.
If it's really that important to you to micromanage the cursor object, you can use contextlib.closing to make up for the fact that the cursor object has no defined __exit__ method. For that matter, you can also use it to force the connection object to close itself upon exiting a with block. This should output "my_curs is closed; my_conn is closed":
from contextlib import closing
import MySQLdb

with closing(MySQLdb.connect(...)) as my_conn:
    with closing(my_conn.cursor()) as my_curs:
        my_curs.execute('select 1;')
        result = my_curs.fetchall()

try:
    my_curs.execute('select 1;')
    print 'my_curs is open;',
except MySQLdb.ProgrammingError:
    print 'my_curs is closed;',

if my_conn.open:
    print 'my_conn is open'
else:
    print 'my_conn is closed'
Note that with closing(arg_obj) will not call the argument object's __enter__ and __exit__ methods; it will only call the argument object's close method at the end of the with block. (To see this in action, simply define a class Foo with __enter__, __exit__, and close methods containing simple print statements, and compare what happens when you do with Foo(): pass to what happens when you do with closing(Foo()): pass.)
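Here is a minimal sketch of that comparison; the Foo class is purely illustrative:

from contextlib import closing

class Foo(object):
    def __enter__(self):
        print('__enter__ called')
        return self
    def __exit__(self, exc, value, tb):
        print('__exit__ called')
    def close(self):
        print('close called')

with Foo():           # prints "__enter__ called", then "__exit__ called"
    pass

with closing(Foo()):  # prints only "close called"
    pass

This has two significant implications: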
First, if autocommit mode is enabled, MySQLdb will BEGIN an explicit transaction on the server when you use with connection, and commit or roll back the transaction at the end of the block. These are default behaviors of MySQLdb, intended to protect you from MySQL's default behavior of immediately committing any and all DML statements. MySQLdb assumes that when you use a context manager, you want a transaction, and uses the explicit BEGIN to bypass the autocommit setting on the server. If you're used to using with connection, you might think autocommit is disabled, when actually it was only being bypassed. You might get an unpleasant surprise if you add closing to your code and lose transactional integrity: you won't be able to roll back changes, you may start seeing concurrency bugs, and it may not be immediately obvious why.
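If you do adopt closing and still want transactional behavior, a minimal sketch (connection arguments and the table are placeholders) might look like this:

from contextlib import closing
import MySQLdb

with closing(MySQLdb.connect("host", "user", "pass", "database")) as conn:
    conn.autocommit(False)                  # opt out of autocommit explicitly
    with closing(conn.cursor()) as cur:
        try:
            cur.execute("UPDATE t SET x = 1")   # hypothetical DML
            conn.commit()                   # closing() won't commit for you
        except Exception:
            conn.rollback()
            raise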
Second, with closing(MySQLdb.connect(user, pass)) as VAR binds the connection object to VAR, in contrast to with MySQLdb.connect(user, pass) as VAR, which binds a new cursor object to VAR. In the latter case you would have no direct access to the connection object! Instead, you would have to use the cursor's connection attribute, which provides proxy access to the original connection. When the cursor is closed, its connection attribute is set to None. This results in an abandoned connection that will stick around until one of the following happens:
All references to the cursor are removed
The cursor goes out of scope
The connection times out
The connection is closed manually via server administration tools
You can test this by monitoring open connections (in Workbench or by using SHOW PROCESSLIST) while executing the following lines one by one:
with MySQLdb.connect(...) as my_curs:
    pass
my_curs.close()
my_curs.connection          # None
my_curs.connection.close()  # throws AttributeError, but connection still open
del my_curs                 # connection will close here
It's better to rewrite this using the with keyword, which takes care of closing the cursor automatically (important, because the cursor is an unmanaged resource). The added benefit is that the cursor gets closed even if an exception occurs.
from contextlib import closing
import MySQLdb

''' At the beginning you open a DB connection. The particular moment when
you open the connection depends on your approach:
- it can be inside the same function where you work with cursors
- in the class constructor
- etc.
'''
db = MySQLdb.connect("host", "user", "pass", "database")
with closing(db.cursor()) as cur:
    cur.execute("somestuff")
    results = cur.fetchall()
    # do stuff with results

    cur.execute("insert operation")
    # call commit if you do INSERT, UPDATE or DELETE operations
    db.commit()

    cur.execute("someotherstuff")
    results2 = cur.fetchone()
    # do stuff with results2

# at some point, when you decide that you do not need
# the open connection anymore, you close it
db.close()
Note: this answer is for PyMySQL, which is a drop-in replacement for MySQLdb and effectively the latest version of MySQLdb since MySQLdb stopped being maintained. I believe everything here is also true of the legacy MySQLdb, but haven't checked.
First of all, some facts:
Python's with syntax calls the context manager's __enter__ method before executing the body of the with block, and its __exit__ method afterwards.
Connections have an __enter__ method that does nothing besides create and return a cursor, and an __exit__ method that either commits or rolls back (depending upon whether an exception was thrown). It does not close the connection.
Cursors in PyMySQL are purely an abstraction implemented in Python; there is no equivalent concept in MySQL itself.¹
Cursors have an __enter__ method that doesn't do anything and an __exit__ method which "closes" the cursor (which just means nulling the cursor's reference to its parent connection and throwing away any data stored on the cursor).
Cursors hold a reference to the connection that spawned them, but connections don't hold a reference to the cursors that they've created.
Connections have a __del__ method which closes them.
Per https://docs.python.org/3/reference/datamodel.html, CPython (the default Python implementation) uses reference counting and automatically deletes an object once the number of references to it hits zero.
Putting these things together, we see that naive code like this is in theory problematic:
# Problematic code, at least in theory!
import pymysql

with pymysql.connect() as cursor:
    cursor.execute('SELECT 1')

# ... happily carry on and do something unrelated
The problem is that nothing has closed the connection. Indeed, if you paste the code above into a Python shell and then run SHOW FULL PROCESSLIST at a MySQL shell, you'll be able to see the idle connection that you created. Since MySQL's default maximum number of connections (max_connections) is 151, which isn't huge, you could theoretically start running into problems if you had many processes keeping these connections open.
However, in CPython, there is a saving grace that ensures that code like my example above probably won't cause you to leave around loads of open connections. That saving grace is that as soon as cursor goes out of scope (e.g. the function in which it was created finishes, or cursor gets another value assigned to it), its reference count hits zero, which causes it to be deleted, dropping the connection's reference count to zero, causing the connection's __del__ method to be called which force-closes the connection. If you already pasted the code above into your Python shell, then you can now simulate this by running cursor = 'arbitrary value'; as soon as you do this, the connection you opened will vanish from the SHOW PROCESSLIST output.
However, relying upon this is inelegant, and theoretically might fail in Python implementations other than CPython. Cleaner, in theory, would be to explicitly .close() the connection (to free up a connection on the database without waiting for Python to destroy the object). This more robust code looks like this:
import contextlib
import pymysql

with contextlib.closing(pymysql.connect()) as conn:
    with conn as cursor:
        cursor.execute('SELECT 1')
This is ugly, but doesn't rely upon Python destructing your objects to free up your (finite available number of) database connections.
Note that closing the cursor, if you're already closing the connection explicitly like this, is entirely pointless.
Finally, to answer the secondary questions here:
Is there a lot of overhead for getting new cursors, or is it just not a big deal?
Nope, instantiating a cursor doesn't hit MySQL at all and basically does nothing.
Is there any significant advantage to finding sets of transactions that don't require intermediate commits so that you don't have to get new cursors for each transaction?
This is situational and difficult to give a general answer to. As https://dev.mysql.com/doc/refman/en/optimizing-innodb-transaction-management.html puts it, "an application might encounter performance issues if it commits thousands of times per second, and different performance issues if it commits only every 2-3 hours". You pay a performance overhead for every commit, but by leaving transactions open for longer, you increase the chance of other connections having to spend time waiting for locks, increase your risk of deadlocks, and potentially increase the cost of some lookups performed by other connections.
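As a rough sketch of the batching side of that tradeoff, assuming the older PyMySQL behavior described above where with conn yields a cursor (the table and data are illustrative):

import pymysql

rows = [(1,), (2,), (3,)]          # illustrative data
conn = pymysql.connect()           # assumes default connection parameters
try:
    with conn as cursor:           # one transaction for the whole batch...
        for (x,) in rows:
            cursor.execute('INSERT INTO t (x) VALUES (%s)', (x,))
    # ...committed once on clean exit, instead of once per INSERT
finally:
    conn.close()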
¹ MySQL does have a construct it calls a cursor, but they exist only inside stored procedures; they're completely different from PyMySQL cursors and are not relevant here.
I think you'll be better off trying to use one cursor for all of your executions, and close it at the end of your code. It's easier to work with, and it might have efficiency benefits as well (don't quote me on that one).
conn = MySQLdb.connect("host", "user", "pass", "database")
cursor = conn.cursor()

cursor.execute("somestuff")
results = cursor.fetchall()
# do stuff with results

cursor.execute("someotherstuff")
results2 = cursor.fetchall()
# do stuff with results2

cursor.close()
The point is that you can store the results of a cursor's execution in another variable, thereby freeing your cursor to make a second execution. You run into problems this way only if you're using fetchone(), and need to make a second cursor execution before you've iterated through all results from the first query.
Otherwise, I'd say just close your cursors as soon as you're done getting all of the data out of them. That way you don't have to worry about tying up loose ends later in your code.
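To make the fetchone() pitfall above concrete, here's a sketch (connection arguments and the table are placeholders):

import MySQLdb

conn = MySQLdb.connect("host", "user", "pass", "database")
cursor = conn.cursor()
cursor.execute("SELECT id FROM big_table")
first = cursor.fetchone()   # only one row consumed so far
cursor.execute("SELECT 1")  # the unfetched rows of the first query are discarded
cursor.close()
conn.close()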
I suggest doing it like PHP and MySQL: open the connection at the beginning of your code, before printing the first data, so that if you get a connection error you can display a 50x error message (500 Internal Server Error, for example). Keep the connection open for the whole session and close it when you know you won't need it anymore.
Related
When I do engine = create_engine(...) and then engine.execute(SQL), does SQLAlchemy manage the closing of the connection/cursor with the execute statement, or is it something I need to do myself?
I've looked at the execute method, but it wasn't clear to me.
First of all, from the bits of code that you show, you are looking at the wrong documentation for execute. The right doc for the code you show is engine.execute, which says:
The returned ResultProxy is flagged such that when the ResultProxy is exhausted and its underlying cursor is closed, the Connection created here will also be closed, which allows its associated DBAPI connection resource to be returned to the connection pool.
You are using connectionless execution, and you can see the example presented there:
engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    # .... process each row
    pass
result.close()
So the result variable is a ResultProxy, and you can see in the example that it is explicitly closed. The documentation for ResultProxy.close() tells exactly what gets closed, and since your case seems to be connectionless execution, once the result is closed, the connection will be closed as well.
These bits of the doc on close are particularly important:
This closes out the underlying DBAPI cursor corresponding to the statement execution, if one is still present. Note that the DBAPI cursor is automatically released when the ResultProxy exhausts all available rows. ResultProxy.close() is generally an optional method except in the case when discarding a ResultProxy that still has additional rows pending for fetch.
In the case of a result that is the product of connectionless execution, the underlying Connection object is also closed, which releases DBAPI connection resources.
I suggest also reading carefully the notes about changes in version 1.0.0:
Changed in version 1.0.0: - the ResultProxy.close() method has been separated out from the process that releases the underlying DBAPI cursor resource. The “auto close” feature of the Connection now performs a so-called “soft close”, which releases the underlying DBAPI cursor, but allows the ResultProxy to still behave as an open-but-exhausted result set; the actual ResultProxy.close() method is never called. It is still safe to discard a ResultProxy that has been fully exhausted without calling this method.
I think you now have all the material to decide what fits your case.
Personally, even though close() is said to be optional in most cases except the one mentioned, I would use it.
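Following that advice, a minimal sketch for SQLAlchemy 1.x connectionless execution might look like this (the query is illustrative):

from sqlalchemy import create_engine

engine = create_engine('sqlite:///file.db')
result = engine.execute('SELECT 1')   # connectionless execution
try:
    for row in result:
        print(row)
finally:
    result.close()   # releases the DBAPI cursor and the pooled connection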
import mysql.connector

def open_db():  # the snippet presumably lives in a helper function like this
    cnx = mysql.connector.connect(user="root", password="pass",
                                  host="127.0.0.1", database="db")
    cursor = cnx.cursor()
    return cursor, cnx
Let's say I open a connection like the above, make a select query with cursor.execute(selectquery), and use cursor.fetchall() to get the data.
If I don't call cnx.close() and cursor.close(), what will happen when I continuously open connections and run select queries like this?
Will the connections stay open? Or is there any "garbage patrol" that can see they're not getting used?
Let's assume you are reusing the same variable name, or that it went out of scope (the function that contained it returned) without being explicitly closed. The most common Python implementation, CPython, uses reference counting, which should detect that there's no longer any way to access your connection object. So eventually the connection should be closed. But Python does not guarantee this behavior, and in fact (as implied) it does not hold for all Python engines.
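A sketch of that lifecycle (the behavior is CPython-specific; connection arguments are placeholders):

import mysql.connector

def run_query():
    cnx = mysql.connector.connect(user="root", password="pass",
                                  host="127.0.0.1", database="db")
    cursor = cnx.cursor()
    cursor.execute("SELECT 1")
    cursor.fetchall()
    # no explicit close: when run_query returns, the refcounts of cursor and
    # cnx drop to zero and CPython usually closes the connection, but other
    # Python implementations make no such promise

run_query()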
Now suppose you used two different variables to hold two simultaneously open connections. This is entirely legal, and occasionally useful. Mostly you won't notice anything odd, but if transactions are enabled and you abandon the first connection without closing it or committing the transaction, database changes made in this transaction will not yet be visible from the other connection. That's how transactions are meant to work.
If you don't like the style of closing explicitly, you can guard the connection with a with statement:
with mysql.connector.connect(...) as cnx:
    cursor = cnx.cursor()
    ...
The connection will be closed for you as soon as execution leaves the with block, no matter by what route.
I'm using the cx_Oracle module in Python. Do we need to close opened cursors explicitly? What will happen if we forget to close the cursor after fetching data and close only the connection object (con.close()) without issuing cursor.close()?
Will there be any chance of a memory leak in this situation?
According to the documentation of cx_Oracle, the cursor should be garbage-collected automatically, and there should be no risk of a leak.
However, in my anecdotal experience, if I didn't close the cursor explicitly, the memory usage of the process would grow without bounds -- this might have something to do with my usage of a custom rowfactory, where I had to capture the cursor in a lambda (although, in theory, the GC should be able to handle this case as well).
Since the Cursor class implements the context manager pattern, you can safely and conveniently write:
with connection.cursor() as cursor:
    cursor.execute("...")
If you use multiple cursors, cursor.close() will help you release resources you no longer need.
If you just use one cursor with one connection, I think connection.close() alone is fine.
The following is a common pattern I have in my code, and I was wondering more about the internals of cursors and connections.
cursor = connection.cursor()
cursor.execute("SET NAMES utf8")
cursor.execute(sql, args)
results = cursor.fetchall()
cursor.close()
What is the difference between a connection to a database and a cursor? Is there any downside of having an open connection (for example, for a few minutes?). What about have un-closed cursors, what is the effect? When executing multiple SQL statements in succession, should a new cursor be created each time?
It depends on the underlying implementation - what the Cursor object actually IS inside the driver.
In many DB-api implementations, the Cursor object isn't "interesting" (i.e. you can keep lots of them and let the GC worry about them), especially if you haven't done a query which returns result sets.
I've not used Python with Oracle, but I suspect (based on experience with JDBC and others) that this is not the case in Oracle. Oracle JDBC drivers have server-side cursors which are vitally important to close quickly (there is a fairly low default per-connection cursor limit; exceeding it causes a failure when trying to open another).
In Oracle, relying on the GC to close your cursors might be hazardous if, for example, you open a new cursor in a loop and the GC keeps them all until the looping function returns.
If this is true, it might be helpful to use a with-statement construction to ensure that the cursor is closed in a timely fashion, even if an exception occurs.
UPDATE: You can use contextlib.closing as a context manager:

import contextlib

with contextlib.closing(myconnection.cursor()) as curs:
    curs.execute(...)  # placeholder query
# even if an exception happens, the cursor is still closed immediately
# after this block
A cursor is something similar to an iterator in Python: it enables you to traverse a result set without keeping the whole thing in memory. Cursors can be implemented differently by each RDBMS you use.
An unclosed cursor will use some memory until the garbage collector deletes it.
You can open more than one cursor inside one connection.
You can keep the connection open. Depending on the database you are using, an open connection will use some resources, and there may be a limit on how many connections can be open at one time.
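For example, here's a sketch of the iterator-style usage with MySQLdb's server-side cursor class (connection arguments and the table are placeholders):

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect("host", "user", "pass", "database",
                       cursorclass=MySQLdb.cursors.SSCursor)  # server-side cursor
cur = conn.cursor()
cur.execute("SELECT id FROM big_table")
for row in cur:   # rows are fetched incrementally, not all at once
    print(row)
cur.close()
conn.close()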
I am confused about why Python needs a cursor object. I know JDBC, where the database connection is quite intuitive, but in Python I am confused by the cursor object. I'm also unsure about the difference between cursor.close() and connection.close() in terms of resource release.
The cursor paradigm is not specific to Python; cursors are a frequent data structure in databases themselves.
Depending on the underlying implementation, it may be possible to create several cursors sharing the same connection to a database. Closing a cursor should free the resources associated with the query, including any results never fetched from the DB (or fetched but not used), but it would not eliminate the connection to the database itself, so you would be able to get a new cursor on the same database without needing to authenticate again.
As others mention, a Connection is the network connection to the database, and its only real use is to return cursors. PEP 249, where DB-API 2.0 is specified, does not clearly define what exactly a connection or cursor is, nor what the close() method on each must do; only that <module>.connect() must return an instance of <module>.Connection, that <module>.Connection.cursor() must return an instance of <module>.Cursor, and that <module>.Cursor.execute() should invoke the statement provided and return the resulting rows. In particular, it does not define a <module>.Connection.execute(), although specific implementations are free to provide one as an extension.
Depending on those extensions is probably unwise, though, since it means your code won't be as portable. DB-API makes this two-level requirement because having an execute on the connection without an intermediate object can be difficult on some databases.
A Connection object is your connection to the database; close it when you're done talking to the database altogether. A Cursor object is an iterator over a result set from a query; close it when you're done with that result set.
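Putting that guidance into a minimal sketch (connection arguments are placeholders):

import MySQLdb

conn = MySQLdb.connect("host", "user", "pass", "database")

cur = conn.cursor()    # one cursor per result set
cur.execute("SELECT 1")
rows = cur.fetchall()
cur.close()            # done with this result set

cur = conn.cursor()    # a fresh cursor for the next query
cur.execute("SELECT 2")
more = cur.fetchall()
cur.close()

conn.close()           # done talking to the database altogether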