I want to create a Database class which can create cursors on demand.
It must be possible to use the cursors in parallel (two or more cursors can coexist) and, since we can only have one cursor per connection, the Database class must handle multiple connections.
For performance reasons we want to reuse connections as much as possible and avoid creating a new connection every time a cursor is created:
whenever a request is made, the class will try to find, among the open connections, the first non-busy connection and use it.
A connection is still busy as long as the cursor has not been consumed.
Here is an example of such a class:
class Database:
    ...
    def get_cursor(self, query):
        selected_connection = None
        # Find usable connection
        for con in self.connections:
            if not con.is_busy():  # <--- is_busy() is not PEP 249
                selected_connection = con
                break
        # If all connections are busy, create a new one
        if selected_connection is None:
            selected_connection = self._new_connection()
            self.connections.append(selected_connection)
        # Return cursor on query
        cur = selected_connection.cursor()
        cur.execute(query)
        return cur
However, looking at the PEP 249 standard I cannot find any way to check whether a connection is actually being used or not.
Some implementations such as MySQL Connector offer ways to check whether a connection still has unread content (see here), however as far as I know those are not part of PEP 249.
Is there a way I can achieve what I described above with any PEP 249-compliant Python database API?
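In the absence of a standard busy check, one portable workaround is to track the busy state on the Database side and have callers hand cursors back when they are done. A minimal sketch, using sqlite3 as a stand-in for any PEP 249 driver (the release() method and the _free/_busy bookkeeping are my invention, not part of the question's class):

```python
import sqlite3  # stand-in: any PEP 249 driver would do


class Database:
    def __init__(self, db_path):
        self._db_path = db_path
        self._free = []   # connections with no live cursor
        self._busy = {}   # cursor -> connection serving it

    def _new_connection(self):
        return sqlite3.connect(self._db_path)

    def get_cursor(self, query):
        # Reuse a free connection if one exists, otherwise open a new one
        con = self._free.pop() if self._free else self._new_connection()
        cur = con.cursor()
        cur.execute(query)
        self._busy[cur] = con
        return cur

    def release(self, cur):
        # Caller signals that the cursor has been fully consumed
        con = self._busy.pop(cur)
        cur.close()
        self._free.append(con)
```

The cost is that callers must remember to call release(); a fancier version could wrap the cursor object so that closing it releases the underlying connection automatically.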
Perhaps you could use the status of the cursor to tell you if a cursor is being used. Let's say you had the following cursor:
new_cursor = new_connection.cursor()
new_cursor.execute(new_query)
and you wanted to see if that connection was available for another cursor to use. You might be able to do something like:
if new_cursor.rowcount == -1:
    another_new_cursor = new_connection.cursor()
    ...
Of course, all this really tells you is that the cursor hasn't executed anything yet since the last time it was closed. It could point to a cursor that is done (and therefore a connection that has been closed) or it could point to a cursor that has just been created or attached to a connection. Another option is to use a try/except block, something along the lines of:
try:
    another_new_cursor = new_connection.cursor()
except ConnectionError:  # not actually sure which error would go here, but you get the idea
    print("this connection is busy.")
Of course, you probably don't want to be spammed with printed messages but you can do whatever you want in that except block, sleep for 5 seconds, wait for some other variable to be passed, wait for user input, etc. If you are restricted to PEP 249, you are going to have to do a lot of things from scratch. Is there a reason you can't use external libraries?
EDIT: If you are willing to move outside of PEP 249, here is something that might work, but it may not be suitable for your purposes. If you make use of the mysql python library, you can take advantage of the is_connected method.
new_connection = mysql.connector.connect(host='myhost',
                                         database='myDB',
                                         user='me',
                                         password='myPassword')
...stuff happens...
if new_connection.is_connected():
    pass
else:
    another_new_cursor = new_connection.cursor()
    ...
So I'm doing a natural language processing task, or rather, would like to be doing a NL processing task, in Python 3.8 - but my corpus is stored in a Postgres database. I've previously done work with the psycopg2 database driver/library, specifically in getting my corpus from an API in to the database, but I kind of kludged it when I noticed the specific issue I'm dealing with. I'm guessing the answer to this is going to be something like "design your program better," but I guess I'd like some guidance on how to do that.
So, psycopg2 creates a database connection and cursor as variables. I try to follow modern programming practices and not just write everything as one long sequence of BASIC-like code, which means that I would really like to do things like log in and verify the server connection in one function, retrieve the strings from the corpus in another, and terminate the connection in yet another... but as far as I can tell, whatever function/method I start a connection in "owns" the connection to the database server, and the variable is local to it.
I thought of a couple of possibilities, and I'm experimenting with them, but I thought I'd ask for feedback on what the best way to handle this and whether these are even feasible:
I could instantiate the connection outside of a function. Like so:
print(f"Establishing connection with {address}...")
conn = psycopg2.connect(host=address, database=database, user=user, password=password)
print("Establishing an SQL input cursor...")
cur = conn.cursor()
This would require a refactoring of a lot of my code to be less functional, and I don't really like that, since this is code I'm hoping to put in a portfolio when I complete my project. It's also just frustrating and inelegant. I'm also not 100% sure it'd work. I've not seen any code examples of others doing this with psycopg2, probably for good reason.
I could create a class for the Postgres interface that could be called from anywhere. This would also require refactoring, but I think will allow the cursor to be accessible from any method... but I'll still have to instantiate the class somewhere, and I'm not sure it'll be accessible globally even if I put it in my main() function and call everything from within there.
If I figure this out, I'll post my solution, but if anyone has any ideas about how to do this elegantly - to keep the Postgres database accessible to all the functions for the entire runtime - I would love to hear them.
For that kind of issue, I generally encapsulate the connection and the cursor behind a class with a nice interface, that will also implement the context manager methods.
E.g. I will typically write:
from your_connection_module import create_connection
from your_data_class_module import Movie, Review
class DataStore:
    def __init__(self, connection_parameters):
        self._connection = create_connection(connection_parameters)
        self._in_transaction = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._connection.close()

    def get_all_movies(self):
        cursor = self._connection.cursor()  # created outside the try so 'finally' can always close it
        try:
            cursor.execute("SELECT id, title, release_date FROM movies")
            while (data_tuple := cursor.fetchone()) is not None:
                id, title, release_date = data_tuple
                yield Movie(id=id, title=title, release_date=release_date)
        except:
            self._connection.rollback()
            raise
        else:
            self._connection.commit()
        finally:
            cursor.close()

    def get_movie_reviews(self, movie):
        cursor = self._connection.cursor()
        try:
            parameters = {"movie_id": movie.id}
            cursor.execute("SELECT author, date, rating FROM reviews WHERE movie_id = %(movie_id)s", parameters)
            while (data_tuple := cursor.fetchone()) is not None:
                author, date, rating = data_tuple
                yield Review(author=author, date=date, rating=rating)
        except:
            self._connection.rollback()
            raise
        else:
            self._connection.commit()
        finally:
            cursor.close()
N.B. There are ways to reduce boilerplate around cursor creation and closing, as well as connection transaction management, but I would rather not enter into that here.
Afterwards, I can then use the data store as a context manager:
def get_movies_and_reviews(connection_parameters):
    with DataStore(connection_parameters) as data_store:
        all_movies = list(data_store.get_all_movies())
        for movie in all_movies:
            movie_reviews = list(data_store.get_movie_reviews(movie))
            yield movie, movie_reviews
And in this way, the connection will be opened, used for a number of queries, and then closed.
I have always used cur.close() once I'm done with the database:
import sqlite3
conn = sqlite3.connect('mydb')
cur = conn.cursor()
# whatever actions in the database
cur.close()
However, I just saw in some cases the following approach:
import sqlite3
conn = sqlite3.connect('mydb')
cur = conn.cursor()
# whatever actions in the database
cur.close()
conn.close()
And in the official documentation sometimes the cursor is closed, sometimes the connection and sometimes both.
My questions are:
Is there any difference between cur.close() and conn.close()?
Is it enough to close one of them once I am done, or must I close both? If closing one is enough, which is preferable?
[On closing cursors]
If you close the cursor, you are simply flagging it as invalid to process further requests ("I am done with this").
So, at the end of a function/transaction, you should keep closing the cursor, giving the database a hint that the transaction is finished.
A good pattern is to keep cursors short-lived: you get one from the connection object, do what you need, and then discard it. So closing makes sense, and you should keep using cursor.close() at the end of each code section that makes use of it.
I believe (couldn't find any references) that if you just let the cursor fall out of scope (end of function, or simply del cursor) you should get the same behavior. But for the sake of good coding practices you should explicitly close it.
[Connection Objects]
When you are actually done with the database, you should close your connection to it. That means calling connection.close().
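Put together, the recommended pattern looks like this (a minimal sqlite3 sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One short-lived cursor per unit of work, closed as soon as it is done:
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.close()

cur = conn.cursor()
cur.execute("INSERT INTO t VALUES (1)")
conn.commit()
cur.close()

cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM t")
row_count = cur.fetchone()[0]
cur.close()

# Only when fully done with the database:
conn.close()
```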
MySQLdb Connections have a rudimentary context manager that creates a cursor on enter, either rolls back or commits on exit, and implicitly doesn't suppress exceptions. From the Connection source:
def __enter__(self):
    if self.get_autocommit():
        self.query("BEGIN")
    return self.cursor()

def __exit__(self, exc, value, tb):
    if exc:
        self.rollback()
    else:
        self.commit()
So, does anyone know why the cursor isn't closed on exit?
At first, I assumed it was because closing the cursor didn't do anything and that cursors only had a close method in deference to the Python DB API (see the comments to this answer). However, the fact is that closing the cursor burns through the remaining results sets, if any, and disables the cursor. From the cursor source:
def close(self):
    """Close the cursor. No further queries will be possible."""
    if not self.connection: return
    while self.nextset(): pass
    self.connection = None
It would be so easy to close the cursor at exit, so I have to suppose that it hasn't been done on purpose. On the other hand, we can see that when a cursor is deleted, it is closed anyway, so I guess the garbage collector will eventually get around to it. I don't know much about garbage collection in Python.
def __del__(self):
    self.close()
    self.errorhandler = None
    self._result = None
Another guess is that there may be a situation where you want to re-use the cursor after the with block. But I can't think of any reason why you would need to do this. Can't you always finish using the cursor inside its context, and just use a separate context for the next transaction?
To be very clear, this example obviously doesn't make sense:
with conn as cursor:
    cursor.execute(select_stmt)
rows = cursor.fetchall()
It should be:
with conn as cursor:
    cursor.execute(select_stmt)
    rows = cursor.fetchall()
Nor does this example make sense:
# first transaction
with conn as cursor:
    cursor.execute(update_stmt_1)
# second transaction, reusing cursor
try:
    cursor.execute(update_stmt_2)
except:
    conn.rollback()
else:
    conn.commit()
It should just be:
# first transaction
with conn as cursor:
    cursor.execute(update_stmt_1)
# second transaction, new cursor
with conn as cursor:
    cursor.execute(update_stmt_2)
Again, what would be the harm in closing the cursor on exit, and what benefits are there to not closing it?
To answer your question directly: I cannot see any harm whatsoever in closing at the end of a with block. I cannot say why it is not done in this case. But, as there is a dearth of activity on this question, I had a search through the code history and will throw in a few thoughts (guesses) on why the close() may not be called:
There is a small chance that spinning through calls to nextset() may throw an exception - possibly this had been observed and seen as undesirable. This may be why the newer version of cursors.py contains this structure in close():
def close(self):
    """Close the cursor. No further queries will be possible."""
    if not self.connection:
        return
    self._flush()
    try:
        while self.nextset():
            pass
    except:
        pass
    self.connection = None
There is the (somewhat remote) potential that it might take some time to spin through all the remaining results doing nothing. Therefore close() may not be called to avoid doing some unnecessary iterations. Whether you think it's worth saving those clock cycles is subjective, I suppose, but you could argue along the lines of "if it's not necessary, don't do it".
Browsing the sourceforge commits, the functionality was added to the trunk by this commit in 2007 and it appears that this section of connections.py has not changed since. That's a merge based on this commit, which has the message
Add Python-2.5 support for with statement as described in http://docs.python.org/whatsnew/pep-343.html Please test
And the code you quote has never changed since.
This prompts my final thought - it's probably just a first attempt / prototype that just worked and therefore never got changed.
More modern version
You link to source for a legacy version of the connector. I note there is a more active fork of the same library here, which I link to in my comments about "newer version" in point 1.
Note that the more recent version of this module has implemented __enter__() and __exit__() within cursor itself: see here. __exit__() here does call self.close() and perhaps this provides a more standard way to use the with syntax e.g.
with conn.cursor() as c:
    # Do your thing with the cursor
End notes
N.B. I guess I should add, as far as I understand garbage collection (not an expert either) once there are no references to conn, it will be deallocated. At this point there will be no references to the cursor object and it will be deallocated too.
However, calling cursor.close() does not mean that the cursor will be garbage collected. It simply burns through the remaining results and sets the connection to None. This means the cursor can't be re-used, but it won't be garbage collected immediately. You can convince yourself of that by manually calling cursor.close() after your with block and then, say, printing some attribute of the cursor.
N.B. 2 I think this is a somewhat unusual use of the with syntax as the conn object persists because it is already in the outer scope - unlike, say, the more common with open('filename') as f: where there are no objects hanging around with references after the end of the with block.
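The point about close() versus garbage collection can be illustrated with sqlite3 (a different driver than the MySQLdb fork under discussion, so the internals differ, but the observable behaviour is analogous): the closed cursor object survives, it just refuses further work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

cursor.close()                # the Python object is still alive afterwards...
still_there = repr(cursor)    # ...so we can still inspect it
try:
    cursor.execute("SELECT 1")  # ...but it can no longer run queries
    reusable = True
except sqlite3.ProgrammingError:
    reusable = False
```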
I have functions in which I am doing database operations. I want to do something special: before trying to fetch data from the database, I want to check whether the cursor object is null and whether the connection has dropped due to a timeout. How can I do this pre-checking in Python?
my functions, foo and bar:
class A:
    connect = None
    cursor = None

    def connect(self):
        self.connect = MySQLdb.connect ( ... )
        self.cursor = self.connect.cursor()

    def foo(self):
        self.cursor.execute("SELECT * FROM car_models")

    def bar(self):
        self.cursor.execute("SELECT * FROM incomes")
The problem with checking before an operation is that the bad condition you're checking for could happen between the check and the operation.
For example, suppose you had a method is_timed_out() to check if the connection had timed out:
if not self.cursor.is_timed_out():
    self.cursor.execute("SELECT * FROM incomes")
On the face of it, this looks like you've avoided the possibility of a CursorTimedOut exception from the execute call. But it's possible for you to call is_timed_out, get a False back, then the cursor times out, and then you call the execute function, and get an exception.
Yes, the chance is very small that it will happen at just the right moment. But in a server environment, a one-in-a-million chance will happen a few times a day. Bad stuff.
You have to be prepared for your operations to fail with exceptions. And once you've got exception handling in place for those problems, you don't need the pre-checks any more, because they are redundant.
You can check whether the cursor is null easily:
if cursor is None:
    ... do something ...
Otherwise the usual thing in Python is to "ask for forgiveness not permission": use your database connection and if it has timed out catch the exception and handle it (otherwise you might just find that it times out between your test and the point where you use it).
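A sketch of that EAFP style, using sqlite3 and its exception hierarchy as a stand-in for MySQLdb (fetch_incomes is a made-up helper):

```python
import sqlite3

def fetch_incomes(conn):
    """EAFP: just attempt the query and handle failure, instead of pre-checking."""
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM incomes")
        return cur.fetchall()
    except sqlite3.Error:
        # The handle was unusable (closed, timed out, ...): reconnect/retry/log here.
        return None
```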
I am doing something like this...
conn = sqlite3.connect(db_filename)
with conn:
    cur = conn.cursor()
    cur.execute( ... )
with automatically commits the changes. But the docs say nothing about closing the connection.
Actually I can use conn in later statements (which I have tested). Hence it seems that the context manager is not closing the connection.
Do I have to close the connection manually? What happens if I leave it open?
EDIT
My findings:
The connection is not closed by the context manager; I have tested and confirmed it. Upon __exit__, the context manager only commits the changes by calling conn.commit().
with conn and with sqlite3.connect(db_filename) as conn are the same, so using either will still keep the connection alive.
The with statement does not create a new scope; hence all the variables created inside the with suite are accessible outside it.
Finally, you should close the connection manually.
In answer to the specific question of what happens if you do not close a SQLite database, the answer is quite simple and applies to using SQLite in any programming language. When the connection is closed explicitly by code or implicitly by program exit then any outstanding transaction is rolled back. (The rollback is actually done by the next program to open the database.) If there is no outstanding transaction open then nothing happens.
This means you do not need to worry too much about always closing the database before process exit, and that you should pay attention to transactions making sure to start them and commit at appropriate points.
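This rollback-on-close behaviour is easy to verify from Python (a small demo; the temporary file path is only for illustration):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway path for the demo

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()
conn.execute("INSERT INTO t VALUES (1)")   # transaction left uncommitted...
conn.close()                               # ...so closing discards it

conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT x FROM t").fetchall()  # the insert is gone
conn.close()
```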
You have a valid underlying concern here, however it's also important to understand how sqlite operates too:
1. connection open
2. transaction started
3. statement executes
4. transaction done
5. connection closed
In terms of data correctness, you only need to worry about transactions and not open handles. sqlite only holds a lock on a database inside a transaction(*) or statement execution.
However, in terms of resource management (e.g. if you plan to remove the sqlite file, or might open so many connections that you run out of file descriptors), you do care about open out-of-transaction connections too.
There are two ways a connection is closed: either you call .close() explicitly, after which you still have a handle but can't use it, or you let the connection go out of scope and get garbage-collected.
If you must close a connection, close it explicitly, according to Python's motto "explicit is better than implicit."
If you are only checking code for side effects, letting the last variable holding a reference to the connection go out of scope may be acceptable, but keep in mind that exceptions capture the stack, and thus references in that stack. If you pass exceptions around, connection lifetime may be extended arbitrarily.
(*) Caveat programmator: sqlite uses "deferred" transactions by default, that is, the transaction only starts when you execute a statement. In the example above, the transaction runs from 3 to 4, rather than from 2 to 4.
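Python's sqlite3 module lets you observe this deferred behaviour directly via Connection.in_transaction, which stays False until the first data-modifying statement executes (a small demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

before = conn.in_transaction              # nothing pending yet
conn.execute("INSERT INTO t VALUES (1)")  # first DML statement opens the transaction
after = conn.in_transaction
conn.commit()
```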
This is the code that I use. The Connection and the Cursor will automatically close thanks to contextlib.closing(). The Connection will automatically commit thanks to the context manager.
import sqlite3
import contextlib

def execute_statement(statement):
    with contextlib.closing(sqlite3.connect(path_to_file)) as conn:  # auto-closes
        with conn:  # auto-commits
            with contextlib.closing(conn.cursor()) as cursor:  # auto-closes
                cursor.execute(statement)
You can use a with block like this:
from contextlib import closing
import sqlite3
def query(self, db_name, sql):
    with closing(sqlite3.connect(db_name)) as con, con, \
            closing(con.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
connects
starts a transaction
creates a db cursor
performs the operation and returns the results
closes the cursor
commits/rolls-back the transaction
closes the connection
all safe in both happy and exceptional cases
Your version leaves conn in scope after connection usage.
EXAMPLE:
your version
conn = sqlite3.connect(db_filename)  #DECLARE CONNECTION OUT OF WITH BLOCK
with conn:  #USE CONNECTION IN WITH BLOCK
    cur = conn.cursor()
    cur.execute( ... )
#conn variable is still in scope, so you can use it again
new version
with sqlite3.connect(db_filename) as conn:  #DECLARE CONNECTION AT START OF WITH BLOCK
    cur = conn.cursor()
    cur.execute( ... )
#conn variable is out of scope, so connection is closed
# MIGHT BE IT IS NOT CLOSED BUT WHAT Avaris SAID!
#(I believe auto close goes for with block)
For managing a connection to a database I usually do this,
# query method belonging to a DB manager class
def query(self, sql):
    con = sqlite3.connect(self.dbName)
    with con:
        cur = con.cursor()
        cur.execute(sql)
        res = cur.fetchall()
    if con:
        con.close()
    return res
Doing so, I'm sure that the connection is explicitly closed.
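One caveat with that version: if execute() raises, the function exits before reaching con.close(). A try/finally variant (my sketch, not the answerer's code) guarantees the close on both the happy and the exceptional path:

```python
import sqlite3

def query(db_name, sql):
    """Run a statement and return all rows; the connection is always closed."""
    con = sqlite3.connect(db_name)
    try:
        with con:              # commits on success, rolls back on error
            cur = con.cursor()
            cur.execute(sql)
            return cur.fetchall()
    finally:
        con.close()            # runs even if execute() raised
```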