I'm using Psycopg2 in Python to access a PostgreSQL database. I'm curious whether it's safe to use the with closing() pattern to create and use a cursor, or whether I should use an explicit try/except wrapped around the query. My question concerns inserting or updating, and transactions.
As I understand it, all Psycopg2 queries occur within a transaction, and it's up to the calling code to commit or roll back the transaction. If an error occurs within a with closing(...) block, is a rollback issued? In older versions of Psycopg2 a rollback was explicitly issued on close(), but this is no longer the case (see http://initd.org/psycopg/docs/connection.html#connection.close).
My question might make more sense with an example. Here's an example using with closing(...):
with closing(db.cursor()) as cursor:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()  # commit() lives on the connection, not the cursor
What happens when module.raise_unexpected_error() raises its error? Is the transaction rolled back? As I understand transactions, I either need to commit them or roll them back. So in this case, what happens?
Alternately I could write my query like this:
cursor = None
try:
    cursor = db.cursor()
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()
except BaseException:
    if cursor is not None:
        db.rollback()  # rollback() is also on the connection
    raise
finally:
    if cursor is not None:
        cursor.close()
Also, I should mention that I have no idea whether Psycopg2's connection class cursor() method can raise an error (the documentation doesn't say), so better safe than sorry, no?
Which method of issuing a query and managing a transaction should I use?
Your link to the Psycopg2 docs kind of explains it itself, no?
... Note that closing a connection without committing the changes first will
cause any pending change to be discarded as if a ROLLBACK was
performed (unless a different isolation level has been selected: see
set_isolation_level()).
Changed in version 2.2: previously an explicit ROLLBACK was issued by
Psycopg on close(). The command could have been sent to the backend at
an inappropriate time, so Psycopg currently relies on the backend to
implicitly discard uncommitted changes. Some middleware are known to
behave incorrectly though when the connection is closed during a
transaction (when status is STATUS_IN_TRANSACTION), e.g. PgBouncer
reports an unclean server and discards the connection. To avoid this
problem you can ensure to terminate the transaction with a
commit()/rollback() before closing.
So, unless you're using a different isolation level, or using PgBouncer, your first example should work fine. However, if you desire some finer-grained control over exactly what happens during a transaction, then the try/except method might be best, since it parallels the database transaction state itself.
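For reference, a minimal sketch of that try/except approach with the commit/rollback moved to the connection (where psycopg2 actually defines them), reusing the question's db, pw_tuple, and user_id:
cursor = db.cursor()
try:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    db.commit()      # commit on the connection, not the cursor
except Exception:
    db.rollback()    # leave the connection in a clean state for reuse
    raise
finally:
    cursor.close()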
Related
I am writing code to create a GUI in Python in the Spyder environment of Anaconda. Within this code I operate on a PostgreSQL database, so I use the psycopg2 database adapter to interact with it directly from the GUI.
The code is too long to post here, as it is over 3000 lines, but to summarize, I have no problem interacting with my database except when I try to drop a table.
When I do so, the GUI frames become unresponsive, the DROP TABLE query doesn't drop the intended table, and no errors of any kind are thrown.
Within my code, all operations which result in a table being dropped are processed via a function (DeleteTable). When I call this function, there are no problems as I have inserted several print statements previously which confirmed that everything was in order. The problem occurs when I execute the statement with the cur.execute(sql) line of code.
Can anybody figure out why my tables won't drop?
def DeleteTable(table_name):
    conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
    cur = conn.cursor()
    sql = """DROP TABLE """ + table_name + """;"""
    cur.execute(sql)
    conn.commit()
That must be because a concurrent transaction is holding a lock that blocks the DROP TABLE statement.
Examine the pg_stat_activity view and watch out for sessions with state equal to idle in transaction or active that have an xact_start of more than a few seconds ago.
This is essentially an application bug: you must make sure that all transactions are closed immediately, otherwise Bad Things can happen.
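For example, a quick (hedged) way to inspect that from Python with psycopg2, reusing the question's connection parameters:
import psycopg2

conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
cur = conn.cursor()
# look for long-running or forgotten transactions that could block DROP TABLE
cur.execute("""SELECT pid, state, xact_start, query
               FROM pg_stat_activity
               WHERE state IN ('idle in transaction', 'active')
               ORDER BY xact_start""")
for pid, state, xact_start, query in cur.fetchall():
    print(pid, state, xact_start, query)
cur.close()
conn.close()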
I was having the same issue when using psycopg2 within Airflow's Postgres hook, and I resolved it with a with statement. This probably resolves the issue because the connection becomes local to the with block.
def drop_table():
    with PostgresHook(postgres_conn_id="your_connection").get_conn() as conn:
        cur = conn.cursor()
        cur.execute("DROP TABLE IF EXISTS your_table")

task_drop_table = PythonOperator(
    task_id="drop_table",
    python_callable=drop_table
)
And a similar fix should be possible for the original code above (I didn't test this one):
def DeleteTable(table_name):
    with psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'") as conn:
        cur = conn.cursor()
        sql = """DROP TABLE """ + table_name + """;"""
        cur.execute(sql)
        conn.commit()
Please comment if anyone tries this.
I have always been calling cur.close() once I'm done with the database:
import sqlite3
conn = sqlite3.connect('mydb')
cur = conn.cursor()
# whatever actions in the database
cur.close()
However, I just saw in some cases the following approach:
import sqlite3
conn = sqlite3.connect('mydb')
cur = conn.cursor()
# whatever actions in the database
cur.close()
conn.close()
And in the official documentation sometimes the cursor is closed, sometimes the connection and sometimes both.
My questions are:
Is there any difference between cur.close() and conn.close()?
Is it enough to close one once I am done (or must I close both)? If so, which one is preferable?
[On closing cursors]
If you close the cursor, you are simply flagging it as invalid to process further requests ("I am done with this").
So, at the end of a function/transaction, you should keep closing the cursor, giving the database a hint that the transaction is finished.
A good pattern is to keep cursors short-lived: you get one from the connection object, do what you need, and then discard it. So closing makes sense, and you should keep calling cursor.close() at the end of the code section that uses it.
I believe (couldn't find any references) that if you just let the cursor fall out of scope (end of function, or simply del cursor) you should get the same behavior. But for the sake of good coding practices you should explicitly close it.
[Connection Objects]
When you are actually done with the database, you should close your connection to it. That means connection.close().
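A minimal sketch of that order of operations, using the question's sqlite3 setup:
import sqlite3

conn = sqlite3.connect('mydb')
cur = conn.cursor()
cur.execute('SELECT 1')   # whatever actions in the database
cur.close()               # done with this unit of work
# ... later, possibly more short-lived cursors from the same connection ...
conn.close()              # done with the database entirely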
I am opening a cursor with connection.cursor(), executing a bunch of deletes, then closing the cursor. It works, but I am not sure whether it has any side effects. I would appreciate any feedback.
from django.db import connection

c = connection.cursor()
try:
    c.execute('delete from table_a')
    c.execute('delete from table_b')
    ...
finally:
    c.close()
Since you are not executing these SQL statements in a transaction, you may end up in a confusing state (for example, data was deleted from table_a but not from table_b).
Django documentation says about this particular situation:
If you’re executing several custom SQL queries in a row, each one now
runs in its own transaction, instead of sharing the same “automatic
transaction”. If you need to enforce atomicity, you must wrap the
sequence of queries in atomic().
So, the results of each execute() call are committed right after it, but we want them either all to pass or all to fail, as a single set of changes.
Wrap the view with the transaction.atomic decorator:
from django.db import connection, transaction

@transaction.atomic
def my_view(request):
    c = connection.cursor()
    try:
        c.execute('delete from table_a')
        c.execute('delete from table_b')
    finally:
        c.close()
Note that atomic() and the whole new transaction management system were introduced in Django 1.6.
If you are using Django < 1.6, apply the transaction.commit_on_success decorator instead.
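A hedged sketch of what that would look like for the same view as above:
from django.db import connection, transaction

@transaction.commit_on_success  # pre-1.6 counterpart of transaction.atomic
def my_view(request):
    c = connection.cursor()
    try:
        c.execute('delete from table_a')
        c.execute('delete from table_b')
    finally:
        c.close()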
So I'm using psycopg2, I have a simple table:
CREATE TABLE IF NOT EXISTS feed_cache (
feed_id int REFERENCES feeds(id) UNIQUE,
feed_cache text NOT NULL,
expire_date timestamp --without time zone
);
I'm calling the following method and query:
@staticmethod
def get_feed_cache(conn, feed_id):
    c = conn.cursor()
    try:
        sql = 'SELECT feed_cache FROM feed_cache WHERE feed_id=%s AND localtimestamp <= expire_date;'
        c.execute(sql, (feed_id,))
        result = c.fetchone()
        if result:
            conn.commit()
            return result[0]
        else:
            print 'DBSELECT.get_feed_cache: %s' % result
            print 'sql: %s' % (c.mogrify(sql, (feed_id,)))
    except:
        conn.rollback()
        raise
    finally:
        c.close()
    return None
I've added the else statement to output the exact sql and result that is being executed and returned.
The get_feed_cache() method is called from a database connection thread pool. When the get_feed_cache() method is called "slowishly" (~1/sec or less) the result is returned as expected, however when called concurrently it will occasionally return None. I have tried multiple ways of writing this query & method.
Some observations:
If I remove 'AND localtimestamp <= expire_date' from the query, the query ALWAYS returns a result.
Executing the query rapidly in serial in psql always returns a result.
Reading about the fetch*() methods of psycopg's cursor class, the docs note that results are cached per cursor; I'm assuming the cache is not shared between different cursors. http://initd.org/psycopg/docs/faq.html#best-practices
I have tried using postgresql's now() and current_timestamp functions with the same results. (I am aware of the timezone aspect of now() & current_timestamp)
Conditions to note:
There will NEVER be a case where there is not a feed_cache value for a provided feed_id.
There will NEVER be a case where any value in the feed_cache table is NULL
While testing I have completely disabled any & all writes to this table
I have set the expire_date to be sufficiently far in the future for all values such that the expression 'AND localtimestamp <= expire_date' will always be true.
Here is a copy & pasted output of it returning None:
DBSELECT.get_feed_cache: None
sql: SELECT feed_cache FROM feed_cache WHERE feed_id=5 AND localtimestamp < expire_date;
Well, that's pretty much it; I'm not sure what's going on. Maybe I'm making some really dumb mistake and I just don't notice it! My current guess is that it has something to do with psycopg2 and perhaps the way it caches results between cursors. If the cursors DO share the cache and the queries happen near-simultaneously, then it could be that the first cursor fetches the result, the second cursor sees there is a cache of the same query and so does not execute, then the first cursor closes and deletes the cache, and the second cursor tries to fetch a now null/None cache.*
That said, psycopg2 states that it's thread-safe for read-only queries, so unless I'm misinterpreting their implementation of thread safety, this shouldn't be the case.
Thank you for your time!
*After adding a thread lock for the get_feed_cache, acquiring before creating the cursor and releasing before returning, I still occasionally get a None result
I think this might have to do with the fact that the time stamps returned by localtimestamp or current_timestamp are fixed when the transaction starts, not when you run the statement. And psycopg manages the transactions behind your back to some degree. So you might be getting a slightly older time stamp.
You could debug this by setting log_statement = all in your server and then observing when the BEGIN statements are executed relative to your queries.
You might want to look into using a function such as clock_timestamp(), which reflects the actual current time and advances even within a transaction. See http://www.postgresql.org/docs/current/static/functions-datetime.html.
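A sketch of what that change might look like in the question's get_feed_cache() (the cast is an assumption, since expire_date is a timestamp without time zone while clock_timestamp() returns timestamptz):
sql = ('SELECT feed_cache FROM feed_cache '
       'WHERE feed_id = %s AND clock_timestamp()::timestamp <= expire_date;')
c.execute(sql, (feed_id,))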
I am doing something like this...
conn = sqlite3.connect(db_filename)
with conn:
    cur = conn.cursor()
    cur.execute( ... )
The with block automatically commits the changes, but the docs say nothing about closing the connection.
Actually I can use conn in later statements (which I have tested). Hence it seems that the context manager is not closing the connection.
Do I have to manually close the connection? What if I leave it open?
EDIT
My findings:
The connection is not closed in the context manager, I have tested and confirmed it. Upon __exit__, the context manager only commits the changes by doing conn.commit()
with conn and with sqlite3.connect(db_filename) as conn are the same, so using either will still keep the connection alive
with statement does not create a new scope, hence all the variables created inside the suite of with will be accessible outside it
Finally, you should close the connection manually
In answer to the specific question of what happens if you do not close a SQLite database, the answer is quite simple and applies to using SQLite in any programming language. When the connection is closed explicitly by code or implicitly by program exit then any outstanding transaction is rolled back. (The rollback is actually done by the next program to open the database.) If there is no outstanding transaction open then nothing happens.
This means you do not need to worry too much about always closing the database before process exit, and that you should pay attention to transactions making sure to start them and commit at appropriate points.
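A minimal sketch of that advice, assuming a throwaway example.db file:
import sqlite3

conn = sqlite3.connect('example.db')
conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT)")
conn.execute("INSERT INTO items VALUES ('widget')")
conn.commit()   # without this, closing (or crashing) would discard the INSERT
conn.close()    # optional for correctness once the commit has happened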
You have a valid underlying concern here, however it's also important to understand how sqlite operates too:
1. connection open
2. transaction started
3. statement executes
4. transaction done
5. connection closed
In terms of data correctness, you only need to worry about transactions, not open handles. SQLite only holds a lock on a database inside a transaction(*) or statement execution.
However, in terms of resource management, e.g. if you plan to remove the sqlite file, or use so many connections you might run out of file descriptors, you do care about open out-of-transaction connections too.
There are two ways a connection is closed: either you call .close() explicitly, after which you still have a handle but can't use it, or you let the connection go out of scope and get garbage-collected.
If you must close a connection, close it explicitly, following Python's motto "explicit is better than implicit."
If you are only checking code for side effects, letting the last variable holding a reference to the connection go out of scope may be acceptable, but keep in mind that exceptions capture the stack, and thus references in that stack. If you pass exceptions around, connection lifetime may be extended arbitrarily.
(*) Caveat programmator: SQLite uses "deferred" transactions by default, that is, the transaction only starts when you execute a statement. In the example above, the transaction runs from 3 to 4, rather than from 2 to 4.
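If you need the lock earlier, here is a hedged sketch of forcing the transaction to start immediately (in-memory database and table name are just for illustration):
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE items (name TEXT)")
conn.commit()

conn.isolation_level = None      # autocommit mode: manage transactions by hand
conn.execute("BEGIN IMMEDIATE")  # the write lock is taken here (step 2)...
conn.execute("INSERT INTO items VALUES ('widget')")  # ...not at the first statement (step 3)
conn.execute("COMMIT")
conn.close()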
This is the code that I use. The Connection and the Cursor will automatically close thanks to contextlib.closing(). The Connection will automatically commit thanks to the context manager.
import sqlite3
import contextlib

def execute_statement(statement):
    with contextlib.closing(sqlite3.connect(path_to_file)) as conn:  # auto-closes
        with conn:  # auto-commits
            with contextlib.closing(conn.cursor()) as cursor:  # auto-closes
                cursor.execute(statement)
You can use a with block like this:
from contextlib import closing
import sqlite3

def query(self, db_name, sql):
    with closing(sqlite3.connect(db_name)) as con, con, \
            closing(con.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
This code:
connects
starts a transaction
creates a db cursor
performs the operation and returns the results
closes the cursor
commits/rolls-back the transaction
closes the connection
all safe in both happy and exceptional cases
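A hypothetical usage sketch (the method takes self, so assume it lives on a small helper class; the file and table names are made up):
from contextlib import closing
import sqlite3

class DB:
    def query(self, db_name, sql):
        with closing(sqlite3.connect(db_name)) as con, con, \
                closing(con.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()

rows = DB().query('mydb.sqlite', 'SELECT name FROM items')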
Your version leaves conn in scope after connection usage.
EXAMPLE:
your version
conn = sqlite3.connect(db_filename)  # DECLARE CONNECTION OUT OF WITH BLOCK
with conn:  # USE CONNECTION IN WITH BLOCK
    cur = conn.cursor()
    cur.execute( ... )

# conn variable is still in scope, so you can use it again
new version
with sqlite3.connect(db_filename) as conn:  # DECLARE CONNECTION AT START OF WITH BLOCK
    cur = conn.cursor()
    cur.execute( ... )

# NOTE: conn is actually still in scope here, and, as the findings above say,
# the with block only commits; it does NOT close the connection
For managing a connection to a database I usually do this:
# query method belonging to a DB manager class
def query(self, sql):
    con = sqlite3.connect(self.dbName)
    with con:
        cur = con.cursor()
        cur.execute(sql)
        res = cur.fetchall()
    if con:
        con.close()
    return res
Doing so, I'm sure that the connection is explicitly closed.