I am trying to use psycopg2 to add some new columns to a table. PostgreSQL lacks an ALTER TABLE table ADD COLUMN IF NOT EXISTS, so I am adding each column in its own transaction. If the column already exists, there will be a Python/Postgres error; that's OK, I want my programme to just continue and try to add the next column. The goal is for this to be idempotent, so it can be run many times in a row.
It currently looks like this:
def main():
    # <snip>
    with psycopg2.connect("") as connection:
        create_columns(connection, args.table)

def create_columns(connection, table_name):
    def sql(sql):
        with connection.cursor() as cursor:
            cursor.execute(sql.format(table_name=table_name))

    sql("ALTER TABLE {table_name} ADD COLUMN my_new_col numeric(10,0);")
    sql("ALTER TABLE {table_name} ADD COLUMN another_new_col INTEGER NOT NULL;")
However, if my_new_col already exists, there is an exception ProgrammingError('column "my_new_col" of relation "relations" already exists\n',), which is to be expected, but when it then tries to add another_new_col, there is the exception InternalError('current transaction is aborted, commands ignored until end of transaction block\n',).
The psycopg2 documentation for the with statement implies that with connection.cursor() as cursor: will wrap that code in a transaction. This is clearly not happening. Experimentation has shown me that I need two levels of with statements, including the psycopg2.connect call, and only then do I get a transaction.
How can I pass a connection object around and have queries run in their own transaction to allow this sort of "graceful error handling"? I would like to keep the postgres connection code separate, in a "clean architecture" style. Is this possible?
The psycopg2 documentation for the with statement implies that the with connection.cursor() as cursor: will wrap that code in a transaction.
This is actually not true; the documentation says:
with psycopg2.connect(DSN) as conn:
    with conn.cursor() as curs:
        curs.execute(SQL)
When a connection exits the with block, if no exception has been raised by the block, the transaction is committed. In case of exception the transaction is rolled back. In no case is the connection closed: a connection can be used in more than one with statement, and each with block is effectively wrapped in a transaction.
So it's not the cursor object being handled by with that wraps the code in a transaction, but the connection object.
Also worth noting that all resources held by the cursor are released when we leave the with block:
When a cursor exits the with block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
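A quick way to convince yourself of this is to check the closed flags after each block exits. A minimal sketch (assuming the empty DSN resolves via the usual libpq environment variables):

import psycopg2

with psycopg2.connect("") as conn:
    with conn.cursor() as curs:
        curs.execute("SELECT 1")
    print(curs.closed)  # True: the cursor's with block closed it
print(conn.closed)      # 0 (open): the connection's with block only committed
conn.close()            # closing the connection remains our job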
So, back to your code, you could probably rewrite it to be more like:
def main():
    # <snip>
    with psycopg2.connect("") as connection:
        create_columns(connection, args.table)

def create_columns(con, table_name):
    def sql(connection, sql):
        with connection:
            with connection.cursor() as cursor:
                cursor.execute(sql.format(table_name=table_name))

    sql(con, "ALTER TABLE {table_name} ADD COLUMN my_new_col numeric(10,0);")
    sql(con, "ALTER TABLE {table_name} ADD COLUMN another_new_col INTEGER NOT NULL;")
ensuring the connection is wrapped in a with block for each query you execute, so that if a query fails, the connection context manager rolls back the transaction.
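To get the "graceful error handling" the question asks for, you can then wrap each per-statement transaction in a try/except. A minimal sketch (untested), catching ProgrammingError since that is the exception the question reports, and simply moving on:

import psycopg2

def create_columns(connection, table_name):
    def sql(statement):
        try:
            with connection:  # one transaction per statement (psycopg2 2.5+)
                with connection.cursor() as cursor:
                    cursor.execute(statement.format(table_name=table_name))
        except psycopg2.ProgrammingError:
            pass  # e.g. column already exists: transaction rolled back, carry on

    sql("ALTER TABLE {table_name} ADD COLUMN my_new_col numeric(10,0);")
    sql("ALTER TABLE {table_name} ADD COLUMN another_new_col INTEGER NOT NULL;")

This keeps the connection handling out of create_columns itself, which fits the "clean architecture" goal: the caller still owns the connection's lifetime.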
Related
I have a stored procedure like this.
CREATE PROCEDURE StudentSproc
    (@StudentID VARCHAR(50),
     @Name VARCHAR(50))
AS
BEGIN
    BEGIN TRAN
    BEGIN TRY
        INSERT INTO Student(StudentID, Name)
        VALUES (@StudentID, @Name)
        COMMIT TRAN;
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
    END CATCH
END;
I am trying to execute it from python:
db_conn_str = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';PORT=1433;DATABASE=' + database + ';UID=' + username + ';PWD=' + password
cnxn = pyodbc.connect(db_conn_str)
cursor = cnxn.cursor()

st = "exec master.dbo.StudentSproc @StudentID = ?, @Name = ?"
s_id = "101"
name = "Charles"
params = (s_id, name)

cursor.execute(st, params)
print(f"executed sproc by {st}")
This has no errors and executes the stored procedure, but it doesn't update the database, and I am surprised. I know that I have to use autocommit=True in the connect() call, but why is that necessary if there is a commit in the stored procedure?
There are no errors because you are using a try/catch. This works just like it does in other languages - if you catch an exception, it doesn't get returned to the client. It's caught.
You can rollback the tran in the catch, and then throw again in order to return the error to the client.
CREATE PROCEDURE StudentSproc(
    @StudentID VARCHAR(50),
    @Name VARCHAR(50))
AS
BEGIN
    BEGIN TRAN;
    BEGIN TRY
        INSERT INTO Student(StudentID, Name)
        VALUES (@StudentID, @Name);
        COMMIT TRAN;
    END TRY
    BEGIN CATCH
        IF (@@TRANCOUNT > 0) ROLLBACK TRAN; -- make sure that a transaction still exists before trying to roll back
        THROW; -- now that we have dealt with the transaction, return the error to the client
    END CATCH
END;
I also just noticed you have your procedure in the master database. You probably don't actually want it there; master is a system database.
You also mentioned that you knew you had to use autocommit, but based on our discussion I understand now that you actually aren't using it; you were just wondering why you had to use it. This introduces a second possibility.
In SQL Server, the only commit that matters is the "outermost" commit. Any "nested" commit doesn't actually do anything other than reduce the value of @@TRANCOUNT.
For example:
begin tran;
insert MyTable values (1);
begin tran;  -- this doesn't really do anything, it just increments @@trancount
insert MyTable values (2);
commit;      -- this does nothing other than decrement @@trancount
-- if we were to execute a rollback here, all of the data would be gone
commit;      -- only this commit matters
On the other hand, rollback works differently. A single rollback will roll back all nested transactions, reducing the @@trancount value to zero.
If you start two transactions (one in the client code, one in the stored procedure), but only issue a single commit, then your "real" transaction is actually still open.
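So the client side still has to finish its own transaction. A minimal sketch of both fixes (untested; I have dropped the master. prefix per the advice above):

import pyodbc

# Option 1: let pyodbc commit automatically after each statement
cnxn = pyodbc.connect(db_conn_str, autocommit=True)

# Option 2: keep manual control and commit the outermost transaction yourself
cnxn = pyodbc.connect(db_conn_str)
cursor = cnxn.cursor()
cursor.execute("exec dbo.StudentSproc @StudentID = ?, @Name = ?", ("101", "Charles"))
cnxn.commit()  # without this, the client-side transaction stays open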
I am writing code to create a GUI in Python in the Spyder environment of Anaconda. Within this code I operate on a PostgreSQL database, and I therefore use the psycopg2 database adapter so that I can interact with it directly from the GUI.
The code is too long to post here, as it is over 3000 lines, but to summarize, I have no problem interacting with my database except when I try to drop a table.
When I do so, the GUI frames become unresponsive, the drop table query doesn't drop the intended table and no errors or anything else of that kind are thrown.
Within my code, all operations which result in a table being dropped are processed via a function (DeleteTable). When I call this function there are no problems; several print statements I inserted earlier confirmed that everything was in order up to that point. The problem occurs when I execute the statement with the cur.execute(sql) line of code.
Can anybody figure out why my tables won't drop?
def DeleteTable(table_name):
    conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
    cur = conn.cursor()
    sql = """DROP TABLE """ + table_name + """;"""
    cur.execute(sql)
    conn.commit()
That must be because a concurrent transaction is holding a lock that blocks the DROP TABLE statement.
Examine the pg_stat_activity view and watch out for sessions with state equal to idle in transaction or active that have an xact_start of more than a few seconds ago.
This is essentially an application bug: you must make sure that all transactions are closed immediately, otherwise Bad Things can happen.
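For example, something along these lines will show the offending sessions (the connection string is the one from the question; the 5-second cutoff is an arbitrary illustration):

import psycopg2

conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
with conn.cursor() as cur:
    cur.execute("""
        SELECT pid, state, xact_start, query
        FROM pg_stat_activity
        WHERE pid <> pg_backend_pid()
          AND state IN ('idle in transaction', 'active')
          AND xact_start < now() - interval '5 seconds'
    """)
    for row in cur.fetchall():
        print(row)
conn.close()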
I was having the same issue when using psycopg2 within Airflow's Postgres hook, and I resolved it with a with statement. This probably resolves the issue because the connection becomes local to the with statement.
def drop_table():
    with PostgresHook(postgres_conn_id="your_connection").get_conn() as conn:
        cur = conn.cursor()
        cur.execute("DROP TABLE IF EXISTS your_table")

task_drop_table = PythonOperator(
    task_id="drop_table",
    python_callable=drop_table
)
A similar fix should be possible for the original code above (I didn't test this one):
def DeleteTable(table_name):
    with psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'") as conn:
        cur = conn.cursor()
        sql = """DROP TABLE """ + table_name + """;"""
        cur.execute(sql)
        conn.commit()
Please comment if anyone tries this.
I'm using PostgreSQL 9.3 and SQLAlchemy 1.0.11.
I have code that looks like this:
import sqlalchemy as sa

engine = sa.create_engine('postgresql+psycopg2://me@myhost/mydb')
conn = engine.connect()
metadata = sa.MetaData()

# Real table has more columns
mytable = sa.Table(
    'my_temp_table', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('something', sa.String(200)),
    prefixes=['TEMPORARY'],
)

metadata.create_all(engine)
pg_conn = engine.raw_connection()
with pg_conn.cursor() as cursor:
    cursor.copy_expert('''COPY my_temp_table (id, something)
                          FROM STDIN WITH CSV''',
                       open('somecsvfile', 'r'))
Now this works just fine - cursor.rowcount reports the expected number of rows inserted. I can even run cursor.execute('SELECT count(*) FROM my_temp_table'); print(cursor.fetchone()) and it will display the same count. The problem is when I try to run a query from SQLAlchemy's connection, e.g.
result = conn.execute(sa.text('SELECT count(*) FROM my_temp_table'))
It doesn't matter where I put that. I've tried several places:
- inside the with block
- outside the with block
- after a cursor.close()
- after a pg_conn.close()
Nothing seems to work - no matter where I run the query from, it barfs with:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "my_temp_table" does not exist
The funny thing is that if I wrap that code in a try/except then I can do cursor.execute(...) in the except block successfully.
Actually, now that I'm writing this out, it appears that using the SQLAlchemy connection anywhere fails to see that those tables exist.
So what gives? Why doesn't my SQLAlchemy connection see these tables, but the postgres (engine.raw_connection()) does?
Edit:
To further the mystery: if I create the connection after the metadata.create_all(engine), it works! Well, sort of.
I can select from the tables, but then when I get the engine.raw_connection() it fails on .copy_expert because it can't find the table.
The first thing to note is that temporary tables are only visible to the connection which created them.
The second is that an Engine doesn't encapsulate a single connection; it manages a connection pool.
Finally, the documentation points out that operations performed directly on an Engine (engine.execute("select ...") in their example) will internally acquire and release their own connections.
With all of this in mind, it's clear what's going on in your example:
conn = engine.connect() acquires Connection #1 from the pool.
metadata.create_all(engine) implicitly acquires Connection #2 (as #1 is still "in use" from the engine's perspective), uses it to create the table, and releases it back to the pool.
pg_conn = engine.raw_connection() acquires #2 again, so the COPY executed via this object can still see the table.
conn is still using #1, and nothing you do via this object will be able to see your temp table.
In your second case:
metadata.create_all(engine) implicitly acquires/uses/releases Connection #1.
conn = engine.connect() acquires #1 and holds it.
pg_conn = engine.raw_connection() acquires #2, and the COPY fails to find the temp table.
The moral of the story: if you're doing something which relies on the connection state, you'd better be sure which connection you're using. Running commands directly on the engine is fine for standalone operations, but for anything involving temp tables, you should acquire one connection and stick with it through every step (including the table creation, which I suggest you change to metadata.create_all(conn)).
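Putting that together, a minimal sketch of the single-connection approach (names taken from the question; untested):

conn = engine.connect()

# create the temp table on this specific connection, not on the engine
metadata.create_all(conn)

# run the COPY on the DBAPI connection underlying this same Connection
with conn.connection.cursor() as cursor:
    with open('somecsvfile', 'r') as f:
        cursor.copy_expert(
            'COPY my_temp_table (id, something) FROM STDIN WITH CSV', f)

# same connection, so the temp table is visible here
result = conn.execute(sa.text('SELECT count(*) FROM my_temp_table'))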
Well, this doesn't answer the why, but it is how to accomplish what I want.
Rather than:
pg_conn = engine.raw_connection()
with pg_conn.cursor() as cursor:
Just replace it with:
with conn.connection.cursor() as cursor:
The SQLAlchemy connection object exposes its underlying DBAPI connection via the .connection property. And whatever magic is involved there does the right thing.
I'm using Psycopg2 in Python to access a PostgreSQL database. I'm curious if it's safe to use the with closing() pattern to create and use a cursor, or if I should use an explicit try/except wrapped around the query. My question is concerning inserting or updating, and transactions.
As I understand it, all Psycopg2 queries occur within a transaction, and it's up to calling code to commit or rollback the transaction. If within a with closing(... block an error occurs, is a rollback issued? In older versions of Psycopg2, a rollback was explicitly issued on close() but this is not the case anymore (see http://initd.org/psycopg/docs/connection.html#connection.close).
My question might make more sense with an example. Here's an example using with closing(...
with closing(db.cursor()) as cursor:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()  # note: commit() belongs to the connection, not the cursor
What happens when module.raise_unexpected_error() raises its error? Is the transaction rolled back? As I understand transactions, I either need to commit them or roll them back. So in this case, what happens?
Alternately I could write my query like this:
cursor = None
try:
    cursor = db.cursor()
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()  # commit/rollback are connection methods, not cursor methods
except BaseException:
    if cursor is not None:
        db.rollback()
finally:
    if cursor is not None:
        cursor.close()
Also I should mention that I have no idea if Psycopg2's connection class cursor() method could raise an error or not (the documentation doesn't say), so better safe than sorry, no?
Which method of issuing a query and managing a transaction should I use?
Your link to the Psycopg2 docs kind of explains it itself, no?
... Note that closing a connection without committing the changes first will cause any pending change to be discarded as if a ROLLBACK was performed (unless a different isolation level has been selected: see set_isolation_level()).

Changed in version 2.2: previously an explicit ROLLBACK was issued by Psycopg on close(). The command could have been sent to the backend at an inappropriate time, so Psycopg currently relies on the backend to implicitly discard uncommitted changes. Some middleware are known to behave incorrectly though when the connection is closed during a transaction (when status is STATUS_IN_TRANSACTION), e.g. PgBouncer reports an unclean server and discards the connection. To avoid this problem you can ensure to terminate the transaction with a commit()/rollback() before closing.
So, unless you're using a different isolation level, or using PgBouncer, your first example should work fine. However, if you desire some finer-grained control over exactly what happens during a transaction, then the try/except method might be best, since it parallels the database transaction state itself.
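For what it's worth, with psycopg2 2.5+ you can also combine the two: contextlib.closing guarantees the close, while the connection's own context manager gives you commit-on-success/rollback-on-exception. A minimal sketch reusing the names from the question:

from contextlib import closing
import psycopg2

with closing(psycopg2.connect("")) as db:  # guarantees db.close()
    with db:  # commits on success, rolls back on exception
        with db.cursor() as cursor:
            cursor.execute("""UPDATE users
                              SET password = %s, salt = %s
                              WHERE user_id = %s""",
                           (pw_tuple[0], pw_tuple[1], user_id))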
I am doing something like this...
conn = sqlite3.connect(db_filename)
with conn:
    cur = conn.cursor()
    cur.execute( ... )
with automatically commits the changes. But the docs say nothing about closing the connection.
Actually I can use conn in later statements (which I have tested). Hence it seems that the context manager is not closing the connection.
Do I have to manually close the connection? What if I leave it open?
EDIT
My findings:
- The connection is not closed in the context manager; I have tested and confirmed it. Upon __exit__, the context manager only commits the changes by doing conn.commit().
- with conn and with sqlite3.connect(db_filename) as conn are the same, so using either will still keep the connection alive.
- The with statement does not create a new scope, hence all the variables created inside the suite of with will be accessible outside it.
- Finally, you should close the connection manually.
In answer to the specific question of what happens if you do not close a SQLite database, the answer is quite simple and applies to using SQLite in any programming language. When the connection is closed explicitly by code or implicitly by program exit then any outstanding transaction is rolled back. (The rollback is actually done by the next program to open the database.) If there is no outstanding transaction open then nothing happens.
This means you do not need to worry too much about always closing the database before process exit, and that you should pay attention to transactions making sure to start them and commit at appropriate points.
You have a valid underlying concern here; however, it's also important to understand how SQLite operates:
1. connection open
2. transaction started
3. statement executes
4. transaction done
5. connection closed
In terms of data correctness, you only need to worry about transactions and not open handles. SQLite only holds a lock on a database inside a transaction(*) or statement execution.
However, in terms of resource management, e.g. if you plan to remove the SQLite file, or use so many connections that you might run out of file descriptors, you do care about open out-of-transaction connections too.
There are two ways a connection is closed: either you call .close() explicitly, after which you still have a handle but can't use it, or you let the connection go out of scope and get garbage-collected.
If you must close a connection, close it explicitly, according to Python's motto "explicit is better than implicit."
If you are only checking code for side effects, letting the last variable holding a reference to the connection go out of scope may be acceptable, but keep in mind that exceptions capture the stack, and thus references in that stack. If you pass exceptions around, connection lifetime may be extended arbitrarily.
(*) Caveat programmator: SQLite uses "deferred" transactions by default, that is, the transaction only starts when you execute a statement. In the example above, the transaction runs from 3 to 4, rather than from 2 to 4.
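You can observe this deferred behaviour through the connection's in_transaction attribute; a small sketch (behaviour as of Python 3.6+, where DDL no longer affects the implicit transaction):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
print(con.in_transaction)  # False: no implicit BEGIN was issued for the DDL
con.execute("INSERT INTO t VALUES (1)")
print(con.in_transaction)  # True: the INSERT implicitly started a transaction
con.commit()
con.close()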
This is the code that I use. The Connection and the Cursor will automatically close thanks to contextlib.closing(). The Connection will automatically commit thanks to the context manager.
import sqlite3
import contextlib

def execute_statement(path_to_file, statement):
    with contextlib.closing(sqlite3.connect(path_to_file)) as conn:  # auto-closes the connection
        with conn:  # auto-commits (or rolls back on exception)
            with contextlib.closing(conn.cursor()) as cursor:  # auto-closes the cursor
                cursor.execute(statement)
You can use a with block like this:
from contextlib import closing
import sqlite3

def query(db_name, sql):
    with closing(sqlite3.connect(db_name)) as con, con, \
            closing(con.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
This:
1. connects
2. starts a transaction
3. creates a db cursor
4. performs the operation and returns the results
5. closes the cursor
6. commits/rolls back the transaction
7. closes the connection
all safe in both happy and exceptional cases.
Your version leaves conn in scope after connection usage.
EXAMPLE:
your version:

conn = sqlite3.connect(db_filename)  # connection declared outside the with block
with conn:                           # connection used in the with block
    cur = conn.cursor()
    cur.execute( ... )
# conn is still in scope, so you can use it again

new version:

with sqlite3.connect(db_filename) as conn:  # connection declared at the start of the with block
    cur = conn.cursor()
    cur.execute( ... )
# conn is actually still accessible here (with does not create a new scope),
# and note that the with block only commits/rolls back; it does not close
# the connection (see the findings above)
For managing a connection to a database I usually do this,
# query method belonging to a DB manager class
def query(self, sql):
    con = sqlite3.connect(self.dbName)
    try:
        with con:  # commits on success, rolls back on exception
            cur = con.cursor()
            cur.execute(sql)
            res = cur.fetchall()
    finally:
        con.close()  # always close, even if the query raises
    return res
Doing so, I'm sure that the connection is explicitly closed.