I use the SQLAlchemy Engine to create some functions and triggers. I did not want to mix Python and SQL, so I created a separate file for my SQL statements; I read its content and pass it to engine.execute(). It throws no errors, yet the functions are not created in the database. If I run the same SQL file through pgAdmin, everything works fine.
My SQL file:
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'plpython3u') THEN
        CREATE EXTENSION plpython3u;
    END IF;
END;
$$;

DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_proc WHERE proname = 'my_func') THEN
        CREATE FUNCTION public.my_func() RETURNS TRIGGER LANGUAGE 'plpython3u' NOT LEAKPROOF AS $BODY$
            -- definition
        $BODY$;
        GRANT EXECUTE ON FUNCTION my_func() TO public;
    END IF;
END;
$$;

DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_proc WHERE proname = 'my_func2') THEN
        CREATE FUNCTION public.my_func2() RETURNS TRIGGER LANGUAGE 'plpython3u' NOT LEAKPROOF AS $BODY$
            -- definition
        $BODY$;
        GRANT EXECUTE ON FUNCTION my_func2() TO public;
    END IF;
END;
$$;
And I run this as follows:
from sqlalchemy.exc import ProgrammingError

def execute_sql_file(engine, path):
    try:
        with open(path) as file:
            engine.execute(file.read())
    except ProgrammingError:
        raise MyCustomError
    except FileNotFoundError:
        raise MyCustomError
If I run this without superuser privileges, it throws ProgrammingError, as expected. In my understanding END; commits the transaction, so if this code really runs, the functions should be available to the public, but they are not even created. Any ideas are welcome, thanks!
I believe you may have mixed up the BEGIN SQL command (a PostgreSQL extension) and a PL/pgSQL block. The SQL command DO executes an anonymous code block, as if it were an anonymous function with no parameters and returning void. In other words, in
DO $$
BEGIN
    ...
END;
$$;
the BEGIN / END; pair denotes the code block, not a transaction. It is worth noting that starting from PostgreSQL version 11 it is possible to manage transactions in a DO block, provided that it is not executed inside a transaction block, but the commands for that are COMMIT and ROLLBACK, not the keyword END.
The problem then is that your changes are not committed, though your commands clearly are executed, as proven by the error when not running with suitable privileges. This is caused by how SQLAlchemy's autocommit feature works. In short, it inspects your statement and tries to determine whether it is a data-changing operation or a DDL statement. This works for basic operations such as INSERT, DELETE, UPDATE, and the like, but it is not perfect; in fact it is impossible to always correctly determine whether a statement changes data. For example, SELECT my_mutating_procedure() is such a statement. So the machinery needs some help when doing more complex operations. One way is to instruct it to commit by wrapping the SQL string in a text() construct and using execution_options():
engine.execute(text("SELECT my_mutating_procedure()")
               .execution_options(autocommit=True))
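Applied to the helper function from the question, a minimal sketch (same MyCustomError handling as before):

from sqlalchemy import text
from sqlalchemy.exc import ProgrammingError

def execute_sql_file(engine, path):
    try:
        with open(path) as file:
            # Wrapping the SQL in text() with autocommit=True tells the
            # autocommit machinery to commit after execution.
            engine.execute(
                text(file.read()).execution_options(autocommit=True))
    except ProgrammingError:
        raise MyCustomError
    except FileNotFoundError:
        raise MyCustomError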
It is also possible to explicitly instruct SQLAlchemy that the command is a literal DDL statement using the DDL construct:
from sqlalchemy.exc import ProgrammingError
from sqlalchemy.schema import DDL

def execute_sql_file(engine, path):
    try:
        with open(path) as file:
            stmt = file.read()
            # Not strictly DDL, but a series of DO commands that execute DDL
            ddl_stmt = DDL(stmt)
            engine.execute(ddl_stmt)
    except ProgrammingError:
        raise MyCustomError
    except FileNotFoundError:
        raise MyCustomError
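Another option is to side-step the autocommit detection entirely by running the file in an explicit transaction. A minimal sketch using engine.begin(), which commits on success and rolls back on error (this is also the pattern to use in SQLAlchemy 1.4+/2.0, where the implicit autocommit feature has been removed):

from sqlalchemy import text

def execute_sql_file(engine, path):
    # engine.begin() opens a connection plus a transaction and
    # commits when the block exits without raising.
    with engine.begin() as conn:
        with open(path) as file:
            conn.execute(text(file.read()))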
As to why it works with pgAdmin: it probably commits by default if no error was raised.
Related
Sometimes I have a need to execute a query from psycopg2 that is not in a transaction block.
For example:
cursor.execute('create index concurrently on my_table (some_column)')
Doesn't work:
InternalError: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
I don't see any easy way to do this with psycopg2. What am I missing?
I can probably call os.system('psql -c "create index concurrently"') or something similar to get it to run from my Python code, but it would be much nicer to do it inside Python and not rely on psql actually being in the container.
Yes, I have to use the concurrently option for this particular use case.
Another time I've explored this and not found an obvious answer is when I have a set of SQL commands that I'd like to call with a single execute(), where the first one briefly locks a resource. When I do this, that resource remains locked for the entire duration of the execute() rather than just while the first statement in the SQL string is running, because they all run together in one big happy transaction.
In that case I could break the query up into a series of execute() statements - each became its own transaction, which was ok.
It seems like there should be a way, but I seem to be missing it. Hopefully this is an easy answer for someone.
EDIT: Add code sample:
#!/usr/bin/env python3.10
import psycopg2 as pg2

# Set the standard psql environment variables to specify which database this
# should connect to. We have to set these to None explicitly to get psycopg2
# to use the env variables.
connDetails = {'database': None, 'host': None, 'port': None, 'user': None, 'password': None}

with (pg2.connect(**connDetails) as conn, conn.cursor() as curs):
    conn.set_session(autocommit=True)
    curs.execute("""
        create index concurrently if not exists my_new_index on my_table (my_column);
    """)
Throws:
psycopg2.errors.ActiveSqlTransaction: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
Per the psycopg2 documentation:
It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM, CALL on stored procedures using transaction control…) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property.
Hence on the connection:
conn.set_session(autocommit=True)
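Putting it together, a minimal sketch reusing connDetails from the question (autocommit is enabled before any statement runs, and the connection is deliberately not used as a context manager, since psycopg2's with conn block manages a transaction):

import psycopg2 as pg2

conn = pg2.connect(**connDetails)
conn.autocommit = True  # must be set while no transaction is open

with conn.cursor() as curs:
    # Runs outside any transaction block, so CONCURRENTLY is allowed.
    curs.execute("create index concurrently if not exists my_new_index on my_table (my_column);")

conn.close()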
Further resources from psycopg2 documentation:
transactions-control
connection.autocommit
I am trying to run an Alembic migration. However, all migrations run in a transaction whenever transactions are supported (see Run alembic upgrade migrations in a transaction). How do I disable the transaction for a specific migration?
Alembic used to have just two modes of using transactions:

1. One transaction for the whole migration command. If there are multiple versions to apply, then they all run in that single transaction.
2. A separate transaction per migration step.
However, as of version 1.2.0 (released September 2019), you can now also switch to the AUTOCOMMIT transaction level by using the MigrationContext.autocommit_block() context manager. When in this transaction mode, each statement is committed immediately. Note that there are caveats to using this feature, see below.
By default a single transaction is used, but you can call context.configure() in your env.py script to set transaction_per_migration to true to use separate transactions.
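For example, in env.py (a sketch; connection and target_metadata are the usual names from Alembic's generated template):

context.configure(
    connection=connection,
    target_metadata=target_metadata,
    # Wrap each migration file in its own transaction instead of
    # running all pending migrations in one transaction.
    transaction_per_migration=True,
)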
The first and default option, to use a single transaction, is executed in the env.py file that Alembic generates for you, in the run_migrations_online() function in that file:
try:
    with context.begin_transaction():
        context.run_migrations()
finally:
    connection.close()
You could either just edit that file to remove the with context.begin_transaction(): context manager, or use the context.get_x_argument() feature to toggle transactions on the basis of a command-line switch:
try:
    # Python 3.7+
    from contextlib import nullcontext
except ImportError:
    # Earlier Python versions
    from contextlib import contextmanager

    @contextmanager
    def nullcontext():
        yield

# ...

def run_migrations_online():
    # ...
    if context.get_x_argument(as_dictionary=True).get('no-transaction', False):
        transaction_cm = nullcontext()
    else:
        transaction_cm = context.begin_transaction()

    try:
        with transaction_cm:
            context.run_migrations()
    finally:
        connection.close()
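With this in place, the transaction can be skipped from the command line via Alembic's -x option, e.g. alembic -x no-transaction upgrade head.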
To disable a transaction per migration step or for specific operations, you can use the aforementioned autocommit_block(), which is intended to be used for DDL statements that the database requires to be run outside of a transaction context:
def upgrade():
    with op.get_context().autocommit_block():
        op.execute("ALTER TYPE mood ADD VALUE 'soso'")
The above example (taken from the documentation) uses the Operations.get_context() method to get access to the migration context. Within the block, all statements are executed directly, without running in a transaction.
The caveat is that any transaction currently in progress is committed first. If statements before and after such a block are connected and should not be executed without the others, then you want to avoid placing an autocommit_block() in between. You also probably want to set transaction_per_migration = true, and use autocommit_block() for entire migration steps. That way you can at least minimise issues with a migration step failing halfway through.
Before version 1.2.0, it was not easy to disable transactions per migration step. You'd have to disable transactions entirely (just don't use context.begin_transaction() in env.py), then explicitly use a transaction per upgrade() or downgrade() step:
def run_migrations_online():
    # ...
    try:
        # no `with context.begin_transaction()` here
        context.run_migrations()
    finally:
        connection.close()
and in each migration step:
def upgrade():
    with context.begin_transaction():
        # ### commands auto generated by Alembic - please adjust! ###
        op.create_table(
            # ...
        )
        # etc.
This can be done using an autocommit block:
with op.get_context().autocommit_block():
    op.execute(...)
https://alembic.sqlalchemy.org/en/latest/api/runtime.html#alembic.runtime.migration.MigrationContext.autocommit_block
This special directive is intended to support the occasional database DDL or system operation that specifically has to be run outside of any kind of transaction block. The PostgreSQL database platform is the most common target for this style of operation, as many of its DDL operations must be run outside of transaction blocks, even though the database overall supports transactional DDL.
Note that there are some caveats:
Warning: As is necessary, the database transaction preceding the block is unconditionally committed. This means that the run of migrations preceding the operation will be committed, before the overall migration operation is complete.
It is recommended that when an application includes migrations with “autocommit” blocks, that EnvironmentContext.transaction_per_migration be used so that the calling environment is tuned to expect short per-file migrations whether or not one of them has an autocommit block.
Context
So I am trying to figure out how to properly override the automatic transaction handling when using SQLite in Python. When I try to run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed up my insertions by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None

try:
    c = s.cursor()
    c.execute("begin")
    # ...
    c.execute("commit")
except:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
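A minimal sketch of the batching the question asks for, assuming a hypothetical table items(value) and an iterable rows of values to insert:

import sqlite3

conn = sqlite3.connect("./data.db")
conn.isolation_level = None  # no implicit BEGIN; we manage transactions ourselves
cur = conn.cursor()

BATCH = 5000
cur.execute("begin")
for i, value in enumerate(rows, start=1):
    cur.execute("insert into items (value) values (?)", (value,))
    if i % BATCH == 0:
        cur.execute("commit")  # close out this batch...
        cur.execute("begin")   # ...and open the next one
cur.execute("commit")
conn.close()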
For the life of me I can't figure out why the module below won't add new rows to my DB. I can add them using the command-line interface. I can also add them by other means (i.e. writing commands to a script file and using os.system('...')), but if I use cursor.execute(), no rows are added (even though the table is created). Here is a minimal script for your viewing pleasure. Note that I get no errors or warnings when I run this script.
#!/usr/bin/env python
import MySQLdb

if __name__ == '__main__':
    db = MySQLdb.connect(host="localhost", user="user", passwd="passwd", db="db")
    cursor = db.cursor()
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS god_i_really_hate_this_stupid_library
        (
            id INT NOT NULL auto_increment,
            username VARCHAR(32) NOT NULL UNIQUE,
            PRIMARY KEY(id)
        ) engine=innodb;
        """
    )
    cursor.execute(
        """
        INSERT INTO god_i_really_hate_this_stupid_library
            ( username )
        VALUES
            ( 'Booberry' );
        """
    )
    cursor.close()
You need to call commit() on your connection; otherwise all the changes made will be rolled back automatically.
From the FAQ of MySQLdb:
Starting with 1.2.0, MySQLdb disables autocommit by default, as required by the DB-API standard (PEP-249). If you are using InnoDB tables or some other type of transactional table type, you'll need to do connection.commit() before closing the connection, or else none of your changes will be written to the database.
Conversely, you can also use connection.rollback() to throw away any changes you've made since the last commit.
Important note: Some SQL statements -- specifically DDL statements like CREATE TABLE -- are non-transactional, so they can't be rolled back, and they cause pending transactions to commit.
You can call db.autocommit(True) to turn autocommit on for the connection or just call db.commit() manually whenever you deem it necessary.
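Applied to the script above, a minimal sketch of either fix:

# Variant 1: commit explicitly before closing.
cursor.close()
db.commit()
db.close()

# Variant 2: turn autocommit on right after connecting,
# so every statement is committed immediately.
db = MySQLdb.connect(host="localhost", user="user", passwd="passwd", db="db")
db.autocommit(True)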
I'm looking for a way to debug queries as they are executed, and I was wondering if there is a way to have MySQLdb print out the actual query that it runs, after it has finished inserting the parameters and all that. From the documentation, it seems as if there is supposed to be a Cursor.info() call that will give information about the last query run, but this does not exist in my version (1.2.2).
This seems like an obvious question, but for all my searching I haven't been able to find the answer.
We found an attribute on the cursor object called cursor._last_executed that holds the last query string, even when an exception occurs. This was easier and better for us in production than using profiling all the time or MySQL query logging, as both of those have a performance impact and involve more code or correlating separate log files, etc.
Hate to answer my own question but this is working better for us.
You can print the last executed query with the cursor attribute _last_executed:
try:
    cursor.execute(sql, (arg1, arg2))
    connection.commit()
except:
    print(cursor._last_executed)
    raise
Currently there is a discussion about how to get this as a real feature in pymysql (see pymysql issue #330: Add mogrify to Cursor, which returns the exact string to be executed; pymysql should be used instead of MySQLdb).
Edit: I haven't tested it yet, but this commit indicates that the following code might work:
cursor.mogrify(sql, (arg1, arg2))
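For reference, pymysql's Cursor.mogrify() returns the exact string that execute() would send, without executing anything. A minimal sketch (hypothetical connection parameters and query):

import pymysql

conn = pymysql.connect(host="localhost", user="user", password="passwd", database="db")
with conn.cursor() as cur:
    # mogrify() interpolates the parameters into the query string.
    print(cur.mogrify("SELECT * FROM blah WHERE foo = %s", (11,)))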
For me, _last_executed doesn't work anymore. In the current version you want to access cursor.statement.
see: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
For mysql.connector:
cursor.statement
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
cursor.statement and cursor._last_executed raised AttributeError exceptions for me; cursor._executed worked!
One way to do it is to turn on profiling:
cursor.execute('set profiling = 1')
try:
    cursor.execute('SELECT * FROM blah where foo = %s', [11])
except Exception:
    cursor.execute('show profiles')
    for row in cursor:
        print(row)
cursor.execute('set profiling = 0')
yields
(1L, 0.000154, 'SELECT * FROM blah where foo = 11')
Notice the argument(s) were inserted into the query, and that the query was logged even though the query failed.
Another way is to start the server with logging turned on:
sudo invoke-rc.d mysql stop
sudo mysqld --log=/tmp/myquery.log
Then you have to sift through /tmp/myquery.log to find out what the server received.
I've generally had luck with cursor._last_executed, but it doesn't work correctly when used with cursor.executemany(): that drops all but the last statement. Here's basically what I use now in that case instead (based on tweaks from the actual MySQLdb cursor source):
def toSqlResolvedList(cursor, sql, dynamicValues):
    sqlList = []
    try:
        db = cursor._get_db()
        if isinstance(sql, unicode):  # Python 2, as in MySQLdb itself
            sql = sql.encode(db.character_set_name())
        # Resolve each parameter set into its own complete statement.
        for values in dynamicValues:
            sqlList.append(sql % db.literal(values))
    except:
        pass
    return sqlList
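For example (hypothetical statement and parameter sets; cursor is an open MySQLdb cursor):

queries = toSqlResolvedList(
    cursor,
    "INSERT INTO users (name) VALUES (%s)",
    [("alice",), ("bob",)],
)
for q in queries:
    print(q)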
This read-only property returns the last executed statement as a string. The statement property can be useful for debugging and displaying what was sent to the MySQL server.
The string can contain multiple statements if a multiple-statement string was executed. This occurs for execute() with multi=True. In this case, the statement property contains the entire statement string and the execute() call returns an iterator that can be used to process results from the individual statements. The statement property for this iterator shows statement strings for the individual statements.
str = cursor.statement
source: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
I can't say I've ever seen Cursor.info() in the documentation, and I can't find it after a few minutes of searching. Maybe you saw some old documentation?
In the meantime you can always turn on MySQL query logging and have a look at the server's log files.
Assume that your SQL is like select * from table1 where 'name' = %s
from _mysql import escape
from MySQLdb.converters import conversions
actual_query = sql % tuple((escape(item, conversions) for item in parameters))