How to version-control functions and triggers with alembic? - python

Suppose there's a trigger in the database with a function, like this:
-- Insert a new entry into another table
-- every time a NEW row is inserted
CREATE FUNCTION trgfunc_write_log() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO some_other_table (
        -- some columns
        meter_id,
        date_taken,
        temperature
    ) VALUES (
        NEW.meter_id,
        NEW.time_taken,
        NEW.temperature
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- The trigger itself: AFTER INSERT
CREATE TRIGGER trg_temperature_readings
AFTER INSERT ON temperature_readings
FOR EACH ROW
EXECUTE FUNCTION trgfunc_write_log();
Typically, this trigger will live next to my SQLAlchemy models and be auto-created with something like this:
from sqlalchemy import DDL, event
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Reading(Base):
    ...

create_trigger = DDL(""" ...SQL... """)
event.listen(Reading.__table__, 'after_create', create_trigger)
What's your best practice for version-controlling such a trigger and its function with Alembic migrations?

I recently had the same question come up in an application and found this article in the Alembic Cookbook.
It outlines a somewhat complex strategy: you create an object that encapsulates the name and the SQL used to create a view, stored procedure, or trigger, along with supporting operation classes that Alembic uses to upgrade and downgrade that schema object. It looks something like this when used in an Alembic revision:
from alembic import op
from my_module import ReplaceableObject

my_trigger = ReplaceableObject(
    "trigger_name",
    """...SQL..."""
)

def upgrade():
    op.create_trigger(my_trigger)

def downgrade():
    op.drop_trigger(my_trigger)
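For context, the my_module part is the Cookbook's ReplaceableObject recipe adapted to triggers. A minimal sketch of what that machinery might look like (the ReplaceableTrigger name, its three fields, and the operation wiring are my assumptions rather than the Cookbook's exact code; the table field is there because PostgreSQL's DROP TRIGGER needs the owning table):
from alembic.operations import Operations, MigrateOperation

class ReplaceableTrigger:
    """Hypothetical trigger flavour of the Cookbook's ReplaceableObject."""
    def __init__(self, name, table, sqltext):
        self.name = name        # e.g. "trg_temperature_readings"
        self.table = table      # e.g. "temperature_readings"
        self.sqltext = sqltext  # e.g. "AFTER INSERT ON ... EXECUTE FUNCTION ..."

@Operations.register_operation("create_trigger", "invoke_for_target")
class CreateTriggerOp(MigrateOperation):
    def __init__(self, target):
        self.target = target

    @classmethod
    def invoke_for_target(cls, operations, target):
        return operations.invoke(cls(target))

    def reverse(self):
        return DropTriggerOp(self.target)

@Operations.register_operation("drop_trigger", "invoke_for_target")
class DropTriggerOp(MigrateOperation):
    def __init__(self, target):
        self.target = target

    @classmethod
    def invoke_for_target(cls, operations, target):
        return operations.invoke(cls(target))

    def reverse(self):
        return CreateTriggerOp(self.target)

@Operations.implementation_for(CreateTriggerOp)
def create_trigger(operations, operation):
    t = operation.target
    operations.execute("CREATE TRIGGER %s %s" % (t.name, t.sqltext))

@Operations.implementation_for(DropTriggerOp)
def drop_trigger(operations, operation):
    t = operation.target
    operations.execute("DROP TRIGGER %s ON %s" % (t.name, t.table))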
My team is currently discussing whether this strategy is too complex for a simple trigger. Views and stored procedures tend to be updated more frequently, which makes the behavior the Cookbook abstractions provide more valuable there than for a simple trigger.
Another proposed option was something like this:
from alembic import op

create_trigger = """...SQL..."""
drop_trigger = """...SQL..."""

def upgrade():
    op.execute(create_trigger)

def downgrade():
    op.execute(drop_trigger)
The two implementations look almost identical, which is the argument for the Cookbook abstraction being unnecessarily complex for a simple trigger.
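Filled in with the trigger from the question, the simpler op.execute() option would look roughly like this (the DROP statements are the obvious counterparts, not something from the original post):
from alembic import op

create_trigger = """
CREATE FUNCTION trgfunc_write_log() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO some_other_table (meter_id, date_taken, temperature)
    VALUES (NEW.meter_id, NEW.time_taken, NEW.temperature);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_temperature_readings
    AFTER INSERT ON temperature_readings
    FOR EACH ROW
    EXECUTE FUNCTION trgfunc_write_log();
"""

drop_trigger = """
DROP TRIGGER trg_temperature_readings ON temperature_readings;
DROP FUNCTION trgfunc_write_log();
"""

def upgrade():
    op.execute(create_trigger)

def downgrade():
    op.execute(drop_trigger)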

This is straightforward using Alembic Utils (pip install alembic_utils).
Create your function in your normal codebase, e.g.
from alembic_utils.pg_function import PGFunction
from alembic_utils.pg_trigger import PGTrigger

trgfunc_write_log = PGFunction(
    schema="public",
    signature="trgfunc_write_log()",
    definition="""
    RETURNS TRIGGER AS $$
    BEGIN
        INSERT INTO some_other_table (
            -- some columns
            meter_id,
            date_taken,
            temperature
        ) VALUES (
            NEW.meter_id,
            NEW.time_taken,
            NEW.temperature
        );
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql
    """)

trg_temperature_readings = PGTrigger(
    schema="public",
    signature="trg_temperature_readings",
    on_entity="public.temperature_readings",
    is_constraint=False,
    definition="""AFTER INSERT ON temperature_readings
    FOR EACH ROW
    EXECUTE FUNCTION trgfunc_write_log()""",
)
The docs show how to modify your alembic.ini and env.py files - one gotcha is that you have to register the entities, e.g. in env.py:
from alembic_utils.replaceable_entity import register_entities
from app.db.function import trg_temperature_readings, trgfunc_write_log
register_entities([trg_temperature_readings, trgfunc_write_log])
Then Alembic's autogenerated migrations should work as normal:
alembic revision --autogenerate -m 'add temperature log trigger'
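For reference, the revision alembic_utils autogenerates looks roughly like this (the variable names and exact layout will differ; op.create_entity and op.drop_entity are the operations the package registers):
from alembic import op
from alembic_utils.pg_function import PGFunction
from alembic_utils.pg_trigger import PGTrigger

public_trgfunc_write_log = PGFunction(
    schema="public",
    signature="trgfunc_write_log()",
    definition="RETURNS TRIGGER AS $$ ... $$ LANGUAGE plpgsql",
)

public_trg_temperature_readings = PGTrigger(
    schema="public",
    signature="trg_temperature_readings",
    on_entity="public.temperature_readings",
    definition="AFTER INSERT ON temperature_readings FOR EACH ROW EXECUTE FUNCTION trgfunc_write_log()",
)

def upgrade():
    op.create_entity(public_trgfunc_write_log)
    op.create_entity(public_trg_temperature_readings)

def downgrade():
    op.drop_entity(public_trg_temperature_readings)
    op.drop_entity(public_trgfunc_write_log)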

Related

Dialect-specific SQLAlchemy declarative Column defaults

Short Version
In SQLAlchemy's ORM column declaration, how can I use server_default=sa.FetchedValue() on one dialect, and default=somePythonFunction on another, so that my real DBMS can populate things with triggers, and my test code can be written against sqlite?
Background
I'm using SQLAlchemy's declarative ORM to work with a Postgres database, but trying to write unit tests against an sqlite:///:memory:, and running into a problem with columns that have computed defaults on their primary keys. For a minimal example:
CREATE TABLE test_table(
id VARCHAR PRIMARY KEY NOT NULL
DEFAULT (lower(hex(randomblob(16))))
)
SQLite itself is quite happy with this table definition (sqlfiddle) but SQLAlchemy seems unable to work out the ID of newly created rows.
class TestTable(Base):
    __tablename__ = 'test_table'
    id = sa.Column(
        sa.VARCHAR,
        primary_key=True,
        server_default=sa.FetchedValue())
Definitions like this work just fine in postgres, but die in sqlite (as you can see on Ideone) with a FlushError when I call Session.commit:
sqlalchemy.orm.exc.FlushError: Instance <TestTable at 0x7fc0e0254a10> has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
The documentation for FetchedValue warns us that this can happen on dialects that don't support the RETURNING clause on INSERT:
For special situations where triggers are used to generate primary key
values, and the database in use does not support the RETURNING clause,
it may be necessary to forego the usage of the trigger and instead
apply the SQL expression or function as a “pre execute” expression:
t = Table('test', meta,
    Column('abc', MyType, default=func.generate_new_value(),
           primary_key=True)
)
func.generate_new_value is not defined anywhere else in SQLAlchemy, so it seems they intend I either generate defaults in Python, or else write a separate function to do a SQL query to generate a default value in the DBMS. I can do that, but the problem is, I only want to do that for SQLite, since FetchedValue does exactly what I want on postgres.
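Concretely, I read that as something like the following for the SQLite case (a sketch that simply mirrors the table's DEFAULT expression so SQLAlchemy can pre-execute it and learn the new key - which of course only helps on SQLite, so the dialect-conditional part remains the open question):
import sqlalchemy as sa

class TestTable(Base):
    __tablename__ = 'test_table'
    # SQL-expression default mirroring the table's DEFAULT; on dialects
    # without RETURNING it is run as a separate SELECT before the INSERT,
    # so the new primary key is known to the ORM.
    id = sa.Column(
        sa.VARCHAR,
        primary_key=True,
        default=sa.func.lower(sa.func.hex(sa.func.randomblob(16))))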
Dead Ends
Subclassing Column probably won't work. Nothing that I can find in the sources ever tells the Column what dialect is being used, and the behavior of the default and server_default fields is defined outside the class.
Writing a python function that calls the triggers by hand on the real DBMS creates a race condition. Avoiding the race condition by changing the isolation level creates a deadlock.
My Current Workaround
Bad because it breaks integration tests that connect to a real postgres.
import sys
import sqlalchemy as sa

def trigger_column(*a, **kw):
    python_default = kw.pop('python_default')
    if 'unittest' in sys.modules:
        return sa.Column(*a, default=python_default, **kw)
    else:
        return sa.Column(*a, server_default=sa.FetchedValue(), **kw)
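A related sketch that keys off the active dialect at flush time instead of sys.modules (uuid4().hex is just a stand-in for whatever the trigger would generate on the real DBMS):
import uuid

from sqlalchemy import event

@event.listens_for(TestTable, 'before_insert')
def _fill_id_on_sqlite(mapper, connection, target):
    # Only step in when the flush targets SQLite; Postgres keeps using
    # FetchedValue() and its trigger.
    if connection.dialect.name == 'sqlite' and target.id is None:
        target.id = uuid.uuid4().hex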
Not a direct answer to your question, but hopefully helpful to someone.
My problem was wanting to change the collation depending on the dialect; this was my solution:
from sqlalchemy import Unicode
from sqlalchemy.ext.compiler import compiles

@compiles(Unicode, 'sqlite')
def compile_unicode(element, compiler, **kw):
    element.collation = None
    return compiler.visit_unicode(element, **kw)
This changes the collation for all Unicode columns only for sqlite.
Here's some documentation: http://docs.sqlalchemy.org/en/latest/core/custom_types.html#overriding-type-compilation

alembic generation of materialized view

TL;DR: How do I get alembic to understand and generate SQL for materialized views created in sqlalchemy?
I'm using flask-sqlalchemy and also using alembic with postgres. To get a materialized view working with sqlalchemy, I followed a nice post on the topic. I used it heavily, with just a few minor divergences (the article uses flask-sqlalchemy as well, but its complete code example uses sqlalchemy's declarative base directly instead).
class ActivityView(db.Model):
    __table__ = create_materialized_view(
        'activity_view',
        db.select([
            Activity.id.label('id'),
            Activity.name.label('name'),
            Activity.start_date.label('start_date'),
        ]).where(
            db.and_(
                Activity.start_date != None,
                Activity.start_date <=
                    datetime_to_str(datetime.now(tz=pytz.UTC) + timedelta(hours=48))
            )
        )
    )

    @classmethod
    def refresh(cls, concurrently=True):
        refresh_materialized_view(cls.__table__.fullname, concurrently)

db.Index('activity_view_index',
         ActivityView.__table__.c.id, ActivityView.__table__.c.start_date,
         unique=True)
The create_materialized_view and refresh_materialized_view methods are taken straight from the blog post.
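For readers without the blog post handy, those helpers follow a widely used pattern, roughly like this (a sketch rather than the article's exact code; db is the application's flask_sqlalchemy handle, and the wiring details are assumptions):
import sqlalchemy as sa
from sqlalchemy.ext import compiler
from sqlalchemy.schema import DDLElement

class CreateMaterializedView(DDLElement):
    def __init__(self, name, selectable):
        self.name = name
        self.selectable = selectable

@compiler.compiles(CreateMaterializedView)
def _create_materialized_view(element, ddl_compiler, **kw):
    return "CREATE MATERIALIZED VIEW %s AS %s" % (
        element.name,
        ddl_compiler.sql_compiler.process(element.selectable, literal_binds=True),
    )

def create_materialized_view(name, selectable, metadata=None):
    metadata = metadata if metadata is not None else db.metadata
    # A Table shell so the ORM (and db.Index) can reference the view's columns.
    t = sa.Table(name, metadata)
    for c in selectable.c:
        t.append_column(sa.Column(c.name, c.type, primary_key=c.primary_key))
    sa.event.listen(metadata, 'after_create', CreateMaterializedView(name, selectable))
    sa.event.listen(metadata, 'before_drop',
                    sa.DDL('DROP MATERIALIZED VIEW IF EXISTS %s' % name))
    return t

def refresh_materialized_view(name, concurrently=True):
    db.session.execute('REFRESH MATERIALIZED VIEW %s%s' %
                       ('CONCURRENTLY ' if concurrently else '', name))
    db.session.commit()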
Note that the example above has been greatly simplified (and probably looks silly as a result), but the real question is: how do I get alembic to translate this view into a series of alembic operations during a migration?
When I run tests, the code runs fine, the view gets generated fine and everything works. When alembic runs it doesn't do anything with the view. So what I end up doing is copying the SQL that the tests emit for the materialized view into the alembic migrations/versions file and just end up executing that directly as:
op.execute(activities_view_sql)
Similarly, I do the same direct SQL execution when generating the unique index on the materialized view.
Unfortunately my approach is error prone and creates seemingly unnecessary code duplication.
Is there a way to get alembic to understand my ActivityView so that any time it changes, alembic will know how to update the view?
Thanks much!
TLDR: Just write the view migration manually. It doesn't seem there is reasonable support for view autogeneration.
EDIT: It's possible there is a way to autogenerate view migrations now. See answer https://stackoverflow.com/a/72829474/2839862
I think the easiest way around this is to not rely on Alembic to autogenerate the view for you. Instead, you can instruct it to ignore views like this, in your Alembic env.py:
def include_object(obj, name, type_, reflected, compare_to):
    if obj.info.get("is_view", False):
        return False
    return True

...

def run_migrations_offline():
    ...
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True, include_object=include_object)
    ...

def run_migrations_online():
    ...
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata, include_object=include_object)
The is_view flag is set by my custom View base class:
class View(Model):
    @classmethod
    def _init_table(cls, sub_cls):
        table: sa.Table = Model._init_table(sub_cls)
        if table is None:
            return table
        table.info["is_view"] = True
        return table
When automatic generation ignores the view, you can then add the appropriate commands to your migration manually:
import sqlalchemy as sa
from alembic import op
from sqlalchemy import select, table

activities = table(
    "activities",
    sa.Column("id", sa.Integer()),
    ...
)

view_query = (
    select(
        [
            activities.c.id,
        ]
    )
    .select_from(activities)
)

def upgrade():
    view_query_string = str(view_query.compile(compile_kwargs={"literal_binds": True}))
    op.execute(f"CREATE VIEW activity_view AS {view_query_string}")

def downgrade():
    op.execute("DROP VIEW activity_view")
Two important points:
- Code duplication is not always a bad thing - you can think of migrations as more of a version control tool than regular code. Your version history should not depend on the current state of the codebase.
- Manually-written migrations are arguably more error prone than generated ones, but you can partially alleviate this by running your migrations in tests for production applications. Also, just inspecting the resulting DB schema should help.
Although the question doesn't specifically call out using PostgreSQL, the post it is based on targeted materialized views in PostgreSQL, so this answer also targets an add-on package called alembic_utils, which is based on Alembic's ReplaceableObjects and adds autogeneration support for a larger number of PostgreSQL entity types, including functions, views, materialized views, triggers, and policies.
To set up, you create your materialized view in the following way:
from alembic_utils.pg_materialized_view import PGMaterializedView

actview = PGMaterializedView(
    schema="public",
    signature="activity_view",
    definition="select ...",
    with_data=True
)
You could base the definition on static SQL or on a compiled version of SQLAlchemy code, as sketched below.
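For example, one way to produce that definition from a SQLAlchemy select (select_stmt here is a placeholder for whatever query backs the view):
from alembic_utils.pg_materialized_view import PGMaterializedView
from sqlalchemy.dialects import postgresql

# Render the select as literal PostgreSQL text.
definition = str(
    select_stmt.compile(
        dialect=postgresql.dialect(),
        compile_kwargs={"literal_binds": True},
    )
)

actview = PGMaterializedView(
    schema="public",
    signature="activity_view",
    definition=definition,
    with_data=True,
)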
Then in your alembic env.py:
from foo import actview
from alembic_utils.replaceable_entity import register_entities
register_entities([actview])
Alembic will now autogenerate migrations when the materialized view is updated in code.

Add trigger to SQLAlchemy Base Class

I'm making a SQLAlchemy base class for a new Postgres database and want to have bookkeeping fields incorporated into it. Specifically, I want to have two columns for modified_at and modified_by that are updated automatically. I was able to find out how to do this for individual tables, but it seems like making this part of the base class is trickier.
My first thought was to try and leverage the declared_attr functionality, but I don't actually want to make the triggers an attribute in the model so that seems incorrect. Then I looked at adding the trigger using event.listen:
modified_trigger = """
CREATE TRIGGER update_{table_name}_modified
BEFORE UPDATE ON {table_name}
FOR EACH ROW EXECUTE PROCEDURE update_modified_columns()
"""

def create_modified_trigger(target, connection, **kwargs):
    if hasattr(target, 'name'):
        connection.execute(modified_trigger.format(table_name=target.name))

Base = declarative_base()
event.listen(Base.metadata, 'after_create', create_modified_trigger)
I thought I could find table_name using the target parameter as shown in the docs but when used with Base.metadata it returns MetaData(bind=None) rather than a table.
I would strongly prefer to have this functionality as part of the Base rather than including it in migrations or externally to reduce the chance of someone forgetting to add the triggers. Is this possible?
I was able to sort this out with the help of a coworker. The returned MetaData object did in fact have a list of tables. Here is the working code:
modified_trigger = """
CREATE TRIGGER update_{table_name}_modified
BEFORE UPDATE ON {table_name}
FOR EACH ROW EXECUTE PROCEDURE update_modified_columns()
"""

def create_modified_trigger(target, connection, **kwargs):
    """
    This is used to add bookkeeping triggers after a table is created. It hooks
    into the SQLAlchemy event system. It expects the target to be an instance of
    MetaData.
    """
    for key in target.tables:
        table = target.tables[key]
        connection.execute(modified_trigger.format(table_name=table.name))

Base = declarative_base()
event.listen(Base.metadata, 'after_create', create_modified_trigger)
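One thing this snippet assumes is that a plpgsql function named update_modified_columns() already exists in the database. A minimal sketch of such a function, hooked in the same way (it reuses Base and event from above; the modified_at/modified_by assignments are assumptions based on the columns described in the question):
create_modified_function = """
CREATE OR REPLACE FUNCTION update_modified_columns() RETURNS TRIGGER AS $$
BEGIN
    NEW.modified_at = now();
    NEW.modified_by = current_user;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql
"""

def create_modified_function_ddl(target, connection, **kwargs):
    connection.execute(create_modified_function)

# Creating the function does not depend on any table, so hooking it to
# 'before_create' guarantees it exists before the per-table triggers run.
event.listen(Base.metadata, 'before_create', create_modified_function_ddl)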

Modify data as part of an alembic upgrade

I would like to modify some database data as part of an alembic upgrade.
I thought I could just add any code in the upgrade of my migration, but the following fails:
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(), sa.ForeignKey('smsmessagepart.id'), nullable=True))
    ### end Alembic commands ###
    from volunteer.models import DBSession, SmsDelivery, SmsMessagePart
    for sms_delivery in DBSession.query(SmsDelivery).all():
        message_part = DBSession.query(SmsMessagePart).filter(SmsMessagePart.message_id == sms_delivery.message_id).first()
        if message_part is not None:
            sms_delivery.sms_message_part = message_part
with the following error:
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|SmsDelivery|smsdelivery, SQL expression or this Session
I am not really understanding this error. How can I fix this or is doing operations like this not a possibility?
It is difficult to understand exactly what you are trying to achieve from the code excerpt you provided, so the following answer is based on my best guess.
Line 4 - you import things (DBSession, SmsDelivery, SmsMessagePart) from your modules and then try to operate with these objects like you do in your application.
The error shows that SmsDelivery is a mapper object - so it is pointing to some table. Mapper objects should be bound to a valid sqlalchemy connection.
That tells me you skipped the initialization of DB objects (creating a connection and binding it to the mapper objects) that you normally do in your application code.
DBSession looks like a SQLAlchemy session object - it should have a connection bind too.
Alembic already has a connection ready and open - it uses it for the schema changes you request with the op.* methods.
So there should be a way to get at this connection.
According to the Alembic manual, op.get_bind() will return the current Connection bind:
For full interaction with a connected database, use the “bind” available from the context:
from alembic import op
connection = op.get_bind()
So you can use this connection to run your queries against the db.
PS. I assume you want to perform some modification to the data in your table. You may try to formulate that modification as a single UPDATE query (see the sketch after the excerpt below). Alembic has a special method for executing such changes - so you would not need to deal with the connection at all.
alembic.operations.Operations.execute
execute(sql, execution_options=None)
Execute the given SQL using the current migration context.
In a SQL script context, the statement is emitted directly to the output stream. There is no return result, however, as this function is oriented towards generating a change script that can run in “offline” mode.
Parameters: sql - Any legal SQLAlchemy expression, including:
- a string
- a sqlalchemy.sql.expression.text() construct
- a sqlalchemy.sql.expression.insert() construct
- a sqlalchemy.sql.expression.update(), sqlalchemy.sql.expression.insert(), or sqlalchemy.sql.expression.delete() construct
- Pretty much anything that’s “executable” as described in SQL Expression Language Tutorial.
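Sketching the single-UPDATE approach suggested above, using the tables from the question (a rough sketch - the LIMIT 1 tie-break mirrors the .first() in the original loop):
from alembic import op

def upgrade():
    # Backfill smsdelivery.sms_message_part_id from smsmessagepart
    # in one statement instead of looping over ORM objects.
    op.execute("""
        UPDATE smsdelivery
        SET sms_message_part_id = (
            SELECT smsmessagepart.id
            FROM smsmessagepart
            WHERE smsmessagepart.message_id = smsdelivery.message_id
            LIMIT 1
        )
    """)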
It's worth noting that if you do this, you probably want to freeze a copy of your ORM model inside the migration, like this:
class MyType(Base):
    __tablename__ = 'existing_table'
    __table_args__ = {'extend_existing': True}
    id = Column(Integer, ...)
    ...

def upgrade():
    Base.metadata.bind = op.get_bind()
    for item in Session.query(MyType).all():
        ...
Otherwise you'll inevitably end up in a situation where your ORM model changes and previous migrations no longer work.
In particular, note that you want to extend Base, not the base type itself (app.models.MyType), because your type might go away at some point, and once again, your migrations will fail.
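If you go this route, one way to get a working session inside the migration is to bind it to the migration connection (a sketch; what you do with each row is up to you):
from alembic import op
from sqlalchemy.orm import Session

def upgrade():
    bind = op.get_bind()
    session = Session(bind=bind)
    for item in session.query(MyType):
        # ... mutate item as needed ...
        pass
    session.commit()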
You need to import Base as well, and then
Base.metadata.bind = op.get_bind()
and after this you can use your models as always, without errors.

How to TRUNCATE TABLE using Django's ORM?

To empty a database table, I use this SQL Query:
TRUNCATE TABLE `books`
How do I truncate a table using Django's models and ORM?
I've tried this, but it doesn't work:
Book.objects.truncate()
The closest you'll get with the ORM is Book.objects.all().delete().
There are differences though: truncate will likely be faster, but the ORM will also chase down foreign key references and delete objects in other tables.
You can do this in a fast and lightweight way, but not using Django's ORM. You may execute raw SQL with a Django connection cursor:
from django.db import connection
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE `books`")
You can use the model's _meta property to fill in the database table name:
from django.db import connection
cursor = connection.cursor()
cursor.execute('TRUNCATE TABLE "{0}"'.format(MyModel._meta.db_table))
Important: This does not work for inherited models as they span multiple tables!
In addition to Ned Batchelder's answer and referring to Bernhard Kircher's comment:
In my case I needed to empty a very large database using the webapp:
Book.objects.all().delete()
Which, in the development SQLite environment, returned:
too many SQL variables
So I added a little workaround. It may not be the neatest, but at least it works until a truncate table option is built into Django's ORM:
countdata = Book.objects.all().count()
logger.debug("Before deleting: %s data records" % countdata)
while countdata > 0:
    if countdata > 999:
        objects_to_keep = Book.objects.all()[999:]
        Book.objects.all().exclude(pk__in=objects_to_keep).delete()
        countdata = Book.objects.all().count()
    else:
        Book.objects.all().delete()
        countdata = Book.objects.all().count()
By the way, some of my code was based on "Django Delete all but last five of queryset".
I added this while being aware the answer was already answered, but hopefully this addition will help some other people.
I know this is a very old question and there are already a few correct answers here, but I can't resist sharing the most elegant and fastest way to serve the purpose of this question.
from django.db import connection, models

class Book(models.Model):
    # Your Model Declaration

    @classmethod
    def truncate(cls):
        with connection.cursor() as cursor:
            cursor.execute('TRUNCATE TABLE {} CASCADE'.format(cls._meta.db_table))
And now, to truncate all data from the Book table, just call:
Book.truncate()
Since this interacts directly with your database, it will perform much faster than doing this:
Book.objects.all().delete()
Now there's a library to help you truncate a specific table in your Django project's database. It's called django-truncate.
It's simple: just run python manage.py truncate --apps myapp --models Model1 and all of the data in that table will be deleted!
Learn more about it here: https://github.com/KhaledElAnsari/django-truncate
For me, to truncate my local SQLite database, I ended up using python manage.py flush.
What I initially tried was to iterate over the models and delete all rows one by one:
models = [m for c in apps.get_app_configs() for m in c.get_models(include_auto_created=False)]
for m in models:
    m.objects.all().delete()
But because I have protected foreign keys, the success of the operation depended on the order of the models.
So I am using the flush command to truncate my local test database, and it is working for me:
https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-flush
This code uses the PostgreSQL dialect. Leave out the cascade bits to use standard SQL.
Following up on Shubho Shaha's answer, you could also create a model manager for this.
from django.db import connection, models

class TruncateManager(models.Manager):
    def truncate(self, cascade=False):
        appendix = " CASCADE;" if cascade else ";"
        raw_sql = f"TRUNCATE TABLE {self.model._meta.db_table}{appendix}"
        cursor = connection.cursor()
        cursor.execute(raw_sql)

class Truncatable(models.Model):
    class Meta:
        abstract = True

    objects = TruncateManager()
Then, you can extend the Truncatable to create truncatable objects:
class Book(Truncatable):
...
That will allow you to call truncate on all models that extend from Truncatable.
Book.objects.truncate()
I added a flag to use cascade as well, which (danger zone) will also: "Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.", which is obviously more destructive, but will allow the code to run inside an atomic transaction.
This doesn't directly answer the OP's question, but it is nevertheless a solution one might use to achieve the same thing - differently.
Well, for some strange reason (while attempting to use the suggested raw SQL methods in the other answers here), I failed to truncate my Django database cache table until I did something like this:
import subprocess

cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
subprocess.getstatusoutput(' '.join(cmd))
Basically, I had to resort to issuing the truncate command via the database's utility commands - psql in this case, since I am using Postgres. So automating the command line might handle such corner cases.
Might save someone else some time...
