Modify data as part of an alembic upgrade - python

I would like to modify some database data as part of an alembic upgrade.
I thought I could just add any code in the upgrade of my migration, but the following fails:
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(), sa.ForeignKey('smsmessagepart.id'), nullable=True))
    ### end Alembic commands ###
    from volunteer.models import DBSession, SmsDelivery, SmsMessagePart
    for sms_delivery in DBSession.query(SmsDelivery).all():
        message_part = DBSession.query(SmsMessagePart).filter(SmsMessagePart.message_id == sms_delivery.message_id).first()
        if message_part is not None:
            sms_delivery.sms_message_part = message_part
with the following error:
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|SmsDelivery|smsdelivery, SQL expression or this Session
I am not really understanding this error. How can I fix this or is doing operations like this not a possibility?

It is difficult to understand exactly what you are trying to achieve from the code excerpt you provided, but I'll try to guess, so the following answer is based on that guess.
In the import line you import things (DBSession, SmsDelivery, SmsMessagePart) from your modules and then try to operate on these objects the way you do in your application.
The error shows that SmsDelivery is a mapper object - it points to some table. Mapper objects must be bound to a valid SQLAlchemy connection.
This tells me that you skipped the initialization of the DB objects (creating a connection and binding it to the mappers) that you normally do in your application code.
DBSession looks like a SQLAlchemy session object - it needs a connection bind too.
Alembic already has a connection ready and open - it uses it for the schema changes you request with the op.* methods.
So there should be a way to get at this connection.
According to the Alembic manual, op.get_bind() will return the current Connection bind:
For full interaction with a connected database, use the “bind” available from the context:
from alembic import op
connection = op.get_bind()
So you can use this connection to run your queries against the db.
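For example, here is a minimal sketch of the data migration from the question, written against op.get_bind() and SQLAlchemy Core (1.4-style select(); the table and column names are taken from the question, so treat them as assumptions):
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(),
                                           sa.ForeignKey('smsmessagepart.id'), nullable=True))

    connection = op.get_bind()

    # Lightweight table definitions used only inside this migration.
    smsdelivery = sa.table('smsdelivery',
                           sa.column('id', sa.Integer),
                           sa.column('message_id', sa.Integer),
                           sa.column('sms_message_part_id', sa.Integer))
    smsmessagepart = sa.table('smsmessagepart',
                              sa.column('id', sa.Integer),
                              sa.column('message_id', sa.Integer))

    for delivery in connection.execute(sa.select(smsdelivery)):
        part = connection.execute(
            sa.select(smsmessagepart.c.id)
            .where(smsmessagepart.c.message_id == delivery.message_id)
        ).first()
        if part is not None:
            connection.execute(
                sa.update(smsdelivery)
                .where(smsdelivery.c.id == delivery.id)
                .values(sms_message_part_id=part.id)
            )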
PS. I would assume you wanted to perform some modifications to data in your table. You may try to formulate this modification as one update query. Alembic has a special method for executing such changes - so you would not need to deal with the connection.
alembic.operations.Operations.execute
execute(sql, execution_options=None)
Execute the given SQL using the current migration context.
In a SQL script context, the statement is emitted directly to the output stream. There is no return result, however, as this function is oriented towards generating a change script that can run in “offline” mode.
Parameters: sql – Any legal SQLAlchemy expression, including:
a string
a sqlalchemy.sql.expression.text() construct
a sqlalchemy.sql.expression.insert() construct
a sqlalchemy.sql.expression.update(), sqlalchemy.sql.expression.insert(), or sqlalchemy.sql.expression.delete() construct
Pretty much anything that's "executable" as described in the SQL Expression Language Tutorial.
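As an illustration, here is a hedged sketch of expressing the question's data change as a single correlated UPDATE handed to op.execute() (again assuming SQLAlchemy 1.4-style constructs and the table/column names from the question):
import sqlalchemy as sa
from alembic import op

smsdelivery = sa.table('smsdelivery',
                       sa.column('message_id', sa.Integer),
                       sa.column('sms_message_part_id', sa.Integer))
smsmessagepart = sa.table('smsmessagepart',
                          sa.column('id', sa.Integer),
                          sa.column('message_id', sa.Integer))

def upgrade():
    # Fill the new FK column from smsmessagepart in one UPDATE; rows with
    # no matching part simply keep NULL, which the nullable column allows.
    op.execute(
        sa.update(smsdelivery).values(
            sms_message_part_id=(
                sa.select(smsmessagepart.c.id)
                .where(smsmessagepart.c.message_id == smsdelivery.c.message_id)
                .limit(1)
                .scalar_subquery()
            )
        )
    )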

It's worth noting that if you do this, you probably want to freeze a copy of your ORM model inside the migration, like this:
class MyType(Base):
    __tablename__ = 'existing_table'
    __table_args__ = {'extend_existing': True}
    id = Column(Integer, ...)
    ..

def upgrade():
    Base.metadata.bind = op.get_bind()
    for item in Session.query(MyType).all():
        ...
Otherwise you'll inevitably end up in a situation where your ORM model changes and previous migrations no longer work.
Particularly note that you want to extend Base, not the base type itself (app.models.MyType), because your type might go away at some point, and once again, your migrations will fail.

You need to import Base as well, and then
Base.metadata.bind = op.get_bind()
after which you can use your models as usual, without errors.

Related

Unit testing a function that depends on database

I am running tests on some functions. I have a function that uses database queries. So, I have gone through the blogs and docs that say we have to make an in memory or test database to use such functions. Below is my function,
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
    # exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. My question is: when testing, I create an in-memory database and populate my values there, so I want to query that database (the test DB). But when I call my already_exists() function from a test, the production DB is hit. How do I make the test DB get hit while testing this function?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of the actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production. Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language like Alembic, or tools to execute raw SQL like yoyo or Flyway. You would need to invoke the deployment and fill it with test data prior to running the test, then run the test and assert the output you expect to be returned.
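Purely as an illustration of that arrange/act/assert flow, here is a minimal sketch using an in-memory SQLite connection (remember the caveat above: this does not prove the query works on your real database engine; the posts schema, the Story class, and the import path are assumptions based on the question):
import sqlite3

from myapp.dedupe import already_exists  # hypothetical import path for the function above

class Story:
    def __init__(self, post_id):
        self.post_id = post_id

def test_already_exists_finds_existing_post():
    # Arrange: build a throwaway schema and seed it with one matching row.
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE posts (post_id INTEGER)")
    c.execute("INSERT INTO posts (post_id) VALUES (?)", (10,))

    # Act / Assert: the function only sees the cursor we hand it.
    assert already_exists(Story(post_id=10), c) is True
    assert already_exists(Story(post_id=99), c) is False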
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand-ins that can be configured to accept the function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock
...
def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.
    result = already_exists(story, mockConn)
    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make connection a method on each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists method.
q = session.query(Post).filter(Post.post_id == story_data.post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.

alembic generation of materialized view

TL;DR: How do I get alembic to understand and generate SQL for materialized views created in sqlalchemy?
I'm using flask-sqlalchemy and also using alembic with postgres. To get a materialized view working with sqlalchemy, I followed a nice post on the topic. I used it heavily, with just a few minor divergences (the article uses flask-sqlalchemy as well, however the complete code example uses sqlalchemy's declarative base directly instead).
class ActivityView(db.Model):
    __table__ = create_materialized_view(
        'activity_view',
        db.select([
            Activity.id.label('id'),
            Activity.name.label('name'),
            Activity.start_date.label('start_date'),
        ]).where(
            db.and_(
                Activity.start_date != None,
                Activity.start_date <=
                datetime_to_str(datetime.now(tz=pytz.UTC) + timedelta(hours=48))
            )
        )
    )

    @classmethod
    def refresh(cls, concurrently=True):
        refresh_materialized_view(cls.__table__.fullname, concurrently)

db.Index('activity_view_index',
         ActivityView.__table__.c.id, ActivityView.__table__.c.start_date,
         unique=True)
The create_materialized_view and refresh_materialized_view methods are taken straight from the blog post.
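For readers without the post at hand, here is a rough reconstruction of what such helpers typically look like. This is a sketch, not the post's exact code: db is assumed to be the Flask-SQLAlchemy instance from the question, and selectable.selected_columns assumes SQLAlchemy 1.4+ (older versions expose the same columns via .c).
import sqlalchemy as sa
from sqlalchemy.ext import compiler
from sqlalchemy.schema import DDLElement

class CreateMaterializedView(DDLElement):
    def __init__(self, name, selectable):
        self.name = name
        self.selectable = selectable

@compiler.compiles(CreateMaterializedView)
def _compile_create_materialized_view(element, ddl_compiler, **kw):
    # Render the SELECT with literal values so the emitted DDL is self-contained.
    return "CREATE MATERIALIZED VIEW {} AS {}".format(
        element.name,
        ddl_compiler.sql_compiler.process(element.selectable, literal_binds=True),
    )

def create_materialized_view(name, selectable, metadata=db.metadata):
    # Build the Table against a throwaway MetaData so create_all() does not
    # also emit a plain CREATE TABLE for the view.
    t = sa.Table(name, sa.MetaData())
    for c in selectable.selected_columns:
        t.append_column(sa.Column(c.name, c.type, primary_key=c.primary_key))
    # Emit the view DDL whenever the application's metadata is created/dropped.
    sa.event.listen(metadata, "after_create", CreateMaterializedView(name, selectable))
    sa.event.listen(metadata, "before_drop",
                    sa.DDL("DROP MATERIALIZED VIEW IF EXISTS {}".format(name)))
    return t

def refresh_materialized_view(name, concurrently=True):
    flag = "CONCURRENTLY " if concurrently else ""
    db.session.execute(sa.text("REFRESH MATERIALIZED VIEW {}{}".format(flag, name)))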
Note that the example above has been greatly simplified and probably seems silly because of my simplifications, but the real idea I want to get at is how do I get alembic to translate this view to a series of alembic operations during a migration?
When I run tests, the code runs fine, the view gets generated fine and everything works. When alembic runs it doesn't do anything with the view. So what I end up doing is copying the SQL that the tests emit for the materialized view into the alembic migrations/versions file and just end up executing that directly as:
op.execute(activities_view_sql)
Similarly, I do the same direct SQL execution when generating the unique index on the materialized view.
Unfortunately my approach is error prone and creates seemingly unnecessary code duplication.
Is there a way to get alembic to understand my ActivityView so that any time it changes, alembic will know how to update the view?
Thanks much!
TLDR: Just write the view migration manually. It doesn't seem there is reasonable support for view autogeneration.
EDIT: It's possible there is a way to autogenerate view migrations now. See answer https://stackoverflow.com/a/72829474/2839862
I think the easiest way around this is to not rely on Alembic to autogenerate the view for you. Instead, you can instruct it to ignore views like this, in your Alembic env.py:
def include_object(obj, name, type_, reflected, compare_to):
    if obj.info.get("is_view", False):
        return False
    return True

...

def run_migrations_offline():
    ...
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True, include_object=include_object)
    ...

def run_migrations_online():
    ...
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata, include_object=include_object)
The is_view flag is set by my custom View base class:
class View(Model):
    @classmethod
    def _init_table(cls, sub_cls):
        table: sa.Table = Model._init_table(sub_cls)
        if table is None:
            return table
        table.info["is_view"] = True
        return table
When automatic generation ignores the view, you can then add the appropriate commands to your migration manually:
activities = table(
    "activities",
    sa.Column("id", sa.Integer()),
    ...
)

view_query = (
    select(
        [
            activities.c.id,
        ]
    )
    .select_from(activities)
)

def upgrade():
    view_query_string = str(view_query.compile(compile_kwargs={"literal_binds": True}))
    op.execute(f"CREATE VIEW activity_view AS {view_query_string}")

def downgrade():
    op.execute("DROP VIEW activity_view")
Two important points:
code duplication is not always a bad thing - you can think of migrations as more of a version-control tool than regular code. Your version history should not depend on the current state of the codebase.
manually-written migrations are arguably more error-prone than generated ones, but you can partially alleviate this by running your migrations in tests for production applications (a sketch follows below). Also, just inspecting the resulting DB schema should help.
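As a rough sketch of that second point, assuming a standard alembic.ini at the project root, a test can simply apply every migration and walk it back down. SQLite is used here only for brevity; for views and other PostgreSQL-specific DDL you would point it at a disposable PostgreSQL database instead.
from alembic import command
from alembic.config import Config

def test_migrations_apply_cleanly(tmp_path):
    cfg = Config("alembic.ini")
    # Point Alembic at a throwaway database instead of the real one.
    cfg.set_main_option("sqlalchemy.url", f"sqlite:///{tmp_path / 'test.db'}")
    command.upgrade(cfg, "head")    # run every upgrade() in order
    command.downgrade(cfg, "base")  # and verify the downgrades as well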
Although the question doesn't specifically call out using PostgreSQL, the post that it is based on targeted materialized views in PostgreSQL, so this answer also targets an add-on package called alembic_utils, which is based on alembic ReplaceableObjects and adds support for autogenerating a larger number of PostgreSQL entity types, including functions, views, materialized views, triggers, and policies.
To set up, you create your materialized view in the following way:
from alembic_utils.pg_materialized_view import PGMaterializedView

actview = PGMaterializedView(
    schema="public",
    signature="activity_view",
    definition="select ...",
    with_data=True,
)
You could base the definition on static SQL or on a compiled version of SQLAlchemy code.
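For example, a small sketch of compiling a SQLAlchemy select into the definition string (view_query here is assumed to be a Core select like the one shown earlier):
from alembic_utils.pg_materialized_view import PGMaterializedView

# Render the select with literal values so it can be stored as plain SQL.
definition = str(view_query.compile(compile_kwargs={"literal_binds": True}))

actview = PGMaterializedView(
    schema="public",
    signature="activity_view",
    definition=definition,
    with_data=True,
)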
Then in your alembic env.py:
from foo import actview
from alembic_utils.replaceable_entity import register_entities
register_entities([actview])
Alembic will now autogenerate migrations when the materialized view is updated in code.

Can't delete row from SQLAlchemy due to wrong session

I am trying to delete an entry from my table. This is my code for the delete function.
@app.route("/delete_link/<link_id>", methods=['GET', 'POST'])
def delete_link(link_id):
    link = models.Link.query.filter(models.Link.l_id == link_id).first()
    db.session.delete(link)
    db.session.commit()
    return flask.redirect(flask.url_for('links'))
the line: db.session.delete(link) returns me this error:
InvalidRequestError: Object '' is already attached to session '1' (this is '2')
I've tried this code as well:
@app.route("/delete_link/<link_id>", methods=['GET', 'POST'])
def delete_link(link_id):
    link = models.Link.query.filter(models.Link.l_id == link_id)
    link.delete()
    db.session.commit()
    return flask.redirect(flask.url_for('links'))
which does not update the database. Link must not be in the session I guess, but I don't know how to check that, and how to fix it.
I am new to sqlalchemy.
EDIT:
I use this to create my db variable, which probably creates the session at this stage (this is at the top of the code). It comes from the Flask documentation:
from yourapplication import db
You are creating 2 instances of the db object, inherently creating 2 different sessions.
In models.py:
...
from config import app

db = SQLAlchemy(app)
In erika.py:
...
from config import app
...
db = SQLAlchemy(app)
then when you try to delete the element:
link = models.Link.query.filter(models.Link.l_id == link_id).first()
db.session.delete(link)
db.session.commit()
the following happens:
models.Link.query uses the database session created by models.py to get the record.
db.session.delete uses the session created by erika.py.
link is attached to the models.py session and you can't use another session (erika.py's) to delete it. Hence:
InvalidRequestError: Object '' is already attached to session '1' (this is '2')
Solution
The solution is simple: have only one instance of the db object at any time and reuse that instance whenever you need db operations.
erika.py
from models import db
This way you are always using the same session that was used to fetch your records.
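A minimal sketch of that single-instance layout (the module names follow the question; only models.py constructs the db object, everything else imports it):
# models.py - the only place that constructs the db object
from flask_sqlalchemy import SQLAlchemy
from config import app

db = SQLAlchemy(app)

class Link(db.Model):
    l_id = db.Column(db.Integer, primary_key=True)

# erika.py - reuse the same db (and therefore the same session)
from models import db, Link

def delete_link(link_id):
    link = Link.query.filter(Link.l_id == link_id).first()
    db.session.delete(link)   # same session that loaded the object
    db.session.commit()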
It appears to be a similar problem to the one described at http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xvi-debugging-testing-and-profiling
It's a good in-depth description of the problem and how he solved it. The author of that article made a fix that's available as a fork.
The Fix
To address this problem we need to find an alternative way of attaching Flask-WhooshAlchemy's query object to the model.
The documentation for Flask-SQLAlchemy mentions there is a model.query_class attribute that contains the class to use for queries. This is actually a much cleaner way to make Flask-SQLAlchemy use a custom query class than what Flask-WhooshAlchemy is doing. If we configure Flask-SQLAlchemy to create queries using the Whoosh enabled query class (which is already a subclass of Flask-SQLAlchemy's BaseQuery), then we should have the same result as before, but without the bug.
I have created a fork of the Flask-WhooshAlchemy project on github where I have implemented these changes. If you want to see the changes you can see the github diff for my commit, or you can also download the fixed extension and install it in place of your original flask_whooshalchemy.py file.

Why does this SQLAlchemy example commit changes to the DB?

This example illustrates a mystery I encountered in an application I am building. The application needs to support an option allowing the user to exercise the code without actually committing changes to the DB. However, when I added this option, I discovered that changes were persisted to the DB even when I did not call the commit() method.
My specific question can be found in the code comments. The underlying goal is to have a clearer understanding of when and why SQLAlchemy will commit to the DB.
My broader question is whether my application should (a) use a global Session instance, or (b) use a global Session class, from which particular instances would be instantiated. Based on this example, I'm starting to think that the correct answer is (b). Is that right? Edit: this SQLAlchemy documentation suggests that (b) is recommended.
import sys
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key = True)
    name = Column(String)
    age = Column(Integer)
    def __init__(self, name, age = 0):
        self.name = name
        self.age = 0
    def __repr__(self):
        return "<User(name='{0}', age={1})>".format(self.name, self.age)

engine = create_engine('sqlite://', echo = False)
Base.metadata.create_all(engine)
Session = sessionmaker()
Session.configure(bind=engine)

global_session = Session()  # A global Session instance.
commit_ages = False         # Whether to commit in modify_ages().
use_global = True           # If True, modify_ages() will commit, regardless
                            # of the value of commit_ages. Why?

def get_session():
    return global_session if use_global else Session()

def add_users(names):
    s = get_session()
    s.add_all(User(nm) for nm in names)
    s.commit()

def list_users():
    s = get_session()
    for u in s.query(User): print ' ', u

def modify_ages():
    s = get_session()
    n = 0
    for u in s.query(User):
        n += 10
        u.age = n
    if commit_ages: s.commit()

add_users(('A', 'B', 'C'))
print '\nBefore:'
list_users()
modify_ages()
print '\nAfter:'
list_users()
tl;dr - The updates are not actually committed to the database-- they are part of an uncommitted transaction in progress.
I made 2 separate changes to your call to create_engine(). (Other than this one line, I'm using your code exactly as posted.)
The first was
engine = create_engine('sqlite://', echo = True)
This provides some useful information. I'm not going to post the entire output here, but notice that no SQL update commands are issued until after the second call to list_users() is made:
...
After:
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 UPDATE users SET age=? WHERE users.id = ?
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 (10, 1)
...
This is a clue that the data is not persisted, but kept around in the session object.
The second change I made was to persist the database to a file with
engine = create_engine('sqlite:///db.sqlite', echo = True)
Running the script again provides the same output as before for the second call to list_users():
<User(name='A', age=10)>
<User(name='B', age=20)>
<User(name='C', age=30)>
However, if you now open the db we just created and query its contents, you can see that the added users were persisted to the database, but the age modifications were not:
$ sqlite3 db.sqlite "select * from users"
1|A|0
2|B|0
3|C|0
So, the second call to list_users() is getting its values from the session object, not from the database, because there is a transaction in progress that hasn't been committed yet. To prove this, add the following lines to the end of your script:
s = get_session()
s.rollback()
print '\nAfter rollback:'
list_users()
Since you state you are actually using MySQL on the system where you are seeing the problem, check the engine type the table was created with. The default is MyISAM, which does not support ACID transactions. Make sure you are using the InnoDB engine, which does do ACID transactions.
You can see which engine a table is using with
show create table users;
You can change the db engine for a table with alter table:
alter table users engine="InnoDB";
1. the example: Just to make sure that (or check if) the session does not commit the changes, it is enough to call expunge_all on the session object. This will most probably prove that the changes are not actually committed:
....
print '\nAfter:'
get_session().expunge_all()
list_users()
2. mysql: As you already mentioned, the sqlite example might not reflect what you actually see when using mysql. As documented in sqlalchemy - MySQL - Storage Engines, the most likely reason for your problem is the usage of non-transactional storage engines (like MyISAM), which results in an autocommit mode of execution.
3. session scope: Although having one global session sounds like asking for trouble, using a new session for every tiny little request is also not a great idea. You should think of a session as a transaction/unit-of-work. I find the usage of contextual sessions the best of both worlds: you do not have to pass the session object down the hierarchy of method calls, and at the same time you get pretty good safety in a multi-threaded environment. I do use a local session once in a while where I know I do not want to interact with the currently running transaction (session).
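A minimal sketch of that contextual-session pattern, reusing the engine and User model from the question (scoped_session replaces the plain sessionmaker: each thread gets its own session while the code can treat Session as a single global):
from sqlalchemy.orm import scoped_session, sessionmaker

# One registry for the whole application; each thread gets its own Session.
Session = scoped_session(sessionmaker(bind=engine))

def modify_ages():
    s = Session()              # the thread-local session
    for n, u in enumerate(s.query(User), start=1):
        u.age = 10 * n
    s.commit()
    Session.remove()           # close out the unit of work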
Note that the defaults of create_session() are the opposite of that of sessionmaker(): autoflush and expire_on_commit are False, autocommit is True.
global_session is already instantiated when you call modify_ages() and you've already committed to the database. If you re-instantiate global_session after you commit, it should start a new transaction.
My guess is since you've already committed and are re-using the same object, each additional modification is automatically committed.

How to TRUNCATE TABLE using Django's ORM?

To empty a database table, I use this SQL Query:
TRUNCATE TABLE `books`
How do I truncate a table using Django's models and ORM?
I've tried this, but it doesn't work:
Book.objects.truncate()
The closest you'll get with the ORM is Book.objects.all().delete().
There are differences though: truncate will likely be faster, but the ORM will also chase down foreign key references and delete objects in other tables.
You can do this in a fast and lightweight way, but not using Django's ORM. You may execute raw SQL with a Django connection cursor:
from django.db import connection
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE `books`")
You can use the model's _meta property to fill in the database table name:
from django.db import connection
cursor = connection.cursor()
cursor.execute('TRUNCATE TABLE "{0}"'.format(MyModel._meta.db_table))
Important: This does not work for inherited models as they span multiple tables!
In addition to Ned Batchelder's answer and referring to Bernhard Kircher's comment:
In my case I needed to empty a very large database using the webapp:
Book.objects.all().delete()
Which, in the development SQLite environment, returned:
too many SQL variables
So I added a little workaround. It may not be the neatest, but at least it works until a truncate table option is built into Django's ORM:
countdata = Book.objects.all().count()
logger.debug("Before deleting: %s data records" % countdata)
while countdata > 0:
    if countdata > 999:
        objects_to_keep = Book.objects.all()[999:]
        Book.objects.all().exclude(pk__in=objects_to_keep).delete()
        countdata = Book.objects.all().count()
    else:
        Book.objects.all().delete()
        countdata = Book.objects.all().count()
By the way, some of my code was based on "Django Delete all but last five of queryset".
I added this while being aware the question was already answered, but hopefully this addition will help some other people.
I know this is a very old question and a few correct answers are here already, but I can't resist sharing the most elegant and fastest way to serve the purpose of this question.
from django.db import connection

class Book(models.Model):
    # Your Model Declaration

    @classmethod
    def truncate(cls):
        with connection.cursor() as cursor:
            cursor.execute('TRUNCATE TABLE {} CASCADE'.format(cls._meta.db_table))
And now, to truncate all data from the Book table, just call
Book.truncate()
Since this interacts directly with your database, it will perform much faster than doing this:
Book.objects.all().delete()
Now there's a library to help you truncate a specific table in your Django project's database. It's called django-truncate.
It's simple: just run python manage.py truncate --apps myapp --models Model1 and all of the data in that table will be deleted!
Learn more about it here: https://github.com/KhaledElAnsari/django-truncate
For me, to truncate my local SQLite database, I ended up using python manage.py flush.
What I initially tried was to iterate over the models and delete all rows one by one:
models = [m for c in apps.get_app_configs() for m in c.get_models(include_auto_created=False)]
for m in models:
    m.objects.all().delete()
But because I have protected foreign keys, the success of the operation depended on the order of the models.
So I am using the flush command to truncate my local test database, and it is working for me:
https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-flush
This code uses the PostgreSQL dialect. Leave out the cascade bits to use standard SQL.
Following up on Shubho Shaha's answer, you could also create a model manager for this.
from django.db import connection, models

class TruncateManager(models.Manager):
    def truncate(self, cascade=False):
        appendix = " CASCADE;" if cascade else ";"
        raw_sql = f"TRUNCATE TABLE {self.model._meta.db_table}{appendix}"
        cursor = connection.cursor()
        cursor.execute(raw_sql)

class Truncatable(models.Model):
    class Meta:
        abstract = True

    objects = TruncateManager()
Then, you can extend the Truncatable to create truncatable objects:
class Book(Truncatable):
    ...
That will allow you to call truncate on all models that extend from Truncatable.
Book.objects.truncate()
I added a flag to use cascade as well, which (danger zone) will also: "Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.", which is obviously more destructive, but will allow the code to run inside an atomic transaction.
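For example, with the Truncatable base above:
Book.objects.truncate()              # truncate only the book table
Book.objects.truncate(cascade=True)  # also truncate tables with FKs pointing at it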
This doesn't directly answer the OP's question, but is nevertheless a solution one might use to achieve the same thing - differently.
Well, for some strange reason (while attempting to use the suggested RAW methods in the other answers here), I failed to truncate my Django database cache table until I did something like this:
import commands
cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
commands.getstatusoutput(' '.join(cmd))
Basically, I had to resort to issuing the truncate command via the database's utility commands - psql in this case, since I am using Postgres. So automating the command line might handle such corner cases.
Might save someone else some time...
