TL;DR: How do I get alembic to understand and generate SQL for materialized views created in sqlalchemy?
I'm using flask-sqlalchemy and also using alembic with postgres. To get a materialized view working with sqlalchemy, I followed a nice post on the topic. I used it heavily, with just a few minor divergences (the article uses flask-sqlalchemy as well, however the complete code example uses sqlalchemy's declarative base directly instead).
class ActivityView(db.Model):
    __table__ = create_materialized_view(
        'activity_view',
        db.select([
            Activity.id.label('id'),
            Activity.name.label('name'),
            Activity.start_date.label('start_date'),
        ]).where(
            db.and_(
                Activity.start_date != None,
                Activity.start_date <=
                datetime_to_str(datetime.now(tz=pytz.UTC) + timedelta(hours=48))
            )
        )
    )

    @classmethod
    def refresh(cls, concurrently=True):
        refresh_materialized_view(cls.__table__.fullname, concurrently)

db.Index('activity_view_index',
         ActivityView.__table__.c.id, ActivityView.__table__.c.start_date,
         unique=True)
The create_materialized_view and refresh_materialized_view methods are taken straight from the blog post.
Note that the example above has been greatly simplified and probably seems silly because of my simplifications, but the real idea I want to get at is how do I get alembic to translate this view to a series of alembic operations during a migration?
When I run tests, the code runs fine, the view gets generated fine and everything works. When alembic runs it doesn't do anything with the view. So what I end up doing is copying the SQL that the tests emit for the materialized view into the alembic migrations/versions file and just end up executing that directly as:
op.execute(activities_view_sql)
Similarly, I do the same direct SQL execution when generating the unique index on the materialized view.
Unfortunately my approach is error prone and creates seemingly unnecessary code duplication.
Is there a way to get alembic to understand my ActivityView so that any time it changes, alembic will know how to update the view?
Thanks much!
TLDR: Just write the view migration manually. It doesn't seem there is reasonable support for view autogeneration.
EDIT: It's possible there is a way to autogenerate view migrations now. See answer https://stackoverflow.com/a/72829474/2839862
I think the easiest way around this is to not rely on Alembic to autogenerate the view for you. Instead, you can instruct it to ignore views like this, in your Alembic env.py:
def include_object(obj, name, type_, reflected, compare_to):
    if obj.info.get("is_view", False):
        return False
    return True

...

def run_migrations_offline():
    ...
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True, include_object=include_object)
    ...

def run_migrations_online():
    ...
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata, include_object=include_object)
The is_view flag is set by my custom View base class:
class View(Model):
    @classmethod
    def _init_table(cls, sub_cls):
        table: sa.Table = Model._init_table(sub_cls)
        if table is None:
            return table
        table.info["is_view"] = True
        return table
When automatic generation ignores the view, you can then add the appropriate commands to your migration manually:
import sqlalchemy as sa
from alembic import op
from sqlalchemy import select, table

activities = table(
    "activities",
    sa.Column("id", sa.Integer()),
    ...
)

view_query = (
    select(
        [
            activities.c.id,
        ]
    )
    .select_from(activities)
)

def upgrade():
    view_query_string = str(view_query.compile(compile_kwargs={"literal_binds": True}))
    op.execute(f"CREATE VIEW activity_view AS {view_query_string}")

def downgrade():
    op.execute("DROP VIEW activity_view")
Two important points:
Code duplication is not always a bad thing: you can think of migrations as a version control tool rather than regular code. Your version history should not depend on the current state of the codebase.
Manually written migrations are arguably more error prone than generated ones, but you can partially alleviate this by running your migrations in tests for production applications. Inspecting the resulting DB schema should also help.
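The second point can be sketched with the standard library alone. This is only a minimal sketch, assuming an in-memory SQLite database stands in for the real one and that `upgrade` and `downgrade` take a connection argument (in a real Alembic revision they take no arguments and call `op.execute`). The helpers `list_views` and `run_migration_round_trip` are hypothetical names used for illustration.

```python
import sqlite3

CREATE_VIEW = "CREATE VIEW activity_view AS SELECT id FROM activities"
DROP_VIEW = "DROP VIEW activity_view"


def upgrade(conn):
    # Stand-in for op.execute(...) in a real Alembic revision.
    conn.execute(CREATE_VIEW)


def downgrade(conn):
    conn.execute(DROP_VIEW)


def list_views(conn):
    # Inspect the resulting schema: names of all views in the database.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
    return [name for (name,) in rows]


def run_migration_round_trip():
    # Apply the migration, record the schema, roll back, record it again.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activities (id INTEGER PRIMARY KEY)")
    upgrade(conn)
    after_upgrade = list_views(conn)
    downgrade(conn)
    after_downgrade = list_views(conn)
    conn.close()
    return after_upgrade, after_downgrade
```

A test can then assert that the view exists after the upgrade and is gone after the downgrade, which catches typos in hand-written SQL before they reach production.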
Although the question doesn't specifically call out using PostgreSQL, the post that it is based on targeted materialized views in PostgreSQL, so this answer also targets an add-on package called alembic_utils. It is based on Alembic's ReplaceableObjects and adds support for autogenerating a larger number of PostgreSQL entity types, including functions, views, materialized views, triggers, and policies.
To set it up, you create your materialized view in the following way:
from alembic_utils.pg_materialized_view import PGMaterializedView

actview = PGMaterializedView(
    schema="public",
    signature="activity_view",
    definition="select ...",
    with_data=True
)
You could base the definition on static SQL or on a compiled version of your SQLAlchemy code.
Then in your alembic env.py:
from foo import actview
from alembic_utils.replaceable_entity import register_entities
register_entities([actview])
Alembic will now autogenerate migrations when the materialized view is updated in code.
Suppose there's a trigger in the database with a function, like this:
-- Insert a new entry into another table
-- every time a NEW row is inserted
CREATE FUNCTION trgfunc_write_log() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO some_other_table (
        -- some columns
        meter_id,
        date_taken,
        temperature
    ) values (
        NEW.meter_id,
        NEW.time_taken,
        NEW.temperature
    );
    return NEW;
END;
$$ language 'plpgsql';

-- The trigger itself: AFTER INSERT
CREATE TRIGGER trg_temperature_readings
AFTER INSERT ON temperature_readings
FOR EACH ROW
EXECUTE FUNCTION trgfunc_write_log();
Typically, this trigger will live next to my SQLAlchemy models and be auto-created with something like this:
from sqlalchemy import DDL, event
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Reading(Base):
    ...

create_trigger = DDL(""" ...SQL... """)
event.listen(Reading.__table__, 'after_create', create_trigger)
What's your best practice for version-controlling such a trigger and its function with Alembic migrations?
I recently had the same question come up in an application and found this article in the Alembic Cookbook.
It outlines a somewhat complex strategy: you create an object that encapsulates the name and SQL of a view, stored procedure, trigger, or similar schema object, and that object is then used by the Alembic operations that upgrade and downgrade it. It looks something like this when used in an Alembic revision:
from alembic import op
from my_module import ReplaceableObject

my_trigger = ReplaceableObject(
    "trigger_name",
    """...SQL..."""
)

def upgrade():
    op.create_trigger(my_trigger)

def downgrade():
    op.drop_trigger(my_trigger)
My team is currently discussing whether this strategy is too complex for a simple trigger compared to a view or stored procedure. Those schema objects may be updated more frequently, which makes much of the behavior behind the Cookbook abstractions more valuable for them than for a simple trigger.
Another proposed option was something like this:
from alembic import op

create_trigger = """...SQL..."""
drop_trigger = """...SQL..."""

def upgrade():
    op.execute(create_trigger)

def downgrade():
    op.execute(drop_trigger)
The two implementations look almost identical, which is the argument for the Cookbook abstraction being unnecessarily complex for a simple trigger.
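To make the simpler option concrete, here is a minimal sketch that runs against an in-memory SQLite database instead of PostgreSQL. It assumes SQLite's trigger syntax (the body is inline, so there is no separate trigger function), hypothetical column types, and `upgrade`/`downgrade` functions that take a connection rather than using Alembic's `op`; `demo` is a hypothetical helper.

```python
import sqlite3

# SQLite syntax: the trigger body is inline, so no separate function is needed.
CREATE_TRIGGER = """
CREATE TRIGGER trg_temperature_readings
AFTER INSERT ON temperature_readings
FOR EACH ROW
BEGIN
    INSERT INTO some_other_table (meter_id, date_taken, temperature)
    VALUES (NEW.meter_id, NEW.time_taken, NEW.temperature);
END
"""
DROP_TRIGGER = "DROP TRIGGER trg_temperature_readings"


def upgrade(conn):
    # Stand-in for op.execute(create_trigger) in an Alembic revision.
    conn.execute(CREATE_TRIGGER)


def downgrade(conn):
    conn.execute(DROP_TRIGGER)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temperature_readings "
        "(meter_id INTEGER, time_taken TEXT, temperature REAL)"
    )
    conn.execute(
        "CREATE TABLE some_other_table "
        "(meter_id INTEGER, date_taken TEXT, temperature REAL)"
    )
    count_sql = "SELECT COUNT(*) FROM some_other_table"

    upgrade(conn)
    conn.execute("INSERT INTO temperature_readings VALUES (1, '2020-01-01', 21.5)")
    logged_with_trigger = conn.execute(count_sql).fetchone()[0]

    downgrade(conn)
    conn.execute("INSERT INTO temperature_readings VALUES (1, '2020-01-02', 22.0)")
    logged_after_drop = conn.execute(count_sql).fetchone()[0]

    conn.close()
    return logged_with_trigger, logged_after_drop
```

The first insert is copied into `some_other_table` by the trigger; after the downgrade, the second insert is not.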
This is straightforward using Alembic Utils (pip install alembic_utils).
Create your function in your normal codebase, e.g.
from alembic_utils.pg_function import PGFunction
from alembic_utils.pg_trigger import PGTrigger

trgfunc_write_log = PGFunction(
    schema="public",
    signature="trgfunc_write_log()",
    definition="""
    RETURNS TRIGGER AS $$
    BEGIN
        INSERT INTO some_other_table (
            -- some columns
            meter_id,
            date_taken,
            temperature
        ) values (
            NEW.meter_id,
            NEW.time_taken,
            NEW.temperature
        );
        return NEW;
    END;
    $$ language 'plpgsql'
    """)

trg_temperature_readings = PGTrigger(
    schema="public",
    signature="trg_temperature_readings",
    on_entity="public.temperature_readings",
    is_constraint=False,
    definition="""AFTER INSERT ON temperature_readings
        FOR EACH ROW
        EXECUTE FUNCTION trgfunc_write_log()""",
)
The docs show how to modify your alembic ini file and env.py file - one gotcha is you have to register the entities e.g. in env.py:
from alembic_utils.replaceable_entity import register_entities
from app.db.function import trg_temperature_readings, trgfunc_write_log
register_entities([trg_temperature_readings, trgfunc_write_log])
Then alembic's auto generated migrations should work as normal:
alembic revision --autogenerate -m 'add temperature log trigger'
Suppose there is a production database with some data in it, and I need to migrate it in the following tricky case.
There is a model (already in the database), say Model, that has foreign keys to other models.
class ModelA: ...
class ModelX: ...
class Model:
    a = models.ForeignKey(ModelA, default = A)
    x = models.ForeignKey(ModelX, default = X)
We need to create one more model, ModelY, to which Model should refer. When a Model is created, it should get a default value pointing to some ModelY object, which obviously does not exist yet, so we have to create it during the migration.
class ModelY: ...
class Model:
    y = models.ForeignKey (ModelY, default = ??????)
So the migration sequence should be:
Create the ModelY table
Create a default object in this table and store its id somewhere
Create a new field y in the Model table, with the default value taken from the previous step
I'd like to automate all of this, of course, to avoid having to apply one migration by hand, create an object, write down its id, use this id as the default value for the new field, and only then apply another migration with the new field.
I'd also like to do it all in one step: define both ModelY and the new field y in the old model, generate the migration, fix it somehow, and then apply it at once and have it work.
Are there any best practices for such a case? In particular, where should the id of the newly created object be stored? In some dedicated table in the same database?
You won't be able to do this in a single migration file, but you can create several migration files to achieve it. I'll have a go at helping you out. I'm not totally certain this is what you want, but it should teach you a thing or two about Django migrations.
I'm going to refer to two types of migrations here, one is a schema migration, and these are the migration files you typically generate after changing your models. The other is a data migration, and these need to be created using the --empty option of the makemigrations command, e.g. python manage.py makemigrations my_app --empty, and are used to move data around, set data on null columns that need to be changed to non-null, etc.
class ModelY(models.Model):
    # Fields ...
    is_default = models.BooleanField(default=False, help_text="Will be specified true by the data migration")

class Model(models.Model):
    # Fields ...
    y = models.ForeignKey(ModelY, null=True, default=None)
You'll notice that y accepts null, we can change this later, for now you can run python manage.py makemigrations to generate the schema migration.
To generate your first data migration run the command python manage.py makemigrations <app_name> --empty. You'll see an empty migration file in your migrations folder. You should add two methods, one that is going to create your default ModelY instance and assign it to your existing Model instances, and another that will be a stub method so Django will let you reverse your migrations later if needed.
from __future__ import unicode_literals
from django.db import migrations

def migrate_model_y(apps, schema_editor):
    """Create a default ModelY instance, and apply this to all our existing models"""
    ModelY = apps.get_model("my_app", "ModelY")
    default_model_y = ModelY.objects.create(something="something", is_default=True)
    Model = apps.get_model("my_app", "Model")
    models = Model.objects.all()
    for model in models:
        model.y = default_model_y
        model.save()

def reverse_migrate_model_y(apps, schema_editor):
    """This is necessary to reverse migrations later, if we need to"""
    return

class Migration(migrations.Migration):
    dependencies = [("my_app", "0100_auto_1092839172498")]
    operations = [
        migrations.RunPython(
            migrate_model_y, reverse_code=reverse_migrate_model_y
        )
    ]
Do not directly import your models into this migration! The models need to be obtained through the apps.get_model("my_app", "my_model") method in order to get the Model as it was at this migration's point in time. If you add more fields in the future and then run this migration, your model's fields may not match the database's columns (because the model is from the future, sort of...), and you could receive errors about missing columns in the database and such. Also be wary of using custom methods on your models/managers in migrations, because you won't have access to them from this proxy Model. I usually duplicate some code into the migration so it always runs the same.
Now we can go back and modify the Model model to ensure y is not null and that it picks up the default ModelY instance in the future:
def get_default_model_y():
    default_model_y = ModelY.objects.filter(is_default=True).first()
    assert default_model_y is not None, "There is no default ModelY to populate with!!!"
    return default_model_y.pk  # We must return the primary key used by the relation, not the instance

class Model(models.Model):
    # Fields ...
    y = models.ForeignKey(ModelY, default=get_default_model_y)
Now you should run python manage.py makemigrations again to create another schema migration.
You shouldn't mix schema migrations and data migrations, because of the way migrations are wrapped in transactions it can cause database errors which will complain about trying to create/alter tables and execute INSERT queries in a transaction.
Finally you can run python manage.py migrate and it should create a default ModelY object, add it to a ForeignKey of your Model, and remove the null to make it like a default ForeignKey.
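At the SQL level, the first two stages of this sequence can be sketched as follows. This is only an illustration against an in-memory SQLite database, not what Django emits: the table names `model` and `model_y`, the column `y_id`, and the helpers `migrate` and `demo` are assumptions. The final step (making the column non-null) is left out, because SQLite cannot alter a column's constraints in place.

```python
import sqlite3


def migrate(conn):
    # Step 1 (schema): create the new table and a nullable foreign key column.
    conn.execute(
        "CREATE TABLE model_y "
        "(id INTEGER PRIMARY KEY, is_default INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("ALTER TABLE model ADD COLUMN y_id INTEGER REFERENCES model_y (id)")

    # Step 2 (data): create the default row and point existing rows at it.
    cursor = conn.execute("INSERT INTO model_y (is_default) VALUES (1)")
    default_id = cursor.lastrowid
    conn.execute("UPDATE model SET y_id = ? WHERE y_id IS NULL", (default_id,))
    return default_id


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO model (id) VALUES (?)", [(1,), (2,), (3,)])

    default_id = migrate(conn)

    rows_without_y = conn.execute(
        "SELECT COUNT(*) FROM model WHERE y_id IS NULL"
    ).fetchone()[0]
    rows_with_default = conn.execute(
        "SELECT COUNT(*) FROM model WHERE y_id = ?", (default_id,)
    ).fetchone()[0]
    conn.close()
    return rows_without_y, rows_with_default
```

After `migrate` runs, every pre-existing row points at the default object, which is the precondition for tightening the column to non-null in the last schema migration.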
Finally I came to the following solution.
First, I accepted the idea of identifying the default object by an isDefault attribute and wrote an abstract model to deal with it, keeping data integrity as much as possible (the code is at the bottom of the post).
What I don't like much in the accepted solution is that data migrations are mixed with schema migrations. It's easy to lose them, e.g. during squashing. Occasionally I also delete migrations altogether, when I am sure all my production and backup databases are consistent with the code, so I can generate a single initial migration and fake it. Keeping data migrations together with schema migrations breaks this workflow.
So I decided to keep all data migrations in a single file outside of the migrations package. I created data.py in my app package and put all data migrations in a single function, migratedata, keeping in mind that this function can be called at early stages, when some models may not exist yet, so we need to catch the LookupError exception when accessing the apps registry. Then I use this function for every RunPython operation in data migrations.
So the workflow looks like that (we assume Model and ModelX are already in place):
1) Create ModelY:
class ModelY(Defaultable):
    y_name = models.CharField(max_length=255, default='ModelY')
2) Generate migration:
manage.py makemigrations
3) Add data migration in data.py (add name of the model to defaultable list in my case):
# data.py in myapp
def migratedata(apps, schema_editor):
    defaultables = ['ModelX', 'ModelY']
    for m in defaultables:
        try:
            M = apps.get_model('myapp', m)
            if not M.objects.filter(isDefault=True).exists():
                M.objects.create(isDefault=True)
        except LookupError as e:
            print('[{} : ignoring]'.format(e))
    # owner model, should be after defaults to support squashed migrations over empty database scenario
    Model = apps.get_model('myapp', 'Model')
    if not Model.objects.all().exists():
        Model.objects.create()
4) Edit migration by adding operation RunPython:
from myapp.data import migratedata

class Migration(migrations.Migration):
    ...
    operations = [
        migrations.CreateModel(name='ModelY', ...),
        migrations.RunPython(migratedata, reverse_code=migratedata),
    ]
5) Add ForeignKey(ModelY) to Model:
class Model(models.Model):
    # SET_DEFAULT ensures that there will be no integrity issues, but make sure default object exists
    y = models.ForeignKey(ModelY, default=ModelY.default, on_delete=models.SET_DEFAULT)
6) Generate migration again:
manage.py makemigrations
7) Migrate:
manage.py migrate
8) Done!
The whole chain can be applied to an empty database; it will create the final schema and fill it with initial data.
When we are sure that our db is in sync with the code, we can easily remove the long chain of migrations, generate a single initial one, add RunPython(migratedata, ...) to it, and then migrate with --fake-initial (delete the django_migrations table first).
Huh, such a tricky solution for such a simple task!
Finally, here is the Defaultable model source code:
from django.db import IntegrityError, models
from django.db.models import Q
from django.db.models.signals import pre_delete

class Defaultable(models.Model):
    class Meta:
        abstract = True

    isDefault = models.BooleanField(default=False)

    @classmethod
    def default(cls):
        # type: (Type[Defaultable]) -> Defaultable
        """
        Search for default object in given model.
        Returning None is useful when applying squashed migrations on empty database,
        the ForeignKey with this default can still be non-nullable, as return value
        is not used during migration if there is no model instance (Django is not pushing
        returned default to the SQL level).
        Take a note on only(), this is kind of dirty hack to avoid problems during
        model evolution, as default() can be called in migrations within some
        historical project state, so ideally we should use model from this historical
        apps registry, but we have no access to it globally.
        :return: Default object id, or None if no or many.
        """
        try:
            return cls.objects.only('id', 'isDefault').get(isDefault=True).id
        except (cls.DoesNotExist, cls.MultipleObjectsReturned):
            # Matches the docstring: None if there is no default or more than one.
            return None

    # take care of data integrity
    def save(self, *args, **kwargs):
        super(Defaultable, self).save(*args, **kwargs)
        if self.isDefault:  # Ensure only one default, so make all others non default
            self.__class__.objects.filter(~Q(id=self.id), isDefault=True).update(isDefault=False)
        else:  # Ensure at least one default exists
            if not self.__class__.objects.filter(isDefault=True).exists():
                self.__class__.objects.filter(id=self.id).update(isDefault=True)

    def __init__(self, *args, **kwargs):
        super(Defaultable, self).__init__(*args, **kwargs)

        # noinspection PyShadowingNames,PyUnusedLocal
        def pre_delete_defaultable(instance, **kwargs):
            if instance.isDefault:
                raise IntegrityError("Can not delete default object {}".format(instance.__class__.__name__))

        pre_delete.connect(pre_delete_defaultable, self.__class__, weak=False, dispatch_uid=self._meta.db_table)
I left my previous answer just to show the train of thought. I have finally found a fully automatic solution, so it's no longer necessary to manually edit Django-generated migrations, but the price is monkey patching, as is often the case.
The idea is to provide a callable as the default of the ForeignKey, which creates a default instance of the referenced model if it does not exist. The problem is that this callable can be called not only in the final Django project state, but also during migrations, with old project states, so it can be called for a since-deleted model at an early stage, when the model still existed.
The standard solution in RunPython operations is to use the apps registry from the migration state, but this is unavailable to our callable, because the registry is provided as an argument to RunPython and is not available globally. To support all scenarios of applying and rolling back migrations, we need to detect whether we are in a migration or not, and access the appropriate apps registry.
The only solution is to monkey patch the AddField and RemoveField operations to keep the migration apps registry in a global variable while we are in a migration.
import django.apps
import django.db.migrations

migration_apps = None

def set_migration_apps(apps):
    global migration_apps
    migration_apps = apps

def get_or_create_default(model_name, app_name):
    M = (migration_apps or django.apps.apps).get_model(app_name, model_name)
    try:
        return M.objects.get(isDefault=True).id
    except M.DoesNotExist as e:
        o = M.objects.create(isDefault=True)
        print('{}.{} default object not found, creating default object : OK'.format(model_name, app_name))
        return o.id  # return the id, consistent with the branch above

def monkey_patch_fields_operations():
    def patch(klass):
        old_database_forwards = klass.database_forwards

        def database_forwards(self, app_label, schema_editor, from_state, to_state):
            set_migration_apps(to_state.apps)
            old_database_forwards(self, app_label, schema_editor, from_state, to_state)

        klass.database_forwards = database_forwards

        old_database_backwards = klass.database_backwards

        def database_backwards(self, app_label, schema_editor, from_state, to_state):
            set_migration_apps(to_state.apps)
            old_database_backwards(self, app_label, schema_editor, from_state, to_state)

        klass.database_backwards = database_backwards

    patch(django.db.migrations.AddField)
    patch(django.db.migrations.RemoveField)
The rest, including the Defaultable model with data integrity checks, is in the GitHub repository.
I am running tests on some functions. I have a function that uses database queries. So, I have gone through the blogs and docs that say we have to make an in memory or test database to use such functions. Below is my function,
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
    # exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. When testing, I create an in-memory database and populate it with my values, and I want queries to go to that test database. But when I call already_exists() from a test, the production database is hit. How do I make the function hit my test database while testing?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production. Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language, like Alembic, or tools to execute raw SQL, like yoyo or Flyway. You would need to invoke the deployment and fill the database with test data prior to running the test, then run the test and assert the output you expect to be returned.
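As a rough illustration of that approach, the sketch below runs the query against a throwaway in-memory database. It assumes the production database is also SQLite (the `?` placeholder in the question suggests so), uses a hypothetical one-column `posts` schema, and simplifies `already_exists` to take the post id directly; `make_test_cursor` is a hypothetical fixture helper.

```python
import sqlite3


def already_exists(post_id, c):
    # Same query as in the question, but taking the id directly.
    c.execute("SELECT COUNT(*) FROM posts WHERE post_id = ?", (post_id,))
    (number_of_rows,) = c.fetchone()
    return number_of_rows > 0


def make_test_cursor():
    # Build a fresh in-memory database and fill it with known test data.
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY)")
    c.execute("INSERT INTO posts (post_id) VALUES (10)")
    return c


def test_already_exists():
    c = make_test_cursor()
    assert already_exists(10, c)
    assert not already_exists(11, c)
```

Because the function receives its cursor as an argument, the test decides which database is hit; production code passes the real cursor and nothing else changes.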
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand ins that can be configured to accept the function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock
...
def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.
    result = already_exists(story, mockConn)
    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_once_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make connection a method on each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists method.
# post_id is the id being looked up
q = session.query(Post).filter(Post.id == post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
I would like to modify some database data as part of an alembic upgrade.
I thought I could just add any code in the upgrade of my migration, but the following fails:
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(), sa.ForeignKey('smsmessagepart.id'), nullable=True))
    ### end Alembic commands ###
    from volunteer.models import DBSession, SmsDelivery, SmsMessagePart
    for sms_delivery in DBSession.query(SmsDelivery).all():
        message_part = DBSession.query(SmsMessagePart).filter(SmsMessagePart.message_id == sms_delivery.message_id).first()
        if message_part is not None:
            sms_delivery.sms_message_part = message_part
with the following error:
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|SmsDelivery|smsdelivery, SQL expression or this Session
I am not really understanding this error. How can I fix this or is doing operations like this not a possibility?
It is difficult to understand exactly what you are trying to achieve from the code excerpt you provided, so the following answer is based on my best guess.
Line 4 - you import things (DBSession, SmsDelivery, SmsMessagePart) from your modules and then try to operate on these objects as you do in your application.
The error shows that SmsDelivery is a mapper object, so it points to some table. Mapper objects must be bound to a valid SQLAlchemy connection.
This tells me that you skipped the initialization of DB objects (creating a connection and binding it to the mapper objects) that you normally do in your application code.
DBSession looks like a SQLAlchemy session object, so it should have a connection bind too.
Alembic already has a connection ready and open, which it uses for the schema changes you request with op.* methods.
So there should be a way to get this connection.
According to Alembic manual op.get_bind() will return current Connection bind:
For full interaction with a connected database, use the “bind” available from the context:
from alembic import op
connection = op.get_bind()
So you may use this connection to run your queries into db.
PS. I would assume you wanted to perform some modifications to data in your table. You may try to formulate this modification into one update query. Alembic has special method for executing such changes - so you would not need to deal with connection.
alembic.operations.Operations.execute
execute(sql, execution_options=None)
Execute the given SQL using the current migration context.
In a SQL script context, the statement is emitted directly to the output stream. There is no return result, however, as this function is oriented towards generating a change script that can run in “offline” mode.
Parameters: sql – Any legal SQLAlchemy expression, including:
a string
a sqlalchemy.sql.expression.text() construct.
a sqlalchemy.sql.expression.insert() construct.
a sqlalchemy.sql.expression.update(), sqlalchemy.sql.expression.insert(), or sqlalchemy.sql.expression.delete() construct.
Pretty much anything that’s “executable” as described in SQL Expression Language Tutorial.
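For the case in the question, the loop could perhaps be folded into one correlated UPDATE. The sketch below tries the statement against an in-memory SQLite database; the table and column names come from the question, while the remaining schema, the sample data, and the choice of `MIN(id)` as a deterministic stand-in for `.first()` are assumptions. In a migration the same string would be passed to `op.execute(UPDATE_SQL)`.

```python
import sqlite3

# One statement instead of a Python loop over ORM objects.
UPDATE_SQL = """
UPDATE smsdelivery
SET sms_message_part_id = (
    SELECT MIN(smsmessagepart.id)
    FROM smsmessagepart
    WHERE smsmessagepart.message_id = smsdelivery.message_id
)
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE smsmessagepart (id INTEGER PRIMARY KEY, message_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE smsdelivery "
        "(id INTEGER PRIMARY KEY, message_id INTEGER, sms_message_part_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO smsmessagepart (id, message_id) VALUES (?, ?)",
        [(1, 100), (2, 100), (3, 200)],
    )
    conn.executemany(
        "INSERT INTO smsdelivery (id, message_id) VALUES (?, ?)",
        [(1, 100), (2, 200), (3, 300)],
    )

    conn.execute(UPDATE_SQL)

    rows = conn.execute(
        "SELECT id, sms_message_part_id FROM smsdelivery ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Deliveries with no matching message part keep a NULL `sms_message_part_id`, which mirrors the `if message_part is not None` check in the original loop.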
It's worth noting that if you do this, you probably want to freeze a copy of your ORM model inside the migration, like this:
class MyType(Base):
    __tablename__ = 'existing_table'
    __table_args__ = {'extend_existing': True}
    id = Column(Integer, ...)
    ...

def upgrade():
    Base.metadata.bind = op.get_bind()
    for item in Session.query(MyType).all():
        ...
Otherwise you'll inevitably end up in a situation where your ORM model changes and previous migrations no longer work.
In particular, note that you want to extend Base, not the application's own type (app.models.MyType), because your type might go away at some point and, once again, your migrations will fail.
You also need to import Base and then set
Base.metadata.bind = op.get_bind()
After this you can use your models as usual without errors.
To empty a database table, I use this SQL Query:
TRUNCATE TABLE `books`
How do I truncate a table using Django's models and ORM?
I've tried this, but it doesn't work:
Book.objects.truncate()
The closest you'll get with the ORM is Book.objects.all().delete().
There are differences though: truncate will likely be faster, but the ORM will also chase down foreign key references and delete objects in other tables.
You can do this in a fast and lightweight way, but not using Django's ORM. You may execute raw SQL with a Django connection cursor:
from django.db import connection
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE `books`")
You can use the model's _meta property to fill in the database table name:
from django.db import connection
cursor = connection.cursor()
cursor.execute('TRUNCATE TABLE "{0}"'.format(MyModel._meta.db_table))
Important: This does not work for inherited models as they span multiple tables!
In addition to Ned Batchelder's answer and referring to Bernhard Kircher's comment:
In my case I needed to empty a very large database using the webapp:
Book.objects.all().delete()
Which, in the development SQLite environment, returned:
too many SQL variables
So I added a little workaround. It may not be the neatest, but at least it works until a truncate table option is built into Django's ORM:
countdata = Book.objects.all().count()
logger.debug("Before deleting: %s data records" % countdata)
while countdata > 0:
    if countdata > 999:
        objects_to_keep = Book.objects.all()[999:]
        Book.objects.all().exclude(pk__in=objects_to_keep).delete()
        countdata = Book.objects.all().count()
    else:
        Book.objects.all().delete()
        countdata = Book.objects.all().count()
By the way, some of my code was based on "Django Delete all but last five of queryset".
I added this knowing the question had already been answered, but hopefully this addition will help some other people.
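The underlying idea, deleting in batches small enough to stay under SQLite's variable limit, can also be sketched without Django. The example below is a minimal illustration using the standard `sqlite3` module, with a hypothetical one-column `books` table; the batch size of 999 matches the number used in the workaround above, and `delete_in_batches` and `demo` are hypothetical helpers.

```python
import sqlite3

BATCH_SIZE = 999  # Same batch size as in the workaround above.


def delete_in_batches(conn, batch_size=BATCH_SIZE):
    """Delete every row from books, at most batch_size rows per statement."""
    batches = 0
    while True:
        # Fetch the next batch of primary keys still present in the table.
        ids = [
            row[0]
            for row in conn.execute("SELECT id FROM books LIMIT ?", (batch_size,))
        ]
        if not ids:
            return batches
        # One bound variable per id, so never more than batch_size variables.
        placeholders = ", ".join("?" for _ in ids)
        conn.execute(f"DELETE FROM books WHERE id IN ({placeholders})", ids)
        batches += 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO books (id) VALUES (?)", [(i,) for i in range(2500)])
    batches = delete_in_batches(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
    conn.close()
    return batches, remaining
```

With 2500 rows the loop issues three DELETE statements (999, 999, and 502 rows) and leaves the table empty.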
I know this is a very old question and there are a few correct answers here already, but I can't resist sharing what I consider the most elegant and fastest way to do this.
from django.db import connection, models

class Book(models.Model):
    # Your Model Declaration

    @classmethod
    def truncate(cls):
        with connection.cursor() as cursor:
            cursor.execute('TRUNCATE TABLE {} CASCADE'.format(cls._meta.db_table))
And now to truncate all data from Book table just call
Book.truncate()
Since this interacts directly with your database, it will perform much faster than doing this:
Book.objects.all().delete()
There is now a library that helps you truncate a specific table in your Django project's database. It is called django-truncate.
It's simple: just run python manage.py truncate --apps myapp --models Model1 and all of the data in that table will be deleted!
Learn more about it here: https://github.com/KhaledElAnsari/django-truncate
To truncate my local SQLite database, I ended up using python manage.py flush.
What I initially tried was to iterate over the models and delete all the rows one by one:
models = [m for c in apps.get_app_configs() for m in c.get_models(include_auto_created=False)]
for m in models:
    m.objects.all().delete()
But because I have protected foreign keys, the success of the operation depended on the order of the models.
So I am using the flush command to truncate my local test database, and it is working for me.
https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-flush
This code uses the PostgreSQL dialect. Leave out the cascade bits to use standard SQL.
Following up on Shubho Shaha's answer, you could also create a model manager for this.
from django.db import connection, models

class TruncateManager(models.Manager):
    def truncate(self, cascade=False):
        appendix = " CASCADE;" if cascade else ";"
        raw_sql = f"TRUNCATE TABLE {self.model._meta.db_table}{appendix}"
        cursor = connection.cursor()
        cursor.execute(raw_sql)

class Truncatable(models.Model):
    class Meta:
        abstract = True

    objects = TruncateManager()
Then, you can extend the Truncatable to create truncatable objects:
class Book(Truncatable):
    ...
That will allow you to call truncate on all models that extend from Truncatable.
Book.objects.truncate()
I added a flag to use cascade as well, which (danger zone) will also: "Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.", which is obviously more destructive, but will allow the code to run inside an atomic transaction.
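Since the table name is interpolated straight into the SQL string, it may be worth validating it first. The helper below is a hypothetical sketch, not part of Django: it only builds the statement, rejects anything that is not a plain identifier, and double-quotes the name. The identifier pattern is an assumption and is stricter than what PostgreSQL actually allows.

```python
import re

# Conservative pattern: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_truncate_sql(table_name, cascade=False):
    """Return a TRUNCATE statement for table_name, or raise ValueError."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    suffix = " CASCADE" if cascade else ""
    return f'TRUNCATE TABLE "{table_name}"{suffix};'


def is_rejected(table_name):
    # Small convenience wrapper showing how invalid names are reported.
    try:
        build_truncate_sql(table_name)
    except ValueError:
        return True
    return False
```

Inside the manager above, the result could be passed to `cursor.execute(...)` in place of the f-string, for example `build_truncate_sql(self.model._meta.db_table, cascade)`.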
This doesn't directly answer the OP's question, but it is nevertheless a solution one might use to achieve the same thing in a different way.
For some strange reason (while attempting to use the raw SQL methods suggested in the other answers here), I failed to truncate my Django database cache table until I did something like this:
import subprocess  # the Python 2 `commands` module no longer exists in Python 3
cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
subprocess.getstatusoutput(' '.join(cmd))
Basically, I had to resort to issuing the truncate command via the database's utility commands, psql in this case since I am using Postgres. So automating the command line might handle such corner cases.
Might save someone else some time...