How to TRUNCATE TABLE using Django's ORM? - python

To empty a database table, I use this SQL Query:
TRUNCATE TABLE `books`
How to I truncate a table using Django's models and ORM?
I've tried this, but it doesn't work:
Book.objects.truncate()

The closest you'll get with the ORM is Book.objects.all().delete().
There are differences though: truncate will likely be faster, but the ORM will also chase down foreign key references and delete objects in other tables.

You can do this in a fast and lightweight way, but not using Django's ORM. You may execute raw SQL with a Django connection cursor:
from django.db import connection
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE `books`")

You can use the model's _meta property to fill in the database table name:
from django.db import connection
cursor = connection.cursor()
cursor.execute('TRUNCATE TABLE "{0}"'.format(MyModel._meta.db_table))
Important: This does not work for inherited models as they span multiple tables!

In addition to Ned Batchelder's answer and refering to Bernhard Kircher's comment:
In my case I needed to empty a very large database using the webapp:
Book.objects.all().delete()
Which, in the development SQLlite environment, returned:
too many SQL variables
So I added a little workaround. It maybe not the neatest, but at least it works until the truncate table option is build into Django's ORM:
countdata = Book.objects.all().count()
logger.debug("Before deleting: %s data records" % countdata)
while countdata > 0:
if countdata > 999:
objects_to_keep = Book.objects.all()[999:]
Book.objects.all().exclude(pk__in=objects_to_keep).delete()
countdata = Book.objects.all().count()
else:
Book.objects.all().delete()
countdata = Book.objects.all().count()
By the way, some of my code was based on "Django Delete all but last five of queryset".
I added this while being aware the answer was already answered, but hopefully this addition will help some other people.

I know this is a very old Question and few corrects answer is in here is as well but I can't resist myself to share the most elegant and fastest way to serve the purpose of this question.
class Book(models.Model):
# Your Model Declaration
#classmethod
def truncate(cls):
with connection.cursor() as cursor:
cursor.execute('TRUNCATE TABLE {} CASCADE'.format(cls._meta.db_table))
And now to truncate all data from Book table just call
Book.truncate()
Since this is directly interact with your Database it will perform much faster than doing this
Book.objects.all().delete()

Now there's a library to help you truncate a specific TABLE in your Django project Database, It called django-truncate.
It's simple just run python manage.py truncate --apps myapp --models Model1 and all of the data in that TABLE will be deleted!
Learn more about it here: https://github.com/KhaledElAnsari/django-truncate

For me the to truncate my local sqllite database I end up with python manage.py flush.
What I have initial tried is to iterate over the models and delete all to rows one by one:
models = [m for c in apps.get_app_configs() for m in c.get_models(include_auto_created=False)]
for m in models:
m.objects.all().delete()
But becuse I have Protected foreign key the success of the operation depended on the order of the models.
So, I am using te flush command to truncate my local test database and it is working for me
https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-flush

This code uses PosgreSQL dialect. Leave out the cascade bits to use standard SQL.
Following up on Shubho Shaha's answer, you could also create a model manager for this.
class TruncateManager(models.Manager):
def truncate(self, cascade=False):
appendix = " CASCADE;" if cascade else ";"
raw_sql = f"TRUNCATE TABLE {self.model._meta.db_table}{appendix}"
cursor = connection.cursor()
cursor.execute(raw_sql)
class Truncatable(models.Model):
class Meta:
abstract = True
objects = TruncateManager()
Then, you can extend the Truncatable to create truncatable objects:
class Book(Truncatable):
...
That will allow you to call truncate on all models that extend from Truncatable.
Book.objects.truncate()
I added a flag to use cascade as well, which (danger zone) will also: "Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.", which is obviously more destructive, but will allow the code to run inside an atomic transaction.

This is doesn't directly answer the OP's question, but is nevertheless a solution one might use to achieve the same thing - differently.
Well, for some strange reason (while attempting to use the suggested RAW methods in the other answers here), I failed to truncate my Django database cache table until I did something like this:
import commands
cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
commands.getstatusoutput(' '.join(cmd))
Basically, I had to resort to issuing the truncate command via the database's utility commands - psql in this case since am using Postgres. So, automating the command line might handle such corner cases.
Might save someone else some time...

Related

Unit testing a function that depends on database

I am running tests on some functions. I have a function that uses database queries. So, I have gone through the blogs and docs that say we have to make an in memory or test database to use such functions. Below is my function,
def already_exists(story_data,c):
# TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
# exists in the DB.
c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
(number_of_rows,)=c.fetchone()
if number_of_rows > 0:
return True
return False
This function hits the production database. My question is that, when in testing, I create an in memory database and populate my values there, I will be querying that database (test DB). But I want to test my already_exists() function, after calling my already_exists function from test, my production db will be hit. How do I make my test DB hit while testing this function?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database in PostgreSQL) will not ensure that it works in production. Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process, anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language like albemic or tools to execute raw SQL like yoyo or Flyway. You would need to invoke the deployment and fill it with test data prior to running the test, then run the test and assert the output you expect to be returned.
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand ins that can be configured to accept the function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock
...
def test_already_exists_returns_true_for_positive_count():
mockConn = Mock(
execute=Mock(),
fetchone=Mock(return_value=(5,)),
)
story = Story(post_id=10) # Making some assumptions about what your object might look like.
result = already_exists(story, mockConn)
assert result
# Possibly assert calls on the mock. Value of these asserts is debatable.
mockConn.execute.assert_called("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
# Here `connection` is a singleton which returns the database connection.
connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
(number_of_rows,) = connection.fetchone()
if number_of_rows > 0:
return True
return False
Or make connection a method on each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
# Here the connection is associated with the object.
self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
(number_of_rows,) = self.connection.fetchone()
if number_of_rows > 0:
return True
return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists method.
session.query(Post.id).filter(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.

Raw queries and custom sql queries what is the fastest?

Assuming that the file models.py in my django application (webapp) is like the following :
from django.db import models
from django.db import connection
class Foo(models.Model):
name = models.CharField(...)
surname = models.CharField(...)
def dictfetchall(cursor):
"Returns all rows from a cursor as a dict"
desc = cursor.description
return [
dict(zip([col[0] for col in desc], row))
for row in cursor.fetchall()
]
def get_foo():
cursor = connection.cursor()
cursor.execute('SELECT * FROM foo_table')
rows = dictfetchall(cursor)
return rows
To get access to my database content, I have basicly two options :
Option 1 :
from webapp.models import Foo
bar = Foo.objects.raw('SELECT * FROM foo_table')
Option 2 :
from application.models import get_foo
bar = get_foo()
Which option is the fastest in execution ?
Is there a better way to do what I want to do ?
There is no direct and clear answer on which approach is better.
Using Manager.raw() still keeps you within the ORM layer and while it returns Model instances you still have a nice database abstraction. But, while making a raw query, django does more than just cursor.execute in order to translate the results into Model instances (see what is happening in RawQuerySet and RawQuery classes).
But (quote from docs):
Sometimes even Manager.raw() isn’t quite enough: you might need to
perform queries that don’t map cleanly to models, or directly execute
UPDATE, INSERT, or DELETE queries.
So, generally speaking, what to choose depends on what results are going to get and what you are going to do with them.
See also:
Performing raw SQL queries
executing-custom-sql-directly
Raw sql queries in Django views
Using the connection cursor is for sure the faster than using raw() as it doesn't instantiate additionals objects... But for really telling what the fastest solution is you should do some benchmarking!
And don't overdo optimizations if not necessary because you are avoiding some of Django's most useful features this way as long as you don't have any serious performance problems. And if you have some they will most likely not be the result of how you execute the query. Of course you will be able to write better queries if you exactly know your use case and the ORM doesn't.

Modify data as part of an alembic upgrade

I would like to modify some database data as part of an alembic upgrade.
I thought I could just add any code in the upgrade of my migration, but the following fails:
def upgrade():
### commands auto generated by Alembic - please adjust! ###
op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(), sa.ForeignKey('smsmessagepart.id'), nullable=True))
### end Alembic commands ###
from volunteer.models import DBSession, SmsDelivery, SmsMessagePart
for sms_delivery in DBSession.query(SmsDelivery).all():
message_part = DBSession.query(SmsMessagePart).filter(SmsMessagePart.message_id == sms_delivery.message_id).first()
if message_part is not None:
sms_delivery.sms_message_part = message_part
with the following error:
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|SmsDelivery|smsdelivery, SQL expression or this Session
I am not really understanding this error. How can I fix this or is doing operations like this not a possibility?
It is difficult to understand what exactly you are trying to achieve from the code excerpt your provided. But I'll try to guess. So the following answer will be based on my guess.
Line 4 - you import things (DBSession, SmsDelivery, SmsMessagePart) form your modules and then you are trying to operate with these objects like you do in your application.
The error shows that SmsDelivery is a mapper object - so it is pointing to some table. mapper objects should bind to valid sqlalchemy connection.
Which tells me that you skipped initialization of DB objects (connection and binding this connection to mapper objects) like you normally do in your application code.
DBSession looks like SQLAlchemy session object - it should have connection bind too.
Alembic already has connection ready and open - for making changes to db schema you are requesting with op.* methods.
So there should be way to get this connection.
According to Alembic manual op.get_bind() will return current Connection bind:
For full interaction with a connected database, use the “bind” available from the context:
from alembic import op
connection = op.get_bind()
So you may use this connection to run your queries into db.
PS. I would assume you wanted to perform some modifications to data in your table. You may try to formulate this modification into one update query. Alembic has special method for executing such changes - so you would not need to deal with connection.
alembic.operations.Operations.execute
execute(sql, execution_options=None)
Execute the given SQL using the current migration context.
In a SQL script context, the statement is emitted directly to the output stream. There is no return result, however, as this function is oriented towards generating a change script that can run in “offline” mode.
Parameters: sql – Any legal SQLAlchemy expression, including:
a string a sqlalchemy.sql.expression.text() construct.
a sqlalchemy.sql.expression.insert() construct.
a sqlalchemy.sql.expression.update(),
sqlalchemy.sql.expression.insert(), or
sqlalchemy.sql.expression.delete() construct. Pretty much anything
that’s “executable” as described in SQL Expression Language Tutorial.
Its worth noting that if you do this, you probably want to freeze a copy of your orm model inside the migration, like this:
class MyType(Base):
__tablename__ = 'existing_table'
__table_args__ = {'extend_existing': True}
id = Column(Integer, ...)
..
def upgrade():
Base.metadata.bind = op.get_bind()
for item in Session.query(MyType).all():
...
Otherwise you'll inevitably end up in a situation where you orm model changes and previous migrations no longer work.
Particularly note that you want to extend Base, not the base type itself (app.models.MyType) because your type might go away as some point, and once again, your migrations will fail.
You need to import Base also and then
Base.metatada.bind = op.get_bind()
and after this you can use your models like always without errors.

Django. Thread safe update or create.

We know, that update - is thread safe operation.
It means, that when you do:
SomeModel.objects.filter(id=1).update(some_field=100)
Instead of:
sm = SomeModel.objects.get(id=1)
sm.some_field=100
sm.save()
Your application is relativly thread safe and operation SomeModel.objects.filter(id=1).update(some_field=100) will not rewrite data in other model fields.
My question is.. If there any way to do
SomeModel.objects.filter(id=1).update(some_field=100)
but with creation of object if it does not exists?
from django.db import IntegrityError
def update_or_create(model, filter_kwargs, update_kwargs)
if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
kwargs = filter_kwargs.copy()
kwargs.update(update_kwargs)
try:
model.objects.create(**kwargs)
except IntegrityError:
if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
raise # re-raise IntegrityError
I think, code provided in the question is not very demonstrative: who want to set id for model?
Lets assume we need this, and we have simultaneous operations:
def thread1():
update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 1})
def thread2():
update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 2})
With update_or_create function, depends on which thread comes first, object will be created and updated with no exception. This will be thread-safe, but obviously has little use: depends on race condition value of SomeModek.objects.get(some__unique_field=1).some_field could be 1 or 2.
Django provides F objects, so we can upgrade our code:
from django.db.models import F
def thread1():
update_or_create(SomeModel,
{'some_unique_field':1},
{'some_field': F('some_field') + 1})
def thread2():
update_or_create(SomeModel,
{'some_unique_field':1},
{'some_field': F('some_field') + 2})
You want django's select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.
try:
with transaction.commit_on_success():
SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError: #unique id already exists, so update instead
with transaction.commit_on_success():
object = SomeModel.objects.select_for_update().get(pk=1)
object.some_field=100
object.save()
Note that if some other process deletes the object between the two queries, you'll get a SomeModel.DoesNotExist exception.
Django 1.7 and above also has atomic operation support and a built-in update_or_create() method.
You can use Django's built-in get_or_create, but that operates on the model itself, rather than a queryset.
You can use that like this:
me = SomeModel.objects.get_or_create(id=1)
me.some_field = 100
me.save()
If you have multiple threads, your app will need to determine which instance of the model is correct. Usually what I do is refresh the model from the database, make changes, and then save it, so you don't have a long time in a disconnected state.
It's impossible in django do such upsert operation, with update. But queryset update method return number of filtered fields so you can do:
from django.db import router, connections, transaction
class MySuperManager(models.Manager):
def _lock_table(self, lock='ACCESS EXCLUSIVE'):
cursor = connections[router.db_for_write(self.model)]
cursor.execute(
'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)
)
def create_or_update(self, id, **update_fields):
with transaction.commit_on_success():
self.lock_table()
if not self.get_query_set().filter(id=id).update(**update_fields):
self.model(id=id, **update_fields).save()
this example if for postgres, you can use it without sql code, but update or insert operation will not be atomic. If you create a lock on table you will be sure that two objects will be not created in two other threads.
I think if you have critical demands on atom operations. You'd better design it in database level instead of Django ORM level.
Django ORM system is focusing on convenience instead of performance and safety. You have to optimize the automatic generated SQL sometimes.
"Transaction" in most productive databases provide database lock and rollback well.
In mashup(hybrid) systems, or say your system added some 3rd part components, like logging, statistics. Application in different framework or even language may access database at the same time, adding thread safe in Django is not enough in this case.
SomeModel.objects.filter(id=1).update(set__some_field=100)

What is the best way to access stored procedures in Django's ORM

I am designing a fairly complex database, and know that some of my queries will be far outside the scope of Django's ORM. Has anyone integrated SP's with Django's ORM successfully? If so, what RDBMS and how did you do it?
We (musicpictures.com / eviscape.com) wrote that django snippet but its not the whole story (actually that code was only tested on Oracle at that time).
Stored procedures make sense when you want to reuse tried and tested SP code or where one SP call will be faster than multiple calls to the database - or where security requires moderated access to the database - or where the queries are very complicated / multistep. We're using a hybrid model/SP approach against both Oracle and Postgres databases.
The trick is to make it easy to use and keep it "django" like. We use a make_instance function which takes the result of cursor and creates instances of a model populated from the cursor. This is nice because the cursor might return additional fields. Then you can use those instances in your code / templates much like normal django model objects.
def make_instance(instance, values):
'''
Copied from eviscape.com
generates an instance for dict data coming from an sp
expects:
instance - empty instance of the model to generate
values - dictionary from a stored procedure with keys that are named like the
model's attributes
use like:
evis = InstanceGenerator(Evis(), evis_dict_from_SP)
>>> make_instance(Evis(), {'evi_id': '007', 'evi_subject': 'J. Bond, Architect'})
<Evis: J. Bond, Architect>
'''
attributes = filter(lambda x: not x.startswith('_'), instance.__dict__.keys())
for a in attributes:
try:
# field names from oracle sp are UPPER CASE
# we want to put PIC_ID in pic_id etc.
setattr(instance, a, values[a.upper()])
del values[a.upper()]
except:
pass
#add any values that are not in the model as well
for v in values.keys():
setattr(instance, v, values[v])
#print 'setting %s to %s' % (v, values[v])
return instance
# Use it like this:
pictures = [make_instance(Pictures(), item) for item in picture_dict]
# And here are some helper functions:
def call_an_sp(self, var):
cursor = connection.cursor()
cursor.callproc("fn_sp_name", (var,))
return self.fn_generic(cursor)
def fn_generic(self, cursor):
msg = cursor.fetchone()[0]
cursor.execute('FETCH ALL IN "%s"' % msg)
thing = create_dict_from_cursor(cursor)
cursor.close()
return thing
def create_dict_from_cursor(cursor):
rows = cursor.fetchall()
# DEBUG settings (used to) affect what gets returned.
if DEBUG:
desc = [item[0] for item in cursor.cursor.description]
else:
desc = [item[0] for item in cursor.description]
return [dict(zip(desc, item)) for item in rows]
cheers, Simon.
You have to use the connection utility in Django:
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("SQL STATEMENT CAN BE ANYTHING")
data = cursor.fetchone()
If you are expecting more than one row, use cursor.fetchall() to fetch a list of them.
More info here: http://docs.djangoproject.com/en/dev/topics/db/sql/
Don't.
Seriously.
Move the stored procedure logic into your model where it belongs.
Putting some code in Django and some code in the database is a maintenance nightmare. I've spent too many of my 30+ years in IT trying to clean up this kind of mess.
There is a good example :
https://djangosnippets.org/snippets/118/
from django.db import connection
cursor = connection.cursor()
ret = cursor.callproc("MY_UTIL.LOG_MESSAGE", (control_in, message_in))# calls PROCEDURE named LOG_MESSAGE which resides in MY_UTIL Package
cursor.close()
If you want to look at an actual running project that uses SP, check out minibooks. A good deal of custom SQL and uses Postgres pl/pgsql for SP. I think they're going to remove the SP eventually though (justification in trac ticket 92).
I guess the improved raw sql queryset support in Django 1.2 can make this easier as you wouldn't have to roll your own make_instance type code.
Cx_Oracle can be used. Also, It is fairly helpful when we do not have access to production deployed code and need arises to make major changes in database.
import cx_Oracle
try:
db = dev_plng_con
con = cx_Oracle.connect(db)
cur = con.cursor()
P_ERROR = str(error)
cur.callproc('NAME_OF_PACKAGE.PROCEDURENAME', [P_ERROR])
except Exception as error:
error_logger.error(message)

Categories

Resources