Assuming that the file models.py in my Django application (webapp) looks like the following:
from django.db import models
from django.db import connection


class Foo(models.Model):
    name = models.CharField(...)
    surname = models.CharField(...)


def dictfetchall(cursor):
    "Returns all rows from a cursor as a dict"
    desc = cursor.description
    return [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]


def get_foo():
    cursor = connection.cursor()
    cursor.execute('SELECT * FROM foo_table')
    rows = dictfetchall(cursor)
    return rows
To get access to my database content, I have basically two options:
Option 1 :
from webapp.models import Foo
bar = Foo.objects.raw('SELECT * FROM foo_table')
Option 2 :
from webapp.models import get_foo
bar = get_foo()
Which option is faster in execution?
Is there a better way to do what I want to do?
There is no direct and clear answer on which approach is better.
Using Manager.raw() still keeps you within the ORM layer, and while it returns Model instances you still have a nice database abstraction. But, while making a raw query, Django does more than just cursor.execute in order to translate the results into Model instances (see what is happening in the RawQuerySet and RawQuery classes).
But (quote from docs):
Sometimes even Manager.raw() isn’t quite enough: you might need to
perform queries that don’t map cleanly to models, or directly execute
UPDATE, INSERT, or DELETE queries.
So, generally speaking, what to choose depends on what results you are going to get and what you are going to do with them.
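To make the difference concrete, here is a rough sketch (using the models.py above) of what each option hands back:

from webapp.models import Foo, get_foo

# Option 1: a RawQuerySet that yields Foo model instances.
for foo in Foo.objects.raw('SELECT * FROM foo_table'):
    print(foo.name)     # attribute access, full model behaviour available

# Option 2: a plain list of dicts, exactly what the cursor returned.
for row in get_foo():
    print(row['name'])  # key access, no model layer involved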
See also:
Performing raw SQL queries
executing-custom-sql-directly
Raw sql queries in Django views
Using the connection cursor is for sure faster than using raw(), as it doesn't instantiate additional objects... But to really tell which solution is the fastest you should do some benchmarking!
And don't overdo optimizations if they aren't necessary, because you would be avoiding some of Django's most useful features this way; hold off as long as you don't have any serious performance problems. And if you do have some, they will most likely not be the result of how you execute the query. Of course, you will be able to write better queries if you know your use case exactly and the ORM doesn't.
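If you do benchmark, a minimal sketch with timeit (run it in a Django shell against your own data; numbers will vary) could look like this:

import timeit

from webapp.models import Foo, get_foo

# list() forces full evaluation of the RawQuerySet so both paths do
# comparable work.
orm_time = timeit.timeit(
    lambda: list(Foo.objects.raw('SELECT * FROM foo_table')), number=100)
cursor_time = timeit.timeit(lambda: get_foo(), number=100)

print('raw():  %.3fs' % orm_time)
print('cursor: %.3fs' % cursor_time)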
Related
I was writing a few simple CRUD operations to try out sqlite3 with Python, and then I saw a nice function that executes queries and closes the connection in this answer:
from contextlib import closing
import sqlite3

def query(db_name, sql):
    with closing(sqlite3.connect(db_name)) as con, con, \
            closing(con.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
I thought it would be nice to have something like this and call this function with whatever sql sentence I need whenever I want to query the database.
However, when I'm running an insert I'd need to return cur.lastrowid instead of cur.fetchall(), and when deleting I'd like to know cur.rowcount instead. Also, sometimes I need to add parameters to the query: for instance, sometimes I want to run select * from [some_table] and some other times I need select * from [some_table] where [some_column] = ?. So the function needs some tweaks depending on what kind of operation is being executed.
I could write one function for each kind of operation, with the same basic structure and the tweaks each query needs. But that sounds a bit repetitive since there would be duplicate chunks of code and these functions would look pretty similar to each other. So I'm not sure it's the right approach.
Is there another alternative to make this function a bit more "generic" to fit all cases?
One option is to have callouts in the with block that let you customize program actions. There are many ways to do this. One is to write a class that calls methods to allow specialization. In this example, the class has pre- and post-processors. It does its work in __init__ and leaves its result in an instance variable, which allows for terse usage.
from contextlib import closing
import sqlite3

class SqlExec:
    def __init__(self, db_name, sql, parameters=()):
        self.sql = sql
        self.parameters = parameters
        # The connection doubles as a context manager so writes are committed.
        with closing(sqlite3.connect(db_name)) as self.con, self.con, \
                closing(self.con.cursor()) as self.cur:
            self.pre_process()
            self.cur.execute(self.sql, self.parameters)
            self.retval = self.post_process()

    def pre_process(self):
        return

    def post_process_fetchall(self):
        return self.cur.fetchall()

    post_process = post_process_fetchall


class SqlExecLastRowId(SqlExec):
    def post_process(self):
        return self.cur.lastrowid


last_row = SqlExecLastRowId("mydb.db", "DELETE FROM FOO WHERE BAR=?",
                            parameters=("baz",)).retval
I am running tests on some functions. I have a function that uses database queries. I have gone through the blogs and docs that say we have to make an in-memory or test database to use such functions. Below is my function,
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it
    # already exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. My question is: when testing, I create an in-memory database, populate my values there, and want to query that database (the test DB). But when I call my already_exists() function from the test, the production DB is hit. How do I make the test DB get hit while testing this function?
There are two routes you can take to address this problem:
1. Make an integration test instead of a unit test and just use a copy of the real database.
2. Provide a fake to the method instead of an actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production.

Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language like Alembic, or tools to execute raw SQL like yoyo or Flyway. You would need to invoke the deployment and fill it with test data prior to running the test, then run the test and assert the output you expect to be returned.
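As an illustration only, the shape of such a test might be as follows (a sketch that uses an in-memory sqlite3 database purely to stay self-contained; per the above, you would instead deploy and seed a copy of your real database, and Story is a hypothetical stand-in for your object):

import sqlite3

def test_already_exists_finds_seeded_post_id():
    # In a real integration test the schema and seed data would come from
    # your migration/deployment tooling, not inline SQL.
    con = sqlite3.connect(':memory:')
    c = con.cursor()
    c.execute('CREATE TABLE posts (post_id INTEGER)')
    c.execute('INSERT INTO posts VALUES (?)', (10,))

    story = Story(post_id=10)  # hypothetical object
    assert already_exists(story, c)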
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand-ins that can be configured to accept the function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock

...

def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.

    result = already_exists(story, mockConn)

    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make the connection an attribute of each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists method.
q = session.query(Post).filter(Post.post_id == story_data.post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
A short introduction to the problem...
PostgreSQL has very neat array fields (int array, string array) and functions for them like UNNEST and ANY.
These fields are supported by Django (I am using djorm_pgarray for that), but functions are not natively supported.
One could use .extra(), but Django 1.8 introduced a new concept of database functions.
Let me give a minimal example of what I am basically doing with all of this. A Dealer has a list of makes that it supports. A Vehicle has a make and is linked to a dealer. But it happens that a Vehicle's make does not match its Dealer's make list; that is inevitable.
MAKE_CHOICES = [('honda', 'Honda'), ...]

class Dealer(models.Model):
    make_list = TextArrayField(choices=MAKE_CHOICES)

class Vehicle(models.Model):
    dealer = models.ForeignKey(Dealer, null=True, blank=True)
    make = models.CharField(max_length=255, choices=MAKE_CHOICES, blank=True)
Having a database of dealers and makes, I want to count all vehicles whose make matches their dealer's make list. Here is how I do it while avoiding .extra().
from django.db.models import functions

class SelectUnnest(functions.Func):
    function = 'SELECT UNNEST'

...

Vehicle.objects.filter(
    make__in=SelectUnnest('dealer__make_list')
).count()
Resulting SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make"
IN (SELECT UNNEST("myapp_dealer"."make_list"))
And it works, and it is much faster than a traditional M2M approach we could use in Django. BUT, for this task, UNNEST is not a very good solution: ANY is much faster. Let's try it.
class Any(functions.Func):
    function = 'ANY'

...

Vehicle.objects.filter(
    make=Any('dealer__make_list')
).count()
It generates the following SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make" =
(ANY("myapp_dealer"."make_list"))
And it fails, because the parentheses around ANY are bogus. If you remove them, the query runs in the psql console with no problems, and fast.
So, my question:
Is there any way to remove these parentheses? I could not find anything about that in the Django documentation.
If not, maybe there are other ways to phrase this query?
P. S. I think that an extensive library of database functions for different backends would be very helpful for database-heavy Django apps.
Of course, most of these will not be portable. But you typically do not migrate such a project from one database backend to another. In our example, using array fields and PostGIS, we are stuck with PostgreSQL and do not intend to move.
Is anybody developing such a thing?
P. P. S. One might say that, in this case, we should be using a separate table for makes and intarray instead of a string array; that is correct and will be done, but the nature of the problem does not change.
UPDATE.
TextArrayField is defined at djorm_pgarray. At the linked source file, you can see how it works.
The value is list of text strings. In Python, it is represented as a list. Example: ['honda', 'mazda', 'anything else'].
Here is what it looks like in the database.
=# select id, make from appname_tablename limit 3;
 id |         make
----+----------------------
 58 | {vw}
 76 | {lexus,scion,toyota}
 39 | {chevrolet}
And underlying PostgreSQL field type is text[].
I've managed to get (more or less) what you need using the following:
from django.db.models import F  # used in the query below
from django.db.models.lookups import BuiltinLookup
from django.db.models.fields import Field

class Any(BuiltinLookup):
    lookup_name = 'any'

    def get_rhs_op(self, connection, rhs):
        return " = ANY(%s)" % (rhs,)

Field.register_lookup(Any)
and query:
Vehicle.objects.filter(make__any=F('dealer__make_list')).count()
as result:
SELECT COUNT(*) AS "__count" FROM "zz_vehicle"
INNER JOIN "zz_dealer" ON ("zz_vehicle"."dealer_id" = "zz_dealer"."id")
WHERE "zz_vehicle"."make" = ANY(("zz_dealer"."make_list"))
By the way, instead of djorm_pgarray and TextArrayField you can use Django's native ArrayField:
from django.contrib.postgres.fields import ArrayField

make_list = ArrayField(models.CharField(max_length=200), blank=True)
(to simplify your dependencies)
To empty a database table, I use this SQL Query:
TRUNCATE TABLE `books`
How do I truncate a table using Django's models and ORM?
I've tried this, but it doesn't work:
Book.objects.truncate()
The closest you'll get with the ORM is Book.objects.all().delete().
There are differences though: truncate will likely be faster, but the ORM will also chase down foreign key references and delete objects in other tables.
You can do this in a fast and lightweight way, but not using Django's ORM. You may execute raw SQL with a Django connection cursor:
from django.db import connection
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE `books`")
You can use the model's _meta property to fill in the database table name:
from django.db import connection
cursor = connection.cursor()
cursor.execute('TRUNCATE TABLE "{0}"'.format(MyModel._meta.db_table))
Important: This does not work for inherited models as they span multiple tables!
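If you do need to handle multi-table inheritance, one possible sketch (assuming PostgreSQL; _meta.parents lists the model's direct concrete parents, each of which has its own table):

from django.db import connection

def truncate_model(model):
    # The model's own table plus one table per direct concrete parent.
    # For deeper hierarchies you would recurse through the parents.
    tables = [model._meta.db_table]
    tables += [parent._meta.db_table for parent in model._meta.parents]
    with connection.cursor() as cursor:
        for table in tables:
            cursor.execute('TRUNCATE TABLE "{}" CASCADE'.format(table))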
In addition to Ned Batchelder's answer and referring to Bernhard Kircher's comment:
In my case I needed to empty a very large database using the webapp:
Book.objects.all().delete()
Which, in the development SQLite environment, returned:
too many SQL variables
So I added a little workaround. It may not be the neatest, but at least it works until the truncate table option is built into Django's ORM:
countdata = Book.objects.all().count()
logger.debug("Before deleting: %s data records" % countdata)
while countdata > 0:
    if countdata > 999:
        objects_to_keep = Book.objects.all()[999:]
        Book.objects.all().exclude(pk__in=objects_to_keep).delete()
        countdata = Book.objects.all().count()
    else:
        Book.objects.all().delete()
        countdata = Book.objects.all().count()
By the way, some of my code was based on "Django Delete all but last five of queryset".
I added this while being aware the question was already answered, but hopefully this addition will help some other people.
I know this is a very old question and a few correct answers are here already, but I can't resist sharing the most elegant and fastest way to serve the purpose of this question.
from django.db import connection

class Book(models.Model):
    # Your Model Declaration

    @classmethod
    def truncate(cls):
        with connection.cursor() as cursor:
            cursor.execute('TRUNCATE TABLE {} CASCADE'.format(cls._meta.db_table))
And now, to truncate all data from the Book table, just call:
Book.truncate()
Since this interacts directly with your database, it will perform much faster than doing this:
Book.objects.all().delete()
Now there's a library to help you truncate a specific table in your Django project's database. It's called django-truncate.
It's simple: just run python manage.py truncate --apps myapp --models Model1 and all of the data in that table will be deleted!
Learn more about it here: https://github.com/KhaledElAnsari/django-truncate
For me, to truncate my local SQLite database, I ended up with python manage.py flush.
What I initially tried was to iterate over the models and delete all rows one model at a time:
from django.apps import apps

models = [m for c in apps.get_app_configs() for m in c.get_models(include_auto_created=False)]
for m in models:
    m.objects.all().delete()
But because I have protected foreign keys, the success of the operation depended on the order of the models.
So I am using the flush command to truncate my local test database, and it is working for me:
https://docs.djangoproject.com/en/3.0/ref/django-admin/#django-admin-flush
This code uses the PostgreSQL dialect. Leave out the cascade bits to use standard SQL.
Following up on Shubho Shaha's answer, you could also create a model manager for this.
from django.db import connection, models

class TruncateManager(models.Manager):
    def truncate(self, cascade=False):
        appendix = " CASCADE;" if cascade else ";"
        raw_sql = f"TRUNCATE TABLE {self.model._meta.db_table}{appendix}"
        cursor = connection.cursor()
        cursor.execute(raw_sql)

class Truncatable(models.Model):
    class Meta:
        abstract = True

    objects = TruncateManager()
Then, you can extend Truncatable to create truncatable objects:
class Book(Truncatable):
...
That will allow you to call truncate on all models that extend from Truncatable.
Book.objects.truncate()
I added a flag to use cascade as well, which (danger zone) will also "automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE". That is obviously more destructive, but it allows the code to run inside an atomic transaction.
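For example, with the manager above, the cascading version can run inside a transaction block (a sketch; TRUNCATE is transactional on PostgreSQL):

from django.db import transaction

# TRUNCATE ... CASCADE can be rolled back on PostgreSQL, so it is safe
# to combine with other operations in one atomic block.
with transaction.atomic():
    Book.objects.truncate(cascade=True)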
This doesn't directly answer the OP's question, but is nevertheless a solution one might use to achieve the same thing - differently.
Well, for some strange reason (while attempting to use the suggested raw methods in the other answers here), I failed to truncate my Django database cache table until I did something like this:
import commands  # Python 2 only; see the subprocess sketch below

cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
commands.getstatusoutput(' '.join(cmd))
Basically, I had to resort to issuing the truncate command via the database's command-line utility - psql in this case, since I am using Postgres. So, automating the command line might handle such corner cases.
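For what it's worth, the commands module was removed in Python 3; a Python 3 sketch of the same workaround would use subprocess, whose getstatusoutput is the direct replacement:

import subprocess

# Same shell invocation as above, via the Python 3 standard library.
cmd = ['psql', DATABASE, 'postgres', '-c', '"TRUNCATE %s;"' % TABLE]
status, output = subprocess.getstatusoutput(' '.join(cmd))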
Might save someone else some time...
I am designing a fairly complex database, and know that some of my queries will be far outside the scope of Django's ORM. Has anyone integrated SPs with Django's ORM successfully? If so, what RDBMS and how did you do it?
We (musicpictures.com / eviscape.com) wrote that Django snippet, but it's not the whole story (actually, that code was only tested on Oracle at the time).
Stored procedures make sense when you want to reuse tried and tested SP code or where one SP call will be faster than multiple calls to the database - or where security requires moderated access to the database - or where the queries are very complicated / multistep. We're using a hybrid model/SP approach against both Oracle and Postgres databases.
The trick is to make it easy to use and keep it "django"-like. We use a make_instance function which takes the result of a cursor and creates instances of a model populated from the cursor. This is nice because the cursor might return additional fields. Then you can use those instances in your code / templates much like normal Django model objects.
def make_instance(instance, values):
    '''
    Copied from eviscape.com

    Generates an instance for dict data coming from an SP.

    expects:
        instance - empty instance of the model to generate
        values - dictionary from a stored procedure with keys that are named
                 like the model's attributes

    use like:
        evis = make_instance(Evis(), evis_dict_from_SP)

    >>> make_instance(Evis(), {'evi_id': '007', 'evi_subject': 'J. Bond, Architect'})
    <Evis: J. Bond, Architect>
    '''
    attributes = [k for k in instance.__dict__ if not k.startswith('_')]
    for a in attributes:
        try:
            # field names from oracle sp are UPPER CASE
            # we want to put PIC_ID in pic_id etc.
            setattr(instance, a, values[a.upper()])
            del values[a.upper()]
        except KeyError:
            pass
    # add any values that are not in the model as well
    for v in values.keys():
        setattr(instance, v, values[v])
        # print('setting %s to %s' % (v, values[v]))
    return instance


# Use it like this:
pictures = [make_instance(Pictures(), item) for item in picture_dict]
# And here are some helper functions (methods on a helper class):

def call_an_sp(self, var):
    cursor = connection.cursor()
    cursor.callproc("fn_sp_name", (var,))
    return self.fn_generic(cursor)


def fn_generic(self, cursor):
    msg = cursor.fetchone()[0]
    cursor.execute('FETCH ALL IN "%s"' % msg)
    thing = create_dict_from_cursor(cursor)
    cursor.close()
    return thing
def create_dict_from_cursor(cursor):
    rows = cursor.fetchall()
    # DEBUG settings (used to) affect what gets returned.
    if DEBUG:
        desc = [item[0] for item in cursor.cursor.description]
    else:
        desc = [item[0] for item in cursor.description]
    return [dict(zip(desc, item)) for item in rows]
cheers, Simon.
You have to use the connection utility in Django:
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute("SQL STATEMENT CAN BE ANYTHING")
    data = cursor.fetchone()
If you are expecting more than one row, use cursor.fetchall() to fetch a list of them.
More info here: http://docs.djangoproject.com/en/dev/topics/db/sql/
Don't.
Seriously.
Move the stored procedure logic into your model where it belongs.
Putting some code in Django and some code in the database is a maintenance nightmare. I've spent too many of my 30+ years in IT trying to clean up this kind of mess.
There is a good example:
https://djangosnippets.org/snippets/118/
from django.db import connection

cursor = connection.cursor()
# Calls the PROCEDURE named LOG_MESSAGE, which resides in the MY_UTIL package
ret = cursor.callproc("MY_UTIL.LOG_MESSAGE", (control_in, message_in))
cursor.close()
If you want to look at an actual running project that uses SPs, check out minibooks. It has a good deal of custom SQL and uses Postgres pl/pgsql for SPs. I think they're going to remove the SPs eventually, though (justification in trac ticket 92).
I guess the improved raw SQL queryset support in Django 1.2 can make this easier, as you wouldn't have to roll your own make_instance-type code.
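For instance, something along these lines (a sketch: fn_get_pictures and owner_id are hypothetical, standing in for a Postgres set-returning function whose columns match the Pictures model):

# Manager.raw() maps cursor columns onto model fields itself,
# which is what make_instance did by hand above.
pictures = Pictures.objects.raw('SELECT * FROM fn_get_pictures(%s)', [owner_id])
for p in pictures:
    print(p.pic_id)  # pic_id: hypothetical field, as in the comments above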
cx_Oracle can be used. It is also fairly helpful when we do not have access to the production deployed code and the need arises to make major changes in the database.
import cx_Oracle

try:
    con = cx_Oracle.connect(dev_plng_con)  # dev_plng_con: your Oracle DSN / connection string
    cur = con.cursor()
    cur.callproc('NAME_OF_PACKAGE.PROCEDURENAME', [P_ERROR])  # P_ERROR: the procedure's argument
except Exception as error:
    error_logger.error(str(error))