I was studying Django when I found that it chains query methods, as in Post.objects.filter(pk=1).filter(title='first').filter(author='me'), to construct a query without actually executing it. The query only runs when we try to access and work with its result.
That made me curious about how this is done, so that I can apply the same approach in my own work. For instance, I would like to:
Write code like myProduct.discount('10%').discountLimit('100$').tax('10$').shipping('20$') that is only evaluated when I try to work with the result.
Build a custom DB manager for non-Django apps where I can chain my query methods and have the query execute automatically only when I try to access its result (or at least when the chain ends), so that I end up with something like:
#doesn't hit the DB
myPost = Post.objects.select(...).where(...).where(...).limit(...)
#only hit the DB on usage
print(myPost.title)
So my question is: how can I do this?
Approaches that I thought of but don't like:
I can implement an .execute() method that performs the actual execution, calling it at the tail of the chain or whenever desired: Post.objects.select(x).where(y).offset(z).execute()
I can insert a delay within each of the query builder methods to check whether it was the last call in the chain:
import time

class Post:
    def where(self, *args):
        me = time.time()
        self.lastCall = me
        # process the inputs here
        self.query += "WHERE ..."
        self.lazyExecute(me)
        return self

    def lazyExecute(self, identifier):
        time.sleep(5)  # wait 5 seconds, then check whether a later call replaced this one
        if self.lastCall == identifier:
            self.executeQuery()
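One way this kind of laziness could be built, as a minimal sketch rather than Django's actual implementation: each chained method only records state and returns a new object, and the query runs the first time the object is iterated, indexed, or passed to len(). The names LazyQuery, run_query and fake_backend are hypothetical, and the backend is a plain function standing in for a real database call.

```python
class LazyQuery:
    """Hypothetical chainable query: collects state, runs only when results are needed."""

    def __init__(self, run_query, conditions=(), limit=None):
        self._run_query = run_query          # callable that actually hits the data store
        self._conditions = tuple(conditions)
        self._limit = limit
        self._cache = None                   # filled on first evaluation

    def _clone(self, **changes):
        state = {"conditions": self._conditions, "limit": self._limit}
        state.update(changes)
        return LazyQuery(self._run_query, **state)

    def where(self, condition):
        # No execution here: return a new object carrying one more condition.
        return self._clone(conditions=self._conditions + (condition,))

    def limit(self, n):
        return self._clone(limit=n)

    def _fetch(self):
        if self._cache is None:
            self._cache = list(self._run_query(self._conditions, self._limit))
        return self._cache

    # Evaluation is triggered by the consuming protocols, not by the chain itself.
    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())

    def __getitem__(self, index):
        return self._fetch()[index]


calls = []  # records every simulated database hit


def fake_backend(conditions, limit):
    calls.append((conditions, limit))
    rows = [{"title": "first"}, {"title": "second"}]
    return rows[:limit]


query = LazyQuery(fake_backend).where("author = 'me'").where("id > 0").limit(1)
hits_before_use = len(calls)               # nothing has executed yet
first_title = query[0]["title"]            # first access triggers the fetch
titles = [row["title"] for row in query]   # served from the cached result
```

Because every method returns a fresh object, a partially built query can be reused as a base for several different chains without the branches affecting each other.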
Related
I'm using Django 1.11 on Python 3.7.
In a method I want to execute some database queries, mainly updating links between objects, and I want to use the same method to check what would need to be updated in a sync operation. The following is an implementation:
results = {}
with transaction.atomic():
    sid = transaction.savepoint()
    for speaker_user in speaker_users:
        # here my code checks all sorts of things, updates the database with
        # new connections between objects and stores them all in the
        # results-dict, using a lot of code in other classes which
        # I really don't want to change for this operation
    if sync_test_only:
        transaction.savepoint_rollback(sid)
    else:
        transaction.savepoint_commit(sid)
return results
This snippet is used in a method with the sync_test_only parameter that should only fill the results-dict without doing the database changes that go along with it.
So this method can be used to do the actual work when sync_test_only is False, and to only report the work to be done when sync_test_only is True.
Is this what the transaction.atomic() is designed for? Does this actually work in my use-case? If not, what would be a better way to achieve this behaviour?
Another option would be to use exceptions, like the docs suggest (read the part under the title "You may need to manually revert model state when rolling back a transaction"):
class MyException(Exception):
    pass

def f(do_commit=False):
    results = {}
    try:
        with transaction.atomic():
            for speaker_user in speaker_users:
                pass
            if not do_commit:
                raise MyException
    except MyException:
        # do nothing here
        pass
    return results
I suggest creating a custom exception so you don't accidentally catch something that was raised somewhere else in the code.
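The same "raise a sentinel exception to force a rollback" pattern can be tried outside Django. The following is only a sketch using the standard library's sqlite3 module, not Django's transaction API; the link table, the sync_links function and the DryRun exception are made up for illustration. It also shows why in-memory state may need attention after a rollback: the database writes are undone, but the results dict keeps whatever was put into it.

```python
import sqlite3


class DryRun(Exception):
    """Hypothetical sentinel, raised only to abort the transaction on purpose."""


def sync_links(conn, names, do_commit=False):
    results = {}
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name in names:
                conn.execute("INSERT INTO link (name) VALUES (?)", (name,))
                results[name] = "linked"
            if not do_commit:
                raise DryRun
    except DryRun:
        pass  # the inserts are rolled back; results is still filled in
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (name TEXT)")

report = sync_links(conn, ["alice", "bob"], do_commit=False)
rows_after_dry_run = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]

sync_links(conn, ["alice", "bob"], do_commit=True)
rows_after_real_run = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]
```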
I want to minimize the number of database queries my application makes, and I am familiarizing myself more with Django's ORM. I am wondering in which cases a query is actually executed.
For instance, this format is along the lines of the answer I'm looking for (for example purposes, not accurate to my knowledge):
Model.objects.get()
Always launches a query
Model.objects.filter()
Launches a query if objects is empty only
(...)
I am assuming curried filter operations never make additional requests, but from the docs it looks like filter() does indeed make database requests if it's the first thing called.
If you're using test cases, you can use this custom assertion included in Django's TestCase: assertNumQueries().
Example:
with self.assertNumQueries(2):
    x = SomeModel.objects.get(pk=1)
    y = x.some_foreign_key_in_object
If the expected number of queries was wrong, you'd see an assertion failed message of the form:
Num queries (expected - actual):
2 : 5
In this example, the foreign key would cause an additional query even though there's no explicit query (get, filter, exclude, etc.).
For this reason, I would take a practical approach: use tests or logging instead of trying to memorize every case in which Django is supposed to query.
If you don't use unit tests, you can use this other method, which prints the actual SQL statements sent by Django, so you get an idea of the complexity of each query and not just the number of queries:
(The DEBUG setting must be True.)
from django.db import connection
x = SomeModel.objects.get(pk=1)
y = x.some_foreign_key_in_object
print(connection.queries)
The print would show a list of dictionaries, one per query:
[
{'sql': 'SELECT a, b, c, d ... FROM app_some_model', 'time': '0.002'},
{'sql': 'SELECT j, k, ... FROM app_referenced_model JOIN ... blabla ',
'time': '0.004'}
]
Docs on connection.queries.
Of course, you can also combine both methods and print connection.queries in your test cases.
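For code that does not go through Django at all, a comparable "log what is actually sent" check can be sketched with the standard library's sqlite3 module and its set_trace_callback() hook. This is an analogy to connection.queries, not a replacement for it, and the author and post tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
conn.execute("INSERT INTO author VALUES (1, 'me')")
conn.execute("INSERT INTO post VALUES (1, 'first', 1)")
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # record every statement from here on

post = conn.execute("SELECT id, title, author_id FROM post WHERE id = 1").fetchone()
# Following the "foreign key" by hand costs a second round trip, much like the
# attribute access in the ORM example.
author = conn.execute("SELECT name FROM author WHERE id = ?", (post[2],)).fetchone()

conn.set_trace_callback(None)  # stop recording
query_count = len(statements)
```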
See Django's documentation on when querysets are evaluated: https://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
Evaluation in this case means that the query is executed. This mostly happens when you try to access the results, e.g. when calling list() or len() on the queryset or iterating over it.
get() in your example doesn't return a queryset but a model object, so it is evaluated immediately.
If I want to check for the existence and if possible retrieve an object, which of the following methods is faster? More idiomatic? And why? If not either of the two examples I list, how else would one go about doing this?
if Object.objects.get(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)

my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about MySQL and PostgreSQL for this.
Why not do this in a try/except block, which avoids both the multiple queries and the query-then-if pattern?
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.
The Django documentation provides a pretty good overview of exists().
Your first example will run the query twice. According to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to use the object after checking for existence, the docs suggest just using it and forcing evaluation once with:
if my_object:
    pass
Imagine you have the following situation:
for i in xrange(100000):
    account = Account()
    account.foo = i
    account.save()
Obviously, the 100,000 INSERT statements executed by Django are going to take some time. It would be nicer to be able to combine all those INSERTs into one big INSERT. Here's the kind of thing I'm hoping I can do:
inserts = []
for i in xrange(100000):
    account = Account()
    account.foo = i
    inserts.append(account.insert_sql)
sql = 'INSERT INTO whatever... ' + ', '.join(inserts)
Is there a way to do this using QuerySet, without manually generating all those INSERT statements?
As shown in this related question, one can use @transaction.commit_manually to combine all the .save() operations into a single commit, which greatly improves performance.
@transaction.commit_manually
def your_view(request):
    try:
        for i in xrange(100000):
            account = Account()
            account.foo = i
            account.save()
    except:
        transaction.rollback()
    else:
        transaction.commit()
Alternatively, if you're feeling adventurous, have a look at this snippet which implements a manager for bulk inserting. Note that it works only with MySQL, and hasn't been updated in a while so it's hard to tell if it will play nice with newer versions of Django.
You could use raw SQL.
Either via Account.objects.raw() or by using a django.db.connection object.
This might not be an option if you want to maintain database agnosticism.
http://docs.djangoproject.com/en/dev/topics/db/sql/
If what you're doing is a one-time setup, perhaps using a fixture would be better.
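To illustrate the batching idea the question is after, here is a small sketch with the standard library's sqlite3 module rather than Django: all rows are handed to the driver in one executemany() call inside a single transaction, instead of one statement and one commit per row. The account table mirrors the question's example and is otherwise an assumption. Within Django itself, QuerySet.bulk_create() (available since Django 1.4) is probably the closest built-in equivalent, and on versions where commit_manually no longer exists (it was removed in 1.8), transaction.atomic() takes over the single-commit role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (foo INTEGER)")

rows = [(i,) for i in range(1000)]

with conn:  # one transaction around the whole batch, committed once at the end
    conn.executemany("INSERT INTO account (foo) VALUES (?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
largest = conn.execute("SELECT MAX(foo) FROM account").fetchone()[0]
```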
In my AppEngine project I have a need to use a certain filter as a base then apply various different extra filters to the end, retrieving the different result sets separately. e.g.:
basequery = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various sub queries separately:
subquery1 = basequery.filter('subfilter1', 'xyz')
#Do something with subquery1 results here
subquery2 = basequery.filter('subfilter2', 'abc')
#Do something with subquery2 results here
Unfortunately 'filter()' affects the state of the basequery Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duping an object that could be used?
The extra filters are actually applied by the results of different forms dynamically within a wizard, and they use the 'running total' of the query in their branch to assess whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.
There's no officially approved (i.e., not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.
As Nick has said, you had better create the query again, but you can still avoid repeating yourself. A good way to do that would be like this:
#inside a request handler
def create_base_query():
    return MyModel.all().filter('mainfilter', 123)

subquery1 = create_base_query().filter('subfilter1', 'xyz')
#Do something with subquery1 results here
subquery2 = create_base_query().filter('subfilter2', 'abc')
#Do something with subquery2 results here
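On the "standard Python way of duping an object" part of the question: the copy module is the generic tool for that. The sketch below uses a hypothetical Query class whose filter() mutates and returns self, as described in the question. Whether copy.deepcopy() behaves correctly on the real App Engine Query object is not verified here, which is consistent with the advice above to prefer the factory function.

```python
import copy


class Query:
    """Hypothetical stand-in for a query whose filter() mutates and returns self."""

    def __init__(self):
        self.filters = []

    def filter(self, name, value):
        self.filters.append((name, value))
        return self


def create_base_query():
    # Factory approach from the answer: build a fresh base query each time.
    return Query().filter("mainfilter", 123)


subquery1 = create_base_query().filter("subfilter1", "xyz")
subquery2 = create_base_query().filter("subfilter2", "abc")

# Generic alternative: deep-copy the base before branching off it,
# so the base itself is left untouched.
base = create_base_query()
subquery3 = copy.deepcopy(base).filter("subfilter3", "def")
```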