I want to do a query based on two fields of a model: a date, offset by an int field that is used as a timedelta:
model.objects.filter(last_date__gte=datetime.now()-timedelta(days=F('interval')))
is a no-go, as an F() expression cannot be passed into a timedelta.
A little digging, and I discovered DateModifierNode - though it seems it was removed in this commit: https://github.com/django/django/commit/cbb5cdd155668ba771cad6b975676d3b20fed37b (from this now-outdated SO question Django: Using F arguments in datetime.timedelta inside a query)
the commit mentions:
The .dates() queries were implemented by using custom Query, QuerySet,
and Compiler classes. Instead implement them by using expressions and
database converters APIs.
which sounds sensible, and like there should still be a quick easy way - but I've been fruitlessly looking for how to do that for a little too long - anyone know the answer?
In Django 1.10 there's a simpler method to do it, but you need to change the model a little: use a DurationField. My model is as follows:
class MyModel(models.Model):
    timeout = models.DurationField(default=datetime.timedelta(days=7))  # default: one week
    last = models.DateTimeField(auto_now_add=True)
and the query to find objects where last was before now minus timeout is:
MyModel.objects.filter(last__lt=datetime.datetime.now()-F('timeout'))
Ah, answer from the docs: https://docs.djangoproject.com/en/1.9/ref/models/expressions/#using-f-with-annotations
from django.db.models import DateTimeField, ExpressionWrapper, F
Ticket.objects.annotate(
    expires=ExpressionWrapper(
        F('active_at') + F('duration'), output_field=DateTimeField()))
which should make my original query look like
model.objects.annotate(
    new_date=ExpressionWrapper(
        F('last_date') + F('interval'),
        output_field=DateTimeField(),
    )
).filter(new_date__gte=datetime.now())
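One wrinkle the docs snippet doesn't cover: if interval is a plain integer number of days rather than a DurationField, the addition above may not resolve to a datetime on its own. A hedged sketch (assuming PostgreSQL and a reasonably recent Django) that first converts the integer into a database-side duration, using the field names from the question:

import datetime

from django.db.models import DateTimeField, DurationField, ExpressionWrapper, F

qs = model.objects.annotate(
    # one-day timedelta * integer column -> an interval computed in the database
    interval_delta=ExpressionWrapper(
        F('interval') * datetime.timedelta(days=1),
        output_field=DurationField(),
    ),
).annotate(
    new_date=ExpressionWrapper(
        F('last_date') + F('interval_delta'),
        output_field=DateTimeField(),
    ),
).filter(new_date__gte=datetime.datetime.now())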
I'm using Django, Python 3.7 and PostGres 9.5. I want to write the following WHERE clause in Django ...
WHERE date_part('hour', current_time) = s.hour ...
so after reading some other documentation, I'm led to believe I need to write a "Func" to create an annotation before running my query ...
qset = ArticleStat.objects.annotate(
    hour_of_day=Func(
        'current_time',
        Value('hour'),
        function='date_part',
    )
).filter(hour_of_day=F("article__website__stats_per_hour__hour"))
However, this results in a
Cannot resolve keyword 'current_time' into field. Choices are: article, article_id, elapsed_time_in_seconds, id, score
error. It seems like Django is trying to treat "current_time" as a column from my table, but I really want it to be treated as a Postgres function. How do I do that?
Update 2: Looking at the filter clause you use the annotated hour_of_day for, simply turning it around would make everything a lot easier, unless I'm overlooking something:
hour = datetime.datetime.now().hour
qset = ArticleStat.objects.filter(article__website__stats_per_hour__hour=hour)
Update: Even easier than the double annotation hack below is to get the current time in Python (once per query instead of once per row) and pass it to the function. You may need to make sure that the time zones match.
import datetime

from django.db.models import DateTimeField
from django.db.models.expressions import Func, Value

current_time = datetime.datetime.now()
qset = Session.objects.annotate(
    hour_of_day=Func(
        Value('hour'),
        Value(current_time, output_field=DateTimeField()),
        function='date_part',
    )
)
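If the project runs with USE_TZ = True, an aware timestamp is probably the safer choice here; a minimal tweak to the snippet above:

from django.utils import timezone

current_time = timezone.now()  # timezone-aware, unlike datetime.datetime.now()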
A simple hack would be to use two annotations to avoid nesting a database function in another (which you can probably do with a custom function subclassed from Func if you're serious enough about it):
from django.db.models import DateTimeField, F
from django.db.models.expressions import Func, Value

qset = MyModel.objects.annotate(
    current_time=Func(
        Value(0),
        function='current_time',
        output_field=DateTimeField()
    )
).annotate(
    hour_of_day=Func(
        Value('hour'),
        F('current_time'),
        function='date_part',
    )
)
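For completeness, here is a hedged sketch of the custom Func subclass route mentioned above (the names CurrentTimestamp and DatePart are made up for illustration; Django also ships django.db.models.functions.Now, which covers the timestamp half of this):

from django.db.models import DateTimeField, F, FloatField
from django.db.models.expressions import Func, Value

class CurrentTimestamp(Func):
    # renders as CURRENT_TIMESTAMP (no arguments, no parentheses)
    template = 'CURRENT_TIMESTAMP'
    output_field = DateTimeField()

class DatePart(Func):
    # PostgreSQL date_part(field, source); returns double precision
    function = 'date_part'
    output_field = FloatField()

    def __init__(self, lookup, expression, **extra):
        super().__init__(Value(lookup), expression, **extra)

qset = ArticleStat.objects.annotate(
    hour_of_day=DatePart('hour', CurrentTimestamp())
).filter(hour_of_day=F('article__website__stats_per_hour__hour'))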
So I am trying to update my model by running the following:
FooBar.objects.filter(something=True).update(foobar=F('foo__bar'))
but I get the following error:
FieldError: Joined field references are not permitted in this query
If this is not allowed with F expressions, how can I achieve this update?
Given the information in this ticket, I now understand that this is impossible and will never be implemented in Django, but is there any way to achieve this update, maybe with some workaround? I do not want to use a loop because there are over 10 million FooBar objects, so SQL is much faster than Python.
Django 1.11 adds support for subqueries. You should be able to do:
from django.db.models import Subquery, OuterRef
FooBar.objects.filter(something=True).update(
    foobar=Subquery(
        FooBar.objects.filter(pk=OuterRef('pk')).values('foo__bar')[:1]
    )
)
Why not use raw SQL here?
Based on this, it would be something like:
from django.db import connection
raw_query = '''
    update app_foobar set foobar =
    (select app_foo.bar from app_foo where app_foo.id = app_foobar.foo_id)
    where app_foobar.something = 1;
'''
with connection.cursor() as cursor:
    cursor.execute(raw_query)
This is the implementation of Georgi Yanchev's answer for two models:
class Foo(models.Model):
    bar = models.ForeignKey(Bar)

Foo.objects \
    .filter(foo_field_1=True) \
    .update(foo_field_2=Subquery(
        Bar.objects
            .filter(id=OuterRef('bar_id'))
            .values('bar_field_1')[:1]
    ))
For anyone wanting a simpler way to do this who isn't dealing with a huge set of objects, the snippet below should work just fine:
for fooBar in FooBar.objects.filter(something=True):
    fooBar.foobar = fooBar.foo.bar
    fooBar.save(update_fields=['foobar'])
For regular use cases, this should not make much of a performance difference, especially if it is run as part of a data migration.
You can, optionally, also use select_related if needed to further optimize.
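A hedged sketch of those optimizations together (select_related to avoid one extra query per row, plus bulk_update, available since Django 2.2, to batch the writes):

rows = list(
    FooBar.objects.filter(something=True).select_related('foo')
)
for obj in rows:
    obj.foobar = obj.foo.bar
FooBar.objects.bulk_update(rows, ['foobar'], batch_size=1000)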
I have a repeating pattern in my code where a model has a related model (one-to-many) which tracks its history/status. This related model can have many objects representing a point-in-time snapshot of the model's state.
For example:
class Profile(models.Model):
    pass

class Subscription(models.Model):
    profile = models.ForeignKey(Profile)
    data_point = models.IntegerField()
    created = models.DateTimeField(default=datetime.datetime.now)

# Example objects
p = Profile()
subscription1 = Subscription(profile=p, data_point=32, created=datetime.datetime(2011, 7, 1))
subscription2 = Subscription(profile=p, data_point=2, created=datetime.datetime(2011, 8, 1))
subscription3 = Subscription(profile=p, data_point=3, created=datetime.datetime(2011, 9, 1))
subscription4 = Subscription(profile=p, data_point=302, created=datetime.datetime(2011, 10, 1))
I often need to query these models to find all of the "Profile" objects that haven't had a subscription update in the last 3 days or similar. I've been using subselect queries to accomplish this:
q = Subscription.objects.filter(created__gt=datetime.datetime.now() - datetime.timedelta(days=3)).values('id').query
Profile.objects.exclude(subscription__id__in=q).distinct()
The problem is that this is terribly slow when large tables are involved. Is there a more efficient pattern for a query such as this? Maybe some way to make Django use a JOIN instead of a SUBSELECT (seems like getting rid of all those inner nested loops would help)?
I'd like to use the ORM, but if needed I'd be willing to use the .extra() method or even raw SQL if the performance boost is compelling enough.
I'm running against Django 1.4alpha (SVN Trunk) and Postgres 9.1.
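As an aside (not part of the original question), the same exclusion can also be written by spanning the relationship directly and letting the ORM build the join/subquery itself; whether this is actually faster depends on the Django version and the data, so treat it as a sketch:

import datetime

cutoff = datetime.datetime.now() - datetime.timedelta(days=3)
# Profiles with no Subscription created after the cutoff
stale_profiles = Profile.objects.exclude(
    subscription__created__gt=cutoff
).distinct()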
from django.db.models import Max
from datetime import datetime, timedelta
Profile.objects.annotate(
    last_update=Max('subscription__created')
).filter(last_update__lt=datetime.now() - timedelta(days=3))
Aggregation (and annotation) is awesome-sauce, see: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
Add a DB index to created:
created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
As a rule of thumb, any column that is used in queries for lookup or sorting should be indexed, unless you are heavy on writing operations (in that case you should think about using a separate search index, maybe).
Queries using db columns without indexes are only so fast. If you want to analyze the query bottlenecks in more detail, turn on logging for longer running statements (e.g. 200ms and above), and do an explain analyze (postgres) on the long running queries.
EDIT:
I've only now seen in your comment that you already have an index on the field. In that case, all the more reason to look at the output of explain analyze:
- to make sure that the index is really used, and to its full extent
- to check whether Postgres is unnecessarily writing to disk instead of using memory
See:
- on query planning: http://www.postgresql.org/docs/current/static/runtime-config-query.html
Maybe this helps as an intro: http://blog.it-agenten.com/2015/11/tuning-django-orm-part-2-many-to-many-queries/
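A hedged illustration of that workflow, reusing the q queryset from the question (the 200 ms threshold is only an example): print the SQL Django generates, run EXPLAIN ANALYZE on it in psql, and enable slow-statement logging in postgresql.conf:

qs = Profile.objects.exclude(subscription__id__in=q).distinct()
print(qs.query)  # the SELECT Django will send to the database

# In psql:
#   EXPLAIN ANALYZE <paste the printed SELECT here>;
#
# In postgresql.conf, to log statements slower than 200 ms:
#   log_min_duration_statement = 200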
Say I have a model that looks like:
class StockRequest(models.Model):
amount_requested = models.PositiveIntegerField(null=True)
amount_approved = models.PositiveIntegerField(null=True)
Is there any way to make a django query that would show me all requests where there is some relationship between amount_requested and amount_approved on a particular object/row?
In SQL it would be as simple as:
select * from stockrequest where amount_requested = amount_approved;
In Django, I'm not sure if it can be done, but I would imagine something like the below (NOTE: syntax completely made up and does not work).
StockRequest.objects.filter(amount_requested="__amount_approved")
from django.db.models import F
StockRequest.objects.filter(amount_requested=F("amount_approved"))
http://docs.djangoproject.com/en/dev/topics/db/queries/#filters-can-reference-fields-on-the-model
Yes, you can: use the built-in "F" object to do this.
The syntax would be:
from django.db.models import F
StockRequest.objects.filter(amount_requested=F("amount_approved"))
or
StockRequest.objects.filter(amount_requested__gt=F("amount_approved"))
Note: I found the answer immediately after I finished writing the question up. Since I hadn't seen this on Stack Overflow anywhere, I am leaving it up with this answer.
Check the docs on the F() function: http://docs.djangoproject.com/en/dev/topics/db/queries/#filters-can-reference-fields-on-the-model
I'm trying to find an efficient way to find the rank of an object in the database relative to its score. My naive solution looks like this:
rank = 0
for q in Model.objects.all().order_by('score'):
    if q.name == 'searching_for_this':
        return rank
    rank += 1
It should be possible to get the database to do the filtering, using order_by:
Model.objects.all().order_by('score').filter(name='searching_for_this')
But there doesn't seem to be a way to retrieve the index for the order_by step after the filter.
Is there a better way to do this? (Using python/django and/or raw SQL.)
My next thought is to pre-compute ranks on insert but that seems messy.
I don't think you can do this in one database query using the Django ORM. But if that doesn't bother you, I would create a custom method on the model:
class Model(models.Model):
    score = models.IntegerField()
    ...

    @property
    def ranking(self):
        count = Model.objects.filter(score__lt=self.score).count()
        return count + 1
You can then use "ranking" anywhere, as if it was a normal field:
print(Model.objects.get(pk=1).ranking)
Edit: This answer is from 2010. Nowadays I would recommend Carl's solution instead.
Using the new Window functions in Django 2.0 you could write it like this...
from django.db.models import F
from django.db.models.expressions import Window
from django.db.models.functions import Rank

Model.objects.filter(name='searching_for_this').annotate(
    rank=Window(
        expression=Rank(),
        order_by=F('score').desc(),
    ),
)
Use something like this:
obj = Model.objects.get(name='searching_for_this')
rank = Model.objects.filter(score__gt=obj.score).count()
You can pre-compute ranks and save them to the Model if they are frequently used and affect performance.
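A hedged sketch of that pre-computation: it assumes an extra nullable rank integer field on Model, Django 2.0+ for Window and 2.2+ for bulk_update, and that you refresh it from a cron job or whenever scores change in bulk:

from django.db.models import F, Window
from django.db.models.functions import Rank

def refresh_ranks():
    ranked = list(
        Model.objects.annotate(
            new_rank=Window(expression=Rank(), order_by=F('score').desc())
        )
    )
    for obj in ranked:
        obj.rank = obj.new_rank
    Model.objects.bulk_update(ranked, ['rank'], batch_size=1000)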
In "raw SQL" with a standard-conforming database engine (PostgreSQL, SQL Server, Oracle, DB2, ...), you can just use the SQL-standard RANK function -- but that's not supported in popular but non-standard engines such as MySQL and SQLite, and (perhaps because of that) Django does not "surface" this functionality to the application.
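For reference, a hedged raw-SQL version using the standard RANK() window function; the table name myapp_model is assumed, and note that modern MySQL (8.0+) and SQLite (3.25+) do support window functions as well:

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT score_rank FROM (
            SELECT name, RANK() OVER (ORDER BY score DESC) AS score_rank
            FROM myapp_model
        ) ranked
        WHERE name = %s
    """, ['searching_for_this'])
    row = cursor.fetchone()

rank = row[0] if row else None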