Django F expressions joined field - python

So I am trying to update my model by running the following:
FooBar.objects.filter(something=True).update(foobar=F('foo__bar'))
but I get the following error:
FieldError: Joined field references are not permitted in this query
If this is not allowed with F expressions, how can I achieve this update?
Given the information in the ticket, I now understand that this is impossible and will never be implemented in Django, but is there any way to achieve this update, maybe with some workaround? I do not want to use a loop because there are over 10 million FooBar objects, so SQL is much faster than Python.

Django 1.11 adds support for subqueries. You should be able to do:
from django.db.models import Subquery, OuterRef
FooBar.objects.filter(something=True).update(
    foobar=Subquery(FooBar.objects.filter(pk=OuterRef('pk')).values('foo__bar')[:1])
)

Why not use raw SQL here?
Based on this, it will be something like:
from django.db import connection
raw_query = '''
update app_foobar set app_foobar.foobar =
(select app_foo.bar from app_foo where app_foo.id = app_foobar.foo_id)
where app_foobar.something = 1;
'''
cursor = connection.cursor()
cursor.execute(raw_query)
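To see the correlated-subquery UPDATE from the answer above in isolation, here is a minimal sketch that runs it against an in-memory SQLite database using only the standard library. The table and column names follow the answer; the SQLite database and the sample rows are assumptions made for illustration, not part of the original setup. The SET column is written without the table prefix here because some engines (PostgreSQL, for one) reject a qualified target column.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_foo (id INTEGER PRIMARY KEY, bar INTEGER);
    CREATE TABLE app_foobar (
        id INTEGER PRIMARY KEY,
        foo_id INTEGER REFERENCES app_foo(id),
        something INTEGER,
        foobar INTEGER
    );
    INSERT INTO app_foo (id, bar) VALUES (1, 10), (2, 20);
    INSERT INTO app_foobar (id, foo_id, something, foobar)
    VALUES (1, 1, 1, NULL), (2, 2, 1, NULL), (3, 2, 0, NULL);
""")

# A single statement copies foo.bar into foobar for every matching row;
# no rows are loaded into Python.
conn.execute("""
    UPDATE app_foobar
    SET foobar = (SELECT app_foo.bar FROM app_foo
                  WHERE app_foo.id = app_foobar.foo_id)
    WHERE something = 1
""")

rows = conn.execute("SELECT id, foobar FROM app_foobar ORDER BY id").fetchall()
print(rows)
```

Rows 1 and 2 pick up the value from their related app_foo row, while row 3 is skipped by the WHERE clause and keeps its NULL.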

This is the implementation of Georgi Yanchev's answer for two models:
class Foo(models.Model):
    bar = models.ForeignKey(Bar)
Foo.objects \
    .filter(foo_field_1=True) \
    .update(foo_field_2=Subquery(
        Bar.objects \
            .filter(id=OuterRef('bar_id')) \
            .values('bar_field_1')[:1]))

For anyone who wants a simpler way to do this and does not have a huge set of objects, the snippet below should work just fine:
for foo_bar in FooBar.objects.filter(something=True):
    foo_bar.foobar = foo_bar.foo.bar
    foo_bar.save(update_fields=['foobar'])
For regular use cases this should not make much of a performance difference, especially if it is run as part of a data migration.
You can optionally add select_related('foo') to the queryset to avoid one extra query per object when reading foo.bar.

Related

What is the replacement for DateModifierNode in new versions of Django

I want to do a query based on two fields of a model, a date, offset by an int, used as a timedelta
model.objects.filter(last_date__gte=datetime.now()-timedelta(days=F('interval')))
is a no-go, as an F() expression cannot be passed into a timedelta.
A little digging, and I discovered DateModifierNode - though it seems it was removed in this commit: https://github.com/django/django/commit/cbb5cdd155668ba771cad6b975676d3b20fed37b (from this now-outdated SO question Django: Using F arguments in datetime.timedelta inside a query)
the commit mentions:
The .dates() queries were implemented by using custom Query, QuerySet,
and Compiler classes. Instead implement them by using expressions and
database converters APIs.
which sounds sensible, and like there should still be a quick easy way - but I've been fruitlessly looking for how to do that for a little too long - anyone know the answer?
In Django 1.10 there's a simpler method to do it, but you need to change the model a little: use a DurationField. My model is as follows:
class MyModel(models.Model):
    timeout = models.DurationField(default=datetime.timedelta(days=7))  # default: one week
    last = models.DateTimeField(auto_now_add=True)
and the query to find objects where last was before now minus timeout is:
MyModel.objects.filter(last__lt=datetime.datetime.now()-F('timeout'))
Ah, answer from the docs: https://docs.djangoproject.com/en/1.9/ref/models/expressions/#using-f-with-annotations
from django.db.models import DateTimeField, ExpressionWrapper, F
Ticket.objects.annotate(
    expires=ExpressionWrapper(
        F('active_at') + F('duration'), output_field=DateTimeField()))
which should make my original query look like
model.objects.annotate(new_date=ExpressionWrapper(F('last_date') + F('interval'), output_field=DateTimeField())).filter(new_date__gte=datetime.now())

Bulk update with subquery using SQLAlchemy

I'm trying to implement the following MySQL query using SQLAlchemy. The table in question is a nested set hierarchy.
UPDATE category
JOIN
(
    SELECT
        node.cat_id,
        (COUNT(parent.cat_id) - 1) AS depth
    FROM category AS node, category AS parent
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
    GROUP BY node.cat_id
) AS depths
ON category.cat_id = depths.cat_id
SET category.depth = depths.depth
This works just fine.
This is where I start pulling my hair out:
from sqlalchemy.orm import aliased
from sqlalchemy import func
from myapp.db import db
node = aliased(Category)
parent = aliased(Category)
stmt = db.session.query(node.cat_id,
                        func.count(parent.cat_id).label('depth_'))\
    .filter(node.lft.between(parent.lft, parent.rgt))\
    .group_by(node.cat_id).subquery()
db.session.query(Category,
                 stmt.c.cat_id,
                 stmt.c.depth_)\
    .outerjoin(stmt,
               Category.cat_id == stmt.c.cat_id)\
    .update({Category.depth: stmt.c.depth_},
            synchronize_session='fetch')
...and I get InvalidRequestError: This operation requires only one Table or entity be specified as the target. It seems to me that Category.depth adequately specifies the target, but of course SQLAlchemy trumps whatever I may think.
Stumped. Any suggestions? Thanks.
I know this question is five years old, but I stumbled upon it today. My answer might be useful to someone else. I understand that my solution is not the perfect one, but I don't have a better way of doing this.
I had to change only the last statement, so that the query selects Category alone:
db.session.query(Category)\
    .outerjoin(stmt,
               Category.cat_id == stmt.c.cat_id)\
    .update({Category.depth: stmt.c.depth_},
            synchronize_session='fetch')
Then, you have to commit the changes:
db.session.commit()
This gives the following warning:
SAWarning: Evaluating non-mapped column expression '...' onto ORM
instances; this is a deprecated use case. Please make use of the
actual mapped columns in ORM-evaluated UPDATE / DELETE expressions.
"UPDATE / DELETE expressions." % clause
To get rid of it, I used the solution in this post: Turn off a warning in sqlalchemy
Note: For some reason, aliases don't work in SQLAlchemy update statements.
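For comparison, the same depth calculation can also be written as a correlated subquery instead of an UPDATE ... JOIN. The sketch below is not the SQLAlchemy solution from this thread. It is plain SQL run through Python's sqlite3 module, with an in-memory database and four made-up categories as assumptions. The column names (cat_id, lft, rgt, depth) follow the question. This form works on SQLite and should work on PostgreSQL, but MySQL rejects a subquery that reads the table being updated, which is presumably why the original query uses a joined derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        cat_id INTEGER PRIMARY KEY, name TEXT,
        lft INTEGER, rgt INTEGER, depth INTEGER
    );
    -- Hypothetical nested-set tree: root > (a > a1), b
    INSERT INTO category (cat_id, name, lft, rgt) VALUES
        (1, 'root', 1, 8),
        (2, 'a',    2, 5),
        (3, 'a1',   3, 4),
        (4, 'b',    6, 7);
""")

# For each row, count the rows whose (lft, rgt) interval contains it,
# then subtract one for the row itself.
conn.execute("""
    UPDATE category
    SET depth = (
        SELECT COUNT(*) - 1
        FROM category AS parent
        WHERE category.lft BETWEEN parent.lft AND parent.rgt
    )
""")

depths = dict(conn.execute("SELECT name, depth FROM category"))
print(depths)
```

The root ends up at depth 0, its direct children at 1, and the grandchild at 2.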

Django ORM query to find all objects which don't have a recent related object

I have a repeating pattern in my code where a model has a related model (one-to-many) which tracks its history/status. This related model can have many objects representing a point-in-time snapshot of the model's state.
For example:
class Profile(models.Model):
    pass
class Subscription(models.Model):
    profile = models.ForeignKey(Profile)
    data_point = models.IntegerField()
    created = models.DateTimeField(default=datetime.datetime.now)
# Example objects
p = Profile()
subscription1 = Subscription(profile=p, data_point=32, created=datetime.datetime(2011, 7, 1))
subscription2 = Subscription(profile=p, data_point=2, created=datetime.datetime(2011, 8, 1))
subscription3 = Subscription(profile=p, data_point=3, created=datetime.datetime(2011, 9, 1))
subscription4 = Subscription(profile=p, data_point=302, created=datetime.datetime(2011, 10, 1))
I often need to query these models to find all of the "Profile" objects that haven't had a subscription update in the last 3 days or similar. I've been using subselect queries to accomplish this:
q = Subscription.objects.filter(created__gt=datetime.datetime.now()-datetime.timedelta(days=3)).values('id').query
Profile.objects.exclude(subscription__id__in=q).distinct()
The problem is that this is terribly slow when large tables are involved. Is there a more efficient pattern for a query such as this? Maybe some way to make Django use a JOIN instead of a SUBSELECT (seems like getting rid of all those inner nested loops would help)?
I'd like to use the ORM, but if needed I'd be willing to use the .extra() method or even raw SQL if the performance boost is compelling enough.
I'm running against Django 1.4alpha (SVN Trunk) and Postgres 9.1.
from django.db.models import Max
from datetime import datetime, timedelta
Profile.objects.annotate(last_update=Max('subscription__created')).filter(last_update__lt=datetime.now()-timedelta(days=3))
Aggregation (and annotation) is awesome-sauce, see: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
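To make the aggregation approach concrete, here is a rough sketch of the kind of SQL such an annotate-and-filter query boils down to: a LEFT JOIN, a GROUP BY, and a HAVING on the MAX. It runs on an in-memory SQLite database via the standard library. The table names, the sample rows, and the fixed cutoff are assumptions for illustration, and the exact SQL Django emits may differ. One thing the sketch shows: a profile with no subscriptions at all has a NULL maximum, so a plain "less than" comparison leaves it out unless NULL is handled explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY);
    CREATE TABLE subscription (
        id INTEGER PRIMARY KEY,
        profile_id INTEGER REFERENCES profile(id),
        created TEXT  -- ISO-8601 strings compare correctly as text
    );
    INSERT INTO profile (id) VALUES (1), (2), (3);
    INSERT INTO subscription (profile_id, created) VALUES
        (1, '2011-10-01 00:00:00'),
        (1, '2011-10-09 00:00:00'),
        (2, '2011-09-01 00:00:00');
""")

# Stand-in for "now minus 3 days", fixed so the result is reproducible.
cutoff = "2011-10-07 00:00:00"

# Profiles whose most recent subscription is older than the cutoff.
stale = [row[0] for row in conn.execute("""
    SELECT p.id FROM profile AS p
    LEFT JOIN subscription AS s ON s.profile_id = p.id
    GROUP BY p.id
    HAVING MAX(s.created) < ?
    ORDER BY p.id
""", (cutoff,))]

# Same query, but also keeping profiles that never had a subscription.
stale_or_never = [row[0] for row in conn.execute("""
    SELECT p.id FROM profile AS p
    LEFT JOIN subscription AS s ON s.profile_id = p.id
    GROUP BY p.id
    HAVING MAX(s.created) < ? OR MAX(s.created) IS NULL
    ORDER BY p.id
""", (cutoff,))]

print(stale, stale_or_never)
```

Profile 1 has a recent subscription and is excluded from both lists. Profile 3 has none and only appears once the NULL case is included.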
Add a DB index to created:
created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
As a rule of thumb, any column that is used in queries for lookup or sorting should be indexed, unless you are heavy on writing operations (in that case you should think about using a separate search index, maybe).
Queries using db columns without indexes are only so fast. If you want to analyze the query bottlenecks in more detail, turn on logging for longer running statements (e.g. 200ms and above), and do an explain analyze (postgres) on the long running queries.
EDIT:
I've only now seen in your comment that you have an index on the field. In that case, all the more reason to look at the output of explain analyze:
- to make sure that the index is really used, and to its full extent
- to check whether postgres is unnecessarily writing to disk instead of using memory
See
- on query planning http://www.postgresql.org/docs/current/static/runtime-config-query.html
Maybe this helps as an intro: http://blog.it-agenten.com/2015/11/tuning-django-orm-part-2-many-to-many-queries/

How to make a query that filters rows in which one column equals another one of the same table?

Say I have a model that looks like:
class StockRequest(models.Model):
amount_requested = models.PositiveIntegerField(null=True)
amount_approved = models.PositiveIntegerField(null=True)
Is there any way to make a django query that would show me all requests where there is some relationship between amount_requested and amount_approved on a particular object/row?
In SQL it would be as simple as:
select * from stockrequest where amount_requested = amount_approved;
or
select * from stockrequest where amount_requested > amount_approved;
In Django, I'm not sure if it can be done, but I would imagine something like the below (NOTE: syntax completely made up and does not work).
StockRequest.objects.filter(amount_requested="__amount_approved")
from django.db.models import F
StockRequest.objects.filter(amount_requested=F("amount_approved"))
http://docs.djangoproject.com/en/dev/topics/db/queries/#filters-can-reference-fields-on-the-model
Yes, you can. You can use the built in "F" object to do this.
The syntax would be:
from django.db.models import F
StockRequest.objects.filter(amount_requested=F("amount_approved"))
or
StockRequest.objects.filter(amount_requested__gt=F("amount_approved"))
Note: I found the answer immediately after I finished writing the question up. Since I hadn't seen this on Stack Overflow anywhere, I am leaving it up with this answer.
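Since both fields in the question are declared with null=True, it may help to see how the underlying column-to-column comparison treats NULLs. This is a small sketch in plain SQL through Python's sqlite3 module, not Django code. The table name and the four sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stockrequest (
        id INTEGER PRIMARY KEY,
        amount_requested INTEGER,
        amount_approved INTEGER
    );
    INSERT INTO stockrequest (id, amount_requested, amount_approved) VALUES
        (1, 5, 5),
        (2, 8, 3),
        (3, NULL, NULL),
        (4, 2, 9);
""")

# Rows where the two columns are equal.
equal_ids = [row[0] for row in conn.execute(
    "SELECT id FROM stockrequest "
    "WHERE amount_requested = amount_approved ORDER BY id")]

# Rows where more was requested than approved.
greater_ids = [row[0] for row in conn.execute(
    "SELECT id FROM stockrequest "
    "WHERE amount_requested > amount_approved ORDER BY id")]

print(equal_ids, greater_ids)
```

Row 3 matches neither query: in SQL, NULL = NULL is not true, so rows where both amounts are unset need a separate IS NULL check if they should count as "equal".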
Check docs on the F() function:

How do I get the position of a result in the list after an order_by?

I'm trying to find an efficient way to find the rank of an object in the database relative to its score. My naive solution looks like this:
rank = 0
for q in Model.objects.all().order_by('score'):
    if q.name == 'searching_for_this':
        break
    rank += 1
It should be possible to get the database to do the filtering, using order_by:
Model.objects.all().order_by('score').filter(name='searching_for_this')
But there doesn't seem to be a way to retrieve the index for the order_by step after the filter.
Is there a better way to do this? (Using python/django and/or raw SQL.)
My next thought is to pre-compute ranks on insert but that seems messy.
I don't think you can do this in one database query using the Django ORM. But if that doesn't bother you, I would create a custom property on the model:
from django.db import models
class Model(models.Model):
    score = models.IntegerField()
    ...
    @property
    def ranking(self):
        count = Model.objects.filter(score__lt=self.score).count()
        return count + 1
You can then use "ranking" anywhere, as if it were a normal field:
print(Model.objects.get(pk=1).ranking)
Edit: This answer is from 2010. Nowadays I would recommend Carl's solution instead.
Using the new Window functions in Django 2.0 you could write it like this:
from django.db.models import F
from django.db.models.expressions import Window
from django.db.models.functions import Rank
Model.objects.filter(name='searching_for_this').annotate(
    rank=Window(
        expression=Rank(),
        order_by=F('score').desc()
    ),
)
Use something like this:
obj = Model.objects.get(name='searching_for_this')
rank = Model.objects.filter(score__gt=obj.score).count()
You can pre-compute ranks and save it to Model if they are frequently used and affect the performance.
In "raw SQL" with a standard-conforming database engine (PostgreSQL, SQL Server, Oracle, DB2, ...), you can just use the SQL-standard RANK function. When this answer was written that was not supported in popular engines such as MySQL and SQLite (both added window functions later, in MySQL 8.0 and SQLite 3.25), and (perhaps because of that) Django did not "surface" this functionality to the application.
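A point that is easy to miss with RANK is the order of evaluation: WHERE is applied before window functions, so filtering and ranking in the same SELECT ranks only the filtered rows. The sketch below shows this in plain SQL through Python's sqlite3 module, and it assumes the bundled SQLite is 3.25 or newer. The table name and the sample scores are made up. Presumably the same applies to a Django queryset that calls .filter() before annotating with a Window expression, though that is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
    CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, score INTEGER);
    INSERT INTO model (name, score) VALUES
        ('alice', 90), ('searching_for_this', 70), ('bob', 80), ('carol', 70);
""")
target = ("searching_for_this",)

# Rank over the whole table in a subquery, then pick the row of interest.
rank_full = conn.execute("""
    SELECT rnk FROM (
        SELECT name, RANK() OVER (ORDER BY score DESC) AS rnk FROM model
    ) WHERE name = ?
""", target).fetchone()[0]

# Filtering in the same SELECT: WHERE runs first, so the rank is
# computed over the single remaining row.
rank_filtered = conn.execute("""
    SELECT RANK() OVER (ORDER BY score DESC) FROM model WHERE name = ?
""", target).fetchone()[0]

# The count-based approach from the answers above: higher scores plus one.
rank_count = conn.execute("""
    SELECT COUNT(*) + 1 FROM model
    WHERE score > (SELECT score FROM model WHERE name = ?)
""", target).fetchone()[0]

print(rank_full, rank_filtered, rank_count)
```

The subquery form and the count-based form agree (tied scores share a rank), while the single-SELECT form reports rank 1 no matter how the row compares to the rest of the table.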
