Django LEFT JOIN? - python

I have models, more or less like this:
class ModelA(models.Model):
    field = models.CharField(..)

class ModelB(models.Model):
    name = models.CharField(.., unique=True)
    modela = models.ForeignKey(ModelA, blank=True, related_name='modelbs')

    class Meta:
        unique_together = ('name', 'modela')
I want to do a query that says something like: "Get all the ModelAs whose field equals X and that have a related ModelB with a name of X, OR no related ModelB at all."
So far I have this:
ModelA.objects.exclude(field=condition).filter(modelbs__name=condition)
This will get me all the ModelAs that have at least one ModelB (and in reality it will always be just one), but if a ModelA has no related ModelBs it will not be in the result set. I need it to be in the result set with something like obj.modelb = None.
How can I accomplish this?

Use Q to combine the two conditions:
from django.db.models import Q
qs = ModelA.objects.exclude(field=condition)
qs = qs.filter(Q(modelbs__name=condition) | Q(modelbs__isnull=True))
To examine the resulting SQL query:
print(qs.query)
On a similar query, this generates a LEFT OUTER JOIN ... WHERE (a.val = b OR a.id IS NULL).
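Putting it together for the models in the question (a sketch; the .distinct() is only a safeguard in case a ModelA ever matches more than one ModelB):
from django.db.models import Q

qs = (ModelA.objects
      .exclude(field=condition)
      .filter(Q(modelbs__name=condition) | Q(modelbs__isnull=True))
      .distinct())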

It looks like you are coming up against the 80% barrier. Why not just use .extra(select={'has_x_or_none':'(EXISTS (SELECT ...))'}) to perform a subquery? You can write the subquery any way you like and should be able to filter against the new field. The SQL should wind up looking something like this:
SELECT *,
((EXISTS (SELECT * FROM other WHERE other.id=primary.id AND other.name='X'))
OR (NOT EXISTS (SELECT * FROM other WHERE other.id=primary.id))) AS has_x_or_none
FROM primary WHERE has_x_or_none=1;
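If you would rather stay in the ORM, Django 1.11+ can express the same idea with Exists() subqueries instead of .extra(). A sketch, assuming the ModelA/ModelB models from the question:
from django.db.models import Exists, OuterRef, Q

has_x = ModelB.objects.filter(modela=OuterRef('pk'), name='X')
has_any_b = ModelB.objects.filter(modela=OuterRef('pk'))

qs = ModelA.objects.annotate(
    has_x=Exists(has_x),
    has_any_b=Exists(has_any_b),
).filter(Q(has_x=True) | Q(has_any_b=False))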

Try this patch for custom joins: https://code.djangoproject.com/ticket/7231

A LEFT JOIN is conceptually the union of two queries. Sometimes the underlying SQL engine optimizes it into a single query; sometimes it does not, and it is executed as two separate queries.
Do this instead:
for a in ModelA.objects.all():
    related = a.modelbs.all()
    if related.count() == 0:
        ...  # these are the A's with no B's
    else:
        ...  # these are the A's with some B's
Don't fixate on an SQL outer join appearing to be a "single" query.
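If you do go this route, note that the loop above issues one extra query per ModelA; prefetch_related keeps it to two queries total. A sketch, using the related_name from the question's models:
for a in ModelA.objects.prefetch_related('modelbs'):
    related = list(a.modelbs.all())  # served from the prefetch cache, no extra query
    if not related:
        ...  # A's with no B's
    else:
        ...  # A's with some B's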

Related

GeoDjango: How to perform a query of spatially close records

I have two Django models (A and B) which are not related by any foreign key, but both have a geometry field.
class A(Model):
    position = PointField(geography=True)

class B(Model):
    position = PointField(geography=True)
I would like to relate them spatially, i.e. given a queryset of A, being able to obtain a queryset of B containing those records that are at less than a given distance to A.
I haven't found a way using pure Django's ORM to do such a thing.
Of course, I could write a property in A such as this one:
@property
def nearby(self):
    return B.objects.filter(position__dwithin=(self.position, 0.1))
But this only allows me to fetch the nearby records on each instance and not in a single query, which is far from efficient.
I have also tried to do this:
nearby = B.objects.filter(position__dwithin=(OuterRef('position'), 0.1))
query = A.objects.annotate(nearby=Subquery(nearby.values('pk')))
list(query) # error here
However, I get this error for the last line:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery
Does anybody know a better way (more efficient) of performing such a query or maybe the reason why my code is failing?
Any help is very much appreciated.
I finally managed to solve it, but I had to perform a raw SQL query in the end.
This will return all A records with an annotation including a list of all nearby B records:
from collections import namedtuple
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute('''SELECT a.id, array_agg(b.id) AS nearby
                      FROM myapp_a a
                      LEFT JOIN myapp_b b ON ST_DWithin(a.position, b.position, 0.1)
                      GROUP BY a.id''')
    nt_result = namedtuple('Result', [col[0] for col in cursor.description])
    results = [nt_result(*row) for row in cursor.fetchall()]
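If you would rather get back A model instances than namedtuples, roughly the same SQL should also work through Manager.raw(), which exposes extra columns as attributes. A sketch under the same assumptions (PostGIS and the default myapp_a/myapp_b table names):
results = A.objects.raw('''SELECT a.*, array_agg(b.id) AS nearby
                           FROM myapp_a a
                           LEFT JOIN myapp_b b ON ST_DWithin(a.position, b.position, 0.1)
                           GROUP BY a.id''')
for a in results:
    print(a.pk, a.nearby)  # nearby is the list of nearby B ids, or [None] if there are none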
References:
Raw queries: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Array aggregation: https://www.postgresql.org/docs/8.4/functions-aggregate.html
ST_DWithin: https://postgis.net/docs/ST_DWithin.html

Django Subquery with Scalar Value

Is it possible to compare subquery results with scalar values using the Django ORM? I'm having a hard time converting this:
SELECT payment_subscription.*
FROM payment_subscription payment_subscription
JOIN payment_recurrent payment_recurrent ON payment_subscription.id = payment_recurrent.subscription_id
WHERE
payment_subscription.status = 1
AND (SELECT expiration_date
FROM payment_transaction payment_transaction
WHERE payment_transaction.company_id = payment_subscription.company_id
AND payment_transaction.status IN ('OK', 'Complete')
ORDER BY payment_transaction.expiration_date DESC, payment_transaction.id DESC
LIMIT 1) <= ?
The main points are:
The last comparison of the scalar value of the subquery with an arbitrary parameter.
The join between the subquery and the outer query on the company id.
from django.db.models import Max

Subscription.objects.annotate(
    max_expiration_date=Max('transaction__expiration_date')
).filter(
    status=1,
    recurrent__isnull=False,  # [inner] join with recurrent
    transaction__status__in=['OK', 'Complete'],
    max_expiration_date=date_value
)
This produces a different SQL query, but obtains the same Subscription objects.
You can (as of Django 1.11) annotate on a subquery, and slice it to ensure you only get the "first" result. You can then filter on that subquery annotation by comparing it to the value you want.
from django.db.models.expressions import Subquery, OuterRef
expiration_date = Transaction.objects.filter(
company=OuterRef('company'),
status__in=['OK', 'Complete'],
).order_by('-expiration_date').values('expiration_date')[:1]
Subscription.objects.filter(status=1).annotate(
expiration_date=Subquery(expiration_date),
).filter(expiration_date__lte=THE_DATE)
However...
Currently that can result in really poor performance: your database will evaluate the subquery twice (once in the where clause, from the filter, and again in the select clause, from the annotation). There is work underway to resolve this, but it's not currently complete.
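If you are on Django 3.2 or newer, one workaround is QuerySet.alias(), which registers the subquery for filtering without also selecting it, so it should only appear once in the generated SQL. A sketch reusing the expiration_date queryset above:
qs = Subscription.objects.filter(status=1).alias(
    latest_expiration=Subquery(expiration_date),
).filter(latest_expiration__lte=THE_DATE)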

Create model for MySQL view and join on it

I'm doing some prototyping and have a simple model like this
class SampleModel(models.Model):
    user_id = models.IntegerField(default=0, db_index=True)
    staff_id = models.IntegerField(default=0, db_index=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)

    objects = AsOfManager()
Now we need to do queries that require a self join, which written in raw SQL are simply something like this:
SELECT X.* FROM no_chain_samplemodel as X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id) AS Y
ON (X.user_id = Y.user_id and X.timestamp = Y.timestamp);
This query should return, for each user_id, the last row ordered by timestamp. Each of these "chains" (of user_id-related rows) could potentially have thousands of rows.
Now I could use raw SQL, but then I lose composability; I would like to return another queryset.
At the same time it would be nice to make writing the raw SQL easier, so I thought I could use a database view.
The view could be just something like this
CREATE VIEW no_chain_sample_model_with_max_date AS SELECT user_id AS id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id;
So the model that refers to the view could be simply like this:
class SampleModelWithMaxDate(models.Model):
    class Meta:
        managed = False
        db_table = 'no_chain_sample_model_with_max_date'

    id = models.IntegerField(default=0, primary_key=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)
However, there are a few problems:
even with managed = False, './manage.py makemigrations' still creates the migration for this table.
I even tried to leave the migration there but replace the model creation with raw SQL that creates the view, but no luck.
Now I need to do select_related to join the two tables and query, but how should I do that?
I tried a foreign key on SampleModel like this:
by_date = models.ForeignKey(SampleModelWithMaxDate, null=True)
but this also doesn't work:
OperationalError: (1054, "Unknown column 'no_chain_sample_model_with_max_date.by_date_id' in 'field list'")
So in general I'm not even sure if it's possible. I can see other people using models backed by views, and just querying the view model on its own works for me too, but is it possible to do anything smarter than that?
Thanks
I couldn't find any ORM method to get what you want in one query but we could kind of do this with two queries:
First, we get the max timestamp for all the users:
from django.db.models import Max

latest_timestamps = (SampleModel.objects.values('user_id')
                     .annotate(max_ts=Max('timestamp'))
                     .values('max_ts'))
Here values('user_id') works as a group-by operation.
Now, we get all the instances of SampleModel with those exact timestamps:
qs = SampleModel.objects.filter(timestamp__in=latest_timestamps)
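One caveat with the timestamp__in filter: it can also match rows from a different user that happens to share one of those max timestamps. On Django 1.11+ a correlated Subquery compares each row against its own user's latest timestamp; a sketch using the same model:
from django.db.models import OuterRef, Subquery

latest = (SampleModel.objects
          .filter(user_id=OuterRef('user_id'))
          .order_by('-timestamp')
          .values('timestamp')[:1])
qs = SampleModel.objects.filter(timestamp=Subquery(latest))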
PostgreSQL-specific answer:
You could mix order_by and distinct to achieve what you want:
SampleModel.objects.order_by('user_id', '-timestamp').distinct('user_id')
Breaking it down:
# order by user_id, and in decreasing order of timestamp
qs = SampleModel.objects.order_by('user_id', '-timestamp')
# get distinct rows using user_id; this makes sure that the first entry for
# each user is retained, and since we ordered by decreasing timestamp within
# each user, that first entry is the most recent row
# for that user in the database.
qs = qs.distinct('user_id')

How do I select from a many-to-many intermediate model in django?

I have models of books and people:
from django.db import models
class Book(models.Model):
    author = models.ManyToManyField('Person')

class Person(models.Model):
    name = models.CharField(max_length=16)
I simplified them a bit here. How do I craft a Django query to get all of the authors of the books? With SQL I would do a select on the intermediate table and join it with the people table to get the names, but I'm not sure how to do something similar here... Of course, there are people in the Person table that are not book authors; otherwise I could just use Person.objects.all().
As easy as 1,2,3 with Filtering on annotations:
from django.db.models import Count
Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)
Out of curiosity, I generated the SQL for each of the approaches proposed on this topic:
In [9]: Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name", COUNT("testapp_book_author"."book_id") AS "count_book" FROM "testapp_person" LEFT OUTER JOIN "testapp_book_author" ON ("testapp_person"."id" = "testapp_book_author"."person_id") GROUP BY "testapp_person"."id", "testapp_person"."name", "testapp_person"."id", "testapp_person"."name" HAVING COUNT("testapp_book_author"."book_id") > 0 LIMIT 21; args=(0,)
Out[9]: [<Person: Person object>]
In [10]: Person.objects.exclude(book=None)
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" WHERE NOT (("testapp_person"."id" IN (SELECT U0."id" FROM "testapp_person" U0 LEFT OUTER JOIN "testapp_book_author" U1 ON (U0."id" = U1."person_id") LEFT OUTER JOIN "testapp_book" U2 ON (U1."book_id" = U2."id") WHERE (U2."id" IS NULL AND U0."id" IS NOT NULL)) AND "testapp_person"."id" IS NOT NULL)) LIMIT 21; args=()
Out[10]: [<Person: Person object>]
In [11]: Person.objects.filter(pk__in=Book.objects.values_list('author').distinct())
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" WHERE "testapp_person"."id" IN (SELECT DISTINCT U1."person_id" FROM "testapp_book" U0 LEFT OUTER JOIN "testapp_book_author" U1 ON (U0."id" = U1."book_id")) LIMIT 21; args=()
Out[11]: [<Person: Person object>]
Maybe this can help you choose.
Personally, I prefer the version by Chris because it is the shortest. On the other hand, I don't know for sure about the impact of the subqueries used in the two other approaches. That said, they do demonstrate interesting QuerySet concepts:
Annotation is aggregation per value of the queryset. If you use aggregate(Count('book')) you get the total number of books. If you use annotate(Count('book')) you get the number of books per value of the queryset (per Person). Also, each person then has a count_book attribute, which is pretty cool: Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)[0].count_book (see the short sketch after this list).
Subqueries are very useful for building complex queries or optimizing them (merging querysets or generic-relation prefetching, for example).
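A minimal side-by-side sketch of the aggregate/annotate distinction, using the Book/Person models from the question:
from django.db.models import Count

Person.objects.aggregate(total=Count('book'))                 # one dict for the whole table
authors = Person.objects.annotate(count_book=Count('book'))   # a count attached to each Person
authors.filter(count_book__gt=0)[0].count_book                # per-row attribute, as above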
Easiest way:
Person.objects.filter(book__isnull=False)
That will select all people that have at least one book associated with them.
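One thing to watch: filtering across a many-to-many like this can return the same Person several times if they have several books, so you may want to add .distinct():
Person.objects.filter(book__isnull=False).distinct()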
You can get all the IDs of the authors quite easily using:
Book.objects.all().values_list('author', flat=True).distinct()
As jpic points out below,
Person.objects.filter(pk__in=Book.objects.values_list('author').distinct()) will give you the Person objects themselves and not just their IDs.
This chapter of the django book answers all your model needs with examples very similar to yours, including a ManyToMany relation. http://www.djangobook.com/en/2.0/chapter10/

Need a workaround to filter on related model and aggregated fields in Django

I opened a ticket for this problem.
In a nutshell here is my model:
class Plan(models.Model):
    cap = models.IntegerField()

class Phone(models.Model):
    plan = models.ForeignKey(Plan, related_name='phones')

class Call(models.Model):
    phone = models.ForeignKey(Phone, related_name='calls')
    cost = models.IntegerField()
I want to run a query like this one:
Phone.objects.annotate(total_cost=Sum('calls__cost')).filter(total_cost__gte=0.5*F('plan__cap'))
Unfortunately Django generates bad SQL:
SELECT "app_phone"."id", "app_phone"."plan_id",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"."cap"
and errors with:
ProgrammingError: column "app_plan.cap" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...."plan_id" HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"....
Is there any workaround apart from running raw SQL?
When aggregating, SQL requires any value in a field either be unique within a group, or that the field be wrapped in an aggregation function which ensures that only one value will come out for each group. The problem here is that "app_plan.cap" could have many different values for each combination of "app_phone.id" and "app_phone.plan_id", so you need to tell the DB how to treat those.
So, valid SQL for your result is one of two different possibilities, depending on the result you want. First, you could include app_plan.cap in the GROUP BY function, so that any distinct combination of (app_phone.id, app_phone.plan_id, app_plan.cap) will be a different group:
SELECT "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap"
HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"."cap"
The trick is to get the extra value into the "GROUP BY" clause. We can weasel our way into this by abusing "extra", though this hard-codes the table name for "app_plan", which is not ideal -- you could do it programmatically with the Plan class instead if you wanted:
from django.db.models import F, Sum

Phone.objects.extra({
    "plan_cap": "app_plan.cap"
}).annotate(
    total_cost=Sum('calls__cost')
).filter(total_cost__gte=0.5 * F('plan__cap'))
Alternatively, you could wrap app_plan.cap in an aggregation function, turning it into a unique value. Aggregation functions vary by DB provider, but might include things like AVG, MAX, MIN, etc.
SELECT "app_phone"."id", "app_phone"."plan_id",
SUM("app_call"."cost") AS "total_cost",
AVG("app_plan"."cap") AS "avg_cap",
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >= 0.5 * AVG("app_plan"."cap")
You could get this result in Django using the following:
from django.db.models import Avg, F, Sum

Phone.objects.annotate(
    total_cost=Sum('calls__cost'),
    avg_cap=Avg('plan__cap')
).filter(total_cost__gte=0.5 * F("avg_cap"))
You may want to consider updating the bug report you left with a clearer specification of the result you expect -- for example, the valid SQL you're after.
