I am trying to query all objects in a table without backreferences from another model.
class A(models.Model):
pass
class B(models.Model):
reference = models.ForeignKey(A)
In order to get all A objects with no references from any B objects, I do
A.objects.filter(b__isnull=True)
The Django documentation on isnull does not mention backreferences at all.
Can I get into trouble with this or is it just poorly documented?
I tried this with Django 1.10.3, using your same code example.
Let's take a look at the raw SQL statement that Django creates:
>>> print(A.objects.filter(b__isnull=True).query)
SELECT "backrefs_a"."id" FROM "backrefs_a" LEFT OUTER JOIN "backrefs_b" ON ("backrefs_a"."id" = "backrefs_b"."reference_id") WHERE "backrefs_b"."id" IS NULL
Technically, this isn't the exact same SQL that Django will send to the database, but it's close enough. For more discussion, see https://stackoverflow.com/a/1074224/5044893
If we pretty it up a bit, you can see that the query is actually quite safe:
SELECT "backrefs_a"."id"
FROM "backrefs_a"
LEFT OUTER JOIN "backrefs_b" ON ("backrefs_a"."id" = "backrefs_b"."reference_id")
WHERE "backrefs_b"."id" IS NULL
The LEFT in the LEFT OUTER JOIN ensures that you will get records from A, even when there are no matching records in B. And, because Django won't save a B with a NULL id, you can rest assured that it will work as you expect.
Related
We have a hand-written SQL query for proof of concept and hope to implement the function with the Django framework.
Specifically, Django's QuerySet usually implements a join query by matching the foreign key with the primary key of the referred table. However, in the sample SQL below, we need additional matching conditions besides the foreign key, like the eav_phone.attribute_id = 122 in the example snippet below.
...
left outer join eav_value as eav_postalcode
on t.id = eav_postalcode.entity_id and eav_phone.attribute_id = 122
...
Questions:
We wonder if there is a way to do it with Python, Django framework, or libraries.
We also wonder if other programming languages have any mature toolkits we can refer to as a design pattern. So we highly appreciate any hints and suggestions.
Backgrounds and Technical Details:
The scenario is a report that consists of transactions with customized columns by Django-EAV. This library implements the eav_value table consisting of columns of different data types, e.g. value_text, value_date, value_float, etc.
We forked an internal repository of Django-EAV and upgraded it to Python 3, so we can use any up-to-date Python features, although we are not using Django-EAV2. As far as we know, the new version, EAV2, follows the same database schema design.
So, the application defines a product with attributes in specific data types, and we referred it as metadata in this question, e.g.:
attribute_id
slug
datatype
122
postalcode
text
123
phone
text
...
...
e.g. date, float, etc. ...
One transaction is one entity, and the eav_value table contains multiple records with the matching entity_id corresponding to the different customized attributes. And we want to build a dynamic QuerySet according to the metadata to assemble the customized columns with left outer join similar to the sample SQL query below.
select
t.id, t.create_ts
, eav_postalcode.value_text as postalcode
, eav_phone.value_text as phone
from
(
select * from transactions
where product_id = __PRODUCT_ID__
) as t
left outer join eav_value as eav_postalcode
on t.id = eav_postalcode.entity_id and eav_phone.attribute_id = 122
left outer join eav_value as eav_phone
on t.id = eav_phone.entity_id and eav_phone.attribute_id = 123
;
We followed #NickODell's hint on FilteredRelation, and our tentative solution looks like the below snippet:
transaction_eav = transaction.annotate(
eav_postalcode=FilteredRelation('eav_values', condition=Q(eav_values__attribute_id=22))
)
transaction_eav = transaction_eav.annotate(
value_postalcode=F('eav_postalcode__value_text)}
)
We are new to Django ORM so please point out if the sample code above contains any low-efficient or non-standard flaws.
Many thanks to all for the great suggestions!
Let's say I have following models:
class Invoice(models.Model):
...
class Note(models.Model):
invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internaly keeps the track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django, which annotations to skip even wihout .values():
class YourQuerySet(QuerySet):
def mask_annotations(self, *names):
if self.query.annotation_select_mask is None:
self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
else:
self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
return self
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still geting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.
Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, no NULL as Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is best optimize code if you want to get data from another table which primary key reference stored in another table
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()
When querying my database, I only want to load specified columns. Creating a query with with_entities requires a reference to the model column attribute, while creating a query with load_only requires a string corresponding to the column name. I would prefer to use load_only because it is easier to create a dynamic query using strings. What is the difference between the two?
load_only documentation
with_entities documentation
There are a few differences. The most important one when discarding unwanted columns (as in the question) is that using load_only will still result in creation of an object (a Model instance), while using with_entities will just get you tuples with values of chosen columns.
>>> query = User.query
>>> query.options(load_only('email', 'id')).all()
[<User 1 using e-mail: n#d.com>, <User 2 using e-mail: n#d.org>]
>>> query.with_entities(User.email, User.id).all()
[('n#d.org', 1), ('n#d.com', 2)]
load_only
load_only() defers loading of particular columns from your models.
It removes columns from query. You can still access all the other columns later, but an additional query (in the background) will be performed just when you try to access them.
"Load only" is useful when you store things like pictures of users in your database but you do not want to waste time transferring the images when not needed. For example, when displaying a list of users this might suffice:
User.query.options(load_only('name', 'fullname'))
with_entities
with_entities() can either add or remove (simply: replace) models or columns; you can even use it to modify the query, to replace selected entities with your own function like func.count():
query = User.query
count_query = query.with_entities(func.count(User.id)))
count = count_query.scalar()
Note that the resulting query is not the same as of query.count(), which would probably be slower - at least in MySQL (as it generates a subquery).
Another example of the extra capabilities of with_entities would be:
query = (
Page.query
.filter(<a lot of page filters>)
.join(Author).filter(<some author filters>)
)
pages = query.all()
# ok, I got the pages. Wait, what? I want the authors too!
# how to do it without generating the query again?
pages_and_authors = query.with_entities(Page, Author).all()
Confused working with query object results. I am not using foreign keys in this example.
lookuplocation = aliased(ValuePair)
lookupoccupation = aliased(ValuePair)
persons = db.session.query(Person.lastname, lookuplocation.displaytext, lookupoccupation.displaytext).\
outerjoin(lookuplocation, Person.location == lookuplocation.valuepairid).\
outerjoin(lookupoccupation, Person.occupation1 == lookupoccupation.valuepairid).all()
Results are correct as far as data is concerned. However, when I try to access an individual row of data I have an issue:
persons[0].lastname works as I expected and returns data.
However, there is a person.displaytext in the result but since I aliased the displaytext entity, I get just one result. I understand why I get the result but I need to know what aliased field names I would use to get the two displaytext columns.
The actual SQL statement generated by the above join is as follows:
SELECT person.lastname AS person_lastname, valuepair_1.displaytext AS valuepair_1_displaytext, valuepair_2.displaytext AS valuepair_2_displaytext
FROM person LEFT OUTER JOIN valuepair AS valuepair_1 ON person.location = valuepair_1.valuepairid LEFT OUTER JOIN valuepair AS valuepair_2 ON person.occupation1 = valuepair_2.valuepairid
But none of these "as" field names are available to me in the results.
I'm new to SqlAlchemy so most likely this is a "newbie" issue.
Thanks.
Sorry - RTFM issue - should have been:
lookuplocation.displaytext.label("myfield1"),
lookupoccupation.displaytext.label("myfield2")
After results are returned reference field with person.myfield
Simple.
Lets say I got 2 models, Document and Person. Document got relationship to Person via "owner" property. Now:
session.query(Document)\
.options(joinedload('owner'))\
.filter(Person.is_deleted!=True)
Will double join table Person. One person table will be selected, and the doubled one will be filtered which is not exactly what I want cuz this way document rows will not be filtered.
What can I do to apply filter on joinloaded table/model ?
You are right, table Person will be used twice in the resulting SQL, but each of them serves different purpose:
one is to filter the the condition: filter(Person.is_deleted != True)
the other is to eager load the relationship: options(joinedload('owner'))
But the reason your query returns wrong results is because your filter condition is not complete. In order to make it produce the right results, you also need to JOIN the two models:
qry = (session.query(Document).
join(Document.owner). # THIS IS IMPORTANT
options(joinedload(Document.owner)).
filter(Person.is_deleted != True)
)
This will return correct rows, even though it will still have 2 references (JOINs) to Person table. The real solution to your query is that using contains_eager instead of joinedload:
qry = (session.query(Document).
join(Document.owner). # THIS IS STILL IMPORTANT
options(contains_eager(Document.owner)).
filter(Person.is_deleted != True)
)