Date ranges prefetch gives multiple entries instead of 1

Date ranges prefetch gives multiple entries instead of 1 - python

The following returns 3 objects but this should be only 1. Since there is only 1 InsiderTrading object that has these filters, but there are 3 owners.
quarter_trading_2018q1 = InsiderTrading.objects.filter(
issuer=company_issuer.pk,
owners__company=company.pk,
transaction_date__range=["2018-01-01", "2018-03-30"]
).prefetch_related('owners')
If I however remove the owner_company filter it return 1 (correct behaviour)
quarter_trading_2018q1 = InsiderTrading.objects.filter(
issuer=company_issuer.pk,
transaction_date__range=["2018-01-01", "2018-03-30"]
).prefetch_related('owners')
But I still want to filter on owners_company, how do I get 1 returned then?

You should add a distinct().
InsiderTrading.objects.filter(
issuer=company_issuer.pk,
owners__company=company.pk,
transaction_date__range=["2018-01-01", "2018-03-30"]
).distinct().prefetch_related('owners')

If distinct works: this means that this query results with
InsiderTrading.objects.filter(
issuer=company_issuer.pk,
owners__company=company.pk,
transaction_date__range=["2018-01-01", "2018-03-30"]
)
with multiple results not one.
Django docs states
select_related() "follows" foreign-key relationships, selecting
additional related-object data when it executes its query.
prefetch_related() does a separate lookup for each relationship, and
does the "joining" in Python.
I'd say use select_related instead.
check https://stackoverflow.com/a/31237071/4117381
Another solution is to use group_by owner_id.

Related

Django - Annotate multiple fields from a Subquery

I'm working on a Django project on which i have a queryset of a 'A' objects ( A.objects.all() ), and i need to annotate multiple fields from a 'B' objects' Subquery. The problem is that the annotate method can only deal with one field type per parameter (DecimalField, CharField, etc.), so, in order to annotate multiple fields, i must use something like:
A.objects.all().annotate(b_id =Subquery(B_queryset.values('id')[:1],
b_name =Subquery(B_queryset.values('name')[:1],
b_other_field =Subquery(B_queryset.values('other_field')[:1],
... )
Which is very inefficient, as it creates a new subquery/subselect on the final SQL for each field i want to annotate. I would like to use the same Subselect with multiple fields on it's values() params, and annotate them all on A's queryset. I'd like to use something like this:
b_subquery = Subquery(B_queryset.values('id', 'name', 'other_field', ...)[:1])
A.objects.all().annotate(b=b_subquery)
But when i try to do that (and access the first element A.objects.all().annotate(b=b_subquery)[0]) it raises an exception:
{FieldError}Expression contains mixed types. You must set output_field.
And if i set Subquery(B_quer...[:1], output_field=ForeignKey(B, models.DO_NOTHING)), i get a DB exception:
{ProgrammingError}subquery must return only one column
In a nutshell, the whole problem is that i have multiple Bs that "belongs" to a A, so i need to use Subquery to, for every A in A.objects.all(), pick a specific B and attach it on that A, using OuterRefs and a few filters (i only want a few fields of B), which seens a trivial problem for me.
Thanks for any help in advance!

What I do in such situations is to use prefetch-related
a_qs = A.objects.all().prefetch_related(
models.Prefetch('b_set',
# NOTE: no need to filter with OuterRef (it wont work anyway)
# Django automatically filter and matches B objects to A
queryset=B_queryset,
to_attr='b_records'
)
)
Now a.b_records will be a list containing a's related b objects. Depending on how you filter your B_queryset this list may be limited to only 1 object.

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
...
class Note(models.Model):
invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?

You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internaly keeps the track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django, which annotations to skip even wihout .values():
class YourQuerySet(QuerySet):
def mask_annotations(self, *names):
if self.query.annotation_select_mask is None:
self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
else:
self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
return self
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still geting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.

Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(

We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, no NULL as Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()

This is best optimize code if you want to get data from another table which primary key reference stored in another table
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)

We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

Django annotate count doesn't work always return 1

My models:
class PicKeyword(models.Model):
"""chat context collection
"""
keyword = models.TextField(blank=True, null=True)
Myfilter:
from django.db.models import Count
PicKeyword.objects.annotate(Count('keyword'))[0].keyword__count
The result always get 1
just like:
print(PicKeyword.objects.filter(keyword='hi').count()
show:
3
PicKeyword.objects.filter(keyword='hi').annotate(Count('keyword'))[0].keyword__count
show:
1
Is it because I use sqllite or my keyword type is Textfield?

I had exactly the same problem and none of the answers worked for me.
I was able to get it working using the following syntax passing only 1 field as value (worked for me because I was trying to spot duplicates):
dups = PicKeyword.objects.values('keyword').annotate(Count('keyword')).filter(keyword__count__gt=1)
The following returns always an empty queryset, which Should NOT:
dups = PicKeyword.objects.values('keyword','id').annotate(Count('keyword')).filter(keyword__count__gt=1)
Hope it helps

The way you are trying to annotate the count of keywords here,
PicKeyword.objects.annotate(Count('keyword'))[0].keyword__count
only works when you want to summarize the relationship between multiple objects. Since, you do not have any related objects to the keyword field the output is always 1 (it's own instance)
As the API docs for queryset.annotate states,
Annotates each object in the QuerySet with the provided list of query expressions. An expression may be a simple value, a reference to a field on the model (or any related models), or an aggregate expression (averages, sums, etc.) that has been computed over the objects that are related to the objects in the QuerySet.
Blog Model reference for the Queryset.annotate example
Finally, If there are no relationships among objects and your aim is to just get the count of objects in your PicKeyword Model, the answers from #samba and #Willem are correct.

The following will show the exact count:
PicKeyword.objects.annotate(Count('keyword')).count()

How to do ...ON DUPLICATE KEY UPDATE... in django

Is there a way to do the following in django's ORM?
INSERT INTO mytable
VALUES (1,2,3)
ON DUPLICATE KEY
UPDATE field=4
I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults. Usually I use the following approach, but it takes two queries instead of one:
item = Item(id=1)
item.update(**fields)
item.save()
Is there another way to do this?

I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults.
update_or_create should provide the behavior you're looking for.
Item.objects.update_or_create(
id=1,
defaults=fields,
)
It returns the same (object, created) tuple as get_or_create.
Note that this will still perform two queries, but only in the event the record does not already exist (as is the case with get_or_create). If that is for some reason unacceptable, you will likely be stuck writing raw SQL to handle this, which would be unfortunate in terms of readability and maintainability.

I think get_or_create() is still the answer, but only specify the pk field(s).
item, _ = Item.objects.get_or_create(id=1)
item.update(**fields)
item.save()

Django 4.1 has added the support for INSERT...ON DUPLICATE KEY UPDATE query. It will update the fields in case the unique validation fails.
Example of above in a single query:
# Let's say we have an Item model with unique on key
items = [
Item(key='foobar', value=10),
Item(key='foobaz', value=20),
]
# this function will create 2 rows in a single SQL query
Item.objects.bulk_create(items)
# this time it will update the value for foobar
# and create new row for barbaz
# all in a single SQL query
items = [
Item(key='foobar', value=30),
Item(key='barbaz', value=50),
]
Item.objects.bulk_create(
items,
update_conflicts=True,
update_fields=['rate']
)

Django - Following ForeignKey relationships "backward" for entire QuerySet

is it possible to follow ForeignKey relationships backward for entire querySet?
i mean something like this:
x = table1.objects.select_related().filter(name='foo')
x.table2.all()
when table1 hase ForeignKey to table2.
in
https://docs.djangoproject.com/en/1.2/topics/db/queries/#following-relationships-backward
i can see that it works only with get() and not filter()
Thanks

You basically want to get QuerySet of different type from data you start with.
class Kid(models.Model):
mom = models.ForeignKey('Mom')
name = models.CharField…
class Mom(models.Model):
name = models.CharField…
Let's say you want to get all moms having any son named Johnny.
Mom.objects.filter(kid__name='Johnny')
Let's say you want to get all kids of any Lucy.
Kid.objects.filter(mom__name='Lucy')

You should be able to use something like:
for y in x:
y.table2.all()
But you could also use get() for a list of the unique values (which will be id, unless you have a different specified), after finding them using a query.
So,
x = table1.objects.select_related().filter(name='foo')
for y in x:
z=table1.objects.select_related().get(y.id)
z.table2.all()
Should also work.

You can also use values() to fetch specific values of a foreign key reference. With values the select query on the DB will be reduced to fetch only those values and the appropriate joins will be done.
To re-use the example from Krzysztof Szularz:
jonny_moms = Kid.objects.filter(name='Jonny').values('mom__id', 'mom__name').distinct()
This will return a dictionary of Mom attributes by using the Kid QueryManager.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Date ranges prefetch gives multiple entries instead of 1 - python

You should add a distinct(). InsiderTrading.objects.filter( issuer=company_issuer.pk, owners__company=company.pk, transaction_date__range=["2018-01-01", "2018-03-30"] ).distinct().prefetch_related('owners')

Related

Django - Annotate multiple fields from a Subquery

Django querysets optimization - preventing selection of annotated fields

Django annotate count doesn't work always return 1

How to do ...ON DUPLICATE KEY UPDATE... in django

Django - Following ForeignKey relationships "backward" for entire QuerySet

Categories

Resources