Filter on annotated column in django

I have two models:
from django.db import models
from django.utils.timezone import now
class Project(models.Model):
    ...
class StateChange(models.Model):
    created_at = models.DateTimeField(default=now, db_index=True)
    project = models.ForeignKey("project.Project", on_delete=models.CASCADE)
    state = models.IntegerField(choices=PROJECT_STATE_TYPES, db_index=True)
    ...
The models are linked, and I need a list of projects filtered by their related StateChange records, if there are any.
I build my queryset like this:
from django.db.models import Case, F, Q, When
state_checked = Case(
    When(statechange__state=PROJECT_STATE_CHECKED, then=F('statechange__created_at'))
)
state_construction_ordered = Case(
    When(statechange__state=PROJECT_STATE_CONSTRUCTION_ORDERED, then=F('statechange__created_at'))
)
qs = Project.objects.visible_for_me(self.request.user) \
    .annotate(date_reached_state_checked=state_checked) \
    .annotate(date_reached_state_construction_ordered=state_construction_ordered) \
    .exclude(Q(date_reached_state_checked__isnull=True) & Q(statechange__state=PROJECT_STATE_CHECKED) |
             Q(date_reached_state_construction_ordered__isnull=True) & Q(statechange__state=PROJECT_STATE_CONSTRUCTION_ORDERED))
A Project may have no matching StateChange, one, or both.
I need the list to show one row per Project in all cases. My queryset only works for zero or one matching StateChange: it excludes the Projects where both StateChanges are present, and looking at the generated query I can see why.
If I do not exclude anything, it shows one row for each matching StateChange.
Can anyone give me a hint on how to make Django create the JOINs I need?
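You can reproduce the fan-out described above outside Django. A LEFT JOIN from projects to their state changes yields one row per matching StateChange, so a project that has both states appears twice before any exclude() runs. The minimal sqlite3 sketch below uses simplified, hypothetical tables (project, statechange) and the state codes 20 and 50 that appear in the .extra() answer:

```python
import sqlite3

def joined_rows():
    """Return the rows produced by a plain LEFT JOIN of projects to state changes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY);
        CREATE TABLE statechange (
            id INTEGER PRIMARY KEY,
            project_id INTEGER REFERENCES project(id),
            state INTEGER,
            created_at TEXT
        );
        -- project 1: no state change, 2: only state 20, 3: both 20 and 50
        INSERT INTO project (id) VALUES (1), (2), (3);
        INSERT INTO statechange (project_id, state, created_at) VALUES
            (2, 20, '2020-01-01'),
            (3, 20, '2020-02-01'),
            (3, 50, '2020-03-01');
    """)
    rows = conn.execute("""
        SELECT p.id, s.state
        FROM project p
        LEFT JOIN statechange s ON s.project_id = p.id
        ORDER BY p.id, s.state
    """).fetchall()
    conn.close()
    return rows

print(joined_rows())  # [(1, None), (2, 20), (3, 20), (3, 50)]: project 3 appears twice
```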

We solved it using .extra():
return Projekt.objects.all().extra(
    select={
        "date_reached_state_checked": "SELECT created_at FROM tracking_statechange WHERE tracking_statechange.projekt_id = projekt_projekt.id AND tracking_statechange.state = 20",
        "date_reached_state_construction_ordered": "SELECT created_at FROM tracking_statechange WHERE tracking_statechange.projekt_id = projekt_projekt.id AND tracking_statechange.state = 50",
    })
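Each extra column above is a correlated scalar subquery. It is evaluated once per project row, so the join never multiplies the rows. In current Django the same shape can be written without .extra(). One option is a subquery, for example Subquery(StateChange.objects.filter(project=OuterRef('pk'), state=PROJECT_STATE_CHECKED).order_by('created_at').values('created_at')[:1]). Another is conditional aggregation, for example Min('statechange__created_at', filter=Q(statechange__state=PROJECT_STATE_CHECKED)). The sqlite3 sketch below only compares the two SQL shapes on toy data. The table names are simplified assumptions, and the ORDER BY ... LIMIT 1 is an added guard in case a project reaches the same state twice.

```python
import sqlite3

SETUP = """
CREATE TABLE project (id INTEGER PRIMARY KEY);
CREATE TABLE statechange (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    state INTEGER,
    created_at TEXT
);
INSERT INTO project (id) VALUES (1), (2), (3);
INSERT INTO statechange (project_id, state, created_at) VALUES
    (2, 20, '2020-01-01'),
    (3, 20, '2020-02-01'),
    (3, 50, '2020-03-01');
"""

# One correlated scalar subquery per column, as in the .extra() answer.
SUBQUERY_SQL = """
SELECT p.id,
       (SELECT s.created_at FROM statechange s
         WHERE s.project_id = p.id AND s.state = 20
         ORDER BY s.created_at LIMIT 1),
       (SELECT s.created_at FROM statechange s
         WHERE s.project_id = p.id AND s.state = 50
         ORDER BY s.created_at LIMIT 1)
FROM project p
ORDER BY p.id
"""

# Conditional aggregation: join once, then collapse to one row per project.
AGGREGATE_SQL = """
SELECT p.id,
       MIN(CASE WHEN s.state = 20 THEN s.created_at END),
       MIN(CASE WHEN s.state = 50 THEN s.created_at END)
FROM project p
LEFT JOIN statechange s ON s.project_id = p.id
GROUP BY p.id
ORDER BY p.id
"""

def run(sql):
    """Run one query against a fresh in-memory copy of the toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(run(SUBQUERY_SQL))
# [(1, None, None), (2, '2020-01-01', None), (3, '2020-02-01', '2020-03-01')]
```

Both queries return exactly one row per project, whether it has zero, one or both state changes.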

Related

Django Left join how

I'm fairly new to Django and stuck on creating a left join. I've tried many approaches, but none of them seems to work.
The query I want to translate to Django is:
select ssc.id,
       mgz.Title,
       tli.id,
       tli.Time
from Subscription ssc
join Person prs
  on ssc.PersonID = prs.id
 and prs.id = 3
join Magazine mgz
  on mgz.id = ssc.MagazineID
 and mgz."from" <= current_date
 and mgz.until > current_date
left join TimeLogedIn tli
  on tli.SubscriptionID = ssc.id
 and tli.DateOnline = current_date
The model I'm using looks like this:
from django.contrib.auth.models import User
from django.db import models
class Magazine(models.Model):
    Title = models.CharField(max_length=100)
    # "from" is a reserved Python keyword, so this line is a SyntaxError as written;
    # a real model needs another attribute name (it can keep db_column="from").
    from = models.DateField()
    until = models.DateField()
    Persons = models.ManyToManyField('Person', through='Subscription')
class Person(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    Magazines = models.ManyToManyField(Magazine, through='Subscription')
class Subscription(models.Model):
    MagazineID = models.ForeignKey(Magazine, on_delete=models.CASCADE)
    PersonID = models.ForeignKey(Person, on_delete=models.CASCADE)
class TimeLogedIn(models.Model):
    SubscriptionID = models.ForeignKey('Subscription', on_delete=models.CASCADE)
    DateOnline = models.DateField()
    Time = models.DecimalField(max_digits=5, decimal_places=2)
As I said, I've tried many things without success, and now I don't know how to do this with the Django ORM. Is it even possible? I already wrote a raw query that works fine, but how do I express it with the Django ORM?

You can use the lte and gt field lookups to filter your objects, and then the values() method to pick the columns.
You can also follow relations in the reverse direction and use Q objects to allow null values:
from datetime import date
from django.db.models import Q
Subscription.objects.filter(
    PersonID_id=3,
    MagazineID__from__lte=date.today(),
    MagazineID__until__gt=date.today()
).filter(
    # the reverse relation from Subscription to TimeLogedIn is queried as "timelogedin"
    Q(timelogedin__DateOnline=date.today()) | Q(timelogedin__DateOnline__isnull=True)
).values("id", "MagazineID__Title", "timelogedin__id", "timelogedin__Time")
Or, starting from TimeLogedIn:
TimeLogedIn.objects.filter(DateOnline=date.today()).filter(
    SubscriptionID__PersonID=3,  # same person filter as in the SQL
    SubscriptionID__MagazineID__from__lte=date.today(),
    SubscriptionID__MagazineID__until__gt=date.today()
).values(
    "SubscriptionID_id", "SubscriptionID__MagazineID__Title", "id", "Time"
)
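There is one caveat with the Q(...) | Q(..._isnull=True) version. The raw SQL puts tli.DateOnline = today in the LEFT JOIN's ON clause, but the ORM filter ends up in WHERE. A subscription whose only logins are on other days then matches neither branch and is dropped, whereas the raw SQL keeps it with NULLs. The TimeLogedIn-based query is an inner join, so it returns only subscriptions with a login today. Django 2.0 and later offer FilteredRelation for putting a condition into the join itself. The sqlite3 sketch below uses simplified, hypothetical tables and a fixed date in place of date.today():

```python
import sqlite3

TODAY = "2024-05-01"  # fixed stand-in for date.today()

def compare_join_styles():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subscription (id INTEGER PRIMARY KEY);
        CREATE TABLE timelogedin (
            id INTEGER PRIMARY KEY,
            subscription_id INTEGER REFERENCES subscription(id),
            date_online TEXT,
            time REAL
        );
        -- 1: logged in today, 2: never logged in, 3: logged in yesterday only
        INSERT INTO subscription (id) VALUES (1), (2), (3);
        INSERT INTO timelogedin (subscription_id, date_online, time) VALUES
            (1, '2024-05-01', 1.5),
            (3, '2024-04-30', 2.0);
    """)
    # Condition in the ON clause, as in the question's raw SQL.
    on_clause = conn.execute("""
        SELECT s.id, t.time FROM subscription s
        LEFT JOIN timelogedin t
          ON t.subscription_id = s.id AND t.date_online = ?
        ORDER BY s.id
    """, (TODAY,)).fetchall()
    # Condition in WHERE, roughly what filter(Q(...) | Q(..._isnull=True)) produces.
    where_clause = conn.execute("""
        SELECT s.id, t.time FROM subscription s
        LEFT JOIN timelogedin t ON t.subscription_id = s.id
        WHERE t.date_online = ? OR t.date_online IS NULL
        ORDER BY s.id
    """, (TODAY,)).fetchall()
    conn.close()
    return on_clause, where_clause

on_clause, where_clause = compare_join_styles()
print(on_clause)     # [(1, 1.5), (2, None), (3, None)]
print(where_clause)  # [(1, 1.5), (2, None)]: subscription 3 is lost
```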
Querysets also have a query attribute that holds the SQL to be executed; you can print it like this:
print(TimeLogedIn.objects.filter(...).values(...).query)
Note: behind the scenes, Django appends "_id" to a ForeignKey's field name to build its database column name. The field should therefore be called subscription rather than SubscriptionID, which produces a column named SubscriptionID_id.
You can also use prefetch_related() and select_related() to reduce the number of database hits:
Subscription.objects.filter(...).prefetch_related("timelogedin_set")
Subscription.objects.filter(...).select_related("PersonID")

django - prefetch only the newest record?

I am trying to prefetch only the latest related record for each parent record.
My models are as follows:
class LinkTargets(models.Model):
    device_circuit_subnet = models.ForeignKey(DeviceCircuitSubnets, verbose_name="Device", on_delete=models.PROTECT)
    interface_index = models.CharField(max_length=100, verbose_name='Interface index (SNMP)', blank=True, null=True)
    get_bgp = models.BooleanField(default=False, verbose_name="get BGP Data?")
    dashboard = models.BooleanField(default=False, verbose_name="Display on monitoring dashboard?")
class LinkData(models.Model):
    link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
    ...
The query below fails with the error:
AttributeError: 'LinkData' object has no attribute '_iterable_class'
Query:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.all().order_by('-id')[0]
        )
    )
I thought about querying LinkData instead and using select_related(), but I have no idea how to get only one record per link_target_id:
link_data = LinkData.objects.filter(link_target__dashboard=True) \
    .select_related('link_target')  # ...?
EDIT:
Using rtindru's solution, the prefetched set seems to be empty. There are currently 6 records in there, at least 1 for each of the 3 LinkTargets:
>>> link_data[0]
<LinkTargets: LinkTargets object>
>>> link_data[0].linkdata_set.all()
<QuerySet []>
>>>

The reason is that Prefetch expects a Django QuerySet as the queryset parameter, and you are giving it a model instance.
Change your query as follows:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk=LinkData.objects.latest('id').pk)
        )
    )
This does have the unfortunate effect of undoing the purpose of Prefetch to a large degree.
Update
This prefetches exactly one record globally; not the latest LinkData record per LinkTarget.
To prefetch the latest LinkData for each LinkTarget, start from LinkData instead. You can achieve this as follows:
from django.db.models import Max
LinkData.objects.filter(link_target__dashboard=True).values('link_target').annotate(max_id=Max('id'))
This returns a queryset of dictionaries such as {'link_target': 12, 'max_id': 3223}.
You can then use this to return the right set of objects; perhaps filter LinkData based on the values of max_id.
That will look something like this:
from django.db.models import Max, Prefetch
latest_link_data_pks = (
    LinkData.objects.filter(link_target__dashboard=True)
    .values('link_target')
    .annotate(max_id=Max('id'))
    .values_list('max_id', flat=True)
)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk__in=latest_link_data_pks)
        )
    )
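In SQL terms, the approach above is a two-step greatest-per-group query. It groups LinkData by link target and keeps MAX(id), then selects the rows whose id is in that set. The toy sqlite3 sketch below reduces the tables to the columns the query needs and uses made-up data:

```python
import sqlite3

def latest_per_target():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE linktargets (id INTEGER PRIMARY KEY, dashboard INTEGER);
        CREATE TABLE linkdata (
            id INTEGER PRIMARY KEY,
            link_target_id INTEGER REFERENCES linktargets(id),
            interface_description TEXT
        );
        INSERT INTO linktargets (id, dashboard) VALUES (1, 1), (2, 1), (3, 0);
        INSERT INTO linkdata (id, link_target_id, interface_description) VALUES
            (10, 1, 'old'), (11, 1, 'new'),
            (12, 2, 'only'),
            (13, 3, 'not on dashboard');
    """)
    rows = conn.execute("""
        SELECT d.link_target_id, d.id, d.interface_description
        FROM linkdata d
        WHERE d.id IN (
            -- step 1: the per-target MAX(id), like values('link_target').annotate(Max('id'))
            SELECT MAX(d2.id)
            FROM linkdata d2
            JOIN linktargets t ON t.id = d2.link_target_id
            WHERE t.dashboard = 1
            GROUP BY d2.link_target_id
        )
        ORDER BY d.link_target_id
    """).fetchall()
    conn.close()
    return rows

print(latest_per_target())  # [(1, 11, 'new'), (2, 12, 'only')]
```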

The following works on PostgreSQL. I understand it won't help OP, but it might be useful to somebody else.
from django.db.models import Prefetch
from .models import LinkTargets, LinkData
# DISTINCT ON (PostgreSQL only) keeps the first row per link target,
# which with this ordering is the one with the highest id.
link_data_qs = LinkData.objects.order_by(
    'link_target__id',
    '-id',
).distinct(
    'link_target__id',
)
qs = LinkTargets.objects.prefetch_related(
    Prefetch(
        'linkdata_set',
        queryset=link_data_qs,
    )
).all()
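DISTINCT ON is specific to PostgreSQL. On databases that support window functions, ROW_NUMBER() partitioned by the link target gives the same newest-row-per-target result. The toy sqlite3 sketch below uses a simplified, hypothetical linkdata table and needs SQLite 3.25 or later, which recent Python builds bundle:

```python
import sqlite3

def newest_per_target():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE linkdata (
            id INTEGER PRIMARY KEY,
            link_target_id INTEGER,
            interface_description TEXT
        );
        INSERT INTO linkdata (id, link_target_id, interface_description) VALUES
            (10, 1, 'old'), (11, 1, 'new'), (12, 2, 'only');
    """)
    rows = conn.execute("""
        SELECT link_target_id, id, interface_description
        FROM (
            SELECT d.*,
                   -- number each target's rows from newest (1) to oldest
                   ROW_NUMBER() OVER (
                       PARTITION BY d.link_target_id ORDER BY d.id DESC
                   ) AS rn
            FROM linkdata d
        )
        WHERE rn = 1
        ORDER BY link_target_id
    """).fetchall()
    conn.close()
    return rows

print(newest_per_target())  # [(1, 11, 'new'), (2, 12, 'only')]
```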

LinkData.objects.all().order_by('-id')[0] is not a queryset; it is a model object, hence your error.
You could try LinkData.objects.all().order_by('-id')[0:1], which is indeed a QuerySet, but it's not going to work. Given how prefetch_related works, the queryset argument must return all the LinkData records you need (this queryset is then further filtered, and its items are joined up with the LinkTarget objects). This queryset contains only one item, so that's no good. (Django will also complain "Cannot filter a query once a slice has been taken" and raise an exception, as it should.)
Let's back up. Essentially you are asking an aggregation/annotation question - for each LinkTarget, you want to know the most recent LinkData object, or the 'max' of an 'id' column. The easiest way is to just annotate with the id, and then do a separate query to get all the objects.
So, it would look like this (I've checked with a similar model in my project, so it should work, but the code below may have some typos):
from django.db.models import Max
linktargets = (LinkTargets.objects
               .filter(dashboard=True)
               # the reverse relation is queried as "linkdata", not "linkdata_set"
               .annotate(most_recent_linkdata_id=Max('linkdata__id')))
# Now, if we need them, let's collect and get the actual objects
linkdata_ids = [t.most_recent_linkdata_id for t in linktargets]
linkdata_objects = LinkData.objects.filter(id__in=linkdata_ids)
# And we can decorate the LinkTarget objects as well if we want:
linkdata_d = {l.id: l for l in linkdata_objects}
for t in linktargets:
    if t.most_recent_linkdata_id is not None:
        t.most_recent_linkdata = linkdata_d[t.most_recent_linkdata_id]
I have deliberately not made this into a prefetch that masks linkdata_set, because the result is that you have objects that lie to you - the linkdata_set attribute is now missing results. Do you really want to be bitten by that somewhere down the line? Best to make a new attribute that has just the thing you want.

Tricky, but it seems to work:
from django.db import models
from django.db.models import Prefetch
class ForeignKeyAsOneToOneField(models.OneToOneField):
    def __init__(self, to, on_delete, to_field=None, **kwargs):
        super().__init__(to, on_delete, to_field=to_field, **kwargs)
        # drop the uniqueness OneToOneField normally implies, so many
        # LinkData rows can still point at the same LinkTargets row
        self._unique = False
class LinkData(models.Model):
    # link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    link_target = ForeignKeyAsOneToOneField(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT, related_name='linkdata_helper')
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_helper',
            queryset=LinkData.objects.all().order_by('-id'),
            to_attr='linkdata',  # a positional argument cannot follow queryset=
        )
    )
# Now you can access linkdata:
link_data[0].linkdata
Of course, with this approach you can't use linkdata_helper to get related objects.

This is not a direct answer to your question, but it solves the same problem. You can annotate the newest object with a subquery, which I think is clearer. You also don't have to do things like Max("id") to limit the prefetch query.
It uses django.db.models.functions.JSONObject (added in Django 3.2) to combine multiple fields:
from django.db.models import OuterRef, Subquery
from django.db.models.functions import JSONObject
MainModel.objects.annotate(
    last_object=Subquery(
        RelatedModel.objects.filter(mainmodel=OuterRef("pk"))
        .order_by("-date_created")
        .values(
            data=JSONObject(
                id="id", body="body", date_created="date_created"
            )
        )[:1]
    )
)

Django full text search using indexes with PostgreSQL

After solving the problem I asked about in this question, I am trying to optimize performance of the FTS using indexes.
I issued on my db the command:
CREATE INDEX my_table_idx ON my_table USING gin(
    to_tsvector('italian', very_important_field),
    to_tsvector('italian', also_important_field),
    to_tsvector('italian', not_so_important_field),
    to_tsvector('italian', not_important_field),
    to_tsvector('italian', tags)
);
Then I edited my model's Meta class as follows:
from django.contrib.postgres.indexes import GinIndex
from django.db import models
class MyEntry(models.Model):
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)
    class Meta:
        managed = False
        db_table = 'my_table'
        indexes = [
            GinIndex(
                fields=['very_important_field', 'also_important_field', 'not_so_important_field', 'not_important_field', 'tags'],
                name='my_table_idx'
            )
        ]
But nothing seems to have changed. The lookup takes exactly the same amount of time as before.
This is the lookup script:
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
# other unrelated stuff here
vector = SearchVector("very_important_field", weight="A") + \
         SearchVector("tags", weight="A") + \
         SearchVector("also_important_field", weight="B") + \
         SearchVector("not_so_important_field", weight="C") + \
         SearchVector("not_important_field", weight="D")
query = SearchQuery(search_string, config="italian")
rank = SearchRank(vector, query, weights=[0.4, 0.6, 0.8, 1.0])  # D, C, B, A
full_text_search_qs = MyEntry.objects.annotate(rank=rank).filter(rank__gte=0.4).order_by("-rank")
What am I doing wrong?
Edit:
The above lookup is wrapped in a function that I time with a decorator. The function actually returns a list, like this:
@timeit  # my own timing decorator
def search(search_string):
    # the above code here
    qs = list(full_text_search_qs)
    return qs
Might this be the problem?

You need to add a SearchVectorField to your MyEntry, update it from your actual text fields and then perform the search on this field. However, the update can only be performed after the record has been saved to the database.
Essentially:
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models
class MyEntry(models.Model):
    # The fields that contain the raw data.
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)
    # The field we are actually going to search.
    # Must be null=True because we cannot set it immediately during create().
    search_vector = SearchVectorField(editable=False, null=True)
    class Meta:
        # The search index pointing to our actual search field.
        indexes = [GinIndex(fields=["search_vector"])]
Then you can create the plain instance as usual, for example:
# Does not set MyEntry.search_vector yet.
my_entry = MyEntry.objects.create(
    very_important_field="something very important",  # Fake Italian text ;-)
    also_important_field="something different but equally important",
    not_so_important_field="this one matters less",
    not_important_field="we don't care about that one at all",
    tags="things, stuff, whatever",
)
Now that the entry exists in the database, you can update the search_vector field with various options. For example, use weight to specify the importance of a field and config to select one of the built-in language configurations. You can also omit fields you don't want to search entirely:
# Update search vector on existing database record.
my_entry.search_vector = (
    SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("also_important_field", weight="A", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("tags", weight="B", config="italian")
)
my_entry.save()
Manually updating the search_vector field every time one of the text fields changes can be error-prone, so you might consider adding an SQL trigger that does it for you, created through a Django migration. For an example of how to do that, see, for instance, a blog article on Full-text Search with Django and PostgreSQL.
To actually use the index when searching MyEntry, you need to filter and rank by your search_vector field. The config of the SearchQuery should match that of the SearchVector above, so that the same stop words, stemming, etc. are applied.
For example:
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F
search_query = SearchQuery("important", search_type="websearch", config="italian")
search_rank = SearchRank(F("search_vector"), search_query)
my_entries_found = (
    MyEntry.objects.annotate(rank=search_rank)
    .filter(search_vector=search_query)  # Perform full text search on index.
    .order_by("-rank")  # Yield most relevant entries first.
)

I'm not sure, but according to the PostgreSQL documentation (https://www.postgresql.org/docs/9.5/static/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX):
Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the 2-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an index will be used only with the same configuration used to create the index entries.
I don't know which configuration Django uses, but you can try removing the first argument.

Django filtering based on count of related model

I have the following working code:
houses_of_agency = House.objects.filter(agency_id=90)
area_list = AreaHouse.objects.filter(house__in=houses_of_agency).values('area')
area_ids = Area.objects.filter(area_id__in=area_list).values_list('area_id', flat=True)
That returns a queryset with a list of area_ids. I want to filter further so that I only get area_ids where there are more than 100 houses belonging to the agency.
I tried the following adjustment:
houses_of_agency = House.objects.filter(agency_id=90)
area_list = AreaHouse.objects.filter(house__in=houses_of_agency).annotate(num_houses=Count('house_id')).filter(num_houses__gte=100).values('area')
area_ids = Area.objects.filter(area_id__in=area_list).values_list('area_id', flat=True)
But it returns an empty queryset.
My models (simplified) look like this:
class House(TimeStampedModel):
    house_pk = models.IntegerField()
    agency = models.ForeignKey(Agency, on_delete=models.CASCADE)
class AreaHouse(TimeStampedModel):
    area = models.ForeignKey(Area, on_delete=models.CASCADE)
    house = models.ForeignKey(House, on_delete=models.CASCADE)
class Area(TimeStampedModel):
    area_id = models.IntegerField(primary_key=True)
    parent = models.ForeignKey('self', null=True)
    name = models.CharField(null=True, max_length=30)
Edit: I'm using MySQL for the database backend.

agency_id=90 works, but following the relation with house__agency__pk=90 lets you inline your first query. In Django it's more common to use pk instead of id, though the behaviour is the same. There's also no need for three separate queries, as you can combine everything into one. I corrected your queries below. Your attempt returns nothing because annotating AreaHouse directly groups by each AreaHouse row, so every count is 1. Call values('area') before annotate() to count houses per area.
Also note that your area_id field is not strictly necessary (nor is house_pk, if it only duplicates the id): Django automatically creates a primary key field, which is accessible as pk.
from django.db.models import Count
# the first query is inlined in the .filter() call; values('area') groups by area
area_list = AreaHouse.objects \
    .filter(house__agency__pk=90) \
    .values('area') \
    .annotate(num_houses=Count('house')) \
    .filter(num_houses__gte=100) \
    .values('area')
# Area's primary key is area_id, so filter and select on pk
area_ids = Area.objects.filter(pk__in=area_list) \
    .values_list('pk', flat=True)
You can simplify this even further if you don't need the intermediate results. Here are both queries combined:
area_ids = AreaHouse.objects \
    .filter(house__agency__pk=90) \
    .values('area') \
    .annotate(num_houses=Count('house')) \
    .filter(num_houses__gte=100) \
    .values_list('area', flat=True)
Finally, you seem to be defining a many-to-many relation manually (through AreaHouse). Consider declaring it with a ManyToManyField, optionally with through=AreaHouse; the Django docs on many-to-many relationships explain the options.
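You can check in plain SQL why the original attempt came back empty. Annotating AreaHouse rows without values('area') groups by each AreaHouse row, so every count is 1. Grouping by area instead counts the agency's houses per area. The toy sqlite3 sketch below uses simplified, hypothetical tables and a threshold of 2 standing in for 100:

```python
import sqlite3

THRESHOLD = 2  # stands in for 100

def area_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE house (id INTEGER PRIMARY KEY, agency_id INTEGER);
        CREATE TABLE areahouse (
            id INTEGER PRIMARY KEY,
            area_id INTEGER,
            house_id INTEGER REFERENCES house(id)
        );
        INSERT INTO house (id, agency_id) VALUES (1, 90), (2, 90), (3, 90), (4, 7);
        -- area 100 has three agency-90 houses, area 200 has one
        INSERT INTO areahouse (area_id, house_id) VALUES
            (100, 1), (100, 2), (100, 3), (200, 1), (200, 4);
    """)
    # Grouped per AreaHouse row: every count is 1, so HAVING removes everything.
    per_row = conn.execute("""
        SELECT ah.area_id FROM areahouse ah
        JOIN house h ON h.id = ah.house_id
        WHERE h.agency_id = 90
        GROUP BY ah.id
        HAVING COUNT(ah.house_id) >= ?
    """, (THRESHOLD,)).fetchall()
    # Grouped per area: counts the agency's houses in each area.
    per_area = conn.execute("""
        SELECT ah.area_id FROM areahouse ah
        JOIN house h ON h.id = ah.house_id
        WHERE h.agency_id = 90
        GROUP BY ah.area_id
        HAVING COUNT(ah.house_id) >= ?
        ORDER BY ah.area_id
    """, (THRESHOLD,)).fetchall()
    conn.close()
    return per_row, per_area

per_row, per_area = area_counts()
print(per_row)   # []
print(per_area)  # [(100,)]
```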

Database error: Lookup type 'in' can't be negated

I am using the Django framework with the App Engine datastore.
My model code is:
class Group(models.Model):
    name = models.CharField(max_length=200)
    ispublic = models.BooleanField()
    logo = models.CharField(max_length=200)
    description = models.CharField(max_length=200)
    groupwebsite = models.CharField(max_length=200)
    owner = models.ForeignKey('profile')
class Group_members(models.Model):
    profile = models.CharField(max_length=200)
    group = models.ForeignKey('group')
I am querying Group_members to exclude certain groups. My query is as follows:
groups = Group_members.objects.filter(Q(profile=profile.id), ~Q(group__in=group_id))
INFO:
group_id = ['128','52']
group is a foreign key to the Group model.
My problem is that when I run this query, it throws Database error: Lookup type 'in' can't be negated.
A non-negated __in query works fine, but the negated one does not work on the foreign key.
Thanks in advance

I think you are trying to filter by profile id and exclude the groups in group_id in a single filter:
groups = Group_members.objects.filter(Q(profile=profile.id), ~Q(group__in=group_id))
Instead, try this:
1) First, filter Group_members by profile:
groups = Group_members.objects.filter(profile=profile.id)
2) Then remove the unwanted groups from the results:
# group_id holds strings, so compare the ids as strings
groupId = [x.group_id for x in groups if str(x.group_id) not in group_id]
Hope this gives you the result you need.

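Two suggestions:
1) Use ~Q(group__pk__in=group_id)
2) Instead of filter() with a negated in, use exclude() with in, e.g. Group_members.objects.filter(profile=profile.id).exclude(group__pk__in=group_id)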
