Consider the following Django models:
class Image(models.Model):
    image_filename = models.CharField(max_length=50)
class Rating(models.Model):
    DIMENSIONS = [
        ('happy', 'happiness'),
        ('competence', 'competence'),
        ('warm_sincere', 'warm/sincere'),
    ]
    rating_value = models.IntegerField()
    rating_dimension = models.CharField(max_length=50, choices=DIMENSIONS)
    image = models.ForeignKey(Image, on_delete=models.CASCADE)
Now, I'd like to count the ratings per dimension like this:
Rating.objects.values("rating_dimension").annotate(num_ratings=Count("rating_value"))
which returns a QuerySet like this:
[{'rating_dimension': 'happy', 'num_ratings': 2},
{'rating_dimension': 'competence', 'num_ratings': 5}]
Is there a way to include the dimensions that have no ratings? I'd like to get output like this:
[{'rating_dimension': 'happy', 'num_ratings': 2},
{'rating_dimension': 'competence', 'num_ratings': 5},
{'rating_dimension': 'warm_sincere', 'num_ratings': 0}] # ← zero occurrences should be included.
First, create a dictionary with the count for every dimension initialised to 0:
results = {dimension[0]: 0 for dimension in Rating.DIMENSIONS}
Next, query the database:
from django.db.models import Count
queryset = Rating.objects.values("rating_dimension").annotate(num_ratings=Count("rating_value"))
Then overwrite the zeros with the counts the query returned:
for entry in queryset:
    results[entry['rating_dimension']] = entry['num_ratings']
In the template, iterate over this dictionary with {% for key, value in results.items %}. Alternatively, convert the dictionary in the view to whatever structure you need.
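The zero-fill step can also be wrapped in a small helper that returns rows in the same shape as the desired output. This is only a sketch: fill_missing_counts is a hypothetical name, and the rows list stands in for what the grouped queryset would yield.

```python
def fill_missing_counts(choices, rows, key="rating_dimension", count="num_ratings"):
    """Return one row per choice, using 0 for choices absent from rows."""
    # Start every choice at zero, in the order the choices are declared.
    counts = {value: 0 for value, _label in choices}
    # Overwrite the zeros with the counts that the query actually returned.
    for row in rows:
        counts[row[key]] = row[count]
    return [{key: value, count: n} for value, n in counts.items()]


DIMENSIONS = [
    ("happy", "happiness"),
    ("competence", "competence"),
    ("warm_sincere", "warm/sincere"),
]
# Stand-in for the rows produced by the values().annotate() query.
rows = [
    {"rating_dimension": "happy", "num_ratings": 2},
    {"rating_dimension": "competence", "num_ratings": 5},
]
filled = fill_missing_counts(DIMENSIONS, rows)
```

Because the dictionary is seeded from the choices, the result follows the declaration order of DIMENSIONS rather than the order in which the database returned the groups.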
Related
Let's say I have the following dict:
schools_dict = {
'1': {'points': 10},
'2': {'points': 14},
'3': {'points': 5},
}
How can I put these values into my queryset using annotate?
I would like to do something like this, but it's not working:
schools = SchoolsExam.objects.all()
queryset = schools.annotate(
total_point = schools_dict[F('school__school_id')]['points']
)
Models:
class SchoolsExam(Model):
    school = ForeignKey('School', on_delete=models.CASCADE)
class School(Model):
    school_id = CharField()
This code gives me an error: KeyError: F(school__school_id)
You cannot use an F object as a dictionary key. The lookup schools_dict[F(...)] is evaluated immediately in Python, before any query runs, and a dictionary does not "understand" F objects.
You can translate this to a conditional expression:
from django.db.models import Case, Value, When
schools = SchoolsExam.objects.annotate(
    total_point=Case(
        *[
            When(school__school_id=school_id, then=Value(v['points']))
            for school_id, v in schools_dict.items()
        ]
    )
)
This will thus "unwind" the dictionary into CASE WHEN school_id=1 THEN 10 WHEN school_id=2 THEN 14 WHEN school_id=3 THEN 5.
However using data in a dictionary often does not make much sense: usually you store this in a table and perform a JOIN.
I have models similar to the below:
class Tag(models.Model):
    text = models.CharField(max_length=30)
class Post(models.Model):
    title = models.CharField(max_length=30)
    tags = models.ManyToManyField(Tag)
A Post can have many Tags and Tags can be associated with many Posts.
What I need is to get a list of all posts along with all the tags associated with each post. I then create a Pandas DataFrame from that data. Here is how I am currently doing it:
qs = Post.objects.all().prefetch_related('tags')
tag_df = pd.DataFrame(columns=["post_id", "tags"])
for q in qs:
    tag_df = tag_df.append(
        {
            "post_id": q.pk,
            "tags": list(q.tags.all().values_list("text", flat=True)),
        },
        ignore_index=True,
    )
post_df = pd.DataFrame(qs.values("id", "title"))
final_df = post_df.merge(tag_df, left_on="id", right_on="post_id")
The result is correct in terms of the data I require. The problem is how incredibly inefficient it is and the number of queries that run even though I'm using prefetch_related. It appears that a query is hitting the database for each iteration of the loop.
Is there a better, more efficient way to do this (possibly without loops)? All I need in the end is a dataframe that contains all the posts along with a column which has a list of the tags for each post.
Calling .values_list(..) on q.tags.all() builds a new queryset that bypasses the prefetched results, so it makes an extra query in each iteration. That is not very efficient. You can simply use the already prefetched Tag objects and read their .text attributes:
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs pandas < 2.0.
qs = Post.objects.prefetch_related('tags')
tag_df = pd.DataFrame(columns=['post_id', 'tags'])
for q in qs:
    tag_df = tag_df.append(
        {
            'post_id': q.pk,
            'tags': [t.text for t in q.tags.all()],
        },
        ignore_index=True,
    )
post_df = pd.DataFrame(qs.values('id', 'title'))
final_df = post_df.merge(tag_df, left_on='id', right_on='post_id')
It might however be more efficient to first make a list of dictionaries, and then load these in a dataframe once:
qs = Post.objects.prefetch_related('tags')
data = [
    {'id': q.pk, 'title': q.title, 'tags': [t.text for t in q.tags.all()]}
    for q in qs
]
final_df = pd.DataFrame(data, columns=['id', 'title', 'tags'])
Note that using .values(..) or .values_list(..) is usually not a good idea. It makes sense only in certain cases, such as a GROUP BY on a certain value. Usually it is better to work with the model objects, since these add an extra layer of logic.
I have the following models:
class Event(models.Model):
    date = models.DateTimeField()
    event_type = models.ForeignKey('EventType')
class EventType(models.Model):
    name = models.CharField(unique=True)
I am trying to get a list of all dates, and what event types are available on that date.
Each item in the list would be a dictionary with two fields: date and event_types which would be a list of distinct event types available on that date.
Currently I have come up with a query to get me a list of all distinct dates, but this is only half of what I want to do:
query = Event.objects.all().select_related('event_type')
results = query.distinct('date').order_by('date').values_list('date', flat=True)
Now I can change this slightly to get me a list of all distinct date + event_type combinations:
query = Event.objects.all().select_related('event_type')
results = query.order_by('date').distinct('date', 'event_type').values_list('date', 'event_type__name')
But this will have an entry for each event type within a given date. I need to aggregate a list within each date.
Is there a way I can construct a queryset to do this? If not, how would I do this some other way to get to the same result?
You can perform such an aggregation with the groupby function of itertools. It requires that the elements appear in "chunks" with respect to the grouping criterion, that is, rows with the same key must be adjacent. That is the case here, since you use order_by.
We can thus write it like:
from itertools import groupby
from operator import itemgetter
query = (Event.objects.all().select_related('event_type')
         .order_by('date', 'event_type')
         .distinct('date', 'event_type')
         .values_list('date', 'event_type__name'))
result = [
    {'date': k, 'datetypes': [v[1] for v in vs]}
    for k, vs in groupby(query, itemgetter(0))
]
It is also better to include 'event_type' in the order_by criterion, so that the ordering matches the fields passed to distinct.
This will result in something like:
[{'date': datetime.date(2018, 5, 19), 'datetypes': ['Famous person died',
'Royal wedding']},
{'date': datetime.date(2018, 5, 24), 'datetypes': ['Famous person died']},
{'date': datetime.date(2018, 5, 25), 'datetypes': ['Important law enforced',
'Referendum']}]
(based on quick Wikipedia scan of the last days in May).
The groupby works in linear time with the number of rows returned.
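The requirement that equal keys be adjacent can be seen in a small standalone sketch. The rows below are hypothetical stand-ins for what values_list would return, and group_event_types is an illustrative name, not part of Django. Sorting in Python here plays the role that order_by plays in the query.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def group_event_types(rows):
    """Group (date, event_type_name) pairs into one dict per date."""
    # set() mirrors the DISTINCT in the query; sorted() mirrors order_by
    # and guarantees that rows with the same date are adjacent.
    ordered = sorted(set(rows))
    return [
        {"date": day, "datetypes": [name for _day, name in pairs]}
        for day, pairs in groupby(ordered, key=itemgetter(0))
    ]


# Hypothetical rows, deliberately out of order and with one duplicate.
rows = [
    (date(2018, 5, 19), "Royal wedding"),
    (date(2018, 5, 24), "Famous person died"),
    (date(2018, 5, 19), "Famous person died"),
    (date(2018, 5, 19), "Royal wedding"),
]

# Without sorting, groupby starts a new group every time the key changes,
# so 19 May shows up twice.
unsorted_keys = [day for day, _pairs in groupby(rows, key=itemgetter(0))]

result = group_event_types(rows)
```

On the unsorted rows, groupby yields three groups for two distinct dates, which is why the ordering in the query matters.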
I've got the following models:
class Store(models.Model):
    name = models.CharField()
class Goods(models.Model):
    store = models.ForeignKey(Store, on_delete=models.CASCADE)
    total_cost = models.DecimalField()
    different values ...
So, I filtered all the goods according to the parameters, and now my goal is to get one good from each store: the one with the lowest price among the goods from that store.
stores = Store.objects.all()  # all stores
goods = Goods.objects.filter(..)  # filter goods
goods.annotate(min_price=Subquery(Min(stores.values('goods__total_cost'))))
I tried something like this, but I've got an error:
AttributeError: 'Min' object has no attribute 'query'
I think in your context you need a GROUP BY rather than a per-object Django annotation.
From this SO answer:
>>> q = Book.objects.annotate(num_authors=Count('authors'))
>>> q[0].num_authors
2
>>> q[1].num_authors
1
q is the queryset of books, but each book has been annotated with the number of authors.
That is, annotating your goods queryset won't give you back a sorted or filtered set of objects. It only adds the new field min_price to each object. So I would suggest a GROUP BY operation as follows:
from django.db.models import Min
result = Goods.objects.values('store').annotate(min_val=Min('total_cost'))
Example
In [2]: from django.db.models import Min
In [3]: Goods.objects.values('store').annotate(min_val=Min('total_cost'))
Out[3]: <QuerySet [{'store': 1, 'min_val': 1}, {'store': 2, 'min_val': 2}]>
In [6]: Goods.objects.annotate(min_val=Min('total_cost'))
Out[6]: <QuerySet [<Goods: Goods object>, <Goods: Goods object>, <Goods: Goods object>, <Goods: Goods object>, <Goods: Goods object>]>
In [7]: Goods.objects.annotate(min_val=Min('total_cost'))[0].__dict__
Out[7]:
{'_state': <django.db.models.base.ModelState at 0x7f5b60168ef0>,
'id': 1,
'min_val': 1,
'store_id': 1,
'total_cost': 1}
In [8]: Goods.objects.annotate(min_val=Min('total_cost'))[1].__dict__
Out[8]:
{'_state': <django.db.models.base.ModelState at 0x7f5b6016af98>,
'id': 2,
'min_val': 123,
'store_id': 1,
'total_cost': 123}
UPDATE-1
I don't think this is a good idea, since it runs a separate aggregate query for every store and may cause performance issues, but you can try it if you want:
from django.db.models import Min
store_list = Store.objects.values_list('id', flat=True)  # ids of all Store instances
result_queryset = Goods.objects.none()  # start from an empty queryset so that | works
for store_id in store_list:
    # aggregate() returns a dict, so pull the value out by key
    min_value = Goods.objects.filter(store_id=store_id).aggregate(min_value=Min('total_cost'))['min_value']
    result_queryset = result_queryset | Goods.objects.filter(store_id=store_id, total_cost=min_value)
UPDATE-2
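I think my Update-1 section has serious performance issues, so here is one possible answer to your question that runs as a single query. It uses a correlated subquery (OuterRef and Subquery): for each row, the subquery finds the cheapest filtered good of the same store, and the row is kept only if it is that good:
from django.db.models import OuterRef, Subquery
goods_queryset = Goods.objects.filter(**your_possible_filters)
cheapest = goods_queryset.filter(store_id=OuterRef('store_id')).order_by('total_cost')
result = goods_queryset.filter(pk=Subquery(cheapest.values('pk')[:1]))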
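To make the intended result concrete, here is the "cheapest good per store" selection as a plain-Python sketch over rows that have already been fetched. It is not a replacement for doing the work in the database. The row dictionaries and the name cheapest_per_store are hypothetical, and on ties the first row seen is kept.

```python
from decimal import Decimal


def cheapest_per_store(goods_rows):
    """Return the lowest-cost row for each store_id (first one wins on ties)."""
    best = {}
    for row in goods_rows:
        current = best.get(row["store_id"])
        # Keep the row if this store has no candidate yet or this one is cheaper.
        if current is None or row["total_cost"] < current["total_cost"]:
            best[row["store_id"]] = row
    return list(best.values())


# Hypothetical rows, shaped like Goods.objects.values('id', 'store_id', 'total_cost').
goods_rows = [
    {"id": 1, "store_id": 1, "total_cost": Decimal("1.00")},
    {"id": 2, "store_id": 1, "total_cost": Decimal("123.00")},
    {"id": 3, "store_id": 2, "total_cost": Decimal("5.50")},
    {"id": 4, "store_id": 2, "total_cost": Decimal("2.00")},
]
cheapest = cheapest_per_store(goods_rows)
cheapest_ids = [row["id"] for row in cheapest]
```

A sketch like this can be handy for checking a database-side query against a small data set, since it returns exactly one row per store.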
Tldr of Problem
Frontend is a form that requires a complex lookup with ranges and stuff across several models, given in a dict. Best way to do it?
Explanation
From the view, I receive a dict of the following form (After being processed by something else):
{'h_index': {"min": 10,"max":20},
'rank' : "supreme_overlord",
'total_citations': {"min": 10,"max":400},
'year_began': {"min": 2000},
'year_end': {"max": 3000},
}
The keys are column names from different models (Right now, 2 separate models, Researcher and ResearchMetrics), and the values are the range / exact value that I want to query.
Example (Above)
Belonging to model Researcher :
rank
year_began
year_end
Belonging to model ResearchMetrics
total_citations
h_index
Researcher has a One to Many relationship with ResearchMetrics
Researcher has a Many to Many relationship with Journals (not mentioned in question)
Ideally: I want to show the researchers who fulfill all the criteria above in a list of list format.
Researcher ID, name, rank, year_began, year_end, total_citations, h_index
[[123, "Thomas", "professor", 2000, 2012, 15, 20],
[ 343 ... ]]
What's the best way to go about solving this problem? (Including changes to form, etc?) I'm not very familiar with the whole form query model thing.
Thank you for your help!
To dynamically perform a query you pass a dict with items 'fieldname__lookuptype': value as **kwargs to Model.objects.filter.
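So to filter for rank, year_began and year_end in your example above, you would pass rank='supreme_overlord', year_began__gt=2000 and year_end__lt=3000 to Researcher.objects.filter.
How exactly you transform the incoming dictionary into these keyword arguments depends on how variable it is. An example could be something like this: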
filter_in = {
    'h_index': {"min": 10, "max": 20},
    'rank': "supreme_overlord",
    'total_citations': {"min": 10, "max": 400},
    'year_began': {"min": 2000},
    'year_end': {"max": 3000},
}
# Hard-coded whitelist of Researcher fields that may be filtered
RESEARCHER_FIELDS = ['rank', 'year_began', 'year_end']
LOOKUP_MAPPING = {
    'min': 'gt',
    'max': 'lt',
}
filter_kwargs = {}
for field in RESEARCHER_FIELDS:
    if field not in filter_in:
        continue
    field_filter = filter_in[field]
    if isinstance(field_filter, dict):
        # Range filter: one lookup per bound, e.g. year_began__gt
        for filter_type, value in field_filter.items():
            lookup_type = LOOKUP_MAPPING[filter_type]
            lookup = '%s__%s' % (field, lookup_type)
            filter_kwargs[lookup] = value
    else:
        # Exact-match filter
        filter_kwargs[field] = field_filter
This results in a dictionary like this:
{
'rank': 'supreme_overlord',
'year_began__gt': 2000,
'year_end__lt': 3000
}
Use it like this:
qs = Researcher.objects.filter(**filter_kwargs)
Regarding the fields total_citations and h_index from ResearchMetrics, I assume you want to aggregate the values. So in your example above you want either a sum or an average.
The principle is the same:
from django.db.models import Sum
METRICS_FIELDS = ['total_citations', 'h_index']
annotate_kwargs = {}
for field in METRICS_FIELDS:
    if field not in filter_in:
        continue
    annotated_field = '%s_sum' % field
    # 'researchmetric' is the assumed reverse lookup name from Researcher to ResearchMetrics
    annotate_kwargs[annotated_field] = Sum('researchmetric__%s' % field)
    field_filter = filter_in[field]
    if isinstance(field_filter, dict):
        for filter_type, value in field_filter.items():
            lookup_type = LOOKUP_MAPPING[filter_type]
            lookup = '%s__%s' % (annotated_field, lookup_type)
            filter_kwargs[lookup] = value
    else:
        # An exact value must be compared against the annotated sum
        filter_kwargs[annotated_field] = field_filter
Now your filter_kwargs look like this:
{
'h_index_sum__gt': 10,
'h_index_sum__lt': 20,
'rank': 'supreme_overlord',
'total_citations_sum__gt': 10,
'total_citations_sum__lt': 400,
'year_began__gt': 2000,
'year_end__lt': 3000
}
And your annotate_kwargs look like this:
{
    'h_index_sum': Sum('researchmetric__h_index'),
    'total_citations_sum': Sum('researchmetric__total_citations')
}
So your final call looks like this:
Researcher.objects.annotate(**annotate_kwargs).filter(**filter_kwargs)
There are some assumptions in my answer, but I hope you get the general idea.
There is one important point: validate the input properly, so that users can filter only on the fields you intend them to filter on. In my approach, this is ensured by hard-coding the field names in RESEARCHER_FIELDS and METRICS_FIELDS.
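The whitelist idea can be made explicit in a small standalone function. This is a sketch under assumptions: build_filter_kwargs is a hypothetical helper, and unlike the loops above, which silently skip fields that are not listed, it raises an error for any field or bound name it does not recognise.

```python
LOOKUP_MAPPING = {"min": "gt", "max": "lt"}


def build_filter_kwargs(filter_in, allowed_fields, lookup_mapping=LOOKUP_MAPPING):
    """Translate a user-supplied filter dict into ORM-style lookup kwargs.

    Fields outside allowed_fields and unknown bound names raise ValueError,
    so unexpected input is rejected before it can reach the query.
    """
    kwargs = {}
    for field, spec in filter_in.items():
        if field not in allowed_fields:
            raise ValueError("filtering on %r is not allowed" % field)
        if isinstance(spec, dict):
            # Range filter: one lookup per bound, e.g. year_began__gt.
            for bound, value in spec.items():
                if bound not in lookup_mapping:
                    raise ValueError("unknown bound %r for %r" % (bound, field))
                kwargs["%s__%s" % (field, lookup_mapping[bound])] = value
        else:
            # Exact-match filter.
            kwargs[field] = spec
    return kwargs


RESEARCHER_FIELDS = {"rank", "year_began", "year_end"}
filter_kwargs = build_filter_kwargs(
    {"rank": "supreme_overlord", "year_began": {"min": 2000}, "year_end": {"max": 3000}},
    RESEARCHER_FIELDS,
)
```

The resulting dictionary can be passed as Researcher.objects.filter(**filter_kwargs). Whether to reject or ignore unknown fields is a design choice. Rejecting makes mistakes in the form visible early, while ignoring is more forgiving when the incoming dictionary mixes fields from several models.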