Django: QuerySet with group of same entries - python

My goal is to show for a specific survey the Top 10 "Entities" per question ordered from high to low by salience.
A survey has several questions. And each question has several answers. Each answer can have several entities (sometimes the same name (CharField), sometimes different names). Entities are grouped by the name field per question.
I thought the following final result makes sense:
{
    5: [  # question.pk
        {
            'name': 'Leonardo Di Caprio',
            'count': 4,  # e.g. answer__pk = 1, answer__pk = 1, answer__pk = 2, answer__pk = 3. Leonardo Di Caprio was mentioned twice in answer__pk 1 and therefore has two entries.
            'salience': 3.434  # sum of the salience of all 4 entities
        },
        {
            'name': 'titanic',
            'count': 5,
            'salience': 1.12
        },
        {
            'name': 'music',
            'count': 3,
            'salience': 1.12
        }
    ],
    3: [  # question.pk
        {
            'name': 'Leonardo Di Caprio',
            'count': 5,
            'salience': 1.5
        },
        {
            'name': 'titanic',
            'count': 4,
            'salience': 1.12
        },
        {
            'name': 'music',
            'count': 2,
            'salience': 1.12
        }
    ],
}
Now I am struggling to write the right QuerySet for this outcome. I've got as far as realizing that I probably need .values() and .annotate(), but my results are still quite far from my goal.
Here is my models.py:
class Entity(TimeStampedModel):
    name = models.CharField()
    type = models.CharField()
    salience = models.FloatField()
    sentiment_magnitude = models.FloatField()
    sentiment_score = models.FloatField()
    language = models.CharField()
    answer = models.ForeignKey(
        Answer, on_delete=models.CASCADE, related_name="entities"
    )
class Answer(TimeStampedModel):
    question = models.ForeignKey(
        "surveys.Question", on_delete=models.CASCADE, related_name="answers"
    )
    response = models.ForeignKey()
    answer = models.TextField()
class Question(TimeStampedModel):
    survey = models.ForeignKey(
        "surveys.Survey", on_delete=models.CASCADE, related_name="questions"
    )
    title = models.CharField(max_length=100, verbose_name=_("Title"))
    focus = models.CharField()
class Response(TimeStampedModel):
    survey = models.ForeignKey(
        "surveys.Survey", on_delete=models.CASCADE, related_name="responses"
    )
    order = models.ForeignKey()
    attendee = models.ForeignKey()
    total_time = models.PositiveIntegerField()
    ip_address = models.GenericIPAddressField()
    language = models.CharField()
class Survey(TimeStampedModel):
    id = models.UUIDField(primary_key=True, editable=False, default=uuid.uuid4)
    event = models.ForeignKey()
    template = models.CharField()
Here is what I have tried so far, but it seems far from my goal:
from django.db.models import Count, Sum
questions = self.request.event.surveys.get_results(
    settings.SURVEY_PRE_EVENT
)
for question in questions:
    print("------")
    print(question.pk)
    answers = question.answers.all()
    for answer in answers:
        print(
            answer.entities.values("name")
            .annotate(count=Count("name"))
            .annotate(salience=Sum("salience"))
        )
Here is the output:
------
33
<QuerySet [{'name': 'people', 'count': 1, 'salience': 1.0}]>
<QuerySet [{'name': 'income', 'count': 1, 'salience': 1.0}]>
<QuerySet [{'name': 'incomes', 'count': 2, 'salience': 1.26287645101547}]>

I'm not entirely sure I understood your problem correctly, but you may be looking for something like:
Question.objects.values("answers__entities__name").annotate(
salience=Sum("answers__entities__salience"),
count=Count("answers"),
)
Disclaimers:
I haven't tested this and I may be wrong, but this is what I'd start playing around with.
Also you might find this useful: https://simpleisbetterthancomplex.com/tutorial/2016/12/06/how-to-create-group-by-queries.html

You can loop through the questions in order to create a list for each question:
Entity.objects.filter(answer__question=question).values('name').annotate(count=Count('pk')).annotate(total_salience=Sum('salience'))
Or if you want to have all in one queryset, group first by question (pk):
Entity.objects.values('answer__question__pk', 'name').annotate(count=Count('pk')).annotate(total_salience=Sum('salience'))
This produces a flat list rather than a list nested by question, but you can regroup it in Python afterwards to nest the entities under each question.
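A minimal sketch of that regrouping step in plain Python, also keeping only the top 10 entities per question by summed salience. It assumes the rows come from a queryset along the lines of `Entity.objects.filter(answer__question__survey=survey).values('answer__question__pk', 'name').annotate(count=Count('pk'), total_salience=Sum('salience'))`, where `survey` is an assumed variable. The helper name `top_entities_per_question` is hypothetical, and the sample rows reuse numbers from the question.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def top_entities_per_question(
    rows: Iterable[Dict[str, Any]], limit: int = 10
) -> Dict[Any, List[Dict[str, Any]]]:
    """Nest flat aggregate rows under their question pk (hypothetical helper)."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row["answer__question__pk"]].append(
            {
                "name": row["name"],
                "count": row["count"],
                "salience": row["total_salience"],
            }
        )
    # Sort each question's entities from high to low salience, keep the top `limit`.
    return {
        question_pk: sorted(entities, key=lambda e: e["salience"], reverse=True)[:limit]
        for question_pk, entities in grouped.items()
    }


# Sample rows shaped like the assumed .values(...).annotate(...) output.
rows = [
    {"answer__question__pk": 5, "name": "titanic", "count": 5, "total_salience": 1.12},
    {"answer__question__pk": 5, "name": "Leonardo Di Caprio", "count": 4, "total_salience": 3.434},
    {"answer__question__pk": 3, "name": "music", "count": 2, "total_salience": 1.12},
]
result = top_entities_per_question(rows)
```

If you loop over the questions one at a time instead, the sorting and limiting can stay in the database by appending `.order_by('-total_salience')[:10]` to the per-question queryset.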

Related

Django get total count and count by unique value in queryset

I have models Software and Domain described loosely as:
class Software(models.Model):
    id = models.BigIntegerField(primary_key=True, db_index=True, null=False)
    company = models.ForeignKey('Company')
    domain = models.ForeignKey('Domain')
    type = models.CharField(null=False)
    vendor = models.CharField(null=False)
    name = models.CharField(null=False)
class Domain(models.Model):
    id = models.BigIntegerField(primary_key=True, db_index=True, null=False)
    type = models.CharField()
    importance = models.DecimalField(max_digits=11, decimal_places=10, null=False)
And I get a Software queryset with:
qs = Software.objects.filter(company=c).order_by('vendor')
The desired output should have an aggregated Domain importance with total count for each unique Software, i.e.
[
    {
        'type': 'type_1',  # type, vendor and name
        'vendor': 'ajwr',  # are unique together
        'name': 'nginx',
        'domains': {
            'total_count': 4,
            'importance_counts': {0.1: 1, 0.5: 2, 0.9: 1}  # sum of counts = total_count
        },
    },
    {
        ...
    },
]
I feel like the first step should be to group type, vendor and name by Domain, so that each Software object has a list of Domains instead of just one, but I'm not sure how to do that. Doing this in memory would be a lot easier, but it seems like it would be much slower than using querysets/SQL.
So I would do it like this:
from django.db.models import Sum
qs = Software.objects.filter(company=c).prefetch_related(
    'domain'
).annotate(
    total_count=Sum('domain__importance')
).order_by('vendor')
output = []
for obj in qs:
    # Uses the prefetched domains, so no extra DB query. Note: .all() assumes
    # `domain` is a to-many relation; with the ForeignKey shown in the question,
    # obj.domain is a single Domain instance.
    domains = obj.domain.all()
    output.append({
        # ...
        'domains': {
            'total_count': obj.total_count,
            'importance_counts': [d.importance for d in domains]
        }
    })
I believe it should be fast enough; only if it turns out not to be would I try to optimize it. Remember: "Premature optimization is the root of all evil."
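An untested alternative that lets the database do the counting: group by the unique fields plus the domain importance, then fold the rows into the requested shape in Python. The rows are assumed to come from something like `Software.objects.filter(company=c).values('type', 'vendor', 'name', 'domain__importance').annotate(n=Count('pk')).order_by('vendor')`; the `n` alias and the `group_software` helper are hypothetical names, and the sample rows are illustrative.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, List, Tuple


def group_software(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fold (type, vendor, name, importance, n) rows into one entry per unique software."""
    grouped: Dict[Tuple[str, str, str], Dict[str, Any]] = {}
    for row in rows:
        key = (row["type"], row["vendor"], row["name"])
        entry = grouped.setdefault(
            key,
            {
                "type": row["type"],
                "vendor": row["vendor"],
                "name": row["name"],
                "domains": {"total_count": 0, "importance_counts": {}},
            },
        )
        domains = entry["domains"]
        importance = row["domain__importance"]
        # Each row already carries the number of domains with this importance.
        domains["importance_counts"][importance] = (
            domains["importance_counts"].get(importance, 0) + row["n"]
        )
        domains["total_count"] += row["n"]
    # dicts keep insertion order, so the queryset's ordering is preserved.
    return list(grouped.values())


rows = [
    {"type": "type_1", "vendor": "ajwr", "name": "nginx", "domain__importance": Decimal("0.1"), "n": 1},
    {"type": "type_1", "vendor": "ajwr", "name": "nginx", "domain__importance": Decimal("0.5"), "n": 2},
    {"type": "type_1", "vendor": "ajwr", "name": "nginx", "domain__importance": Decimal("0.9"), "n": 1},
    {"type": "type_1", "vendor": "zzz", "name": "apache", "domain__importance": Decimal("0.5"), "n": 3},
]
output = group_software(rows)
```

This trades one GROUP BY query plus a linear pass in Python for loading every Software row, which matters only if there are many duplicates per unique software.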

Python SQLAlchemy TypeError: unhashable type: 'dict' when trying to instantiate model with one-to-many relationship

I'm trying to use SQLAlchemy to create a model which contains a one-to-many relationship. One recipe may have many directions associated with it. However, when I try to instantiate a recipe I get TypeError: unhashable type: 'dict'. If I remove the directions argument everything works fine and it creates the recipe without any directions. Is there something I'm missing that won't allow the directions parameter to be a list?
app.py
data = {
'cook_time': 15,
'description': 'Recipe description',
'directions': [{'order': 1, 'text': 'First direction'},
{'order': 2, 'text': 'Second direction'}],
'image_url': 'https://via.placeholder.com/800x300?text=Recipe+Image',
'name': 'Test recipe 2',
'prep_time': 15,
'servings': 6
}
recipe = models.Recipe(
    name=data['name'],
    description=data['description'],
    image_url=data['image_url'],
    prep_time=data['prep_time'],
    cook_time=data['cook_time'],
    servings=data['servings'],
    directions=data['directions']
)
models.py
class Recipe(db.Model):
    __tablename__ = 'recipes'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(200), index=True)
    description = db.Column(db.String(2000))
    image_url = db.Column(db.String(200))
    prep_time = db.Column(db.Integer)
    cook_time = db.Column(db.Integer)
    servings = db.Column(db.Integer)
    directions = db.relationship('RecipeDirection', backref='recipes', lazy='dynamic')
class RecipeDirection(db.Model):
    __tablename__ = 'recipe_directions'
    id = db.Column(db.Integer, primary_key=True)
    recipe_id = db.Column(db.Integer, db.ForeignKey('recipes.id'))
    order = db.Column(db.Integer)
    text = db.Column(db.String(1000))
You are getting the error because SQLAlchemy expects directions to be a list of RecipeDirection instances, not a list of dicts. To fix it, build a list of RecipeDirection objects first:
data = {
'cook_time': 15,
'description': 'Recipe description',
'directions': [{'order': 1, 'text': 'First direction'},
{'order': 2, 'text': 'Second direction'}],
'image_url': 'https://via.placeholder.com/800x300?text=Recipe+Image',
'name': 'Test recipe 2',
'prep_time': 15,
'servings': 6
}
# Create a list of `RecipeDirection`
directions = []
for direction in data.get("directions", []):
    directions.append(models.RecipeDirection(**direction))
recipe = models.Recipe(
    name=data['name'],
    description=data['description'],
    image_url=data['image_url'],
    prep_time=data['prep_time'],
    cook_time=data['cook_time'],
    servings=data['servings'],
    directions=directions  # now a list of RecipeDirection, not a list of dicts
)
I would also suggest looking into a serializer library such as marshmallow-sqlalchemy, which takes care of many of the details of marshalling and serializing nested data structures for you.

python django sort with lambda with if statement

I have two models, each with a dollar gross and an updated date:
class FirstDate(models.Model):
    gross = models.DecimalField(max_digits=12, decimal_places=2, default=0)
    updated = models.DateTimeField(auto_now=True)
class SecondDate(models.Model):
    gross = models.DecimalField(max_digits=12, decimal_places=2, default=0)
    updated = models.DateTimeField(auto_now=True)
I want to sort them by gross; if gross is the same, then by the updated field (and if that is also the same, by pk).
For example,
from itertools import chain
qs1 = SoloDate.objects.all()[:2]
qs2 = GroupDate.objects.all()[:2]
result_list = sorted(
    chain(qs1, qs2),
    key=lambda x: x.gross,  # and if gross is the same, then x.updated; if updated is also the same, x.pk
    reverse=True
)
To illustrate, say there are two objects from each of qs1 and qs2:
# objects from qs1
qs1_obj1 = {
'pk': 1,
'gross': 5,
'updated': 2018-11-24 10:53:23.360707+00:00
}
qs1_obj2 = {
'pk': 2,
'gross': 5,
'updated': 2018-11-25 10:53:23.360707+00:00
}
# objects from qs2
qs2_obj1 = {
'pk': 3,
'gross': 5,
'updated': 2018-11-24 10:53:23.360707+00:00
}
qs2_obj2 = {
'pk': 4,
'gross': 1,
'updated': 2018-11-23 10:53:23.360707+00:00
}
The resulting order of result_list should be qs1_obj1, qs2_obj1, qs1_obj2, qs2_obj2.
The reasons are:
qs1_obj1: 1. highest gross, 2. earliest updated, 3. lowest pk;
qs2_obj1: 1. highest gross, 2. earliest updated, 3. but a higher pk;
qs1_obj2: 1. highest gross, 2. but updated later;
qs2_obj2: 1. smaller gross.
Maybe this is a bothersome question, but I need help.
The line in question is:
key=lambda x: x.gross # and if gross is the same, for the same gross objects, x.updated and then update was also the same, x.pk,
How can I do this?
Try sorting by a tuple of fields, like so:
result_list = sorted(
    chain(qs1, qs2),
    # reverse=True puts the highest gross first; negating updated and pk
    # makes ties fall back to the earliest update, then the lowest pk
    key=lambda x: (x.gross, -x.updated.timestamp(), -x.pk),
    reverse=True
)
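A quick check with stand-in objects (`SimpleNamespace` instances carrying the sample values from the question rather than real model instances). It uses the key `(-x.gross, x.updated, x.pk)` in a plain ascending sort, which reaches the same order without `reverse=True` or a timestamp conversion: negating `gross` (a `Decimal`) puts high values first, while `updated` and `pk` stay ascending.

```python
from datetime import datetime, timezone
from decimal import Decimal
from itertools import chain
from types import SimpleNamespace

UTC = timezone.utc

# Stand-ins for model instances, using the sample values from the question.
qs1 = [
    SimpleNamespace(pk=1, gross=Decimal("5"), updated=datetime(2018, 11, 24, 10, 53, 23, 360707, tzinfo=UTC)),
    SimpleNamespace(pk=2, gross=Decimal("5"), updated=datetime(2018, 11, 25, 10, 53, 23, 360707, tzinfo=UTC)),
]
qs2 = [
    SimpleNamespace(pk=3, gross=Decimal("5"), updated=datetime(2018, 11, 24, 10, 53, 23, 360707, tzinfo=UTC)),
    SimpleNamespace(pk=4, gross=Decimal("1"), updated=datetime(2018, 11, 23, 10, 53, 23, 360707, tzinfo=UTC)),
]

# Ascending sort on (-gross, updated, pk): highest gross first,
# then the earliest update, then the lowest pk.
result_list = sorted(chain(qs1, qs2), key=lambda x: (-x.gross, x.updated, x.pk))
print([obj.pk for obj in result_list])  # [1, 3, 2, 4]
```

The printed pks correspond to qs1_obj1, qs2_obj1, qs1_obj2, qs2_obj2, the order asked for in the question.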

How can I use parent_id in Django?

In my model:
class HomePageFirstModule(models.Model):
    name = models.CharField(max_length=8, unique=True)
    is_active = models.BooleanField(default=True)  # whether the module is enabled
class HomePageSecondModule(models.Model):
    name = models.CharField(max_length=16, unique=True)
    is_active = models.BooleanField(default=True)  # whether the module is enabled
    home_page_first_module = models.ForeignKey(to=HomePageFirstModule)  # parent first-level module
class HomePageThridModule(models.Model):
    name = models.CharField(max_length=16, unique=True)
    url = models.CharField(max_length=128)
    is_active = models.BooleanField(default=True)  # whether the module is enabled
    home_page_second_module = models.ForeignKey(to=HomePageSecondModule)  # parent second-level module
Then I use the filter method to query the data:
def get_homepage_module_list():
    """
    Get information about the modules that are available for use.
    :return:
    """
    data_query_list = models.HomePageThridModule.objects.filter(
        home_page_second_module__home_page_first_module="1"
    ).values('id', 'name', 'is_active', 'home_page_second_module__name',
             'home_page_second_module__home_page_first_module__name',
             'home_page_second_module__home_page_first_module__is_active',
             'home_page_second_module__is_active'
    )
    data_list_del = []
    data_list = list(data_query_list)
    for item in data_list:
        if (item['is_active'] == False) or (
            item['home_page_second_module__is_active'] == False
        ) or (
            item['home_page_second_module__home_page_first_module__is_active'] == False
        ):
            data_list_del.append(item)
    for item_del in data_list_del:
        data_list.remove(item_del)
    return data_list
========================
How can I convert this list:
[
{
"home_page_second_module__name": "云主机",
"home_page_second_module__home_page_first_module__name": "产品",
"id": 1,
"name": "云主机子1"
},
{
"home_page_second_module__name": "云主机",
"home_page_second_module__home_page_first_module__name": "产品",
"id": 4,
"name": "云主机子4"
},
{
"home_page_second_module__name": "云硬盘",
"home_page_second_module__home_page_first_module__name": "产品",
"id": 2,
"name": "云硬盘子2"
},
{
"home_page_second_module__name": "云硬盘",
"home_page_second_module__home_page_first_module__name": "产品",
"id": 3,
"name": "云硬盘子3"
}
]
to this:
[
{"name":"产品",
"data":[
{"name":"云主机",
"data":[{"name":"云主机子1",
"data":{"id":1}},
{"name":"云主机子2",
"data":{"id":2}}]},
{"name":"云硬盘",
"data":[{"name":"云硬盘子1",
"data":{"id":3}},
{"name":"云硬盘子2",
"data":{"id":4}}]}
]
}
]
There should be an algorithm for this, but I tried and couldn't get it to work.
All I could come up with is this:
home_page_second_module__name_list = []
home_page_second_module__home_page_first_module__name_list = []
id_list = []
name_list = []
for home_page_second_module__name,home_page_second_module__home_page_first_module__name,id,name in ori_list:
    if not (home_page_second_module__name_list.__contains__(home_page_second_module__name)):
        home_page_second_module__name_list.append(home_page_second_module__name)
    if not (home_page_second_module__home_page_first_module__name_list.__contains__(home_page_second_module__home_page_first_module__name)):
        home_page_second_module__home_page_first_module__name_list.append(home_page_second_module__home_page_first_module__name)
But now I think this is very hard to do this way, and that my approach is wrong.
Is there a convenient way to do it?
EDIT
产品 (product), 云主机 (cloud host) and 云硬盘 (cloud disk) can be treated as parent IDs.
You can use Django REST framework with related (nested) serializers; see its serializer relations documentation.
Outdated code (the original question only asked how to transform one list of dicts into another with a different shape):
I bet this can be optimized a lot, but assuming your list is named old, I think this might do it:
new = {'name': i['home_page_second_module__home_page_first_module__name'] for i in old if not i['home_page_second_module__home_page_first_module__name'] in old}
new['data'] = [{'name': i['home_page_second_module__name'], 'data': [{'name': i['name'], 'data': {'id': i['id']}}]} for i in old]
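A sketch of the full transformation, grouping rows first by the first-level name and then by the second-level name with `dict.setdefault`. It assumes the input is the list returned by `get_homepage_module_list()` with the key names shown in the question; `build_module_tree` is a hypothetical helper name. Because Python dicts preserve insertion order, each group appears in the order it is first seen.

```python
from typing import Any, Dict, List

FIRST = "home_page_second_module__home_page_first_module__name"
SECOND = "home_page_second_module__name"


def build_module_tree(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest flat rows as first-level -> second-level -> leaf modules."""
    tree: Dict[str, Dict[str, List[Dict[str, Any]]]] = {}
    for row in rows:
        second_level = tree.setdefault(row[FIRST], {})
        leaves = second_level.setdefault(row[SECOND], [])
        leaves.append({"name": row["name"], "data": {"id": row["id"]}})
    # Convert the nested dicts into the requested list-of-{"name", "data"} shape.
    return [
        {
            "name": first_name,
            "data": [
                {"name": second_name, "data": leaves}
                for second_name, leaves in second_level.items()
            ],
        }
        for first_name, second_level in tree.items()
    ]


# The four rows from the question.
rows = [
    {SECOND: "云主机", FIRST: "产品", "id": 1, "name": "云主机子1"},
    {SECOND: "云主机", FIRST: "产品", "id": 4, "name": "云主机子4"},
    {SECOND: "云硬盘", FIRST: "产品", "id": 2, "name": "云硬盘子2"},
    {SECOND: "云硬盘", FIRST: "产品", "id": 3, "name": "云硬盘子3"},
]
tree = build_module_tree(rows)
```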

Django REST serialize output - group by foreign keys

I have models like the ones below.
Restaurant Model
class Restaurant(models.Model):
    name = models.CharField(max_length=40, verbose_name='Name')
Menu Model
class Menu(models.Model):
    name = models.CharField(max_length=40, unique=True, verbose_name='menu name')
Item Model
class Item(models.Model):
    restaurant = models.ForeignKey(Restaurant)
    menu = models.ForeignKey(Menu)
    name = models.CharField(max_length=500)
    price = models.IntegerField(default=0)
I want to get the menus for a given restaurant ID.
How can I group my results by menu for a restaurant ID?
Calling GET /menus/restaurant_id should return something like this sample:
{
name: menu name 1
items: [ {item1}, {item2}]
},
{
name: menu name 2
items: [ {item1}, {item2}]
}
Thanks.
The only thing I could find is the PostgreSQL-specific aggregate function ArrayAgg.
You can use it like this:
from django.contrib.postgres.aggregates import ArrayAgg
Item.objects.filter(restaurant_id=1).values('menu__name').annotate(items=ArrayAgg('name'))
# example output:
# [
# {
# 'menu__name': 'menu1',
# 'items': ['item1', 'item2']
# },
# {
# 'menu__name': 'menu2',
# 'items': ['item3', 'item4']
# },
# ]
This queryset performs the following raw SQL query:
SELECT
"appname_menu"."name",
ARRAY_AGG("appname_item"."name") AS "items"
FROM "appname_item"
INNER JOIN "appname_menu" ON ("appname_item"."menu_id" = "appname_menu"."id")
WHERE "appname_item"."restaurant_id" = 1
GROUP BY "appname_menu"."name"
Hopefully it helps.
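If you are not on PostgreSQL, or need more than the item names, a database-agnostic sketch is to fetch ordered rows and group them in Python with `itertools.groupby`. It assumes rows shaped like `Item.objects.filter(restaurant_id=1).values('menu__name', 'name', 'price').order_by('menu__name')`; the helper name `group_items_by_menu` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, List


def group_items_by_menu(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # groupby only merges *consecutive* rows with the same key, so the rows
    # must already be ordered by 'menu__name'.
    return [
        {
            "name": menu_name,
            "items": [{"name": row["name"], "price": row["price"]} for row in menu_rows],
        }
        for menu_name, menu_rows in groupby(rows, key=itemgetter("menu__name"))
    ]


rows = [
    {"menu__name": "menu1", "name": "item1", "price": 10},
    {"menu__name": "menu1", "name": "item2", "price": 12},
    {"menu__name": "menu2", "name": "item3", "price": 8},
]
menus = group_items_by_menu(rows)
```

The result has the `{name, items}` shape from the question's sample and could be returned directly from a view; nested DRF serializers are the other common route to the same output.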
