Django, annotate + values duplicates records - python

I have a model called Location and I'm querying the model with filters that yield 4000 objects:
count = Location.objects.filter(**filters).count()
4000
There is a related model called KPIs; each Location has many KPIs, and there are 2,944,000 KPI records in total.
I have a very complex query on Location that annotates a lot of the KPI data.
The annotations:
def contribute_annotations(self):
user = self.request.user
self.kpis = user.user_selected_kpis.get_all_kpis_qs()
kpis_names = tuple(kpi.internal_name for kpi in self.kpis)
branch_date = Subquery(BranchKPIs.objects.
filter(branch__location__id=OuterRef(ID)).
order_by('-date').
values(DATE)[:1]
)
# summing the members amount
filters_for_branch = (
Q(location_branches__prem=True) &
~Q(location_branches__branch_scores__members_count=0) &
Q(location_branches__branch_scores__date=F(BRANCH_DATE))
)
sum_of_members_prem_count = Coalesce(Sum('location_branches__branch_scores__members_count',
output_field=IntegerField(),
filter=filters_for_branch),
0)
# location kpis prefetch object
location_kpis_qs = LocationKPIs.objects.filter(date__range=month_range).only(DATE, LOCATION, *kpis_names)
prefetch_location_kpis = Prefetch(lookup=RelatedNames.LOCATION_SCORES,
queryset=location_kpis_qs,
)
assigned_members_count_of_latest = Case(When(location_scores__date=F(LATEST_DATE),
then=f'location_scores__assigned_members_count'))
members_count_of_latest = Case(When(location_scores__date=F(LATEST_DATE),
then=f'location_scores__members_count'))
# kpis annotations for Avg, Trends, and Sizing
kpis_annotations, alias_for_trends, kpis_objects = {}, {}, {}
for kpi in self.kpis:
name = kpi.internal_name
# annotating the last kpi score
kpis_annotations[name] = Case(When(location_scores__date=F('latest_date'),
then=f'location_scores__{name}'), default=0)
# annotating the kpi's month avg
alias_for_trends[f'{name}_avg'] = Coalesce(
Avg(f'location_scores__{name}',
filter=Q(location_scores__date__range=month_range), output_field=IntegerField()
),
0
)
# comparing latest score to the monthly avg in order to determine the kpi's trend
when_equal = When(**{f'{name}_avg': F(name)}, then=0)
when_trend_is_down = When(**{f'{name}_avg__gt': F(name)}, then=-1)
when_trend_is_up = When(**{f'{name}_avg__lt': F(name)}, then=1)
kpi_trend = Case(when_equal, when_trend_is_up, when_trend_is_down,
default=0, output_field=IntegerField())
# annotating the score color
when_red = When(**{f'{name}__gte': kpi.location_level_red_threshold.lower,
f'{name}__lte': kpi.location_level_red_threshold.upper},
then=1
)
when_yellow = When(**{f'{name}__gte': kpi.location_level_yellow_threshold.lower,
f'{name}__lte': kpi.location_level_yellow_threshold.upper},
then=2
)
when_green = When(**{f'{name}__gte': kpi.location_level_green_threshold.lower,
f'{name}__lte': kpi.location_level_green_threshold.upper},
then=3
)
score_type = Case(when_red, when_yellow, when_green, default=2)
# outputs kpi : {score: int, trend: int, score_type: int}
kpis_objects[name] = JSONObject(
score=F(name),
trend=kpi_trend,
score_type=score_type
)
# cases for the pin size of the location, it depends on how many members are in it
when_in_s_size = When(
Q(member_count__gte=settings.S_LOCATION_SIZE[0]) & Q(member_count__lte=settings.S_LOCATION_SIZE[-1]),
then=1)
when_in_m_size = When(
Q(member_count__gte=settings.M_LOCATION_SIZE[0]) & Q(member_count__lte=settings.M_LOCATION_SIZE[-1]),
then=2)
when_in_l_size = When(
Q(member_count__gte=settings.L_LOCATION_SIZE[0]) & Q(member_count__lte=settings.L_LOCATION_SIZE[-1]),
then=3)
when_in_xl_size = When(
Q(member_count__gte=settings.XL_LOCATION_SIZE[0]) & Q(member_count__lte=settings.XL_LOCATION_SIZE[-1]),
then=4)
location_size = Case(when_in_s_size, when_in_m_size, when_in_l_size, when_in_xl_size,
default=2,
output_field=IntegerField())
# location's address string
location_str = Concat(LOCATION__STREET, LOCATION__CITY, LOCATION__COUNTRY,
output_field=CharField())
return (
sum_of_members_prem_count, prefetch_location_kpis, assigned_members_count_of_latest, members_count_of_latest,
kpis_annotations, location_size, alias_for_trends, location_str, kpis_names, kpis_objects, branch_date)
filters = {'user': self.request.user, ACTIVE: True}
(sum_of_members_prem_count, prefetch_location_kpis, assigned_members_count_of_latest, members_count_of_latest,
kpis_annotations, location_size, alias_for_trends, location_str, kpis_names, kpis_objects, branch_date) = self.contribute_annotations()
query_set = (Location.objects.
filter(**filters).
select_related(RelatedNames.LOCATION).
prefetch_related(prefetch_location_kpis).
alias(latest_date=Max('scores__date'),
branch_date=branch_date,
**alias_for_trends,
**kpis_annotations
).
annotate(members_prem_count=sum_of_members_prem_count,
members_count=members_count_of_latest,
assigned_members_count=assigned_members_count_of_latest,
farm_latitude=Min(LOCATION__LATITUDE),
farm_longitude=Min(LOCATION__LONGITUDE),
address=location_str,
farm_size=location_size,
latest_date=Max('farm_scores__date'),
**kpis_objects
).
values(ID, NAME, ADMIN_EMAIL, ADMIN_PHONE, MEMBERS_PREM_COUNT,
MEMBERS_COUNT, ASSIGNED_MEMBERS_COUNT, SIZE, ADDRESS,
latitude=F(LOCATION_LATITUDE), longitude=F(LOCATION_LONGITUDE), *kpis_names
)
)
This query yields 2,944,000 records, i.e. one row per KPI record rather than one per Location.
I tried adding distinct() calls in several ways, but I either end up with:
NotImplementedError: annotate() + distinct(fields) is not implemented.
Or the query just ignores it and still doesn't return distinct Location objects.
The docs suggest that values() and distinct() don't play nicely together, and that an order_by somewhere in the query is probably what breaks it.
I've looked at all the involved models, queries and subqueries and removed every order_by, but it still doesn't work.
I also tried adding this to the query:
query_set.query.clear_ordering(True)
query_set = query_set.order_by(ID).distinct(ID)
but this raises the same NotImplementedError.

I'm not sure why this works, and it may not work in every case, but I changed the query to the following:
query_set = (Location.objects.
filter(**filters).
select_related(RelatedNames.LOCATION).
prefetch_related(prefetch_location_kpis).
alias(latest_date=Max('scores__date'),
branch_date=branch_date,
**alias_for_trends,
**kpis_annotations
).
distinct(ID).
annotate(members_prem_count=sum_of_members_prem_count,
members_count=members_count_of_latest,
assigned_members_count=assigned_members_count_of_latest,
farm_latitude=Min(LOCATION__LATITUDE),
farm_longitude=Min(LOCATION__LONGITUDE),
address=location_str,
farm_size=location_size,
latest_date=Max('farm_scores__date'),
**kpis_objects
).
distinct(ID)
)
and overrode Django's source code in django/db/models/sql/compiler.py, around line 595:
if grouping:
    if distinct_fields:
        raise NotImplementedError('annotate() + distinct(fields) is not implemented.')
    order_by = order_by or self.connection.ops.force_no_ordering()
    result.append('GROUP BY %s' % ', '.join(grouping))
    if self._meta_ordering:
        order_by = None
if having:
    result.append('HAVING %s' % having)
    params.extend(h_params)
I just commented out the if distinct_fields check:
if grouping:
    # if distinct_fields:
    #     raise NotImplementedError('annotate() + distinct(fields) is not implemented.')
    order_by = order_by or self.connection.ops.force_no_ordering()
    result.append('GROUP BY %s' % ', '.join(grouping))
    if self._meta_ordering:
        order_by = None
if having:
    result.append('HAVING %s' % having)
    params.extend(h_params)
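Patching Django's compiler is fragile, since the change is lost on every upgrade. The duplication itself appears to come from the SQL join: joining a parent table to a child table returns one row per child unless the query groups by the parent or fetches the child value in a correlated subquery. The following is a minimal sketch of that behaviour using Python's built-in sqlite3 module; the `location` and `kpi` tables and their columns are hypothetical stand-ins, not the models from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kpi (
        id INTEGER PRIMARY KEY,
        location_id INTEGER REFERENCES location(id),
        date TEXT,
        score INTEGER
    );
    INSERT INTO location VALUES (1, 'A'), (2, 'B');
    INSERT INTO kpi (location_id, date, score) VALUES
        (1, '2021-01-01', 10), (1, '2021-01-02', 20), (1, '2021-01-03', 30),
        (2, '2021-01-01', 5),  (2, '2021-01-02', 7);
""")

# Plain join: one row per KPI, so each location is repeated.
joined = conn.execute(
    "SELECT l.id, k.score FROM location l JOIN kpi k ON k.location_id = l.id"
).fetchall()

# Correlated subquery: the latest score is looked up per location,
# so the outer query still returns one row per location.
latest = conn.execute("""
    SELECT l.id,
           (SELECT k.score FROM kpi k
             WHERE k.location_id = l.id
             ORDER BY k.date DESC LIMIT 1) AS latest_score
      FROM location l
     ORDER BY l.id
""").fetchall()

print(len(joined))  # 5
print(latest)       # [(1, 30), (2, 7)]
```

In the ORM, the second shape is usually expressed with Subquery and OuterRef, as the question already does for branch_date.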

Related

Handle divide by zero with aggregated fields in Annotate expression

In the following query, win_rate always comes out as 0 unless Lost is 0, in which case win_rate becomes 100. How do I divide the aggregated fields correctly while avoiding a division-by-zero error?
top_markets = list(opps
.annotate(name=Subquery(Market.objects.filter(id=OuterRef('market'))[:1].values('marketname')))
.order_by('name')
.values('name')
.annotate(opps=Count('id', filter=Q(datecreated__range=(start_date, end_date))),
Won=Count(
'id', filter=Q(winloss='Won') & Q(date_closed__range=(start_date, end_date))),
Lost=Count('id', filter=Q(winloss='Lost') & Q(
date_closed__range=(start_date, end_date))),
Concluded=F('Won') + F('Lost'))
)
.annotate(
win_rate=Case(
When(Won=0, then=0),
default=((F('Won')) / \
(F('Won')) + F('Lost'))) * 100
)
Edit-
Adding my model. opps is a pre-filtered query on the model Opportunity:
class Opportunity(models.Model):
name = models.CharField()
winloss = models.CharField()
market = models.ForeignKey(Market, on_delete=SET_NULL)
datecreated = models.DateTimeField(auto_now=True)
Cast it to a FloatField:
from django.db.models import Case, Count, F, FloatField, Q, When
from django.db.models.functions import Cast

opps.values(name=F('market__marketname')).annotate(
    opps=Count('id', filter=Q(datecreated__range=(start_date, end_date))),
    Won=Count(
        'id', filter=Q(winloss='Won', date_closed__range=(start_date, end_date))
    ),
    Lost=Count(
        'id', filter=Q(winloss='Lost', date_closed__range=(start_date, end_date))
    ),
    Concluded=F('Won') + F('Lost'),
    win_rate=Case(
        When(
            Concluded__gt=0,
            then=Cast('Won', output_field=FloatField())
            * 100
            / Cast('Concluded', output_field=FloatField()),
        ),
        default=0,
        output_field=FloatField(),
    ),
).order_by('name')
That being said, I don't see why you would do this on the database side: you already have the number of won and lost Opportunity objects, so you can compute the rate at the Python/Django level. Furthermore, please do not use the queryset to generate serialized data; use a serializer.
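The suggestion to compute the rate in Python can be sketched as follows. The `win_rate` helper and its argument names are illustrative and not part of the original code; it only assumes that the Won and Lost counts have already been fetched.

```python
def win_rate(won: int, lost: int) -> float:
    """Return the percentage of concluded opportunities that were won."""
    concluded = won + lost
    if concluded == 0:
        # Nothing concluded yet: avoid dividing by zero.
        return 0.0
    return won * 100 / concluded


print(win_rate(3, 1))  # 75.0
print(win_rate(0, 0))  # 0.0
```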

Django: Count related model where an annotation on the related has a specific value and store count in an annotation (or simply: count subquery)

I have two models Pick and GamePick. GamePick has a ForeignKey relation to Pick, which is accessible on Pick.game_picks.
I have set up GamePick with a custom queryset and manager so that whenever I retrieve a GamePick through the objects manager, it is annotated with a field is_correct based on the values of other fields.
Now what I want to do is count how many correct GamePicks point to a specific Pick.
One simple way is doing this with a method in Python:
class Pick(models.Model):
    ...
    def count_correct(self):
        return self.game_picks.filter(is_correct=True).count()
So far so good.
But now, I would like to annotate each Pick with that count, say as correct_count. This is so I can order the Pick with something like Pick.objects.all().order_by("correct_count").
Now how would I do this?
This is where I am:
correct_game_picks = GamePick.objects.filter(
    pick=models.OuterRef("pk"),
    is_correct=True
)
picks = Pick.objects.annotate(
    correct_count=models.Count(correct_game_picks.values("pk"))
)
This is what pick.query gives me:
SELECT
"picks_pick"."id",
"picks_pick"."picker",
"picks_pick"."pot_id",
COUNT((
SELECT U0."id" FROM "picks_gamepick" U0
INNER JOIN "games_game" U1 ON (U0."game_id" = U1."id")
WHERE ((U0."picked_team_id" = U1."winning_team_id") AND U0."pick_id" = "picks_pick"."id")
)) AS "correct_count"
FROM "picks_pick"
GROUP BY "picks_pick"."id", "picks_pick"."picker", "picks_pick"."pot_id"
I am not good at SQL, but it seems like it should be correct.
In my test, it returns 1 when it should return 2 for two correct GamePicks belonging to a Pick.
Does anybody have any pointers?
Btw, if I remove the .values("pk") I get this error:
E django.db.utils.OperationalError: sub-select returns 5 columns - expected 1
I am not sure why the number of columns matters when I only want to count rows.
As feedback suggests that this is hard to debug without knowing the models, here they are:
class Pot(models.Model):
name = models.CharField(max_length=250, null=False, blank=False)
class Team(models.Model):
name = models.CharField(max_length=250, null=False, blank=False)
class Game(models.Model):
teams = models.ManyToManyField(
Team,
related_name="+",
)
winning_team = models.ForeignKey(
Team,
on_delete=models.CASCADE,
related_name="+",
blank=True,
null=True,
)
class Pick(models.Model):
picker = models.CharField(max_length=100, help_text="Name of the person picking")
# This is the method I would like to replace with an annotation
def count_correct_method(self):
return self.game_picks.filter(is_correct=True).count()
class GamePickQueryset(models.QuerySet):
def annotate_is_correct(self):
return self.annotate(
is_correct=models.ExpressionWrapper(
models.Q(picked_team=models.F("game__winning_team")),
output_field=models.BooleanField(),
)
)
class GamePickManager(models.Manager):
def get_queryset(self):
queryset = GamePickQueryset(self.model, using=self._db)
queryset = queryset.annotate_is_correct()
return queryset
GamePickMangerFromQueryset = GamePickManager.from_queryset(GamePickQueryset)
class GamePick(models.Model):
pick = models.ForeignKey(
Pick, on_delete=models.CASCADE, related_name="game_picks", null=True, blank=True
)
game = models.ForeignKey(Game, on_delete=models.CASCADE,
related_name="game_picks")
picked_team = models.ForeignKey(
Team, on_delete=models.CASCADE, related_name="+", null=True, blank=False
)
objects = GamePickMangerFromQueryset()
With these models, I am running this as a test in which I am trying to get the annotation working
team_1 = Team(name="Test Team 1")
team_1.save()
team_2 = Team(name="Test Team 2")
team_2.save()
team_3 = Team(name="Test Team 3")
team_3.save()
team_4 = Team(name="Test Team 4")
team_4.save()
team_5 = Team(name="Test Team 5")
team_5.save()
team_6 = Team(name="Test Team 6")
team_6.save()
assert Team.objects.count() == 6
pot = Pot(name="Test Pot")
pot.save()
assert Pot.objects.count() == 1
assert Pot.objects.first() == pot
game_1 = Game(pot=pot)
game_1.save()
game_1.teams.add(team_1, team_2)
game_1.winning_team = team_1
game_1.save()
game_2 = Game(pot=pot)
game_2.save()
game_2.teams.add(team_3, team_4)
game_2.winning_team = team_3
game_2.save()
game_3 = Game(pot=pot)
game_3.save()
game_3.teams.add(team_5, team_6)
game_3.winning_team = team_5
game_3.save()
assert Game.objects.count() == 3
assert pot.games.count() == 3
assert pot.games.all()[0].winning_team == team_1
assert pot.games.all()[1].winning_team == team_3
assert pot.games.all()[2].winning_team == team_5
pick = Pick(picker="Tester", pot=pot)
pick.save()
assert Pick.objects.count() == 1
game_pick_1 = GamePick(pick=pick, game=game_1, picked_team=team_1)
game_pick_1.save()
game_pick_2 = GamePick(pick=pick, game=game_2, picked_team=team_3)
game_pick_2.save()
game_pick_3 = GamePick(pick=pick, game=game_3, picked_team=team_6)
game_pick_3.save()
assert GamePick.objects.count() == 3
assert pick.game_picks.count() == 3
assert pick.game_picks.all()[0].is_correct == True
assert pick.game_picks.all()[1].is_correct == True
assert pick.game_picks.all()[2].is_correct == False
assert pick.count_correct() == 2
from django.db import models
correct_game_picks = GamePick.objects.filter(
pick=models.OuterRef("pk"),
is_correct=True,
)
pick = Pick.objects.all().annotate(
correct_count=models.Count(
# models.Q(game_picks__in=correct_game_picks)
models.Q(game_picks__picked_team=models.F("game_picks__game__winning_team"))
)
)[0]
assert pick.correct_count == 2
In this test I get 3 == 2. For some reason, it is counting all the game_picks not only the ones that fulfill the expression.
Really don't know what to do with that anymore...
I just realized (thanks to @BradMeinsberger) that since I am using that __in expression, I should not really need the OuterRef.
So the annotation can be just this:
correct_game_picks = GamePick.objects.filter(
is_correct=True,
)
pick = Pick.objects.all().annotate(
correct_count=models.Count(
models.Q(game_picks__in=correct_game_picks)
)
)[0]
But now the kicker: without the OuterRef I can evaluate the correct game picks separately:
assert correct_game_picks.count() == 2
assert pick.correct_count == 2
The first assert passes but the second does not with 3 == 2 😧
How can there be more than 2 in a list of 2?
Is there some kind of duplicate happening?
Now I can throw a distinct=True into the Count and it passes 🎉
Let's test another combination e.g. only 1 correct game pick:
game_pick_1 = GamePick(pick=pick, game=game_1, picked_team=team_1)
game_pick_1.save()
game_pick_2 = GamePick(pick=pick, game=game_2, picked_team=team_4)
game_pick_2.save()
game_pick_3 = GamePick(pick=pick, game=game_3, picked_team=team_6)
game_pick_3.save()
assert GamePick.objects.count() == 3
assert pick.game_picks.count() == 3
assert pick.game_picks.all()[0].is_correct == True
assert pick.game_picks.all()[1].is_correct == False
assert pick.game_picks.all()[2].is_correct == False
assert pick.count_correct() == 1
from django.db import models
correct_game_picks = GamePick.objects.filter(
is_correct=True,
)
pick = Pick.objects.all().annotate(
correct_count=models.Count(
models.Q(game_picks__in=correct_game_picks),
distinct=True
)
)[0]
assert correct_game_picks.count() == 1
assert pick.correct_count == 1
💥 2 == 1
😭
In the SQL you generate the subquery inside the COUNT aggregate is joining to a games_game table that isn't anywhere else in your question. It looks like it's doing this to figure out if the pick is correct where elsewhere in your question you have a column on GamePick called is_correct that is used for this.
Here is how you would do it assuming you have the is_correct column and ignoring the games_game table
from django.db.models import Subquery, OuterRef, Count
subquery = GamePick.objects.filter(
    pick=OuterRef('id'),
    is_correct=True
).values(
    'pick_id'  # Necessary to get the proper group by
).annotate(
    count=Count('pk')
).values(
    'count'  # Necessary to select only one column
)
picks = Pick.objects.annotate(correct_count=Subquery(subquery))
You can get the same thing using the django-sql-utils package. pip install django-sql-utils and then
from sql_util.utils import SubqueryCount
from django.db.models import Q
subquery = SubqueryCount('game_pick', filter=Q(is_correct=True))
picks=Pick.objects.annotate(correct_count=subquery)
If you need to determine if the pick is correct using the games_game table, I think you would replace is_correct=True (in both examples above) with
game__winning_team_id=F('picked_team_id')
I'm not 100% certain since I can't see those models/columns.
Just got it!
I guess I was making it more complicated than it needed to be.
correct_game_picks = GamePick.objects.filter(
pick=models.OuterRef("pk"),
is_correct=True
)
picks = Pick.objects.annotate(
correct_count=models.Count(
models.Q(game_picks__in=correct_game_picks)
)
)
and the resulting SQL:
SELECT
"picks_pick"."id",
"picks_pick"."picker",
"picks_pick"."pot_id",
COUNT(
"picks_gamepick"."id" IN (
SELECT U0."id" FROM "picks_gamepick" U0
INNER JOIN "games_game" U1 ON (U0."game_id" = U1."id")
WHERE ((U0."picked_team_id" = U1."winning_team_id") AND U0."pick_id" = "picks_pick"."id"))
) AS "correct_count"
FROM "picks_pick"
LEFT OUTER JOIN "picks_gamepick" ON ("picks_pick"."id" = "picks_gamepick"."pick_id")
GROUP BY "picks_pick"."id", "picks_pick"."picker", "picks_pick"."pot_id"
This seemingly unrelated blog post, which I came across when searching for "Django subquery count", pointed me in the right direction:
https://mattrobenolt.com/the-django-orm-and-subqueries/
Nope. The above does not work. For some reason it only counts the number of game picks... 🤦‍♂️
Guess a proper look into the docs is always helpful:
correct_game_picks = GamePick.objects.filter(
    is_correct=True,
)
picks = Pick.objects.all().annotate(
    correct_count=models.Count(
        "game_picks",  # The field to count needs to be mentioned specifically
        filter=models.Q(game_picks__in=correct_game_picks),  # ... and a filter limits the rows in the aggregate
        distinct=True,  # Prevent duplicates! Important for counting rows
    )
)
The aggregate filter is what I was looking for: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#aggregate-filter
This is the generated SQL:
SELECT
"picks_pick"."id",
"picks_pick"."picker",
"picks_pick"."pot_id",
COUNT(
DISTINCT "picks_gamepick"."id"
) FILTER (
WHERE "picks_gamepick"."id" IN (
SELECT U0."id" FROM "picks_gamepick" U0
INNER JOIN "games_game" U1 ON (U0."game_id" = U1."id")
WHERE (U0."picked_team_id" = U1."winning_team_id")
)
)
AS "correct_count"
FROM "picks_pick"
LEFT OUTER JOIN "picks_gamepick" ON ("picks_pick"."id" = "picks_gamepick"."pick_id")
GROUP BY "picks_pick"."id", "picks_pick"."picker", "picks_pick"."pot_id"
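The confusing counts earlier in this question (3 instead of 2, then 2 instead of 1) are consistent with how SQL's COUNT works: it counts non-NULL values, and a boolean expression evaluates to false rather than NULL for rows that don't match. The sketch below reproduces that with sqlite3; the `gamepick` table with a stored `is_correct` column is a simplified assumption, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gamepick (id INTEGER PRIMARY KEY, pick_id INTEGER, is_correct INTEGER)"
)
# One pick with three game picks, only the first of which is correct.
conn.executemany(
    "INSERT INTO gamepick (pick_id, is_correct) VALUES (?, ?)",
    [(1, 1), (1, 0), (1, 0)],
)

row = conn.execute("""
    SELECT COUNT(is_correct = 1),                      -- every non-NULL value
           COUNT(DISTINCT is_correct = 1),             -- distinct values: 0 and 1
           COUNT(CASE WHEN is_correct = 1 THEN 1 END)  -- matching rows only
      FROM gamepick
     WHERE pick_id = 1
""").fetchone()

print(row)  # (3, 2, 1)
```

The third form is a portable equivalent of the filter= argument used in the final query, which the generated SQL shows as FILTER (WHERE ...).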

I only get the first result of a for loop in Django views

I have a PositiveIntegerField in a model, and I need to loop through that model's rows, check the value of this field in each one, and use the results in my view.
The problem is that I only get the value from the first row in the database!
models.py
class RoomType(models.Model):
hotel = models.ForeignKey(Hotel, on_delete=models.CASCADE)
room_type = models.ForeignKey(RoomTypesNames, on_delete=models.CASCADE)
room_capacity = models.PositiveIntegerField()  # this is the field whose value I want to check
views.py
def SearchHotels(request):
x = None
z = None
t = None
if request.method == 'GET':
destination = request.GET.get('cityHotels')
numAdultStr = request.GET.get('numAdult')
numChild = request.GET.get('numChild')
numAdult = int(numAdultStr)
if destination:
q_city2 = Q(hotel__city__name__icontains = destination)
rooms2 = RoomType.objects.filter(q_city2)
################################
### next is my question:
if rooms2:
for r in rooms2:
if r.room_capacity < numAdult and numAdult % r.room_capacity == 0:
x = numAdult / r.room_capacity
### I want to loop through this queryset and check the value of 'room_capacity' in every row, but I only get the result for the first row in my database
You should probably be getting the value for the last matching entry rather than the first, unless your ordering is reversed, because x is overwritten on every iteration. As @furas mentioned in the comments, when you process multiple entries in a loop, it's better to append the calculated values to a list.
An alternative solution is to use annotate with a conditional expression so that the DB calculates the values for you:
from django.db.models import FloatField, IntegerField, ExpressionWrapper, F, Case, When, Value
room2 = RoomType.objects.filter(q_city2).annotate(
    x_modulo=ExpressionWrapper(
        numAdult % F('room_capacity'),
        output_field=IntegerField()
    )
).annotate(
    x=Case(
        When(
            room_capacity__lt=numAdult,
            x_modulo=0,
            then=numAdult / F('room_capacity')
        ),
        default=Value(0),  # Case takes `default`, not `default_value`
        output_field=FloatField()
    )
)
all_x = []
for r in room2:
    all_x.append(r.x)
print(all_x)
# or
print(room2.values('x'))
# filter usage
room2.filter(x__gt=0)
Explanation: Here I annotate x_modulo, which is numAdult modulo room_capacity. Then I annotate x, which checks whether the room capacity is less than the number of adults and whether x_modulo is 0; if both hold, x is numAdult divided by room_capacity, otherwise it falls back to the default.
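If the calculation stays in Python, the loop from the question needs to collect one result per room instead of overwriting x. A minimal sketch, assuming the capacities have already been read from the queryset (the `rooms_needed` helper is hypothetical):

```python
def rooms_needed(capacities, num_adult):
    """Return (capacity, rooms) pairs for capacities that divide num_adult evenly."""
    results = []
    for capacity in capacities:
        # Skip 0 to avoid a ZeroDivisionError; PositiveIntegerField allows 0.
        if 0 < capacity < num_adult and num_adult % capacity == 0:
            results.append((capacity, num_adult // capacity))
    return results


print(rooms_needed([2, 3, 4, 6], 6))  # [(2, 3), (3, 2)]
```

With the model from the question, the capacities could come from rooms2.values_list('room_capacity', flat=True).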

My Django query is very slow in giving me data on the terminal

I have a users table with three types of users (Student, Faculty and Club), and I have a university table.
I want to know how many users there are in each university.
I am getting my desired output, but it is very slow. I have 90k users, and it takes minutes to produce the results.
My user model:-
from __future__ import unicode_literals
from django.db import models
from django.contrib.auth.models import User
from cms.models.masterUserTypes import MasterUserTypes
from cms.models.universities import Universities
from cms.models.departments import MasterDepartments
# WE ARE AT MODELS/APPUSERS
requestChoice = (
('male', 'male'),
('female', 'female'),
)
class Users(models.Model):
id = models.IntegerField(db_column="id", max_length=11, help_text="")
userTypeId = models.ForeignKey(MasterUserTypes, db_column="userTypeId")
universityId = models.ForeignKey(Universities, db_column="universityId")
departmentId = models.ForeignKey(MasterDepartments , db_column="departmentId",help_text="")
name = models.CharField(db_column="name",max_length=255,help_text="")
username = models.CharField(db_column="username",unique=True, max_length=255,help_text="")
email = models.CharField(db_column="email",unique=True, max_length=255,help_text="")
password = models.CharField(db_column="password",max_length=255,help_text="")
bio = models.TextField(db_column="bio",max_length=500,help_text="")
gender = models.CharField(db_column="gender",max_length=6, choices=requestChoice,help_text="")
mobileNo = models.CharField(db_column='mobileNo', max_length=16,help_text="")
dob = models.DateField(db_column="dob",help_text="")
major = models.CharField(db_column="major",max_length=255,help_text="")
graduationYear = models.IntegerField(db_column='graduationYear',max_length=11,help_text="")
canAddNews = models.BooleanField(db_column='canAddNews',default=False,help_text="")
receivePrivateMsgNotification = models.BooleanField(db_column='receivePrivateMsgNotification',default=True ,help_text="")
receivePrivateMsg = models.BooleanField(db_column='receivePrivateMsg',default=True ,help_text="")
receiveCommentNotification = models.BooleanField(db_column='receiveCommentNotification',default=True ,help_text="")
receiveLikeNotification = models.BooleanField(db_column='receiveLikeNotification',default=True ,help_text="")
receiveFavoriteFollowNotification = models.BooleanField(db_column='receiveFavoriteFollowNotification',default=True ,help_text="")
receiveNewPostNotification = models.BooleanField(db_column='receiveNewPostNotification',default=True ,help_text="")
allowInPopularList = models.BooleanField(db_column='allowInPopularList',default=True ,help_text="")
xmppResponse = models.TextField(db_column='xmppResponse',help_text="")
xmppDatetime = models.DateTimeField(db_column='xmppDatetime', help_text="")
status = models.BooleanField(db_column="status", default=False, help_text="")
deactivatedByAdmin = models.BooleanField(db_column="deactivatedByAdmin", default=False, help_text="")
createdAt = models.DateTimeField(db_column='createdAt', auto_now=True, help_text="")
modifiedAt = models.DateTimeField(db_column='modifiedAt', auto_now=True, help_text="")
updatedBy = models.ForeignKey(User,db_column="updatedBy",help_text="Logged in user updated by ......")
lastPasswordReset = models.DateTimeField(db_column='lastPasswordReset',help_text="")
authorities = models.CharField(db_column="departmentId",max_length=255,help_text="")
class Meta:
managed = False
db_table = 'users'
The query I am using, which produces the desired output but is too slow, is:
universities = Universities.objects.using('cms').all()
for item in universities:
    studentcount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=2).count()
    facultyCount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=1).count()
    clubCount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=3).count()
    totalcount = Users.objects.using('cms').filter(universityId=item.id).count()
    print studentcount,facultyCount,clubCount,totalcount
    print item.name
You should use annotate to get the counts for each university and conditional expressions to get the counts based on conditions (docs)
# The reverse lookup name for the Users model defaults to `users`
# (`users_set` is only the attribute name on a Universities instance).
Universities.objects.using('cms').annotate(
    studentcount=Sum(Case(When(users__userTypeId=2, then=1), output_field=IntegerField())),
    facultyCount=Sum(Case(When(users__userTypeId=1, then=1), output_field=IntegerField())),
    clubCount=Sum(Case(When(users__userTypeId=3, then=1), output_field=IntegerField())),
    totalcount=Count('users'),
)
First, an obvious optimization. In the loop, you're doing essentially the same query four times: thrice filtering for different userTypeId, and once without one. You can do this in a single COUNT(*) ... GROUP BY userTypeId query.
...
# Here, we're building a dict {userTypeId: count}
# by counting PKs over each userTypeId
qs = Users.objects.using('cms').filter(universityId=item.id)
counts = {
    x["userTypeId"]: x["cnt"]
    for x in qs.values('userTypeId').annotate(cnt=Count('pk'))
}
student_count = counts.get(2, 0)
faculty_count = counts.get(1, 0)
club_count = counts.get(3, 0)
total_count = sum(counts.values())  # Assuming there may be other userTypeIds
...
However, you're still doing 1+n queries, where n is the number of universities you have in the database. This is fine if the number is low, but if it's high you need further aggregation, joining Universities and Users. A first draft I came up with is something like this:
# Assuming Universities.name is unique; otherwise you'll need to use IDs
# to distinguish between different universities, instead of names.
# The foreign key on Users is called universityId, hence universityId__name.
qs = Users.objects.using('cms').values('userTypeId', 'universityId__name')\
    .annotate(cnt=Count('pk')).order_by('universityId__name')
for name, group in itertools.groupby(qs, lambda x: x["universityId__name"]):
    print("University: %s" % name)
    cnts = {g["userTypeId"]: g["cnt"] for g in group}
    faculty, student, club = cnts.get(1, 0), cnts.get(2, 0), cnts.get(3, 0)
    # NOTE: I'm assuming there are only few (if any) userTypeId values
    # other than {1,2,3}.
    total = sum(cnts.values())
    print(" Student: %d, faculty: %d, club: %d, total: %d" % (
        student, faculty, club, total))
I might've made a typo there, but hope it's correct. In terms of SQL, it should emit a query like
SELECT uni.name, usr.userTypeId, COUNT(usr.id)
FROM some_app_universities AS uni
LEFT JOIN some_app_users AS usr ON usr.universityId = uni.id
GROUP BY uni.name, usr.userTypeId
ORDER BY uni.name
Consider reading documentation on aggregations and annotations. And be sure to check out raw SQL that Django ORM emits (e.g. use Django Debug Toolbar) and analyze how well it works on your database. For example, use EXPLAIN SELECT if you're using PostgreSQL. Depending on your dataset, you may benefit from some indexes there (e.g. on userTypeId column).
Oh, and on a side note... it's off-topic, but in Python it's a custom to have variables and attributes use lowercase_with_underscores. In Django, model class names are usually singular, e.g. User and University.
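The itertools.groupby step in the draft above only works because the rows arrive sorted by university name: groupby merges adjacent rows with equal keys and nothing else. The following sketch runs the same grouping on hand-written rows shaped roughly like the output of values(...).annotate(cnt=...); the data and the simplified `university` key are made up for illustration.

```python
import itertools

# Rows as a values()/annotate() query might return them, already ordered by university.
rows = [
    {"university": "Alpha", "userTypeId": 1, "cnt": 4},
    {"university": "Alpha", "userTypeId": 2, "cnt": 120},
    {"university": "Beta", "userTypeId": 2, "cnt": 80},
    {"university": "Beta", "userTypeId": 3, "cnt": 2},
]

summary = {}
for name, group in itertools.groupby(rows, key=lambda r: r["university"]):
    # Build {userTypeId: count} for this university only.
    counts = {g["userTypeId"]: g["cnt"] for g in group}
    summary[name] = {
        "faculty": counts.get(1, 0),
        "student": counts.get(2, 0),
        "club": counts.get(3, 0),
        "total": sum(counts.values()),
    }

print(summary["Alpha"])  # {'faculty': 4, 'student': 120, 'club': 0, 'total': 124}
print(summary["Beta"])   # {'faculty': 0, 'student': 80, 'club': 2, 'total': 82}
```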

Django ORM for given group by SQL query with aggregation method sum and count

I have below given Django model
class ABC(models.Model):
user = models.ForeignKey(DEF)
name = models.CharField()
phone_num = models.CharField()
date = models.DateTimeField(auto_now=True)
amount = models.IntegerField()
I want to perform below query using Django ORM.
select *, sum(amount), count(date) from ABC group by phone_num;
I tried code below, but it does not work.
ABC.objects.all().annotate(count = Count("phone_num")).order_by("phone_num")
I'm not sure it is possible to fetch the data you describe (SELECT *, SUM(amount), COUNT(date)) with a single simple query; that probably needs a JOIN. At the least, you could try the variant below and then match the results against ABC.objects.all() by phone_num:
ABC.objects.values("phone_num").order_by().annotate(count = Count("date"), amount= Sum("amount"))
Notes:
values('phone_num') - produces the GROUP BY phone_num clause.
order_by() - clears any default ordering, which could otherwise affect the grouping; you can drop it if the model has no default ordering.
P.S.
Also try running the query below:
ABC.objects.all().values("phone_num").annotate(count = Count("date"), amount= Sum("amount"))
Update
As there is no pure Django ORM solution, you could use the following loop to fetch the desired data:
data = (
    dict(o, data=ABC.objects.filter(phone_num=o['phone_num'])[:1][0])
    for o in ABC.objects
    .values("phone_num")
    .order_by()
    .annotate(count=Count("date"), amount=Sum("amount")).all()
)
# Now you can access your data like this:
for item in data:
    phone_num = item['phone_num']
    count = item['count']
    amount = item['amount']
    id = item['data'].id
    name = item['data'].name
    # Do other stuff...
Note
data is built with a generator expression, so it is evaluated lazily and can be iterated only once.
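For reference, the grouped query that values("phone_num").annotate(...) is meant to produce can be tried directly in SQL. This sketch uses sqlite3 with a hypothetical `abc` table holding only the columns needed for the aggregation; it is not the SQL Django emits verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (id INTEGER PRIMARY KEY, phone_num TEXT, date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO abc (phone_num, date, amount) VALUES (?, ?, ?)",
    [
        ("111", "2021-01-01", 10),
        ("111", "2021-01-02", 15),
        ("222", "2021-01-01", 7),
    ],
)

# One output row per phone_num, with the count and the sum for that group.
totals = conn.execute("""
    SELECT phone_num, COUNT(date) AS count, SUM(amount) AS amount
      FROM abc
     GROUP BY phone_num
     ORDER BY phone_num
""").fetchall()

print(totals)  # [('111', 2, 25), ('222', 1, 7)]
```

Selecting * alongside the aggregates, as in the original SQL, is rejected by stricter databases such as PostgreSQL because the non-grouped columns have no single value per group, which is why the ORM version returns only phone_num and the aggregates.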
