Refactoring Django query

I wrote some instructions in order to extract data from my database.
I have two values; a city name and a keyword, which are attributes of Address and Museum:
class Museum(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=200)
    address = models.ForeignKey(Address)
    description = models.CharField(max_length=200)

class Address(models.Model):
    id = models.AutoField(primary_key=True)
    streetAddress = models.CharField(max_length=200)
    city = models.CharField(max_length=200)
Now I am receiving two optional parameters: a city and a keyword. I want to filter museums by that city (exact match) AND that keyword (partial match in name OR description).
This is what I ended up writing:
if city is not None and keyword is None:
    city_data = Address.objects.all().filter(city=city)
    museum_list = Museum.objects.all().filter(address__in=city_data)
elif city is None and keyword is not None:
    museum_list = Museum.objects.all().filter(
        Q(name__contains=keyword) | Q(description__contains=keyword)
    )
elif city is not None and keyword is not None:
    city_data = Address.objects.all().filter(city=city)
    museum_list = Museum.objects.all().filter(
        Q(address__in=city_data) & (
            Q(name__contains=keyword) | Q(description__contains=keyword)
        )
    )
else:
    museum_list = Museum.objects.all()
I don't like this code, because I am accounting for all possible combinations. How can I use Django filtering to improve such code to something like:
results = Museum.objects.all()
if city not null
    results = results.filterByAddress_City
if keyword not null
    results = results.filterByKeywordLikeNameOrLikeDescription
Thanks.

Queries are composable, so you can pretty much do exactly what you state in your pseudocode.
results = Museum.objects.all()
if city:
    results = results.filter(address__city=city)
if keyword:
    results = results.filter(Q(name__contains=keyword) | Q(description__contains=keyword))
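The same pattern can be tried without a database. The sketch below is a plain-Python stand-in, not Django code: `filter_museums` and the sample dictionaries are hypothetical names made up for illustration, and list comprehensions play the role of the chained `.filter()` calls. It only shows the idea that each optional parameter narrows the result when it is supplied and is skipped otherwise.

```python
# Plain-Python stand-in for chained QuerySet filters (hypothetical data and names).
MUSEUMS = [
    {"name": "City Art Museum", "description": "Modern art", "city": "Rome"},
    {"name": "Science Hall", "description": "Hands-on art and physics", "city": "Rome"},
    {"name": "Maritime Museum", "description": "Ships and maps", "city": "Oslo"},
]


def filter_museums(museums, city=None, keyword=None):
    """Narrow the list once per supplied parameter; skip parameters that are empty."""
    results = list(museums)
    if city:
        # Exact match on city, like address__city=city.
        results = [m for m in results if m["city"] == city]
    if keyword:
        # Case-sensitive substring match in name OR description,
        # like the Q(...) | Q(...) filter.
        results = [
            m for m in results
            if keyword in m["name"] or keyword in m["description"]
        ]
    return results


both = filter_museums(MUSEUMS, city="Rome", keyword="art")
only_city = filter_museums(MUSEUMS, city="Oslo")
neither = filter_museums(MUSEUMS)
```

Note that `if city:` also skips an empty string, whereas the question's `if city is not None` would treat an empty string as a real filter value. Which behaviour is wanted depends on how the parameters arrive.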

Related

Django - How to take string values on URL for PUT?

I set up my URL like this :
path('voucher/<str:voucher_id>', views.update_voucher),
My view:
def update_voucher(request, voucher_id):
    put = QueryDict(request.body)
    try:
        customer_id = put.get('customer_id')
    except:
        return HttpResponse("Missing parameters")
    updateVoucher = Voucher.objects.filter(code = voucher_id)
It's a PUT call taking parameters from both the body and the URL (voucher_id from the URL and customer_id from the body).
I call this URL http://127.0.0.1:5448/voucher/NewVoucher
I got this error:
ValueError: Field 'id' expected a number but got 'NewVoucher'.
Below is my model:
class Voucher(models.Model):
    code = models.CharField(unique=True, max_length=255)
    delivery_type = models.CharField(max_length=255)
    description = models.CharField(max_length=255, blank=True, null=True)
    start_at = models.DateTimeField()
    end_at = models.DateTimeField()
    discount_type = models.CharField(max_length=255)
    discount_amount = models.FloatField(blank=True, null=True)
P.S.: I am a maintainer, so I can't change the method's function, and I can't change the way this URL takes parameters from both the URL and the body.

You are not passing voucher_id as an integer. Instead you are passing the code "NewVoucher", which is a string, and that is what this error is about:
ValueError: Field 'id' expected a number but got 'NewVoucher'.
If you want to look the voucher up by id, you have to pass an integer, so the URL would look something like this:
http://127.0.0.1:5448/voucher/1
As far as I understand, though, you want to filter by voucher code, i.e. "NewVoucher".
In that case your method should be changed to:
def update_voucher(request, voucher_code, *args, **kwargs):
    voucher = get_object_or_404(Voucher, code=voucher_code)
    # request.data only exists on Django REST framework requests;
    # in a plain Django view, parse the PUT body as in the question.
    put = QueryDict(request.body)
    customer_id = put.get("customer_id")  # I'm not sure where you are using this customer_id
    if not customer_id:
        # HttpResponse is not an exception, so return it instead of raising it
        return HttpResponse("Missing parameters")
    # updateVoucher = Voucher.objects.filter(code = voucher_id) is no longer needed, as the voucher variable contains it
# urls
path('voucher/<str:voucher_code>', views.update_voucher),
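One detail in the question's view is worth spelling out: `.get()` on a dict-like object returns `None` for a missing key instead of raising, so a `try`/`except` wrapped around it never reports a missing parameter. The sketch below uses the standard library's `urllib.parse.parse_qs` as a stand-in for Django's `QueryDict` (an assumption made only so the example runs without Django), and `read_customer_id` is a hypothetical helper name.

```python
from urllib.parse import parse_qs


def read_customer_id(raw_body):
    """Return customer_id from a form-encoded body, or None if it is absent."""
    params = parse_qs(raw_body.decode("utf-8"))
    # parse_qs maps each key to a list of values; take the first one if present.
    values = params.get("customer_id")
    return values[0] if values else None


present = read_customer_id(b"customer_id=42&note=hello")
missing = read_customer_id(b"note=hello")
```

Checking the returned value explicitly (`if not customer_id:`) is what actually catches the missing parameter.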

Django: Best way to query a M2M field and count occurrences

class Edge(BaseInfo):
    source = models.ForeignKey('Node', on_delete=models.CASCADE, related_name="is_source")
    target = models.ForeignKey('Node', on_delete=models.CASCADE, related_name="is_target")

    def __str__(self):
        return '%s' % (self.label)

    class Meta:
        unique_together = ('source', 'target', 'label', 'notes')

class Node(BaseInfo):
    item_type_list = [('profile', 'Profile'),
                      ('page', 'Page'),
                      ('group', 'Group'),
                      ('post', 'Post'),
                      ('phone', 'Phone'),
                      ('website', 'Website'),
                      ('email', 'Email'),
                      ('varia', 'Varia')
                      ]
    item_type = models.CharField(max_length=200, choices=item_type_list, blank=True, null=True)
    firstname = models.CharField(max_length=200, blank=True, null=True)
    lastname = models.CharField(max_length=200, blank=True, null=True)
    identified = models.BooleanField(blank=True, null=True, default=False)
    username = models.CharField(max_length=200, blank=True, null=True)
    uid = models.CharField(max_length=200, blank=True, null=True)
    url = models.CharField(max_length=2000, blank=True, null=True)
    edges = models.ManyToManyField('self', through='Edge', blank=True)
I have a model Node (in this case a social media profile, see item_type) that has relations with other nodes (in this case a post). A profile can be the author of a post. Another profile can like or comment on that post.
Question: what is the most efficient way to get all the distinct profiles that liked or commented on another profile's post, plus the count of these likes/comments?
print(Edge.objects.filter(Q(label="Liked")|Q(label="Commented"),q).values("source").annotate(c=Count('source')))
This gets me somewhere, but then I only have the values (ids), and I want to pass the objects to my template rather than .get() all the profiles again...
Result :
Thanks in advance

I ended up iterating over the queryset and adding the objects I wanted to a dictionary; if the object was already in the dictionary, I would count +1 and add the relation to a nested list.
This doesn't feel right, but it works for now.
results = {}  # assumed to start empty; keyed by the contributor's uid
posts = Edge.objects.filter(source=self, target__item_type='post', label='Author')
if posts:
    q = Q()
    for post in posts:
        q = q | Q(target=post.target)
    contributors = Edge.objects.filter(Q(label="Liked") | Q(label="Commented"), q)
    if contributors:
        for i in contributors:
            if i.source.uid in results:
                if i.label in results[i.source.uid]['relation']:
                    pass
                else:
                    results[i.source.uid]["relation"].append(i.label)
                if 'post' in results[i.source.uid]:
                    results[i.source.uid]['post'].append(i.target)
                else:
                    results[i.source.uid]['post'] = [i.target]
            else:
                results[i.source.uid] = {'profile': i.source, 'relation': [i.label], 'post': [i.target]}
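The nested `if`/`else` bookkeeping above can probably be flattened with `collections.defaultdict`. This is a minimal sketch in plain Python, not the poster's code: `Interaction` is a hypothetical stand-in for an `Edge` row, `group_contributors` is a made-up name, and the explicit `count` field is an assumption about what "count +1" was meant to track.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an Edge row: who did what to which post.
Interaction = namedtuple("Interaction", ["source_uid", "label", "target"])


def group_contributors(interactions):
    """Group interactions by source, collecting distinct labels, posts, and a count."""
    results = defaultdict(lambda: {"relation": set(), "post": [], "count": 0})
    for item in interactions:
        entry = results[item.source_uid]  # created on first access
        entry["relation"].add(item.label)  # a set keeps labels distinct
        entry["post"].append(item.target)
        entry["count"] += 1
    return dict(results)


rows = [
    Interaction("u1", "Liked", "post-a"),
    Interaction("u1", "Commented", "post-a"),
    Interaction("u2", "Liked", "post-b"),
    Interaction("u1", "Liked", "post-b"),
]
grouped = group_contributors(rows)
```

The default factory removes the "is this key already present" branches, and the set replaces the manual check for duplicate labels.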

My Django query is very slow in giving me data on terminal

I have a users table which has 3 types of users (Student, Faculty and Club), and I have a university table.
What I want is the number of users in each university.
I am getting my desired output, but it is very slow. I have 90k users, and the output takes minutes to produce.
My user model:
from __future__ import unicode_literals
from django.db import models
from django.contrib.auth.models import User
from cms.models.masterUserTypes import MasterUserTypes
from cms.models.universities import Universities
from cms.models.departments import MasterDepartments
# WE ARE AT MODELS/APPUSERS
requestChoice = (
    ('male', 'male'),
    ('female', 'female'),
)

class Users(models.Model):
    id = models.IntegerField(db_column="id", max_length=11, help_text="")
    userTypeId = models.ForeignKey(MasterUserTypes, db_column="userTypeId")
    universityId = models.ForeignKey(Universities, db_column="universityId")
    departmentId = models.ForeignKey(MasterDepartments , db_column="departmentId",help_text="")
    name = models.CharField(db_column="name",max_length=255,help_text="")
    username = models.CharField(db_column="username",unique=True, max_length=255,help_text="")
    email = models.CharField(db_column="email",unique=True, max_length=255,help_text="")
    password = models.CharField(db_column="password",max_length=255,help_text="")
    bio = models.TextField(db_column="bio",max_length=500,help_text="")
    gender = models.CharField(db_column="gender",max_length=6, choices=requestChoice,help_text="")
    mobileNo = models.CharField(db_column='mobileNo', max_length=16,help_text="")
    dob = models.DateField(db_column="dob",help_text="")
    major = models.CharField(db_column="major",max_length=255,help_text="")
    graduationYear = models.IntegerField(db_column='graduationYear',max_length=11,help_text="")
    canAddNews = models.BooleanField(db_column='canAddNews',default=False,help_text="")
    receivePrivateMsgNotification = models.BooleanField(db_column='receivePrivateMsgNotification',default=True ,help_text="")
    receivePrivateMsg = models.BooleanField(db_column='receivePrivateMsg',default=True ,help_text="")
    receiveCommentNotification = models.BooleanField(db_column='receiveCommentNotification',default=True ,help_text="")
    receiveLikeNotification = models.BooleanField(db_column='receiveLikeNotification',default=True ,help_text="")
    receiveFavoriteFollowNotification = models.BooleanField(db_column='receiveFavoriteFollowNotification',default=True ,help_text="")
    receiveNewPostNotification = models.BooleanField(db_column='receiveNewPostNotification',default=True ,help_text="")
    allowInPopularList = models.BooleanField(db_column='allowInPopularList',default=True ,help_text="")
    xmppResponse = models.TextField(db_column='xmppResponse',help_text="")
    xmppDatetime = models.DateTimeField(db_column='xmppDatetime', help_text="")
    status = models.BooleanField(db_column="status", default=False, help_text="")
    deactivatedByAdmin = models.BooleanField(db_column="deactivatedByAdmin", default=False, help_text="")
    createdAt = models.DateTimeField(db_column='createdAt', auto_now=True, help_text="")
    modifiedAt = models.DateTimeField(db_column='modifiedAt', auto_now=True, help_text="")
    updatedBy = models.ForeignKey(User,db_column="updatedBy",help_text="Logged in user updated by ......")
    lastPasswordReset = models.DateTimeField(db_column='lastPasswordReset',help_text="")
    authorities = models.CharField(db_column="departmentId",max_length=255,help_text="")

    class Meta:
        managed = False
        db_table = 'users'
The query I am using, which produces the desired output but is too slow, is:
universities = Universities.objects.using('cms').all()
for item in universities:
    studentcount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=2).count()
    facultyCount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=1).count()
    clubCount = Users.objects.using('cms').filter(universityId=item.id,userTypeId=3).count()
    totalcount = Users.objects.using('cms').filter(universityId=item.id).count()
    print studentcount,facultyCount,clubCount,totalcount
    print item.name

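You should use annotate to get the counts for each university, and conditional expressions to get the counts based on conditions (docs):
from django.db.models import Case, Count, IntegerField, Sum, When
# The reverse lookup name defaults to the lowercased model name ("users");
# "users_set" is the accessor on an instance, not a query lookup.
# default=0 keeps the sum from being NULL when no user matches.
Universities.objects.using('cms').annotate(
    studentcount=Sum(Case(When(users__userTypeId=2, then=1), default=0, output_field=IntegerField())),
    facultyCount=Sum(Case(When(users__userTypeId=1, then=1), default=0, output_field=IntegerField())),
    clubCount=Sum(Case(When(users__userTypeId=3, then=1), default=0, output_field=IntegerField())),
    totalcount=Count('users'),
)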

First, an obvious optimization. In the loop, you're doing essentially the same query four times: thrice filtering for different userTypeId, and once without one. You can do this in a single COUNT(*) ... GROUP BY userTypeId query.
...
from django.db.models import Count
# Here, we're building a dict {userTypeId: count}
# by counting PKs over each userTypeId
qs = Users.objects.using('cms').filter(universityId=item.id)
counts = {
    x["userTypeId"]: x["cnt"]
    for x in qs.values('userTypeId').annotate(cnt=Count('pk'))
}
student_count = counts.get(2, 0)
faculty_count = counts.get(1, 0)
club_count = counts.get(3, 0)
total_count = sum(counts.values())  # Assuming there may be other userTypeIds
...
However, you're still doing 1+n queries, where n is the number of universities you have in the database. This is fine if the number is low, but if it's high you need further aggregation, joining Universities and Users. A first draft I came up with is something like this:
import itertools
from django.db.models import Count
# Assuming Universities.name is unique, otherwise you'll need to use IDs
# to distinguish between different universities, instead of names.
# The foreign key on Users is named universityId, hence universityId__name.
qs = Users.objects.using('cms').values('userTypeId', 'universityId__name')\
    .annotate(cnt=Count('pk')).order_by('universityId__name')
for name, group in itertools.groupby(qs, lambda x: x["universityId__name"]):
    print("University: %s" % name)
    cnts = {g["userTypeId"]: g["cnt"] for g in group}
    faculty, student, club = cnts.get(1, 0), cnts.get(2, 0), cnts.get(3, 0)
    # NOTE: I'm assuming there are only few (if any) userTypeId values
    # other than {1,2,3}.
    total = sum(cnts.values())
    print(" Student: %d, faculty: %d, club: %d, total: %d" % (
        student, faculty, club, total))
I might've made a typo there, but I hope it's correct. In terms of SQL, it should emit a query like
SELECT uni.name, usr.userTypeId, COUNT(usr.id)
FROM some_app_universities AS uni
LEFT JOIN some_app_users AS usr ON usr.universityId = uni.id
GROUP BY uni.name, usr.userTypeId
ORDER BY uni.name
Consider reading documentation on aggregations and annotations. And be sure to check out raw SQL that Django ORM emits (e.g. use Django Debug Toolbar) and analyze how well it works on your database. For example, use EXPLAIN SELECT if you're using PostgreSQL. Depending on your dataset, you may benefit from some indexes there (e.g. on userTypeId column).
Oh, and on a side note... it's off-topic, but in Python it's a custom to have variables and attributes use lowercase_with_underscores. In Django, model class names are usually singular, e.g. User and University.
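The grouping step from the single-query approach can be checked in isolation. The sketch below runs on plain dictionaries shaped like the rows a `.values(...).annotate(cnt=Count('pk'))` query would return; the sample rows, the `university` key, and the `summarize` name are all hypothetical, and the mapping of type IDs (1 = faculty, 2 = student, 3 = club) is taken from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, shaped like grouped count rows from the ORM.
rows = [
    {"university": "Alpha", "userTypeId": 1, "cnt": 5},
    {"university": "Alpha", "userTypeId": 2, "cnt": 120},
    {"university": "Beta", "userTypeId": 2, "cnt": 80},
    {"university": "Beta", "userTypeId": 3, "cnt": 4},
]


def summarize(rows):
    """Return {university: (student, faculty, club, total)} from grouped count rows."""
    summary = {}
    # groupby only merges adjacent rows, so sort by the grouping key first.
    ordered = sorted(rows, key=itemgetter("university"))
    for name, group in groupby(ordered, key=itemgetter("university")):
        counts = {g["userTypeId"]: g["cnt"] for g in group}
        student, faculty, club = counts.get(2, 0), counts.get(1, 0), counts.get(3, 0)
        summary[name] = (student, faculty, club, sum(counts.values()))
    return summary


summary = summarize(rows)
```

The sort before `groupby` plays the same role as the `order_by` on the queryset: without it, rows for one university that are not adjacent would end up in separate groups.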

Create error message datefield

I want to create an error message for following form:
class ExaminationCreateForm(forms.ModelForm):
class Meta:
model = Examination
fields = ['patient', 'number_of_examination', 'date_of_examination']
Models:
class Patient(models.Model):
patientID = models.CharField(max_length=200, unique=True, help_text='Insert PatientID')
birth_date = models.DateField(auto_now=False, auto_now_add=False, help_text='YYYY-MM-DD')
gender = models.CharField(max_length=200,choices=Gender_Choice, default='UNDEFINED')
class Examination(models.Model):
number_of_examination = models.IntegerField(choices=EXA_Choices)
patient = models.ForeignKey(Patient, on_delete=models.CASCADE)
date_of_examination = models.DateField(auto_now=False, auto_now_add=False, help_text='YYYY-MM-DD')
Every Patient has 2 Examinations (number of examination = Choices 1 or 2) and the error message should be activated when the date of the second examination < date of the first examination. Something like this:
Solution:
def clean_date_of_examination(self):
    new_exam = self.cleaned_data.get('date_of_examination')
    try:
        old_exam = Examination.objects.get(patient=self.cleaned_data.get('patient'))
    except Examination.DoesNotExist:
        return new_exam
    if old_exam:
        if old_exam.date_of_examination > new_exam:
            raise forms.ValidationError("Second examination should take place after first examination")
    return new_exam

def clean_date_of_examination(self):
    new_exam = self.cleaned_data.get('date_of_examination')
    # .filter(...).first() returns None instead of raising when no examination exists yet
    old_exam = Examination.objects.filter(patient=self.cleaned_data.get('patient')).first()
    if old_exam:
        # new_exam is already a date, so compare it directly
        if old_exam.date_of_examination > new_exam:
            raise forms.ValidationError("Second examination should take place after first examination")
    return new_exam

def clean_date_of_examination(self):
    # Return the cleaned date; 'data' was never defined.
    date_of_exam = self.cleaned_data['date_of_examination']
    try:
        # The date lives on Examination, so look up the patient's first examination there.
        exam1 = Examination.objects.get(patient=self.cleaned_data.get('patient'), number_of_examination=1)
    except Examination.DoesNotExist:
        # No first examination exists yet, so there is nothing to compare against.
        return date_of_exam
    if date_of_exam < exam1.date_of_examination:
        raise forms.ValidationError("Second examination should take place after first examination")
    return date_of_exam
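The comparison at the heart of these `clean_date_of_examination` variants can be isolated from the form and tested on its own. This is a minimal sketch under stated assumptions: `check_second_exam_date` is a hypothetical helper name, and it raises a plain `ValueError` where the form method would raise `forms.ValidationError`, so that it runs without Django.

```python
from datetime import date


def check_second_exam_date(first_exam_date, second_exam_date):
    """Return the second date, or raise ValueError if it precedes the first."""
    if first_exam_date is None:
        # No first examination recorded yet, so nothing to compare against.
        return second_exam_date
    if second_exam_date < first_exam_date:
        raise ValueError("Second examination should take place after first examination")
    return second_exam_date


accepted = check_second_exam_date(date(2020, 1, 1), date(2020, 2, 1))
```

Inside the form, the clean method would fetch the first examination's date, call a helper like this, and translate the error into a validation error.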

Iterating with Django ORM through large datasets is slow

I'm using Django ORM to get data out of a database with a few million items. However, computation takes a while (40 minutes+), and I'm not sure how to pinpoint where the issue is located.
Models I've used:
class user_chartConfigurationData(models.Model):
    username_chartNum = models.ForeignKey(user_chartConfiguration, related_name='user_chartConfigurationData_username_chartNum')
    openedConfig = models.ForeignKey(user_chartConfigurationChartID, related_name='user_chartConfigurationData_user_chartConfigurationChartID')
    username_selects = models.CharField(max_length=200)
    blockName = models.CharField(max_length=200)
    stage = models.CharField(max_length=200)
    variable = models.CharField(max_length=200)
    condition = models.CharField(max_length=200)
    value = models.CharField(max_length=200)
    type = models.CharField(max_length=200)
    order = models.IntegerField()

    def __unicode__(self):
        return str(self.username_chartNum)

    order = models.IntegerField()

class data_parsed(models.Model):
    setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
    setid_hash = models.CharField(max_length=100, db_index = True)
    block = models.CharField(max_length=2000, db_index = True)
    username = models.CharField(max_length=2000, db_index = True)
    time = models.IntegerField(db_index = True)
    time_string = models.CharField(max_length=200, db_index = True)

    def __unicode__(self):
        return str(self.setid)

class unique_variables(models.Model):
    setid = models.ForeignKey(sett, related_name='unique_variables_setid')
    setid_hash = models.CharField(max_length=100, db_index = True)
    block = models.CharField(max_length=200, db_index = True)
    stage = models.CharField(max_length=200, db_index = True)
    variable = models.CharField(max_length=200, db_index = True)
    value = models.CharField(max_length=2000, db_index = True)

    class Meta:
        unique_together = (("setid", "block", "variable", "stage", "value"),)
The code I'm running is looping through data_parsed, with relevant data that matches between user_chartConfigurationData and unique_variables.
#After we get the tab, we will get the configuration data from the config button. We will need the tab ID, which is chartNum, and the actual chart
#That is opened, which is the chartID.
chartIDKey = user_chartConfigurationChartID.objects.get(chartID = chartID)
for i in user_chartConfigurationData.objects.filter(username_chartNum = chartNum, openedConfig = chartIDKey).order_by('order').iterator():
    iterator = data_parsed.objects.all().iterator()
    #We will loop through parsed objects, and at the same time using the setid (unique for all blocks), which contains multiple
    #variables. Using the condition, we can set the variable gte (greater than equal), or lte (less than equal), so that the condition match
    #the setid for the data_parsed object, and variable condition
    for contents in iterator:
        #These are two flags, found is when we already have an entry inside a dictionary that already
        #matches the same setid. Meaning they are the same blocks. For example FlowBranch and FlowPure can belong
        #to the same block. Hence when we find an entry that matches the same id, we will put it in the same dictionary.
        #Added is used when the current item does not map to a previous setid entry in the dictionary. Then we will need
        #to add this new entry to the array of dictionary (set_of_pk_values). Otherwise, we will be adding a lot
        #of entries that doesn't have any values for variables (because the value was added to another entry inside a dictionary)
        found = False
        added = False
        storeItem = {}
        #Initial information for the row
        storeItem['block'] = contents.block
        storeItem['username'] = contents.username
        storeItem['setid'] = contents.setid
        storeItem['setid_hash'] = contents.setid_hash
        if (i.variable != ""):
            for findPrevious in set_of_pk_values:
                if(str(contents.setid) == str(findPrevious['setid'])):
                    try:
                        items = unique_variables.objects.get(setid = contents.setid, variable = i.variable)
                        findPrevious[variableName] = items.value
                        found = True
                        break
                    except:
                        pass
            if(found == False):
                try:
                    items = unique_variables.objects.get(setid = contents.setid, variable = i.variable)
                    storeItem[variableName] = items.value
                    added = True
                except:
                    pass
        if(found == False and added == True):
            storeItem['time_string'] = contents.time_string
            set_of_pk_values.append(storeItem)
I've tried to use select_related() or prefetch_related(), since it needs to go to unique_variables object and get some data, however, it still takes a long time.
Is there a better way to approach this problem?

Definitely have a look at django_debug_toolbar. It will tell you how many queries you execute and how long they take. I can't really live without this package when I have to optimize something =).
PS: With the toolbar enabled, execution will be even slower.
Edit: You may also want to enable db_index for the fields you filter on, or index_together for more than one field. Of course, measure the times between your changes so you can tell which option is better.
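Once the query count has been measured, one possible direction (an assumption about where the time goes, not something measured here) is that the question's loop issues a `unique_variables` lookup for every row it visits. A common way to avoid per-row queries is to load the needed rows once and index them in a dictionary. The sketch below shows only that indexing idea in plain Python; the sample rows and the `build_lookup` and `values_for` names are hypothetical.

```python
# Hypothetical rows standing in for unique_variables records.
VARIABLE_ROWS = [
    {"setid": 1, "variable": "speed", "value": "10"},
    {"setid": 1, "variable": "mass", "value": "3"},
    {"setid": 2, "variable": "speed", "value": "12"},
]


def build_lookup(rows):
    """Index rows by (setid, variable) so each later lookup is a dict access."""
    return {(r["setid"], r["variable"]): r["value"] for r in rows}


def values_for(setids, variable, lookup):
    """Collect the value of one variable for each setid that has it."""
    found = {}
    for setid in setids:
        value = lookup.get((setid, variable))
        if value is not None:
            found[setid] = value
    return found


lookup = build_lookup(VARIABLE_ROWS)
speeds = values_for([1, 2, 3], "speed", lookup)
```

Whether this fits depends on how many rows have to be held in memory at once, which is worth checking before applying it to a table with millions of entries.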
