QuerySet optimization

I need to find a match between a serial number and a list of objects, each of which has a serial number.
Model:
from django.db import models
class Beacon(models.Model):
    serial = models.CharField(max_length=32, default='0')
First I wrote:
for b in Beacon.objects.all():
    if b.serial == tmp_serial:
        # do something
        break
Then I went one step further:
b_queryset = Beacon.objects.all().filter(serial=tmp_serial)
if b_queryset.exists():
    pass  # do something
Now, is there a further step for more optimization?
I don't think it would be faster to cast my QuerySet to a list and call list.index('tmp_serial').

If your serial is unique, you can do:
# returns a single instance from the db
# (raises Beacon.DoesNotExist if no row matches)
match = Beacon.objects.get(serial=tmp_serial)
If you have multiple objects with the same serial and plan to do something with each of them, exists() will add a useless query.
Instead, you should do:
matches = Beacon.objects.filter(serial=tmp_serial)
if len(matches) > 0:
    for match in matches:
        pass  # do something with match
The trick here is that len(matches) forces the evaluation of the queryset (so your db is queried). After that, the model instances are cached on the queryset and you can use them without another query.
However, when you use queryset.exists(), the ORM runs a very simple query that only checks whether the queryset would return any element.
Then, if you iterate over the queryset, another query is run to fetch your objects. See the related documentation for more details.
To sum up: use exists() only if you want to check whether a queryset returns a result or not. If you actually need the queryset's data, use len().
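A plain sqlite3 sketch of the two patterns may make the query counting concrete. It does not use Django: the SQL strings only approximate what the ORM sends for .exists() and for evaluating the queryset, and CountingDB is a hypothetical helper that counts round trips to the database.

```python
import sqlite3


class CountingDB:
    """Hypothetical wrapper that counts every query sent to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE beacon (id INTEGER PRIMARY KEY, serial VARCHAR(32))")
        self.conn.executemany(
            "INSERT INTO beacon (serial) VALUES (?)", [("A1",), ("B2",), ("A1",)]
        )
        self.queries = 0  # the setup statements above are not counted

    def fetch(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def exists_then_iterate(db, serial):
    # Like `if qs.exists(): for obj in qs: ...`: a cheap check, then a second query.
    if db.fetch("SELECT 1 FROM beacon WHERE serial = ? LIMIT 1", (serial,)):
        return db.fetch("SELECT id, serial FROM beacon WHERE serial = ?", (serial,))
    return []


def evaluate_once(db, serial):
    # Like `if len(qs) > 0: for obj in qs: ...`: one query, rows reused from memory.
    rows = db.fetch("SELECT id, serial FROM beacon WHERE serial = ?", (serial,))
    return rows if len(rows) > 0 else []
```

With a match, exists_then_iterate needs two queries while evaluate_once needs one; with no match, both need a single query, and the exists() form is the cheaper of the two because it stops at the first row and fetches no model columns.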

I think you are already doing about the best you can, but if you just want to know whether the object exists or not, use exists().
From the Django docs on QuerySet.exists():
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.exists
if Beacon.objects.filter(serial=tmp_serial).exists():
    pass  # do something

Related

Repeating record using for loop in database

In this code I am trying to add a student record to the attendance table whenever the student's image is captured by the webcam. The name is the stored image name, and each image name is the same as the student_id stored in the Student entity. Whenever the detected face name exists in Names (the list of image names), the student info should be added to the attendance table.
The code works, but it keeps repeating the records. How can I limit it to add the record just once instead of repeating it?
def markattendance(name):
    for n in Names:
        if name in Names:
            # print(name, "Exist")
            # fetches the information related to the detected name
            attend = Student.objects.filter(student_id=name).values('student_id', 'student_fname', 'student_lname', 'fk_course_id_id')
            #print(attend)
            # filter returns a queryset. A queryset isn't a single object, it's a group of objects so it doesn't make sense
            # to call save() on a queryset. Instead you save each individual object IN the queryset:
            for object in attend:
                #print(object)
                if object.get('student_id') not in attend:
                    # INSERT SQL statement behind the scenes. Django doesn’t hit the database until you explicitly call save().
                    reg = Attendance(student_id=object.get('student_id'),
                                     student_fname=object.get('student_fname'),
                                     student_lname=object.get('student_lname'),
                                     course_id=object.get('fk_course_id_id'))
                    # print(reg)
                    reg.save()
        else:
            pass

I'll try to answer with my understanding of your code (which may not be perfect, correct me if I'm mistaken).
First, the for loop at the start is useless: you never use the n variable it introduces. That's probably where the problem comes from, since the same code is executed again for every element of Names.
Next, your attend queryset probably contains a single entry (a dictionary, because of .values()) since you're filtering by id, which I guess is unique among all Students. You then shouldn't be looping over its elements, as there's only one.
Finally, the check if object.get('student_id') not in attend: is always true. You take the student_id value of the only element in your queryset, then check whether the queryset contains that same value.
But your queryset contains dictionaries, not ids, so the in test is always False and not in is always True.
Combined with the fact that you're looping over these few lines of code, this results in multiple records.
You probably need something like this (based on what I understood):
def markattendance(name):
    if name in Names:
        # filter by the (presumably unique) student_id; .first() returns one dict or None
        student_with_name = Student.objects.filter(student_id=name).values(
            'student_id', 'student_fname', 'student_lname', 'fk_course_id_id').first()
        if student_with_name is not None:
            reg = Attendance(student_id=student_with_name['student_id'],
                             student_fname=student_with_name['student_fname'],
                             student_lname=student_with_name['student_lname'],
                             course_id=student_with_name['fk_course_id_id'])
            reg.save()
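The cause of the duplicates can be reproduced without Django. In this sketch, STUDENTS and Names are hypothetical in-memory stand-ins for the Student table and the list of image names, and a plain list plays the role of the Attendance table.

```python
# Hypothetical stand-ins: STUDENTS plays the Student table, Names the image names.
STUDENTS = {
    "s001": {"student_id": "s001", "student_fname": "Ada",
             "student_lname": "Lovelace", "fk_course_id_id": 7},
}
Names = ["s001", "s002", "s003"]


def markattendance_buggy(name, attendance):
    for n in Names:  # n is never used, so the body runs len(Names) times
        if name in Names:
            row = STUDENTS.get(name)
            if row is not None:
                attendance.append(dict(row))


def markattendance_fixed(name, attendance):
    if name in Names:
        row = STUDENTS.get(name)  # at most one match, like .values(...).first()
        if row is not None:
            attendance.append(dict(row))


buggy, fixed = [], []
markattendance_buggy("s001", buggy)
markattendance_fixed("s001", fixed)
print(len(buggy), len(fixed))  # 3 1
```

Each call of the fixed version still adds one row, so if the function is called repeatedly for the same face, a check for an existing Attendance record would also be needed.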

Django querysets optimization - preventing selection of annotated fields

Let's say I have the following models:
from django.db import models
class Invoice(models.Model):
    ...
class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select the Invoices that have some notes. I would write it using annotate/Exists like this:
from django.db.models import Exists, OuterRef
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and which means worse performance (the SQL has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?

You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask. So you can use it to tell Django which annotations to skip, even without .values():
from django.db.models import QuerySet
class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        # note: this modifies the query of this queryset in place
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then, with objects = YourQuerySet.as_manager() on the Invoice model, you can write:
invoices = (Invoice.objects
            .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
            .filter(has_notes=True)
            .mask_annotations('has_notes')
            )
to skip has_notes in the SELECT clause while still getting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.

OK, I've just noticed in the Django 3.0 docs that Exists has been updated and can now be used directly in filter():
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
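To see the difference in the generated SQL, here is a plain sqlite3 sketch of the two query shapes. The table and column names mirror the example above, but the rows are made up, and the SQL only approximates what Django emits (the exact text differs by Django version and database backend).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, foo TEXT);
    CREATE TABLE note (id INTEGER PRIMARY KEY,
                       invoice_id INTEGER REFERENCES invoice(id), text TEXT);
    INSERT INTO invoice (id, foo) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO note (invoice_id, text) VALUES (1, 'x'), (1, 'y'), (3, 'z');
""")

SUBQUERY = "EXISTS(SELECT 1 FROM note WHERE note.invoice_id = invoice.id)"

# Shape of annotate(...).filter(has_notes=True): the subquery appears in the
# SELECT list and again in the WHERE clause.
annotated = conn.execute(
    f"SELECT invoice.id, invoice.foo, {SUBQUERY} AS has_notes "
    f"FROM invoice WHERE {SUBQUERY} = 1 ORDER BY invoice.id"
).fetchall()

# Shape of filter(Exists(...)) on Django 3.0+: the subquery is only in WHERE.
filtered = conn.execute(
    f"SELECT invoice.id, invoice.foo FROM invoice WHERE {SUBQUERY} ORDER BY invoice.id"
).fetchall()

print(annotated)  # [(1, 'a', 1), (3, 'c', 1)]
print(filtered)   # [(1, 'a'), (3, 'c')]
```

Both return the same invoices; the second simply avoids computing and transferring the extra has_notes column.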
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(

We can filter for Invoices whose LEFT OUTER JOIN onto Note does not produce NULL, and make the query distinct (to avoid returning the same Invoice once for every note it has).
Invoice.objects.filter(notes__isnull=False).distinct()
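The effect of DISTINCT can be checked with a small sqlite3 sketch using made-up rows. The join is written as the LEFT OUTER JOIN described above; the join type and column list Django actually emits for notes__isnull=False may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE note (id INTEGER PRIMARY KEY, invoice_id INTEGER);
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    INSERT INTO note (invoice_id) VALUES (1), (1), (3);
""")

JOIN = ("FROM invoice LEFT OUTER JOIN note ON note.invoice_id = invoice.id "
        "WHERE note.id IS NOT NULL ORDER BY invoice.id")

# Without DISTINCT, invoice 1 comes back once per note.
plain = [r[0] for r in conn.execute(f"SELECT invoice.id {JOIN}")]
# With DISTINCT, each invoice that has at least one note appears once.
distinct = [r[0] for r in conn.execute(f"SELECT DISTINCT invoice.id {JOIN}")]

print(plain)     # [1, 1, 3]
print(distinct)  # [1, 3]
```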

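This is the most optimized code if you want to get data from another table whose primary key reference is stored in another table:
Invoice.objects.filter(notes__invoice_id=OuterRef('pk'),)
(Note that OuterRef only resolves inside a subquery such as Exists() or Subquery(), and the reverse lookup name is notes because of related_name='notes'.)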

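We should be able to clear the annotated field using the method below. Since clear() returns None, keep a reference to the queryset and clear its annotations in a separate statement:
invoices = Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
invoices.query.annotations.clear()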

How to get penultimate item from QuerySet in Django?

How do I get the penultimate item from a Django QuerySet? I tried my_queryset[-2] (after checking that the length of my_queryset is greater than 1) as follows:
if len(my_queryset) > 1:
    query = my_queryset[-2]
and it returns:
Exception Value: Negative indexing is not supported.
Is there some "Django" way to get such item?
The only thing which comes to my mind is to reverse the queryset and get my_queryset[2] but I'm not sure about its efficiency.
EDIT:
scans = self.scans.all().order_by('datetime')
if len(scans) > 1:
    scan = scans[-2]

This code, which produces an error:
scans = self.scans.all().order_by('datetime')
if len(scans) > 1:
    scan = scans[-2]
is the equivalent of:
scans = self.scans.all().order_by('-datetime')
if len(scans) > 1:
    scan = scans[1]
To get the second item, the index to use is 1, not 2, because Python offsets start at 0.
Also note that Django querysets are lazy, which means you can change your mind about ordering without a performance hit, provided that proper indexes are available.
As the Django documentation puts it:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
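The reversed-ordering trick can be checked against plain Python indexing with a small sqlite3 sketch (made-up timestamps, no Django). When a queryset has not been evaluated yet, indexing it roughly turns into LIMIT/OFFSET as below; in the code above, len(scans) has already loaded the rows, so scans[1] is then read from the queryset's cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scan (id INTEGER PRIMARY KEY, datetime TEXT)")
conn.executemany(
    "INSERT INTO scan (datetime) VALUES (?)",
    [("2024-01-01 10:00",), ("2024-01-02 09:30",), ("2024-01-03 08:15",)],
)

# What Python does with a list: sort ascending and take index -2.
ascending = [r[0] for r in conn.execute("SELECT datetime FROM scan ORDER BY datetime")]
penultimate_from_list = ascending[-2]

# Roughly what an unevaluated scans.order_by('-datetime')[1] asks the database
# for: sort descending, skip one row, return one row.
row = conn.execute(
    "SELECT datetime FROM scan ORDER BY datetime DESC LIMIT 1 OFFSET 1"
).fetchone()

print(penultimate_from_list, row[0])  # 2024-01-02 09:30 2024-01-02 09:30
```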

Extend queryset in django python

I am looking for a way to add new objects to an existing queryset, or to achieve what I want some other way.
contact = watson.filter(contacts, searchline)
This line returns a queryset, which I later iterate over.
Then I want to do this to add more objects that watson couldn't find:
contact_in_iteration = Contact.objects.get(id=fild.f_for)
contact.append(contact_in_iteration)
Sorry for my poor English.
I tried this:
contacts = Contact.objects.filter(crm_id=request.session['crm_id'])
query = Q(contacts, searchline)
contact = watson.filter(query)
and got the error "filter() missing 1 required positional argument: 'search_text'".

You can use | and Q lookups. See the docs.
I'm not sure I've fully understood your initial query, but I think that in your case you would want to do:
query = Q(contacts='Foo', searchline='Bar')
contact = watson.filter(query)
Then later:
contact = watson.filter(query | Q(id=field.f_for))
Strictly speaking it won't append to the queryset, but will return a new queryset. But that's okay, because that's what .filter() does anyway.
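Widening the result with OR can be pictured with a plain sqlite3 sketch. The contact table, the search term, and the extra id are hypothetical, and watson's own search is not modelled; the point is that the OR goes into one new query rather than being appended to an already evaluated result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO contact (id, first_name) VALUES (?, ?)",
    [(1, "John"), (2, "Jane"), (3, "Bob")],
)

search = "John"  # hypothetical search term
extra_id = 3     # hypothetical id of a contact the search did not find

# Roughly filter(first_name=search): only the search hits.
hits = conn.execute(
    "SELECT id FROM contact WHERE first_name = ? ORDER BY id", (search,)
).fetchall()

# Roughly filter(Q(first_name=search) | Q(id=extra_id)): one new query whose
# WHERE clause ORs the two conditions.
widened = conn.execute(
    "SELECT id FROM contact WHERE (first_name = ?) OR (id = ?) ORDER BY id",
    (search, extra_id),
).fetchall()

print(hits, widened)  # [(1,)] [(1,), (3,)]
```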

You should look at a queryset as an SQL query that will be executed later. When you construct a queryset and save it in a variable, you can later filter it further, but you cannot expand it. If you need a query with more particular rules (for example, an OR operation), you should state that when you construct the query. One way of doing that is indeed using the Q object.
But it looks like you are confused about what querysets really are and how they are used. First of all:
Contact.objects.get(id = fild.f_for)
will never return a queryset, but an instance, because you use get() and thus ask for a single particular record. You need to use filter() if you want to get a queryset. So if you had an existing queryset, say active_contacts, and you wanted to filter it down so you only get the contacts whose first_name is 'John', you would do:
active_contacts = Contact.objects.filter(active=True)
active_contacts_named_John = active_contacts.filter(first_name='John')
Of course you could do this in one line too, but I'm assuming you do the first queryset construction somewhere else in your code.
Second remark:
If in your example watson is a queryset, your use of filter() is unclear. This doesn't really make sense:
contact = watson.filter(contacts, searchline)
As stated earlier, filtering a queryset returns another queryset, so you should use a plural variable name, e.g. contacts. Then the correct use of filter would be:
contacts = watson.filter(first_name=searchline)
I'm assuming searchline here is a variable that contains a user-entered search term, so maybe you should name it searchterm or similar. This will return all contacts that are filtered by whatever watson is filtering already and whose first_name matches searchline exactly. You could also use a more liberal lookup and keep the results whose first_name contains the search term, like so:
contacts = watson.filter(first_name__contains=searchline)
Hope this helps you get on the right path.

Django complex ordering

I have a Django model Document, which can have Vote objects pointing to it. There's an integer field on Vote called score.
I want to order a queryset of documents according to the number of Vote objects with score=1 that are pointing at the document. i.e., the document that has the most positive votes should be the first one in the queryset.
Is it possible with Django? How?

This is a job for annotations.
from django.db.models import Count
Document.objects.filter(vote__score=1).annotate(
    positive_votes=Count('vote')).order_by('-positive_votes')
Edit
There isn't really a way to do this without filtering, because that's the way the underlying database operations work. But one not-so-nice way would be to do a separate query for all the documents not included in the original, and chain the two querysets together:
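import itertools
positive_docs = Document.objects.filter(vote__score=1).annotate(
    positive_votes=Count('vote')).order_by('-positive_votes')
other_docs = Document.objects.exclude(id__in=positive_docs)
all_docs = itertools.chain(positive_docs, other_docs)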
This would work as long as you don't have millions of docs, but would break things like pagination.
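On Django 2.0 and later, aggregates accept a filter argument, so something like Document.objects.annotate(positive_votes=Count('vote', filter=Q(vote__score=1))).order_by('-positive_votes') keeps documents without positive votes in a single query (assuming Vote has a ForeignKey to Document without a related_name). The sqlite3 sketch below, with made-up tables and rows, shows roughly the SQL such a conditional count corresponds to; Django renders it as COUNT(...) FILTER (WHERE ...) where the backend supports that, and as a CASE expression otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, document_id INTEGER, score INTEGER);
    INSERT INTO document (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO vote (document_id, score) VALUES
        (1, 1), (1, -1), (2, 1), (2, 1), (2, 1), (3, -1);
""")

# Count only score = 1 votes per document, but keep every document thanks to the
# LEFT OUTER JOIN; documents without positive votes get 0 and sort last.
rows = conn.execute("""
    SELECT document.id,
           COUNT(CASE WHEN vote.score = 1 THEN vote.id END) AS positive_votes
    FROM document
    LEFT OUTER JOIN vote ON vote.document_id = document.id
    GROUP BY document.id
    ORDER BY positive_votes DESC, document.id
""").fetchall()

print(rows)  # [(2, 3), (1, 1), (3, 0)]
```

Because everything happens in one query, this approach also keeps working with pagination, unlike chaining two querysets in Python.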

I did this (on a custom QuerySet class):
import django.db.models
def order_by_score(self):
    q = django.db.models.Q(ratings__score=1)
    documents_with_one_positive_rating = self.filter(q)  # Annotation sees only
                                                         # the positive ratings
    documents_without_one_positive_rating = self.filter(~q)
    return (documents_with_one_positive_rating |
            documents_without_one_positive_rating).annotate(
        db_score=django.db.models.Count('ratings')
    ).order_by('-db_score')
The advantage is that it still includes the documents without a positive rating.
