django queryset runtime - get nth entry in constant time


I'm using several different Django querysets to get data from the database, but I would like to know the runtime of each one and, if possible, a better way (ideally getting the data in constant time):
qs = MyModel.objects.order_by('-time')
qs = qs.filter(blah = blah)
To get the first entry, I'm doing this:
entry = list(qs[:1])
first_entry = entry[0]
Or, to get the 10th and the last entry:
entry = list(qs)
some_entry = entry[9]
last_entry = entry[-1]
But I believe this takes O(n) time. Is there any way to get the nth entry in constant time?
I don't want to use get(), as I don't know the id or any other value of the entry (the queryset is sorted), only its position.
I could also use annotate(), but that also takes O(n) time:
from django.db.models import Min
MyModel.objects.values('date').annotate(min_value=Min('value')).order_by('min_value')[0]
I know the position; I just need that entry in constant time.

From the docs:
Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query. An exception is if you use the “step” parameter of Python slice syntax.
To retrieve a single object rather than a list (e.g. SELECT foo FROM bar LIMIT 1), use a simple index instead of a slice.
https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
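The important part is that slicing does not evaluate the queryset: qs[9] (or list(qs[:1])) runs a single query with LIMIT/OFFSET and transfers only the requested row, instead of loading every row into a Python list the way list(qs) does.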
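To make the LIMIT/OFFSET translation concrete, here is a minimal sketch that uses the standard-library sqlite3 module in place of the Django ORM. The table and column names (mymodel, time, value) are hypothetical. qs[n] corresponds roughly to ORDER BY ... LIMIT 1 OFFSET n, and the last entry can be fetched by reversing the ordering and taking one row (which is what QuerySet.last() does) instead of loading the whole table.

```python
import sqlite3


def make_db():
    # Hypothetical table standing in for MyModel; times 1..20.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, time INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO mymodel (time, value) VALUES (?, ?)",
        [(t, f"row{t}") for t in range(1, 21)],
    )
    return conn


def nth_entry(conn, n):
    # Roughly what MyModel.objects.order_by('-time')[n] asks for: one row at offset n.
    row = conn.execute(
        "SELECT value FROM mymodel ORDER BY time DESC LIMIT 1 OFFSET ?", (n,)
    ).fetchone()
    return row[0] if row else None


def last_entry(conn):
    # Reverse the ordering and take a single row instead of fetching everything.
    row = conn.execute("SELECT value FROM mymodel ORDER BY time ASC LIMIT 1").fetchone()
    return row[0] if row else None


conn = make_db()
print(nth_entry(conn, 0))  # row20
print(nth_entry(conn, 9))  # row11
print(last_entry(conn))    # row1
```

This keeps the transfer down to one row, but the database typically still has to step past the first n rows of the ordered index (or sort the table) to honour OFFSET, so it is not constant time inside the database; an index on the ordering column helps.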

Related

Index of row looping over django queryset [duplicate]

I have a QuerySet, let's call it qs, which is ordered by some attribute that is irrelevant to this problem. Then I have an object, let's call it obj. Now I'd like to know at which index obj sits in qs, as efficiently as possible. I know that I could use .index() from Python or possibly loop through qs comparing each object to obj, but what is the best way to go about this? I'm looking for high performance; that's my only criterion.
Using Python 2.6.2 with Django 1.0.2 on Windows.

If you're already iterating over the queryset and just want to know the index of the element you're currently on, the compact and probably the most efficient solution is:
for index, item in enumerate(your_queryset):
    ...
However, don't use this if you have a queryset and an object obtained by some unrelated means, and want to learn the position of this object in the queryset (if it's even there).

If you just want to know where your object sits amongst all others (e.g. when determining rank), you can do it quickly by counting the objects that sort before it:
index = MyModel.objects.filter(sortField__lt=myObject.sortField).count()
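Here is a small sketch of the counting idea in plain SQL via sqlite3, with a hypothetical table mymodel and column sort_field. It assumes the sort values are unique; with ties, equal values share the same position. The loop cross-checks the count against fetching everything and using list.index().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (sort_field) VALUES (?)", [(v,) for v in (40, 10, 30, 20, 50)]
)


def rank_of(conn, sort_value):
    # Same idea as MyModel.objects.filter(sortField__lt=value).count():
    # the 0-based position is the number of rows that sort strictly before it.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM mymodel WHERE sort_field < ?", (sort_value,)
    ).fetchone()
    return count


# Cross-check against fetching every value in order and searching in Python.
ordered = [row[0] for row in conn.execute("SELECT sort_field FROM mymodel ORDER BY sort_field")]
for value in ordered:
    assert rank_of(conn, value) == ordered.index(value)

print(rank_of(conn, 30))  # 2
```

The COUNT query only returns a single number, and with an index on the sort column the database can usually answer it without reading the full rows.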

Assuming for the purpose of illustration that your models are standard with a primary key id, then evaluating
list(qs.values_list('id', flat=True)).index(obj.id)
will find the index of obj in qs. While the use of list evaluates the queryset, it evaluates not the original queryset but a derived queryset. This evaluation runs a SQL query to get the id fields only, not wasting time fetching other fields.
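A rough sqlite3 equivalent of the values_list('id', flat=True) approach: only the id column is fetched, in the queryset's ordering, and the search happens in Python. The table, its columns and the ordering by name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO mymodel (id, name, payload) VALUES (?, ?, ?)",
    [(7, "c", "x" * 1000), (3, "a", "y" * 1000), (5, "b", "z" * 1000)],
)


def index_of(conn, obj_id):
    # Like list(qs.values_list('id', flat=True)).index(obj.id): the large
    # payload column is never transferred, only the ids in order.
    ids = [row[0] for row in conn.execute("SELECT id FROM mymodel ORDER BY name")]
    return ids.index(obj_id)  # raises ValueError if obj_id is not in the results


print(index_of(conn, 5))  # 1
```

This is still O(n) in the number of rows, but each row is just one integer.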

QuerySets in Django are lazy iterables, not lists (for further details, see the Django documentation on QuerySets).
As such, there is no shortcut to get the index of an element, and I think a plain iteration is the best way to do it.
For a start, I would implement your requirement in the simplest way possible (e.g. iterating); if you really have performance issues, then I would use a different approach, such as building a queryset that fetches fewer fields.
In any case, the idea is to leave such tricks as late as possible, when you definitely know you need them.
Update: You may want to use an SQL statement directly to get the row number (for example with a window function such as ROW_NUMBER()). However, Django's ORM does not support this natively (at least in the Django version discussed here; Window expressions such as RowNumber were added in Django 2.0), so you have to use a raw SQL query (see the documentation). I think this could be the best option, but again, only if you actually see a performance issue.

A simple Pythonic way to find the index of an element in a queryset is:
(*qs,).index(instance)
This unpacks the queryset into a tuple, then uses the built-in index() method to determine its position. Note that this evaluates the whole queryset and loads every object into memory.

You can do this using queryset.extra(…) and some raw SQL like so:
from django.db import connection

queryset = queryset.order_by("id")
record500 = queryset[500]
numbered_qs = queryset.extra(select={
    'queryset_row_number': 'ROW_NUMBER() OVER (ORDER BY "id")'
})
# str(qs.query) is not guaranteed to be valid SQL (parameter values are not
# quoted by the database driver), so this only works for simple querysets.
with connection.cursor() as cursor:
    cursor.execute(
        "WITH OrderedQueryset AS (" + str(numbered_qs.query) + ") "
        "SELECT queryset_row_number FROM OrderedQueryset WHERE id = %s",
        [record500.id]
    )
    index = cursor.fetchall()[0][0]
index == 501  # because ROW_NUMBER() is 1-indexed, not 0-indexed
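The same ROW_NUMBER() idea can be tried outside Django with sqlite3. This is a sketch with a hypothetical mymodel table holding ids 1 to 1000; it needs SQLite 3.25 or newer for window functions, which recent Python builds typically bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mymodel (id) VALUES (?)", [(i,) for i in range(1, 1001)])


def row_number_of(conn, record_id):
    # Number every row in the queryset's ordering, then pick out the one we want.
    row = conn.execute(
        """
        WITH ordered AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM mymodel
        )
        SELECT rn FROM ordered WHERE id = ?
        """,
        (record_id,),
    ).fetchone()
    return row[0] if row else None


# The 0-indexed element 500 (like queryset[500]) has id 501 here.
record500_id = conn.execute(
    "SELECT id FROM mymodel ORDER BY id LIMIT 1 OFFSET 500"
).fetchone()[0]
print(row_number_of(conn, record500_id))  # 501
```

The database still numbers the rows up to the target, but only one integer comes back to Python.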

Django: optimizing a query with spread data

I have Order objects and OrderOperation objects that represent an action on an Order (creation, modification, cancellation).
Conceptually, an order has one or more order operations. Each time there is an operation on the order, the new total is computed and stored on that operation, which means that when I need the total of an order, I just take the total of its last operation.
The simplified code
from decimal import Decimal
from typing import Optional

from django.db import models

class OrderOperation(models.Model):
    order = models.ForeignKey('Order', on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=9, decimal_places=2)

class Order(models.Model):
    @property
    def last_operation(self) -> Optional['OrderOperation']:
        try:
            qs = self.orderoperation_set.all()
            return qs[len(qs) - 1]
        except AssertionError:  # when there is a negative indexing (no operation)
            # IndexError can not happen
            return None

    @property
    def total(self) -> Optional[Decimal]:
        last_operation = self.last_operation
        return last_operation.total if last_operation else None
The issue
Since I have lots of orders, every simple filter such as "orders that have a total lower than 5€" takes a long time, because I need to go through all orders using the following, obviously bad, query:
all_objects = Order.objects.all()
Order.objects.prefetch_related('orderoperation_set').filter(
pk__in=[o.pk for o in all_objects if o.total <= some_value])
My current ideas / what I tried
Data denormalization?
I could simply create a total attribute on Order, and copy the operation total to the order total every time an operation is created.
Then, Order.objects.filter(total__lte=some_value) would work.
However, before duplicating data in my database, I'd like to be sure there is not an easier/cleaner solution.
Using annotate() method?
I somehow expected to be able to do: Order.objects.annotate(total=something_magical_here).filter(total__lte=some_value). It seems it's not possible.
Filtering separately, then matching?
order_operations = OrderOperation.objects.filter(total__lte=some_value)
orders = Order.objects.filter(orderoperation__in=order_operations)
This is very fast, but the filtering is wrong: I filtered on all operations here, not just the last operation of each order.
Any other idea? Thanks.

Using annotate() method
It seems it's not possible.
Of course, it is possible ;) You can use subqueries or some clever conditional expressions. Assuming that you want to get total amount from last order operation, here is example with subquery:
from django.db.models import OuterRef, Subquery

orders = Order.objects.annotate(
    total=Subquery(  # [1]
        OrderOperation.objects
        .filter(order_id=OuterRef("pk"))  # [2]
        .order_by('-id')  # [3]
        .values('total')  # [4]
        [:1]  # [5]
    )
)
Explanation of the code above:
[1] We are adding a new field, called total, to the results; it will be filled in by the subquery. You can access it like any other field of the Order model in this queryset (after evaluating it, on model instances, or in filtering and other annotations). You can learn how annotation works from the Django docs.
[2] The subquery should only consider operations of the current order. OuterRef will simply be replaced with a reference to the selected field in the resulting SQL query.
[3] We order by operation id descending because we want the latest operation. If your operations have another field you want to order by instead (such as a creation date), use it here.
[4] The subquery should only return the total value of the operation.
[5] We want only one element. It is fetched using slice notation instead of a plain index, because indexing a Django queryset evaluates it immediately. Slicing only adds a LIMIT clause to the SQL query without evaluating it, which is what we want.
Now you can use:
orders.filter(total__lte=some_value)
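To see why the correlated subquery gives the right answer while "filter the operations first, then match orders" does not, here is a sqlite3 sketch of roughly the SQL involved. The table names orders and order_operation and the sample totals are hypothetical, and the SQL Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_operation (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id),
        total NUMERIC
    );
    INSERT INTO orders (id) VALUES (1), (2), (3);
    -- Ids are assigned in insertion order, so the highest id per order is its latest operation.
    INSERT INTO order_operation (order_id, total) VALUES
        (1, 10), (1, 4),   -- order 1: latest total is 4
        (2, 3), (2, 9);    -- order 2: latest total is 9 (an older operation was 3)
    -- order 3 has no operations at all
    """
)


def orders_with_last_total_at_most(conn, limit):
    # Correlated subquery: for each order, the total of its latest operation
    # (roughly what the Subquery(...)/OuterRef("pk") annotation expresses).
    rows = conn.execute(
        """
        SELECT id FROM (
            SELECT o.id,
                   (SELECT op.total FROM order_operation op
                    WHERE op.order_id = o.id
                    ORDER BY op.id DESC LIMIT 1) AS total
            FROM orders o
        ) AS annotated
        WHERE total <= ?
        ORDER BY id
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def orders_with_any_total_at_most(conn, limit):
    # "Filter operations first, then match orders": an order matches if ANY
    # of its operations qualifies, not just the latest one.
    rows = conn.execute(
        "SELECT DISTINCT order_id FROM order_operation WHERE total <= ? ORDER BY order_id",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


print(orders_with_last_total_at_most(conn, 5))  # [1]
print(orders_with_any_total_at_most(conn, 5))   # [1, 2]
```

Order 3 has no operations, so its subquery yields NULL and it is excluded by the comparison, much like the None total in the original property.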
to fetch only orders that you want. You can also use that annotation to

How to get the equivalent of Python's [:-1] in the Django ORM?

I am writing a Django application where I want to get all the items except the last one from a query. My query goes like this:
objects = Model.objects.filter(name='alpha').order_by('rank')[:-1]
but it raises an error:
AssertionError: Negative indexing not supported.
Any idea where I am going wrong?
Any suggestions will be appreciated.

You can use QuerySet.last() to get the last object and use its pk to exclude it from the results:
objects = Model.objects.filter(name='alpha').order_by('rank')
last = objects.last()
if last is not None:  # last() returns None for an empty queryset
    objects = objects.exclude(pk=last.pk)
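Alternatively, a query that excludes all objects ranked with the maximum rank found in the DB (the ones [:-1] would drop when ordering by ascending rank; if several objects tie for the maximum, all of them are excluded):
from django.db.models import Max

base = Model.objects.filter(name='alpha')
max_rank = base.aggregate(max_rank=Max('rank'))['max_rank']  # highest rank found
objects = base.exclude(rank=max_rank).order_by('rank')  # drop every object with that rank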

EDITED
Django does not support negative indexing on QuerySets. Please read https://code.djangoproject.com/ticket/13089 for more information.
The quick and "dirty" way to do it is to convert the QuerySet to a list and then use negative indexing:
objects = list(Model.objects.filter(name='alpha').order_by('rank'))[:-1]
Please note that the objects variable is no longer a queryset but a list.
However, I would recommend using the .exclude() method; if you do, please read the solution @RaydelMiranda wrote.

Negative indexing is not allowed in Django.
However, you can reverse the ordering with a "-" prefix in order_by() and then skip objects from the start of the reversed result.
You can do something like this:
objects = Model.objects.filter(name='alpha').order_by('-rank')[n:]
Here n is the number of objects to drop from the end of the original ordering. In your case, it would be:
objects = Model.objects.filter(name='alpha').order_by('-rank')[1:]
Note that the results then come back in descending rank order.
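A sqlite3 sketch of the reverse-and-skip idea, with a hypothetical model table: reversing the ordering and skipping one row returns the same rows as order_by('rank')[:-1], only in the opposite order, so the list is flipped back in Python at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, "rank" INTEGER)')
conn.executemany(
    'INSERT INTO model (name, "rank") VALUES (?, ?)',
    [("alpha", 3), ("alpha", 1), ("alpha", 2), ("beta", 0)],
)


def all_but_last(conn, name):
    # Same rows as .order_by('-rank')[1:]: reverse the ordering, skip one row.
    # SQLite only accepts OFFSET after a LIMIT; a negative LIMIT means "no limit".
    rows = conn.execute(
        'SELECT "rank" FROM model WHERE name = ? ORDER BY "rank" DESC LIMIT -1 OFFSET 1',
        (name,),
    ).fetchall()
    # Rows arrive in descending order; flip them to match order_by('rank')[:-1].
    return [row[0] for row in rows][::-1]


print(all_but_last(conn, "alpha"))  # [1, 2]
```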

query = model.objects.filter(user=request.user)
if query.exists():
    query = query.last()

django - filter after slice / filter on queryset where results have been limited

I'm having trouble understanding why I can't filter after slicing a queryset, and what is happening.
stuff = stuff.objects.all()
stuff.count()
= 7
If I then go
extra_stuff = stuff.filter(stuff_flag=id)
extra_stuff.count()
= 6. Everything is fine and I have my new queryset in extra_stuff, no issues.
stuff = stuff.objects.all()[:3]
extra_stuff = stuff.filter(stuff_flag=id)
I get the error "Cannot filter a query once a slice has been taken."
How can I filter further on a queryset where I have limited the number of results?

You can't use filter() after you have sliced the queryset. The error is pretty explicit.
Cannot filter a query once a slice has been taken.
You could do the filter in Python
stuff = stuff.objects.all()[:3]
extra_stuff = [s for s in stuff if s.stuff_flag=='flag']
To get the number or items in extra_stuff, just use len()
extra_stuff_count = len(extra_stuff)
Doing the filtering in Python works fine when stuff is very small, as in this case. If you had a much larger slice, you could use a subquery; however, this might have performance issues as well, so you would have to test it.
extra_stuff = Stuff.objects.filter(id__in=stuff, stuff_flag='flag')
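The order of "slice" and "filter" changes the result, which is easy to see with a small sqlite3 sketch. The stuff table and its flag values are hypothetical, and an explicit ORDER BY id is added because a slice without an ordering has no well-defined "first three". Also note that MySQL does not support LIMIT inside an IN subquery, so the id__in form may fail there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, stuff_flag TEXT)")
conn.executemany(
    "INSERT INTO stuff (stuff_flag) VALUES (?)",
    [(f,) for f in ("flag", "other", "flag", "flag", "other", "flag", "flag")],
)  # ids 1..7


def flagged_within_first(conn, n, flag):
    # Slice first, then filter: the LIMIT lives in a subquery and the flag
    # filter is applied to that limited set (cf. id__in=stuff above).
    rows = conn.execute(
        """
        SELECT id FROM stuff
        WHERE stuff_flag = ?
          AND id IN (SELECT id FROM stuff ORDER BY id LIMIT ?)
        ORDER BY id
        """,
        (flag, n),
    ).fetchall()
    return [row[0] for row in rows]


def first_flagged(conn, n, flag):
    # Filter first, then slice: like Stuff.objects.filter(stuff_flag=...)[:n].
    rows = conn.execute(
        "SELECT id FROM stuff WHERE stuff_flag = ? ORDER BY id LIMIT ?", (flag, n)
    ).fetchall()
    return [row[0] for row in rows]


print(flagged_within_first(conn, 3, "flag"))  # [1, 3]
print(first_flagged(conn, 3, "flag"))         # [1, 3, 4]
```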

Django gives you that error because the slice has already been turned into a LIMIT/OFFSET on the query; a filter added afterwards would change which rows that limit applies to, so Django refuses to do it. The filter method is meant for refining the database query before the slice is applied.
Since you're only getting three objects, you could just do the extra filtering in Python:
extra_stuff = [s for s in stuff if s.stuff_flag == id]
But I wonder why you don't do the filter before slicing.

Just do the filtering first, then slice the result in a separate step:
extra_stuff = stuff.objects.filter(stuff_flag=id)
the_sliced_stuff = extra_stuff[:3]
It works well

Just do two queries:
total_stuff = StuffClass.objects.count()
extra_stuff = StuffClass.objects.filter(stuff_flag=id)[:3]
extra_stuff_count = len(StuffClass.objects.filter(stuff_flag=id))
Note that len() loads every matching row into memory, so it is only reasonable when extra_stuff_count is small (say 3 or 300). For larger counts, make one more request with StuffClass.objects.filter(stuff_flag=id).count() instead, which does the counting in the database.

can this python be shorter

I tend to obsess about expressing code the most compactly and succinctly possible without sacrificing runtime efficiency.
Here's my code:
p_audio = plate.parts.filter(content__iendswith=".mp3")
p_video = not p_audio and plate.parts.filter(content__iendswith=".flv")
p_swf = not p_audio and not p_video and plate.parts.filter(content__iendswith=".swf")
extra_context.update({
'p_audio': p_audio and p_audio[0],
'p_video': p_video and p_video[0],
'p_swf': p_swf and p_swf[0]
})
Are there any python/django gurus that can drastically shorten this code?

Actually, in your pursuit of compactness and efficiency, you have managed to come up with code that is terribly inefficient. This is because when you refer to p_audio or not p_audio, that causes that queryset to be evaluated, and because you haven't sliced it before then, the entire filter result is brought from the database: e.g. all the plate parts that end with mp3, and so on.
You should ensure you do the slice for each query first, before you refer to the value of that query. Since you're concerned with code compactness, you probably want to slice with [:1] first, to get a queryset of a single object:
p_audio = plate.parts.filter(content__iendswith=".mp3")[:1]
p_video = not p_audio and plate.parts.filter(content__iendswith=".flv")[:1]
p_swf = not p_audio and not p_video and plate.parts.filter(content__iendswith=".swf")[:1]
and the rest can stay the same.
Edit to add: this matters because you're only interested in the first element of each list, as evidenced by the fact that you only pass [0] of each into the context. But in your code, not p_audio refers to the original, unsliced queryset, and to determine the true/false value of the queryset, Django has to evaluate it, which gets all matching elements from the database and converts them into Python objects. Since you don't actually want those objects, you're doing a lot more work than you need.
Note though that it's not re-running it every time: just the first time, since after the first evaluation the queryset is cached internally. But as I say, that's already more work than you want.
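The evaluation rules described above can be illustrated without a database. FakeQuerySet below is a hypothetical, heavily simplified stand-in, not Django's real QuerySet: slicing returns a new unevaluated object, truth-testing evaluates and caches the result, indexing an unevaluated object fetches a single row, and rows_fetched counts how many rows would have come back from the database.

```python
class FakeQuerySet:
    """Hypothetical, simplified model of a lazy, caching queryset.

    This is NOT Django's QuerySet; it only mimics the evaluation rules
    discussed above so the cost difference can be counted.
    """

    def __init__(self, rows, window=slice(None)):
        self._rows = rows        # stands in for the matching rows in the database
        self._window = window    # the LIMIT/OFFSET applied so far, as a slice
        self._cache = None       # filled on the first full evaluation
        self.rows_fetched = 0    # how many rows evaluations have pulled in

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing only records a LIMIT/OFFSET; nothing is fetched yet.
            # (Simplification: a slice of a slice is not composed here.)
            return FakeQuerySet(self._rows, key)
        if self._cache is not None:
            return self._cache[key]  # already evaluated: reuse the cache
        # Unevaluated: fetch just this one row (like LIMIT 1 OFFSET key).
        self.rows_fetched += 1
        return self._rows[self._window][key]

    def _evaluate(self):
        if self._cache is None:  # evaluate once, then reuse the cached rows
            self._cache = list(self._rows[self._window])
            self.rows_fetched += len(self._cache)
        return self._cache

    def __bool__(self):
        # Truth-testing evaluates the whole (possibly sliced) query.
        return bool(self._evaluate())


parts = [f"track{i}.mp3" for i in range(1000)]

unsliced = FakeQuerySet(parts)
p_audio = unsliced and unsliced[0]
print(p_audio, unsliced.rows_fetched)  # track0.mp3 1000

sliced = FakeQuerySet(parts)[:1]
p_audio_sliced = sliced and sliced[0]
print(p_audio_sliced, sliced.rows_fetched)  # track0.mp3 1
```

The second pattern is the one the answer recommends: slice with [:1] before the truth test, so at most one row is fetched.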

Besides featuring less redundancy, this is also way easier to extend with new content types.
kinds = (("p_audio", ".mp3"), ("p_video", ".flv"), ("p_swf", ".swf"))
extra_context.update((key, False) for key, _ in kinds)
for key, ext in kinds:
    entries = plate.parts.filter(content__iendswith=ext)
    if entries:
        extra_context[key] = entries[0]
        break

Just adding this as another answer inspired by Pyroscope's above (as my edit there has to be peer reviewed)
The latest incarnation exploits the fact that the Django template system silently ignores nonexistent context items when they are referenced, so mp3, etc. below do not need to be initialized to False (or 0). The following therefore covers all the functionality of the OP's code. The other optimization is that mp3, etc. are used as key names (instead of "p_audio", etc.).
for key in ['mp3', 'flv', 'swf']:
    entries = plate.parts.filter(content__iendswith=key)[:1]
    extra_context[key] = entries and entries[0]
    if extra_context[key]:
        break
