Index of row looping over django queryset [duplicate] - python

I have a QuerySet, let's call it qs, which is ordered by some attribute which is irrelevant to this problem. Then I have an object, let's call it obj. Now I'd like to know what index obj has in qs, as efficiently as possible. I know that I could use .index() from Python or possibly loop through qs comparing each object to obj, but what is the best way to go about doing this? I'm looking for high performance and that's my only criterion.
Using Python 2.6.2 with Django 1.0.2 on Windows.

If you're already iterating over the queryset and just want to know the index of the element you're currently on, the compact and probably the most efficient solution is:
for index, item in enumerate(your_queryset):
    ...
However, don't use this if you have a queryset and an object obtained by some unrelated means, and want to learn the position of this object in the queryset (if it's even there).

If you just want to know where your object sits amongst all the others (e.g. when determining rank), you can do it quickly by counting the objects that sort before it:
index = MyModel.objects.filter(sortField__lt=myObject.sortField).count()
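The counting approach above maps to a single COUNT(*) query. Here is a minimal sketch of the same idea using the standard-library sqlite3 module instead of Django, so it can be run on its own. The table my_model, the column sort_field, and the helper index_by_count are hypothetical names chosen for illustration; they are not part of the original answer.

```python
import sqlite3

# Hypothetical table standing in for MyModel; sort_field plays the role of sortField.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany(
    "INSERT INTO my_model (id, sort_field) VALUES (?, ?)",
    [(1, 30), (2, 10), (3, 20), (4, 40)],
)


def index_by_count(conn, sort_value):
    """Return the zero-based position of a row in ascending sort_field order."""
    # Count the rows that sort strictly before the given value.
    row = conn.execute(
        "SELECT COUNT(*) FROM my_model WHERE sort_field < ?", (sort_value,)
    ).fetchone()
    return row[0]


# Position of the row whose sort_field is 30 (id=1 in the sample data).
position = index_by_count(conn, 30)
```

Note that rows with equal sort values all get the same index with this approach, so it only gives a unique position when the sort field is unique or a tie-breaker is added to the condition.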

Assuming for the purpose of illustration that your models are standard with a primary key id, then evaluating
list(qs.values_list('id', flat=True)).index(obj.id)
will find the index of obj in qs. While the use of list evaluates the queryset, it evaluates not the original queryset but a derived queryset. This evaluation runs a SQL query to get the id fields only, not wasting time fetching other fields.
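One detail worth keeping in mind: list.index() raises ValueError when the value is missing, which happens if obj is not in qs at all. A small sketch of a wrapper that returns None in that case follows. The helper name index_of_id is hypothetical, and the plain list of ids stands in for what qs.values_list('id', flat=True) would yield.

```python
def index_of_id(ids, target_id):
    """Return the position of target_id in ids, or None if it is absent."""
    try:
        # list() forces evaluation of any iterable, as it would for a queryset.
        return list(ids).index(target_id)
    except ValueError:
        return None


# Stand-in for the ids a values_list('id', flat=True) call would produce.
sample_ids = [7, 3, 9]
found = index_of_id(sample_ids, 9)
missing = index_of_id(sample_ids, 5)
```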

QuerySets in Django are lazy iterables that behave more like generators than lists (for further details, see Django documentation on QuerySets).
As such, there is no shortcut to get the index of an element, and I think a plain iteration is the best way to do it.
For starters, I would implement your requirement in the simplest way possible (like iterating); if you really have performance issues, then I would try a different approach, like building a queryset with fewer fields.
In any case, the idea is to leave such tricks as late as possible, when you definitely know you need them.
Update: You may want to use a SQL statement directly to get the row number. However, Django's ORM does not support this natively and you have to use a raw SQL query (see documentation). I think this could be the best option, but again, only if you really see a performance issue.

A simple Pythonic way to find the index of an element in a queryset is:
(*qs,).index(instance)
This unpacks the queryset into a tuple, then uses the built-in Python index method to determine its position.

You can do this using queryset.extra(…) and some raw SQL like so:
from django.db import connection

queryset = queryset.order_by("id")
record500 = queryset[500]
numbered_qs = queryset.extra(select={
    'queryset_row_number': 'ROW_NUMBER() OVER (ORDER BY "id")'
})
cursor = connection.cursor()
cursor.execute(
    "WITH OrderedQueryset AS (" + str(numbered_qs.query) + ") "
    "SELECT queryset_row_number FROM OrderedQueryset WHERE id = %s",
    [record500.id]
)
index = cursor.fetchall()[0][0]
index == 501  # because row_number() is 1 indexed not 0 indexed
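The same ROW_NUMBER() idea can be tried outside Django. The sketch below uses the standard-library sqlite3 module and assumes the underlying SQLite is version 3.25 or newer, since window functions are not available before that. The table item, the alias rn, and the helper row_number_of are hypothetical names used only for illustration.

```python
import sqlite3

# Hypothetical table; ids are deliberately not consecutive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(10, "a"), (20, "b"), (30, "c")],
)


def row_number_of(conn, item_id):
    """Return the 1-based row number of item_id when ordered by id, or None."""
    # Assumes SQLite >= 3.25 for ROW_NUMBER() OVER (...).
    row = conn.execute(
        "WITH ordered AS ("
        "  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM item"
        ") SELECT rn FROM ordered WHERE id = ?",
        (item_id,),
    ).fetchone()
    return None if row is None else row[0]


# Subtract 1 to get a zero-based index comparable to list.index().
zero_based = row_number_of(conn, 20) - 1
```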

Related

How to get the equivalent of Python [:-1] in Django ORM?

I am writing a Django application where I want to get all the items but last from a query. My query goes like this:
objects = Model.objects.filter(name='alpha').order_by('rank')[:-1]
but it throws an error:
AssertionError: Negative indexing is not supported.
Any idea where I am going wrong?
Any suggestions will be appreciated.
You can use QuerySet.last() to get the last object and use its id to exclude it from the results.
objects = Model.objects.filter(name='alpha').order_by('rank')
last = objects.last()
objects = objects.exclude(pk=last.pk)
A query for excluding from the result all objects ranked with the minimum value found in DB:
from django.db.models import F, Min

objects = Model.objects.annotate(
    mini_rank=Min('rank'),  # Annotate each object with the minimum known rank
).exclude(
    mini_rank=F('rank')  # Exclude all objects ranked with the minimum value found
)
EDITED
Django does not support negative indexing on QuerySets. Please read https://code.djangoproject.com/ticket/13089 for more information.
The quick and "dirty" way to do it is to convert the QuerySet to a list and then use negative indexing.
objects = list(Model.objects.filter(name='alpha').order_by('rank'))[:-1]
Please note that the objects variable is no longer a queryset but a list.
However, I would recommend using the .exclude() method instead; for that, read the solution @RaydelMiranda wrote.
Negative indexing is not allowed in Django.
However, you can reverse the ordering in order_by and then skip objects from the front of the reversed result.
You can do something like this:
objects = Model.objects.filter(name='alpha').order_by('-rank')[n:]
Here n is the number of objects to drop from the end of the original ordering. In your case it would be:
objects = Model.objects.filter(name='alpha').order_by('-rank')[1:]
Note that the result comes back in descending order of rank.
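In SQL terms, the reversed-order trick amounts to ORDER BY ... DESC with an OFFSET of 1. A minimal sketch with the standard-library sqlite3 module is shown below; it is not Django code. The table item, the column rank_no (standing in for rank), and the helper all_but_last are hypothetical names. SQLite accepts LIMIT -1 to mean "no upper bound", which is an SQLite-specific convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, rank_no INTEGER)")
conn.executemany(
    "INSERT INTO item (name, rank_no) VALUES (?, ?)",
    [("alpha", 1), ("alpha", 2), ("alpha", 3), ("alpha", 4), ("beta", 5)],
)


def all_but_last(conn, name):
    """Return ranks for `name` in ascending order, without the highest one."""
    # Reverse the ordering and skip the first row, i.e. the original last row.
    rows = conn.execute(
        "SELECT rank_no FROM item WHERE name = ? "
        "ORDER BY rank_no DESC LIMIT -1 OFFSET 1",
        (name,),
    ).fetchall()
    # Flip back to ascending order in Python to match the [:-1] result.
    return [r[0] for r in reversed(rows)]


ranks = all_but_last(conn, "alpha")
```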
query = model.objects.filter(user=request.user)
if query.exists():
    query = query.last()

django - filter after slice / filter on queryset where results have been limited

I'm having trouble understanding why I can't filter after a slice on a queryset and what is happening.
stuff = stuff.objects.all()
stuff.count()
= 7
If I then go
extra_stuff = stuff.filter(stuff_flag=id)
extra_stuff.count()
= 6. Everything is good and I have my new queryset in extra_stuff with no issues. But if I slice first:
stuff = stuff.objects.all()[:3]
extra_stuff = stuff.filter(stuff_flag=id)
I get the error "Cannot filter a query once a slice has been taken."
How can I filter further on a queryset where I have limited the number of results?
You can't use filter() after you have sliced the queryset. The error is pretty explicit.
Cannot filter a query once a slice has been taken.
You could do the filter in Python
stuff = stuff.objects.all()[:3]
extra_stuff = [s for s in stuff if s.stuff_flag=='flag']
To get the number of items in extra_stuff, just use len()
extra_stuff_count = len(extra_stuff)
Doing the filtering in Python will work fine when the size of stuff is very small, as in this case. If you had a much larger slice, you could use a subquery; however, this might have performance issues as well, so you would have to test.
extra_stuff = Stuff.objects.filter(id__in=stuff, stuff_flag='flag')
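To see why the order of slicing and filtering matters, here is a small sketch of the subquery form in plain SQL, run through the standard-library sqlite3 module rather than Django. The table stuff and the two helper functions are hypothetical. Not every database accepts LIMIT inside an IN subquery (SQLite does), so treat this as an illustration rather than portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, stuff_flag TEXT)")
conn.executemany(
    "INSERT INTO stuff (id, stuff_flag) VALUES (?, ?)",
    [(1, "flag"), (2, "other"), (3, "flag"), (4, "flag"),
     (5, "other"), (6, "flag"), (7, "flag")],
)


def slice_then_filter(conn, flag, limit):
    """Take the first `limit` rows, then keep only those matching `flag`."""
    rows = conn.execute(
        "SELECT id FROM stuff WHERE stuff_flag = ? "
        "AND id IN (SELECT id FROM stuff ORDER BY id LIMIT ?) ORDER BY id",
        (flag, limit),
    ).fetchall()
    return [r[0] for r in rows]


def filter_then_slice(conn, flag, limit):
    """Keep rows matching `flag`, then take the first `limit` of them."""
    rows = conn.execute(
        "SELECT id FROM stuff WHERE stuff_flag = ? ORDER BY id LIMIT ?",
        (flag, limit),
    ).fetchall()
    return [r[0] for r in rows]


sliced_first = slice_then_filter(conn, "flag", 3)
filtered_first = filter_then_slice(conn, "flag", 3)
```

The two functions return different rows for the same inputs, which is the ambiguity the ORM avoids by refusing filter() on a sliced queryset.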
Django gives you that error because the slice has already fixed the LIMIT and OFFSET of the query; a filter added afterwards would be applied before the limit in SQL and would change which rows the slice selects. The filter method is only useful to refine the database query before slicing it.
Since you're only getting three objects, you could just do the extra filtering in Python:
extra_stuff = [s for s in stuff if s.stuff_flag == id]
but I wonder why you don't do the filter before slicing.
Just do the filtering first, then create another variable and slice it like this:
extra_stuff = stuff.objects.filter(stuff_flag=id)
the_sliced_stuff = extra_stuff[:3]
It works well
Just do two queries.
total_stuff = StuffClass.objects.count()
extra_stuff = StuffClass.objects.filter(stuff_flag=id)[:3]
extra_stuff_count = len(StuffClass.objects.filter(stuff_flag=id))
Note that len() is only reasonable when extra_stuff_count is small, like 3 or 300, because it loads every matching row into memory. For larger counts, make one more request with .count() instead.

Querying a list in mongoengine; contains vs in

I have a ListField in a model with ids (ReferenceField), and I need to do a query if a certain id is in that list. AFAIK I have 2 options for this:
Model.objects.filter(refs__contains='59633cad9d4bc6543aab2f39')
or:
Model.objects.filter(refs__in=['59633cad9d4bc6543aab2f39'])
Which one is the most efficient for this use case?
The model looks like:
class Model(mongoengine.Document):
    refs = mongoengine.ListField(mongoengine.ReferenceField(SomeOtherModel))
From what I can read in the mongoengine documentation, contains is really a string query, but surprisingly it works here as well. I'm guessing that __in is more efficient since it should be optimized for lists, or am I wrong?
String queries are normally regex queries under the covers, so they would be less efficient. However, the exception is when testing against reference fields! The generated queries are:
Model.objects.filter(refs__contains="5305c92956c02c3f391fcaba")._query
{'refs': ObjectId('5305c92956c02c3f391fcaba')}
Which is a direct lookup.
Model.objects.filter(refs__in=["5305c92956c02c3f391fcaba"])._query
{'refs': {'$in': [ObjectId('5305c92956c02c3f391fcaba')]}}
This is probably less efficient, but the difference would likely be extremely marginal. The biggest impact would come from the number of docs and whether or not the refs field has an index.

django queryset runtime - get nth entry in constant time

I'm using multiple ways to get data from db via different django querysets,
but I would like to know the runtime for each queryset and if possible a better way (to maybe get data in constant time!!)
qs = MyModel.objects.order_by('-time')
qs = qs.filter(blah=blah)
to get the first entry I'm doing this:
entry = list(qs[:1])
first_entry = entry[0]
or to get 10th and last entry:
entry = list(qs)
some_entry = entry[9]
last_entry = entry[-1]
but I believe this will take O(n) time. Is there any way to get the nth term in constant time?
I don't want to use get() as I don't know the id or any other value of the entry (it's sorted), only the position.
I may also use annotate, but this also takes O(n) runtime.
MyModel.objects.values('date').annotate(min_value=Min('value')).order_by('min_value')[0]
I know the position; I just need that entry in constant time.
From the docs:
Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query. An exception is if you use the “step” parameter of Python slice syntax.
To retrieve a single object rather than a list (e.g. SELECT foo FROM bar LIMIT 1), use a simple index instead of a slice.
https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
The part about not evaluating the queryset as you slice it is the important part.
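Since the docs describe slicing as the equivalent of LIMIT and OFFSET, fetching the nth entry by position can be sketched as a single query that returns one row. The example below uses the standard-library sqlite3 module, not Django. The table entry, the column ts (standing in for the time field), and the helper nth_entry are hypothetical. It avoids loading the whole result into Python, though the database may still need to step past the skipped rows, so this is not a guarantee of constant time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany(
    "INSERT INTO entry (id, ts) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 13)],
)


def nth_entry(conn, n):
    """Return the (id, ts) row at zero-based position n, newest first."""
    # Only one row crosses the wire; the rest stay in the database.
    return conn.execute(
        "SELECT id, ts FROM entry ORDER BY ts DESC LIMIT 1 OFFSET ?", (n,)
    ).fetchone()


first_entry = nth_entry(conn, 0)
tenth_entry = nth_entry(conn, 9)

# The last entry is the first one under the opposite ordering.
last_entry = conn.execute(
    "SELECT id, ts FROM entry ORDER BY ts ASC LIMIT 1"
).fetchone()
```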

Union on ValuesQuerySet in django

I've been searching for a way to take the union of querysets in django. From what I read you can use query1 | query2 to take the union... This doesn't seem to work when using values() though. I'd skip using values until after taking the union but I need to use annotate to take the sum of a field and filter on it and since there's no way to do "group by" I have to use values(). The other suggestions I read were to use Q objects but I can't think of a way that would work.
Do I pretty much need to just use straight SQL or is there a django way of doing this?
What I want is:
q1 = mymodel.objects.filter(date__lt = '2010-06-11').values('field1','field2').annotate(volsum=Sum('volume')).exclude(volsum=0)
q2 = mymodel.objects.values('field1','field2').annotate(volsum=Sum('volume')).exclude(volsum=0)
query = q1|q2
But this doesn't work and as far as I know I need the "values" part because there's no other way for Sum to know how to act since it's a 15 column table.
QuerySet.values() does not return a QuerySet, but rather a ValuesQuerySet, which does not support this operation. Convert them to lists then add them.
query = list(q1) + list(q2)
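Adding the two lists keeps rows that appear in both, unlike a SQL UNION. If duplicates should be dropped, a sketch along these lines may help. The helper union_rows is hypothetical, and the sample dictionaries stand in for the rows that values() queries like q1 and q2 would yield; it assumes every value in a row is hashable.

```python
def union_rows(*row_lists):
    """Concatenate lists of dict rows, keeping the first copy of each row."""
    seen = set()
    merged = []
    for rows in row_lists:
        for row in rows:
            # Dicts are not hashable, so build a sorted tuple of items as the key.
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Stand-ins for list(q1) and list(q2).
rows1 = [
    {"field1": "a", "field2": "x", "volsum": 5},
    {"field1": "b", "field2": "y", "volsum": 2},
]
rows2 = [
    {"field1": "a", "field2": "x", "volsum": 5},
    {"field1": "c", "field2": "z", "volsum": 7},
]
query = union_rows(rows1, rows2)
```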
