How to get penultimate item from QuerySet in Django?

How to get the penultimate item from Django QuerySet? I tried my_queryset[-2] (after checking whether the my_queryset length is greater than 1) as follows:
if len(my_queryset) > 1:
    query = my_queryset[-2]
and it returns:
Exception Value: Negative indexing is not supported.
Is there some "Django" way to get such an item?
The only thing that comes to mind is to reverse the queryset and get my_queryset[2], but I'm not sure how efficient that is.
EDIT:
scans = self.scans.all().order_by('datetime')
if len(scans) > 1:
    scan = scans[-2]

This code, which raises an error:
scans = self.scans.all().order_by('datetime')
if len(scans) > 1:
    scan = scans[-2]
is meant to do the same as:
scans = self.scans.all().order_by('-datetime')
if len(scans) > 1:
    scan = scans[1]
To get the second item, use index 1 rather than 2, because Python indexes start from 0.
Also note that Django querysets are lazy, which means you can change your mind about the ordering without a performance hit, provided that proper indexes are available. From the Django documentation:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
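As a rough sketch of the reversed-ordering approach from this answer (the helper name `penultimate` is hypothetical, not a Django API): slicing a queryset with `[:2]` adds a LIMIT to the query, so at most two rows are fetched and there is no need to call `len()` on the full result. The helper relies only on slicing, so plain lists stand in here for a queryset such as `self.scans.order_by('-datetime')`.

```python
def penultimate(newest_first):
    """Return the second item of a newest-first sequence, or None.

    `newest_first` is assumed to be ordered descending, e.g. a queryset
    built with order_by('-datetime'); any sliceable sequence works.
    """
    first_two = list(newest_first[:2])  # on a queryset this fetches at most 2 rows
    return first_two[1] if len(first_two) == 2 else None


# Plain lists stand in for a queryset ordered by '-datetime'.
scans_newest_first = ["scan-3", "scan-2", "scan-1"]
print(penultimate(scans_newest_first))
print(penultimate(["only-scan"]))
```

Returning None for fewer than two items replaces the explicit length check from the question.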

Related

Django - Filter GTE LTE for alphanumeric IDs

I am trying to serve up our APIs to allow filtering capabilities using LTE and GTE on our ID look ups. However, the IDs we have are alphanumeric like AB:12345, AB:98765 and so on. I am trying to do the following on the viewset using the Django-Filter:
class MyFilter(django_filters.FilterSet):
    item_id = AllLookupsFilter()
    class Meta:
        model = MyModel
        fields = {
            'item_id': ['lte', 'gte']
        }
But the issue is that if I query http://123.1.1.1:7000/my-entities/?item_id__gte=AB:1999 or http://123.1.1.1:7000/my-entities/?item_id__lte=AB:100, it won't return exactly the items with IDs greater than 1999 or less than 100, because the filter treats the ID as a string and compares it character by character. Any idea how I can filter on the IDs so that I get exactly the items whose numeric ID is greater or less than the given value (ignoring the leading characters)?

What you'll want to do is write a custom lookup. You can read more about them here: https://docs.djangoproject.com/en/2.0/howto/custom-lookups/
The code sample below has everything you need to define your own lookup except the method that generates the SQL (as_sql). For that part of the example, check the link.
from django.db.models import Field, Lookup

@Field.register_lookup
class NotEqual(Lookup):
    lookup_name = 'ne'
In the lookup, you'll need to split the string and then search based on your own parameters. This will likely require you to do one of the following:
Write custom SQL that you can pass through Django into your query.
Request a large number of results containing the subset you're looking for and filter that through Python, returning only the important bits.
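The second option (filtering in Python) might look roughly like the sketch below. It assumes every ID has the shape `PREFIX:NUMBER`, as in the question; the helper names `parse_item_id` and `filter_ids_gte` are hypothetical, and the list of strings stands in for values already fetched from the database.

```python
def parse_item_id(item_id):
    """Split 'AB:12345' into ('AB', 12345); the 'PREFIX:NUMBER' shape is assumed."""
    prefix, _, number = item_id.partition(":")
    return prefix, int(number)


def filter_ids_gte(item_ids, bound):
    """Keep IDs with the same prefix as `bound` and a numeric part >= bound's."""
    bound_prefix, bound_number = parse_item_id(bound)
    kept = []
    for item_id in item_ids:
        prefix, number = parse_item_id(item_id)
        if prefix == bound_prefix and number >= bound_number:
            kept.append(item_id)
    return kept


ids = ["AB:100", "AB:1999", "AB:25", "AB:12345"]
print(filter_ids_gte(ids, "AB:1999"))
```

A plain string comparison on the same data would wrongly keep "AB:25" and drop "AB:12345", which is the behaviour described in the question.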
What you're trying to accomplish is usually called Natural Sorting, and tends to be very difficult to do on the SQL side of things. There's a good trick, explained well here: https://www.copterlabs.com/natural-sorting-in-mysql/ However, the highlights are simple in the SQL world:
Sort by Length first
Sort by column value second
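The length-then-value trick can be illustrated in Python with a tuple sort key. This is only a sketch of the idea: it works when all IDs share a prefix of the same length and the numbers have no leading zeros, which is an assumption about the data.

```python
ids = ["AB:1999", "AB:25", "AB:12345", "AB:100"]

# Plain string ordering compares character by character.
plain_order = sorted(ids)

# Sort by length first, then by value: shorter IDs hold smaller numbers.
natural_order = sorted(ids, key=lambda item_id: (len(item_id), item_id))

print(plain_order)
print(natural_order)
```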

Django: How to filter DB records by 0-3 criteria?

My goal is to have a search form (currently with 3 fields, but possibly more later) that can be used for filtering products (in my case a product is a training that can be filtered by sport, province and city). All of these filter fields should be optional, so I want to ignore a field when its POST value is either None (sport and province are FKs to related tables) or an empty string (city).
I need to chain those three conditions in the Training.objects.filter() call, but I need to omit the conditions that are not actually used. I also need to check for None values, because I am getting a RelatedObjectDoesNotExist exception.
This is what I have so far, but it is not good (only when all 3 conditions are properly filled):
trainings = Training.objects.filter(sport = searchQuery.sport.sport_id).filter(province = searchQuery.province.province_id).filter(city = searchQuery.city)
I tried to use a conditional expression inside filter() to avoid the exceptions, but either I can't get the syntax right or it is not possible: I was stopped by an invalid syntax error.
I even thought of a dumb solution that checks the inputs in if clauses and runs a different query for each case, but even with 3 parameters there are already many combinations, so I doubt this is the way to go when I plan to add more filters later.
Any suggestions? This seems like a fairly trivial task to me, but so far I have been unable to google the right solution :(

Django querysets are lazy, so building the query in steps like this is not a problem:
trainings = Training.objects.filter(sport=searchQuery.sport.sport_id)
if something:
    trainings = trainings.filter(province=searchQuery.province.province_id)
You don't need a separate query for each combination of your filter variables.
You can also handle the missing values by catching the exceptions.

For example, you could apply filters conditionally:
trainings = Training.objects.all()
if searchQuery.sport:
    trainings = trainings.filter(sport=searchQuery.sport)
if searchQuery.province:
    trainings = trainings.filter(province=searchQuery.province)
if searchQuery.city:
    trainings = trainings.filter(city=searchQuery.city)
for training in trainings:  # this is the point where the actual database call occurs
    pass  # do something
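If more fields are added later, one way to avoid a growing chain of if statements is to collect only the filled-in criteria into a dictionary and unpack it into a single filter() call. The sketch below shows just the dictionary-building step in plain Python; `build_filters` is a hypothetical helper, and the field names are taken from the question.

```python
def build_filters(sport=None, province=None, city=""):
    """Return only the criteria that were actually filled in.

    None and the empty string are treated as "not provided".
    """
    candidates = {"sport": sport, "province": province, "city": city}
    return {
        field: value
        for field, value in candidates.items()
        if value is not None and value != ""
    }


filters = build_filters(sport=3, province=None, city="")
print(filters)
# With Django this would be used as: Training.objects.filter(**filters)
```

An empty dictionary simply results in an unfiltered queryset, so the case where no criteria are given needs no special handling.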

QuerySet optimization

I need to find a match between a serial number and a list of objects, each of them having a serial number :
models:
class Beacon(models.Model):
serial = models.CharField(max_length=32, default='0')
First I wrote:
for b in Beacon.objects.all():
    if b.serial == tmp_serial:
        # do something
        break
Then I went one step further:
b_queryset = Beacon.objects.all().filter(serial=tmp_serial)
if b_queryset.exists():
    pass  # do something
Now, is there a further step for more optimization?
I don't think it would be faster to cast my QuerySet to a list and do a list.index('tmp_serial').

If your serial is unique, you can do:
# return a single instance from the db (raises Beacon.DoesNotExist if there is no match)
match = Beacon.objects.get(serial=tmp_serial)
If several objects can share the same serial and you plan to do something with each of them, exists() will add a useless query.
Instead, you should do:
matches = Beacon.objects.filter(serial=tmp_serial)
if len(matches) > 0:
    for match in matches:
        pass  # do something
The trick here is that len(matches) forces the evaluation of the queryset (so your db is queried). After that, the model instances have been retrieved and you can use them without another query.
However, when you use queryset.exists(), the ORM runs a very simple query to check whether the queryset would return any element.
Then, if you iterate over your queryset, you run another query to fetch your objects. See the related documentation for more details.
To sum up: use exists() only if you want to check whether a queryset returns a result or not. If you actually need the queryset data, use len().
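The difference in query counts can be illustrated with a toy model. The class below is a deliberately simplified stand-in written for this example, not Django's implementation: it only mimics the behaviour described above, where a full evaluation is cached and exists() on an unevaluated queryset costs its own query.

```python
class ToyQuerySet:
    """Toy stand-in for a lazy queryset; NOT Django's implementation."""

    def __init__(self, rows):
        self._rows = rows    # pretend this is the database table
        self._cache = None   # filled on the first full evaluation
        self.queries = 0     # number of simulated database hits

    def _fetch_all(self):
        if self._cache is None:
            self.queries += 1
            self._cache = list(self._rows)
        return self._cache

    def __len__(self):
        return len(self._fetch_all())

    def __iter__(self):
        return iter(self._fetch_all())

    def exists(self):
        if self._cache is not None:
            return bool(self._cache)
        self.queries += 1    # separate cheap "is there a row?" query
        return bool(self._rows)


with_len = ToyQuerySet(["beacon-1", "beacon-2"])
if len(with_len) > 0:
    for _ in with_len:
        pass

with_exists = ToyQuerySet(["beacon-1", "beacon-2"])
if with_exists.exists():
    for _ in with_exists:
        pass

print(with_len.queries, with_exists.queries)
```

In this model the len() variant costs one simulated query and the exists() variant costs two, matching the reasoning in the answer.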

I think your current approach is already about as good as it gets, but if you only want to know whether an object exists, use the queryset's exists() method:
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.exists
if Beacon.objects.all().filter(serial=tmp_serial).exists():
    pass  # do something

Django Haystack Distinct Value for Field

I am building a small search engine using Django Haystack + Elasticsearch + Django REST Framework, and I'm trying to figure out how to reproduce the behavior of a Django QuerySet's distinct method.
My index looks something like this:
class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    item_id = indexes.IntegerField(faceted=True)
    def prepare_item_id(self, obj):
        return obj.item_id
What I'd like to be able to do is the following:
sqs = SearchQuerySet().filter(content=my_search_query).distinct('item_id')
However, Haystack's SearchQuerySet doesn't have a distinct method, so I'm kind of lost. I tried faceting the field, and then querying Django using the returned list of item_id's, but this loses the performance of Elasticsearch, and also makes it impossible to use Elasticsearch's sorting features.
Any thoughts?
EDIT:
Example data:
Item Model
==========
id  title
1   'Item 1'
2   'Item 2'
3   'Item 3'
VendorItem Model << the table in question
================
id  item_id  vendor_id  lat   lon
1   1        1          38    -122
2   2        1          38.2  -121.8
3   3        2          37.9  -121.9
4   1        2          ...   ...
5   2        2          ...   ...
6   2        3          ...   ...
As you can see, there are multiple VendorItem's for the same Item, however when searching I only want to retrieve at most one result for each item. Therefore I need the item_id column to be unique/distinct.
I have tried faceting on the item_id column, and then executing the following query:
sqs = SearchQuerySet().filter(content=query).facet('item_id')
counts = sqs.facet_counts()
# ids will look like: [345, 892, 123, 34,...]
ids = [i[0] for i in counts['fields']['item_id']]
items = VendorItem.objects.filter(
    vendor__lat__gte=latMin, vendor__lon__gte=lonMin,
    vendor__lat__lte=latMax, vendor__lon__lte=lonMax,
    item_id__in=ids).distinct('item').select_related('vendor', 'item')
The main problem here is that results are limited to 100 items, and they cannot be sorted with haystack.

I think the best advice I can give you is to stop using Haystack.
Haystack's default backend (the elasticsearch_backend.py) is mostly written with Solr in mind. There are a lot of annoyances that I find in haystack, but the biggest has to be that it packs all queries into something called query_string. Using query string, they can use the lucene syntax, but it also means losing the entire elasticsearch DSL. The lucene syntax has some advantages, especially if this is what you are used to, but it is very limiting from an elasticsearch point of view.
Furthermore, I think you are applying an RDBMS concept to a search engine. That isn't to say that you shouldn't get the results you need, but the approach is often different.
The way you might query and retrieve this data might be different if you don't use haystack because haystack creates indexes in a way more appropriate for solr than for elasticsearch.
For example, in creating a new index, haystack will assign a "type" called "modelresult" to all models that will go in an index.
So, let's say you have some entities called Items and some other entities called vendoritems.
It might be appropriate to have them both in the same index but with vendoritems as a type of vendoritems and items having a type of items.
When querying, you would then query based on the REST endpoint, so something like localhost:9200/index/type (query). Haystack achieves this through the Django content types module. Accordingly, there is a field called "django_ct" that haystack queries and attaches to any query you might make when you are only looking for unique items.
To illustrate the above:
This endpoint searches across all indexes:
`localhost:9200/`
This endpoint searches across all types in an index:
`localhost:9200/yourindex/`
This endpoint searches in a type within an index:
`localhost:9200/yourindex/yourtype/`
and this endpoint searches in two specified types within an index:
`localhost:9200/yourindex/yourtype,yourothertype/`
Back to haystack though, you can possibly get unique values by adding a django_ct to your query, but likely that isn't what you want.
What you really want to do is a facet, and probably you want to use term facets. This could be a problem in haystack because it A.) analyzes all text and B.) applies store=True to all fields (really not something you want to do in elasticsearch, but something you often want to do in solr).
You can order facet results in elasticsearch (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_ordering)
I don't mean for this to be a slam on haystack. I think it does a lot of things right conceptually. It's especially good if all you need to do is index a single model (like say a blog) and just have it quickly return results.
That said, I highly recommend using elasticutils. Some of its concepts are similar to haystack's, but it uses the search DSL rather than query_string (although you can still use query_string if you want to).
Be warned though: I don't think you can order facets using elasticutils by default, but you can pass a Python dictionary of the facets you want to the facet_raw method (something I don't think you can do in haystack).
Your last option is to create your own haystack backend: inherit from the existing backend and add some functionality to the .facet() method to allow for ordering per the above DSL.
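If the result set is small enough to post-process in Python, keeping at most one result per item_id while preserving the search engine's ordering could be sketched as below. This is a client-side fallback, not a replacement for server-side faceting; `first_per_item` is a hypothetical helper, and plain dictionaries built from the example data in the question stand in for search results.

```python
def first_per_item(results, key="item_id"):
    """Keep the first result seen for each distinct value of `key`."""
    seen = set()
    unique = []
    for result in results:
        value = result[key]
        if value not in seen:
            seen.add(value)
            unique.append(result)
    return unique


# Rows from the VendorItem example, in the order the search returned them.
hits = [
    {"id": 1, "item_id": 1, "vendor_id": 1},
    {"id": 4, "item_id": 1, "vendor_id": 2},
    {"id": 2, "item_id": 2, "vendor_id": 1},
    {"id": 6, "item_id": 2, "vendor_id": 3},
    {"id": 3, "item_id": 3, "vendor_id": 2},
]
unique_hits = first_per_item(hits)
print([hit["id"] for hit in unique_hits])
```

Because the first hit per item wins, any sorting applied by the search engine decides which vendor's row is kept.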

Django: comparison on extra fields

Short question: Is there a way in Django to find the next row, based on the alphabetical order of some field, in a case-insensitive way?
Long question: I have some words in the database, and a detail view for them. I would like to be able to browse the words in alphabetical order. So I need to find out the id of the previous and next word in alphabetical order. Right now what I do is the following (original is the field that stores the name of the word):
class Word(models.Model):
    original = models.CharField(max_length=50)
    ...
    def neighbours(self):
        """
        Returns the words adjacent to a given word, in alphabetical order
        """
        previous_words = Word.objects.filter(
            original__lt=self.original).order_by('-original')
        next_words = Word.objects.filter(
            original__gt=self.original).order_by('original')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next
The problem is that this does a case-sensitive comparison, so Foo appears before bar, which is not what I want. To avoid this problem, in another view - where I list all words, I have made use of a custom model manager which adds an extra field, like this
class CaseInsensitiveManager(models.Manager):
    def get_query_set(self):
        """
        Also adds an extra 'lower' field which is useful for ordering
        """
        return super(CaseInsensitiveManager, self).get_query_set().extra(
            select={'lower': 'lower(original)'})
and in the definition of Word I add
objects = models.Manager()
alpha = CaseInsensitiveManager()
In this way I can do queries like
Word.alpha.all().order_by('lower')
and get all words in alphabetical order, regardless of the case. But I cannot do
class Word(models.Model):
    original = models.CharField(max_length=50)
    ...
    objects = models.Manager()
    alpha = CaseInsensitiveManager()
    def neighbours(self):
        previous_words = Word.objects.filter(
            lower__lt=self.lower()).order_by('-lower')
        next_words = Word.objects.filter(
            lower__gt=self.lower()).order_by('lower')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next
Indeed Django will not accept field lookups based on extra fields. So, what am I supposed to do (short of writing custom SQL)?
Bonus questions: I see at least two more problems in what I am doing. First, I'm not sure about performance. I assume that no queries at all are performed when I define previous_words and next_words, and that the only database lookups happen when I define previous and next, yielding a query which is more or less
SELECT Word.original, ..., lower(Word.original) AS lower
WHERE lower < `foo`
ORDER BY lower DESC
LIMIT 1
Is this right? Or am I doing something which will slow down the database too much? I don't know enough details about the inner workings of the Django ORM.
The second problem is that I actually have to cope with words in different languages. Given that I know the language for each word, is there a way to get them in alphabetical order even if they have non-ASCII characters. For instance I'd want to have méchant, moche in this order, but I get moche, méchant.

The database should be able to do this sorting for you, and it should be able to do so without the "lower" function.
Really what you need to fix is the database collation and encoding.
For example, if you are using MySQL you could use the character set utf8 and the collation utf8_general_ci.
If that collation doesn't work for you, you can try other collations depending on your needs and database. But using an extra field and a function in the query is an ugly workaround that is going to slow the app down.
There are many collation options available in MySQL and PostgreSQL too:
http://dev.mysql.com/doc/refman/5.5/en/charset-mysql.html
http://stackoverflow.com/questions/1423378/postgresql-utf8-character-comparison
But this is definitely a good chance to optimise at the db level.
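To illustrate the ordering that a case-insensitive, accent-insensitive collation is meant to produce, here is a rough Python sketch. It is not a substitute for a database collation (real collations follow language-specific rules); `collation_key` and this standalone `neighbours` function are hypothetical helpers that only mimic the effect on the examples from the question.

```python
import unicodedata


def collation_key(word):
    """Approximate a case- and accent-insensitive sort key."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def neighbours(words, current):
    """Return the (previous, next) words around `current` in collated order."""
    ordered = sorted(words, key=collation_key)
    position = ordered.index(current)
    previous = ordered[position - 1] if position > 0 else None
    following = ordered[position + 1] if position + 1 < len(ordered) else None
    return previous, following


words = ["Foo", "bar", "moche", "méchant"]
print(sorted(words, key=collation_key))
print(neighbours(words, "Foo"))
```

With a suitable collation configured on the column, the database applies this kind of ordering itself, so the original__lt / original__gt lookups from the question can be used without the extra lower field.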
