Django: comparison on extra fields - python

Short question: Is there a way in Django to find the next row, based on the alphabetical order of some field, in a case-insensitive way?
Long question: I have some words in the database, and a detail view for them. I would like to be able to browse the words in alphabetical order. So I need to find out the id of the previous and next word in alphabetical order. Right now what I do is the following (original is the field that stores the name of the word):
class Word(models.Model):
    original = models.CharField(max_length=50)
    ...

    def neighbours(self):
        """
        Returns the words adjacent to a given word, in alphabetical order
        """
        previous_words = Word.objects.filter(
            original__lt=self.original).order_by('-original')
        next_words = Word.objects.filter(
            original__gt=self.original).order_by('original')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next
The problem is that this does a case-sensitive comparison, so Foo appears before bar, which is not what I want. To avoid this problem in another view, where I list all words, I use a custom model manager that adds an extra field, like this:
class CaseInsensitiveManager(models.Manager):
    def get_query_set(self):
        """
        Also adds an extra 'lower' field which is useful for ordering
        """
        return super(CaseInsensitiveManager, self).get_query_set().extra(
            select={'lower': 'lower(original)'})
and in the definition of Word I add
objects = models.Manager()
alpha = CaseInsensitiveManager()
In this way I can do queries like
Word.alpha.all().order_by('lower')
and get all words in alphabetical order, regardless of the case. But I cannot do
class Word(models.Model):
    original = models.CharField(max_length=50)
    ...
    objects = models.Manager()
    alpha = CaseInsensitiveManager()

    def neighbours(self):
        previous_words = Word.objects.filter(
            lower__lt=self.lower()).order_by('-lower')
        next_words = Word.objects.filter(
            lower__gt=self.lower()).order_by('lower')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next
Indeed Django will not accept field lookups based on extra fields. So, what am I supposed to do (short of writing custom SQL)?
Bonus questions: I see at least two more problems in what I am doing. First, I'm not sure about performance. I assume that no queries at all are performed when I define previous_words and next_words, and the only lookup in the database will happen when I define previous and next, yielding a query which is more or less
SELECT Word.original, ..., lower(Word.original) AS lower
WHERE lower < `foo`
ORDER BY lower DESC
LIMIT 1
Is this right? Or am I doing something which will slow down the database too much? I don't know enough details about the inner workings of the Django ORM.
The second problem is that I actually have to cope with words in different languages. Given that I know the language for each word, is there a way to get them in alphabetical order even if they have non-ASCII characters? For instance I'd want to have méchant, moche in this order, but I get moche, méchant.

The database should be able to do this sorting for you, and it should be able to do so without the "lower" function.
Really what you need to fix is the database collation and encoding.
For example, if you are using MySQL you could use the character set utf8 and the collation utf8_general_ci.
If that collation doesn't work for you, you can try other collations depending on your needs and database. But using an extra field and a function in the query is an ugly workaround that is going to slow the app down.
There are many collation options available in MySQL and PostgreSQL too:
http://dev.mysql.com/doc/refman/5.5/en/charset-mysql.html
http://stackoverflow.com/questions/1423378/postgresql-utf8-character-comparison
But this is definitely a good chance to optimise at the db level.
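To illustrate the idea of letting the collation do the work, here is a minimal sketch that uses SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL or PostgreSQL. The collation name "folded" and the helpers fold, compare_folded and neighbours are made up for this example, and stripping accents is only a rough approximation of a real language-aware collation. Once the column carries a case- and accent-insensitive collation, plain < and > comparisons and ORDER BY give the expected neighbours without any lower() call:

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining accents and fold case (rough stand-in for a *_ci collation)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def compare_folded(a, b):
    """Collation callback: return -1, 0 or 1 based on the folded strings."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
# The collation must be registered on the connection before it is used.
conn.create_collation("folded", compare_folded)
conn.execute(
    "CREATE TABLE word (id INTEGER PRIMARY KEY, original TEXT COLLATE folded)"
)
conn.executemany(
    "INSERT INTO word (original) VALUES (?)",
    [("Foo",), ("bar",), ("moche",), ("méchant",)],
)


def neighbours(conn, original):
    """Return (previous, next) words around `original` in collation order."""
    prev_row = conn.execute(
        "SELECT original FROM word WHERE original < ? "
        "ORDER BY original DESC LIMIT 1",
        (original,),
    ).fetchone()
    next_row = conn.execute(
        "SELECT original FROM word WHERE original > ? "
        "ORDER BY original ASC LIMIT 1",
        (original,),
    ).fetchone()
    return (prev_row[0] if prev_row else None, next_row[0] if next_row else None)


print(neighbours(conn, "Foo"))
print(neighbours(conn, "moche"))
```

With a suitable collation set in the real database, the original neighbours() method with original__lt and original__gt should behave the same way, since the comparison happens in the database.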

Related

Django - Filter GTE LTE for alphanumeric IDs

I am trying to serve up our APIs to allow filtering capabilities using LTE and GTE on our ID look ups. However, the IDs we have are alphanumeric like AB:12345, AB:98765 and so on. I am trying to do the following on the viewset using the Django-Filter:
class MyFilter(django_filters.FilterSet):
    item_id = AllLookupsFilter()

    class Meta:
        model = MyModel
        fields = {
            'item_id': ['lte', 'gte']
        }
But the issue is that if I query http://123.1.1.1:7000/my-entities/?item_id__gte=AB:1999 or even http://123.1.1.1:7000/my-entities/?item_id__lte=AB:100, it won't return exactly the items with IDs greater than 1999 or less than 100, since the filter treats the ID as a string and compares it character by character. Any idea how I can filter on IDs so that I get exactly the items greater / less than the numeric ID (ignoring the initial characters)?
What you'll want to do is write a custom lookup. You can read more about them here: https://docs.djangoproject.com/en/2.0/howto/custom-lookups/
The code sample below has everything you need to define your own except the actual function. For that part of the example check the link.
from django.db.models import Field, Lookup

@Field.register_lookup
class NotEqual(Lookup):
    lookup_name = 'ne'
In the lookup, you'll need to split the string and then search based on your own parameters. This will likely require you to do one of the following:
Write custom SQL that you can pass through Django into your query.
Request a large number of results containing the subset you're looking for and filter that through Python, returning only the important bits.
What you're trying to accomplish is usually called Natural Sorting, and tends to be very difficult to do on the SQL side of things. There's a good trick, explained well here: https://www.copterlabs.com/natural-sorting-in-mysql/ However, the highlights are simple in the SQL world:
Sort by Length first
Sort by column value second
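The two sorting highlights above, and the "filter through Python" option, can be sketched in plain Python. This is only an illustration with made-up sample IDs and a hypothetical numeric_part helper, and the length-then-value trick assumes every ID shares the same prefix:

```python
ids = ["AB:12345", "AB:100", "AB:98765", "AB:1999", "AB:20"]

# Plain string comparison goes character by character, so "AB:20" wrongly
# counts as greater than or equal to "AB:1999" ("2" sorts after "1").
string_compare_is_wrong = "AB:20" >= "AB:1999"

# Natural order: sort by length first, then by value.
natural = sorted(ids, key=lambda item_id: (len(item_id), item_id))


def numeric_part(item_id):
    """Return the integer after the first colon (hypothetical helper)."""
    return int(item_id.split(":", 1)[1])


# Numeric filtering in Python, ignoring the alphabetic prefix.
at_least_1999 = [item_id for item_id in ids if numeric_part(item_id) >= 1999]

print(natural)
print(at_least_1999)
```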

QuerySet optimization

I need to find a match between a serial number and a list of objects, each of them having a serial number :
models:
class Beacon(models.Model):
serial = models.CharField(max_length=32, default='0')
First I wrote:
for b in Beacon.objects.all():
    if b.serial == tmp_serial:
        # do something
        break
Then I went one step further:
b_queryset = Beacon.objects.all().filter(serial=tmp_serial)
if b_queryset.exists():
    # do something
Now, is there a second step for more optimization?
I don't think it would be faster to cast my QuerySet in a List and do a list.index('tmp_serial').
If your serial is unique, you can do:
# return a single instance from db
match = Beacon.objects.get(serial=tmp_serial)
If you have multiple objects with the same serial and plan to do something with each of them, exists() will add a useless query.
Instead, you should do:
matches = Beacon.objects.filter(serial=tmp_serial)
if len(matches) > 0:
    for match in matches:
        # do something
The trick here is that len(matches) forces the evaluation of the queryset (so your db is queried). After that, the model instances are retrieved and you can use them without another query.
However, when you use queryset.exists(), the ORM runs a very simple query to check whether the queryset would have returned any element.
Then, if you iterate over your queryset, you run another query to grab your objects. See the related documentation for more details.
To sum it up: use exists() only if you want to check whether a queryset returns a result or not. If you actually need the queryset data, use len().
I think what you have is already about as good as it gets, but if you only want to know whether an object exists or not, then:
From the Django docs on QuerySet.exists():
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.exists
if Beacon.objects.all().filter(serial=tmp_serial).exists():
    # do something

Sort table by a column and set other column to sequential value to persist ordering

So I'm not sure whether to pose this as a Django or SQL question however I have the following model:
class Picture(models.Model):
weight = models.IntegerField(default=0)
taken_date = models.DateTimeField(blank=True, null=True)
album = models.ForeignKey(Album, db_column="album_id", related_name='pictures')
I may have a subset of Picture records numbering in the thousands, and I'll need to sort them by taken_date and persist the order by setting the weight value.
For instance in Django:
pictures = Picture.objects.filter(album_id=5).order_by('taken_date')
for weight, picture in enumerate(list(pictures)):
    picture.weight = weight
    picture.save()
Now for 1000s of records as I'm expecting to have, this could take way too long. Is there a more efficient way of performing this task? I'm assuming I might need to resort to SQL as I've recently come to learn Django's not necessarily "there yet" in terms of database bulk operations.
Ok I put together the following in MySQL which works fine, however I'm gonna guess there's no way to simulate this using Django ORM?
UPDATE picture p
JOIN (SELECT @inc := @inc + 1 AS new_weight, id
      FROM (SELECT @inc := 0) temp, picture
      WHERE album_id = 5
      ORDER BY taken_date) pw
ON p.id = pw.id
SET p.weight = pw.new_weight;
I'll leave the question open for a while just in case there's some awesome solution or app that solves this, however the above query for ~6000 records takes 0.11s.
NOTE that the above query will generate warnings in MySQL if you have the following setting in MySQL:
binlog_format=statement
In order to fix this, you must change the binlog_format setting to either mixed or row. mixed is probably better as it means you'll still use statement for everything except in cases where row is required to avoid a warning like the above.
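For comparison, here is a rough sketch of the same "sort, then persist a sequential weight" idea without MySQL user variables. It uses SQLite through Python's standard sqlite3 module as a stand-in database, a simplified picture table modelled on the Picture model above, and a hypothetical reweight_album helper. It reads the ordered ids once and sends all updates as one batch inside a single transaction, rather than saving row by row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE picture ("
    "id INTEGER PRIMARY KEY, album_id INTEGER, "
    "taken_date TEXT, weight INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO picture (album_id, taken_date) VALUES (?, ?)",
    [(5, "2012-03-01"), (5, "2011-01-15"), (7, "2010-06-30"), (5, "2011-12-24")],
)


def reweight_album(conn, album_id):
    """Set weight to 0, 1, 2, ... following taken_date order within one album."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM picture WHERE album_id = ? ORDER BY taken_date, id",
            (album_id,),
        )
    ]
    with conn:  # commit all updates together, or roll back on error
        conn.executemany(
            "UPDATE picture SET weight = ? WHERE id = ?",
            [(weight, pk) for weight, pk in enumerate(ids)],
        )


reweight_album(conn, 5)
rows = conn.execute("SELECT id, weight FROM picture ORDER BY id").fetchall()
print(rows)
```

Note that this numbers from 0, like the Django loop in the question, whereas the MySQL statement above starts at 1.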

Variable interpolation in python/django, django query filters [duplicate]

Given a class:
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
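To see why the expansion matters, here is a small sketch with a stand-in function in place of QuerySet.filter(). The name filter_stub is made up for this illustration and simply returns the keyword arguments it receives:

```python
def filter_stub(**lookups):
    """Stand-in for QuerySet.filter(): return the keyword arguments received."""
    return lookups


filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'

# Written as a keyword, filter_by is taken literally as the argument name.
literal = filter_stub(filter_by=filter_value)

# Expanding a dict uses the *value* of filter_by as the argument name.
expanded = filter_stub(**{filter_by: filter_value})

print(literal)
print(expanded)
```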
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
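Since eval() will run any expression stored in that field, one possible variation is to parse the stored string with ast.literal_eval from the standard library, which only accepts Python literals. This is a sketch, not the code the answer above uses, and parse_filter_spec is a hypothetical helper name:

```python
import ast


def parse_filter_spec(text):
    """Parse a stored dict literal into filter kwargs without executing code."""
    value = ast.literal_eval(text)
    if not isinstance(value, dict) or not all(isinstance(k, str) for k in value):
        raise ValueError("filter spec must be a dict with string keys")
    return value


kwargs = parse_filter_spec("{'is_staff':True,'last_name__startswith':'A',}")
print(kwargs)

# Anything that is not a plain literal is rejected instead of being executed.
try:
    parse_filter_spec("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The resulting dictionary can be passed to filter(**kwargs) exactly as before.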
Additionally, to extend the previous answer, which drew some requests for further code, I am adding some working code that I use with Q. Let's say that a request may or may not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in the query, but they may also be missing.
This is how I am building filters based on those fields on an aggregated query that cannot be further filtered after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
django.db.models.Q is exactly what you want, in a Django way.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # (add more filters here)
}
Person.objects.filter(**kwargs)
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.

Filter and sort music info on Google App Engine

I've enjoyed building out a couple simple applications on the GAE, but now I'm stumped about how to architect a music collection organizer on the app engine. In brief, I can't figure out how to filter on multiple properties while sorting on another.
Let's assume the core model is an Album that contains several properties, including:
Title
Artist
Label
Publication Year
Genre
Length
List of track names
List of moods
Datetime of insertion into database
Let's also assume that I would like to filter the entire collection using those properties, and then sorting the results by one of:
Publication year
Length of album
Artist name
When the info was added into the database
I don't know how to do this without running into the exploding index conundrum. Specifically, I'd love to do something like:
Albums.all().filter('publication_year <', 1980).order('artist_name')
I know that's not possible, but what's the workaround?
This seems like a fairly general type of application. The music albums could be restaurants, bottles of wine, or hotels. I have a collection of items with descriptive properties that I'd like to filter and sort.
Is there a best practice data model design that I'm overlooking? Any advice?
There's a couple of options here: You can filter as best as possible, then sort the results in memory, as Alex suggests, or you can rework your data structures for equality filters instead of inequality filters.
For example, assuming you only want to filter by decade, you can add a field encoding the decade in which the song was recorded. To find everything before or after a decade, do an IN query for the decades you want to span. This will require one underlying query per decade included, but if the number of records is large, this can still be cheaper than fetching all the results and sorting them in memory.
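The decade idea can be sketched in plain Python, independent of the datastore. The helper names decade_of and decades_before and the sample albums are invented for this illustration. The list returned by decades_before is what would be passed to the IN query, and the in-memory filter below only simulates that query:

```python
def decade_of(year):
    """Return the first year of the decade, e.g. 1975 -> 1970."""
    return year // 10 * 10


def decades_before(year, earliest):
    """List the decades from `earliest` up to, but not including, `year`'s decade."""
    return list(range(decade_of(earliest), decade_of(year), 10))


albums = [
    {"title": "First", "artist_name": "Zed", "publication_year": 1975},
    {"title": "Second", "artist_name": "Abe", "publication_year": 1968},
    {"title": "Third", "artist_name": "Moe", "publication_year": 1984},
]

# Store the decade alongside each album when it is saved.
for album in albums:
    album["decade"] = decade_of(album["publication_year"])

# Equality-style membership test instead of an inequality on the year.
wanted = decades_before(1980, earliest=1950)
matches = sorted(
    (album for album in albums if album["decade"] in wanted),
    key=lambda album: album["artist_name"],
)
print(wanted)
print([album["title"] for album in matches])
```

Because the filter is now an equality test, the sort on another property no longer conflicts with it, at the cost of only being able to cut at decade boundaries.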
Since storage is cheap, you could create your own ListProperty based indexfiles with key_names that reflect the sort criteria.
import time
from google.appengine.ext import db

class album_pubyear_List(db.Model):
    words = db.StringListProperty()

class album_length_List(db.Model):
    words = db.StringListProperty()

class album_artist_List(db.Model):
    words = db.StringListProperty()

class Album(db.Model):
    blah...

    def save(self):
        super(Album, self).save()
        # you could do this at save time or batch it and do
        # it with a cronjob or taskqueue
        words = []
        for field in ["title", "artist", "label", "genre", ...]:
            words.append("%s:%s" % (field, getattr(self, field)))
        word_records = []
        now = repr(time.time())
        # words is passed to each index entity, not to list.append()
        word_records.append(album_pubyear_List(parent=self, key_name="%s_%s" % (self.pubyear, now), words=words))
        word_records.append(album_length_List(parent=self, key_name="%s_%s" % (self.album_length, now), words=words))
        word_records.append(album_artist_List(parent=self, key_name="%s_%s" % (self.artist_name, now), words=words))
        db.put(word_records)
Now when it's time to search you create an appropriate WHERE clause and call the appropriate model
where = "WHERE words = " + "%s:%s" %(field-a, value-a) + " AND " + "%s:%s" %(field-b, value-b) etc.
aModel = "album_pubyear_List" # or any one of the other key_name sorted wordlist models
indexes = db.GqlQuery("""SELECT __key__ from %s %s""" %(aModel, where))
keys = [k.parent() for k in indexes[offset:numresults+1]] # +1 for pagination
object_list = db.get(keys) # returns a sorted by key_name list of Albums
As you say, you can't have an inequality condition on one field and an order by another (or inequalities on two fields, etc, etc). The workaround is simply to use the "best" inequality condition to get data in memory (where "best" means the one that's expected to yield the least data) and then further refine it and order it by Python code in your application.
Python's list comprehensions (and other forms of loops &c), list's sort method and the sorted built-in function, the itertools module in the standard library, and so on, all help a lot to make these kinds of tasks quite simple to perform in Python itself.
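As a minimal sketch of that workaround, assume the datastore has already applied the single "best" inequality filter and handed back a list of records. Plain dictionaries and a list comprehension stand in for the query here, and the sample data is invented:

```python
from operator import itemgetter

albums = [
    {"title": "First", "artist_name": "Zed", "genre": "jazz", "publication_year": 1975},
    {"title": "Second", "artist_name": "Abe", "genre": "rock", "publication_year": 1968},
    {"title": "Third", "artist_name": "Moe", "genre": "jazz", "publication_year": 1984},
    {"title": "Fourth", "artist_name": "Cal", "genre": "jazz", "publication_year": 1959},
]

# Step 1: the one inequality the datastore can apply (simulated here).
fetched = [album for album in albums if album["publication_year"] < 1980]

# Step 2: refine further and order by a different property in Python.
jazz_by_artist = sorted(
    (album for album in fetched if album["genre"] == "jazz"),
    key=itemgetter("artist_name"),
)
print([album["title"] for album in jazz_by_artist])
```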
