I'd like to know how Django's order_by behaves when the ordering field's values are the same for a set of records. Say I have a score field in the DB and I order the queryset with order_by('score'). How will records with the same score be arranged?
Every time, they are ordered arbitrarily within the subset of records having equal scores, and this breaks pagination on the client side. Is there a way to override this and return the records in a consistent order?
I'm using Django 1.4 and PostgreSQL.
As the other answers correctly explain, order_by() accepts multiple arguments. I'd suggest using something like:
qs.order_by('score', 'pk')  # where qs is your queryset
I recommend using 'pk' (or '-pk') as the last argument in these cases, since every model has a pk field and its value is never the same for two records.
order_by can take multiple parameters; I think order_by('score', '-create_time') will always return the results in the same order.
If I understand correctly, you need a consistently ordered result set every time. You can use something like order_by('score', 'id'), which orders first by score and then by the auto-increment id within records having the same score, making your output consistent. The documentation is here. You need to be explicit in the order_by if you want the same result set every time; using 'id' is one way to do that.
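To see why the explicit tie-breaker fixes pagination, here is a small standalone Python sketch (plain lists, no Django; the ids and scores are invented) that sorts by (score, id):

```python
# Standalone sketch (no Django): records with equal scores, paginated.
records = [
    {"id": 3, "score": 10},
    {"id": 1, "score": 10},
    {"id": 2, "score": 5},
    {"id": 4, "score": 10},
]

# Sorting by score alone leaves the order of the three score=10 rows
# up to the database; adding the unique id pins the order down.
ordered = sorted(records, key=lambda r: (r["score"], r["id"]))

page_size = 2
page1 = ordered[:page_size]
page2 = ordered[page_size:]

print([r["id"] for r in page1])  # [2, 1]
print([r["id"] for r in page2])  # [3, 4]
```

Because (score, id) is unique per record, the page boundaries are the same on every run, which is exactly what client-side pagination needs.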
I am not trying to get all duplicate querysets, and I am not trying to compare two queryset results to see if the two are the same.
Somehow, and I have no idea how this happened yet, records are sometimes being saved twice. So when I query, my results sometimes contain duplicates.
For example, I am just doing Model.objects.filter(user='myname')
I get maybe 50 instances back, and all of them should be different. But somehow ids 11 and 12 have the exact same values, then 23 and 24 have the same values, and the others are totally fine.
So within those 50 instances there are 2 duplicates, which means I should only have 48 instances instead of 50 to be accurate.
Is there a way to check the queryset and return only one of the duplicates, along with the other records, when the values are totally the same?
You can do that using distinct in your ORM query, e.g. Model.objects.filter(user='myname').distinct('field_name')
This will give you only distinct values based on the field that you provide.
The .distinct('field_name') above is one solution, but it only works with PostgreSQL, not with MySQL.
For PostgreSQL, simply add the statement below after your queryset:
queryset = queryset.distinct('field_name')
For MySQL, simply add the statement below after your queryset:
queryset = queryset.distinct()
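As a rough illustration of the difference between the two calls (plain Python, no database; the field names are invented): .distinct() drops a row only when every field matches, while PostgreSQL's .distinct('field_name') keeps the first row per value of the named field:

```python
# Standalone sketch of the two distinct() behaviours (no database).
rows = [
    {"user": "myname", "score": 10},
    {"user": "myname", "score": 10},  # exact duplicate row
    {"user": "myname", "score": 7},
]

# distinct() (any backend): drop rows only when ALL fields are equal.
seen, fully_distinct = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        fully_distinct.append(row)

# distinct('user') (PostgreSQL DISTINCT ON): keep first row per 'user'.
seen_users, per_field = set(), []
for row in rows:
    if row["user"] not in seen_users:
        seen_users.add(row["user"])
        per_field.append(row)

print(len(fully_distinct))  # 2 - only the exact duplicate removed
print(len(per_field))       # 1 - one row per user
```

So for the duplicate-rows problem above, the plain .distinct() semantics are what you want; DISTINCT ON a single field would also collapse rows that merely share that field's value.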
I'm using the following resource in tastypie:
class ChoiceResource(LtgModelWithUuidResource):
    """
    Resource for the choice model
    """
    explanation = fields.ForeignKey(MultiLangTextFieldResource, attribute='explanation', full=True, null=True,
                                    use_in=DisableOnPatch())
    question = fields.ForeignKey('ltg_backend_app.base.api.question.QuestionResource', attribute='question',
                                 use_in=DisableOnPatch())
    keywords = fields.ManyToManyField(XrayResource, attribute='keywords', null=True,
                                      use_in=DisableOnPatch())

    class Meta(LtgResource.Meta):
        queryset = Choice.objects.select_related('explanation', 'question').\
            all().prefetch_related('keywords')
        allowed_methods = ['get']
        authentication = ApiKeyAuthentication()
        authorization = Authorization()
        order_by = ['id', ]
THE PROBLEM:
When querying for
api/v1/choice/?limit=100&offset=200
a choice with id = 615 is included in the results.
When querying for
api/v1/choice/?limit=100&offset=2400
the choice with id = 615 is returned AGAIN.
The total_count returned by the API is correct (6010 objects).
THE POSSIBLE CAUSE:
When inspecting the generated SQL query, there is no ORDER BY clause, although OFFSET and LIMIT are set.
Quoting the Postgresql Documentation:
When using LIMIT, it is a good idea to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows---you may be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown, unless you specified ORDER BY.
The query optimizer takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.
THE SOLUTION I'VE FOUND:
I've added order_by to the queryset, and now it looks like this:
Choice.objects.select_related('explanation', 'question').\
    all().prefetch_related('keywords').order_by('id')
THE QUESTION:
I'm not sure if I'm missing something here, or what I'm doing wrong.
Any clarification would be greatly appreciated.
As I see it, I shouldn't have to include the order_by on the Django queryset or in the URL query params for it to work; at least, it's not specified in the tastypie docs.
Thanks.
The ID usually starts with 1, but this is not always the case. Regardless, the default order is simply the order of insertion, so this is expected behavior.
If you would like to override this you can, as previously mentioned, set the ordering parameter in the Meta class. You can also pass order_by in the GET query.
In your specific case, you need an .order_by() on the queryset because the field id is not part of the tastypie resource, only the Django model.
Ref: Tastypie source
In general, it is good practice to order your fields as close to the data as possible. If the field exists on the Django model, order by it in the queryset, allowing an optimal SQL call. But if the field is computed on the tastypie resource, then let tastypie's order_by handle it.
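To make the fix concrete, here is a minimal standalone sqlite3 sketch (Python stdlib only; the table name is invented for illustration) showing that once an ORDER BY is in place, LIMIT/OFFSET pages no longer overlap:

```python
import sqlite3

# Minimal standalone demo (stdlib sqlite3; the table name is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choice (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO choice (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])

# With ORDER BY, LIMIT/OFFSET pages are disjoint and reproducible.
page1 = conn.execute(
    "SELECT id FROM choice ORDER BY id LIMIT 5 OFFSET 0").fetchall()
page2 = conn.execute(
    "SELECT id FROM choice ORDER BY id LIMIT 5 OFFSET 5").fetchall()

ids1 = {row[0] for row in page1}
ids2 = {row[0] for row in page2}
print(ids1 & ids2)  # set() - no row appears on both pages
```

Without the ORDER BY, the database is free to return the pages in any plan-dependent order, which is how the same row can show up at two different offsets.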
I have a model, Reading, which has a foreign key, Type. I'm trying to get a reading for each type that I have, using the following code:
for type in Type.objects.all():
    readings = Reading.objects.filter(type=type.pk)
    if readings.exists():
        reading_list.append(readings[0])
The problem with this, of course, is that it hits the database once for each type. I've played around with some queries to try to optimize this down to a single database call, but none of them seem efficient. .values, for instance, will give me a list of readings grouped by type, but it will give me EVERY reading for each type, and I would have to filter them in Python, in memory. That is out of the question, as we're dealing with potentially millions of readings.
If you use PostgreSQL as your DB backend, you can do this in one line with something like:
Reading.objects.order_by('type__pk', 'any_other_order_field').distinct('type__pk')
Note that the field on which distinct happens must always be the first argument in the order_by method. Feel free to change type__pk to the actual field you want to order types on (e.g. type__name if the Type model has a name property). You can read more about distinct here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct.
If you do not use PostgreSQL you could use the prefetch_related method for this purpose:
# reading_set can be replaced with whatever your reverse relation name actually is
for type in Type.objects.prefetch_related('reading_set').all():
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])
The above will perform only 2 queries in total. Note I use len() so that no extra query is performed when counting the objects. You can read more about prefetch_related here https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related.
The downside of this approach is that you first retrieve all related objects from the DB and then use only the first.
The above code is not tested, but I hope it will at least point you towards the right direction.
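The "first row per group" idea behind both approaches can be sketched in plain Python (no ORM; the field names and values are invented):

```python
from itertools import groupby
from operator import itemgetter

# Standalone sketch (no Django): pick the first reading per type,
# mirroring order_by('type__pk', ...).distinct('type__pk').
readings = [
    {"type": 1, "value": 7},
    {"type": 2, "value": 3},
    {"type": 1, "value": 9},
    {"type": 2, "value": 5},
]

# Sort by the grouping key first - groupby only merges adjacent rows,
# just as DISTINCT ON requires the field to lead the ORDER BY.
readings.sort(key=itemgetter("type"))
first_per_type = [next(group)
                  for _, group in groupby(readings, key=itemgetter("type"))]

print(first_per_type)  # [{'type': 1, 'value': 7}, {'type': 2, 'value': 5}]
```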
I have a simple Python/Django class:
class myModel(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()
and I want to get two elements from my database. The first is the newest element and the second is the newest positive element. So I can do it like this:
myModel.objects.all().order_by('-date')[:1][0]
myModel.objects.filter(value__gte = 0).order_by('-date')[:1][0]
Note those [:1][0] at the end - this is because I want to get maximum use out of the database's SQL engine. The thing is that I still need two queries, and I want to combine them into a single one (something like [:2] at the end that would produce the result I want). I know about Django's Q, but can't figure out how to use it in this context. Maybe some raw SQL? I'm waiting for ideas. :)
This looks like premature optimisation to me. Is two queries instead of one really so bad? At the moment, anyone who knows the Django ORM can understand your two queries. After you've replaced it with some funky raw SQL, that might not be the case.
You should use [0] instead of [:1][0]. Django knows how to slice querysets efficiently; both forms will result in the exact same SQL.
This doesn't fully answer your question, but you can get rid of those [:1][0] and the order_by by using the latest() QuerySet method; it returns the latest element in the QuerySet, using the argument provided as the date field.
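For illustration, the intent of those two queries sketched in plain Python over an in-memory list (the dates and values are invented):

```python
from datetime import datetime

# Standalone sketch (no Django): newest row, and newest row with
# value >= 0, matching the filter(value__gte=0) query above.
rows = [
    {"date": datetime(2023, 1, 1), "value": -5},
    {"date": datetime(2023, 3, 1), "value": -2},
    {"date": datetime(2023, 2, 1), "value": 4},
]

newest = max(rows, key=lambda r: r["date"])
newest_positive = max((r for r in rows if r["value"] >= 0),
                      key=lambda r: r["date"])

print(newest["value"])           # -2 (the March row)
print(newest_positive["value"])  # 4  (the February row)
```

Note that the two results can be different rows, which is why collapsing the pair into one query is awkward: a single ordering cannot satisfy both conditions at once.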
I have 2 models:
ParentModel: 'just' sits there
ChildModel: has a foreign key to ParentModel
ParentModel.objects.filter(childmodel__in=ChildModel.objects.all()) gives multiple occurrences of ParentModel.
How do I query all ParentModels that have at least one ChildModel referring to them? And without multiple occurrences...
You almost got it right...
ParentModel.objects.filter(childmodel__in=ChildModel.objects.all()).distinct()
You might want to avoid using childmodel__in=ChildModel.objects.all() if the number of ChildModel objects is large. This will generate SQL with all ChildModel id's enumerated in a list, possibly creating a huge SQL query.
If you can use Django 1.1 with aggregation support, you could do something like:
ParentModel.objects.annotate(num_children=Count('child')).filter(num_children__gte=1)
which should generate better SQL.
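For illustration, the same "at least one child, without duplicates" result sketched in plain Python (no ORM; the field names are invented):

```python
# Standalone sketch (no Django): parents that have at least one child.
parents = [{"id": 1}, {"id": 2}, {"id": 3}]
children = [
    {"id": 10, "parent_id": 1},
    {"id": 11, "parent_id": 1},  # second child of parent 1
    {"id": 12, "parent_id": 3},
]

# Collect parent ids into a set, so parent 1 appears a single time
# even though it has two children - the role distinct() plays above.
parent_ids_with_children = {c["parent_id"] for c in children}
result = [p for p in parents if p["id"] in parent_ids_with_children]

print([p["id"] for p in result])  # [1, 3]
```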