Doing "group by" in django but still retaining complete object


I want to do a GROUP BY in Django. I saw answers on Stack Overflow that recommend:
Member.objects.values('designation').annotate(dcount=Count('designation'))
This works, but the problem is you're getting a ValuesQuerySet instead of a QuerySet, so the queryset isn't giving me full objects but only specific fields. I want to get complete objects.
Of course, since we're grouping we need to choose which object to take out of each group; I want a way to specify the object (e.g. take the one with the biggest value in a certain field, etc.)
Does anyone know how I can do that?

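If you're willing to make two queries, you could group on designation alone and then fetch complete objects from the group you want (adding 'id' to values() would put every member in its own group, so every dcount would be 1):
from django.db.models import Count
dcounts = Member.objects.values('designation').annotate(dcount=Count('id')).order_by('-dcount')
member = Member.objects.filter(designation=dcounts.first()['designation']).first()
If you wanted the members of the top five groups by dcount, you could do the following:
designations = [row['designation'] for row in dcounts[:5]]
members = Member.objects.filter(designation__in=designations)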
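Since the question asks for one complete object per group (for example, the one with the biggest value in a certain field), it may help to see the SQL shape of that "greatest-n-per-group" query. The sketch below uses Python's built-in sqlite3 module rather than Django, and the member table and its salary column are hypothetical stand-ins for the Member model. A correlated subquery picks, for each designation, the id of the highest-paid member, and only that full row is kept.

```python
import sqlite3

# Hypothetical table standing in for the Member model; "salary" is an assumed field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO member (id, name, designation, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "dev", 100),
        (2, "bob", "dev", 300),
        (3, "cat", "ops", 200),
        (4, "dan", "ops", 150),
        (5, "eve", "qa", 120),
    ],
)

# For each outer row, the correlated subquery finds the id of the highest-paid
# member with the same designation (ties broken by lowest id); only that row,
# with all of its columns, survives the WHERE clause.
rows = conn.execute(
    """
    SELECT m.id, m.name, m.designation, m.salary
    FROM member AS m
    WHERE m.id = (
        SELECT inner_m.id FROM member AS inner_m
        WHERE inner_m.designation = m.designation
        ORDER BY inner_m.salary DESC, inner_m.id
        LIMIT 1
    )
    ORDER BY m.designation
    """
).fetchall()

for row in rows:
    print(row)
```

In Django 1.11 or later, the same correlated subquery can be expressed roughly as `Member.objects.filter(pk=Subquery(Member.objects.filter(designation=OuterRef('designation')).order_by('-salary', 'pk').values('pk')[:1]))`, which returns full Member objects (salary is again a hypothetical field).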

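It sounds like you don't necessarily need to GROUP BY, but just want to limit your selection to one object per value of a field (e.g. the one with the MAX value of a certain field).
You could try getting distinct objects by field, such as:
In PostgreSQL
Member.objects.order_by('designation').distinct('designation')
Passing field names to distinct() (DISTINCT ON) works only on PostgreSQL. The order_by() must start with the same fields, and any further order_by() fields decide which object is kept from each group.
In any other database, Member.objects.distinct('designation') raises an error, because DISTINCT ON fields are not supported by those backends.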
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
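On PostgreSQL, DISTINCT ON keeps the first row of each group according to the ORDER BY, so adding the "biggest value" field after the grouping field picks that row per group, e.g. order_by('designation', '-salary').distinct('designation'), where salary is a hypothetical field. The pure-Python sketch below only mimics that semantics on plain dicts to make it concrete; it is an illustration, not what Django executes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows standing in for Member objects.
members = [
    {"id": 1, "designation": "dev", "salary": 100},
    {"id": 2, "designation": "dev", "salary": 300},
    {"id": 3, "designation": "ops", "salary": 200},
    {"id": 4, "designation": "ops", "salary": 150},
]


def distinct_on(rows, key, order):
    """Mimic ORDER BY key, <order> followed by DISTINCT ON (key): keep the first row per key."""
    ordered = sorted(rows, key=lambda r: (r[key], order(r)))
    # groupby yields consecutive runs with the same key; take the first row of each run.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Similar in spirit to order_by('designation', '-salary').distinct('designation')
top = distinct_on(members, "designation", order=lambda r: -r["salary"])
print([m["id"] for m in top])  # [2, 3]
```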

Related

Index of row looping over django queryset [duplicate]

I have a QuerySet, let's call it qs, which is ordered by some attribute that is irrelevant to this problem. I also have an object, let's call it obj. Now I'd like to know the index of obj in qs, as efficiently as possible. I know that I could use .index() from Python or loop through qs comparing each object to obj, but what is the best way to go about this? I'm looking for high performance, and that's my only criterion.
Using Python 2.6.2 with Django 1.0.2 on Windows.

If you're already iterating over the queryset and just want to know the index of the element you're currently on, the compact and probably the most efficient solution is:
for index, item in enumerate(your_queryset):
    ...  # index is the 0-based position of item
However, don't use this if you have a queryset and an object obtained by some unrelated means, and want to learn the position of this object in the queryset (if it's even there).

If you just want to know where your object sits among all the others (e.g. when determining rank), you can do it quickly by counting the objects that come before it:
index = MyModel.objects.filter(sortField__lt=myObject.sortField).count()
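The counting trick gives a true 0-based position only when the sort field is unique: objects that tie with myObject are not counted, so tied objects share the same index, and any other filters applied to qs have to be repeated in the count. The sketch below runs the same kind of COUNT(*) ... WHERE sort_field < ? query with Python's sqlite3 module on a hypothetical item table (not Django) so the tie behaviour is visible.

```python
import sqlite3

# Hypothetical table; ids 2 and 3 deliberately share the same sort_field value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany(
    "INSERT INTO item (id, sort_field) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 30)],
)


def rank_of(item_id):
    """0-based position when ordered by sort_field; tied rows share a rank."""
    (value,) = conn.execute(
        "SELECT sort_field FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    # Count only the rows strictly before this one, like sortField__lt in Django.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM item WHERE sort_field < ?", (value,)
    ).fetchone()
    return count


print([rank_of(i) for i in (1, 2, 3, 4)])  # [0, 1, 1, 3]
```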

Assuming for the purpose of illustration that your models are standard with a primary key id, then evaluating
list(qs.values_list('id', flat=True)).index(obj.id)
will find the index of obj in qs. While the use of list evaluates the queryset, it evaluates not the original queryset but a derived queryset. This evaluation runs a SQL query to get the id fields only, not wasting time fetching other fields.

QuerySets in Django are lazy iterables, not lists (for further details, see the Django documentation on QuerySets).
As such, there is no shortcut to get the index of an element, and I think a plain iteration is the best way to do it.
To start with, I would implement your requirement in the simplest way possible (e.g. by iterating); if you really do have performance issues, then I would try a different approach, such as building a queryset that fetches fewer fields.
In any case, the idea is to leave such tricks as late as possible, until you know for certain that you need them.
Update: You may want to use an SQL statement directly to get the row number. However, Django's ORM did not support this natively at the time of writing (Django 2.0 and later provide window expressions such as RowNumber), so you have to use a raw SQL query (see the documentation). I think this could be the best option, but again, only if you actually see a performance issue.

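A simple, Pythonic way to find the index of an element in a queryset is:
(*qs,).index(instance)
This unpacks the queryset into a tuple (Python 3.5+) and then uses the built-in index method to find its position. Note that this evaluates the whole queryset and fetches every object; model instances compare equal when they belong to the same model and have the same primary key.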

You can do this using queryset.extra(…) and some raw SQL like so:
from django.db import connection

queryset = queryset.order_by("id")
record500 = queryset[500]
numbered_qs = queryset.extra(select={
    'queryset_row_number': 'ROW_NUMBER() OVER (ORDER BY "id")'
})
# str(numbered_qs.query) only approximates the SQL Django sends (parameters are
# not quoted), so check it for queries that filter on strings or dates.
with connection.cursor() as cursor:
    cursor.execute(
        "WITH OrderedQueryset AS (" + str(numbered_qs.query) + ") "
        "SELECT queryset_row_number FROM OrderedQueryset WHERE id = %s",
        [record500.id],
    )
    index = cursor.fetchone()[0]
index == 501  # True, because ROW_NUMBER() is 1-indexed, not 0-indexed
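The ROW_NUMBER() idea can also be tried outside Django. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) on a hypothetical record table: it numbers the rows in a CTE ordered by id, looks up one id passed as a query parameter, and returns None when the id is not present.

```python
import sqlite3

# Hypothetical table with ids 10, 20, ..., 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO record (id, name) VALUES (?, ?)",
    [(i, f"rec{i}") for i in range(10, 110, 10)],
)


def position_of(record_id):
    """1-based row number of record_id in the id-ordered table, or None if absent."""
    row = conn.execute(
        """
        WITH ordered AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM record
        )
        SELECT rn FROM ordered WHERE id = ?
        """,
        (record_id,),
    ).fetchone()
    return row[0] if row else None


print(position_of(60))  # 6 (1-indexed, so the 0-based index is 5)
```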

How to get the equivalent of Python's [:-1] in the Django ORM?

I am writing a Django application where I want to get all the items but the last one from a query. My query goes like this:
objects = Model.objects.filter(name='alpha').order_by('rank')[:-1]
but it throws an error:
AssertionError: Negative indexing not supported.
Any idea where I am going wrong?
Any suggestions will be appreciated.

You can use QuerySet.last() to get the last object and use its pk to exclude it from the results.
objects = Model.objects.filter(name='alpha').order_by('rank')
last = objects.last()
if last is not None:  # last() returns None for an empty queryset
    objects = objects.exclude(pk=last.pk)
A query for excluding from the result all objects ranked with the maximum value found in the DB (the last ones in order_by('rank'); tied objects are all excluded):
from django.db.models import Max
objects = Model.objects.filter(name='alpha').order_by('rank')
max_rank = objects.aggregate(max_rank=Max('rank'))['max_rank']  # one aggregate over the filtered rows
objects = objects.exclude(rank=max_rank)

Django does not support negative indexing on QuerySets. Please read https://code.djangoproject.com/ticket/13089 for more information.
The quick and "dirty" way to do it is to convert the QuerySet to a list and then use negative indexing:
objects = list(Model.objects.filter(name='alpha').order_by('rank'))[:-1]
Please note that the objects variable is then no longer a QuerySet but a list, and the whole result is loaded into memory.
However, I would recommend using the .exclude() method; if you wish to use it, please read the solution RaydelMiranda wrote.

Negative indexing is not allowed in Django.
However, you can reverse the ordering in order_by() with a "-" prefix and then skip the first n objects.
You can do something like this:
objects = Model.objects.filter(name='alpha').order_by('-rank')[n:]
Here n is the number of objects to drop from the end of the original ordering. In your case it would be:
objects = Model.objects.filter(name='alpha').order_by('-rank')[1:]
Note that the result comes back in descending rank order.
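Reversing the ordering and skipping the first n rows selects the same objects that [:-n] would on the original ordering, just in the opposite order, so you may need to reverse the result in Python if order matters. A plain-list sketch with hypothetical, distinct rank values:

```python
# Hypothetical ranks of the objects matching name='alpha'.
ranks = [3, 1, 4, 2, 5]

all_but_last = sorted(ranks)[:-1]                # what [:-1] would mean on order_by('rank')
reversed_skip = sorted(ranks, reverse=True)[1:]  # what order_by('-rank')[1:] returns

print(all_but_last)   # [1, 2, 3, 4]
print(reversed_skip)  # [4, 3, 2, 1]
```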

query = model.objects.filter(user=request.user)
if query.exists():
    query = query.last()

How to define list of enumerated values in Web2Py

I am developing a website with the Web2Py framework.
It provides a way to define enumerated values, so I can define a table field like this:
Field('state', 'string', length=10, requires=IS_IN_SET(('open', 'closed', 'not_open')))
I can also define a field that holds a list of values:
Field('emails', 'list:string')
But what is the syntax to combine the two?
I need to define the weekend days for an organization, and there can be more than one.
I tried the following:
db.define_table('organization',
    Field('name', 'string', requires=IS_NOT_EMPTY()),
    Field('description', 'text'),
    Field('weekends', 'list:string', length=10,
          requires=IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'))),
    redefine=migrate_flag
)
But it only lets me choose a single value.
I verified this in the Web2Py appadmin interface by creating a new database record there: I can enter only one value for the weekends field.
Can this be done in the 'web2py' way? Or will I have to resort to creating a new weekend table in the database and make a foreign key to the organization?

Use the "multiple" argument to allow/require multiple selections:
IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'),
          multiple=True)
Or if you want to require exactly two choices:
IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'),
          multiple=(2, 2))
If multiple is True, it will allow zero or more choices. multiple can also be a tuple specifying the minimum and maximum number of choices allowed.
The IS_IN_DB validator also takes the multiple argument.

Annotate with a filtered related object set

So I have a SensorType model, which has a collection of SensorReading objects as part of a sensorreading_set (i.e., the sensor type has many sensor readings). I want to annotate the sensor types to give me the sensor reading with the max id. To wit:
sensor_types = SensorType.objects.annotate(
newest_reading_id=Max('sensorreading__id'))
This works fantastically, but there's a catch. Sensor Readings have another foreign key, Device. What I really want is the highest sensor reading id for a given sensor type for a given device. Is it possible to have the annotation refer to a subset of sensor readings that basically amounts to SensorReading.objects.filter(device=device)?

Filtering works perfectly fine with related objects, and annotations work perfectly fine with those filters. What you need to do is:
from django.db.models import Max
SensorType.objects.filter(sensorreading__device=device).annotate(
    newest_reading_id=Max('sensorreading__id'))
Note that the order of method calls matters. Using filter() before annotate() annotates only over the filtered set; using annotate() before filter() annotates over the complete set and then filters. Also, when filtering on related objects, keep in mind that filter(sensorreading__x=x, sensorreading__y=y) selects sensor types that have at least one reading for which both conditions are true, while .filter(sensorreading__x=x).filter(sensorreading__y=y) selects sensor types that have a reading matching the first condition and a (possibly different) reading matching the second.
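To see why filter() before annotate() yields the per-device maximum, here is the SQL shape of such a query, run with Python's sqlite3 module on hypothetical sensortype and sensorreading tables (Django's real table names would normally carry an app-label prefix). The WHERE clause on the joined readings is applied before GROUP BY, so MAX only sees the readings of the chosen device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sensortype (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sensorreading (
        id INTEGER PRIMARY KEY,
        sensortype_id INTEGER REFERENCES sensortype(id),
        device_id INTEGER
    );
    INSERT INTO sensortype VALUES (1, 'temp'), (2, 'humidity');
    INSERT INTO sensorreading VALUES
        (1, 1, 7), (2, 1, 8), (3, 2, 7), (4, 1, 7), (5, 2, 8);
    """
)


def newest_reading_per_type(device_id):
    """Max reading id per sensor type, counting only readings from device_id."""
    return conn.execute(
        """
        SELECT t.id, MAX(r.id)
        FROM sensortype AS t
        JOIN sensorreading AS r ON r.sensortype_id = t.id
        WHERE r.device_id = ?   -- the filter runs before the aggregation
        GROUP BY t.id
        ORDER BY t.id
        """,
        (device_id,),
    ).fetchall()


print(newest_reading_per_type(7))  # [(1, 4), (2, 3)]
print(newest_reading_per_type(8))  # [(1, 2), (2, 5)]
```

On Django 2.0 or later, aggregates also accept a filter argument, e.g. Max('sensorreading__id', filter=Q(sensorreading__device=device)), which keeps sensor types that have no readings for that device (their value is None) instead of dropping them.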

You can use .extra() for this type of query in Django, like this:
SensorType.objects.extra(
    select={
        # Raw SQL: replace sensorreading/sensortype with your real table names
        # (Django's default is "<app_label>_<model name>").
        "newest_reading_id": "SELECT MAX(id) FROM sensorreading WHERE sensorreading.sensortype_id = sensortype.id AND sensorreading.device_id = %s",
    },
    select_params=[device.id],
)
You can read more about .extra here : https://docs.djangoproject.com/en/1.6/ref/models/querysets/#django.db.models.query.QuerySet.extra

As I understand it, you want to GROUP BY two fields, device_id and sensortype_id. This can be done using:
SensorReading.objects.values('device_id', 'sensortype_id').annotate(max=Max('id'))
I haven't tried it; it was adapted from two different answers on SO, this one and this one.
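The values('device_id', 'sensortype_id').annotate(max=Max('id')) query returns one dict per (device_id, sensortype_id) pair. The pure-Python sketch below performs the same grouping on hypothetical reading tuples, which can be handy for checking what shape of result to expect.

```python
# Hypothetical (id, device_id, sensortype_id) readings.
readings = [(1, 7, 1), (2, 8, 1), (3, 7, 2), (4, 7, 1), (5, 8, 2)]

# Group on the (device_id, sensortype_id) pair and keep the largest id per group,
# mirroring GROUP BY device_id, sensortype_id with MAX(id).
newest = {}
for reading_id, device_id, sensortype_id in readings:
    key = (device_id, sensortype_id)
    newest[key] = max(newest.get(key, reading_id), reading_id)

# Shape the result like the list of dicts that values().annotate() yields.
result = [
    {"device_id": d, "sensortype_id": s, "max": m}
    for (d, s), m in sorted(newest.items())
]
print(result)
```

One caveat, if the model defines a default Meta.ordering: Django versions before 3.1 added those ordering fields to the GROUP BY, which can split the groups, so calling .order_by() with no arguments on the query avoids that.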

Finding the distribution of a field in a list of objects?

I have a list of objects. Each object has a field called grade whose value is between 0 and 5. Now I want to see the distribution of this field across my list of objects. Is there any way to find it?
I know I can iterate over the whole objects and find it out but I don't want to do that.

As near as I can tell, using a table Table with a grade column you need something like this:
from django.db.models import Count
counts = Table.objects.values('grade').annotate(count=Count('grade')).order_by('grade')
This groups the rows by grade and returns one dict per grade value, with 'grade' and 'count' keys, where count is the number of rows with that grade.
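If the objects are already in a Python list rather than a queryset, the standard library's collections.Counter builds the distribution in one expression (it still iterates internally). A sketch with a hypothetical Item class standing in for the objects in the question:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for the objects in the question.
    grade: int


items = [Item(3), Item(5), Item(3), Item(0), Item(3), Item(5)]

distribution = Counter(item.grade for item in items)
# Report every grade from 0 to 5, including grades that never occur.
full = {grade: distribution.get(grade, 0) for grade in range(6)}
print(full)  # {0: 1, 1: 0, 2: 0, 3: 3, 4: 0, 5: 2}
```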
