I've been searching for a way to take the union of querysets in django. From what I read you can use query1 | query2 to take the union... This doesn't seem to work when using values() though. I'd skip using values until after taking the union but I need to use annotate to take the sum of a field and filter on it and since there's no way to do "group by" I have to use values(). The other suggestions I read were to use Q objects but I can't think of a way that would work.
Do I pretty much need to just use straight SQL or is there a django way of doing this?
What I want is:
from django.db.models import Sum
q1 = mymodel.objects.filter(date__lt='2010-06-11').values('field1', 'field2').annotate(volsum=Sum('volume')).exclude(volsum=0)
q2 = mymodel.objects.values('field1', 'field2').annotate(volsum=Sum('volume')).exclude(volsum=0)
query = q1|q2
But this doesn't work and as far as I know I need the "values" part because there's no other way for Sum to know how to act since it's a 15 column table.
In the Django versions of the time (before 1.9), QuerySet.values() did not return a plain QuerySet but a ValuesQuerySet, which does not support this operation. Convert both to lists, then concatenate them:
query = list(q1) + list(q2)
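Roughly speaking, values() followed by annotate() becomes a GROUP BY, and exclude() on the annotation becomes a HAVING clause, which is why values() is needed for Sum. The sketch below illustrates this with Python's built-in sqlite3 module instead of Django; the table, its columns and the data are invented for illustration, and the SQL approximates rather than reproduces what Django generates. It also shows one difference worth knowing: list(q1) + list(q2) keeps rows that appear in both results, whereas SQL UNION removes duplicates.

```python
import sqlite3

# Hypothetical stand-in for mymodel: only the columns the question uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mymodel (field1 TEXT, field2 TEXT, date TEXT, volume INTEGER);
INSERT INTO mymodel VALUES
    ('a', 'x', '2010-06-01', 5),
    ('a', 'x', '2010-06-20', -5),
    ('b', 'y', '2010-06-05', 3),
    ('c', 'z', '2010-06-15', 0);
""")

# values('field1', 'field2') picks the GROUP BY columns,
# annotate(volsum=Sum('volume')) adds the aggregate,
# exclude(volsum=0) filters the groups in a HAVING clause.
q1_sql = """
    SELECT field1, field2, SUM(volume) AS volsum FROM mymodel
    WHERE date < '2010-06-11'
    GROUP BY field1, field2 HAVING SUM(volume) <> 0"""
q2_sql = """
    SELECT field1, field2, SUM(volume) AS volsum FROM mymodel
    GROUP BY field1, field2 HAVING SUM(volume) <> 0"""

q1 = conn.execute(q1_sql + " ORDER BY field1, field2").fetchall()
q2 = conn.execute(q2_sql + " ORDER BY field1, field2").fetchall()

# Concatenating lists keeps duplicates; UNION de-duplicates in the database.
concatenated = q1 + q2
union = conn.execute(q1_sql + " UNION " + q2_sql + " ORDER BY field1, field2").fetchall()

print(concatenated)  # [('a', 'x', 5), ('b', 'y', 3), ('b', 'y', 3)]
print(union)         # [('a', 'x', 5), ('b', 'y', 3)]
```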
What's the difference between having multiple nested lookups inside queryset.filter and queryset.exclude?
For example, car ratings: a user can create ratings of multiple types for any car.
class Car(Model):
    ...
class Rating(Model):
    type = ForeignKey('RatingType')  # names like engine, design, handling
    user = ...  # user
Let's try to get a list of cars that have no rating by user "A" of type "design".
Approach 1
car_ids = Car.objects.filter(
    rating__user="A", rating__type__name="design"
).values_list('id', flat=True)
Car.objects.exclude(id__in=car_ids)
Approach 2
Car.objects.exclude(
    rating__user="A", rating__type__name="design"
)
Approach 1 works for me, whereas approach 2 seems to exclude more cars. My suspicion is that nested lookups inside exclude do not behave like AND (for the rating) but rather like OR.
Is that true? If not, why do these two approaches return different querysets?
Regarding filter, "multiple parameters are joined via AND in the underlying SQL statement", and for a multi-valued relation such as rating, all the conditions inside one filter() call apply to the same related row. In your first approach, car_ids is never evaluated on its own: passing an unevaluated QuerySet to __in makes Django embed it as a subquery, so approach 1 runs as a single SQL query, roughly:
SELECT ... WHERE car.id NOT IN (SELECT car.id ... WHERE rating.user='A' AND ratingtype.name='design');
Here's the part of the documentation that covers exclude:
https://docs.djangoproject.com/en/stable/ref/models/querysets/#exclude
For conditions on the model's own (single-valued) fields, exclude() simply negates the AND of its arguments:
SELECT ... WHERE NOT (condition1 AND condition2)
rating, however, is a multi-valued (reverse ForeignKey) relation, and the documentation on spanning multi-valued relationships notes that exclude() is not implemented equivalently to filter() in that case: the conditions in a single exclude() call do not necessarily refer to the same related object. Approach 2 therefore excludes every car that has some rating by user "A" and some (possibly different) rating of type "design", which is why it removes more cars than approach 1.
So your observation is correct, although the behaviour is not quite OR either: a car is only excluded if both conditions are met, just not necessarily by the same rating. To require both conditions on the same rating, select the matching cars first and exclude them, which is exactly what approach 1 does.
If you have any doubt, you can look at the generated SQL, for example with print(queryset.query).
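A minimal sketch of the documented exclude() behaviour on multi-valued relations, written as plain SQL with Python's sqlite3 module. The schema is a simplified assumption (the rating type is stored as text instead of a ForeignKey to RatingType), and the approach 2 query only approximates the subqueries Django builds.

```python
import sqlite3

# Simplified, hypothetical schema standing in for Car, Rating and RatingType.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (id INTEGER PRIMARY KEY);
CREATE TABLE rating (car_id INTEGER, username TEXT, type_name TEXT);
INSERT INTO car (id) VALUES (1), (2), (3);
-- car 1: user A rated its design
INSERT INTO rating VALUES (1, 'A', 'design');
-- car 2: user A rated the engine, user B rated the design
INSERT INTO rating VALUES (2, 'A', 'engine'), (2, 'B', 'design');
-- car 3: no ratings at all
""")

# Approach 1: both conditions must hold on the same rating row.
approach1 = [row[0] for row in conn.execute("""
    SELECT id FROM car WHERE id NOT IN (
        SELECT car_id FROM rating WHERE username = 'A' AND type_name = 'design')
    ORDER BY id""")]

# Approach 2, approximately as exclude() treats a multi-valued relation:
# each condition gets its own subquery, so they may match different ratings.
approach2 = [row[0] for row in conn.execute("""
    SELECT id FROM car WHERE NOT (
        id IN (SELECT car_id FROM rating WHERE username = 'A')
        AND id IN (SELECT car_id FROM rating WHERE type_name = 'design'))
    ORDER BY id""")]

print(approach1)  # [2, 3]
print(approach2)  # [3]
```

Car 2 disappears from the second result even though none of its ratings is a "design" rating by user "A".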
I have a QuerySet, let's call it qs, which is ordered by some attribute which is irrelevant to this problem. Then I have an object, let's call it obj. Now I'd like to know what index obj has in qs, as efficiently as possible. I know that I could use .index() from Python or possibly loop through qs comparing each object to obj, but what is the best way to go about doing this? I'm looking for high performance and that's my only criterion.
Using Python 2.6.2 with Django 1.0.2 on Windows.
If you're already iterating over the queryset and just want to know the index of the element you're currently on, the compact and probably the most efficient solution is:
for index, item in enumerate(your_queryset):
    ...
However, don't use this if you have a queryset and an object obtained by some unrelated means, and want to learn the position of this object in the queryset (if it's even there).
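If you just want to know where your object sits amongst all others (e.g. when determining rank), you can do it quickly by counting the objects before it:
index = MyModel.objects.filter(sortField__lt=myObject.sortField).count()
This gives a 0-based index, assuming qs is ordered ascending by sortField and no other object shares myObject's sortField value.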
Assuming for the purpose of illustration that your models are standard with a primary key id, then evaluating
list(qs.values_list('id', flat=True)).index(obj.id)
will find the index of obj in qs. While the use of list evaluates the queryset, it evaluates not the original queryset but a derived queryset. This evaluation runs a SQL query to get the id fields only, not wasting time fetching other fields.
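Both ideas can be tried out without Django. In this sketch a sqlite3 table stands in for MyModel; the table, the column names and the data are hypothetical. index_by_count mirrors the filter(...).count() answer and index_by_ids mirrors the values_list answer; they agree when the ordering is ascending on sort_field and the values are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany("INSERT INTO mymodel (id, sort_field) VALUES (?, ?)",
                 [(1, 40), (2, 10), (3, 30), (4, 20)])

def index_by_count(conn, obj_sort_value):
    # Like filter(sortField__lt=...).count(): the number of rows that sort
    # before the object is its 0-based position. Only one integer is returned.
    (n,) = conn.execute("SELECT COUNT(*) FROM mymodel WHERE sort_field < ?",
                        (obj_sort_value,)).fetchone()
    return n

def index_by_ids(conn, obj_id):
    # Like list(qs.values_list('id', flat=True)).index(obj.id): fetch only
    # the ids, in order, and search the resulting Python list.
    ids = [row[0] for row in conn.execute("SELECT id FROM mymodel ORDER BY sort_field")]
    return ids.index(obj_id)

print(index_by_count(conn, 30), index_by_ids(conn, 3))  # 2 2
```

The count version transfers a single number, while the id-list version transfers every id but needs no assumptions about ties.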
QuerySets in Django are lazy iterables, not lists (for further details, see the Django documentation on QuerySets).
As such, there is no shortcut to get the index of an element, and I think a plain iteration is the best way to do it.
To start with, I would implement your requirement in the simplest way possible (e.g. by iterating); if you really have performance issues, then I would try a different approach, like building a queryset that fetches fewer fields.
In any case, the idea is to leave such tricks as late as possible, when you definitely know you need them.
Update: You may want to use an SQL statement directly to get the row number (something like …). However, Django's ORM (at least in the 1.0 release you are using) does not support this natively, so you have to use a raw SQL query (see the documentation). I think this could be the best option, but again, only if you actually see a performance issue.
There is a simple Pythonic way to find the index of an element in a queryset:
(*qs,).index(instance)
This unpacks the queryset into a tuple, then uses Python's built-in index method to find its position. Note that it evaluates the whole queryset, and the unpacking syntax needs Python 3.5 or later.
You can do this using queryset.extra(…) and some raw SQL like so:
from django.db import connection
queryset = queryset.order_by("id")
record500 = queryset[500]
numbered_qs = queryset.extra(select={
    'queryset_row_number': 'ROW_NUMBER() OVER (ORDER BY "id")'
})
cursor = connection.cursor()
cursor.execute(
    "WITH OrderedQueryset AS (" + str(numbered_qs.query) + ") "
    "SELECT queryset_row_number FROM OrderedQueryset WHERE id = %s",
    [record500.id]
)
index = cursor.fetchall()[0][0]
assert index == 501  # ROW_NUMBER() is 1-indexed, not 0-indexed
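The same ROW_NUMBER() plus CTE idea can be checked in plain SQL. This sketch uses Python's sqlite3 module (window functions need SQLite 3.25 or later); the record table and its ids are made up, and the target id is passed as a query parameter rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
# Non-contiguous ids (10, 20, ..., 100), inserted in reverse order.
conn.executemany("INSERT INTO record (id, name) VALUES (?, ?)",
                 [(i * 10, f"r{i}") for i in range(10, 0, -1)])

def row_number_of(conn, record_id):
    # Number the ordered rows inside a CTE, then pick out the wanted one.
    # ROW_NUMBER() starts at 1; returns None if the id is not present.
    row = conn.execute(
        """
        WITH ordered AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM record
        )
        SELECT rn FROM ordered WHERE id = ?
        """,
        (record_id,),
    ).fetchone()
    return row[0] if row else None

print(row_number_of(conn, 30))  # 3
```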
Anyone know why this query_set doesn't return any values for me? Using filter separately, it works perfectly, so it seems .filter().filter() together is the wrong approach to filter for 'either or'.
ticket_query = request.event.tickets.filter(status='on-sale').filter(status='paused').prefetch_related('ticket_tax')
filter() with multiple parameters joins them with AND statements:
https://docs.djangoproject.com/en/2.0/ref/models/querysets/#filter
To perform OR queries in django you can use Q objects:
from django.db.models import Q
ticket_query = request.event.tickets.filter(Q(status='on-sale') | Q(status='paused')).prefetch_related('ticket_tax')
More details here:
https://docs.djangoproject.com/en/2.0/topics/db/queries/#complex-lookups-with-q
request.event.tickets.filter(status='on-sale') returns all objects with status='on-sale', and you are looking for objects with status='paused' in that list, which is why you are getting an empty queryset.
Chaining 2 filters is fine if it is a ManyToManyField, where you can have multiple objects for the same field, OR if you are chaining on 2 separate fields, e.g. request.event.tickets.filter(status='on-sale').filter(is_available=True), which returns all tickets that are on sale and are available.
The simplest approach for this problem would be to use __in in the filter like this: ticket_query = request.event.tickets.filter(status__in=['on-sale', 'paused']).prefetch_related('ticket_tax').
Hope this helps. :)
Chaining filter like you have done applies the subsequent filter to the queryset returned by the previous filter, i.e. it acts like an AND condition, not OR.
You can use the | operator to combine two queries:
ticket_query = request.event.tickets.filter(status='on-sale') | request.event.tickets.filter(status='paused')
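The difference between these answers is easiest to see in the SQL they roughly correspond to. This sketch uses Python's sqlite3 module with an invented ticket table; it approximates, rather than reproduces, the queries Django builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO ticket (id, status) VALUES (?, ?)",
                 [(1, "on-sale"), (2, "paused"), (3, "sold-out")])

def ids(sql, params=()):
    return [row[0] for row in conn.execute(sql, params)]

# .filter(status='on-sale').filter(status='paused'):
# one row cannot have two statuses at once, so nothing matches.
chained = ids("SELECT id FROM ticket WHERE status = ? AND status = ? ORDER BY id",
              ("on-sale", "paused"))
# Q(status='on-sale') | Q(status='paused'), or combining two querysets with |
either = ids("SELECT id FROM ticket WHERE status = ? OR status = ? ORDER BY id",
             ("on-sale", "paused"))
# status__in=['on-sale', 'paused']
in_list = ids("SELECT id FROM ticket WHERE status IN (?, ?) ORDER BY id",
              ("on-sale", "paused"))

print(chained, either, in_list)  # [] [1, 2] [1, 2]
```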
I need to develop a query to find MF001317-077944-01 in the database, but the string provided (which I must use to search) is without the dashes.
So I am currently using:
select * from sims where replace(pack, "-", "") = "MF00131707794401";
sqlAlchemy equivalent:
s.query(Sims).filter(func.replace(Sims.pack, "-", "") == "MF00131707794401").all()
But it is taking too long: on average 1 min 22 s, and I need to get it well under 1 second.
I have considered using wildcards, but I do not know if that is the best way of approaching my problem.
Is there a way to optimize the replace query?
Or is there a better way of achieving what I want, i.e. manipulating the string in Python to get MF001317-077944-01?
Oh, I should also mention that the format is not always the same; for example, two other pack numbers might be XAN002-026-001 or CK10000579-020-3.
Any help would be appreciated :).
You must find a way to avoid a table scan.
Several Options:
1) Create an index on your "pack" column and insert the "-" into the search string before querying. This only works when you know where to put the dashes (e.g. when they are always at the same positions). This is the easiest way.
2) Create an additional column "pack_search" and fill it with replace(pack, "-", ""). Create INSERT and UPDATE triggers to keep its value current when rows are inserted or updated. Create an index on that column and use that column in your query.
3) Nicer: create a view on the table with a modified pack column, and index it. Whether this is possible depends on the database: SQL Server supports indexed views, PostgreSQL only lets you index a materialized view, and MySQL supports neither. Use that view for your query. A materialized view is a good fit if the table is read much more often than it is written, or if some lag in the query results is acceptable (e.g. if the table is updated nightly and you query it from an online service).
4) Use a functional (expression) index on replace(pack, "-", ""), if your database supports one (PostgreSQL does, and MySQL does from 8.0.13). The query must then use exactly the same expression.
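Option 4 can be sketched with SQLite, which supports indexes on expressions from version 3.9; it stands in for MySQL here, so syntax and planner behaviour on your server will differ (MySQL's closest equivalents are functional key parts from 8.0.13, or an indexed generated column). The rows are the example pack numbers from the question, and the index name is made up. EXPLAIN QUERY PLAN is used to check that the lookup goes through the index instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sims (id INTEGER PRIMARY KEY, pack TEXT)")
conn.executemany("INSERT INTO sims (pack) VALUES (?)",
                 [("MF001317-077944-01",), ("XAN002-026-001",), ("CK10000579-020-3",)])

# Index the expression itself. The WHERE clause must repeat the exact same
# expression for the planner to consider this index.
conn.execute("CREATE INDEX sims_pack_nodash ON sims (replace(pack, '-', ''))")

query = "SELECT pack FROM sims WHERE replace(pack, '-', '') = ?"
rows = conn.execute(query, ("MF00131707794401",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("MF00131707794401",)).fetchall()

print(rows)                        # [('MF001317-077944-01',)]
print([row[-1] for row in plan])   # should mention sims_pack_nodash
```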
In my case, I have a number of column names coming from a form. I want to filter to make sure they're all true. Here's how I currently do it:
for op in self.cleaned_data['options']:
    cars = cars.filter((op, True))
Now it works, but there are up to ~40 possible columns to test, so it doesn't seem very efficient to keep querying.
Is there a way I can condense this into one filter query?
Build the query as a dictionary and use the ** operator to unpack the options as keyword arguments to the filter method.
op_kwargs = {}
for op in self.cleaned_data['options']:
    op_kwargs[op] = True
cars = CarModel.objects.filter(**op_kwargs)
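Here is what the ** unpacking does, using a hypothetical in-memory filter_rows() helper in place of QuerySet.filter() (it is not part of Django) and made-up boolean columns. Building the keyword arguments with a dict comprehension and unpacking them once gives the same rows as chaining one call per option.

```python
def filter_rows(rows, **conditions):
    # Hypothetical stand-in for QuerySet.filter(): a row is kept only if
    # every keyword argument matches (AND semantics).
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in conditions.items())
    ]

cars = [
    {"id": 1, "sunroof": True, "heated_seats": True, "towbar": False},
    {"id": 2, "sunroof": True, "heated_seats": False, "towbar": True},
    {"id": 3, "sunroof": True, "heated_seats": True, "towbar": True},
]
options = ["sunroof", "heated_seats"]  # e.g. self.cleaned_data['options']

# One call: build the keyword arguments as a dict, then unpack them with **.
op_kwargs = {op: True for op in options}
single = filter_rows(cars, **op_kwargs)

# One call per option, as in the question.
chained = cars
for op in options:
    chained = filter_rows(chained, **{op: True})

print([car["id"] for car in single])  # [1, 3]
```

In Django itself, chained filter() calls on the model's own fields are combined into a single WHERE clause joined with AND, so both forms end up as one SQL query when the QuerySet is evaluated.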
This is covered in the django documentation and has been covered on SO as well.
Django's QuerySets are lazy, so what you're currently doing is actually pretty efficient. The database won't be hit until the QuerySet is evaluated (for example, when you iterate over it or access its results), assuming, that is, that you didn't edit out some code and it is effectively like this:
cars = CarModel.objects.all()
for op in self.cleaned_data['options']:
    cars = cars.filter((op, True))
More information is in the Django documentation on when QuerySets are evaluated.