Django queryset and GROUP BY - python

I'm struggling with Django querysets and GROUP BY queries. I know there are plenty of similar questions, but I really don't get it :/
I would like to be able to create a request similar to this one (SQLite):
SELECT MAX(fb_game_score.value), fb_game_fbuser.first_name, fb_game_fbuser.last_name
FROM fb_game_score
JOIN fb_game_game ON (fb_game_score.game_id = fb_game_game.id)
JOIN fb_game_fbuser ON (fb_game_game.user_id = fb_game_fbuser.id)
GROUP BY fb_game_fbuser.fb_user_id;
The query is quite simple: it lists the users' scores, showing only the best score for each player.
For clarification, here are the model classes:
class FBUser(AbstractUser):
    fb_user_id = models.CharField(max_length=100, null=True)
    oauth_token = models.CharField(max_length=1024, null=True)
    expires = models.IntegerField(null=True)
    highest_score = models.IntegerField(null=True)

class Game(models.Model):
    identifier = models.CharField(max_length=100, db_index=True)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='games')

class Score(models.Model):
    game = models.ForeignKey(Game, related_name='scores')
    value = models.IntegerField()
    date = models.DateTimeField(auto_now=True)
    timestamp = models.FloatField(default=0)
    inter = models.BooleanField(default=False)

There's no high-level group_by in the queryset. It's used internally by calls to aggregate and annotate, but it is not exposed to you directly.
There's a low-level API which is not documented at all. You can get an internal query description:
queryset = ... #whatever query you'd want to group by
query = queryset.query
and then you can alter the group_by member (which is a list) by adding a field which you'd want to group by:
query.group_by.append('a_field')
But:
you have to seriously know what you're doing.
there's no guarantee of stability of this API.
The current alternative is to fall back to a raw SQL query (using the django.db.connection.* methods).
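As a rough illustration of the raw SQL fallback, the sketch below runs the question's GROUP BY query against an in-memory SQLite database using Python's standard sqlite3 module instead of Django's connection. The table layouts are trimmed-down assumptions based on the models in the question, the sample rows are invented, and the ORDER BY is added only to make the result order predictable.

```python
import sqlite3

# Simplified stand-ins for the tables Django would create (assumed layout).
SCHEMA = """
CREATE TABLE fb_game_fbuser (
    id INTEGER PRIMARY KEY, fb_user_id TEXT, first_name TEXT, last_name TEXT
);
CREATE TABLE fb_game_game (
    id INTEGER PRIMARY KEY, identifier TEXT, user_id INTEGER
);
CREATE TABLE fb_game_score (
    id INTEGER PRIMARY KEY, game_id INTEGER, value INTEGER
);
"""

# The query from the question, plus an ORDER BY for a stable result order.
BEST_SCORES_SQL = """
SELECT MAX(fb_game_score.value), fb_game_fbuser.first_name, fb_game_fbuser.last_name
FROM fb_game_score
JOIN fb_game_game ON (fb_game_score.game_id = fb_game_game.id)
JOIN fb_game_fbuser ON (fb_game_game.user_id = fb_game_fbuser.id)
GROUP BY fb_game_fbuser.fb_user_id
ORDER BY fb_game_fbuser.fb_user_id;
"""


def build_demo_db():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fb_game_fbuser VALUES (?, ?, ?, ?)",
        [(1, "fb-1", "Ada", "Lovelace"), (2, "fb-2", "Alan", "Turing")],
    )
    conn.executemany(
        "INSERT INTO fb_game_game VALUES (?, ?, ?)",
        [(1, "g1", 1), (2, "g2", 1), (3, "g3", 2)],
    )
    conn.executemany(
        "INSERT INTO fb_game_score (game_id, value) VALUES (?, ?)",
        [(1, 10), (2, 42), (1, 17), (3, 7), (3, 30)],
    )
    return conn


def best_scores(conn):
    """Return one (best_score, first_name, last_name) row per user."""
    return conn.execute(BEST_SCORES_SQL).fetchall()
```

Inside a Django project the same SQL string could be executed through a cursor obtained from django.db.connection.cursor(). It may also be worth trying values() followed by annotate(), which Django documents as producing a GROUP BY; something like Score.objects.values('game__user__fb_user_id').annotate(best=Max('value')) is an untested guess at how that would look for these models.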
Edit: I just saw this 3rd-party application which could help you with reports. I don't know if you can use in-code reports, or you have to limit yourself to in-view reports (i.e.: don't know if you can process reports in code or just have them as final results).

Related

Error inserting data into Django database field with a OneToOnefield

I've asked this question before and tried to Google it but I've had no luck, so I have simplified my question. I have two very simple models: one holds some shift numbers and the other holds some data related to the sale of gift cards during a shift. In this case, we have an employee who worked shift "1234" and sold $200.45 and $43.67 worth of gift cards from two terminals, respectively. The models are below:
class Data_Shifts(models.Model):
    shift_id = models.CharField(max_length=25, primary_key=True, db_column="shift_id", verbose_name="Shift ID")

    def __str__(self):
        return str(self.shift_id)

class Data_GiftCards(models.Model):
    shift_id = models.OneToOneField('Data_Shifts', on_delete=models.CASCADE, primary_key=True, db_column="shift_id", verbose_name="Shift ID")
    net_sales_terminal_1 = models.DecimalField(max_digits=8, decimal_places=2, default=0)
    net_sales_terminal_2 = models.DecimalField(max_digits=8, decimal_places=2, default=0)

    def __str__(self):
        return str(self.shift_id)
I then try to insert some test data into the table using the following command:
Data_GiftCards.objects.create(shift_id="1234", net_sales_terminal_1="200.45", net_sales_terminal_2="43.67")
Upon submitting the web form, I get the following error:
Cannot assign "'1234'": "Data_GiftCards.shift_id" must be a "Data_Shifts" instance.
I am boggled by this. I have a workaround that bypasses django and inserts directly into the table successfully, but this is dirty and I'd prefer to use the proper Pythonic Django way. What am I doing wrong here?
Many thanks in advance.
That is because you name your field shift_id. Django's ORM maps the name of the field in your model to an instance of the related model, not to an ID.
You can still work with IDs instead of instances, but then you have to add _id to the end of your field name.
In your case, you have two options. You can simply do that, which means your query would look like:
Data_GiftCards.objects.create(shift_id_id="1234", net_sales_terminal_1=200.45, net_sales_terminal_2=43.67)
But shift_id_id looks redundant, so you can tweak the other end and remove the _id suffix in your model:
class Data_GiftCards(models.Model):
    shift = models.OneToOneField('Data_Shifts', on_delete=models.CASCADE, primary_key=True, db_column="shift_id", verbose_name="Shift ID")
    net_sales_terminal_1 = models.DecimalField(max_digits=8, decimal_places=2, default=0)
    net_sales_terminal_2 = models.DecimalField(max_digits=8, decimal_places=2, default=0)

    def __str__(self):
        return str(self.shift)
Then you will have to query as you are doing, but you should not use strings if the field types are numeric.
Data_GiftCards.objects.create(shift_id="1234", net_sales_terminal_1=200.45, net_sales_terminal_2=43.67)
Also, you don't need the attribute db_column="shift_id". If your field name is shift, the name of the field in the database table will already be shift_id.
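A side note on the numeric values: both sales fields are DecimalField, so one way to avoid strings and floats alike is to pass Decimal objects. The sketch below uses only Python's standard decimal module and does not touch Django; it is meant to show why a Decimal built from a float literal is not the same value as the one written in the source, not to prescribe how the create() call must look.

```python
from decimal import Decimal

# Built from strings: the values are exactly what was written.
terminal_1 = Decimal("200.45")
terminal_2 = Decimal("43.67")
exact_total = terminal_1 + terminal_2

# Built from a float: the binary rounding error of 200.45 is carried along.
from_float = Decimal(200.45)

print(exact_total)               # exact decimal sum of the two string values
print(from_float == terminal_1)  # False, because 200.45 has no exact binary form
```

Under that assumption the call could pass net_sales_terminal_1=Decimal("200.45") and net_sales_terminal_2=Decimal("43.67").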

Django translating MySQL query to Django Query

How can I get the equivalent of this MySQL script in Django views? The depart_city, arrive_city, and travel_date will be entered by the user.
Here are my models:
class Driver(models.Model):
    first_name = models.CharField(max_length=30, null=True, blank=False)
    last_name = models.CharField(max_length=30, null=True, blank=False)

class Schedule(models.Model):
    depart_time = models.TimeField()
    arrive_time = models.TimeField()

class TravelDate(models.Model):
    start_date = models.DateField(null=True)
    interval = models.IntegerField(null=True)

class Route(models.Model):
    depart_city = models.CharField(max_length=50, null=True, blank=False)
    arrive_city = models.CharField(max_length=50, null=True, blank=False)
    driver = models.ForeignKey(Driver)
    schedule = models.ForeignKey(Schedule)
    traveldate = models.ForeignKey(TravelDate)
Here is my MySQL script. It works when I run it in MySQL Workbench, but I'm not sure how to translate it to a Django query:
SELECT busapp_route.depart_city, busapp_route.arrive_city, busapp_driver.first_name, busapp_schedule.depart_time
FROM (((busapp_route INNER JOIN busapp_driver ON busapp_route.driver_id = busapp_driver.id)
INNER JOIN busapp_schedule ON busapp_route.schedule_id = busapp_schedule.id)
INNER JOIN busapp_traveldate ON busapp_route.traveldate_id = busapp_traveldate.id)
WHERE busapp_route.depart_city='Tropoje' AND busapp_route.arrive_city='Tirane'
AND (DATEDIFF('2017-11-26', busapp_traveldate.start_date) % busapp_traveldate.interval = 0);
Django doesn't support MySQL DATEDIFF natively. As a workaround, you could use something like this:
from django.db.models.expressions import RawSQL
routes = Route.objects\
    .values('depart_city', 'arrive_city', 'driver__first_name', 'schedule__depart_time', 'traveldate__start_date')\
    .annotate(datediff_mod=RawSQL("DATEDIFF(%s, busapp_traveldate.start_date) MOD busapp_traveldate.interval", ('2017-11-26', )))\
    .filter(depart_city='Tropoje', arrive_city='Tirane', datediff_mod=0)
I don't use MySQL so I couldn't test this, but I'm pretty sure it should work or maybe at least give you an idea how to implement it.
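To make the condition itself easier to reason about, here is a small plain-Python sketch of the same date arithmetic: the number of days between the requested travel date and start_date must be a multiple of interval. The function name runs_on is made up for illustration, and returning False for a missing start_date or a missing or zero interval is an assumption that mirrors how the SQL comparison would not match in those cases.

```python
from datetime import date


def runs_on(travel_date, start_date, interval):
    """Return True if travel_date falls on the repeating schedule.

    Mirrors DATEDIFF(travel_date, start_date) % interval = 0.
    """
    # A NULL start_date, or a NULL or zero interval, never matches in SQL.
    if start_date is None or not interval:
        return False
    return (travel_date - start_date).days % interval == 0


# Hypothetical values: a route that started on 2017-11-20.
print(runs_on(date(2017, 11, 26), date(2017, 11, 20), 3))  # 6 days apart, every 3 days
print(runs_on(date(2017, 11, 26), date(2017, 11, 20), 4))  # 6 days apart, every 4 days
```

A check like this could also be applied in Python to an already filtered, small result set if keeping raw SQL out of the query is preferred, at the cost of fetching more rows.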
I don't think it's a good idea to try to solve this from the SQL side.
What you're building there is not trivial, as there are a lot of things to keep in mind.
Your current model setup falls short on a few points:
What if a user searches for a trip next year? With your model you'll have to create entries for all possible dates in the future, which is nearly unusable.
Also reconsider splitting times and dates across different models; they should usually be kept together, because a ride from A to B on date x at time y is one object.
It might help to have a look at django-scheduler; they solved some of these problems with the use of Events and Occurrences. Have a look at their pretty good docs.

Using annotate or extra to add field of foreignkey to queryset ? (equivalent of SQL "AS" ?)

I have merged two querysets (qs1 and qs2) which individually work fine, as follows:
qlist = [qs1, qs2]
results = list(chain(qs1, qs2))
So far, so good - the above works. But now I'm trying to order the results using the following:
qlist = [qs1, qs2]
results = sorted(chain(qs1, qs2), key=attrgetter('monthly_fee'))
The problem is that the second queryset (qs2) refers to the monthly_fee through a ForeignKey; whereas qs1 has 'monthly_fee' available. Here is qs2:
qs2 = Offer.objects.select_related('subscription')
qs2 = qs2.order_by('subscription__monthly_fee')
And the simplified models:
class Subscription(models.Model):
    monthly_fee = models.IntegerField(null=False, blank=True, default=0)
    name = models.CharField(max_length=120, null=True, blank=True)

class Offer(models.Model):
    promotion_name = models.CharField(max_length=120, null=True, blank=True)
    subscription = models.ForeignKey(Subscription)
    discount = models.IntegerField(null=False, blank=True, default=0)
I've tried using .annotate() and .extra() to rename the subscription__monthly_fee in the query qs2 as follows:
qs2 = Offer.objects.select_related('subscription').annotate(monthly_fee=subscription__monthly_fee)
But then get the error
global name 'subscription__monthly_fee' is not defined
I am at the point of just hacking this by overriding the .save() methods of my models to manually add the monthly_fee to each Offer instance whenever an object is created. But I wanted to check whether there is a better way.
Thank you,
Michael
I've used an F expression to achieve this sort of renaming before. Try this:
from django.db.models import F
qs2 = Offer.objects.select_related('subscription').annotate(monthly_fee=F('subscription__monthly_fee'))
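If changing the queries is not an option, a different approach (not the annotate technique above) is to let the sort key resolve the fee itself. The sketch below uses plain dataclasses as stand-ins for the two models so it runs without Django; it assumes qs1 yields Subscription-like objects with their own monthly_fee, and the helper names monthly_fee_of and merge_by_fee are invented for illustration.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class Subscription:  # stand-in for the Subscription model
    name: str
    monthly_fee: int


@dataclass
class Offer:  # stand-in for the Offer model
    promotion_name: str
    subscription: Subscription


def monthly_fee_of(obj):
    """Use obj.monthly_fee if it exists, else obj.subscription.monthly_fee."""
    fee = getattr(obj, "monthly_fee", None)
    if fee is None:
        fee = obj.subscription.monthly_fee
    return fee


def merge_by_fee(*iterables):
    """Chain several iterables and sort the combined items by monthly fee."""
    return sorted(chain(*iterables), key=monthly_fee_of)
```

With real querysets this would be called as merge_by_fee(qs1, qs2); keeping select_related('subscription') on qs2 avoids one extra query per Offer when the key function follows the foreign key.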
OK, I found a way to do this.
qs2 = Offer.objects.select_related('subscription').extra(select={'monthly_fee':'mobile_subscription.monthly_fee'})
where 'mobile' is the name of the Django app. I didn't realize that .extra DOES allow you to follow foreign keys, but you have to specify the actual database table and use SQL dot notation.
Is the above actually the correct way to do it (i.e. dropping in raw SQL table names/fields)?
I had been trying to use Django syntax such as .extra(select={'monthly_fee':'subscription__monthly_fee'}), which doesn't work!

Most efficient Django query to return results spanning multiple tables

I am trying to do a pretty complex query in Django in the most efficient way and I am not sure how to get started. I have these models (this is a simplified version)
class Status(models.Model):
    status = models.CharField(max_length=200)

class User(models.Model):
    name = models.CharField(max_length=200)

class Event(models.Model):
    user = models.ForeignKey(User)

class EventItem(models.Model):
    event = models.ForeignKey(Event)
    rev1 = models.ForeignKey(Status, related_name='rev1', blank=True, null=True)
    rev2 = models.ForeignKey(Status, related_name='rev2', blank=True, null=True)
    active = models.BooleanField()
I want to create a query that returns a list of Users that have the most events in which all their dependent EventItems have rev1 and rev2 that are not blank or null and active = True.
I know I could do this by iterating through the list of users and then checking all their events for the matching rev1, rev2, and active criteria and then return those events, but this is heavy on the database. Any suggestions?
Thanks!
Your model is broken, but this should sum up what you were doing in a cleaner way.
class Status(models.Model):
    status = models.CharField(max_length=200)

class User(models.Model):
    name = models.CharField(max_length=200)
    events = models.ManyToManyField('Event')

class Event(models.Model):
    rev1 = models.ForeignKey(Status, related_name='rev1', blank=True, null=True)
    rev2 = models.ForeignKey(Status, related_name='rev2', blank=True, null=True)
    active = models.BooleanField()
And the query
from django.db.models import Count, Q
User.objects.filter(events__active=True).exclude(Q(events__rev1=None)|Q(events__rev2=None)).annotate(num_events=Count('events')).order_by('-num_events')
This will return a list of users, sorted by the number of events in their set.
For more information check out Many-To-Many fields.
I want to create a query that will result in a list of Users that have the most events in which all their dependent EventItems have rev1 and rev2 are not blank or null and active = True.
First, you want Event objects which always have this type of EventItem.
events = Event.objects.filter(active=True)
events = events.exclude(eventitem__rev1__isnull=True)
events = events.exclude(eventitem__rev1='')
events = events.exclude(eventitem__rev2__isnull=True)
events = events.exclude(eventitem__rev2='')
Also, you didn't specify if you wanted to deal with Event objects that have no EventItem. You can filter those out with:
events = events.exclude(eventitem__isnull=True)
Note that events may contain plenty of duplicates. You can throw in an events.distinct() if you like, but should only do that if you need it human-readable.
Once you have those, you can now extract the User objects that you want:
users = User.objects.filter(event__in=events)
Note that on certain database backends, *ahem* MySQL *ahem*, you may find that the .filter(field__in=QuerySet) pattern is really slow. For that case, the code should be:
users = User.objects.filter(event__in=list(events.values_list('pk', flat=True)))
You may then order things by the number of Event objects attached:
from django.db.models import Count
active_users = users.annotate(num_events=Count('event')).order_by('-num_events')
You can try something like:
EventItem.objects.exclude(rev1=None).exclude(rev2=None).filter(active=True).values_list('event__user', flat=True)
That will give you a flat list of user ids, where the frequency of each id is how many EventItem objects that user has.
You may be able to do better, and integrate this into a query using .annotate(), but I'm not sure how to right now.
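Turning that flat list into a ranking can be done in Python with collections.Counter. This is only a sketch of the counting step: the ids below are invented, and rank_users is a hypothetical helper name, not part of Django.

```python
from collections import Counter


def rank_users(user_ids):
    """Return (user_id, item_count) pairs, most frequent id first."""
    return Counter(user_ids).most_common()


# Made-up ids standing in for the values_list('event__user', flat=True) result.
sample_ids = [3, 1, 3, 2, 3, 1]
print(rank_users(sample_ids))
```

Note that this counts EventItem objects per user rather than events, and it does the counting in Python after fetching every id, so it is best suited to modest result sizes.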

Django: follow relations backwards

Hey, I have models like this:
class Galleries(models.Model):
    creation_date = models.DateTimeField()
    name = models.CharField(max_length=255, unique=True)
    gallery_type = models.ForeignKey(Categories)

class Categories(models.Model):
    handle = models.CharField(max_length=255, unique=True)

class Values(models.Model):
    category = models.ForeignKey(Categories)
    language = models.CharField(max_length=7)
    category_name = models.CharField(max_length=50)
And now I just want to reach the values of categories starting from Galleries. For example: galleries = Galleries.objects.get(id=1). Now I want to somehow reach the values using this "galleries" object. Getting only the values for a specific language would be even better. I lack experience with the Django ORM, so if you can, please point me to some docs or give a code example. Thanks!
galleries = Galleries.objects.get(id=1)
values = galleries.gallery_type.values_set.filter(language='language')
Interestingly, you used the exact wording that the docs use to refer to the related field lookups. I always found the definition strange to the gut, maybe because they put it in quotes.
FOLLOWING RELATIONSHIPS "BACKWARD"
http://docs.djangoproject.com/en/1.2/topics/db/queries/#following-relationships-backward
You may want to use the select_related method of objects so you reduce the number of queries you are making.
gallery = Galleries.objects.select_related().get(id=1)
You can set a related name for the Values model in the category fk:
class Values(models.Model):
    category = models.ForeignKey(Categories, related_name="categories")
    language = models.CharField(max_length=7)
    category_name = models.CharField(max_length=50)
Now you can get your list of values for a specific language by doing:
values = gallery.gallery_type.categories.filter(language="language")
