Queryset API distinct() does not work?


class Message(models.Model):
    subject = models.CharField(max_length=100)
    pub_date = models.DateTimeField(default=datetime.now())
class Topic(models.Model):
    title = models.CharField(max_length=100)
    message = models.ManyToManyField(Message, verbose_name='Discussion')
I want to order all the topics by the latest message object attached to each topic.
I executed this query, but it does not return a distinct queryset: topics with several messages appear more than once.
>> Topic.objects.order_by('-message__pub_date').distinct()

You don't need distinct() here, what you need is aggregation. This query will do what you want:
from django.db.models import Max
Topic.objects.annotate(Max('message__pub_date')).order_by('-message__pub_date__max')
Though if this is production code, you'll probably want to follow akaihola's advice and denormalize "last_message_posted" onto the Topic model directly.
Also, there's an error in your default value for Message.pub_date. As you have it now, whenever you first run the server and this code is loaded, datetime.now() will be executed once and that value will be used as the pub_date for all Messages. Use this instead to pass the callable itself so it isn't called until each Message is created:
pub_date = models.DateTimeField(default=datetime.now)
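To make the difference between default=datetime.now() and default=datetime.now concrete, here is a minimal plain-Python sketch. The Field class and fake_now helper are hypothetical stand-ins written for this illustration (they are not Django code); the sketch only assumes that a field calls its default when the default is callable, which is the behaviour the answer describes.

```python
import itertools
from datetime import datetime

# Deterministic stand-in for datetime.now, so the difference is visible
# without waiting for the clock to move (hypothetical helper).
_ticks = itertools.count(1)


def fake_now():
    return datetime(2024, 1, 1, 0, 0, next(_ticks))


class Field:
    """Tiny stand-in for a model field that stores a default."""

    def __init__(self, default):
        self.default = default

    def get_default(self):
        # Call the default if it is callable, otherwise return it as-is.
        if callable(self.default):
            return self.default()
        return self.default


frozen = Field(default=fake_now())  # evaluated once, when this line runs
fresh = Field(default=fake_now)     # evaluated on every get_default() call

frozen_values = [frozen.get_default() for _ in range(3)]
fresh_values = [fresh.get_default() for _ in range(3)]
```

Every entry in frozen_values is the same timestamp, while fresh_values holds three different ones, which mirrors what happens to pub_date across newly created Message objects.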

You'll find the explanation in the documentation for .distinct().
I would de-normalize by adding a modified_date field to the Topic model and updating it whenever a Message is saved or deleted.
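The explanation in the documentation is that any field used in order_by() is included in the SQL SELECT columns, so DISTINCT compares (topic, pub_date) pairs rather than topics alone. The sketch below reproduces that effect with the standard-library sqlite3 module. The table and column names are simplified assumptions, not the exact schema or SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE message (id INTEGER PRIMARY KEY, pub_date TEXT);
    CREATE TABLE topic_message (topic_id INTEGER, message_id INTEGER);
    INSERT INTO topic VALUES (1, 'A'), (2, 'B');
    INSERT INTO message VALUES (1, '2024-01-01'), (2, '2024-01-05'), (3, '2024-01-03');
    INSERT INTO topic_message VALUES (1, 1), (1, 2), (2, 3);
""")

JOINS = """
    FROM topic t
    LEFT JOIN topic_message tm ON tm.topic_id = t.id
    LEFT JOIN message m ON m.id = tm.message_id
"""

# Roughly what order_by('-message__pub_date').distinct() boils down to:
# the ordering column is part of each row, so rows for one topic differ.
with_distinct = conn.execute(
    "SELECT DISTINCT t.id, t.title, m.pub_date" + JOINS + "ORDER BY m.pub_date DESC"
).fetchall()

# Roughly what annotate(Max('message__pub_date')) boils down to:
# one row per topic, carrying only its newest message date.
with_max = conn.execute(
    "SELECT t.id, t.title, MAX(m.pub_date) AS latest" + JOINS
    + "GROUP BY t.id, t.title ORDER BY latest DESC"
).fetchall()

distinct_ids = [row[0] for row in with_distinct]
max_ids = [row[0] for row in with_max]
```

Here distinct_ids contains topic 1 twice (once per message), while max_ids lists each topic once, ordered by its most recent message.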

Related

queryset value for SlugRelatedField when unique_together applies in django-rest

I'm building a simple API for an ESP8266 to connect to in an IoT application, passing a JSON string. In this application there are multiple Monitors (internet connected devices) per Site (location/address), and multiple LogEntries per Site/Monitor.
The API was originally setup with an endpoint like:
/api/logentries/
Posting a JSON string like:
{"site":"abcd","monitor":"xyz","data_point":"value"}
In the object model, Monitor is a child of Site, but for convenience of entry creation and reporting, the JSON format of the LogEntry posted by each device flattens this structure out, meaning that the LogEntry model also has a FK relationship for both Site and Monitor. In the code below, "textID" is the ID used within the context of the API for the Site/Monitor (e.g. PK values remain "hidden" for API callers).
In models.py:
class Site(models.Model):
    name = models.CharField(max_length=32)
    textID = models.CharField(max_length=32, blank=True, db_index=True, unique=True)
class Monitor(models.Model):
    textID = models.CharField(max_length=32)
    site = models.ForeignKey(Site, on_delete=models.CASCADE)
    class Meta:
        unique_together = ('site', 'textID')
class LogEntry(models.Model):
    site = models.ForeignKey(Site, on_delete=models.CASCADE)
    monitor = models.ForeignKey(Monitor, on_delete=models.CASCADE)
    data_point = models.CharField(max_length=8, default='')
To get this to work on a single site, I created a custom serializer:
class LogEntrySerializer(serializers.HyperlinkedModelSerializer):
    site = serializers.SlugRelatedField(slug_field='textID', queryset=Site.objects.all())
    monitor = serializers.SlugRelatedField(slug_field='textID', queryset=Monitor.objects.filter())
    class Meta:
        model = LogEntry
        fields = ('pk', 'site', 'monitor', 'data_point', )
This works for reading valid data, and saving when all monitor IDs are unique across sites.
However, if two sites have a Monitor with the same textID—e.g. "Site1/001" and "Site2/001" this breaks, as the Monitor.objects.all() results in multiple records being retrieved (which makes sense and is expected behaviour).
What I'm wanting to do is to have the second queryset (for monitor) limited to the specified site, to avoid this error.
This post almost answers my question, however it benefits from the second field value (user) being available in the request object, something that is not available in this case.
Is there a way I can retrieve the Site.pk or Site.textID for the queryset value to resolve correctly--e.g. queryset=Monitor.objects.filter(site__textID=xxx)--what would 'xxx' be? Or do I need to completely override the serializer (and not rely on SlugRelatedField)? Or some other approach that might work?
(As an aside: I recognise that this could be achieved by modifying the URL pattern to something like /api///logentries, which would then have this information available as part of the request/context and from a normalisation perspective would be better also. However this would require reflashing of a number of already deployed devices to reflect the changed API details, so I'd like to avoid such a change if possible, even though upon reflection this is probably a cleaner solution/approach long-term.)
Thanks in advance.

You'll need to write your own field class. SlugRelatedField assumes the slug alone identifies exactly one object, and that uniqueness assumption doesn't hold in your case.
This can be done by creating a custom field and overriding get_value to retrieve the site/monitor tuple from the incoming data, and to_internal_value to select the appropriate monitor.

Thanks to the pointers from Linovia, the following field class resolves the issue:
class MonitorRelatedField(serializers.Field):
    def to_representation(self, obj):
        return obj.textID
    def get_value(self, data):
        site_textID = data['site']
        monitor_textID = data['monitor']
        return ( site_textID, monitor_textID, )
    def to_internal_value(self, data):
        return Monitor.objects.get(site__textID=data[0], textID=data[1])
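For readers who want to see the two-step flow without a Django project, here is a framework-free sketch of the same idea. The MONITORS dictionary is a hypothetical in-memory stand-in for the Monitor table, and the two functions only mirror the roles of get_value and to_internal_value in the field class above; raising ValueError on a miss is an assumption added for the illustration, not part of the posted code.

```python
# Hypothetical stand-in for the Monitor table:
# (site textID, monitor textID) -> monitor primary key.
MONITORS = {
    ("Site1", "001"): 1,
    ("Site2", "001"): 2,
}


def get_value(data):
    """Pull both identifiers out of the incoming payload."""
    return (data["site"], data["monitor"])


def to_internal_value(pair):
    """Resolve the (site, monitor) pair to exactly one monitor."""
    try:
        return MONITORS[pair]
    except KeyError:
        raise ValueError("no monitor %r at site %r" % (pair[1], pair[0]))


payload = {"site": "Site2", "monitor": "001", "data_point": "value"}
monitor_pk = to_internal_value(get_value(payload))
```

Because the lookup key is the pair rather than the monitor textID alone, "Site1/001" and "Site2/001" resolve to different monitors instead of colliding.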

Changing queryset definition causes app requests to queue up and performance to dip severely in Django web app (new relic output included)

Dire performance situation: I have a web app where users leave messages and spectators upvote/downvote them. It's developed using Django. I have a ListView that populates the home page with users' latest messages.
My simple models.py contains:
class MessageVoteCountManager(models.Manager):
    def get_query_set(self):
        return super(MessageVoteCountManager, self).get_query_set().annotate(votes=Sum('vote__value'))
class Message(models.Model):
    description = models.TextField()
    submitted_by = models.ForeignKey(User) # django.contrib.auth.models user
    submitted_on = models.DateTimeField(auto_now_add = True)
    with_votes = MessageVoteCountManager()
class Vote(models.Model):
    voter = models.ForeignKey(User)
    message = models.ForeignKey(Message)
    value = models.IntegerField(default = 0)
In my views.py, I have a ListView to support the functionality I described in the first paragraph. Its basic skeleton looks like this:
class MessageListView(ListView):
    model = Message
    queryset = Message.objects.order_by('-submitted_on')[:120]
    paginate_by = 10
Recently, I changed the queryset definition above to queryset = Message.with_votes.order_by('submitted_on')[:120]. In other words, I annotated each message with votes, the sum of all votes that message instance has received. The moment this change went live, requests began queuing up and performance dipped severely (New Relic output). I reverted the commit at 6:27 AM.
My best guess: I have half a million messages in my DB, and once I change the queryset definition, requests queue up to recalculate votes for every message ever entered, so my app suffers severely degraded performance. I only want to apply this new queryset calculation to the latest 120 messages, which is why I sliced the latest 120 after ordering them by date. This clearly didn't work.
How do I make sure the new queryset definition applies only to the latest messages? Something like: (Message.objects.order_by('submitted_on')[:120]).with_votes ?
Please ask me for more information if it's needed. My ultimate goal is to get better performance here.
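One plausible reading of the slowdown, offered as an assumption rather than a diagnosis, is that the annotated query has to group and sum votes for every message before the slice is applied. At the SQL level, the alternative is to pick the latest N message ids first and aggregate only for those. The sketch below shows that shape with the standard-library sqlite3 module; the table and column names are simplified guesses at the schema, and translating it back into the ORM for a given Django version is left open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, submitted_on TEXT);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, message_id INTEGER, value INTEGER);
    INSERT INTO message VALUES
        (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03'), (4, '2024-01-04');
    INSERT INTO vote (message_id, value) VALUES
        (1, 1), (1, 1), (3, -1), (4, 1), (4, 1);
""")

# Step 1 (inner query): choose only the newest N messages.
# Step 2 (outer query): sum votes for just those rows.
LATEST_WITH_VOTES = """
    SELECT latest.id, SUM(v.value) AS votes
    FROM (
        SELECT id, submitted_on
        FROM message
        ORDER BY submitted_on DESC
        LIMIT ?
    ) AS latest
    LEFT JOIN vote v ON v.message_id = latest.id
    GROUP BY latest.id, latest.submitted_on
    ORDER BY latest.submitted_on DESC
"""

latest_two = conn.execute(LATEST_WITH_VOTES, (2,)).fetchall()
```

With this shape the aggregation touches only the votes of the selected messages, so its cost depends on N rather than on the size of the whole message table (given an index on submitted_on).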

Writing correct SQL query in extra() modifier in django queryset

I need help writing some SQL for an extra() modifier used within a Django queryset. I'm stuck on framing it elegantly, and several retries have led to dead ends. Help!
Background: I have a website where people submit interesting URLs (called Link in my models.py). Other users can post comments under each link posted (called Publicreply in my models.py).
Here's what I want to do: for every user, I want to produce a queryset that -
(i) contains only links the user commented under
(ii) is sorted by the timestamp of the most recent comment under every link. E.g. if a link has multiple comments, I want the most recent comment's timestamp to sort the said link in the queryset. Note that this comment can be from anyone (not necessarily from the user herself).
Now for some simple code, straight from my views.py:
class UserActivityView(ListView):
    model = Link
    slug_field = "username"
    template_name = "user_activity.html"
    paginate_by = 10
    def get_queryset(self):
        comments_by_user = Publicreply.objects.filter(submitter=user)
        links_with_comments = [reply.answer_to.id for reply in replies_by_user.iterator()]
        qset = Link.objects.filter(id__in=links_with_comments).extra(select={'submitted_on':"SELECT max('submitted_on') FROM 'links_publicreply' WHERE 'links_publicreply.answer_to' = 'links_link.id'"})
        return qset
And models.py contains:
class Publicreply(models.Model):
    submitter = models.ForeignKey(User)
    answer_to = models.ForeignKey(Link)
    submitted_on = models.DateTimeField(auto_now_add=True)
    description = models.TextField(validators=[MaxLengthValidator(250)])
class Link(models.Model):
    submitter = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
    url = models.URLField(max_length=250)
The SQL I wrote in the extra() modifier in my views.py under the get_queryset() method is not giving the desired result (I'm getting None). Note that I'm trying to use the extra() modifier to attach the most recent timestamp of a publicreply instance to its related link. I'll later sort the queryset according to the said timestamp.
Please help out in formulating the correct SQL in the extra() modifier. I use Django 1.5 and Python2.7. Thanks.
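A likely cause of the None result is the quoting in the SQL: single quotes delimit string literals, so 'links_publicreply.answer_to' = 'links_link.id' compares two different strings and never matches, and max() over zero rows is NULL. In addition, Django's default column name for a ForeignKey called answer_to is answer_to_id. The sketch below demonstrates both points with the standard-library sqlite3 module; it assumes default table and column names and has not been run against the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links_link (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE links_publicreply (
        id INTEGER PRIMARY KEY, answer_to_id INTEGER, submitted_on TEXT
    );
    INSERT INTO links_link VALUES (1, 'http://a.example'), (2, 'http://b.example');
    INSERT INTO links_publicreply (answer_to_id, submitted_on) VALUES
        (1, '2024-01-02'), (1, '2024-01-09'), (2, '2024-01-05');
""")

# Quoted version: the WHERE clause compares two string literals,
# so the subquery sees no rows and max() yields NULL for every link.
quoted = conn.execute("""
    SELECT id,
           (SELECT max('submitted_on') FROM links_publicreply
            WHERE 'links_publicreply.answer_to' = 'links_link.id') AS latest
    FROM links_link ORDER BY id
""").fetchall()

# Unquoted version: a correlated subquery on the real columns.
fixed = conn.execute("""
    SELECT id,
           (SELECT max(submitted_on) FROM links_publicreply
            WHERE links_publicreply.answer_to_id = links_link.id) AS latest
    FROM links_link ORDER BY latest DESC
""").fetchall()
```

The inner SELECT from the second query, without any quotes around identifiers, is the kind of expression that can be passed as the value in extra(select={...}), ideally under a key that does not clash with Link's own submitted_on field.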

Include Queryset Key in String Format

I am trying to run a system command with the value of a field from a queryset but am not having much luck. It raises "'Jobs' object is not subscriptable". Please see below for relevant information.
Models.py
class Jobs(models.Model):
    user_id = models.CharField(max_length=100)
    template = models.CharField(max_length=100)
    change_ref = models.CharField(max_length=100)
    summary = models.CharField(max_length=100)
    category = models.CharField(max_length=100)
Views
def delete_job(request, job_id):
    record = Jobs.objects.get(pk=int(job_id))
    os.system('mkdir /home/username/' + record['summary'])
    return HttpResponseRedirect("/configen")
I am passing the job_id in through the URL, which seems to work fine (I can delete the record with no problem). I was under the impression that get would simply fetch one record, which I could then reference as a dictionary?
I'm sure there is a simple solution. It doesn't seem to work with string formatting either (using %s or .format()).
Thank you in advance

You're correct that get does get one record, but wrong that you can reference it as a dictionary. It's a model instance, so you use the normal dot notation: record.summary.
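The difference between subscript and attribute access can be reproduced without Django. In this sketch the Jobs class is a plain-Python stand-in for a model instance (an assumption made so the snippet runs on its own), and the summary value is made up for the example.

```python
class Jobs:
    """Plain-Python stand-in for a Jobs model instance."""

    def __init__(self, summary):
        self.summary = summary


record = Jobs(summary="weekly-report")

# Subscript access fails because the object defines no __getitem__.
try:
    record["summary"]
    error = None
except TypeError as exc:
    error = str(exc)

# Attribute access works, including inside string formatting.
path = "/home/username/{}".format(record.summary)
```

The error string matches the message from the question, and path is built the same way the mkdir argument would be once record['summary'] is replaced by record.summary.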

Django Poll for new records

In ajax I am polling a django url to retrieve the latest records. I do not want to display any records I have retrieved previously and I only want to retrieve 1 record for each poll request.
What would be the best way to do this?

class Article(models.Model):
    headline = models.CharField(max_length=100)
    pub_date = models.DateField()
    expire_date = models.DateField()
    class Meta:
        get_latest_by = 'pub_date'
>>> from mysite.models import Article
>>> Article.objects.latest()
If I understand your question correctly, you can set the get_latest_by attribute on the model's Meta class and call the latest() method, which may serve your purpose. To avoid retrieving the same record twice, filter on obj.pk > your_prev_retrieved_pk.

Hmm. You could do it two ways that I can think of off the bat - there are surely more.
You can add a field called "already_retrieved" and set it to True for those fields that have already been retrieved, and then only grab Whatever.objects.filter(already_retrieved=False).
Also, if they are in order by a pk, you could just keep track of how far you are in the list of pk's.
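The "keep track of the last pk" approach can be sketched in plain Python as follows. The ARTICLES list and the next_unseen helper are hypothetical stand-ins for the table and the view logic; in a Django view the equivalent lookup would be along the lines of Article.objects.filter(pk__gt=last_seen_pk).order_by('pk').first(), with the client sending back the last pk it received.

```python
# Hypothetical stand-in for the rows in the Article table.
ARTICLES = [
    {"pk": 1, "headline": "First"},
    {"pk": 2, "headline": "Second"},
    {"pk": 3, "headline": "Third"},
]


def next_unseen(articles, last_seen_pk):
    """Return the oldest article newer than last_seen_pk, or None."""
    candidates = [a for a in articles if a["pk"] > last_seen_pk]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a["pk"])


# Simulate four poll requests; the client remembers the last pk it saw.
last_seen_pk = 0
received = []
for _ in range(4):
    article = next_unseen(ARTICLES, last_seen_pk)
    if article is not None:
        received.append(article["headline"])
        last_seen_pk = article["pk"]
```

Each poll returns at most one record and never repeats one, and the fourth poll returns nothing because no newer record exists yet.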
