Django: what is the best way to build a query? - python

I wonder what the best way to build a query is.
I was trying to use Subquery, Prefetch, prefetch_related and select_related, but I can't get better results than what I started with.
I have a situation where I'm getting an object instance:
object = get_object_or_404(Object, id=pk)
Then I need to get more data:
object.id,
object.name,
object.description,
object.update_frequency,
object.resources.values_list('extension'),
object.tags.values_list('name'),
object.resources.count(),
object.resources.values_list('file'),
object.resources.values_list('licence'),
object.edited
Each row is a different query.
What is the best way to reduce the number of queries?

First of all, I would check whether the optimization is worth it.
I use django-debug-toolbar to get some metrics. For SQL, it shows how many queries are run and how much time is spent on each.
Then I would concentrate on what's important. Good resources to read are this and this.
If you provide a complete example (template, model, etc.), we can go deeper.
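To make the query count concrete, here is a minimal sketch of the measurement idea. It uses Python's standard-library sqlite3 module rather than the Django ORM so that it runs on its own. The obj, resource and tag tables, their columns, and the helper names are hypothetical stand-ins for the models in the question (the tags relation is simplified to a plain foreign key). It counts the SELECT statements issued by a per-attribute approach and by a one-query-per-table approach, which is roughly the shape of SQL that prefetch_related aims for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obj (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE resource (id INTEGER PRIMARY KEY, obj_id INTEGER,
                       extension TEXT, file TEXT, licence TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, obj_id INTEGER, name TEXT);
INSERT INTO obj VALUES (1, 'dataset');
INSERT INTO resource VALUES (1, 1, 'csv', 'a.csv', 'MIT'),
                            (2, 1, 'json', 'b.json', 'CC0');
INSERT INTO tag VALUES (1, 1, 'open'), (2, 1, 'data');
""")

executed = []  # every SELECT the connection runs is recorded here


def record(sql):
    if sql.lstrip().upper().startswith("SELECT"):
        executed.append(sql)


conn.set_trace_callback(record)


def column(sql, pk):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql, (pk,))]


def per_attribute(pk):
    """One query per attribute, like the list in the question."""
    return {
        "name": column("SELECT name FROM obj WHERE id = ?", pk)[0],
        "extensions": column("SELECT extension FROM resource WHERE obj_id = ? ORDER BY id", pk),
        "files": column("SELECT file FROM resource WHERE obj_id = ? ORDER BY id", pk),
        "licences": column("SELECT licence FROM resource WHERE obj_id = ? ORDER BY id", pk),
        "resource_count": column("SELECT COUNT(*) FROM resource WHERE obj_id = ?", pk)[0],
        "tags": column("SELECT name FROM tag WHERE obj_id = ? ORDER BY id", pk),
    }


def per_table(pk):
    """One query per table; everything else is derived in Python."""
    resources = conn.execute(
        "SELECT extension, file, licence FROM resource WHERE obj_id = ? ORDER BY id",
        (pk,),
    ).fetchall()
    return {
        "name": column("SELECT name FROM obj WHERE id = ?", pk)[0],
        "extensions": [r[0] for r in resources],
        "files": [r[1] for r in resources],
        "licences": [r[2] for r in resources],
        "resource_count": len(resources),
        "tags": column("SELECT name FROM tag WHERE obj_id = ? ORDER BY id", pk),
    }


def count_selects(func, pk):
    """Return func(pk) together with the number of SELECTs it issued."""
    executed.clear()
    result = func(pk)
    return result, len(executed)


slow, slow_queries = count_selects(per_attribute, 1)
fast, fast_queries = count_selects(per_table, 1)
print(slow_queries, fast_queries)
```

Both functions return the same data; only the number of round trips differs. Whether that difference matters in a real view is exactly what django-debug-toolbar would tell you.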

Related

How to randomise the order of a queryset

Consider the following query:
candidates = Candidate.objects.filter(ElectionID=ElectionIDx)
Objects in this query are ordered by their id field.
How do I randomise the order of the objects in the query? Can it be done using .order_by()?
Yes, you can use the special argument ? with order_by to get a randomized queryset:
Candidate.objects.filter(ElectionID=ElectionIDx).order_by('?')
Doc
Note that, depending on the DB backend, the randomization might be slow and expensive. I would suggest benchmarking it first: go with ? to begin with, and look for alternatives only if it turns out to be slow.

Django: Fastest way to random query one record using filter

What is the fastest way to query one record from the database that satisfies my filter query?
mydb.objects.filter(start__gte='2017-1-1', status='yes').order_by('?')[:1]
This statement first queries thousands of records and then selects one, so it is very slow, but I only need one random record. What is the fastest way to get it?
Well, I'm not sure you will be able to do exactly what you want. I was running into a similar issue a few months ago and I ended up redesigning my implementation of my backend to make it work.
Essentially, you want to make the query time shorter by having it choose a random record that fulfills both requirements (start__gte='2017-1-1', status='yes'), but as you say, in order for the query to do so it needs to filter your entire database. This means that you cannot get a "true" random record from the database that also fulfills the filter requirements without that work, because filtering inherently needs to look through all of your records (otherwise it wouldn't really be random, it would just be the first one it finds that fulfills your requirements).
Instead, consider putting all records that have a status='yes' in a separate relation, so that you can pull a random record from there and join with the larger relation. That would make the query time considerably faster (and it's the type of solution I implemented to get my code to work).
If you really want a random record with the correct filter information, you might need to employ some convoluted means.
You could use a custom manager in Django to have it find only one random record, something like this:
from random import randint

from django.db import models
from django.db.models import Count


class UsersManager(models.Manager):
    def random(self):
        # Pick a random offset among the existing rows (assumes at least one row).
        count = self.aggregate(count=Count('id'))['count']
        random_index = randint(0, count - 1)
        return self.all()[random_index]


class User(models.Model):
    # Your fields here (whatever they are, it seems start and status are some)!
    objects = UsersManager()
Which you can invoke then just by using:
User.objects.random()
This could be repeated with a check in your code until it returns a random record that fulfills your requirements. I don't think this is necessarily the cleanest or programmatically correct way of implementing this, but I don't think a faster solution exists for your specific issue.
I used this site as a source for this answer, and it has a lot more solid information about using this custom random method! You'll likely have to change the custom manager to serve your own needs, but if you add the random() method to your existing custom manager it should be able to do what you need of it!
Hope it helps!
Using order_by('?') can cause a serious performance problem. A better way is to use something like this (see: Getting a random row from a relational database):
from random import randint
from django.db.models import Count

count = mydb.objects.filter(start__gte='2017-1-1', status='yes').aggregate(count=Count('id'))['count']
random_index = randint(0, count - 1)
result = mydb.objects.filter(start__gte='2017-1-1', status='yes')[random_index]
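In SQL terms, the count-then-index approach is a COUNT query followed by a LIMIT 1 OFFSET n query. Here is a minimal sketch using sqlite3 in place of Django so that it runs standalone. The event table, its start and status columns, and the random_matching_row helper are all hypothetical names chosen for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, start TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO event (start, status) VALUES (?, ?)",
    [
        ("2016-12-31", "yes"),
        ("2017-01-01", "yes"),
        ("2017-03-05", "no"),
        ("2017-06-01", "yes"),
        ("2018-02-10", "yes"),
    ],
)

# The same filter is used for both queries, so it is defined once.
FILTER = "start >= ? AND status = ?"


def random_matching_row(conn, start, status):
    """Return one random row that passes the filter, or None if none do."""
    params = (start, status)
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM event WHERE {FILTER}", params
    ).fetchone()
    if count == 0:
        return None  # randint(0, -1) would raise ValueError
    offset = random.randint(0, count - 1)
    # ORDER BY id gives the offset a stable meaning; only one row comes back.
    return conn.execute(
        f"SELECT id, start, status FROM event WHERE {FILTER} "
        "ORDER BY id LIMIT 1 OFFSET ?",
        params + (offset,),
    ).fetchone()


row = random_matching_row(conn, "2017-01-01", "yes")
print(row)
```

The database still has to count the matching rows and skip past offset of them, so this is not free, and a row deleted between the two queries can leave the second one empty. The helper guards against an empty result set, which the snippets above do not.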

Building dynamic SQL queries with psycopg2 and postgresql

I'm not really sure of the best way to go about this, or if I'm just asking for a life that's easier than it should be. I have a backend for a web application and I like to write all of the queries in raw SQL. For instance, to get a specific user profile or a number of users, I have a query like this:
SELECT accounts.id,
       accounts.username,
       accounts.is_brony
FROM accounts
WHERE accounts.id IN %(ids)s;
This is really nice because I can get one user profile, or many user profiles with the same query. Now my real query is actually almost 50 lines long. It has a lot of joins and other conditions for this profile.
Let's say I want to get all of the same information from a user profile, but instead of getting a specific user ID I want to get a single random user. I don't think it makes sense to copy and paste 50 lines of code just to modify two lines at the end.
SELECT accounts.id,
       accounts.username,
       accounts.is_brony
FROM accounts
ORDER BY random()
LIMIT 1;
Is there some way to use some sort of inheritance in building queries, so that at the end I can modify a couple of conditions while keeping the core similarities the same?
I'm sure I could manage it by concatenating strings and such, but I was curious if there's a more widely accepted method for approaching such a situation. Google has failed me.
The canonical answer is to create a view and use that with different WHERE and ORDER BY clauses in queries.
But, depending on your query and your tables, that might not be a good solution for your special case.
A query that is blazingly fast with WHERE accounts.id IN (1, 2, 3) might perform abysmally with ORDER BY random() LIMIT 1. In that case you'll have to come up with a different query for the second requirement.
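A minimal sketch of the view approach follows. It uses sqlite3 instead of psycopg2 and PostgreSQL so that it runs without a server. The account_profiles view name, the sample rows, and the two helper functions are hypothetical; in the real case the view body would be the 50-line query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT, is_brony INTEGER);
INSERT INTO accounts VALUES (1, 'ann', 1), (2, 'bob', 0), (3, 'cy', 1);

-- The long SELECT with all of its joins is written once, here.
CREATE VIEW account_profiles AS
    SELECT accounts.id, accounts.username, accounts.is_brony
    FROM accounts;
""")


def profiles_by_id(ids):
    """Fetch specific profiles: the view plus a WHERE clause."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT * FROM account_profiles WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, list(ids)).fetchall()


def random_profile():
    """Fetch one random profile: the same view plus ORDER BY / LIMIT."""
    sql = "SELECT * FROM account_profiles ORDER BY random() LIMIT 1"
    return conn.execute(sql).fetchone()


some = profiles_by_id([1, 3])
one = random_profile()
print(some, one)
```

With psycopg2 the placeholder juggling is not needed, since a tuple parameter can be passed for IN %(ids)s as in the question. The caveat from the answer still applies: the two queries can have very different plans, so each should be checked separately.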

Django Optimization

I have a query written in raw SQL in Django.
Suppose the result of that query is assigned to a variable queryResult.
I then loop over queryResult and retrieve data from almost three tables using the Django ORM.
For example:
for item in queryResult:
    a = table1.objects.get(id=item[0])
    b = table2.objects.get(id=item[1])
    c = table2.objects.get(id=item[2])
    z = a.result
    x = a.result1
    v = c.result
    #### based on some condition check the data is stored into a list as dictionary.
    recentDocsList.append({'PurchaseType': item[0],
                           'CaseName': z,
                           'DocketNumber': x,
                           'CourtID': item[2],
                           'PacerCmecf': v,
                           'DID': item[3]})
After completing the loop, recentDocsList is returned.
But the whole thing makes my page render slowly. Does anybody have a method to resolve this issue?
PS: The entire thing is inside a while loop. Only 50 results are retrieved at a time. Control leaves the while loop if fewer than 50 results are retrieved or the length of recentDocsList reaches 10.
Thanks in advance.
Don't optimize too early - this can create obfuscation and confusion.
Even using SQLite3 you should be able to run 50 chained querysets without taxing the DB (moving to a higher-performance DB like PostgreSQL would improve this further). This suggests that your problem is elsewhere. To debug it, try calling your models / queries / views in
$ ./manage.py debugsqlshell
which is provided by django-debug-toolbar and prints out your SQL queries so you can see what is actually being run. Even better, use the toolbar itself in the browser, as it shows where the SQL / rendering slowdowns are.
But! Unless you have a really good reason to do so, DON'T WRITE CUSTOM SQL to be executed in django - the ORM can take care of almost everything. Some of the dangers of custom SQL include terrible performance - as you're probably experiencing.
Further - a while loop in a performance sensitive place (like page rendering) sounds like a disaster waiting to happen - are you sure you can't rewrite this in a safer way?
Without seeing more code it's difficult to help - how large are your query sets? Are they efficient? Do you have indexes to your tables? (Django will provide these if you allow it, but it sounds like you're doing something different).
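If measurement does show that the per-row get() calls in the question's loop dominate, one common option (not something the answer above proposes) is to fetch each table's rows with a single IN query and look them up from a dictionary inside the loop. The sketch below uses sqlite3 so that it runs standalone. The table and column names follow the question, but the sample data and the fetch_by_id helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, result TEXT, result1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, result TEXT);
INSERT INTO table1 VALUES (1, 'Smith v. Jones', '12-345'),
                          (2, 'Doe v. Roe', '98-765');
INSERT INTO table2 VALUES (10, 'cmecf-a'), (11, 'cmecf-b');
""")

# Rows as the raw SQL query might return them:
# (table1 id, table2 id, table2 id, DID)
query_result = [(1, 10, 10, "D1"), (2, 11, 11, "D2"), (1, 10, 11, "D3")]


def fetch_by_id(table, columns, ids):
    """Fetch all requested ids from one table with a single IN query.

    table and columns are trusted constants here, never user input.
    """
    ids = sorted(set(ids))
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, {columns} FROM {table} WHERE id IN ({placeholders})"
    return {row[0]: row[1:] for row in conn.execute(sql, ids)}


# Two queries in total, instead of several per loop iteration.
cases = fetch_by_id("table1", "result, result1", [item[0] for item in query_result])
courts = fetch_by_id("table2", "result", [item[2] for item in query_result])

recent_docs = []
for item in query_result:
    case_name, docket_number = cases[item[0]]
    (pacer,) = courts[item[2]]
    recent_docs.append({
        "PurchaseType": item[0],
        "CaseName": case_name,
        "DocketNumber": docket_number,
        "CourtID": item[2],
        "PacerCmecf": pacer,
        "DID": item[3],
    })

print(recent_docs)
```

In the Django ORM, Model.objects.in_bulk(ids) or filter(id__in=ids) gives the same id-to-row mapping without hand-written SQL.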

Dynamically Created Top Articles List in Django?

I'm creating a Django-powered, newspaper-ish site. The least obvious and least common-sense task that I have come across in putting the site together is how best to generate a "top articles" list for the sidebar of the page.
The first thing that came to mind was some sort of database column that is updated (based on what?) with every view. That seems (to my instincts) ridiculously database intensive and impractical and thus I think I'd like to find another solution.
Thanks all.
I would give celery a try (with django-celery). While it's not as easy to configure and use as a cache, it lets you queue tasks like incrementing counters and run them in the background. It could even be combined with the cache technique: increment counters in the cache from your views, and define a PeriodicTask that runs every now and then, writing the counters to the database and resetting them.
I just remembered: I once found this blog entry, which provides a nice way of incrementing a 'viewed_count' (or similar) column in the database with an AJAX JS call. If you don't have heavy traffic, maybe it's a good idea?
Also mentioned in this post is django-tracking, but I don't know much about it, I never used it myself (yet).
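The "count in the cache, flush periodically" idea can be sketched without Celery or a cache server. Below, an in-memory Counter stands in for the cache, and flush_views stands in for the body of the periodic task. The article table and all function names are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    title TEXT,
    views INTEGER NOT NULL DEFAULT 0
);
INSERT INTO article (id, title) VALUES (1, 'Budget'), (2, 'Election'), (3, 'Weather');
""")

pending = Counter()  # stand-in for counters kept in the cache


def record_view(article_id):
    """Called on every page view: no database write, just a counter bump."""
    pending[article_id] += 1


def flush_views():
    """Called by the periodic task: one UPDATE per article seen since the last flush."""
    batch = list(pending.items())
    pending.clear()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE article SET views = views + ? WHERE id = ?",
            [(count, article_id) for article_id, count in batch],
        )
    return len(batch)


def top_articles(limit):
    """Return (title, views) pairs for the most-viewed articles."""
    return conn.execute(
        "SELECT title, views FROM article ORDER BY views DESC, id LIMIT ?",
        (limit,),
    ).fetchall()


for article_id in [2, 2, 3, 2, 1, 3]:
    record_view(article_id)

flushed = flush_views()
top = top_articles(2)
print(flushed, top)
```

A module-level Counter only works within a single process. With several web workers, the counters would need to live in a shared store that supports atomic increments, which is where the cache or Redis suggestions come in.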
This is premature optimization. First try the DB way and then see if it really is too database intensive. Any decent database has such good caches that it probably won't matter very much. And even if it is a problem, take a look at the other db/cache suggestions here.
By the way, it is most likely that each page view already runs many queries that are more intensive than a simple view-count update.
If you do something like sort by top views, it would be fast if you index the view column in the DB. Another option is to only collect the top x articles every hour or so, and toss that value into Django's cache framework.
The nice thing about caching the list is that the algorithm you use to determine top articles can be as complex as you like without hitting the DB hard with every page view. Django's cache framework can use memory, db, or file system. I prefer DB, but many others prefer memory. I believe it uses pickle, so you can also store Python objects directly. It's easy to use, recommended.
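The caching idea can be illustrated in plain Python: compute the list once, then serve the stored copy until it is older than some maximum age. The TimedValue class and compute_top_articles function below are hypothetical; the clock is injectable only so that the example can simulate an hour passing.

```python
import time


class TimedValue:
    """Cache one computed value and recompute it after max_age seconds."""

    def __init__(self, compute, max_age, clock=time.monotonic):
        self._compute = compute
        self._max_age = max_age
        self._clock = clock
        self._value = None
        self._expires = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._compute()
            self._expires = now + self._max_age
        return self._value


calls = []


def compute_top_articles():
    """Stands in for an arbitrarily expensive ranking query."""
    calls.append(1)
    return ["Election", "Weather", "Budget"]


fake_now = [0.0]  # simulated clock, in seconds
top = TimedValue(compute_top_articles, max_age=3600, clock=lambda: fake_now[0])

first = top.get()    # computed
second = top.get()   # served from the cache
fake_now[0] = 3601.0
third = top.get()    # expired, so computed again

print(len(calls))
```

With Django's cache framework, cache.get and cache.set with a timeout (or cache.get_or_set) play the role of this class, and the stored value is shared between requests and, depending on the backend, between processes.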
An index wouldn't help, as the main problem, I believe, is not so much getting the sorted list as having a DB write with every page view of an article. Another index actually makes that problem worse, albeit only a little.
So I'd go with the cache. I think django's cache shim is a problem here because it requires timeouts on all keys. I'm not sure if that's imposed by memcached, if not then go with redis. Actually just go with redis anyway, the python library is great, I've used it from django projects before, and it has atomic increments and powerful sorting - everything you need.
