Getting .count() and .exists() from a cached queryset without hitting the database - Python

I am working on a Django app and I want to make several lookups on a queryset.
My problem is the subsequent database hits from calling .count().
I tried using Django's cache framework, but it doesn't seem to work.
This is what I've done so far:
# app/models.py
from django.contrib.auth.models import User
from django.core.cache import cache
from django.db import models

class my_table(models.Model):
    name = models.CharField(max_length=200, blank=True)
    date_created = models.DateTimeField(auto_now_add=True, blank=True)
    ip_address = models.CharField(max_length=100, null=True, blank=True)
    user = models.ForeignKey(User, on_delete=models.CASCADE)

    class Meta:
        db_table = 'table_name'

    def save(self, *args, **kwargs):
        cache.set(self.user, my_table.objects.filter(user=self.user))
        super(my_table, self).save(*args, **kwargs)
I am updating the cache every time the database is updated.
I tried printing connection.queries in my views:
# views.py
from django.db import connection

def myview(request):
    print(len(connection.queries))  # prints 0
    records = my_table.objects.filter(user=request.user)
    print(records)
    print(len(connection.queries))  # prints 1
    if record.count() > 0:
        # ... some code here
        print(len(connection.queries))  # prints 2
Both print() and .count() make extra DB hits.
Now I tried getting the results from the cache:
# views.py
def myview(request):
    print(len(connection.queries))  # prints 0
    records = cache.get(request.user)
    print(records)
    print(len(connection.queries))  # prints 0
    if record.count() > 0:
        # ... some code here
        print(len(connection.queries))  # prints 1
There was no extra query for print(), but .count() still hits the database.
How can I perform ORM operations on cached querysets without hitting the database multiple times?
I want to perform filtering, aggregations, and count/exists checks on this queryset without hitting the database.
Also, cache.get(request.user) returns None after some time.
Any help would be appreciated.

The Django source code suggests that calling .count() should not hit the database if the queryset has already been fully retrieved (docs were only updated in Django 3.2, see ticket, but 2.2 has the same code).
I'm not sure how an explicit cache might interact with this, but the above is true if you don't use any explicit cache (it relies on the temporary result cache built into QuerySet).
Presumably you're calling records.count() rather than record.count() (in case record happens to exist in your full code and be something else)?
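To make that concrete, here is a minimal sketch (assuming the my_table model from the question and DEBUG=True, which Django needs in order to populate connection.queries): force one full evaluation of the queryset, and subsequent len(), bool(), .count(), and .exists() calls are answered from the queryset's internal result cache.
from django.db import connection, reset_queries

reset_queries()
records = my_table.objects.filter(user=request.user)  # lazy: no query yet
len(records)                      # evaluates the queryset: 1 query, fills the result cache
print(len(connection.queries))    # 1
records.count()                   # answered from the result cache: no new query
records.exists()                  # likewise served from the cache
print(len(connection.queries))    # still 1
Note that this cache lives on the queryset object itself; building a new queryset (even an identical filter) starts with an empty cache and will query again.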

Related

Displaying epics and tasks in Django

I'm using Django to create a Scrum board.
In Scrum, Tasks may or may not be part of an Epic.
Here's a simplified version of my models.py file:
from django.db import models

class Epic(models.Model):
    name = models.CharField(max_length=50)

class Task(models.Model):
    TO_DO = "TODO"
    IN_PROGRESS = "IP"
    DONE = "DONE"
    TASK_STATUS_CHOICES = [
        (TO_DO, 'To Do'),
        (IN_PROGRESS, 'In Progress'),
        (DONE, 'Done'),
    ]
    name = models.CharField(max_length=50)
    status = models.CharField(
        max_length=4,
        choices=TASK_STATUS_CHOICES,
    )
    epic = models.ForeignKey(Epic, on_delete=models.DO_NOTHING, null=True, blank=True)
My views.py page currently filters objects by status:
def board(request, **kwargs):
    todos = Task.objects.filter(status='TODO')
    ip = Task.objects.filter(status='IP')
    done = Task.objects.filter(status='DONE')
    context = {
        'todos': todos,
        'ip': ip,
        'done': done,
    }
    return render(request, 'scrum_board/main.html', context)
My template then displays three columns of tasks by iterating through each QuerySet with a for loop.
However, I'd like tasks from the same Epic to be displayed together, enveloped in a container (a simple HTML table or a Bootstrap card).
How can I do this? One way that immediately comes to mind is to edit views.py so that for each column (task status) I execute the following pseudo-code:
for epic in epics:
    # draw an HTML table cell
    for task in tasks:
        # display task
for task in tasks_without_epics:
    # display task
I'm just wondering if there are simpler methods? Especially since down the road I'll have to further sort according to task/epic priority, and so on.
How about using the .order_by() QuerySet method? Your pseudo-code will issue a DB query for each iteration of the inner for loop, because Django evaluates QuerySets lazily. If the DB is not that large (which I assume, since it's a scrum board), fetching all the data in one query may be more beneficial than fetching it with multiple DB queries. Some tasks will have a null epic, but you can handle that with a query expression: F() and its methods desc() or asc(), which take the boolean arguments nulls_first and nulls_last.
This is how we query todos ordered by ids of epics in descending order:
from django.db.models import F
todos = Task.objects.filter(status='TODO').order_by(F('epic__pk').desc(nulls_last=True))
Then you can implement your view using loops with only one DB query per column, putting all tasks with a null epic at the tail; see the sketch below. Another advantage of this approach is that you can easily sort on other fields by adding conditions to .order_by().
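A minimal sketch of that view loop (based on the Task model above; itertools.groupby only groups consecutive items, which is fine here because each queryset is already ordered by epic):
from itertools import groupby

from django.db.models import F
from django.shortcuts import render

def board(request, **kwargs):
    columns = {}
    for status in ('TODO', 'IP', 'DONE'):
        tasks = (Task.objects.filter(status=status)
                 .order_by(F('epic__pk').desc(nulls_last=True)))
        # Iterating the queryset evaluates it: one DB query per column.
        columns[status] = [
            (epic_id, list(group))  # epic_id is None for tasks without an epic
            for epic_id, group in groupby(tasks, key=lambda t: t.epic_id)
        ]
    return render(request, 'scrum_board/main.html', {'columns': columns})
The template can then draw one container per (epic_id, tasks) pair, with the null-epic group rendered last.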

Django Rest Framework updating nested m2m objects. Does anyone know a better way?

I have a case where the user needs to update one instance together with adding/editing the m2m-related objects on that instance.
Here is my solution:
# models.py
class AdditionalAction(SoftDeletionModel):
    ADDITIONAL_CHOICES = (
        ('to_bring', 'To bring'),
        ('to_prepare', 'To prepare'),
    )
    title = models.CharField(max_length=50)
    type = models.CharField(choices=ADDITIONAL_CHOICES, max_length=30)

class Event(models.Model):
    title = models.CharField(max_length=255)
    actions = models.ManyToManyField(AdditionalAction, blank=True)
# serializers.py
class MySerializer(serializers.ModelSerializer):
    def update(self, instance, validated_data):
        actions_data = validated_data.pop('actions')
        # Use an atomic block to roll back if anything raises an exception
        with transaction.atomic():
            # update the main object
            updated_instance = super().update(instance, validated_data)
            actions = []
            # Loop over the m2m relation data and create/update each
            # action instance based on whether an id is present
            for action_data in actions_data:
                action_kwargs = {
                    'data': action_data
                }
                id = action_data.get('id', False)
                if id:
                    action_kwargs['instance'] = AdditionalAction.objects.get(id=id)
                actions_ser = ActionSerializerWrite(**action_kwargs)
                actions_ser.is_valid(raise_exception=True)
                actions.append(actions_ser.save())
            updated_instance.actions.set(actions)
        return updated_instance
Can anyone suggest a better solution?
P.S. Actions can be created or updated in this case, so I can't just use many=True on the serializer, because it also needs an instance to update.
Using a for loop with save() here will be a killer if you have a long list, actions triggered on save, etc. I'd try to avoid it.
You may be better off using the ORM's update() with a where clause: https://docs.djangoproject.com/en/2.0/topics/db/queries/#updating-multiple-objects-at-once and even reading the updated objects back from the database after the write.
For creating new actions you could use bulk_create: https://docs.djangoproject.com/en/2.0/ref/models/querysets/#bulk-create
There is also this one: https://github.com/aykut/django-bulk-update (disclaimer: I am not a contributor or author of the package).
You have to be aware of the cons of this method: if you use any post/pre_save signals, those will not be triggered by the update.
In general, running multiple saves will hammer the database, and you might end up with hard-to-diagnose deadlocks. In one of the projects I worked on, moving from save() in a loop to update() decreased response time from 30-something seconds to under 10, where the longest remaining operation was sending emails.
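A rough sketch of that idea inside update() (assumptions: the models and ActionSerializerWrite from the question; bulk_update exists from Django 2.2 on, otherwise the django-bulk-update package fills the gap; note this bypasses the per-action serializer validation, and neither bulk_create nor bulk_update fires save signals):
def update(self, instance, validated_data):
    actions_data = validated_data.pop('actions')
    with transaction.atomic():
        updated_instance = super().update(instance, validated_data)
        to_create = [a for a in actions_data if not a.get('id')]
        to_update = [a for a in actions_data if a.get('id')]
        # One INSERT for all new actions instead of one save() per action.
        created = AdditionalAction.objects.bulk_create(
            [AdditionalAction(**a) for a in to_create]
        )
        # Load all existing actions in one query, patch them in memory,
        # then write them back in a single UPDATE.
        existing = AdditionalAction.objects.in_bulk([a['id'] for a in to_update])
        for data in to_update:
            obj = existing[data['id']]
            obj.title = data.get('title', obj.title)
            obj.type = data.get('type', obj.type)
        AdditionalAction.objects.bulk_update(existing.values(), ['title', 'type'])
        updated_instance.actions.set(created + list(existing.values()))
    return updated_instance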

Filter latest record in Django

Writing my first Django app that gets messages from other applications and stores reports about them.
It is performing very slowly due to the following logic, which I hope can be improved, but I'm struggling to find a way to do it without a loop.
Basically I'm just trying to go through all of the apps (there are about 500 unique ones) and get the latest report for each one. Here are my models and function:
class App(models.Model):
    app_name = models.CharField(max_length=200)
    host = models.CharField(max_length=50)

class Report(models.Model):
    app = models.ForeignKey(App)
    date = models.DateTimeField(auto_now_add=True)
    status = models.CharField(max_length=20)
    runtime = models.DecimalField(max_digits=13, decimal_places=2, blank=True, null=True)
    end_time = models.DateTimeField(blank=True, null=True)

def get_latest_report():
    """Return the latest report from each app."""
    lset = set()
    # get distinct app values
    for app_id in Report.objects.order_by().values_list('app_id', flat=True).distinct():
        # get the latest report (by date) for this app and push its pk onto the stack
        lreport = Report.objects.filter(app_id=app_id).latest('date')
        lset.add(lreport.pk)
    # filter objects and return the latest runs
    return Report.objects.filter(pk__in=lset)
If you're not afraid of executing a query for every app in your database, you can try it this way:
def get_latest_report():
    """Return the latest report from each app."""
    return [app.report_set.latest('date') for app in App.objects.all()]
This adds a query for every app in your database, but it is really expressive, and sometimes maintainability and readability matter more than performance.
If you are using PostgreSQL you can combine distinct() and order_by(), giving you the latest report for each app. Note that DISTINCT ON requires the leftmost order_by() fields to match the distinct() fields:
Report.objects.order_by('app', '-date').distinct('app')
If you are using a database that does not support the DISTINCT ON clause (MySQL, for example), and you do not mind changing the default ordering of the Report model, you can use prefetch_related to reduce 500+ queries to 2 (however, this method will use a lot more memory, as it loads every report):
class Report(models.Model):
    # Fields

    class Meta:
        ordering = ['-date']

def get_latest_report():
    latest_reports = []
    for app in App.objects.all().prefetch_related('report_set'):
        try:
            latest_reports.append(app.report_set.all()[0])
        except IndexError:
            pass
    return latest_reports

Query prefetched objects without using the db?

I'm getting multiple objects with prefetched relations from my db:
datei_logs = (DateiLog.objects.filter(user=request.user)
              .order_by("-pk")
              .prefetch_related('transfer_logs'))
transfer_logs refers to this:
class TransferLog(models.Model):
    datei_log = models.ForeignKey("DateiLog", related_name="transfer_logs")
    status = models.CharField(
        max_length=1,
        choices=LOG_STATUS_CHOICES,
        default='Good'
    )
    server_name = models.CharField(max_length=100, blank=True, default="(no server)")
    server = models.ForeignKey('Server')

    class Meta:
        verbose_name_plural = "Transfer-Logs"

    def __unicode__(self):
        return self.server_name
Now I want to get all the TransferLogs that have a status of "Good". But I think if I do this:
datei_logs[0].transfer_logs.filter(...)
it queries the db again! Since this happens on a page with many log entries, I end up with 900 queries!
I use:
datei_logs[0].transfer_logs.count()
as well, and it causes lots of queries too.
What can I do to "just get everything" and then just query an object that holds all the information instead of the db?
Since you're on Django 1.7 you can use the new Prefetch() objects to specify the queryset you want to use for the related lookup:
from django.db.models import Prefetch

queryset = TransferLog.objects.filter(status='Good')

datei_logs = (DateiLog.objects.filter(user=request.user)
              .order_by("-pk")
              .prefetch_related(Prefetch('transfer_logs',
                                         queryset=queryset,
                                         to_attr='good_logs')))
Then you can access datei_logs[0].good_logs and check len(datei_logs[0].good_logs).
If you're interested in multiple statuses, you can use multiple Prefetch objects. But if you're going to fetch all the logs anyway, you might as well stick to your original query and then split the logs up in Python rather than calling filter(); see the sketch below.
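For example, a minimal sketch of that in-Python split (assuming the models above; the key point is that iterating the prefetched transfer_logs.all() stays in memory, while calling .filter() or .count() on it would issue a new query):
from collections import defaultdict

datei_logs = (DateiLog.objects.filter(user=request.user)
              .order_by("-pk")
              .prefetch_related('transfer_logs'))  # 2 queries in total

for log in datei_logs:
    by_status = defaultdict(list)
    for transfer in log.transfer_logs.all():  # served from the prefetch cache
        by_status[transfer.status].append(transfer)
    good_count = len(by_status['Good'])  # counted in Python, no extra query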

How to get Django AutoFields to start at a higher number

For our Django App, we'd like to get an AutoField to start at a number other than 1. There doesn't seem to be an obvious way to do this. Any ideas?
Like the others have said, this would be much easier to do on the database side than the Django side.
For Postgres, it'd be like so:
ALTER SEQUENCE sequence_name RESTART WITH 12345;
Look at your own DB engine's docs for how you'd do it there.
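If you'd rather have that happen as part of your Django setup than by hand, one option is a migration that runs the statement (a sketch, Postgres only; the sequence name myapp_mymodel_id_seq follows Django's default naming for a model mymodel in an app myapp and is hypothetical here, as is the dependency):
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),  # hypothetical preceding migration
    ]
    operations = [
        # Restart the PK sequence at a higher value; no-op on reversal.
        migrations.RunSQL(
            "ALTER SEQUENCE myapp_mymodel_id_seq RESTART WITH 12345;",
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]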
For MySQL I created a signal that does this after syncdb:
from django.db.models.signals import post_syncdb
from project.app import models as app_models

def auto_increment_start(sender, **kwargs):
    from django.db import connection, transaction
    cursor = connection.cursor()
    cursor.execute("""
        ALTER TABLE app_table AUTO_INCREMENT=2000
    """)
    transaction.commit_unless_managed()

post_syncdb.connect(auto_increment_start, sender=app_models)
After a syncdb the ALTER TABLE statement is executed, which saves you from having to log into MySQL and issue it manually.
EDIT: I know this is an old thread, but I thought it might help someone.
A quick peek at the source shows that there doesn't seem to be any option for this, probably because it doesn't always increment by one; it picks the next available key: "An IntegerField that automatically increments according to available IDs" — djangoproject.com
Here is what I did:
def update_auto_increment(value=5000, app_label="xxx_data"):
    """Update the AUTO_INCREMENT start value for every table in an app."""
    from django.conf import settings
    from django.db import connection, transaction, router
    from django.db.models import get_models

    models = [m for m in get_models() if m._meta.app_label == app_label]
    cursor = connection.cursor()
    for model in models:
        db_name = settings.DATABASES[router.db_for_write(model)]['NAME']
        alter_str = "ALTER TABLE {}.{} AUTO_INCREMENT={}".format(
            db_name, model._meta.db_table, value)
        cursor.execute(alter_str)
    transaction.commit_unless_managed()
I found a really easy solution to this! AutoField uses the last value inserted to determine the next value assigned, so if you insert a dummy row with the AutoField start value you want, the following insertions will increment from that value.
A simple example in a few steps:
1.) models.py
class Product(models.Model):
    id = models.AutoField(primary_key=True)  # this is a dummy PK for now
    productID = models.IntegerField(default=0)
    productName = models.TextField()
    price = models.DecimalField(max_digits=6, decimal_places=2)
2.) Run makemigrations and migrate.
Once that is done, you will need to insert the initial row where productID holds your desired AutoField start value. You can write a method or do it from the Django shell.
3.) From a view the insertion could look like this:
views.py
from app.models import Product

dummy = {
    'productID': 100000,
    'productName': 'Item name',
    'price': 5.98,
}
Product.objects.create(**dummy)
4.) Once inserted, you can make the following change to your model:
models.py
class Product(models.Model):
    productID = models.AutoField(primary_key=True)
    productName = models.TextField()
    price = models.DecimalField(max_digits=6, decimal_places=2)
All following insertions will get a productID incrementing from 100000: 100001, 100002, ...
The auto fields depend, to an extent, on the database driver being used.
You'll have to look at the objects actually created for the specific database to see what's happening.
I needed to do something similar. I avoided the complex stuff and simply created two fields:
id_no = models.AutoField(primary_key=True)
my_highvalue_id = models.IntegerField(null=True)
In views.py, I then simply added a fixed number to the id_no:
my_highvalue_id = id_no + 1200
I'm not sure if it helps resolve your issue, but I think you may find it an easy go-around.
In the model you can add this:
def save(self, *args, **kwargs):
    if not User.objects.count():
        self.id = 100
    else:
        self.id = User.objects.last().id + 1
    super(User, self).save(*args, **kwargs)
This works only if the database is currently empty (no objects), so the first item will be assigned id 100; subsequent inserts will follow at the last id + 1.
For those who are interested in a modern solution, I found it quite useful to run the following handler on the post_migrate signal.
Inside your apps.py file:
import logging

from django.apps import AppConfig
from django.db import connection
from django.db.models.signals import post_migrate

logger = logging.getLogger(__name__)

def auto_increment_start(sender, **kwargs):
    min_value = 10000
    with connection.cursor() as cursor:
        logger.info('Altering BigAutoField starting value...')
        cursor.execute(f"""
            SELECT setval(pg_get_serial_sequence('"apiV1_workflowtemplate"','id'), coalesce(max("id"), {min_value}), max("id") IS NOT null) FROM "apiV1_workflowtemplate";
            SELECT setval(pg_get_serial_sequence('"apiV1_workflowtemplatecollection"','id'), coalesce(max("id"), {min_value}), max("id") IS NOT null) FROM "apiV1_workflowtemplatecollection";
            SELECT setval(pg_get_serial_sequence('"apiV1_workflowtemplatecategory"','id'), coalesce(max("id"), {min_value}), max("id") IS NOT null) FROM "apiV1_workflowtemplatecategory";
        """)
    logger.info(f'BigAutoField starting value changed successfully to {min_value}')

class Apiv1Config(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'apiV1'

    def ready(self):
        post_migrate.connect(auto_increment_start, sender=self)
Of course, the downside of this, as others have already pointed out, is that it is DB-specific.
