I'm getting multiple objects with prefetched relations from my db:
datei_logs = (DateiLog.objects
    .filter(user=request.user)
    .order_by("-pk")
    .prefetch_related('transfer_logs'))
transfer_logs refers to this:
class TransferLog(models.Model):
    datei_log = models.ForeignKey("DateiLog", related_name="transfer_logs")
    status = models.CharField(
        max_length=10,  # must be long enough to hold the default 'Good'
        choices=LOG_STATUS_CHOICES,
        default='Good'
    )
    server_name = models.CharField(max_length=100, blank=True, default="(no server)")
    server = models.ForeignKey('Server')

    class Meta:
        verbose_name_plural = "Transfer-Logs"

    def __unicode__(self):
        return self.server_name
Now I want to get all the TransferLogs that have a status of "Good". But I think if I do this:
datei_logs[0].transfer_logs.filter(...)
It queries the database again! Since this happens on a page with many log entries, I end up with 900 queries!
I use:
datei_logs[0].transfer_logs.count()
as well, and it causes lots of queries too!
What can I do to "just get everything" up front and then query an object that holds all the information, instead of the database?
Since you're on Django 1.7, you can use the new Prefetch() object to specify the queryset you want to use for the related lookup.
from django.db.models import Prefetch

queryset = TransferLog.objects.filter(status='Good')

datei_logs = (DateiLog.objects
    .filter(user=request.user)
    .order_by("-pk")
    .prefetch_related(Prefetch('transfer_logs',
                               queryset=queryset,
                               to_attr='good_logs')))
Then you can access datei_logs[0].good_logs and check len(datei_logs[0].good_logs); len() uses the prefetched list, so it issues no extra query the way count() would.
If you're interested in multiple statuses, you can just use multiple Prefetch objects, as sketched below. But if you're going to fetch all the logs anyway, you might as well stick with your original query and split the logs up in Python, rather than calling filter().
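For example, here is a minimal sketch with two Prefetch objects on the same relation (the 'Bad' status value and the attribute names are illustrative, not from the original question):

from django.db.models import Prefetch

datei_logs = (DateiLog.objects
    .filter(user=request.user)
    .order_by("-pk")
    .prefetch_related(
        # Each Prefetch runs one extra query and caches its result
        # under its own attribute, so no per-object queries are needed.
        Prefetch('transfer_logs',
                 queryset=TransferLog.objects.filter(status='Good'),
                 to_attr='good_logs'),
        Prefetch('transfer_logs',
                 queryset=TransferLog.objects.filter(status='Bad'),
                 to_attr='bad_logs'),
    ))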
I have a case where the user needs to update one instance together with adding or editing the m2m-related objects on that instance.
Here is my solution:
# models.py
class AdditionalAction(SoftDeletionModel):
    ADDITIONAL_CHOICES = (
        ('to_bring', 'To bring'),
        ('to_prepare', 'To prepare'),
    )
    title = models.CharField(max_length=50)
    type = models.CharField(choices=ADDITIONAL_CHOICES, max_length=30)

class Event(models.Model):
    title = models.CharField(max_length=255)
    actions = models.ManyToManyField(AdditionalAction, blank=True)
# serializers.py
class MySerializer(serializers.ModelSerializer):
    def update(self, instance, validated_data):
        actions_data = validated_data.pop('actions')
        # Use an atomic block to roll back if anything raises an exception
        with transaction.atomic():
            # Update the main object
            updated_instance = super().update(instance, validated_data)
            actions = []
            # Loop over the m2m relation data and create/update each
            # action instance based on whether an id is present
            for action_data in actions_data:
                action_kwargs = {
                    'data': action_data
                }
                id = action_data.get('id', False)
                if id:
                    action_kwargs['instance'] = AdditionalAction.objects.get(id=id)
                actions_ser = ActionSerializerWrite(**action_kwargs)
                actions_ser.is_valid(raise_exception=True)
                actions.append(actions_ser.save())
            updated_instance.actions.set(actions)
        return updated_instance
Can anyone suggest a better solution?
P.S. Actions can be created or updated in this case, so I can't just use many=True on the serializer, because it also needs an instance to update.
Using a for loop with save() here will be a killer if you have a long list, or actions triggered on save, etc. I'd try to avoid it.
You may be better off using the ORM's update() with a where clause (https://docs.djangoproject.com/en/2.0/topics/db/queries/#updating-multiple-objects-at-once), and even reading the updated objects back from the database after the write.
For creating new actions you could use bulk_create: https://docs.djangoproject.com/en/2.0/ref/models/querysets/#bulk-create
There is also this one: https://github.com/aykut/django-bulk-update (disclaimer: I am not a contributor or author of the package).
You have to be aware of the cons of this method: if you use any post/pre_save signals, those will not be triggered by the update.
In general, running multiple saves will kill the database, and you might end up with hard-to-diagnose deadlocks. In one of the projects I worked on, moving from save() in a loop to update() decreased response time from 30-something seconds to under 10, at which point the longest remaining operations were sending emails.
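As an illustration, here is a minimal sketch of that approach against the AdditionalAction model from the question; the save_actions name, payload shape, and field handling are assumptions, not from the original post:

from django.db import transaction

def save_actions(instance, actions_data):
    # Split incoming payloads by whether they carry an id (illustrative shape)
    to_update = [a for a in actions_data if a.get('id')]
    to_create = [a for a in actions_data if not a.get('id')]
    with transaction.atomic():
        # One UPDATE per payload, but no save() calls, so no
        # per-instance signal handlers fire.
        for data in to_update:
            AdditionalAction.objects.filter(id=data['id']).update(
                title=data['title'], type=data['type'])
        # A single INSERT for all new actions. Note: bulk_create only
        # populates the new pks on backends such as PostgreSQL.
        created = AdditionalAction.objects.bulk_create([
            AdditionalAction(title=d['title'], type=d['type'])
            for d in to_create])
        updated = AdditionalAction.objects.filter(
            id__in=[d['id'] for d in to_update])
        instance.actions.set(list(updated) + created)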
Let's suppose I have a polymorphic model and I want to get rid of it.
class AnswerBase(models.Model):
    question = models.ForeignKey(Question, related_name="answers")
    response = models.ForeignKey(Response, related_name="answers")

class AnswerText(AnswerBase):
    body = models.TextField(blank=True, null=True)

class AnswerInteger(AnswerBase):
    body = models.IntegerField(blank=True, null=True)
When I want to get all the answers, I can never access "body"; I have to find the right subclass instance by trial and error.
# Queryset of AnswerBase, no access to body
AnswerBase.objects.all()

question = Question.objects.get(pk=1)
# Queryset of AnswerBase, no access to body (even with django-polymorphic)
question.answers.all()
I don't want to use django-polymorphic, for performance reasons, because it does not seem to work for ForeignKey relations, and because I don't want my model to be too complicated. So I want this polymorphic architecture to become this simplified one:
class Answer(models.Model):
    question = models.ForeignKey(Question, related_name="answers")
    response = models.ForeignKey(Response, related_name="answers")
    body = models.TextField(blank=True, null=True)
The migrations cannot be created automatically, since that would delete all the older answers in the database. I've read the SchemaEditor documentation, but there does not seem to be a built-in way to migrate a model into one that already exists. So I want to create my own operation to save each AnswerText and AnswerInteger as an Answer, then delete AnswerText and AnswerInteger. I'm hoping I won't have to write SQL directly, but maybe that's the only solution? My migration file looks like this; I created an Operation called MigrateAnswer:
from myapp.migrations import MigrateAnswer

class Migration(migrations.Migration):
    operations = [
        migrations.RenameModel("AnswerBase", "Answer"),
        migrations.AddField(
            model_name='answer',
            name='body',
            field=models.TextField(blank=True, null=True),
        ),
        MigrateAnswer("AnswerInteger"),
        MigrateAnswer("AnswerText"),
        migrations.DeleteModel(name='AnswerInteger'),
        migrations.DeleteModel(name='AnswerText'),
    ]
So what I want to do in MigrateAnswer is migrate the values from each old model (AnswerInteger and AnswerText) to the base class (now named Answer, previously AnswerBase). Here's my operation class:
from django.db.migrations.operations.base import Operation

class MigrateAnswer(Operation):
    reversible = False

    def __init__(self, model_name):
        self.old_name = model_name

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        new_model = to_state.apps.get_model(app_label, "Answer")
        old_model = from_state.apps.get_model(app_label, self.old_name)
        for field in old_model._meta.local_fields:
            # Loop over "question", "response" and "body".
            # schema_editor.alter_field() alters a field on a single model;
            # schema_editor.add_field() + remove_field() do not allow
            # migrating the value from the old field to the new one.
            pass
So my question is: is it possible to do this without using execute() (i.e. without writing SQL)? If so, what should I do in the for loop of my Operation?
Thanks in advance!
There is no need to write an Operation class; data migrations can be done simply with a RunPython call, as the docs show.
Within that function you can use perfectly normal model instance methods; since you know which fields you want to move the data for, there is no need to get them via meta lookups.
However, you will need to temporarily give the new body field a different name so that it doesn't conflict with the old fields on the subclasses; you can rename it back at the end and delete the child classes, because by then the value will be on the base class.
def migrate_answers(apps, schema_editor):
    classes = []
    classes_str = ['AnswerText', 'AnswerInteger']
    for class_name in classes_str:
        classes.append(apps.get_model('survey', class_name))
    for class_ in classes:
        for answer in class_.objects.all():
            answer.new_body = answer.body
            answer.save()
operations = [
    migrations.AddField(
        model_name='answerbase',
        name='new_body',
        field=models.TextField(blank=True, null=True),
    ),
    migrations.RunPython(migrate_answers),
    migrations.DeleteModel(name='AnswerInteger'),
    migrations.DeleteModel(name='AnswerText'),
    migrations.RenameField('AnswerBase', 'new_body', 'body'),
    migrations.RenameModel("AnswerBase", "Answer"),
]
You could create an empty migration for the app you want to modify and use the migrations.RunPython class to execute custom Python functions. Inside those functions you have access to your models and can do the data manipulation through the Django ORM. Pure Python, no raw SQL.
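For instance, a rough skeleton of such a migration, assuming the 'survey' app label from the earlier answer (the dependency name and the loop body are purely illustrative):

# created with: python manage.py makemigrations --empty survey
from django.db import migrations

def forwards(apps, schema_editor):
    # Always use the historical model from `apps`, never a direct import
    AnswerBase = apps.get_model('survey', 'AnswerBase')
    for answer in AnswerBase.objects.all():
        # ... plain ORM manipulation here ...
        answer.save()

class Migration(migrations.Migration):
    dependencies = [
        ('survey', '0001_initial'),  # assumed previous migration
    ]
    operations = [
        migrations.RunPython(forwards),
    ]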
Despite the numerous recipes and examples in peewee's documentation, I have not been able to find how to accomplish the following:
For finer-grained control, check out the Using context manager / decorator. This allows you to specify the database to use with a given list of models for the duration of the wrapped block.
I assume it would go something like...
db = MySQLDatabase(None)

class BaseModelThing(Model):
    class Meta:
        database = db

class SubModelThing(BaseModelThing):
    '''imagine all the fields'''
    class Meta:
        db_table = 'table_name'

runtime_db = MySQLDatabase('database_name.db',
                           fields={},  # imagine field mappings here
                           **extra_stuff)

@Using(runtime_db, [SubModelThing])
@runtime_db.execution_context()
def some_kind_of_query():
    '''imagine the queries here'''
but I have not found examples, so an example would be the answer to this question.
Yeah, there's not a great example of using the Using or execution_context decorators, so the first thing is: don't use the two together. It doesn't appear to break anything; it just seems to be redundant. Logically that makes sense, as both decorators cause the specified model calls in the block to run in a single connection/transaction.
The only (and biggest) difference between the two is that Using allows you to specify the particular database the connection will use - useful for master/slave setups (though the Read slaves extension is probably a cleaner solution).
If you run with two databases and try using execution_context on the 'second' database (in your example, runtime_db), nothing will happen with the data. A connection will be opened at the start of the block and closed at the end, but no queries will be executed on it, because the models are still using their original database.
The code below is an example. Every run should result in only one row being added to each database.
from peewee import *

db = SqliteDatabase('other_db')
db.connect()

runtime_db = SqliteDatabase('cmp_v0.db')
runtime_db.connect()

class BaseModelThing(Model):
    class Meta:
        database = db

class SubModelThing(BaseModelThing):
    first_name = CharField()

    class Meta:
        db_table = 'table_name'

db.create_tables([SubModelThing], safe=True)
SubModelThing.delete().where(True).execute()  # clean out previous runs

with Using(runtime_db, [SubModelThing]):
    runtime_db.create_tables([SubModelThing], safe=True)
    SubModelThing.delete().where(True).execute()

@Using(runtime_db, [SubModelThing], with_transaction=True)
def execute_in_runtime(throw):
    SubModelThing(first_name='asdfasdfasdf').save()
    if throw:  # to demo transaction handling in Using
        raise Exception()

# Create an instance in the 'normal' database
SubModelThing.create(first_name='name')

try:  # try to create, but throw during the transaction
    execute_in_runtime(throw=True)
except:
    pass  # failure is expected; no row should be added

execute_in_runtime(throw=False)  # create a row in runtime_db

print 'db row count: {}'.format(len(SubModelThing.select()))

with Using(runtime_db, [SubModelThing]):
    print 'Runtime DB count: {}'.format(len(SubModelThing.select()))
Writing my first Django app that gets messages from other applications and stores reports about them.
It is performing very slowly due to the following logic, which I hope can be improved, but I'm struggling to find a way to do it without a loop.
Basically I'm just trying to go through all of the apps (there are about 500 unique ones) and get the latest report for each one. Here are my models and function:
class App(models.Model):
    app_name = models.CharField(max_length=200)
    host = models.CharField(max_length=50)

class Report(models.Model):
    app = models.ForeignKey(App)
    date = models.DateTimeField(auto_now_add=True)
    status = models.CharField(max_length=20)
    runtime = models.DecimalField(max_digits=13, decimal_places=2, blank=True, null=True)
    end_time = models.DateTimeField(blank=True, null=True)
def get_latest_report():
    """ Returns the latest report from each app """
    lset = set()
    ## get distinct app values
    for app_id in Report.objects.order_by().values_list('app_id', flat=True).distinct():
        ## get the latest report (by date) for that app and collect its pk
        lreport = Report.objects.filter(app_id=app_id).latest('date')
        lset.add(lreport.pk)
    ## filter objects and return the latest runs
    return Report.objects.filter(pk__in=lset)
If you're not afraid of executing a query for every app in your database, you can try it this way:
def get_latest_report():
    """ Returns the latest report from each app """
    return [app.report_set.latest('date') for app in App.objects.all()]
This adds a query for every app in your database, but it is really expressive, and sometimes maintainability and readability are more important than performance.
If you are using PostgreSQL, you can use distinct and order_by in combination, giving you the latest report for each app like so (the order_by() must lead with the field named in distinct()):

Report.objects.order_by('app', '-date').distinct('app')
If you are using a database that does not support the DISTINCT ON clause (MySQL, for example) and you do not mind changing the default ordering of the Report model, you can use prefetch_related to reduce 500+ queries to 2 (however, this method will use a lot more memory, as it loads every report):
class Report(models.Model):
    # Fields

    class Meta:
        ordering = ['-date']

def get_latest_report():
    latest_reports = []
    for app in App.objects.all().prefetch_related('report_set'):
        try:
            latest_reports.append(app.report_set.all()[0])
        except IndexError:
            pass
    return latest_reports
In my Django project, all entities deleted by the user must be soft-deleted by setting the current datetime on the deleted_at property. My model looks like this: Trip <-> TripDestination <-> Destination (a many-to-many relation). In other words, a Trip can have multiple Destinations.
When I delete a Trip, the SoftDeleteManager filters out all the deleted trips. However, if I request all the destinations of a trip (fetched using get_object_or_404(Trip, pk=id)), I also get the deleted ones (i.e. TripDestination models with deleted_at == null OR deleted_at != null). I really don't understand why, since all my models inherit from LifeTimeTrackingModel and use the SoftDeleteManager.
Can someone please help me understand why the SoftDeleteManager isn't working for the n:m relation?
class SoftDeleteManager(models.Manager):
    def get_query_set(self):
        query_set = super(SoftDeleteManager, self).get_query_set()
        return query_set.filter(deleted_at__isnull=True)

class LifeTimeTrackingModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    deleted_at = models.DateTimeField(null=True)

    objects = SoftDeleteManager()
    all_objects = models.Manager()

    class Meta:
        abstract = True

class Destination(LifeTimeTrackingModel):
    city_name = models.CharField(max_length=45)

class Trip(LifeTimeTrackingModel):
    name = models.CharField(max_length=250)
    destinations = models.ManyToManyField(Destination, through='TripDestination')

class TripDestination(LifeTimeTrackingModel):
    trip = models.ForeignKey(Trip)
    destination = models.ForeignKey(Destination)
Resolution
I filed bug 17746 in the Django bug tracker. Thanks to Caspar for his help on this.
It looks like this behaviour comes from the ManyToManyField choosing to use its own manager, which the related objects reference mentions, because when I create some instances of my own and soft-delete them using your model code (via the manage.py shell), everything works as intended.
Unfortunately the docs don't mention how to override that manager. I spent about 15 minutes searching through the ManyToManyField source (django/db/models/fields/related.py) but haven't tracked down where it instantiates its manager.
To get the behaviour you are after, you should specify use_for_related_fields = True on your SoftDeleteManager class, as described in the documentation on controlling automatic managers:
class SoftDeleteManager(models.Manager):
    use_for_related_fields = True

    def get_query_set(self):
        query_set = super(SoftDeleteManager, self).get_query_set()
        return query_set.filter(deleted_at__isnull=True)
This works as expected: I'm able to define a Trip with two Destinations, each through a TripDestination, and if I set a Destination's deleted_at value to datetime.datetime.now(), that Destination no longer appears in the list given by mytrip.destinations.all(), which, as near as I can tell, is what you are after.
However, the docs also specifically say not to filter the queryset by overriding get_query_set() on a manager used for related fields, so if you run into problems later, bear this in mind as a possible cause.
To enable filtering by the deleted_at field of the Destination and Trip models, setting use_for_related_fields = True on the SoftDeleteManager class is enough. As per Caspar's answer, this stops returning deleted Destinations for trip_object.destinations.all().
However, from your comments it seems you would also like to filter out Destinations that are linked to a Trip via a TripDestination object with a set deleted_at field, i.e. a soft delete on the through instance.
Let's clarify the way managers work. Related managers are the managers of the remote model, not of the through model:
trip_object.destinations.some_method() calls the default Destination manager.
destination_object.trip_set.some_method() calls the default Trip manager.
The TripDestination manager is not called at any time. You can reach it with trip_object.destinations.through.objects.some_method(), if you really want to. Now, what I would do is add an instance method Trip.get_destinations, and a similar Destination.get_trips, that filters out deleted connections.
If you insist on using the manager to do the filtering, it gets more complicated:
class DestinationManager(models.Manager):
    use_for_related_fields = True

    def get_query_set(self):
        query_set = super(DestinationManager, self).get_query_set()
        if hasattr(self, "through"):
            through_objects = self.through.objects.filter(
                destination_id=query_set.filter(**self.core_filters).get().id,
                trip_id=self._fk_val,
                deleted_at__isnull=True)
            query_set = query_set.filter(
                id__in=through_objects.values("destination_id"))
        return query_set.filter(deleted_at__isnull=True)
The same would have to be done for the TripManager, as the two would differ. You may want to check the performance, and look at django/db/models/fields/related.py for reference.
Modifying the get_queryset method of the default manager may hamper the ability to back up the database, and the documentation discourages it. Writing a Trip.get_destinations method, as shown below, is the alternative.
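For completeness, here is a minimal sketch of such a Trip.get_destinations method, using the models from the question (the tripdestination reverse lookup name is Django's default for the through model):

class Trip(LifeTimeTrackingModel):
    name = models.CharField(max_length=250)
    destinations = models.ManyToManyField(Destination, through='TripDestination')

    def get_destinations(self):
        # Destination.objects is the SoftDeleteManager, so soft-deleted
        # destinations are already excluded; the extra filter also drops
        # links whose TripDestination row has been soft-deleted.
        return Destination.objects.filter(
            tripdestination__trip=self,
            tripdestination__deleted_at__isnull=True)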