I've got a question about foreign key behaviour in Django.
I've defined a tree hierarchy in my models, where a parent-child relation is represented as a foreign key on the child model. Now, starting at the leaf level, I'd like to retrieve the parent, the parent's parent, etc. as the objects I've defined.
This is possible by simply calling Leaf.objects.all() and accessing the objects normally from Python code.
But here comes the trouble. For each such access, Django makes a separate SELECT query for the appropriate foreign key. This is obviously terribly slow and inefficient. I'd like to tell Django something like "hey, just fetch me all the data including the foreign keys at once, do the joins and all that stuff on the database side". Is that somehow possible?
Just use select_related():
Leaf.objects.select_related().all()
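If you know how deep the traversal goes, you can also name the lookups explicitly so Django only joins the levels you will actually read. A minimal sketch, assuming each level's foreign key field is called parent and the models have a name field:

# Joins parent and grandparent into the single SELECT (field names assumed).
leaves = Leaf.objects.select_related('parent__parent')
for leaf in leaves:
    print(leaf.parent.name, leaf.parent.parent.name)  # no extra queries here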
I'm currently working on a project where I handle both public and private information, both stored as different models in a common database.
I would like to split this database in two, one with the private model objects and another one with the public ones.
The thing is, both these models have a ForeignKey relationship with each other, and I've found conflicting answers on the internet about whether such relationships can work when the models live in two different databases.
So, is this possible? Is there a better approach for doing this?
Just to clarify why I want to do this, I want the project to be open source, therefore the public database should be public, but the sensitive information (users and passwords) should be kept private.
From Django docs:
Django doesn’t currently provide any support for foreign key or many-to-many relationships spanning multiple databases. If you have used a router to partition models to different databases, any foreign key and many-to-many relationships defined by those models must be internal to a single database.
This is because of referential integrity. In order to maintain a relationship between two objects, Django needs to know that the primary key of the related object is valid. If the primary key is stored on a separate database, it’s not possible to easily evaluate the validity of a primary key.
For possible solutions check out this discussion: https://stackoverflow.com/a/32078727/14209813
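If you do end up partitioning by model, the usual mechanism is a database router, and any foreign keys then have to stay internal to one database. A minimal sketch, assuming a hypothetical 'private' app holding the sensitive models and a second database alias also called 'private':

# settings.py (aliases assumed):
#   DATABASES = {'default': {...}, 'private': {...}}
#   DATABASE_ROUTERS = ['myproject.routers.PrivateRouter']

class PrivateRouter:
    """Route models from the hypothetical 'private' app to the 'private' database."""

    def db_for_read(self, model, **hints):
        return 'private' if model._meta.app_label == 'private' else None

    def db_for_write(self, model, **hints):
        return 'private' if model._meta.app_label == 'private' else None

    def allow_relation(self, obj1, obj2, **hints):
        # Relations are only allowed when both objects live in the same database.
        return obj1._state.db == obj2._state.db

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == 'private':
            return db == 'private'
        return db == 'default'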
I'm using Django and have a few models. They correlate to each other without any foreign keys, but I want to be able to select them in a centralized place, here are the models (without the inheritance and fields so tests are easy):
class ItemTypeOne:
    pass

class ItemOneExtra:
    pass

# -----------------------------

class ItemTypeTwo:
    pass

class ItemTwoExtra:
    pass

# ... and so on
What I've thought of using so far is a dict to map them, like so:
correlated_extra_model = {ItemTypeOne: ItemOneExtra, ItemTypeTwo: ItemTwoExtra}[ItemTypeOne]
This works, but I'm not sure if it's an acceptable approach.
Firstly, in a relational setting, your tables are related only if you have some sort of foreign key or one-to-one field pointing to another table. Without it, there is no relation at the database level; you may be enforcing this behaviour through your application logic.
Secondly, if you have to create an auxiliary model for each of your base models, I think it's an indicator that the current design is flawed and you need to rethink your models. Of course, this is based on my assumptions and they might not be applicable to your use case, so if you can, please share some more details about the problem statement.
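For what it's worth, if the pairing really is one extra model per base model, expressing it as a database relationship would remove the need for the dict. A minimal sketch using a OneToOneField, built from the names in the question:

from django.db import models

class ItemTypeOne(models.Model):
    pass

class ItemOneExtra(models.Model):
    # The mapping now lives in the database instead of a Python dict.
    item = models.OneToOneField(ItemTypeOne, on_delete=models.CASCADE,
                                related_name='extra')

# Usage: given an ItemTypeOne instance item_one, its counterpart is item_one.extra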
Well, I'm taking my first steps with Django and Django REST framework. The problem I face is that all the examples across the whole Internet are based on hard-coded models. But the whole concept of models frustrates me a little bit, because I'm used to dealing with data that comes from numerous sources (various relational databases and NoSQL stores, all that stuff). So I do not want to stick to a particular model with a fixed number of predefined fields; I want to specify them only at the moment a user goes to a particular page of my app.
Let's say I have a table or a collection in one of my databases which stores information about users. It has all kinds of fields (not just email, name and the like, as in all those examples across the web). So when a user goes to /users/, I connect to my database, get my table, set my cursor and populate my resulting dictionary with all the rows and fields I need. And the REST API does all the rest.
So, I need a "first-step" example wich starts from data, not from a model: you have a table "items" in your favorite database, when a user goes to /items/, he or she gets all data from that table. To make such simplistic api, you should do this and this... I need this kind of example.
I think the key is to use the models differently. If you use one-to-many or ForeignKey references when constructing your models, you can link different types of data together more dynamically and then access them from the parent object.
For example, for your user you could create a basic user model and reference it from many other models such as interests or occupation, and have those models store the very dynamic data.
When you have the root user model object, you can access its foreign key objects either by iterating through the dictionary of fields returned by the object or by accessing the reverse relations directly with model.reference_set.all().
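A minimal sketch of that shape, with model and field names that are assumptions rather than anything from the question: a small fixed User model plus a generic key/value model hanging off it.

from django.db import models

class User(models.Model):
    email = models.CharField(max_length=254)

class Attribute(models.Model):
    # Arbitrary per-user data stored as key/value rows instead of fixed columns.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    key = models.CharField(max_length=100)
    value = models.TextField()

# Reverse access goes through the default <model>_set manager:
# user.attribute_set.all() returns every key/value row for that user.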
I'm creating a model that refers to a model within a 3rd party package -- Celery (CrontabSchedule and PeriodicTask). My model (let's called it ScheduledRun) will contain a foreign key to a PeriodicTask.
I know that a cascade delete happens when I delete the object a foreign key points to: the rows referring to it will also get deleted. (Unless overridden by on_delete...)
But due to my situation of pointing ScheduledRun at a FK of a PeriodicTask, PeriodicTask won't be automatically deleted when I delete a ScheduledRun. (Nor should it as there might be other models pointing to that foreign key!)
So how can I clean up PeriodicTasks that are orphans, i.e. when no model instances point to them anymore?
I could add a post_delete signal and check it this way (this example deletes extraneous CrontabSchedules no longer associated with a PeriodicTask):
# periodictask below is actually a related periodictask_set,
# but in Django you refer to the empty set as 'periodictask=None'
CrontabSchedule.objects.filter(id=instance.crontab.id,
                               periodictask=None).delete()
But I'm not guaranteed there aren't other related relations that could cause a cascade drop.
I could subclass the table PeriodicTask as ScheduledRun .... but would rather not integrate that tightly with the 3rd party model.
It's almost as if I want a .delete(do_not_cascade=True) and, if it fails due to constraints, to just ignore the failure. If it succeeded, then it was an orphan. on_delete=DO_NOTHING is similar to this, but I only want it temporarily, for the scope of a single delete, and I don't want to modify the third-party package.
Are there other/better ways for dealing with this?
Here's my solution ... seems like it might be robust enough. My goal is to only delete the foreign key value if no other model instances still refer to it.
So, what I will try is a delete conditioned on each of the related keys being None:
# This is a class method to my ScheduledRun class that has
# a foreign key to a PeriodicTask. PeriodicTask has a
# FK to a CrontabSchedule, which I'd like to "trim"
# if nothing points to that FK anymore.
@classmethod
def _post_delete(cls, instance, **kwargs):
    instance.periodic_task.delete()
    # Delete the crontab if no other tasks point to it.
    # This could cause a cascade if we don't check all related...
    filter = dict(id=instance.crontab.id)
    for related in instance.crontab._meta.get_all_related_objects():
        filter[related.var_name] = None
    assert('id' in filter)
    assert('schedule' in filter)
    assert('periodictask' in filter)
    CrontabSchedule.objects.filter(**filter).delete()
It would be easier if I could say:
instance.crontab.delete(NO_CASCADE=True)
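For completeness, this is roughly how the handler above gets wired up; a minimal sketch, assuming _post_delete lives on the ScheduledRun class:

from django.db.models.signals import post_delete

# Run the orphan cleanup every time a ScheduledRun row is deleted.
post_delete.connect(ScheduledRun._post_delete, sender=ScheduledRun)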
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, i.e. one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the parent.children.all() calls.
Note 1: as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results and retrieve data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.
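A minimal sketch of the view side, with the view and template names being assumptions:

from django.shortcuts import render

def parent_list(request):
    # Two queries total: one for the parents, one batched query for all children.
    parents = Parent.objects.prefetch_related('children')
    for parent in parents:
        parent.children.all()  # served from the prefetch cache, no extra query
    return render(request, 'parent_list.html', {'parents': parents})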
If you ever need to work with more than two levels at once, consider a different approach to storing trees in the database, such as MPTT.
In a nutshell, it adds fields to your model that are maintained on every update and allow much more efficient retrieval.
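A minimal sketch, assuming the django-mptt package and a hypothetical Category model:

from django.db import models
from mptt.models import MPTTModel, TreeForeignKey

class Category(MPTTModel):
    name = models.CharField(max_length=50)
    parent = TreeForeignKey('self', null=True, blank=True,
                            related_name='children', on_delete=models.CASCADE)

# The extra tree fields MPTT maintains let calls like node.get_ancestors()
# or node.get_descendants() fetch a whole branch in a single query.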
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data you need is fetched in one statement. prefetch_related, by contrast, runs a separate query per relation and stitches the results together in Python, caching them on the objects.
The trick here is to "join in" only what you absolutely need, to reduce the performance penalty of the join. In other words, pre-select only the fields that you will actually read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer',
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went from 700+ queries down to just one. The Django Debug Toolbar is your friend when it comes to this sort of issue.