I've noticed that many SQLAlchemy tutorials use relationship() to "connect" multiple tables together, whether the relationship is one-to-one, one-to-many, or many-to-many. However, when using raw SQL, you are not able to define the relationships between tables explicitly, as far as I know.
In what cases is relationship() required and not required? Why do we have to explicitly define the relationship between tables in SQLAlchemy?
In SQL, tables are related to each other via foreign keys. In an ORM, models are related to each other via relationships. You're not required to use relationships, just as you are not required to use models (i.e. the ORM). Mapped classes give you the ability to work with tables as if they are objects in memory; along the same lines, relationships give you the ability to work with foreign keys as if they are references in memory.
You want to set up relationships for the same purpose as wanting to set up models: convenience. For this reason, the two go hand-in-hand. It is uncommon to see models with raw foreign keys but no relationships.
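To make the distinction concrete, here is a minimal sketch in SQLAlchemy 1.4+ style (table and class names are illustrative). The ForeignKey column is all the database ever sees; relationship() only adds the object-level convenience on top of it:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Object-level view of the foreign key below; purely a convenience.
    children = relationship("Child", back_populates="parent")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey("parent.id"))  # what SQL actually sees
    parent = relationship("Parent", back_populates="children")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(name="p1", children=[Child(name="c1")]))
    session.commit()
    parent = session.get(Parent, 1)
    # With relationship(): follow the reference like an in-memory attribute.
    print(parent.children[0].name)
    # Without relationship(): you would filter Child by parent_id yourself.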
I'm currently working on a project where I handle both public and private information, both stored as different models in a common database.
I would like to split this database in two, one with the private model objects and another one with the public ones.
The thing is, both these models have a ForeignKey relationship with each other, and I've found conflicting answers on the internet about whether these relationships can work even if the models are in two different databases.
So, is this possible? Is there a better approach for doing this?
Just to clarify why I want to do this, I want the project to be open source, therefore the public database should be public, but the sensitive information (users and passwords) should be kept private.
From Django docs:
Django doesn’t currently provide any support for foreign key or many-to-many relationships spanning multiple databases. If you have used a router to partition models to different databases, any foreign key and many-to-many relationships defined by those models must be internal to a single database.
This is because of referential integrity. In order to maintain a relationship between two objects, Django needs to know that the primary key of the related object is valid. If the primary key is stored on a separate database, it’s not possible to easily evaluate the validity of a primary key.
For possible solutions check out this discussion: https://stackoverflow.com/a/32078727/14209813
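As a rough illustration of the router approach discussed there, here is a minimal sketch; the app label "private_app" and the database aliases "default" and "private" are assumptions that would have to match your settings.DATABASES and DATABASE_ROUTERS configuration:

class PrivacyRouter:
    # Send models from the private app to the "private" database.
    route_app_labels = {"private_app"}

    def db_for_read(self, model, **hints):
        return "private" if model._meta.app_label in self.route_app_labels else "default"

    def db_for_write(self, model, **hints):
        return "private" if model._meta.app_label in self.route_app_labels else "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Cross-database foreign keys are unsupported, so only allow relations
        # between objects that live in the same database.
        return obj1._state.db == obj2._state.db

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.route_app_labels:
            return db == "private"
        return db == "default"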
I'm starting to work with Django. I've already made some models, but always with a "code-first" approach, so Django handled the table creation, etc. Right now I'm integrating an already existing database with the ORM and I've encountered some problems.
The database has a lot of many-to-many relationships, so there are quite a few tables linking two other tables. I ran the inspectdb command to let Django prepare some models for me. I revised them; it did a rather good job guessing the fields and relations. The thing is, I don't think I need those link tables in my models, because Django handles many-to-many relationships with ManyToManyField fields, but I want Django to use those link tables under the hood.
So my question is: Should I delete the models for the link tables and add ManyToManyFields to the corresponding models, or should I somehow use these models?
I don't want to mess up the database structure; it's quite heavily populated.
I'm using Postgres 9.5, Django 2.2.
In many cases it doesn't matter. If you would like to keep the code minimal then m2m fields are a good way to go. If you don't control the database structure it might be worth keeping the inspectdb schema in case you have to do it again after schema changes that you don't control. If the m2m link tables can grow properties of their own then you need to keep them as models.
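For a plain link table that holds just the two foreign keys, you can usually point a ManyToManyField straight at it with db_table; if a link table carries extra columns, keep it as an explicit model and wire it in with through. A rough sketch with hypothetical table and column names, using managed = False so Django leaves the existing schema alone (the db_table route assumes the link table's columns follow Django's default <model>_id naming):

from django.db import models

class Author(models.Model):
    name = models.TextField()

    class Meta:
        managed = False          # existing table; Django won't try to alter it
        db_table = 'author'

class Book(models.Model):
    title = models.TextField()
    # Case 1: the link table holds only book_id and author_id, so the
    # inspectdb model for it can be dropped and the m2m pointed at the table.
    authors = models.ManyToManyField(Author, db_table='book_author')

    # Case 2: if the link table had extra columns, you would keep its model
    # (say BookAuthor) and declare the field as
    #     models.ManyToManyField(Author, through='BookAuthor')

    class Meta:
        managed = False
        db_table = 'book'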
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, i.e. one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1: as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results and retrieve data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.
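Putting that together for the view in the question, a minimal sketch (assuming the related_name='children' from the models above and that Parent lives in the app's models module):

from django.http import HttpResponse

from .models import Parent

def parent_list(request):
    # Two queries in total, regardless of how many parents there are.
    parents = Parent.objects.prefetch_related('children')
    lines = []
    for parent in parents:                   # query 1: all parents
        for child in parent.children.all():  # served from the prefetch cache
            lines.append('%s: %s' % (parent.name, child.name))
    return HttpResponse('\n'.join(lines))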
If you ever need to work with more than two levels at once, consider a different approach to storing trees in the database, such as MPTT.
In a nutshell, it adds extra fields to your model that are kept up to date on writes and allow much more efficient retrieval.
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data you need is fetched in one statement, whereas prefetch_related runs a separate query per relationship and stitches the results together in Python.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
How or where is the "join" query in Django?
I don't think Django has "join", so how would I make a join?
Thanks
If you're using models, the select_related method will return the related objects for any foreign keys you have set up (up to a depth you specify) within that model.
Look into model relationships and accessing related objects.
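For instance, reusing the Parent/Child models from the earlier question (and their related_name='children'), a small sketch of both directions:

# Forward foreign key: select_related turns two queries into one JOIN.
child = Child.objects.select_related('parent').get(pk=1)
print(child.parent.name)        # no extra query; the parent came with the join

# Reverse side: just access the related objects through the manager.
parent = Parent.objects.get(pk=1)
for child in parent.children.all():   # one extra query (or none with prefetch_related)
    print(child.name)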
SQL Join queries are a hack because SQL doesn't have objects or navigation among objects.
Objects don't need "joins". Just access the related objects.
I'm using a hand-built (Postgres) database with Django. With inspectdb I was able to automatically create a model for it. The problem is that some tables have composite primary keys (multiple columns making up the key, for many-to-many relations) and they are not accessible via Django.
What's the best way to access these tables?
There is no way to use composite primary keys in Django's ORM as of now (up to v1.0.2).
I can only think of three solutions/workarounds:
1. There is a fork of Django with a composite-pk patch on GitHub that you might want to try.
2. You could use SQLAlchemy together with Django.
3. You could add a single-column primary key field to those tables.
Django does have support for many-to-many relationships. If you want to use a helper table to manage these relationships, the ManyToManyField takes a through argument which specifies the table to use. You can't model anything terribly complex this way, but it is good for most simple applications.
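A minimal sketch along the lines of the Django docs example, with illustrative model names (written in modern Django style with on_delete, which wasn't required back in 1.0.x), showing how the through table carries the extra data a plain ManyToManyField could not:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=100)

class Group(models.Model):
    name = models.CharField(max_length=100)
    members = models.ManyToManyField(Person, through='Membership')

class Membership(models.Model):
    # The helper table used "under the hood" for the relation above.
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    date_joined = models.DateField()   # extra column the plain m2m could not hold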