I'm trying to figure out the best way to develop constraints around a set of models:
class ParentDefinition(Base):
    child_definition = relationship('ChildDefinition', uselist=False)

class ChildDefinition(Base):
    parent_definition_id = Column(ForeignKey('parent_definition.id'))

class Parent(Base):
    parent_definition = relationship('ParentDefinition')
    parent_definition_id = Column(ForeignKey('parent_definition.id'))

class Child(Base):
    parent = relationship('Parent')
    parent_id = Column(ForeignKey('parent.id'))
    child_definition = relationship('ChildDefinition')
    child_definition_id = Column(ForeignKey('child_definition.id'))
I want to ensure that Child.child_definition_id == Child.parent.parent_definition.child_definition.id, but I'm not sure of the best way to do that.
I know that this probably isn't the best model design but there are pre-existing architecture considerations I'm working around.
Any help would be appreciated!
According to the documentation here, you need to attach a constraint to a table or a column. As far as I know, SQL constraints can't be set up across multiple tables.
If you really want to go through constraints, you can use another table that handles the association between the constrainable fields of each entity (the ids of each item) and add your constraint there (id_parent_def = id_child_def).
If you don't want to create this table, you can always use listeners to check the data before it is inserted, but this may be inefficient for your needs.
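For illustration, a minimal sketch of the listener idea using SQLAlchemy's mapper events (attribute names follow the models above; that the related objects are loadable at flush time is an assumption):

from sqlalchemy import event

@event.listens_for(Child, 'before_insert')
@event.listens_for(Child, 'before_update')
def validate_child_definition(mapper, connection, target):
    # Compare the directly stored id with the one implied by the parent chain.
    expected = target.parent.parent_definition.child_definition.id
    if target.child_definition_id != expected:
        raise ValueError('Child.child_definition_id does not match the parent definition')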
Another way would be to use database-specific functions or triggers that check the condition for you; you would write plain SQL inside your migration files to create them (or create them manually in your DB).
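For example, a rough sketch of the trigger route, assuming PostgreSQL, an Alembic migration, and the table/column names implied by the models above:

from alembic import op

def upgrade():
    op.execute("""
        CREATE OR REPLACE FUNCTION enforce_child_definition() RETURNS trigger AS $$
        BEGIN
            IF NEW.child_definition_id IS DISTINCT FROM (
                SELECT cd.id
                FROM parent p
                JOIN child_definition cd
                  ON cd.parent_definition_id = p.parent_definition_id
                WHERE p.id = NEW.parent_id
            ) THEN
                RAISE EXCEPTION 'child_definition does not match the parent definition';
            END IF;
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql
    """)
    op.execute("""
        CREATE TRIGGER child_definition_check
        BEFORE INSERT OR UPDATE ON child
        FOR EACH ROW EXECUTE PROCEDURE enforce_child_definition()
    """)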
Related
I'm using Django and have a few models. They correlate to each other without any foreign keys, but I want to be able to select them from a centralized place. Here are the models (without the inheritance and fields, so tests are easy):
class ItemTypeOne:
    pass

class ItemOneExtra:
    pass

# -----------------------------

class ItemTypeTwo:
    pass

class ItemTwoExtra:
    pass

# ... and so on
What I've thought of using so far is a dict to map them, like so:
correlated_extra_model = {ItemTypeOne: ItemOneExtra, ItemTypeTwo: ItemTwoExtra}[ItemTypeOne]
This works, but I'm not sure if it's acceptable.
Firstly, in a relational setting, your tables are related only if you have some sort of foreign key or one-to-one field pointing to another table. Without one, there is no relation at the DB level; you may be enforcing this behaviour through your application logic.
Secondly, if you have to create an auxiliary model for each of your base models, I think it's an indicator that the current design is flawed and you need to rethink your models. Of course, this is based on my assumptions, which may not apply to your use case, so if you can, please share some more details about the problem.
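If linking the tables at the DB level is an option, a minimal sketch of what this would look like (field and related_name choices are illustrative, not from the original post):

from django.db import models

class ItemTypeOne(models.Model):
    name = models.CharField(max_length=50)

class ItemOneExtra(models.Model):
    item = models.OneToOneField(ItemTypeOne, on_delete=models.CASCADE,
                                related_name='extra')

# some_item.extra now returns the related ItemOneExtra row, and the mapping
# dict from the question is no longer needed.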
I'd like to create a 1:n relationship between two tables dynamically. My DB model is mapped via SQLAlchemy, but due to some special features of my application I cannot use the default declarative way.
E.g.
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, autoincrement=True, primary_key=True)
    flag = Column(Boolean)

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, autoincrement=True, primary_key=True)
    foo_id = Column(Integer, ForeignKey('foo.id'))
    # declarative version:
    # foo = relationship(Foo)
So I want to add a relationship named "foo" to the mapped class "Bar" after Bar was defined and SQLAlchemy did its job of defining a mapper, etc.
Update 2017-09-05: Why is this necessary for me? (I thought I could omit this because I think it mostly distracts from the actual problem to solve, but since there were comments about it...)
First of all, I don't have a single database but hundreds/thousands. Data in old databases must not be altered in any way, but I want a single code base to access old data (even though the data structure and calculation rules changed significantly).
Currently we use multiple model definitions. Later definitions extend/modify previous ones. Often we manipulate SQLAlchemy models dynamically. We try not to have code in mapped classes because we think it will be much harder to ensure the correctness of that code after a table has been changed many times (the code must work at every intermediate step).
In many cases we extend tables (mapped classes) programmatically in model X after they were initially defined in model X-1. Adding columns to an existing SQLAlchemy ORM class is manageable. Now we are adding a new reference column to an existing table, and a relationship() provides a nicer Python API.
Well, my question above is again a nice example of SQLAlchemy's super powers (and my limited understanding):
Bar.__mapper__.add_property('foo', relationship('Foo'))
I was likely unable to get this working initially because some of my surrounding code mixed adding relationships and columns. Also, there is one important difference from declaring columns:
Column('foo', Integer)
For columns the first parameter can be the column name, but you cannot do this for relationships: relationship('foo', 'Foo') triggers exceptions when passed to .add_property().
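For completeness, a small usage sketch once the property has been added (assuming the Foo/Bar classes above and an engine that is already set up):

from sqlalchemy.orm import Session, relationship

Bar.__mapper__.add_property('foo', relationship(Foo))

session = Session(bind=engine)           # engine is assumed to exist already
session.add(Bar(foo=Foo(flag=True)))     # the new attribute behaves like a declared one
session.commit()
print(session.query(Bar).first().foo.flag)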
I searched a lot and did not find what I'm looking for.
What would be the best design for a model class in Django?
To extend User, would it be better to have one class with many attributes, or to break it into several classes, each with a few attributes? I'm using the Django ORM.
Say I have a class called Person that extends User. Would it be better to do this:
class Person(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attributeN = ...
Or, would it be better to do this:
class PersonContact(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonAddress(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonHobby(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...
Each of my views would (probably) use the data from only the smaller classes.
Over time, the number of attributes can grow.
What I want is to do it once and touch it as little as possible afterwards.
Various attributes can be left unfilled by the user; they are not required.
The number of users is indefinite (it can be a lot).
I'm concerned about long-term performance and maintainability.
Can someone explain which option would be better for my code, and why?
And what would be better in general (fewer classes with more attributes, or more classes with fewer attributes) when using the Django ORM?
Is it better if my views use the data of only one model class, or does it make no (or little) difference?
Edit:
In the rush of writing I used bad class names. None of these attributes are many-to-many fields; the User will have only one value for each attribute, or it will be blank.
The number of attributes can expand over time, but not by much.
Put any data that is specific to only one User directly in the model. This would probably be things like "Name", "Birthday", etc.
Some things might be better served by a separate model, though. For example, multiple people might have the same Hobby, or one User might have multiple Hobbies. Make this a separate class and use a ForeignKey or ManyToManyField as necessary.
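For instance, a minimal sketch of the Hobby case (assuming a many-to-many fits; names are illustrative):

from django.contrib.auth.models import User
from django.db import models

class Hobby(models.Model):
    name = models.CharField(max_length=50)
    users = models.ManyToManyField(User, related_name='hobbies', blank=True)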
Whatever you choose, the real trick is to optimize the number of database queries. The django-debug-toolbar is helpful here.
Splitting up your models will by default result in multiple database queries, so make sure to read up on select_related to condense that down to one.
Also take a look at the defer method when retrieving a queryset. You can exclude some of those fields that aren't necessary if you know you won't use them in a particular view.
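A short sketch of how select_related and defer fit together, using the PersonContact model from the question (the field names are the question's placeholders):

# One query fetches the contact rows together with their users, and skips a
# column this particular view never reads:
contacts = PersonContact.objects.select_related('user').defer('attribute3')
for contact in contacts:
    print(contact.user.username, contact.attribute1)   # no extra queries here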
I think it's all up to your interface.
If you have to expose ALL the data for a user on a single page and you have a single, large model, you will end up with a single SQL join instead of one join for each smaller table.
Conversely, if you just need a few of these attributes, you might get a small gain in memory usage by joining the user table with a smaller one, because you don't have to load a lot of attributes that aren't going to be used (though this can also be mitigated with values(); documentation here).
Also, if your attributes are not mandatory, you should at least have an idea of how many of them are going to be filled in. Having a large table of almost-empty records could be a waste of space. Maybe a problem, maybe not; it depends on your hardware resources.
Lastly, if you really think that your attributes can expand a lot, you could try the EAV (entity-attribute-value) approach.
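A bare-bones sketch of the EAV idea (model and field names are illustrative only):

from django.contrib.auth.models import User
from django.db import models

class Attribute(models.Model):
    name = models.CharField(max_length=50, unique=True)

class UserAttributeValue(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    attribute = models.ForeignKey(Attribute, on_delete=models.CASCADE)
    value = models.TextField(blank=True)

    class Meta:
        unique_together = ('user', 'attribute')

# Only the attributes a user actually filled in take up rows, at the cost of
# an extra join (and weaker typing) whenever you read them.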
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, i.e. one query to fetch the parent list, then one query to fetch the children of each parent.
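For reference, the naive pattern looks roughly like this (a sketch, not the asker's actual code):

parents = Parent.objects.all()          # 1 query
for parent in parents:
    children = parent.child_set.all()   # 1 query per parent -> n more queries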
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1: as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.
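Putting it together, a sketch of the view side (model names as above):

parents = Parent.objects.prefetch_related('children')   # 2 queries in total
for parent in parents:
    for child in parent.children.all():   # served from the prefetched cache
        print(parent.name, child.name)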
If you ever need to work with more than two levels at once, you can consider a different approach: storing trees in the DB using MPTT.
In a nutshell, it adds extra fields to your model which are maintained on writes and allow much more efficient retrieval.
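A minimal sketch with django-mptt, assuming the package is installed (the model name is illustrative):

from django.db import models
from mptt.models import MPTTModel, TreeForeignKey

class Category(MPTTModel):
    name = models.CharField(max_length=50)
    parent = TreeForeignKey('self', null=True, blank=True,
                            related_name='children', on_delete=models.CASCADE)

# e.g. node.get_descendants() fetches a whole subtree in a single query.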
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data that you need is fetched in one statement; prefetch_related instead runs one extra query per relationship and caches the results.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
For example, suppose there are models like these:
class User(Base):
    photo_id = Column(ForeignKey('photo.id'))

class Group(Base):
    photo_id = Column(ForeignKey('photo.id'))

class Photo(Base):
    __tablename__ = 'photo'
    user = relationship('User', backref='photo')
    group = relationship('Group', backref='photo')
But in the last model, the relationships to User and Group are not great, because in one case the first relationship will be None and in the other case the second will be None (a photo's owner can only be a user or a group, not both)... And if there are more than two models with foreign keys to the Photo model, the situation gets even worse.
How do I model such a relationship correctly?
Thanks in advance!
If your User and Group are not stored in the same table, there is nothing wrong with defining two relationships. The two relationships mean two different SQL queries, and you actually need those two different queries in your case.
If your User and Group can be stored in the same table, you can use inheritance and create a relationship to the parent table:
http://docs.sqlalchemy.org/en/latest/orm/inheritance.html
or create a view for that
http://docs.sqlalchemy.org/en/rel_0_7/core/schema.html#reflecting-views
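A rough sketch of the inheritance route (joined-table inheritance with an assumed common base called Party; names are illustrative, Base is the same declarative base as in the question):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class Party(Base):
    __tablename__ = 'party'
    id = Column(Integer, primary_key=True)
    kind = Column(String(20))
    photo_id = Column(ForeignKey('photo.id'))
    photo = relationship('Photo', backref='owners')
    __mapper_args__ = {'polymorphic_on': kind, 'polymorphic_identity': 'party'}

class User(Party):
    __tablename__ = 'user'
    id = Column(ForeignKey('party.id'), primary_key=True)
    __mapper_args__ = {'polymorphic_identity': 'user'}

class Group(Party):
    __tablename__ = 'group'
    id = Column(ForeignKey('party.id'), primary_key=True)
    __mapper_args__ = {'polymorphic_identity': 'group'}

# Photo no longer needs user- or group-specific relationships; photo.owners
# holds whichever Party subclasses point at it.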
Use table inheritance: http://docs.sqlalchemy.org/en/rel_0_7/orm/extensions/declarative.html#joined-table-inheritance
I recommend this slide deck to you: http://www.slideshare.net/tyler4long/quickorm . It is about quick_orm, which is based on SQLAlchemy. You will see how the same problem is resolved by means of table inheritance.
Slide 7: many models should have a relationship with "comments".
Slide 8: add a parent class named "Commentable" to solve the problem.
The syntax is different from SQLAlchemy, but you can get the main idea.
I do not think there is one correct way of modeling this kind of relationship. Cardinality and navigability are also factors to consider.
For a solution very similar to your modeling problem, see the Generic Associations examples. The examples might look somewhat complicated at first, but if you read Mike's blog post on Polymorphic Associations with SQLAlchemy it should be pretty clear what is happening there. You will end up with a somewhat different model, and navigating back from Photo to the correct parent through a single attribute (parent or owner) might not be achievable, but do you really need to navigate the relationship from the Photo side?