Embedding vs. linking in MongoDB: when to embed and when to link? - python

I read this page but didn't understand when to use embedding and when to use linking. I have a Django project that uses MongoDB. In my models.py file I have the following models:
class Projects(models.Model):
    projectName = models.CharField(max_length=100, unique=True, db_index=True)
    projectManager = EmbeddedModelField('Users')
class Teams(models.Model):
    teamType = models.CharField(max_length=100)
    teamLeader = EmbeddedModelField('Users')
    teamProject = EmbeddedModelField('Projects')
    objects = MongoDBManager()
class Users(models.Model):
    name = models.CharField(max_length=100, unique=True)
    designation = models.CharField(max_length=100)
    teams = ListField(EmbeddedModelField('Teams'))
class Tasks(models.Model):
    title = models.CharField(max_length=150)
    description = models.CharField(max_length=1000)
    priority = models.CharField(max_length=20)
    Status = models.CharField(max_length=20)
    assigned_to = EmbeddedModelField('Users')
    assigned_by = EmbeddedModelField('Users')
    child_tasks = ListField()
    parent_task = models.CharField(max_length=150)
My question is: if we embed, do we have to update the object in every model? For example, if I want to update the name of a user, would I have to run an update for the Projects, Teams, Users, and Tasks models, or would linking be better in my case?

In your example, yes. With embedding, every document that embeds a user holds its own copy, so changing a user's name means updating all of those documents as an extra step. Linking (referencing) is more appropriate in your situation. It costs an extra step at query time, but given your particular "business logic", it is the better trade-off.
Generally, if a given document needs to be accessed from a number of different places, it makes sense to reference it rather than embed it. The same applies when a document changes frequently.
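The trade-off described above can be sketched with plain Python dictionaries standing in for MongoDB documents. This is only an illustration under assumptions: the sample data, the `teamLeader_id` field, and the helper functions `rename_embedded` and `leader_name` are hypothetical and not part of the question's models or of any MongoDB API.

```python
import copy

# Embedded design: each team document carries its own copy of the leader.
user = {"_id": 1, "name": "alice", "designation": "dev"}
embedded_teams = [
    {"teamType": "backend", "teamLeader": copy.deepcopy(user)},
    {"teamType": "frontend", "teamLeader": copy.deepcopy(user)},
]


def rename_embedded(teams, user_id, new_name):
    """Rename a user by touching every document that embeds a copy."""
    touched = 0
    for team in teams:
        if team["teamLeader"]["_id"] == user_id:
            team["teamLeader"]["name"] = new_name
            touched += 1
    return touched


# Referenced design: team documents store only the user's id.
users_by_id = {1: {"_id": 1, "name": "alice", "designation": "dev"}}
linked_teams = [
    {"teamType": "backend", "teamLeader_id": 1},
    {"teamType": "frontend", "teamLeader_id": 1},
]


def leader_name(team):
    """Resolve the reference; this is the extra lookup at read time."""
    return users_by_id[team["teamLeader_id"]]["name"]


# Embedded: one write per copy. Referenced: a single write.
embedded_writes = rename_embedded(embedded_teams, 1, "alicia")
users_by_id[1]["name"] = "alicia"
```

With two teams, the embedded rename has to touch two documents, while the referenced rename touches one and every team sees the new name through `leader_name`.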

First, conceptually, name your model classes as singular nouns.
Users should be User, Teams should be Team, and so on.
Think of the model as the mold from which multiple objects are made. The User model produces users, which are stored in a table called Users where each document/row is one User object.
Now, regarding your question, hymloth is exactly right. To make a field a reference to a document instead of an embedded copy, change it to store the id of a user in the users collection. That way you store only an id to look up rather than a copy of the user document. When you change the referenced document, the change is visible everywhere it is referenced. (A typical relational association.)
I didn't see a field for that in Django-mongoDB either, but maybe you can use the traditional Django ForeignKey field for this purpose. I don't know if you can mix and match, so give it a shot.
For example, your Teams class would have a field like this:
teamLeader = ForeignKey(User)
Let me know if that works.

Related

Django models - is it a good practice to create models that only link two tables?

Let's say that I want to develop a simple todo-list app. This is what the models.py might look like.
PRIORITIES = ["LOW", "MEDIUM", "HIGH", "URGENT"]
class User(models.Model):
    username = models.CharField(max_length=20)
    password = models.CharField(max_length=100)
    email = models.EmailField()
class Task(models.Model):
    text = models.TextField()
    dueDate = models.DateField()
    priority = models.CharField(choices=PRIORITIES, default="LOW")
class UserTask(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    task = models.ForeignKey(Task, on_delete=models.CASCADE)
Here, the UserTask model was created only with a view to reducing redundancies in the database.
Is it a good practice? I don't think that this is what models should be used for.
Quoting the question: "Here, the UserTask model was created only with a view to reducing redundancies in the database."
If I understand correctly, a Task belongs to a single User, at least based on your comment:
"I read about ManyToMany, the problem is that this relation in question is in fact OneToMany."
In that case, I do not see how you are "reducing" redundancy in the database. In fact, you create extra duplication: you now need to keep the Task and its user = ForeignKey(..) in sync with UserTask. Every create, edit, and removal therefore has an impact, and some of these operations circumvent Django's signal tooling, which makes keeping the two in sync quite hard, if not impossible.
You do not need a separate table to query the relation in reverse. By default, Django puts a database index on a ForeignKey, so the database can efficiently retrieve the Tasks that belong to one (or several) users, and a JOIN with the user table becomes more efficient as well.
One thus typically defines the ForeignKey directly on Task:
class Task(models.Model):
    text = models.TextField()
    dueDate = models.DateField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    priority = models.CharField(choices=PRIORITIES, default="LOW")
At the Python/Django level, Django also provides convenience to obtain the tasks of a User. If you want to retrieve all the Tasks of some_user, you can query with:
some_user.task_set.all()
or you can filter on the tasks, like:
some_user.task_set.filter(text__icontains='someword')
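The claim that an indexed foreign key column is enough for the reverse lookup, with no link table, can be checked at the database level. The following is a minimal sketch using Python's built-in sqlite3 module rather than Django itself; the table names, the index name `task_user_id_idx`, and the sample rows are assumptions made for illustration and are not the names Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES app_user(id)
);
-- Stand-in for the index Django adds to a ForeignKey column by default.
CREATE INDEX task_user_id_idx ON task (user_id);
INSERT INTO app_user (id, username) VALUES (1, 'ann'), (2, 'ben');
INSERT INTO task (text, user_id)
VALUES ('write report', 1), ('buy milk', 1), ('call bank', 2);
""")

# Reverse lookup: all tasks of one user, straight from the task table.
REVERSE_LOOKUP = "SELECT text FROM task WHERE user_id = ?"
ann_tasks = sorted(text for (text,) in conn.execute(REVERSE_LOOKUP, (1,)))

# Ask SQLite how it would run the lookup; the last column describes each step.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + REVERSE_LOOKUP, (1,))
plan = " ".join(row[-1] for row in plan_rows)
```

Here `ann_tasks` holds the two tasks of user 1, and `plan` names the index on `user_id`, which is the database-side counterpart of `some_user.task_set.all()`.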
You can make the name of the relation in reverse more convenient as well, by specifying a related_name in the ForeignKey, like:
class Task(models.Model):
    user = models.ForeignKey(
        User,
        on_delete=models.CASCADE,
        related_name='tasks'
    )
    text = models.TextField()
    dueDate = models.DateField()
    priority = models.CharField(choices=PRIORITIES, default="LOW")
In that case, you query the Tasks of a User with:
some_user.tasks.all()
There is, however, one scenario where Django creates an implicit model that links two models together: when you define a ManyToManyField. Storing an array of identifiers in a column is hard (or even impossible) in many relational databases, and even where it is possible, FOREIGN KEY constraints can typically no longer be guaranteed.
If the many-to-many relation between two models carries extra data (for example, a timestamp for when two users became friends), you define an explicit model with two ForeignKey relations (plus the extra attributes) and use it as the through model.

Django/Python: Best Practice/Advice on handling external IDs for Multiple Multi-directional External APIs

So this is more of a conceptual question, and I am really looking for someone to help point me in the right direction. I am building a middleware platform where I will be pulling data in from inbound channels, manipulating it, and then pushing it out the other door to outbound channels. I will need to store the external id for each of these records, but the kicker is that records will be pulled from multiple sources and then pushed to multiple sources. A single record in my system will need to be tied to any number of external ids.
A quick model to work with:
class record(models.Model):
    # id
    Name = models.CharField(max_length=255, help_text="")
    Description = models.CharField(max_length=255, help_text="")
    # 'category' is quoted because the class is defined below.
    # on_delete is required since Django 2.0; CASCADE here is an assumption.
    category_id = models.ForeignKey('category', on_delete=models.CASCADE)
class category(models.Model):
    # id
    name = models.CharField(max_length=255, help_text="")
    description = models.CharField(max_length=255, help_text="")
class channel(models.Model):
    # id
    name = models.CharField(max_length=255, help_text="")
    inbound = models.BooleanField()
    outbound = models.BooleanField()
Obviously, I cannot add a new field to every model every time I add a new integration, that would be soooo 90s. The obvious would be to create another model to simply store the channel and record id with the unique id, and maybe this is the answer.
class external_ref(models.Model):
    model_name = models.CharField(max_length=255)
    internal_id = models.IntegerField()
    external_id = models.IntegerField()
    channel_id = models.IntegerField()
    class Meta:
        unique_together = ('model_name', 'internal_id',)
While my example holds only 4 models, I will be integrating records from 10-20 different models, so something I could implement at a global level would be optimal. Other things I have considered:
Overriding the base model class to create a new "parent" class that also holds a unique alphanumeric representation of every record in the db.
Creating an abstract model to do the same.
Possibly storing a JSON reference with channel : external_id that I could ping on every record to see if it has an external reference.
I'm really an open book on this, and the internet has become increasingly overwhelming to sift through. Any best practices or advice would be much appreciated. Thanks in advance.
I have this exact issue, and yes, there is not much information on the web about using Django this way. Here's what I'm doing. I haven't used it long enough to determine if it's "the best" way.
I have a class IngestedModel which tracks the source of the incoming objects as well as their external ids. This is also where you would put a unique_together constraint (on external_id and source).
class RawObject(TimeStampedModel):
    """
    A table to keep track of all objects ingested into the database and where they came from
    """
    data = models.JSONField()
    source = models.ForeignKey(Source, on_delete=models.PROTECT)
class IngestedModel(models.Model):
    external_id = models.CharField(max_length=50)
    source = models.ForeignKey(Source, on_delete=models.CASCADE)  # 1 or 0
    raw_objects = models.ManyToManyField(RawObject, blank=True)
    class Meta:
        abstract = True
Then every model that is created from ingested data inherits from this IngestedModel. That way you know its source, and you can use each external object for more than one internal object and vice versa.
class Customer(IngestedModel):
    ...
class Order(IngestedModel):
    ...
etc.
Because IngestedModel is abstract, there is no "IngestedModel" table. Instead, every model gets its own fields for source and external_id and its own many-to-many reference to raw objects. This feels more compositional than inherited (there are no child tables), which seems better to me. I would also love to hear feedback on the "right" way to do this.
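The uniqueness constraint on external_id and source mentioned above is what makes repeated ingestion safe. Below is a minimal database-level sketch using Python's built-in sqlite3 module instead of Django; the `customer` table layout, the channel names, and the helper `ingest_customer` are hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    source TEXT NOT NULL,        -- channel the record came from
    external_id TEXT NOT NULL,   -- id as the external system knows it
    UNIQUE (source, external_id)
);
""")


def ingest_customer(source, external_id, name):
    """Insert a customer, or update it if this (source, external_id) was seen before."""
    row = conn.execute(
        "SELECT id FROM customer WHERE source = ? AND external_id = ?",
        (source, external_id),
    ).fetchone()
    if row is None:
        cursor = conn.execute(
            "INSERT INTO customer (name, source, external_id) VALUES (?, ?, ?)",
            (name, source, external_id),
        )
        return cursor.lastrowid
    conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, row[0]))
    return row[0]


first = ingest_customer("channel_a", "42", "Acme")
again = ingest_customer("channel_a", "42", "Acme Ltd")  # same external record
other = ingest_customer("channel_b", "42", "Other Co")  # same id, other source
```

The same external id arriving from a different source produces a new internal row, while a repeat from the same source updates the existing one. The UNIQUE constraint is the backstop if two writers race past the lookup.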

Python Django - Accessing foreignkey data

I am trying to figure out how to get data from my models with ForeignKey relationships. I have the following models.py:
class wine(models.Model):
    name = models.CharField(max_length=256)
    year = models.CharField(max_length=4)
    description = models.TextField()
class collection(models.Model):
    name = models.CharField(max_length=50)
    description = models.TextField(null=True)
class collection_detail(models.Model):
    collection = models.ForeignKey(collection)
    wine = models.ForeignKey(wine)
    quantity = models.IntegerField()
    price_paid = models.DecimalField(max_digits=5, decimal_places=2)
    value = models.DecimalField(max_digits=5, decimal_places=2)
    bottle_size = models.ForeignKey(bottle_size)
Wine is a basic table, Collection Detail is a table that references a wine and adds user specific data (like price paid) to it. Collection is a group of collection_detail objects.
I am struggling on how to access data within these models. I can easily display data from a specific model, but when viewing a particular collection, I cannot access the collection_detail.wine.name data. Do I need to write specific queries to do this? Can I access this data from the templating language? My data model appears correct when viewed via the admin, I can add the data and relationships I need.
Thanks for any help!
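Use collection_detail_set on a collection instance to obtain a queryset of all collection_detail objects that point to that collection. Each of those objects exposes its wine directly (for example detail.wine.name), and the same attribute access works in templates, where you write collection_detail_set.all without parentheses.
If you want a one-to-one relationship instead of a one-to-many (which is what ForeignKey gives you), change
collection = models.ForeignKey(collection)
to
collection = models.OneToOneField(collection)
and access the collection's collection_detail simply through the collection_detail attribute of a collection instance.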

How to specify data in models.ManyToManyField(Foo,Bar)

I'm working on an application where I want the user roles to be entered into a table in the database.
class Role(models.Model):
    name = models.CharField(max_length=40)
    details = models.TextField()
class Foo(models.Model):
    name = models.CharField(max_length=40)
    details = models.TextField()
class Bar(models.Model):
    name = models.CharField(max_length=40)
    details = models.TextField()
    foos = models.ManyToManyField(Foo, related_name="foo_binding_and_roles")
I want the schema of the table foo_binding_and_roles to resemble:
{|id|Foo_id|Bar_id|Role_id|}
I'm still scratching my head on this, wondering if it's even possible. I know if I write a custom manager I could pull it off but I want to avoid that.
The idea behind this scheme is to assign users permissions on a relational basis, when bar meets foo and foo meets bar.
I don't understand exactly what you are trying to do, but this line, {|id|Foo_id|Bar_id|Role_id|}, makes me think you could either define a model with those fields or set up a through model with that extra field.
To simplify this so you can think about it more clearly, don't use many-to-many associations; build the entire schema from many-to-one associations. If I understood you correctly, what you're looking for is possible with a three-way association class.
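A three-way association is simply a table with three many-to-one links, which matches the {|id|Foo_id|Bar_id|Role_id|} layout from the question. Here is a hedged sketch at the database level using Python's built-in sqlite3 module; the sample rows and the helper `roles_for` are hypothetical, and in Django the same shape would be a model with three ForeignKey fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The association table: one row per (foo, bar, role) combination.
CREATE TABLE foo_binding_and_roles (
    id INTEGER PRIMARY KEY,
    foo_id INTEGER NOT NULL REFERENCES foo(id),
    bar_id INTEGER NOT NULL REFERENCES bar(id),
    role_id INTEGER NOT NULL REFERENCES role(id),
    UNIQUE (foo_id, bar_id, role_id)
);
INSERT INTO role (id, name) VALUES (1, 'editor'), (2, 'viewer');
INSERT INTO foo (id, name) VALUES (1, 'foo-1');
INSERT INTO bar (id, name) VALUES (1, 'bar-1'), (2, 'bar-2');
INSERT INTO foo_binding_and_roles (foo_id, bar_id, role_id)
VALUES (1, 1, 1), (1, 2, 2);
""")


def roles_for(foo_id, bar_id):
    """Return the role names granted where this foo meets this bar."""
    rows = conn.execute(
        """SELECT role.name
           FROM foo_binding_and_roles AS b
           JOIN role ON role.id = b.role_id
           WHERE b.foo_id = ? AND b.bar_id = ?
           ORDER BY role.name""",
        (foo_id, bar_id),
    )
    return [name for (name,) in rows]
```

The role depends on the pair: the same foo is an editor with one bar and a viewer with another, and a pair with no row has no roles at all.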

How to perform queries in Django following double-join relationships (or: How to get around Django's restrictions on ManyToMany "through" models?)

There must be a way to do this query through the ORM, but I'm not seeing it.
The Setup
Here's what I'm modelling: one Tenant can occupy multiple rooms and one User can own multiple rooms. So Rooms have an FK to Tenant and an FK to User. Rooms are also maintained by a (possibly distinct) User.
That is, I have these (simplified) models:
class Tenant(models.Model):
    name = models.CharField(max_length=100)
class Room(models.Model):
    owner = models.ForeignKey(User)
    maintainer = models.ForeignKey(User)
    tenant = models.ForeignKey(Tenant)
The Problem
Given a Tenant, I want the Users owning a room which they occupy.
The relevant SQL query would be:
SELECT auth_user.id, ...
FROM tenants_tenant, tenants_room, auth_user
WHERE tenants_tenant.id = tenants_room.tenant_id
AND tenants_room.owner_id = auth_user.id;
Getting any individual value off the related User objects can be done with, for example, my_tenant.rooms.values_list('owner__email', flat=True), but getting a full queryset of Users is tripping me up.
Normally, one way to solve it would be to set up a ManyToMany field on my Tenant model pointing at User, with TenantRoom as the 'through' model. That won't work in this case, though, because the TenantRoom model has a second (unrelated) ForeignKey to User (see "restrictions"). Plus, it seems like needless clutter on the Tenant model.
Doing my_tenant.rooms.values_list('user', flat=True) gets me close, but returns a ValuesListQuerySet of user IDs rather than a queryset of the actual User objects.
The Question
So: is there a way to get a queryset of the actual model instances, through the ORM, using just one query?
Edit
If there is, in fact, no way to do this directly in one query through the ORM, what is the best (some combination of most performant, most idiomatic, most readable, etc.) way to accomplish what I'm looking for? Here are the options I see:
Subselect:
users = User.objects.filter(id__in=my_tenant.rooms.values_list('user'))
Subselect through Python (see Performance considerations for the reasoning behind this):
user_ids = my_tenant.rooms.values_list('user', flat=True)
users = User.objects.filter(id__in=list(user_ids))
Raw SQL:
User.objects.raw("""SELECT auth_user.*
    FROM tenants_tenant, tenants_room, auth_user
    WHERE tenants_tenant.id = tenants_room.tenant_id
    AND tenants_room.owner_id = auth_user.id""")
Others...?
The proper way to do this is with related_name:
class Tenant(models.Model):
    name = models.CharField(max_length=100)
class Room(models.Model):
    owner = models.ForeignKey(User, related_name='owns')
    maintainer = models.ForeignKey(User, related_name='maintains')
    tenant = models.ForeignKey(Tenant)
Then you can do this:
jrb = User.objects.create(username='jrb')
bill = User.objects.create(username='bill')
bob = models.Tenant.objects.create(name="Bob")
models.Room.objects.create(owner=jrb, maintainer=bill, tenant=bob)
User.objects.filter(owns__tenant=bob)
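One caveat worth checking: a filter that follows a relation like this is executed as a JOIN, so a user who owns several rooms occupied by the same tenant comes back once per room unless the query is made distinct (in Django, by appending .distinct()). The sketch below reproduces this with Python's built-in sqlite3 module; the table names follow the SQL in the question, while the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE tenants_tenant (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tenants_room (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES auth_user(id),
    maintainer_id INTEGER NOT NULL REFERENCES auth_user(id),
    tenant_id INTEGER NOT NULL REFERENCES tenants_tenant(id)
);
INSERT INTO auth_user (id, username) VALUES (1, 'jrb'), (2, 'bill');
INSERT INTO tenants_tenant (id, name) VALUES (1, 'Bob');
-- jrb owns two rooms occupied by Bob; bill only maintains them.
INSERT INTO tenants_room (owner_id, maintainer_id, tenant_id)
VALUES (1, 2, 1), (1, 2, 1);
""")

JOIN_SQL = """
    SELECT {distinct} auth_user.username
    FROM auth_user
    JOIN tenants_room ON tenants_room.owner_id = auth_user.id
    WHERE tenants_room.tenant_id = ?
"""

# Plain join: one result row per matching room.
with_duplicates = [r[0] for r in conn.execute(JOIN_SQL.format(distinct=""), (1,))]

# DISTINCT collapses the repeats to one row per owner.
owners = [r[0] for r in conn.execute(JOIN_SQL.format(distinct="DISTINCT"), (1,))]
```

The maintainer never appears in either result because the join goes through owner_id only, which is the role the separate related_name values play on the Django side.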
