Nested chain vs duplicated information - python

There is a models.py with 4 models.
The standard definition is:
class Main(models.Model):
    stuff = models.IntegerField()

class Second(models.Model):
    nested = models.ForeignKey(Main)
    stuff = models.IntegerField()

class Third(models.Model):
    nested = models.ForeignKey(Second)
    stuff = models.IntegerField()

class Last(models.Model):
    nested = models.ForeignKey(Third)
    stuff = models.IntegerField()
There is also another variant of the Last model:
class Last(models.Model):
    nested1 = models.ForeignKey(Main)
    nested2 = models.ForeignKey(Second)
    nested = models.ForeignKey(Third)
    stuff = models.IntegerField()
Will the second variant save some database load?
The information in nested1 and nested2 duplicates fields in Second and Third, and it may even become outdated (fortunately not in my case, as existing data is never changed; only new data is added).
But I think it may save database load when I'm looking up all Last records for a certain Main record, or when I only need Main.id for a specific Last item.
Am I right?
Will it really reduce the load, or is there a better practice?

It all depends on how you access the data. By default, Django makes another query to the database when you access a foreign key. So if you want to make fewer queries, you can use select_related to fetch the related models in the same query as the main one.
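To make that concrete, here is a minimal sketch of what a single joined query across the non-duplicated chain looks like. It uses Python's built-in sqlite3 module rather than Django so that it runs on its own; the app_* table names, the nested_id column names and the sample rows are assumptions for illustration. In the ORM, the equivalent lookups would be along the lines of Last.objects.filter(nested__nested__nested=main) and Last.objects.select_related('nested__nested__nested').

```python
import sqlite3

# Hypothetical tables standing in for the four models in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_main   (id INTEGER PRIMARY KEY, stuff INTEGER);
    CREATE TABLE app_second (id INTEGER PRIMARY KEY, nested_id INTEGER, stuff INTEGER);
    CREATE TABLE app_third  (id INTEGER PRIMARY KEY, nested_id INTEGER, stuff INTEGER);
    CREATE TABLE app_last   (id INTEGER PRIMARY KEY, nested_id INTEGER, stuff INTEGER);

    INSERT INTO app_main   VALUES (1, 10), (2, 20);
    INSERT INTO app_second VALUES (1, 1, 11), (2, 2, 21);
    INSERT INTO app_third  VALUES (1, 1, 12), (2, 2, 22);
    INSERT INTO app_last   VALUES (1, 1, 13), (2, 1, 14), (3, 2, 23);
""")

# One query walks Last -> Third -> Second; Second.nested_id is the Main id.
CHAIN = """
    FROM app_last
    JOIN app_third  ON app_last.nested_id = app_third.id
    JOIN app_second ON app_third.nested_id = app_second.id
"""

def last_ids_for_main(main_id):
    """All Last rows that belong to one Main row, in a single query."""
    sql = "SELECT app_last.id" + CHAIN + "WHERE app_second.nested_id = ? ORDER BY app_last.id"
    return [row[0] for row in conn.execute(sql, (main_id,))]

def main_id_for_last(last_id):
    """The Main id for one Last row, in a single query."""
    sql = "SELECT app_second.nested_id" + CHAIN + "WHERE app_last.id = ?"
    return conn.execute(sql, (last_id,)).fetchone()[0]

ids_under_main_1 = last_ids_for_main(1)
owner_of_last_3 = main_id_for_last(3)
```

Both lookups need only one round trip to the database without any duplicated columns; whether the extra joins are cheap enough depends on table sizes and indexes, so it is worth measuring before denormalising.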

Related

Django prefetch_related and N+1 - How is it solved?

I am sitting with a query looking like this:
# Get the amount of kilo attached to products
product_data = {}
for productSpy in ProductSpy.objects.all():
    product_data[productSpy.product.product_id] = productSpy.kilo # RERUN
I do not see how I would be able to use prefetch_related on my last line. The examples in the docs are very simplified and somehow make sense, but I do not understand the whole concept well enough to work this out myself. Could someone please explain what is being done and how? I find this very important to understand, and this is the first N+1 problem I have run into.
Thank you up front for your time.
models.py
class ProductSpy(models.Model):
    created_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)

    def __str__(self):
        return self.kilo

class Product(models.Model):
    product_id = models.IntegerField()
    name = models.CharField(max_length=150)

    def __str__(self):
        return self.name
Django fetches related objects lazily at runtime:
each access to productSpy.product runs a query against the product table using productSpy.product_id.
Because every query incurs I/O latency, this code is highly inefficient. Using prefetch_related fetches the products for all the ProductSpy objects in one extra query, resulting in better performance.
# Get the amount of kilo attached to products
product_data = {}
# prefetch_related returns a new queryset, so keep the result.
# kilo is a plain field on ProductSpy, not a relation, so it needs no prefetch.
product_spies = ProductSpy.objects.prefetch_related('product')
for productSpy in product_spies:
    product_data[productSpy.product.product_id] = productSpy.kilo # RERUN
When one writes productSpy.product and the related object has not already been fetched, Django automatically makes a query to the database to get the related Product instance. Hence, if ProductSpy.objects.all() returns N instances, accessing productSpy.product in a loop makes N more queries. This is what we call the N + 1 problem.
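The query counts can be seen directly in a small standalone sketch. It uses the built-in sqlite3 module instead of Django, so the table layout, the product_fk column name, the integer kilo column and the sample rows are all assumptions for illustration; the first function mimics the lazy per-row lookup, the second the joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT);
    CREATE TABLE productspy (id INTEGER PRIMARY KEY, product_fk INTEGER, kilo INTEGER);
    INSERT INTO product VALUES (1, 100, 'apple'), (2, 200, 'pear'), (3, 300, 'plum');
    INSERT INTO productspy VALUES (1, 1, 5), (2, 2, 7), (3, 3, 9);
""")

# Record every statement SQLite runs from here on.
executed = []
conn.set_trace_callback(executed.append)

def select_count():
    return sum(1 for sql in executed if sql.lstrip().upper().startswith("SELECT"))

def n_plus_one():
    """One query for the spies, then one more query per spy for its product."""
    executed.clear()
    result = {}
    spies = conn.execute("SELECT product_fk, kilo FROM productspy").fetchall()
    for product_fk, kilo in spies:
        row = conn.execute(
            "SELECT product_id FROM product WHERE id = ?", (product_fk,)
        ).fetchone()
        result[row[0]] = kilo
    return result, select_count()

def joined():
    """A single query that joins both tables."""
    executed.clear()
    rows = conn.execute(
        "SELECT product.product_id, productspy.kilo "
        "FROM productspy JOIN product ON productspy.product_fk = product.id"
    ).fetchall()
    return dict(rows), select_count()

lazy_result, lazy_queries = n_plus_one()
join_result, join_queries = joined()
```

With three ProductSpy rows the lazy version issues four SELECT statements and the joined version one, while both build the same dictionary.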
Moving further: although you can use prefetch_related here (it will use 2 queries in your case), it would be better to use select_related [Django docs], which uses a SQL JOIN and gets you the related instances in a single query:
product_data = {}
queryset = ProductSpy.objects.select_related('product')
for productSpy in queryset:
    product_data[productSpy.product.product_id] = productSpy.kilo # No extra queries as we used `select_related`
Note: There seems to be a problem with your logic here, though: multiple ProductSpy instances can have the same Product, so your loop might overwrite some values.
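If the intent is to total the kilo per product rather than keep only the last value seen, one possible fix is to accumulate instead of assign. This is a sketch under that assumption: the question does not show the kilo field, so it is treated as a number here, and rows stands for (product_id, kilo) pairs such as those returned by values_list('product__product_id', 'kilo').

```python
from collections import defaultdict

def total_kilo_per_product(rows):
    """Sum kilo for each product_id instead of overwriting earlier values."""
    totals = defaultdict(int)
    for product_id, kilo in rows:
        totals[product_id] += kilo
    return dict(totals)

# Hypothetical sample data: product 100 appears twice.
rows = [(100, 5), (100, 7), (200, 9)]
product_data = total_kilo_per_product(rows)
```

If the values should be kept separately instead, collecting them into a list per product with defaultdict(list) works the same way.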

Flexible database models for users to define extra columns to database tables in Django

I am trying to build a tool that, at a simple level, tries to analyse how to buy a flat. DB = POSTGRES
So the model basically is:
class Property(models.Model):
    address = CharField(max_length = 200)
    price = IntegerField()
    user = ForeignKey(User) # user who entered the property in the database
    #..
    #..
    # some more fields that are common across all flats

    # However, users might have their own way of analysing
    # one user might want to put
    estimated_price = IntegerField() # his own estimate of the price, different from the zoopla or rightmove listing price
    time_to_purchase = IntegerField() # his own estimate on how long it will take to purchase

    # another user might want to put other fields
    # might be his purchase process requires sorting or filtering based on these two fields
    number_of_bedrooms = IntegerField()
    previous_owner_name = CharField()
How do I give such flexibility to users? They should be able to sort, filter and query their own rows (in the Property table) by these custom fields. The only option I can think of right now is the Postgres JSONField.
Any advice? I am surprised this is not readily solved in Django. I am sure lots of other people have come across this problem already.
Thanks
Edit: As the comments point out, a JSON field is a better idea in this case.
Simple. Use Relations.
Create a model called attributes.
It will have a foreign key to a Property, a name field and a value field.
Something like,
class Attribute(models.Model):
    property = models.ForeignKey(Property)
    name = models.CharField(max_length=50)
    value = models.CharField(max_length=150)
Create an object each for all custom attributes of a property.
When querying the database, use select_related or prefetch_related for faster responses and fewer database operations.
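One caveat with this layout is that every custom value is stored as text, so sorting needs an explicit cast. The sketch below shows this with the built-in sqlite3 module so it runs on its own; the table names, column names and sample rows are assumptions for illustration, not the schema Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (id INTEGER PRIMARY KEY, address TEXT, price INTEGER);
    CREATE TABLE attribute (
        id INTEGER PRIMARY KEY,
        property_id INTEGER,
        name TEXT,
        value TEXT
    );
    INSERT INTO property VALUES (1, 'A St', 300), (2, 'B St', 250), (3, 'C St', 400);
    INSERT INTO attribute (property_id, name, value) VALUES
        (1, 'number_of_bedrooms', '2'),
        (2, 'number_of_bedrooms', '10'),
        (3, 'number_of_bedrooms', '3'),
        (1, 'estimated_price', '280');
""")

BASE = """
    SELECT property.address
    FROM property
    JOIN attribute ON attribute.property_id = property.id
    WHERE attribute.name = ?
"""

def sorted_as_text(name):
    """Sorts on the raw text value, so '10' comes before '2'."""
    sql = BASE + "ORDER BY attribute.value"
    return [row[0] for row in conn.execute(sql, (name,))]

def sorted_as_number(name):
    """Casts the text value to an integer before sorting."""
    sql = BASE + "ORDER BY CAST(attribute.value AS INTEGER)"
    return [row[0] for row in conn.execute(sql, (name,))]

by_text = sorted_as_text("number_of_bedrooms")
by_number = sorted_as_number("number_of_bedrooms")
```

The need to know each attribute's type at query time is one reason a JSON column, as mentioned in the edit above, can be the more comfortable choice on PostgreSQL.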

Error saving django model with OneToOne field - Column specified twice

This question has been asked before, but the answers there do not solve my problem.
I am using a legacy database, and nothing in it can be changed.
Here are my Django models, with all but the relevant fields stripped out. (In my actual code, class Meta has managed = False.)
class AppCosts(models.Model):
    id = models.CharField(primary_key=True)
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()
    appcost = models.OneToOneField(AppCosts, db_column='id')

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    job_application = models.CharField()
    appcost = models.OneToOneField(AppCosts, to_field='id', db_column='job_application')
    app = models.OneToOneField(AppDefs, to_field='id', db_column='job_application')
The OneToOne fields work fine for querying, and I get the correct result using select_related()
But when I create a new record for the JobHistory table, when I call save(), I get:
DatabaseError: (1110, "Column 'job_application' specified twice")
I am using Django 1.4 and I do not quite get how this OneToOneField works. I can't find any example where the primary keys are named differently and the relation has these particular semantics.
I need the django model that would let me do this SQL:
select job_history.job_name, job_history.job_application, app_costs.cost from job_history, app_costs where job_history.job_application = app_costs.id;
You have defined appcost and app to have the same underlying database column, job_application, which is also the name of another existing field: so three fields share the same column. That makes no sense at all.
OneToOneFields are just foreign keys constrained to a single value on both ends. If you have foreign keys from JobHistory to AppCosts and AppDefs, then presumably you have actual columns in your database that contain those foreign keys. Those are the values you should be using for db_column on those fields, not "job_application".
Edit: I'm glad you said you didn't design this schema, because it is pretty horrible: you won't have any foreign key constraints, for example, which makes referential integrity impossible. But never mind, we can actually achieve what you want, more or less.
There are various issues with what you have, but the main one is that you don't need the separate "job_application" field at all. That is, as I said earlier, the foreign key, so let it be that. Also note that it should be an actual foreign key field, not a one-to-one, since there are many histories to one app.
One constraint that we can't achieve easily in Django is to have the same field acting as FK for two tables. But that doesn't really matter, since we can get to AppCosts via AppDefs.
So the models could just look like this:
class AppCosts(models.Model):
    # db_column (not db_field) is the option that names the underlying column
    app = models.OneToOneField('AppDefs', primary_key=True, db_column='id')
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    app = models.ForeignKey(AppDefs, db_column='job_application')
Note that I've moved the one-to-one between Costs and Defs onto AppCosts, since it seems to make sense to have the canonical ID in Defs.
Now, given a JobHistory instance, you can do history.app to get the app instance, history.app.appcosts.cost to get the app cost, and use history.app_id to get the underlying app ID from the job_application column.
If you wanted to reproduce that SQL output more exactly, something like this would now work:
JobHistory.objects.values_list('job_name', 'app_id', 'app__appcosts__cost')

In Django, how do I join a table with a composite primary key to another table?

Here's what I have in the way of models:
class Lead(models.Model):
    user = models.ForeignKey(User, related_name='leads')
    site = models.ForeignKey(Site)
    ...

class UserDemographic(models.Model):
    user = models.ForeignKey(User, related_name='user_demographic')
    site = models.ForeignKey(Site)
    ...

    class Meta:
        unique_together = 'user', 'site'

In the first model, we record leads on a per-site, per-user basis. There can be multiple leads from the same user on a given site. In the second model, we store each user's demographic data. For each site, each user has only one record of demographic data.
What I would like to be able to do is tack this demographic data onto our leads query. Each lead has both user and site, and I want to grab the data in the demographic table and pair it to the corresponding lead. So basically what I need here is a left join that will unite these two. This is simple enough to do when there is only one foreign key, but I have no clue how to make it work when there are two foreign keys on which to join the tables.
Any ideas on this? Is there even a way to do this in Django, or will I have to resort to a raw query? Thanks!
Django's ORM doesn't let you do this natively, but you can minimise your raw sql by using the extra method. Something like this should work:
Lead.objects.extra(
    tables=['appname_userdemographic'],
    where=['appname_userdemographic.user_id=appname_lead.user_id',
           'appname_userdemographic.site_id=appname_lead.site_id'],
    select={'country': 'appname_userdemographic.country'})
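Note that joining through extra tables and where conditions behaves like an inner join, so leads without a demographic row drop out of the result. If a true left join is required, the raw SQL would look roughly like the sketch below. It runs against an in-memory SQLite database so it can be tried standalone; the table names follow the snippet above, while the sample rows and the exact columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appname_lead (id INTEGER PRIMARY KEY, user_id INTEGER, site_id INTEGER);
    CREATE TABLE appname_userdemographic (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        site_id INTEGER,
        country TEXT,
        UNIQUE (user_id, site_id)
    );
    INSERT INTO appname_lead VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1);
    INSERT INTO appname_userdemographic VALUES (1, 1, 1, 'NO');
""")

def leads_with_country(join_type):
    """Pair each lead with its demographic row, matching on both keys."""
    sql = f"""
        SELECT l.id, d.country
        FROM appname_lead AS l
        {join_type} JOIN appname_userdemographic AS d
            ON d.user_id = l.user_id AND d.site_id = l.site_id
        ORDER BY l.id
    """
    return conn.execute(sql).fetchall()

left_rows = leads_with_country("LEFT")    # keeps lead 3, with country NULL
inner_rows = leads_with_country("INNER")  # drops lead 3
```

The same SELECT could be passed to Lead.objects.raw(), as long as it selects the lead's primary key.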
Alternately, you could refactor your models so you don't need the composite key - for example, create a UserSite model and link your lead and demographic models to that.
class UserSite(models.Model):
    user = models.ForeignKey(User)
    site = models.ForeignKey(Site)

class Lead(models.Model):
    # ForeignKey rather than OneToOneField: a user can have several leads on one site
    user_site = models.ForeignKey(UserSite)
    ...

class UserDemographic(models.Model):
    user_site = models.OneToOneField(UserSite)
    ...

Then you can use select_related, like so:
Lead.objects.select_related('user_site__userdemographic')

How to do this join query in Django

In Django, I have two models:
class Product(models.Model):
    name = models.CharField(max_length = 50)
    categories = models.ManyToManyField(Category)

class ProductRank(models.Model):
    product = models.ForeignKey(Product)
    rank = models.IntegerField(default = 0)
I put the rank into a separate table because every view of a page will cause the rank to change and I was worried that all these writes would make my other (mostly read) queries slow down.
I gather a list of Products from a simple query:
cat = Category.objects.get(pk = 1)
products = Product.objects.filter(categories = cat)
I would now like to get all the ranks for these products. I would prefer to do it all in one go (using a SQL join) and was wondering how to express that using Django's query mechanism.
What is the right way to do this in Django?
This can be done in Django, but you will need to restructure your models a little bit differently:
class Product(models.Model):
    name = models.CharField(max_length=50)
    product_rank = models.OneToOneField('ProductRank')

class ProductRank(models.Model):
    rank = models.IntegerField(default=0)

Now, when fetching Product objects, you can follow the one-to-one relationship in one query using the select_related() method:
Product.objects.filter([...]).select_related()
This will produce one query that fetches product ranks using a join:
SELECT "example_product"."id", "example_product"."name", "example_product"."product_rank_id", "example_productrank"."id", "example_productrank"."rank" FROM "example_product" INNER JOIN "example_productrank" ON ("example_product"."product_rank_id" = "example_productrank"."id")
I had to move the relationship field between Product and ProductRank to the Product model because it looks like select_related() follows foreign keys in one direction only.
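An alternative that keeps the original models is to start the query from the rank side, since ProductRank already holds the foreign key. In the ORM this would be something like ProductRank.objects.filter(product__categories=cat).select_related('product'). The sketch below shows the kind of join this corresponds to, using the built-in sqlite3 module so it runs standalone; the table names, column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_categories (product_id INTEGER, category_id INTEGER);
    CREATE TABLE productrank (id INTEGER PRIMARY KEY, product_id INTEGER, rank INTEGER);
    INSERT INTO product VALUES (1, 'kettle'), (2, 'toaster'), (3, 'lamp');
    INSERT INTO product_categories VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO productrank VALUES (1, 1, 5), (2, 2, 3), (3, 3, 8);
""")

def ranks_for_category(category_id):
    """Product names and ranks for one category, fetched in a single query."""
    sql = """
        SELECT p.name, r.rank
        FROM productrank AS r
        JOIN product AS p ON r.product_id = p.id
        JOIN product_categories AS pc ON pc.product_id = p.id
        WHERE pc.category_id = ?
        ORDER BY p.id
    """
    return conn.execute(sql, (category_id,)).fetchall()

category_1_ranks = ranks_for_category(1)
```

This keeps the frequently written rank column in its own table, which was the reason given for splitting the models in the first place.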
I haven't checked but:
products = Product.objects.filter(categories__pk=1).select_related()
Should grab every instance.
For Django 2.1
From documentation
This example retrieves all Entry objects with a Blog whose name is 'Beatles Blog':
Entry.objects.filter(blog__name='Beatles Blog')
Doc URL
https://docs.djangoproject.com/en/2.1/topics/db/queries/
Add a call to the QuerySet's select_related() method, though I'm not positive that grabs references in both directions, it is the most likely answer.
