Does Django ManyToManyField create table with a redundant index? - python

If I have a model Foo that has a simple M2M field to model Bar:
class Foo(Model):
    bar = ManyToManyField(Bar)
Django seems to create a table foo_bar which has the following indices:
index 1: primary, unique (id)
index 2: unique (foo_id, bar_id)
index 3: non_unique (foo_id)
index 4: non_unique (bar_id)
I recall from my basic knowledge of SQL that if a query filters on foo_id, index 2 would suffice (since the left-most column of a composite index can be used for lookup), so index 3 seems redundant.
Am I correct to assume that index 3 takes up index space while offering no benefit, and that I'm better off using a through table and manually creating a unique index on (foo_id, bar_id), plus, optionally, another index on (bar_id) if needed?

The key to understanding how a many-to-many association is represented in the database is to realize that each row of the junction table (in this case, foo_bar) connects one row from the left table (foo) with one row from the right table (bar). Each pk of "foo" can be copied many times to "foo_bar"; each pk of "bar" can also be copied many times to "foo_bar". But the same pair of fk's in "foo_bar" can only occur once.
So if "foo_bar" had a unique index on only one of the two columns (the pk of "foo" or of "bar"), each value could occur only once, and the relation would no longer be many-to-many.
For example we have two models (e-commerce): Product, Order.
Each product can be in many orders and one order can contain many products.
class Product(models.Model):
    ...
class Order(models.Model):
    products = models.ManyToManyField(Product, through='OrderedProduct')
class OrderedProduct(models.Model):
    # Each (order, product) pair should occur only once, so within one order you can
    # calculate the price for each product (taking the ordered amount into account).
    # At the same time you can look up, in a template or view, all orders that contain the same product.
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    amount = models.PositiveSmallIntegerField()  # amount of ordered products
    price = models.IntegerField()  # just an int in this simple example
    def save(self, *args, **kwargs):
        # assumes Product defines a price field (elided above)
        self.price = self.product.price * self.amount
        super().save(*args, **kwargs)

For anyone else who is still wondering: this is a known issue, and there is a ticket for it on the Django bug tracker:
https://code.djangoproject.com/ticket/22125
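The left-most-column reasoning from the question can be checked against a query planner. The following is a minimal sketch using SQLite from the Python standard library as a stand-in; the index names are made up for illustration, and other databases (MySQL, PostgreSQL) have their own planners, so treat this as an illustration of the principle rather than proof for your backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo_bar (
        id INTEGER PRIMARY KEY,
        foo_id INTEGER NOT NULL,
        bar_id INTEGER NOT NULL
    );
    -- index 2 from the question: unique (foo_id, bar_id)
    CREATE UNIQUE INDEX foo_bar_uniq ON foo_bar (foo_id, bar_id);
    -- index 4 from the question: non-unique (bar_id)
    CREATE INDEX foo_bar_bar_id ON foo_bar (bar_id);
    """
)


def query_plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# No separate index on foo_id exists, yet the lookup can use the composite index.
foo_plan = query_plan("SELECT bar_id FROM foo_bar WHERE foo_id = 1")
# bar_id is not the left-most column of the composite index, so it needs its own.
bar_plan = query_plan("SELECT foo_id FROM foo_bar WHERE bar_id = 1")
print(foo_plan)
print(bar_plan)
```

In this sketch the foo_id lookup is served by the composite unique index alone, which is the situation the question describes for index 3.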

Related

How to get the latest (distinct) records filtered by a non unique field in Django

I'll demonstrate by using an example. This is the model (the primary key is implicit):
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
I have a list of skus, I need to get the latest descriptions for all skus in the filter list that are written in the table for the model Item. Latest item == greatest id.
I need a way to annotate the latest description per sku:
Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id")
but this won't work for various reasons (excluding the missing aggregate function).
Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).latest("-id")
Or use this:
Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id").reverse()[0]
I used postgres ArrayAgg aggregate function to aggregate the latest description like so:
from django.contrib.postgres.aggregates import ArrayAgg
class ArrayAggLatest(ArrayAgg):
    template = "(%(function)s(%(expressions)s ORDER BY id DESC))[1]"
Item.objects.filter(sku__in=skus).values("sku").annotate(descr=ArrayAggLatest("description"))
The aggregate function collects all descriptions for each sku, ordered by descending id of the original table, and takes the first element (PostgreSQL arrays are 1-based, so [1] is the first element and [0] yields NULL).
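For comparison, the same "latest description per sku" result can be expressed in plain SQL with a correlated subquery on the greatest id. The sketch below uses SQLite from the standard library with a hypothetical item table and sample rows; in the Django ORM a similar shape can be built with Subquery and OuterRef, but that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, description TEXT)"
)
conn.executemany(
    "INSERT INTO item (sku, description) VALUES (?, ?)",
    [("A", "old A"), ("B", "only B"), ("A", "new A"), ("C", "ignored")],
)


def latest_descriptions(skus):
    """Return (sku, description) for the row with the greatest id per sku."""
    placeholders = ", ".join("?" for _ in skus)
    sql = f"""
        SELECT i.sku, i.description
        FROM item AS i
        WHERE i.sku IN ({placeholders})
          AND i.id = (SELECT MAX(j.id) FROM item AS j WHERE j.sku = i.sku)
        ORDER BY i.sku
    """
    return conn.execute(sql, list(skus)).fetchall()


result = latest_descriptions(["A", "B"])
print(result)  # [('A', 'new A'), ('B', 'only B')]
```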
The answers from @M.J.GH.PY and @dekomote were not correct.
If you have a model:
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
then first() and last() order an otherwise unordered queryset by primary key, so you don't need to annotate anything. Note that each call returns a single result for the whole filter list, not one per sku. You can:
get the last object:
Item.objects.filter(sku__in=list_of_skus).last()
get the last value of description:
Item.objects.filter(sku__in=list_of_skus).values_list('description', flat=True).last()
Both variants give you a None if a queryset is empty.

How to get related model_set on querying with filter

Let's say I have 2 models
class Shipment(models.Model):
    # ...
class ShipmentTimeline(models.Model):
    shipment = models.ForeignKey(
        Shipment, on_delete=models.CASCADE, related_name="shipmenttimeline")
I want to get the related ShipmentTimeline objects along with the filtered list of Shipments in one go. For now, I'm querying the shipments like this:
Shipment.objects.filter(
    # some filters
).only('id')
This gives me the IDs. If I then call get() for each one, read its related set, and append the results to a new list, the for loops will get messy. Is there a better way to get data shaped like this?
qs = [{'Shipment 1': ['ShipmentTimeline_x', 'ShipmentTimeline_y']}, ...]
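One common way to avoid a query per shipment is to fetch the filtered shipments once, fetch all their timeline rows in a second query, and group them in Python. Django's prefetch_related (for example prefetch_related('shipmenttimeline') with the related_name above) works along these lines. The sketch below shows the idea with SQLite from the standard library; the status and event columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shipment (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE shipmenttimeline (
        id INTEGER PRIMARY KEY,
        shipment_id INTEGER NOT NULL REFERENCES shipment (id),
        event TEXT
    );
    INSERT INTO shipment (id, status) VALUES (1, 'open'), (2, 'open'), (3, 'closed');
    INSERT INTO shipmenttimeline (shipment_id, event) VALUES
        (1, 'created'), (1, 'picked up'), (3, 'created');
    """
)


def shipments_with_timelines(status):
    """Map each matching shipment id to the list of its timeline events."""
    # Query 1: the filtered shipments.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM shipment WHERE status = ? ORDER BY id", (status,)
        )
    ]
    grouped = {shipment_id: [] for shipment_id in ids}
    if ids:
        # Query 2: every related timeline row, fetched in one go.
        placeholders = ", ".join("?" for _ in ids)
        rows = conn.execute(
            "SELECT shipment_id, event FROM shipmenttimeline "
            f"WHERE shipment_id IN ({placeholders}) ORDER BY id",
            ids,
        )
        for shipment_id, event in rows:
            grouped[shipment_id].append(event)
    return grouped


result = shipments_with_timelines("open")
print(result)  # {1: ['created', 'picked up'], 2: []}
```

The number of queries stays at two regardless of how many shipments match, and shipments without timeline rows still appear with an empty list.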

Use a model field to query another model field in django

I have two models in our django app
class Transaction(models.Model):
    amount = models.IntegerField()
class Registration(models.Model):
    transactions = models.ManyToManyField(Transaction)
    price = models.IntegerField()
Now I would like to make a lookup like:
Registration.objects.filter(reg__price==transaction__amount)
Previously we used the following approach:
Registration has a property is_paid that computes whether a transaction with an equal amount exists:
[r for r in Registration.objects.filter(...) if r.is_paid]
This is of course very query-heavy and inefficient.
I wonder whether there is a better way to do this.
Any hint is appreciated :)
You can use an F expression for such a query:
from django.db.models import F
Registration.objects.filter(price=F('transactions__amount'))
This will filter all Registration instances whose price is equal to one of their transactions' amount. If you want the sum of all transaction amounts to be equal to or greater than the registration price, you can use annotations to aggregate each registration's Sum:
from django.db.models import F, Sum
paid_registrations = (
    Registration.objects
    .annotate(ta=Sum('transactions__amount'))  # annotate with the sum of transaction amounts
    .filter(price__lte=F('ta'))  # keep those whose price is <= that sum
)
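To make the difference between the two filters concrete, here is a rough SQL equivalent of each, run against SQLite from the standard library. The table names (registration, txn, registration_transactions) and sample data are hypothetical, and the SQL is a hand-written approximation rather than what Django would emit (for instance, Django would not add DISTINCT on its own and would use a LEFT OUTER JOIN for the annotation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE registration (id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE txn (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE registration_transactions (
        registration_id INTEGER NOT NULL,
        txn_id INTEGER NOT NULL
    );
    INSERT INTO registration (id, price) VALUES (1, 100), (2, 100), (3, 50);
    INSERT INTO txn (id, amount) VALUES (1, 100), (2, 60), (3, 40), (4, 10);
    -- reg 1: one exact payment; reg 2: two partial payments; reg 3: underpaid
    INSERT INTO registration_transactions VALUES (1, 1), (2, 2), (2, 3), (3, 4);
    """
)

JOINS = """
    FROM registration AS r
    JOIN registration_transactions AS rt ON rt.registration_id = r.id
    JOIN txn AS t ON t.id = rt.txn_id
"""

# Analogue of filter(price=F('transactions__amount')): one transaction matches exactly.
exact_match = [
    row[0]
    for row in conn.execute(
        f"SELECT DISTINCT r.id {JOINS} WHERE r.price = t.amount ORDER BY r.id"
    )
]

# Analogue of annotate(ta=Sum(...)).filter(price__lte=F('ta')): the total covers the price.
sum_covers_price = [
    row[0]
    for row in conn.execute(
        f"SELECT r.id {JOINS} GROUP BY r.id, r.price "
        "HAVING r.price <= SUM(t.amount) ORDER BY r.id"
    )
]

print(exact_match)       # [1]
print(sum_covers_price)  # [1, 2]
```

Registration 2 is paid in two instalments, so only the aggregate version picks it up.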

Create model for MySQL view and join on it

I'm doing some prototyping and have a simple model like this
class SampleModel(models.Model):
    user_id = models.IntegerField(default=0, db_index=True)
    staff_id = models.IntegerField(default=0, db_index=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)
    objects = AsOfManager()
Now we need to do queries that require a self join, which written in raw SQL are simply something like this:
SELECT X.* FROM no_chain_samplemodel as X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id) AS Y
ON (X.user_id = Y.user_id and X.timestamp = Y.timestamp);
This query should return, for each user_id, the latest row by timestamp. Each of these "chains" (rows sharing a user_id) could potentially have thousands of rows.
I could use raw SQL, but then I lose composability; I would like to get back another queryset.
At the same time it would be nice to make writing raw SQL easier, so I thought I could use a database view.
The view could be something like this:
CREATE VIEW no_chain_sample_model_with_max_date AS SELECT user_id AS id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id;
So the model that refers to the view could be simply like this:
class SampleModelWithMaxDate(models.Model):
    class Meta:
        managed = False
        db_table = 'no_chain_sample_model_with_max_date'
    id = models.IntegerField(default=0, primary_key=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)
However, there are a few problems:
Even with managed = False, './manage.py makemigrations' still creates a migration for this table. I even tried leaving the migration in place but replacing the model creation with raw SQL that creates the view, but no luck.
I now need select_related to join the two tables in a query, but how should I do that?
I tried a foreign key on SampleModel like this:
by_date = models.ForeignKey(SampleModelWithMaxDate, null=True)
but this doesn't work either:
OperationalError: (1054, "Unknown column 'no_chain_sample_model_with_max_date.by_date_id' in 'field list'")
So in general I'm not even sure this is possible. I can see other people using models backed by views just to query the view's model on its own, and that works for me too, but is it possible to do anything smarter than that?
Thanks
I couldn't find an ORM method that gets what you want in one queryset, but it can be done with two:
First, we get the max timestamp for each user:
from django.db.models import Max
latest_timestamps = (
    SampleModel.objects.values('user_id')
    .annotate(max_ts=Max('timestamp'))
    .values('max_ts')
)
Here values('user_id') works as a GROUP BY operation.
Now we get all the instances of SampleModel with exactly those timestamps:
qs = SampleModel.objects.filter(timestamp__in=latest_timestamps)
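One thing to watch with this approach: it matches on the timestamp alone, whereas the SQL in the question joins on both user_id and timestamp. If one user's older row happens to share a timestamp with another user's latest row, the timestamp-only filter returns it too. The sketch below shows the difference with SQLite from the standard library; the table name and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samplemodel (id INTEGER PRIMARY KEY, user_id INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO samplemodel (user_id, timestamp) VALUES (?, ?)",
    [
        (1, "2020-01-01"),
        (1, "2020-01-02"),  # id 2: latest row for user 1
        (2, "2020-01-02"),  # id 3: same timestamp, but NOT the latest for user 2
        (2, "2020-01-03"),  # id 4: latest row for user 2
    ],
)

# Analogue of filter(timestamp__in=latest_timestamps): matches on timestamp only.
timestamp_only = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM samplemodel
        WHERE timestamp IN (SELECT MAX(timestamp) FROM samplemodel GROUP BY user_id)
        ORDER BY id
        """
    )
]

# The self-join from the question: matches on the (user_id, timestamp) pair.
pair_match = [
    row[0]
    for row in conn.execute(
        """
        SELECT x.id FROM samplemodel AS x
        JOIN (SELECT user_id, MAX(timestamp) AS timestamp
              FROM samplemodel GROUP BY user_id) AS y
          ON x.user_id = y.user_id AND x.timestamp = y.timestamp
        ORDER BY x.id
        """
    )
]

print(timestamp_only)  # [2, 3, 4]
print(pair_match)      # [2, 4]
```

If timestamps are effectively unique across users, both give the same rows; otherwise the pair-wise match is the safer reading of "latest row per user".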
PostgreSQL-specific answer:
You could combine order_by and distinct to achieve what you want:
SampleModel.objects.order_by('user_id', '-timestamp').distinct('user_id')
Breaking it down:
# order by user_id, and in decreasing order of timestamp
qs = SampleModel.objects.order_by('user_id', '-timestamp')
# keep distinct rows by user_id; this retains the first entry for each user,
# and since rows are further ordered by decreasing timestamp, that first
# entry is the most recently timestamped row for the user.
qs = qs.distinct('user_id')

Selecting distinct values from a column in Peewee

I am looking to select all distinct values from one column using Peewee.
For example, if I had the table
Organization Year
company_1 2000
company_1 2001
company_2 2000
....
I want to return just the unique values in the organization column (i.e. company_1 and company_2).
I had assumed this was possible using the distinct option as documented at http://docs.peewee-orm.com/en/latest/peewee/api.html#SelectQuery.distinct
My current code:
organizations_returned = organization_db.select().distinct(organization_db.organization_column).execute()
for item in organizations_returned:
    print(item.organization_column)
This does not return distinct rows (it returns e.g. company_1 twice).
The other option I tried:
organization_db.select().distinct([organization_db.organization_column]).execute()
wraps the column in [ ] inside distinct(), which, although it appears more consistent with the documentation, results in the error peewee.OperationalError: near "ON": syntax error
Am I correct in assuming that it is possible to return unique values directly from Peewee, and if so, what am I doing wrong?
Model structure:
cd_sql = SqliteDatabase(sql_location, threadlocals=True, pragmas=(("synchronous", "off"),))
class BaseModel(Model):
    class Meta:
        database = cd_sql
class organization_db(BaseModel):
    organization_column = CharField()
    year_column = CharField()
So what coleifer was getting at is that SQLite doesn't support DISTINCT ON. That's not a big issue though; I think you can accomplish what you want by selecting only the column you need and applying a plain DISTINCT:
organization_db.select(organization_db.organization_column).distinct()
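The difference between the two forms can be seen at the SQL level without Peewee. This is a small sketch using the standard-library sqlite3 module with the table and column names from the question; the SQL strings are hand-written approximations of the two query shapes, not Peewee's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization_db (organization_column TEXT, year_column TEXT)"
)
conn.executemany(
    "INSERT INTO organization_db VALUES (?, ?)",
    [("company_1", "2000"), ("company_1", "2001"), ("company_2", "2000")],
)

# Plain DISTINCT over a single selected column: supported by SQLite.
distinct_rows = conn.execute(
    "SELECT DISTINCT organization_column FROM organization_db "
    "ORDER BY organization_column"
).fetchall()

# DISTINCT ON (...) is PostgreSQL syntax; SQLite rejects it at parse time.
try:
    conn.execute("SELECT DISTINCT ON (organization_column) * FROM organization_db")
    distinct_on_error = None
except sqlite3.OperationalError as exc:
    distinct_on_error = str(exc)

print(distinct_rows)  # [('company_1',), ('company_2',)]
print(distinct_on_error)
```

Selecting every column with a plain DISTINCT, as in the first attempt in the question, compares whole rows, which is why company_1 showed up twice there.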
