I'm doing some prototyping and have a simple model like this
class SampleModel(models.Model):
    user_id = models.IntegerField(default=0, db_index=True)
    staff_id = models.IntegerField(default=0, db_index=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)
    objects = AsOfManager()
Now we need to do queries that require a self join, which written in raw SQL are simply something like this:
SELECT X.* FROM no_chain_samplemodel as X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id) AS Y
ON (X.user_id = Y.user_id and X.timestamp = Y.timestamp);
For each user_id, this query should return the most recent row by timestamp. Each of these "chains" (the rows sharing a user_id) could potentially contain thousands of rows.
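To make the intent of the self join concrete, here is a minimal sketch that runs the same query shape against an in-memory SQLite database using the standard-library sqlite3 module. The sample rows are made up for illustration, and timestamps are stored as ISO-8601 text (an assumption of this sketch, not of the original schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE no_chain_samplemodel ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, staff_id INTEGER, timestamp TEXT)"
)
# Made-up sample data: two "chains", one per user_id.
rows = [
    (1, 10, "2015-01-01 10:00:00"),
    (1, 11, "2015-01-02 10:00:00"),
    (2, 10, "2015-01-01 09:00:00"),
    (2, 12, "2015-01-03 09:00:00"),
    (2, 13, "2015-01-02 09:00:00"),
]
conn.executemany(
    "INSERT INTO no_chain_samplemodel (user_id, staff_id, timestamp) VALUES (?, ?, ?)",
    rows,
)

# Same shape as the raw SQL above: join each row to its user's MAX(timestamp).
LATEST_PER_USER = """
SELECT X.user_id, X.staff_id, X.timestamp
FROM no_chain_samplemodel AS X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
      FROM no_chain_samplemodel
      GROUP BY user_id) AS Y
  ON X.user_id = Y.user_id AND X.timestamp = Y.timestamp
ORDER BY X.user_id
"""
latest = conn.execute(LATEST_PER_USER).fetchall()
print(latest)
```

Only one row per user_id comes back, as long as no user has two rows sharing the same maximum timestamp.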
I could use raw SQL, but then I lose composability; I would like to get back another queryset.
It would also be nice to make the raw SQL easier to write, so I thought I could use a database view.
The view could be just something like this
CREATE VIEW no_chain_sample_model_with_max_date AS SELECT user_id AS id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id;
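As a rough check of the idea, the sketch below creates the same view in an in-memory SQLite database (standard-library sqlite3, made-up rows, timestamps as ISO-8601 text) and joins the base table against it. It only illustrates the SQL side and says nothing about how Django migrations treat the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE no_chain_samplemodel (
    id INTEGER PRIMARY KEY, user_id INTEGER, staff_id INTEGER, timestamp TEXT);
INSERT INTO no_chain_samplemodel (user_id, staff_id, timestamp) VALUES
    (1, 10, '2015-01-01 10:00:00'),
    (1, 11, '2015-01-02 10:00:00'),
    (2, 12, '2015-01-03 09:00:00');
CREATE VIEW no_chain_sample_model_with_max_date AS
    SELECT user_id AS id, MAX(timestamp) AS timestamp
    FROM no_chain_samplemodel
    GROUP BY user_id;
""")

# The view can be read like a table: one row per user_id.
view_rows = conn.execute(
    "SELECT id, timestamp FROM no_chain_sample_model_with_max_date ORDER BY id"
).fetchall()

# Joining against the view replaces the inline subquery of the raw SQL.
joined = conn.execute("""
    SELECT X.user_id, X.staff_id
    FROM no_chain_samplemodel AS X
    JOIN no_chain_sample_model_with_max_date AS Y
      ON X.user_id = Y.id AND X.timestamp = Y.timestamp
    ORDER BY X.user_id
""").fetchall()
print(view_rows, joined)
```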
So the model that refers to the view could be simply like this:
class SampleModelWithMaxDate(models.Model):
    class Meta:
        managed = False
        db_table = 'no_chain_sample_model_with_max_date'
    id = models.IntegerField(default=0, primary_key=True)
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)
However there are a few problems:
Even with managed = False, './manage.py makemigrations' still creates a migration for this table.
I even tried keeping the migration but replacing the model creation with raw SQL that creates the view,
but no luck.
I need now to do select_related to join the two tables and query, but how should I do that?
I tried a foreign key on SampleModel like this:
by_date = models.ForeignKey(SampleModelWithMaxDate, null=True)
but this also doesn't work:
OperationalError: (1054, "Unknown column 'no_chain_sample_model_with_max_date.by_date_id' in 'field list'")
So in general I'm not even sure this is possible. I can see that other people use models backed by views just for querying the view's model on its own, and that works for me too, but is it possible to do anything smarter than that?
Thanks
I couldn't find any ORM method to get what you want in one query but we could kind of do this with two queries:
First, we get max timestamp for all the users
from django.db.models import Max
latest_timestamps = (SampleModel.objects.values('user_id')
                     .annotate(max_ts=Max('timestamp')).values('max_ts'))
Here values('user_id') works as a GROUP BY operation.
Now, we get all the instances of SampleModel with exactly those timestamps
qs = SampleModel.objects.filter(timestamp__in=latest_timestamps)
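One caveat worth checking: the second step matches on the timestamp alone, so a row whose timestamp happens to equal another user's maximum would also be returned. The sketch below shows the difference in plain SQL with the standard-library sqlite3 module. The table name `samplemodel` and the rows are made up, and the first query is only an approximation of the SQL the ORM might generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samplemodel (id INTEGER PRIMARY KEY, user_id INTEGER, timestamp TEXT);
INSERT INTO samplemodel (user_id, timestamp) VALUES
    (1, '2015-01-01'), (1, '2015-01-02'),
    (2, '2015-01-02'), (2, '2015-01-03');
""")

# Filter on timestamp only: user 2's older row shares user 1's maximum.
by_timestamp_only = conn.execute("""
    SELECT user_id, timestamp FROM samplemodel
    WHERE timestamp IN (SELECT MAX(timestamp) FROM samplemodel GROUP BY user_id)
    ORDER BY user_id, timestamp
""").fetchall()

# Match on the (user_id, timestamp) pair, as the original self join does.
by_user_and_timestamp = conn.execute("""
    SELECT X.user_id, X.timestamp FROM samplemodel AS X
    JOIN (SELECT user_id, MAX(timestamp) AS timestamp
          FROM samplemodel GROUP BY user_id) AS Y
      ON X.user_id = Y.user_id AND X.timestamp = Y.timestamp
    ORDER BY X.user_id
""").fetchall()
print(by_timestamp_only, by_user_and_timestamp)
```

If timestamps are effectively unique across users, the two results coincide and the simpler filter is fine.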
PostgreSQL-specific answer:
You could mix order_by and distinct to achieve what you want:
SampleModel.objects.order_by('user_id', '-timestamp').distinct('user_id')
Breaking it down:
# order by user_id, and in decreasing order of timestamp
qs = SampleModel.objects.order_by('user_id', '-timestamp')
# keep one row per user_id: DISTINCT ON retains the first row for each
# user, and because the rows for each user are ordered by decreasing
# timestamp, that first row is the latest one stored for the user
# in the database.
qs = qs.distinct('user_id')
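The logic of "order, then keep the first row per user" can be mimicked in plain Python, which may help when reasoning about what DISTINCT ON returns. This is only an illustration on made-up dictionaries, not what the ORM or PostgreSQL actually executes.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows standing in for SampleModel instances.
rows = [
    {"user_id": 2, "timestamp": "2015-01-01"},
    {"user_id": 1, "timestamp": "2015-01-02"},
    {"user_id": 1, "timestamp": "2015-01-01"},
    {"user_id": 2, "timestamp": "2015-01-03"},
]

# Equivalent of order_by('user_id', '-timestamp'): sort by the secondary key
# first (descending), then do a stable sort on the primary key.
ordered = sorted(rows, key=itemgetter("timestamp"), reverse=True)
ordered.sort(key=itemgetter("user_id"))

# Equivalent of distinct('user_id'): keep the first row of each user_id run.
latest = [next(group) for _, group in groupby(ordered, key=itemgetter("user_id"))]
print(latest)
```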
Let's say I have 2 models
class Shipment(models.Model):
    # ...
class ShipmentTimeline(models.Model):
    shipment = models.ForeignKey(
        Shipment, on_delete=models.CASCADE, related_name="shipmenttimeline")
I want to get the related ShipmentTimeline objects along with the filtered list of Shipments in one go. For now, I'm querying the shipments like this:
Shipment.objects.filter(
    # some filters
).only('id')
This gives me the IDs. I could then call get() for each shipment, read its related timeline set, and append the results to a new list, but that would be a mess of for loops. Is there a better way to get data shaped like this?
qs = [{'Shipment 1': ['ShipmentTimeline_x', 'ShipmentTimeline_y']}, ...]
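In Django itself, calling prefetch_related('shipmenttimeline') on the filtered queryset is the usual tool for this kind of shape: it runs one extra query for the timelines and attaches them to each shipment in Python. The sketch below imitates that grouping step on made-up tuples so the idea can be run without a Django project; the ids and the output layout are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical results of two queries: the filtered shipment ids, then
# (timeline_id, shipment_id) pairs for the timelines of those shipments.
shipment_ids = [1, 2, 3]
timeline_rows = [("x", 1), ("y", 1), ("z", 3)]

# Group the timelines by the shipment they belong to.
timelines_by_shipment = defaultdict(list)
for timeline_id, shipment_id in timeline_rows:
    timelines_by_shipment[shipment_id].append(timeline_id)

# Shipments without timelines end up with an empty list.
grouped = [
    {"shipment": sid, "timelines": timelines_by_shipment[sid]}
    for sid in shipment_ids
]
print(grouped)
```

The point of the design is that the number of queries stays at two regardless of how many shipments match, instead of one query per shipment.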
Using SQLAlchemy, I am simply trying to query a set of records such as follows
session.query(MyTable).filter_by(foreign_id=413).all()
Then, I just need to make a copy of these records, change the foreign_id, and save them back to the same table as new records. I can't think of an efficient way to do this right now. The only thing that I have come up with is looping through the result set, creating new records that share all the same properties besides foreign_id and then bulk saving these new records.
It is important that I keep the original records intact, so simply updating the rows is not an option.
If it helps, here is essentially the MyTable object
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    col_a = Column(String(64))
    col_b = Column(String(64))
    foreign_id = Column(Integer, ForeignKey('other_table.id'))
In this example, I would want to keep col_a and col_b the same, but update the foreign_id and id columns.
If you were using Django ORM, you could simply set the id field to None and save it, creating a copied record, assuming the primary id field was auto-generated.
With SQLAlchemy, this will not work, nor does it make sense in pure SQL terms.
An efficient and extensible way to achieve this is:
from sqlalchemy.orm import class_mapper
recs = session.query(MyTable).filter_by(foreign_id=413).all()
for rec in recs:
    newrec = MyTable()
    for item in [p.key for p in class_mapper(MyTable).iterate_properties]:
        if item not in ['id', 'foreign_id']:
            setattr(newrec, item, getattr(rec, item))
    # assign foreign field as appropriate...
    session.add(newrec)
session.commit()
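If the copy does not need to pass through Python objects at all, the same effect can be sketched in plain SQL as a single INSERT ... SELECT. The example below uses the standard-library sqlite3 module rather than SQLAlchemy, with made-up rows and a hypothetical new foreign_id of 999; the id column is left out so the database generates new primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    col_a TEXT, col_b TEXT, foreign_id INTEGER);
INSERT INTO my_table (col_a, col_b, foreign_id) VALUES
    ('a1', 'b1', 413), ('a2', 'b2', 413), ('a3', 'b3', 7);
""")

old_foreign_id = 413
new_foreign_id = 999  # hypothetical target value, for illustration only

# Copy col_a and col_b unchanged, substitute the new foreign_id, and let
# the database assign fresh ids. The original rows are not modified.
conn.execute(
    "INSERT INTO my_table (col_a, col_b, foreign_id) "
    "SELECT col_a, col_b, ? FROM my_table WHERE foreign_id = ?",
    (new_foreign_id, old_foreign_id),
)

copies = conn.execute(
    "SELECT col_a, col_b, foreign_id FROM my_table "
    "WHERE foreign_id = ? ORDER BY col_a",
    (new_foreign_id,),
).fetchall()
originals_left = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE foreign_id = ?", (old_foreign_id,)
).fetchone()[0]
print(copies, originals_left)
```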
I am looking to select all values from one column which are distinct using Peewee.
For example, if I had the table
Organization Year
company_1 2000
company_1 2001
company_2 2000
....
I want to return just the unique values in the organization column (i.e. company_1 and company_2).
I had assumed this was possible using the distinct option as documented http://docs.peewee-orm.com/en/latest/peewee/api.html#SelectQuery.distinct
My current code:
organizations_returned = organization_db.select().distinct(organization_db.organization_column).execute()
for item in organizations_returned:
    print(item.organization_column)
This does not return distinct rows (for example, company_1 appears twice).
The other option I tried:
organization_db.select().distinct([organization_db.organization_column]).execute()
wraps the column in [ ] inside distinct(). Although that looks more consistent with the documentation, it resulted in the error peewee.OperationalError: near "ON": syntax error
Am I correct in assuming that it is possible to return unique values directly from Peewee? If so, what am I doing wrong?
Model structure:
cd_sql = SqliteDatabase(sql_location, threadlocals=True, pragmas=(("synchronous", "off"),))
class BaseModel(Model):
    class Meta:
        database = cd_sql
class organization_db(BaseModel):
    organization_column = CharField()
    year_column = CharField()
So what coleifer was getting at is that SQLite doesn't support DISTINCT ON. That's not a big issue though; I think you can accomplish what you want by selecting only that column and calling distinct() with no arguments:
organization_db.select(organization_db.organization_column).distinct()
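The underlying SQL behaviour can be checked directly with the standard-library sqlite3 module. This sketch (made-up rows, table named after the model in the question) shows that DISTINCT over all columns does not collapse anything because the id differs per row, while DISTINCT over the single column does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization_db (
    id INTEGER PRIMARY KEY, organization_column TEXT, year_column TEXT);
INSERT INTO organization_db (organization_column, year_column) VALUES
    ('company_1', '2000'), ('company_1', '2001'), ('company_2', '2000');
""")

# DISTINCT over every column: each row is already unique because of id.
distinct_all = conn.execute("SELECT DISTINCT * FROM organization_db").fetchall()

# DISTINCT over the one column of interest collapses the duplicates.
distinct_orgs = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT organization_column FROM organization_db "
        "ORDER BY organization_column"
    )
]
print(len(distinct_all), distinct_orgs)
```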
If I have a model Foo that has a simple M2M field to model Bar:
class Foo(Model):
    bar = ManyToManyField(Bar)
Django seems to create a table foo_bar which has the following indices:
index 1: primary, unique (id)
index 2: unique (foo_id, bar_id)
index 3: non_unique (foo_id)
index 4: non_unique (bar_id)
I recall from my basic knowledge of SQL, that if a query needs to look for conditions on foo_id, index 2 would suffice (since the left-most column can be used for lookup). index 3 seems to be redundant.
Am I correct to assume that index 3 does indeed take up index space while offering no benefit? That I'm better off using a through table and manually create a unique index on (foo_id, bar_id), and optionally, another index on (bar_id) if needed?
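The left-most-column reasoning can be tested on a small scale. The sketch below uses SQLite through the standard-library sqlite3 module, which is an assumption: the planner of the database in the question may behave differently. It creates only the composite unique index and the bar_id index (the index names are made up) and asks the planner which index it would use for each lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_bar (
    id INTEGER PRIMARY KEY,
    foo_id INTEGER NOT NULL,
    bar_id INTEGER NOT NULL);
-- Hypothetical index names; there is deliberately no index on foo_id alone.
CREATE UNIQUE INDEX foo_bar_uniq ON foo_bar (foo_id, bar_id);
CREATE INDEX foo_bar_bar_id ON foo_bar (bar_id);
""")

def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)

plan_foo = plan("SELECT bar_id FROM foo_bar WHERE foo_id = ?", (1,))
plan_bar = plan("SELECT foo_id FROM foo_bar WHERE bar_id = ?", (1,))
print(plan_foo)
print(plan_bar)
```

In this setup the lookup on foo_id is served by the composite index, which is consistent with the assumption that a separate foo_id index adds little for such queries.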
The key to understanding how a many-to-many association is represented in the database is to realize that each row of the junction table (in this case, foo_bar) connects one row from the left table (foo) with one row from the right table (bar). Each pk of "foo" can be copied many times to "foo_bar"; each pk of "bar" can also be copied many times to "foo_bar". But the same pair of fk's in "foo_bar" can only occur once.
So if the only unique index in "foo_bar" were on a single column (the pk of "foo" or of "bar"), each value could occur only once, and it would no longer be a many-to-many relation.
For example we have two models (e-commerce): Product, Order.
Each product can be in many orders and one order can contain many products.
class Product(models.Model):
    ...
class Order(models.Model):
    products = models.ManyToManyField(Product, through='OrderedProduct')
class OrderedProduct(models.Model):
    # each (order, product) pair should occur only once, so within one order you can calculate the price for each product (considering the ordered amount of it).
    # and at the same time you can get, somewhere in your template or view, all orders which contain the same product
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    amount = models.PositiveSmallIntegerField()  # amount of ordered products
    price = models.IntegerField()  # just int in this simple example
    def save(self, *args, **kwargs):
        # assumes Product has a price field (elided above)
        self.price = self.product.price * self.amount
        super(OrderedProduct, self).save(*args, **kwargs)
For anyone else who is still wondering: this is a known issue, and there is an open ticket on the Django bug tracker:
https://code.djangoproject.com/ticket/22125
I have models, more or less like this:
class ModelA(models.Model):
    field = models.CharField(..)
class ModelB(models.Model):
    name = models.CharField(.., unique=True)
    modela = models.ForeignKey(ModelA, blank=True, related_name='modelbs')
    class Meta:
        unique_together = ('name','modela')
I want to do a query that says something like: "Get all the ModelAs where the field equals X that have a ModelB with a name of X, OR that have no ModelB at all"
So far I have this:
ModelA.objects.exclude(field=condition).filter(modelbs__name=condition)
This gets me all the ModelAs that have at least one ModelB (in reality it will ALWAYS be just one), but a ModelA with no related ModelBs will not be in the result set. I need it to be in the result set with something like obj.modelb = None
How can I accomplish this?
Use Q to combine the two conditions:
from django.db.models import Q
qs = ModelA.objects.exclude(field=condition)
qs = qs.filter(Q(modelbs__name=condition) | Q(modelbs__isnull=True))
To examine the resulting SQL query:
print(qs.query)
On a similar query, this generates a LEFT OUTER JOIN ... WHERE (a.val = b OR a.id IS NULL).
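The effect of that LEFT OUTER JOIN with the OR condition can be reproduced in plain SQL. This sketch uses the standard-library sqlite3 module with made-up table names and rows, and it leaves out the exclude() part of the queryset to keep the focus on the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE modela (id INTEGER PRIMARY KEY, field TEXT);
CREATE TABLE modelb (
    id INTEGER PRIMARY KEY,
    name TEXT,
    modela_id INTEGER REFERENCES modela(id));
INSERT INTO modela (id, field) VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
-- modela 1 has a B named X, modela 2 has a differently named B,
-- modela 3 has no B at all.
INSERT INTO modelb (name, modela_id) VALUES ('X', 1), ('other', 2);
""")

matches = conn.execute("""
    SELECT a.id, b.name
    FROM modela AS a
    LEFT OUTER JOIN modelb AS b ON b.modela_id = a.id
    WHERE b.name = ? OR b.id IS NULL
    ORDER BY a.id
""", ("X",)).fetchall()
print(matches)
```

Rows of modela without any modelb survive the join with NULL in the b columns, which is what the isnull=True branch of the Q expression relies on.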
It looks like you are coming up against the 80% barrier. Why not just use .extra(select={'has_x_or_none':'(EXISTS (SELECT ...))'}) to perform a subquery? You can write the subquery any way you like and should be able to filter against the new field. The SQL should wind up looking something like this:
SELECT *,
((EXISTS (SELECT * FROM other WHERE other.id=primary.id AND other.name='X'))
OR (NOT EXISTS (SELECT * FROM other WHERE other.id=primary.id))) AS has_x_or_none
FROM primary WHERE has_x_or_none=1;
Try this patch for custom joins: https://code.djangoproject.com/ticket/7231
LEFT JOIN is a union of two queries. Sometimes it's optimized to one query. Sometimes, it is not actually optimized by the underlying SQL engine and is done as two separate queries.
Do this.
for a in ModelA.objects.all():
    related = a.modelbs.all()  # related_name='modelbs' from the question
    if related.count() == 0:
        pass  # These are the A with no B's
    else:
        pass  # These are the A with some B's
Don't fetishize about SQL outer joins appearing to be a "single" query.