How to do INSERT ... ON DUPLICATE KEY UPDATE in Django (Python)

Is there a way to do the following in django's ORM?
INSERT INTO mytable
VALUES (1, 2, 3)
ON DUPLICATE KEY UPDATE field = 4;
I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults. Usually I use the following approach, but it takes two queries instead of one:
item = Item(id=1)
item.update(**fields)
item.save()
Is there another way to do this?

update_or_create should provide the behavior you're looking for: unlike get_or_create, it updates the existing record with the given defaults.
Item.objects.update_or_create(
    id=1,
    defaults=fields,
)
It returns the same (object, created) tuple as get_or_create.
Note that this still performs two queries rather than one (a SELECT to look the record up, then an INSERT or UPDATE). If a single query is for some reason a hard requirement, you will likely be stuck writing raw SQL to handle this, which would be unfortunate in terms of readability and maintainability.
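Since the target SQL is MySQL's ON DUPLICATE KEY UPDATE, that raw-SQL fallback might look roughly like this minimal sketch (the table, values, and field=4 update are taken from the question; assumes a MySQL backend):
from django.db import connection

# Single round trip: insert the row, or update `field` if the key exists.
with connection.cursor() as cursor:
    cursor.execute(
        "INSERT INTO mytable VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE field = %s",
        [1, 2, 3, 4],
    )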

I think get_or_create() is still the answer, but only specify the pk field(s).
item, _ = Item.objects.get_or_create(id=1)
# Model instances have no update() method, so set the fields directly:
for field, value in fields.items():
    setattr(item, field, value)
item.save()

Django 4.1 added support for the INSERT ... ON DUPLICATE KEY UPDATE query through bulk_create(). It will update the given fields in case the unique validation fails.
Example of the above in a single query:
# Let's say we have an Item model with a unique constraint on key
items = [
    Item(key='foobar', value=10),
    Item(key='foobaz', value=20),
]

# This call creates 2 rows in a single SQL query.
Item.objects.bulk_create(items)

# This time it will update the value for foobar
# and create a new row for barbaz,
# all in a single SQL query.
items = [
    Item(key='foobar', value=30),
    Item(key='barbaz', value=50),
]
Item.objects.bulk_create(
    items,
    update_conflicts=True,
    update_fields=['value'],
    # On PostgreSQL and SQLite you must also pass the conflict target,
    # e.g. unique_fields=['key']; MySQL infers it from the unique index.
)
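For reference, a minimal model that the example above assumes (hypothetical field types; only the unique constraint on key matters):
from django.db import models

class Item(models.Model):
    # update_conflicts relies on this unique constraint to detect duplicates.
    key = models.CharField(max_length=64, unique=True)
    value = models.IntegerField()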

Related

Django querysets optimization - preventing selection of annotated fields

Let's say I have the following models:
class Invoice(models.Model):
    ...

class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only Invoices with notes. However, it leaves the annotated field in the query result, which I don't need, and it means worse performance (SQL has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip, even without .values():
from django.db.models import QuerySet

class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
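To call mask_annotations through Invoice.objects, the custom queryset still has to be attached as a manager; a minimal sketch using Django's standard as_manager() hook:
class Invoice(models.Model):
    ...
    objects = YourQuerySet.as_manager()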
Then you can write:
invoices = (Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause while still getting filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, and it can now be used directly in filter():
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This ensures that the subquery is not added to the SELECT columns, which may result in better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that, when we perform a LEFT OUTER JOIN with Note, have a non-NULL Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is an optimized approach when you want to get data from one table whose primary-key reference is stored in another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the method below.
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

Date ranges prefetch gives multiple entries instead of 1

The following returns 3 objects, but it should be only 1: there is only 1 InsiderTrading object matching these filters, but it has 3 owners.
quarter_trading_2018q1 = InsiderTrading.objects.filter(
    issuer=company_issuer.pk,
    owners__company=company.pk,
    transaction_date__range=["2018-01-01", "2018-03-30"]
).prefetch_related('owners')
If I however remove the owners__company filter, it returns 1 (correct behaviour):
quarter_trading_2018q1 = InsiderTrading.objects.filter(
    issuer=company_issuer.pk,
    transaction_date__range=["2018-01-01", "2018-03-30"]
).prefetch_related('owners')
But I still want to filter on owners__company; how do I get 1 result then?
You should add a distinct().
InsiderTrading.objects.filter(
    issuer=company_issuer.pk,
    owners__company=company.pk,
    transaction_date__range=["2018-01-01", "2018-03-30"]
).distinct().prefetch_related('owners')
If distinct() works, it means that this query:
InsiderTrading.objects.filter(
    issuer=company_issuer.pk,
    owners__company=company.pk,
    transaction_date__range=["2018-01-01", "2018-03-30"]
)
returns multiple rows rather than one, because joining against owners duplicates each matching InsiderTrading.
The Django docs state:
select_related() "follows" foreign-key relationships, selecting
additional related-object data when it executes its query.
prefetch_related() does a separate lookup for each relationship, and
does the "joining" in Python.
I'd say use select_related instead.
See https://stackoverflow.com/a/31237071/4117381.
Another solution is to group by owner_id; a sketch of that idea follows.
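The ORM has no direct group_by() method, but aggregating with annotate() adds a GROUP BY on the model's primary key, which collapses the duplicated rows; a sketch of that idea, reusing the names from the question:
from django.db.models import Count

# GROUP BY insidertrading.id folds the one-row-per-owner duplicates into one.
InsiderTrading.objects.filter(
    issuer=company_issuer.pk,
    owners__company=company.pk,
    transaction_date__range=["2018-01-01", "2018-03-30"],
).annotate(owner_count=Count('owners')).prefetch_related('owners')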

How to update multiple records using peewee

I'm using Peewee with a Postgres database. I want to know how to update multiple records in a table at once. I can perform this update in plain SQL, and I'm looking for the Peewee equivalent approach.
Yes, you can use the insert_many() function:
Insert multiple rows at once. The rows parameter must be an iterable
that yields dictionaries. As with insert(), fields that are not
specified in the dictionary will use their default value, if one
exists.
Example:
usernames = ['charlie', 'huey', 'peewee', 'mickey']
row_dicts = ({'username': username} for username in usernames)
# Insert 4 new rows.
User.insert_many(row_dicts).execute()
More details at: http://docs.peewee-orm.com/en/latest/peewee/api.html#Model.insert_many
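If the goal is updating rows that already exist rather than inserting new ones, Peewee can also issue a single bulk UPDATE statement; a minimal sketch (the active field and usernames list are illustrative):
# One UPDATE ... WHERE username IN (...) touching all matching rows at once.
query = User.update(active=False).where(User.username.in_(usernames))
query.execute()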
ORMs usually do not support bulk updates and you have to use custom SQL; see the db.execute_sql() samples in the Peewee docs.
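A hedged sketch of that raw-SQL fallback on Postgres (table and column names assumed):
# Raw SQL through Peewee's database handle; %s is the parameter style
# of the Postgres driver, and a Python list adapts to a SQL array.
db.execute_sql(
    "UPDATE users SET active = FALSE WHERE username = ANY(%s)",
    (usernames,),
)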

Effective batch "update-or-insert" in SqlAlchemy

There exists a table Users, and in my code I have a big list of User objects. To insert them I can use:
session.add_all(user_list)
session.commit()
The problem is that there can be several duplicates which I want to update, but the database won't allow duplicate entries to be inserted. Of course, I can iterate over user_list and try to insert each user into the database, updating it if that fails:
for u in users:
    q = session.query(T).filter(T.fullname==u.fullname).first()
    if q:
        session.query(T).filter_by(index=q.index).update(
            {column: getattr(u, column)
             for column in Users.__table__.columns.keys()
             if column != 'id'})
        session.commit()
    else:
        session.add(u)
        session.commit()
but I find this solution quite inefficient: first, I make a separate request to retrieve each object q, and second, instead of batch-inserting the new items I insert them one by one. I wonder if there exists a better solution for this task.
UPD, a better version:
for u in users:
    q = session.query(T).filter(Users.fullname==u.fullname).first()
    if q:
        for column in Users.__table__.columns.keys():
            if column != 'index':
                setattr(q, column, getattr(u, column))
        session.add(q)
    else:
        session.add(u)
session.commit()
A better solution would be to use the bulk MySQL construct
INSERT ... ON DUPLICATE KEY UPDATE ...
(I assume you're using MySQL because your post is tagged with 'mysql'). This way you're both inserting new entries and updating existing ones in one statement / transaction; see http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
It's not ideal if you have multiple unique indexes and, depending on your schema, you may have to fill in all NOT NULL values (hence issuing one bulk SELECT before calling it), but it's definitely the most efficient option, and we use it a lot. The bulk version will look something like this (let's assume name is a unique key):
INSERT INTO User (name, phone, ...) VALUES
    ('ksmith', '111-11-11', ...),
    ('jford', '222-22-22', ...),
    ...
ON DUPLICATE KEY UPDATE
    phone = VALUES(phone),
    ... ;
Unfortunately, INSERT ... ON DUPLICATE KEY UPDATE is not supported natively by SQLAlchemy, so you'll have to implement a little helper function that builds the query for you.
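For what it's worth, SQLAlchemy 1.2+ did add a MySQL-specific construct for this, so on newer versions the helper is unnecessary; a minimal sketch, assuming Users is a mapped class with a unique name column:
from sqlalchemy.dialects.mysql import insert

stmt = insert(Users.__table__).values([
    {'name': 'ksmith', 'phone': '111-11-11'},
    {'name': 'jford', 'phone': '222-22-22'},
])
# stmt.inserted.phone renders as VALUES(phone) in the generated SQL.
stmt = stmt.on_duplicate_key_update(phone=stmt.inserted.phone)
session.execute(stmt)
session.commit()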

How to efficiently fetch objects after they are created using Django ORM's bulk_create?

I have to insert multiple objects into a table. There are two ways to do that:
1) Insert each one using save(). But in this case there will be n SQL DB queries for n objects.
2) Insert all of them together using bulk_create(). In this case there will be one SQL DB query for n objects.
Clearly, the second option is better, and hence I am using it. Now, the problem with bulk_create is that it does not return the ids of the inserted objects, so they cannot be used further to create objects of other models which have a foreign key to the created objects.
To overcome this, we need to fetch the objects created by bulk_create.
Now the question is: assuming that, as in my situation, there is no way to uniquely identify the created objects, how do we fetch them?
Currently I am maintaining a time_stamp to fetch them, something like below:
my_objects = []

# Timestamp to be used for fetching created objects
time_stamp = datetime.datetime.now()

# Creating a list of instantiated objects
for obj_data in obj_data_list:
    my_objects.append(MyModel(**obj_data))

# Bulk-insert the instantiated objects into the DB
MyModel.objects.bulk_create(my_objects)

# Use the timestamp to fetch the created objects
MyModel.objects.filter(created_at__gte=time_stamp)
Now this works well, but it will fail in one case: if more objects are created from somewhere else while these objects are being bulk-created, those objects will also be picked up by my query, which is not desired.
Can someone come up with a better solution?
As bulk_create will not set the primary keys, you'll have to supply the keys yourself.
This process is simple if you are not using the default generated primary key, which is an AutoField.
If you are sticking with the default, you'll need to wrap your code in an atomic transaction and supply the primary keys yourself. This way you'll know which records were inserted.
from django.db import transaction

with transaction.atomic():
    my_objects = []
    max_id = int(MyModel.objects.latest('pk').pk)
    id_count = max_id
    for obj_data in obj_data_list:
        id_count += 1
        obj_data['id'] = id_count
        my_objects.append(MyModel(**obj_data))
    MyModel.objects.bulk_create(my_objects)

# The ids supplied above are exactly max_id+1 .. id_count:
inserted_ids = range(max_id + 1, id_count + 1)
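Note that on PostgreSQL this workaround is unnecessary since Django 1.10: there, bulk_create sets the primary keys on the objects it returns.
# On PostgreSQL, the returned objects already carry their new pks.
objs = MyModel.objects.bulk_create(my_objects)
inserted_ids = [obj.pk for obj in objs]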
As you already know:
If the model’s primary key is an AutoField it does not retrieve and
set the primary key attribute, as save() does.
The way you're doing it is usually the way people do it. The other solution below is, in some cases, better:
# Force evaluation now; otherwise the lazy queryset would run as a
# subquery after the bulk_create and exclude the new rows as well.
my_ids = list(MyModel.objects.values_list('id', flat=True))
objs = MyModel.objects.bulk_create(my_objects)
new_objs = MyModel.objects.exclude(id__in=my_ids).values_list('id', flat=True)
