I have the following schema in my Django application:
class SomeModel(models.Model):
    value = models.CharField(max_length=30)

class AbstractModel(models.Model):
    someModel = models.ForeignKey(SomeModel)

    class Meta:
        abstract = True

class A(AbstractModel):
    anotherValue = models.CharField(max_length=5)

class B(AbstractModel):
    anotherValue = models.CharField(max_length=5)

class C(AbstractModel):
    anotherValue = models.CharField(max_length=5)

class D(AbstractModel):
    anotherValue = models.CharField(max_length=5)

class E(AbstractModel):
    anotherValue = models.CharField(max_length=5)
With this layout, I need the most efficient way to query all objects from models A, B, C, D and E with a given id of SomeModel. I know that I cannot execute a query on an abstract model, so right now what I do is query each model separately, like this:
A.objects.filter(someModel__id=id)
B.objects.filter(someModel__id=id)
C.objects.filter(someModel__id=id)
D.objects.filter(someModel__id=id)
E.objects.filter(someModel__id=id)
Obviously this approach is quite slow, because I need to make 5 different queries each time I want to retrieve all those objects. So my question is: is there a way to optimize this kind of query?
UPDATE:
I have tried the union method like this:
qs1 = A.objects.filter(**filters) # hits DB
qs2 = B.objects.filter(**filters) # hits DB
qs3 = C.objects.filter(**filters) # hits DB
qs4 = D.objects.filter(**filters) # hits DB
qs5 = E.objects.filter(**filters) # hits DB
qs1.union(qs2, qs3, qs4, qs5) # hits DB
That's actually 6 hits to the database! I would like only one!
I have checked this by printing the number of queries made:
from django.conf import settings
settings.DEBUG = True
from django.db import connection
print(len(connection.queries))
You may use the union method, but what exactly do you want to do? If you want to fetch five objects by one pk and you want to be sure that they have a strict relation to each other, you may use a OneToOne relationship.
So in the first case you just need to build the query; in the second case you must make a new migration and you may need to rebuild your tables.
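For the union route, here is a minimal sketch (assuming the five models can expose the same columns, here id and anotherValue): QuerySets are lazy, so if none of the intermediate querysets are evaluated on their own, iterating the combined queryset should issue a single UNION query (Django 1.11+).

combined = A.objects.filter(someModel__id=some_id).values('id', 'anotherValue').union(
    B.objects.filter(someModel__id=some_id).values('id', 'anotherValue'),
    C.objects.filter(someModel__id=some_id).values('id', 'anotherValue'),
    D.objects.filter(someModel__id=some_id).values('id', 'anotherValue'),
    E.objects.filter(someModel__id=some_id).values('id', 'anotherValue'),
)
rows = list(combined)  # evaluated here: one database hit with UNION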
I want to prefetch_related two levels of M2M values.
Here is my models.py
class A(models.Model):
    name = models.CharField(max_length=40)
    b = models.ManyToManyField('B')

class B(models.Model):
    name = models.CharField(max_length=40)
    c = models.ManyToManyField('C')

class C(models.Model):
    name = models.CharField(max_length=40)
    d = models.ManyToManyField('D')
And my ORM query is:
a_obj = A.objects.all().prefetch_related('a__b__c')
And I am trying to access the values like below,
Method A:
for each_obj in a_obj:
    print(each_obj.a__b__c)
Method B:
for each_obj in a_obj:
    print(each_obj.a.all())
Method A throws an error saying No such value a__b__b for A found
Method B doesn't throw any error, but the number of queries increases to the length of a_obj.
Is there a way to access a__b__c in a single query?
You can load both the related B and C models with .prefetch_related(…) [Django-doc]:
a_objs = A.objects.prefetch_related('b__c')
But .prefetch_related(…) does not change how the items look; it simply loads them in advance. You thus can access these with:
for a in a_objs:
    for b in a.b.all():
        for c in b.c.all():
            print(f'{a} {b} {c}')
You thus still access the items in the same way, but Django will already have loaded the related objects in advance to prevent extra queries.
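A quick way to check the query count (a rough sanity check, assuming settings.DEBUG is True so queries are recorded):

from django.db import connection, reset_queries

reset_queries()
for a in A.objects.prefetch_related('b__c'):
    for b in a.b.all():
        for c in b.c.all():
            pass
print(len(connection.queries))  # expect 3 queries (one per level: A, B, C), not one per object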
models.py:
class ScoringModel(models.Model):
    title = models.CharField(max_length=64)

class PredictedScore(models.Model):
    job = models.ForeignKey('Job')
    candidate = models.ForeignKey('Candidate')
    model_used = models.ForeignKey('ScoringModel')
    score = models.FloatField()
    created_at = models.DateField(auto_now_add=True)
    modified_at = models.DateTimeField(auto_now=True)
serializers.py:
class MatchingJobsSerializer(serializers.ModelSerializer):
    job_title = serializers.CharField(source='job.title', read_only=True)

    class Meta:
        model = PredictedScore
        fields = ('job', 'job_title', 'score', 'model_used', 'candidate')
To fetch the top 3 jobs, I tried the following code:
import heapq

queryset = PredictedScore.objects.filter(candidate=candidate)
jobs_serializer = MatchingJobsSerializer(queryset, many=True)
jobs = jobs_serializer.data
top_3_jobs = heapq.nlargest(3, jobs, key=lambda item: item['score'])
It's giving me the top 3 jobs for the whole set, which contains all the models.
I want to fetch the jobs with top 3 scores for a given candidate for each model used.
So, it should return the top 3 matching jobs with each ML model for the given candidate.
I followed this answer https://stackoverflow.com/a/2076665/2256258 . It's giving the latest entry of cake for each bakery, but I need the top 3.
I read about annotations in the Django ORM but couldn't get far with this issue. I want to use DRF serializers for this operation. This is a read-only operation.
I am using Postgres as database.
What should be the Django ORM query to perform this operation?
Make the database do the work. You don't need annotations either as you want the objects, not the values or manipulated values.
To get a set of all scores for a candidate (not split by model_used) you would do:
queryset = PredictedScore.objects.filter(candidate=candidate).order_by('-score')[:3]
jobs_serializer = MatchingJobsSerializer(queryset, many=True)
jobs = jobs_serializer.data
What you're proposing isn't particularly well suited to the Django ORM, annoyingly - I think you may need to make separate queries for each model_used. A nicer solution (untested for this example) is to hook Q objects together, as per this answer.
The example there uses tags, but I think the same idea holds:
from django.db.models import Q

# First get a distinct list of the models used
all_models_used = PredictedScore.objects.values_list('model_used', flat=True).distinct()
q_objects = Q()  # create an empty Q object to start with
for m in all_models_used:
    # Top 3 ids for this model; a sliced subquery inside __in works on Postgres
    top_ids = PredictedScore.objects.filter(candidate=candidate, model_used=m).order_by('-score').values('pk')[:3]
    q_objects |= Q(pk__in=top_ids)  # 'or' the Q objects together
queryset = PredictedScore.objects.filter(q_objects)
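On Postgres, another option (a sketch, untested here, and it assumes Django 2.0+ which provides window expressions) is to rank rows per model_used in the database and then keep the top 3 of each from a single query:

from django.db.models import F, Window
from django.db.models.functions import RowNumber

ranked = PredictedScore.objects.filter(candidate=candidate).annotate(
    rank=Window(
        expression=RowNumber(),
        partition_by=[F('model_used')],
        order_by=F('score').desc(),
    )
)
# Older Django versions cannot filter on a window annotation directly,
# so keep the top 3 per model in Python after the single query:
top_3_per_model = [p for p in ranked if p.rank <= 3]
jobs = MatchingJobsSerializer(top_3_per_model, many=True).data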
I'm using Django 1.8 and I have a model
class ModelA(models.Model):
    some_field = models.PositiveIntegerField()
Now, in my view I want to add a new ModelA object, but only if there are fewer than x entries for that value already.
def my_view(request):
    # Using the value of 4 here just as an example
    c = ModelA.objects.filter(some_field=4).count()
    # Check if fewer than (x=20) objects with this field already
    if c < 20:
        # Fewer, so create one
        new_model = ModelA(some_field=4)
        new_model.save()
    else:
        # Return a message saying "too many"
        pass
From my understanding, there could be more than one thread running this method, so thread 1 may perform the count and find fewer than 20, then another thread saves a new object, and then thread 1 saves its object, leaving 20 or more.
Is there some way to have the view be
def my_view(request):
    get_a_lock_on_model(ModelA)
    c = ModelA.objects....
    # Rest of the code the same
    release_lock_on_model(ModelA)
Or is there some other way I should be thinking about doing this? There are only ever inserts, never updates or deletes.
Thanks!
In order to do this you need to lock the entire table, and how to do that depends on the RDBMS you are using; it will involve raw SQL. An alternative approach is to do the count after you have saved your record:
def my_view(request):
    new_model = ModelA(some_field=4)
    new_model.save()
    try:
        # The 21st row (index 20), ordered by insertion order
        c = ModelA.objects.filter(some_field=4).order_by('pk')[20]
        if c.pk == new_model.pk:
            c.delete()
            # Return a message saying "too many"
    except IndexError:
        pass
With this approach the threads do not get in each other's way; each thread is responsible for deleting the extra item that it added. Instead of deleting, you can use atomic and roll back if the count is greater than 20.
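A rough sketch of that rollback variant (my own hedged reading of it, untested; it shares the same isolation caveats as the delete version, and create_if_room is just an illustrative name):

from django.db import transaction

def create_if_room(value, limit=20):
    try:
        with transaction.atomic():
            obj = ModelA.objects.create(some_field=value)
            if ModelA.objects.filter(some_field=value).count() > limit:
                # Raising inside the atomic block rolls the insert back
                raise ValueError('too many')
            return obj
    except ValueError:
        return None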
Tested on Django 1.10.x and postgres:
models.py:
class ModelA(models.Model):
    some_field = models.PositiveIntegerField()
    active = models.BooleanField()
And:
from django.db.models.expressions import RawSQL

n = 42
maximum = 3
raw_sql = RawSQL('select (select count(*) from fooapp_modela where some_field=%s) < %s', (n, maximum))
while True:
    o = ModelA.objects.create(some_field=n, active=raw_sql)
    o.refresh_from_db()
    print(o.id, o.active)
    if not o.active:
        # o.delete()
        break
Caveat: by default, while one transaction is active, other transactions on other connections cannot "see" its inserted rows until it is committed. Try to avoid creating rows inside complex transactions. I believe this means the method is not completely bullet-proof :-( More info: https://www.postgresql.org/docs/9.6/static/transaction-iso.html .
A more robust solution might include a db constraint (probably unique_together):
class ModelA(models.Model):
    some_field = models.PositiveIntegerField()
    ordinal = models.IntegerField()

    class Meta:
        unique_together = (
            ('some_field', 'ordinal'),
        )
#...
raw_sql = RawSQL('select count(*) + 1 from fooapp_modela where some_field=%s', (n,))
o = ModelA.objects.create(some_field=n, ordinal=raw_sql) # retry a few times on IntegrityError
o.refresh_from_db()
print(o.id, o.ordinal)
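The retry mentioned in the comment could look roughly like this (a hedged sketch; create_with_ordinal and the limit check are my own additions, and the table name follows the example above):

from django.db import IntegrityError
from django.db.models.expressions import RawSQL

def create_with_ordinal(n, limit=20, attempts=5):
    for _ in range(attempts):
        raw_sql = RawSQL('select count(*) + 1 from fooapp_modela where some_field=%s', (n,))
        try:
            o = ModelA.objects.create(some_field=n, ordinal=raw_sql)
            o.refresh_from_db()
            if o.ordinal > limit:
                o.delete()  # over the limit; undo the insert
                return None
            return o
        except IntegrityError:
            continue  # another writer took this ordinal; recompute and retry
    return None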
I have many models with the same column name, something like this example:
class Model1(models.Model):
    name = models.CharField(max_length=50)
    other_field = ....

class Model2(models.Model):
    name = models.CharField(max_length=50)
    other_different_field = ....

class Model3(models.Model):
    name = models.CharField(max_length=50)
    different_field = ....
I need to retrieve all names (the name column) from those tables (models) in one Django-syntax query. I have only one filter (name__startswith='bla').
Is it possible?
If not, what is the easiest way to get this?
This is unnecessary and impossible to do in one query, but if you really want to get it fast:
result = []
for model in [Model1, Model2, Model3]:
    names = model.objects.filter(name__startswith='bla') \
                         .values_list('name', flat=True).distinct()
    result += names
This would hit the database as many times as the number of models you have (in this case, three).
These models are not related to each other, so you need to query them individually:
Model1.objects.filter(name__startswith='bla')
Model2.objects.filter(name__startswith='bla')
Model3.objects.filter(name__startswith='bla')
If you really need this, you can use something like this:
from django.db.models.loading import get_model

names = []
models = [('core', 'Model1'), ('core', 'Model2'), ('core', 'Model3')]
klasses = [get_model(app, model) for app, model in models]
for klass in klasses:
    for obj in klass.objects.filter(name__startswith='bla'):
        names.append(obj.name)
Of course this will hit the database more than once.
You can specify the models directly, as @Shang Wang says:
klasses = [Model1, Model2, Model3]
But if you need something more complex, maybe get_model can be helpful.
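If you are on Django 1.11 or newer, a hedged alternative is to combine the three name lists with .union(), which evaluates as a single UNION query:

qs = Model1.objects.filter(name__startswith='bla').values_list('name', flat=True).union(
    Model2.objects.filter(name__startswith='bla').values_list('name', flat=True),
    Model3.objects.filter(name__startswith='bla').values_list('name', flat=True),
)
names = list(qs)  # one database hit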
I have 2 models (sett, data_parsed), and data_parsed has a foreign key to sett.
class sett(models.Model):
    setid = models.IntegerField(primary_key=True)
    block = models.ForeignKey(mapt, related_name='sett_block')
    username = models.ForeignKey(mapt, related_name='sett_username')
    ts = models.IntegerField()

    def __unicode__(self):
        return str(self.setid)

class data_parsed(models.Model):
    setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
    block = models.CharField(max_length=2000)
    username = models.CharField(max_length=2000)
    time = models.IntegerField()

    def __unicode__(self):
        return str(self.setid)
The data_parsed model should have the same number of rows, but there is a possibility that they are not in "sync".
To avoid this from happening, I basically do these two steps:
Check if sett.objects.all().count() == data_parsed.objects.all().count()
This works great for a fast check, and it takes literally seconds on 1 million rows.
If they are not the same, I check all the sett model's pks, excluding the ones already found in data_parsed:
for obj in sett.objects.select_related().exclude(
        setid__in=data_parsed.objects.all().values_list('setid', flat=True)).iterator():
    ...
Basically what this does is select all the objects in sett that exclude all the setid already in data_parsed. This method "works", but it will take around 4 hours for 1 million rows.
Is there a faster way to do this?
Finding setts without data_parsed using the reverse relation:
sett.objects.filter(data_parsed_setid__isnull=True)
If I am getting it right, you are trying to keep a list of processed objects in another model by setting a foreign key.
You have only one data_parsed object for every sett object, so a many-to-one relationship is not needed. You could use a one-to-one relationship and then check which objects have that field empty.
With a foreign key you could try to filter using the reverse query, but that is at the object level, so I doubt that works.
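A minimal sketch of that one-to-one variant (the related_name 'parsed' is my own assumption, not from the original models):

class data_parsed(models.Model):
    setid = models.OneToOneField(sett, related_name='parsed', primary_key=True)
    # ... other fields as before ...

# sett rows that have no parsed counterpart:
missing = sett.objects.filter(parsed__isnull=True)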