Django - Checking whether the primary keys of two models match

I have 2 models (sett, data_parsed), and data_parsed has a foreign key to sett.
class sett(models.Model):
    setid = models.IntegerField(primary_key=True)
    block = models.ForeignKey(mapt, related_name='sett_block')
    username = models.ForeignKey(mapt, related_name='sett_username')
    ts = models.IntegerField()
    def __unicode__(self):
        return str(self.setid)
class data_parsed(models.Model):
    setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
    block = models.CharField(max_length=2000)
    username = models.CharField(max_length=2000)
    time = models.IntegerField()
    def __unicode__(self):
        return str(self.setid)
The data_parsed model should have the same number of rows as sett, but the two tables can get out of sync.
To detect this, I do two steps:
First, check whether sett.objects.all().count() == data_parsed.objects.all().count().
This works great as a fast check; it takes literally seconds on 1 million rows.
Second, if the counts differ, take all of the sett model's primary keys and exclude the ones already found in data_parsed:
sett.objects.select_related().exclude(
setid__in = data_parsed.objects.all().values_list('setid', flat=True)).iterator():
This selects all the objects in sett whose setid is not already in data_parsed. The method works, but it takes around 4 hours for 1 million rows.
Is there a faster way to do this?

Finding setts without data_parsed using the reverse relation:
sett.objects.filter(data_parsed_setid__isnull=True)
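For context, a reverse-relation isnull filter like the one above is normally turned into a LEFT OUTER JOIN with an IS NULL check, which the database can answer in one pass instead of comparing against a huge IN list. The sketch below reproduces that query shape with the standard-library sqlite3 module rather than the ORM; the table layout and the column name setid_id are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

# In-memory stand-ins for the two tables (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sett (setid INTEGER PRIMARY KEY);
    CREATE TABLE data_parsed (
        setid_id INTEGER PRIMARY KEY REFERENCES sett(setid)
    );
    INSERT INTO sett (setid) VALUES (1), (2), (3), (4);
    INSERT INTO data_parsed (setid_id) VALUES (1), (3);
""")

# Anti-join: keep only the sett rows for which the LEFT JOIN
# found no matching data_parsed row.
missing = [
    row[0]
    for row in conn.execute("""
        SELECT sett.setid
        FROM sett
        LEFT OUTER JOIN data_parsed ON data_parsed.setid_id = sett.setid
        WHERE data_parsed.setid_id IS NULL
        ORDER BY sett.setid
    """)
]
print(missing)
```

Here the rows with setid 2 and 4 have no counterpart in data_parsed, so they are the ones reported as out of sync.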

If I am getting it right, you are trying to keep a list of processed objects in another model by setting a foreign key.
You have only one data_parsed object for every sett object, so a many-to-one relationship is not needed. You could use a one-to-one relationship and then check which objects have that field empty.
With a foreign key you could try to filter using the reverse query, but that is at object level, so I doubt that works.

Related

Django - Can I add a calculated field that only exists for a particular sub-set of occurrences of my model?

Imagine that you have a model with some date-time fields that can be categorized depending on the date. You make an annotation for the model with different cases that assign a different 'status' depending on the calculation for the date-time fields:
# models.py
class Status(models.TextChoices):
    status_1 = 'status_1'
    status_2 = 'status_2'
    status_3 = 'status_3'
    special_status = 'special_status'
class MyModel(models.Model):
    important_date_1 = models.DateField(null=True)
    important_date_2 = models.DateField(null=True)
    calculated_status = models.CharField(max_length=32, choices=Status.choices, default=None, null=True, blank=False,)
    objects = MyModelCustomManager()
And the manager with which to do the calculation as annotations:
# managers.py
# Pseudocode: foo, fii and whatever stand for the real date conditions.
class MyModelCustomManager(models.Manager):
    def get_queryset(self):
        queryset = super().get_queryset().annotate(**{
            'status': Case(
                When(**{'important_date_1' is foo, 'then':
                    Value(Status.status_1)}),
                When(**{'important_date_2' is fii, 'then':
                    Value(Status.status_2)}),
                When(**{'important_date_1' is foo AND 'important_date_2' is whatever, 'then':
                    Value(Status.status_3)}),
                # And so on and so on
                )
            }
        )
        return queryset
Now, here's where it gets tricky. Only one of these sub-sets of occurrences on the model requires an ADDITIONAL CALCULATED FIELD that literally only exists for it, that looks something like this:
special_calculated_field = F('important_date_1') - F('important_date_2')  # Only for special_status
So, basically I want to make a calculated field with the condition that the model instance must belong to this specific status. I don't want to make it an annotation, because other instances of the model would always have this value set to Null or empty if it were a field or annotation and I feel like it would be a waste of a row in the database.
Is there a way, for example, to do this kind of query:
>>> my_model_instance = MyModel.objects.filter(status='special_status')
>>> my_model_instance.special_calculated_field
Thanks a lot in advance if anyone can chime in with some help.

Filter Django model on reverse relationship list

I have two Django models as follows:
class Event(models.Model):
    name = models.CharField()
class EventPerson(models.Model):
    event = models.ForeignKey('Event', on_delete=models.CASCADE, related_name='event_persons')
    person_name = models.CharField()
If an Event exists in the database, it will have exactly two EventPerson objects that are related to it.
What I want to do is determine whether there exists an Event with a given name that also has a given set of two people (EventPersons) in it. Is this possible to do in a single Django query?
I know I could write Python code like this to check, but I'm hoping for something more efficient:
def event_exists(eventname, person1name, person2name):
    foundit = False
    for evt in Event.objects.filter(name=eventname):
        # collect the names of the people attached to this event
        evtperson_names = [obj.person_name for obj in evt.event_persons.all()]
        if len(evtperson_names) == 2 and person1name in evtperson_names and person2name in evtperson_names:
            foundit = True
            break
    return foundit
Or would it be better to refactor the models so that Event has person1name and person2name as its own fields like this:
class Event(models.Model):
    name = models.CharField()
    person1name = models.CharField()
    person2name = models.CharField()
The problem with this is that there is no natural ordering for person1 and person2, ie if the persons are "Bob" and "Sally" then we could have person1name="Bob" and person2name="Sally" or we could have person1name="Sally" and person2name="Bob".
Suggestions?
You can query for EventPerson objects where the event name is as given instead, use the values_list to extract the person_name field, and convert the returning list of values to a set for an unordered comparison:
def event_exists(eventname, person1name, person2name):
    return set(EventPerson.objects.filter(event__name=eventname).values_list(
        'person_name', flat=True)) == {person1name, person2name}
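To see why the set comparison sidesteps the "Bob and Sally vs. Sally and Bob" ordering problem, here is a small plain-Python sketch of just the comparison step. The helper name same_people and the sample names are made up for illustration; in the answer above the list of names would come from values_list.

```python
def same_people(person_names, person1name, person2name):
    """Return True when the stored names are exactly the two given
    people, regardless of the order they were stored in."""
    return set(person_names) == {person1name, person2name}


in_any_order = same_people(["Sally", "Bob"], "Bob", "Sally")
with_extra_person = same_people(["Bob", "Sally", "Ann"], "Bob", "Sally")
with_missing_person = same_people(["Bob"], "Bob", "Sally")

print(in_any_order, with_extra_person, with_missing_person)
```

One thing to keep in mind: because the query filters on the event name only, names from every Event sharing that name end up in the same set, so this is most reliable when event names are unique.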
I modified @blhsing's answer slightly, adding a filter on names.
def event_exists(eventname, person1name, person2name):
    event_people = EventPerson.objects.select_related('event').filter(person_name__in=[person1name, person2name], event__name=eventname)
    return set(event_people.values_list('person_name', flat=True)) == {person1name, person2name}
I would suggest passing EventPerson objects or their ids to this function instead of just names. That would make filtering easier (you wouldn't need a set and could filter straight by ids) and more efficient (by using db indices; otherwise you would have to index person_name as well).

Django query optimization for 3 related tables

I have 4 models:
class Run(models.Model):
    start_time = models.DateTimeField(db_index=True)
    end_time = models.DateTimeField()
    chamber = models.ForeignKey(Chamber, on_delete=models.CASCADE)
    recipe = models.ForeignKey(Recipe, default=None, blank=True, null=True, on_delete=models.CASCADE)
class RunProperty(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE)
    property_name = models.CharField(max_length=50)
    property_value = models.CharField(max_length=500)
class RunValue(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE)
    run_parameter = models.ForeignKey(RunParameter, on_delete=models.CASCADE)
    value = models.FloatField(default=0)
class RunParameter(models.Model):
    parameter = models.ForeignKey(Parameter, on_delete=models.CASCADE)
    chamber = models.ForeignKey(Chamber, on_delete=models.CASCADE)
    param_name_user_defined = models.BooleanField(default=True)
A Run can have any number of RunProperty (usually user defined properties, can be custom), and a few predefined RunValue (such as Average Voltage, Minimum Voltage, Maximum Voltage) that are numeric values.
The RunParameter is basically just a container of parameter names (Voltage, Current, Frequency, Temperature, Impedance, Oscillation, Variability, etc.; there's a ton of them).
When I build a front end table to show each Run along with all of its "File" RunProperty (where the Run came from) and all of its "Voltage" RunValue, I first query the DB for all Run objects, then do an additional 3 queries for the Min/Max/Avg, and then another query for the File, then I build a dict on the backend to pass to the front to build the table rows:
runs = Run.objects.filter(chamber__in=chambers)
min_v_run_values = RunValue.objects.filter(run__in=runs, run_parameter__parameter__parameter_name__icontains="Minimum Voltage")
max_v_run_values = RunValue.objects.filter(run__in=runs, run_parameter__parameter__parameter_name__icontains="Maximum Voltage")
avg_v_run_values = RunValue.objects.filter(run__in=runs, run_parameter__parameter__parameter_name__icontains="Average Voltage")
run_files = RunProperty.objects.filter(run__in=runs, property_name="File")
This is not such a big problem for customers with ~10 to 30 Run objects in their database, but we have one heavy usage customer who has 3500 Run instances. Needless to say, it's far, far too slow. I'm doing 5 queries to get all the needed instances, and then I have to loop and put them together into one dict. It takes upwards of 45 seconds to do this for that one customer (and about 8 or 10 for most other customers).
Is there a way that I can query my DB for all Run objects along with all of the Min/Max/Avg Voltage RunValue and the File RunProperty and return, say, a list of dicts, one for each Run along with the other objects?
I think Q queries can be used here, but I'm not quite sure HOW to use them, or if they are applicable for this scenario?
I tried this (but didn't get far):
runs = Run.objects.filter(chamber__in=chambers)
v_query = Q(run_parameter__parameter__parameter_name__icontains="Voltage")
run_values = RunValue.objects.filter(run__in=runs).filter(v_query)
run_files = RunProperty.objects.filter(run__in=runs, property_name="File")
That gets me all the RunValue related objects in 1 query, but it's still 3 queries per. I need to optimize this much more, if possible.
I am looking for something along the lines of:
runs = Run.objects.filter(chamber__in=chambers)
.annotate(Q(run__runvalue__run_parameter__parameter__parameter_name__icontains="Voltage")
& Q(run__runproperty__property_name__icontains="File"))
I think in very broad terms (not even pseudocode) I would need a query like:
"Get all Runs, and for each Run, get all the RunValue objects related to that Run that contain ["Average", "Maximum", "Minimum"] and also all the RunProperty objects for that Run that contain "File"."
I don't know if it's possible (sounds like it should be), and I'm not sure whether I should use Q filtering, aggregates or annotation. In broad terms, I need to get all instances of one model, along with all foreign keys for each instance, in one query, if possible.
Example:
I have table Run with 2 instances:
R1
R2
Each Run instance has an associated RunProperty instance "File" (just a string) for each:
R1_run.dat
R2_run.dat
Each Run instance has many RunValue instances (I am using Voltage as an example, but there's 26 of them):
R1_max_v
R1_min_v
R1_avg_v
R2_max_v
R2_min_v
R2_avg_v
I would need to query the DB such that it returns (list or dict, I can work around either):
[{R1, R1_run.dat, R1_max_v, R1_min_v, R1_avg_v},
{R2, R2_run.dat, R2_max_v, R2_min_v, R2_avg_v}]
Or a 2D array even:
[[R1, R1_run.dat, R1_max_v, R1_min_v, R1_avg_v],
[R2, R2_run.dat, R2_max_v, R2_min_v, R2_avg_v]]
Is this even possible?
From a database perspective, you can get all the data you need using just a single query with a few joins:
-- This assumes that there is a primary key Run.id and
-- foreign keys RunValue.run_id and RunProperty.run_id.
-- IDs or names of min/max/avg run parameters, as well as
-- chamber ids are replaced with *_PARAMETER and CHAMBER_IDS
-- for brevity.
SELECT Run.*,
       RVmin.value AS min_value,
       RVmax.value AS max_value,
       RVavg.value AS avg_value,
       RP.property_value AS file_value
FROM Run
JOIN RunValue RVmin ON Run.id = RVmin.run_id
JOIN RunValue RVmax ON Run.id = RVmax.run_id
JOIN RunValue RVavg ON Run.id = RVavg.run_id
JOIN RunProperty RP ON Run.id = RP.run_id
WHERE
    RVmin.run_parameter_id = MIN_PARAMETER AND
    RVmax.run_parameter_id = MAX_PARAMETER AND
    RVavg.run_parameter_id = AVG_PARAMETER AND
    RP.property_name = 'File' AND
    Run.chamber_id IN (CHAMBER_IDS);
The Django way of following such a join from a single Run instance would be something like run.runvalue_set.filter(run_parameter__parameter__parameter_name__contains='Maximum Voltage')
See "following relationships backward": https://docs.djangoproject.com/en/2.2/topics/db/queries/#following-relationships-backward
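To make the self-join idea concrete, here is a minimal runnable sketch using the standard-library sqlite3 module. It is a simplification, not the asker's schema: the table names are invented, and the parameter name is stored directly on run_value instead of going through RunParameter and Parameter. The sample numbers are made up as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (id INTEGER PRIMARY KEY, chamber_id INTEGER);
    CREATE TABLE run_value (run_id INTEGER, parameter_name TEXT, value REAL);
    CREATE TABLE run_property (
        run_id INTEGER, property_name TEXT, property_value TEXT
    );

    INSERT INTO run VALUES (1, 10), (2, 10);
    INSERT INTO run_value VALUES
        (1, 'Minimum Voltage', 25.0),
        (1, 'Maximum Voltage', 75.0),
        (1, 'Average Voltage', 50.0),
        (2, 'Minimum Voltage', 16.0),
        (2, 'Maximum Voltage', 40.0),
        (2, 'Average Voltage', 28.0);
    INSERT INTO run_property VALUES
        (1, 'File', 'R1_run.dat'),
        (2, 'File', 'R2_run.dat');
""")

# One row per run: run_value is joined three times, once per parameter,
# and run_property once for the "File" entry.
rows = conn.execute("""
    SELECT run.id, rp.property_value,
           rv_max.value, rv_min.value, rv_avg.value
    FROM run
    JOIN run_value rv_min
      ON rv_min.run_id = run.id AND rv_min.parameter_name = 'Minimum Voltage'
    JOIN run_value rv_max
      ON rv_max.run_id = run.id AND rv_max.parameter_name = 'Maximum Voltage'
    JOIN run_value rv_avg
      ON rv_avg.run_id = run.id AND rv_avg.parameter_name = 'Average Voltage'
    JOIN run_property rp
      ON rp.run_id = run.id AND rp.property_name = 'File'
    WHERE run.chamber_id IN (10)
    ORDER BY run.id
""").fetchall()

for row in rows:
    print(row)
```

Each result row has the [R, file, max_v, min_v, avg_v] shape asked for in the question. Note that inner joins drop any run that lacks one of the four related rows; LEFT JOIN would keep such runs with NULLs instead.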
You can get this in one query by using annotate with Min, Max and Avg.
For your problem, you can do the following.
Add a related_name to the ForeignKey fields:
class RunProperty(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE, related_name="run_prop_name")
class RunValue(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE, related_name="run_value_name")
    run_parameter = models.ForeignKey(RunParameter, on_delete=models.CASCADE)
    value = models.FloatField(default=0)
views.py
from django.db.models import Avg, Max, Min
filt = 'run_value_name__value'
query = Run.objects.annotate(run_avg=Avg(filt), run_max=Max(filt), run_min=Min(filt))
You can get all values:
for i in query:
    print(i.run_avg, i.run_max, i.run_min)
-----------Edit------------
Please note that I have added a related_name to the RunValue model.
Let's assume you have two rows in the Run model:
1) run_1
2) run_2
and 6 entries in the RunValue model:
run = 1, run_parameter = "Avg_value", value = 50
run = 1, run_parameter = "Min_value", value = 25
run = 1, run_parameter = "Max_value", value = 75
run = 2, run_parameter = "Avg_value", value = 28
run = 2, run_parameter = "Max_value", value = 40
run = 2, run_parameter = "Min_value", value = 16
You want a dictionary something like this:
{'run_1': {'Avg_value': 50, 'Min_value': 25, 'Max_value': 75}, 'run_2': {...}}
Do this, and remember to read the documentation for select_related and prefetch_related.
rt = Run.objects.all().prefetch_related('run_value_name')
s = {}  # output dictionary
for i in rt:
    s[i.pk] = {}  # per-run dictionary, keyed by the run's primary key
    for j in i.run_value_name.all():
        s[i.pk].update({j.run_parameter: j.value})  # update run dictionary
print(s)
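The grouping step in that loop does not depend on the ORM, so it can be tried on plain tuples first. The sketch below uses the six example entries from this answer as hypothetical (run, run_parameter, value) rows; the helper name group_by_run is an invention for illustration.

```python
from collections import defaultdict

# Flat rows shaped like (run, run_parameter, value), copied from the
# example entries above.
run_values = [
    (1, "Avg_value", 50),
    (1, "Min_value", 25),
    (1, "Max_value", 75),
    (2, "Avg_value", 28),
    (2, "Max_value", 40),
    (2, "Min_value", 16),
]


def group_by_run(rows):
    """Fold flat rows into {run label: {parameter name: value}}."""
    grouped = defaultdict(dict)
    for run, run_parameter, value in rows:
        grouped[f"run_{run}"][run_parameter] = value
    return dict(grouped)


result = group_by_run(run_values)
print(result)
```

With Django, rows of this shape could probably be fetched in a single query through values_list on RunValue, which would avoid building model instances for every value.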
----------Addition-----------
Check the number of database hits with this code (connection.queries is only populated when DEBUG is True):
from django.db import connection, reset_queries
print(len(connection.queries))
reset_queries()

Filter objects in Django 2 if all objects in subquery have specific value

I have the following models:
class FactoryDevice(models.Model):
    ...
class InspectionRegister(models.Model):
    factory_device = models.ForeignKey(FactoryDevice)
    inspection_date = models.DateTimeField()
    status = models.CharField(choices=choices.STATUS)
This is the scenario:
In a factory, every week devices are inspected.
I want to filter only the FactoryDevices whose last five related InspectionRegisters all have status choices.REPPROVED. If any of the last five InspectionRegisters of a FactoryDevice does not have status choices.REPPROVED, that FactoryDevice must not be in the results.
First off, I would define a related_name for your reverse relationship to make your life easier:
factory_device = models.ForeignKey(FactoryDevice, related_name='inspections')
Then something like this could work:
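queryset = (
    FactoryDevice.objects
    .prefetch_related(Prefetch(
        'inspections',  # your related name
        # caution: Django refuses to filter a queryset after it has been sliced
        InspectionRegister.objects.order_by('-inspection_date')[:5].filter(status=choices.REPPROVED),
        to_attr='failed_inspections'
    ))
    # caution: to_attr is a Python-side attribute, so Count() cannot resolve it in SQL
    .annotate(failed_count=Count('failed_inspections'))
    .filter(failed_count__gte=5)
)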
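Since the queryset above is only an outline, it may help to pin down the rule itself in plain Python before translating it to the ORM. This is an in-memory sketch, not a queryset: the tuples, the APPROVED status and the function name are assumptions for illustration, and devices with fewer than five inspections are excluded to match the failed_count__gte=5 condition.

```python
from collections import defaultdict
from datetime import date, timedelta

REPPROVED = "REPPROVED"
APPROVED = "APPROVED"  # hypothetical second status, for the demo only


def devices_failing_last_n(inspections, n=5):
    """Return ids of devices whose n most recent inspections are all
    REPPROVED. `inspections` holds (device_id, inspection_date, status)."""
    by_device = defaultdict(list)
    for device_id, inspection_date, status in inspections:
        by_device[device_id].append((inspection_date, status))

    failing = []
    for device_id, records in by_device.items():
        # newest first, then keep only the last n inspections
        latest = sorted(records, key=lambda r: r[0], reverse=True)[:n]
        if len(latest) == n and all(s == REPPROVED for _, s in latest):
            failing.append(device_id)
    return sorted(failing)


# Six weekly inspection dates.
weeks = [date(2024, 1, 1) + timedelta(weeks=k) for k in range(6)]
inspections = (
    # device 1: one old pass, then five failures in a row
    [(1, weeks[0], APPROVED)] + [(1, d, REPPROVED) for d in weeks[1:]]
    # device 2: five failures, but the most recent inspection passed
    + [(2, d, REPPROVED) for d in weeks[:5]] + [(2, weeks[5], APPROVED)]
    # device 3: only three inspections so far
    + [(3, d, REPPROVED) for d in weeks[3:]]
)

failing_devices = devices_failing_last_n(inspections)
print(failing_devices)
```

Only device 1 qualifies here. For large tables this filtering belongs in the database, so treat the function as a reference for checking whatever query you end up with.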

I need to query for a set of objects whose primary keys are contained inside of a list

As the title says, I need a way to perform this query. I have tried the following:
user_list_ids = []
user_lists = []
user_entries = OwnerEntry.objects.filter(name=request.user)
for user in user_entries:
    user_list_ids.append(user.list_id)
user_lists = ListEntry.objects.filter(id__in=user_list_ids)
However, I get an error on the last line: int() argument must be a string or a number, not 'ListEntry'
Here are the relevant models:
class OwnerEntry(models.Model):
    name = models.CharField(max_length=32)
    list_id = models.ForeignKey(ListEntry)
    class Meta:
        ordering = ('name',)
class ListEntry(models.Model):
    name = models.CharField(max_length=64)
    # active_date = models.DateTimeField('date of last list activity')
    expire_date = models.DateField('date of expiration')
    create_date = models.DateField('date created')
To answer your question directly: note that the ForeignKey on your OwnerEntry model is named list_id rather than list. To extract the actual foreign key value, use list_id_id instead (or rename list_id to list ;))
Please also note that django supports object references, like so:
someowner = OwnerEntry.objects.get( ... )
ownerslist = someowner.list_id  # the related ListEntry object, not its integer id
cheers!
You can define OwnerEntry's foreign key to ListEntry as :
list_id = models.ForeignKey(ListEntry, related_query_name='owner_entry')
and then do this one-liner in your code:
user_lists = ListEntry.objects.filter(owner_entry__name=request.user)
This selects exactly those ListEntry objects that have at least one owner_entry whose name equals request.user's.
The redefinition of the foreign key is just for the sake of giving a nice name to the query attribute.
For more details on queries that work with backward relationships: https://docs.djangoproject.com/en/dev/topics/db/queries/#lookups-that-span-relationships
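As a rough picture of what a lookup that spans a relationship does on the database side, the sketch below runs the equivalent inner join with the standard-library sqlite3 module. The table names, the list_id_id column and the sample owners are assumptions for illustration; the real tables would be named after your app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list_entry (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owner_entry (
        id INTEGER PRIMARY KEY,
        name TEXT,
        list_id_id INTEGER REFERENCES list_entry(id)
    );
    INSERT INTO list_entry VALUES (1, 'groceries'), (2, 'books'), (3, 'films');
    INSERT INTO owner_entry VALUES
        (1, 'alice', 1), (2, 'bob', 2), (3, 'alice', 3);
""")


def lists_owned_by(owner_name):
    """Names of the lists that have an owner entry with the given name."""
    rows = conn.execute("""
        SELECT list_entry.name
        FROM list_entry
        INNER JOIN owner_entry ON owner_entry.list_id_id = list_entry.id
        WHERE owner_entry.name = ?
        ORDER BY list_entry.id
    """, (owner_name,))
    return [name for (name,) in rows]


alice_lists = lists_owned_by("alice")
print(alice_lists)
```

The join replaces the two-step "collect ids, then filter with id__in" approach from the question with a single query.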
