Django annotate value based on another model field - python

I have these two models, Cases and Specialties, just like this:
class Case(models.Model):
    ...
    judge = models.CharField()
    ...

class Specialty(models.Model):
    name = models.CharField()
    sys_num = models.IntegerField()
I know this sounds like a really weird structure, but bear with me:
The judge field on the Case model refers to the sys_num value of a Specialty instance (judge is a CharField, but it always carries an integer, and each Specialty instance has a unique sys_num). So I can get the Specialty name related to a specific Case instance with something like this:
my_pk = ...  # some number here
my_case_judge = Case.objects.get(pk=my_pk).judge
my_specialty_name = Specialty.objects.get(sys_num=my_case_judge).name
I know this sounds really weird, but I can't change the underlying schema of the tables; I can only work around it with SQL and Django's ORM.
My problem is: I want to annotate the Specialty names onto a queryset of Cases that has already called values().
I only managed to get it working using Case/When expressions, but that isn't dynamic: if I add more Specialty instances, I have to alter the code by hand.
cases.annotate(
    specialty=Case(
        When(judge=0, then=Value('name 0 goes here')),
        When(judge=1, then=Value('name 1 goes here')),
        When(judge=2, then=Value('name 2 goes here')),
        When(judge=3, then=Value('name 3 goes here')),
        ...
Can this be done dynamically? I looked through Django's query reference docs but couldn't produce a working solution with the tools described there.

You can do this with a subquery expression:
from django.db.models import OuterRef, Subquery

Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=OuterRef('judge')).values('name')[:1]
    )
)
For some databases, casting might even be necessary:
from django.db.models import IntegerField, OuterRef, Subquery
from django.db.models.functions import Cast

Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(
            sys_num=Cast(OuterRef('judge'), output_field=IntegerField())
        ).values('name')[:1]
    )
)
But the modeling is very poor. Usually it is better to work with a ForeignKey: it guarantees that judge can only point to a valid Specialty (referential integrity), it creates an index on the field, and it makes the Django ORM more effective, since it allows more advanced querying with (relatively) small queries.
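If the schema could be changed, a minimal sketch of that remodel (field sizes and the on_delete policy are chosen arbitrarily here) would be:
class Specialty(models.Model):
    name = models.CharField(max_length=100)
    sys_num = models.IntegerField(unique=True)

class Case(models.Model):
    # judge references Specialty.sys_num directly, keeping the existing column name
    judge = models.ForeignKey(
        Specialty,
        to_field='sys_num',
        db_column='judge',
        on_delete=models.PROTECT,
    )
The annotation from the question then reduces to a plain join, e.g. Case.objects.annotate(specialty=F('judge__name')).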

Related

Django-native/ORM based approach to self join

I'm trying to set up a Django-native query that grabs all rows/relationships of a model when it shows up on the other side of a many-to-many relationship.
I can explain with an example:
# Django models
class Ingredient:
    id = ...
    name = ...
    ...

class Dish:
    id = ...
    name = ...
    ...

class IngredientToDish:
    # this is a many-to-many relationship
    ingredient_id = models.ForeignKey("Ingredient", ...)
    dish_id = models.ForeignKey("Dish", ...)
    ...
I'd like a Django-native way of: "For each dish that uses tomato, find all the ingredients that it uses".
Looking for a list of rows that looks like:
(cheese_id, pizza_id)
(sausage_id, pizza_id)
(tomato_id, pizza_id)
(cucumber_id, salad_id)
(tomato_id, salad_id)
I'd like to keep it to one DB query, for optimization. In SQL this would be a simple JOIN of the IngredientToDish table with itself, but I couldn't find the conventional approach with Django... It likely uses some form of select_related, but I haven't been able to make it work; I think part of the reason is that I haven't been able to express the problem succinctly enough to come across the right documentation during research.
You can .filter(…) [Django-doc] with:
Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
)
You can also add the primary key of the dish for which this holds with:
from django.db.models import F

Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
).annotate(
    dish_id=F('ingredienttodish__dish_id')
)
The Ingredient objects that arise from this QuerySet will have an extra attribute dish_id that contains the primary key of the Dish for which these were used.
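To get the flat (ingredient_id, dish_id) pairs from the question out of the same single query, that annotation can be combined with values_list, for example (a sketch, assuming the default reverse accessor names):
from django.db.models import F

pairs = Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
).annotate(
    dish_pk=F('ingredienttodish__dish_id')
).values_list('pk', 'dish_pk')
# e.g. <QuerySet [(cheese_pk, pizza_pk), (sausage_pk, pizza_pk), ...]>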
Note: Normally one does not add an …_id suffix to a ForeignKey field, since Django will automatically add a "twin" field with an …_id suffix. Therefore it should be dish, instead of dish_id.
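Following that note, the through model would conventionally be declared like this (a sketch; the lookups above would then drop their extra _id as well):
class IngredientToDish(models.Model):
    ingredient = models.ForeignKey('Ingredient', on_delete=models.CASCADE)
    dish = models.ForeignKey('Dish', on_delete=models.CASCADE)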

Django: Unique ID's across tables

I was curious whether there is a good way to implement unique IDs across tables.
class Voice(models.Model):
    id = ..           <------|
    slug = ...               |
    name = ....              |-- No duplicate IDs
                             |
class Group(models.Model):   |
    id = ..           <------|
    slug = ...
    name = ....
My hope is that when I get an ID in a view, selecting from one model will give me None but the other will always return the object (and vice versa). If there is a better approach feel free to share. Right now I am using the slug+id as query filters but would like to move away from that.
I'd worry less about the unique ids and more about the data-model relationships. From what you're saying, it sounds like there's a commonality between the two, and a common model could have a voice, a group, or both associated with it.
class NewCommonModel(models.Model):
    # common fields go here.
    ...

class Voice(models.Model):
    new_common_model = models.OneToOneField(NewCommonModel, on_delete=models.CASCADE)
    # voice specific fields

class Group(models.Model):
    new_common_model = models.OneToOneField(NewCommonModel, on_delete=models.CASCADE)
    # group specific fields
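A given id can then be resolved to whichever concrete model owns it, for example (a sketch; some_id is a placeholder):
common = NewCommonModel.objects.get(pk=some_id)
try:
    obj = common.voice   # reverse OneToOne accessor
except Voice.DoesNotExist:
    obj = common.group   # raises Group.DoesNotExist if neither exists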
Define id as an IntegerField instead of an auto field, with Voice always using even numbers as id and Group always odd. This way you even know in advance which model to look in.
I would recommend the use of a uuid as your primary key. Solves your unique problem, obfuscates your pk, is unique across the universe, and is built into django as well.
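A minimal sketch of the UUID approach, with the field set assumed from the question:
import uuid
from django.db import models

class Voice(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    slug = models.SlugField()
    name = models.CharField(max_length=100)

# Group would declare its id the same way; collisions between the two tables
# are then practically impossible.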
Since you mentioned slug, there are ways to have a unique slug per model where you'd never need to make a composite key with the pk. Or, you can include the slug in the url for cosmetic reasons and just filter in your view for only the pk, which should always be unique.
But ordinarily, having the same pk in two models shouldn't really ever be an issue, and without knowing more, I would be concerned you're doing something odd.

Get all values from Django QuerySet plus additional fields from a related model

I was wondering if there is a shortcut to getting all fields from a Django model and only defining additional fields that are retrieved through a join (or multiple joins).
Consider models like the following:
class A(models.Model):
    text = models.CharField(max_length=10, blank=True)

class B(models.Model):
    a = models.ForeignKey(A, null=True, on_delete=models.CASCADE)
    y = models.PositiveIntegerField(null=True)
Now I can use the values() function like this
B.objects.values('y', 'a__text')
to get dictionaries containing the specified values from the B model and the related field from the A model. If I only use
B.objects.values()
I only get dictionaries containing fields from the B model (i.e., y and the foreign key id a_id). Let's assume a scenario where B and A have many fields, and I am interested in all of those belonging to B but only in a single field from A. Manually specifying all the field names in the values() call would be possible, but tedious and error-prone.
So is there a way to specify that I want all local fields, but only a (few) specific joined field(s)?
Note: I'm currently using Django 1.11, but if a solution only works with a more recent version I am interested in that too.
You can use prefetch_related for this. See docs:
You want to use performance optimization techniques like deferred fields:
queryset = Pizza.objects.only('name')
restaurants = Restaurant.objects.prefetch_related(Prefetch('best_pizza', queryset=queryset))
In your case you can do something like this:
from django.db.models import Prefetch
queryset = A.objects.only('text')
b_list = B.objects.prefetch_related(Prefetch('a', queryset=queryset))
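Hypothetical usage: the related A rows are fetched in one extra query up front, so accessing the prefetched relation inside the loop does not hit the database again:
for b in b_list:
    # a can be None because the ForeignKey is nullable
    print(b.y, b.a.text if b.a is not None else None)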
Maybe something like this would work in your case?
B.objects.select_related('a').defer('a__field_to_lazy_load')
This will load all fields from both models except the ones you specify in defer(), where you can use the usual Django double underscore convention to traverse the relationship.
The fields you specify in defer() won't be loaded from the db but they will be if you try to access them later on (e.g. in a template).
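Hypothetical usage of the deferred queryset (reusing the placeholder field name from above):
qs = B.objects.select_related('a').defer('a__field_to_lazy_load')
for b in qs:
    print(b.y, b.a.text)       # loaded by the initial JOIN
    # b.a.field_to_lazy_load   # accessing this would trigger an extra query per object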

ProgrammingError: column "product" is of type product[] but expression is of type text[] enum postgres

I would like to save an array of enums.
I have the following:
CREATE TABLE public.campaign
(
    id integer NOT NULL,
    product product[]
)
product is an enum.
In Django I defined it like this:
PRODUCT = (
('car', 'car'),
('truck', 'truck')
)
class Campaign(models.Model):
product = ArrayField(models.CharField(null=True, choices=PRODUCT))
However, when I write the following:
campaign = Campaign(id=5, product=["car", "truck"])
campaign.save()
I get the following error:
ProgrammingError: column "product" is of type product[] but expression is of type text[]
LINE 1: ..."product" = ARRAY['car...
Note
I saw this answer, but I don't use sqlalchemy and would rather not use it if not needed.
EDITED
I tried @Roman Konoval's suggestion below like this:
class PRODUCT(Enum):
    CAR = 'car'
    TRUCK = 'truck'

class Campaign(models.Model):
    product = ArrayField(EnumField(PRODUCT, max_length=10))
and with:
campaign = Campaign(id=5, product=[CAR, TRUCK])
campaign.save()
However, I still get the same error; I can see that Django is translating it to a list of strings. If I write the following directly in the psql console:
INSERT INTO campaign ("product") VALUES ('{car,truck}'::product[])
it works just fine
There are two fundamental problems here.
Don't use Enums
If you continue to use enum, your next question here on Stack Overflow will be "how do I add a new entry to an enum?". Django does not support the enum type out of the box (thank heavens), so you have to use third-party libraries for this, and your mileage will vary with how complete the library is.
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
If you are thinking that you are saving space on disk by using enum, the above quote from the manual shows that it's an illusion.
See this Q&A for more on advantages and disadvantages of enum. But generally the disadvantages outweigh the advantages.
Don't use Arrays
Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Source: https://www.postgresql.org/docs/9.6/static/arrays.html
If you are going to search for a campaign that deals with Cars or Trucks you are going to have to do a lot of hard work. So will the database.
The correct design
The correct design is the one suggested in the postgresql arrays documentation page. Create a related table. This is the standard django way as well.
class Campaign(models.Model):
    name = models.CharField(max_length=20)

class Product(models.Model):
    name = models.CharField(max_length=20)
    campaign = models.ForeignKey(Campaign, on_delete=models.CASCADE)
This makes your code simpler. Doesn't require any extra storage. Doesn't require third party libraries. And best of all the vast api of the django related models becomes available to you.
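A sketch of how that related table is then used (assumes the models above):
campaign = Campaign.objects.create(name='summer sale')
Product.objects.create(name='car', campaign=campaign)
Product.objects.create(name='truck', campaign=campaign)

# Searching becomes a plain join instead of an array lookup:
Campaign.objects.filter(product__name='car')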
The definition of the product field is incorrect, as it specifies an array of CharFields while in reality it is an array of enums. Django does not currently support the enum type, so you can try this extension to define the type correctly:
class Product(Enum):
    ProductA = 'a'
    ...

class Campaign(models.Model):
    product = ArrayField(EnumField(Product, max_length=<whatever>))
Try this:
def django2psql(s):
    return '{' + ','.join(s) + '}'

campaign = Campaign(id=5, product=django2psql(["car", "truck"]))
I think you may have to subclass CharField to get it to report the correct db_type. There may be more problems than this but you can give this a try:
class Product(models.CharField):
    def db_type(self, connection):
        return 'product'

PRODUCT = (
    ('car', 'car'),
    ('truck', 'truck')
)

class Campaign(models.Model):
    product = ArrayField(Product(null=True, choices=PRODUCT))

Error saving django model with OneToOne field - Column specified twice

This question has been asked before, but the answers there do not solve my problem.
I am using a legacy database, nothing can be changed
Here are my Django models, with all but the relevant fields stripped out; obviously class Meta has managed = False in my actual code:
class AppCosts(models.Model):
    id = models.CharField(primary_key=True)
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()
    appcost = models.OneToOneField(AppCosts, db_column='id')

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    job_application = models.CharField()
    appcost = models.OneToOneField(AppCosts, to_field='id', db_column='job_application')
    app = models.OneToOneField(AppDefs, to_field='id', db_column='job_application')
The OneToOne fields work fine for querying, and I get the correct result using select_related()
But when I create a new record for the JobHistory table, when I call save(), I get:
DatabaseError: (1110, "Column 'job_application' specified twice")
I am using Django 1.4 and I do not quite get how this OneToOneField works. I can't find any example where the primary keys are named differently and have these particular semantics.
I need the django model that would let me do this SQL:
select job_history.job_name, job_history.job_application, app_costs.cost from job_history, app_costs where job_history.job_application = app_costs.id;
You have defined appcost and app to have the same underlying database column, job_application, which is also the name of another existing field: so three fields share the same column. That makes no sense at all.
OneToOneFields are just foreign keys constrained to a single value on both ends. If you have foreign keys from JobHistory to AppCosts and AppDefs, then presumably you have actual columns in your database that contain those foreign keys. Those are the values you should be using for db_column on those fields, not "job_application".
Edit I'm glad you said you didn't design this schema, because it is pretty horrible: you won't have any foreign key constraints, for example, which makes referential integrity impossible. But never mind, we can actually achieve what you want, more or less.
There are various issues with what you have, but the main one is that you don't need the separate "job_application" field at all. That is, as I said earlier, the foreign key, so let it be that. Also note it should be an actual ForeignKey field, not a one-to-one, since there are many histories to one app.
One constraint that we can't achieve easily in Django is to have the same field acting as FK for two tables. But that doesn't really matter, since we can get to AppCosts via AppDefs.
So the models could just look like this:
class AppCosts(models.Model):
    app = models.OneToOneField('AppDefs', primary_key=True, db_column='id')
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    app = models.ForeignKey(AppDefs, db_column='job_application')
Note that I've moved the one-to-one between Costs and Defs onto AppCosts, since it seems to make sense to have the canonical ID in Defs.
Now, given a JobHistory instance, you can do history.app to get the app instance, history.app.appcosts.cost to get the app cost, and use history.app_id to get the underlying app ID from the job_application column.
If you wanted to reproduce that SQL output more exactly, something like this would now work:
JobHistory.objects.values_list('job_name', 'app_id', 'app__appcosts__cost')
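With that remodel, the save that previously raised "Column 'job_application' specified twice" writes the column only once, roughly (hypothetical names):
app = AppDefs.objects.get(pk='some_app_id')
history = JobHistory(job_name='nightly_run', app=app)
history.save()   # only the app foreign key writes to the job_application column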
