Django-native/ORM based approach to self join - python

I'm trying to set up a Django-native query that, given an object on one side of a many-to-many relationship, grabs all the relationship rows of every object it is linked to on the other side.
I can explain with an example:
# Django models
class Ingredient(models.Model):
    id = ...
    name = ...
    ...

class Dish(models.Model):
    id = ...
    name = ...
    ...

class IngredientToDish(models.Model):
    # this is a many-to-many relationship
    ingredient_id = models.ForeignKey("Ingredient", ...)
    dish_id = models.ForeignKey("Dish", ...)
    ...
I'd like a Django-native way of: "For each dish that uses tomato, find all the ingredients that it uses".
Looking for a list of rows that looks like:
(cheese_id, pizza_id)
(sausage_id, pizza_id)
(tomato_id, pizza_id)
(cucumber_id, salad_id)
(tomato_id, salad_id)
I'd like to keep it in one DB query, for optimization. In SQL this would be a simple JOIN with itself (IngredientToDish table), but couldn't find what the conventional approach with Django would be... Likely uses some form of select_related but haven't been able to make it work; I think part of the reason is that I haven't been able to succinctly express the problem in words to come across the right documentation during research.

You can .filter(…) with:
Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
)
You can also add the primary key of the dish for which this holds with:
from django.db.models import F

Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
).annotate(
    dish_id=F('ingredienttodish__dish_id')
)
The Ingredient objects that arise from this QuerySet will have an extra attribute dish_id that contains the primary key of the Dish for which these were used.
Note: Normally one does not add a suffix …_id to a ForeignKey field, since Django
will automatically add a "twin" field with an …_id suffix. Therefore it should
be dish, instead of dish_id.
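For reference, the self join that the question describes can be sketched in plain SQL. This is only an illustration using the standard-library sqlite3 module, not Django itself; the table names (ingredient, dish, ingredient_to_dish) and the sample rows are assumptions made up for the example, and the SQL Django actually generates for the filter above may look different.

```python
import sqlite3

# In-memory database with hypothetical table names and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dish (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ingredient_to_dish (ingredient_id INTEGER, dish_id INTEGER);
    INSERT INTO ingredient VALUES
        (1, 'cheese'), (2, 'sausage'), (3, 'tomato'), (4, 'cucumber'), (5, 'rice');
    INSERT INTO dish VALUES (1, 'pizza'), (2, 'salad'), (3, 'risotto');
    INSERT INTO ingredient_to_dish VALUES
        (1, 1), (2, 1), (3, 1),   -- pizza: cheese, sausage, tomato
        (4, 2), (3, 2),           -- salad: cucumber, tomato
        (5, 3), (1, 3);           -- risotto: rice, cheese (no tomato)
""")

# The junction table is joined with itself: "uses_tomato" selects the dishes
# that contain tomato, "other" lists every ingredient of those dishes.
SELF_JOIN = """
    SELECT i.name, d.name
    FROM ingredient_to_dish AS uses_tomato
    JOIN ingredient AS t ON t.id = uses_tomato.ingredient_id
    JOIN ingredient_to_dish AS other ON other.dish_id = uses_tomato.dish_id
    JOIN ingredient AS i ON i.id = other.ingredient_id
    JOIN dish AS d ON d.id = other.dish_id
    WHERE t.name = 'tomato'
    ORDER BY d.name, i.name
"""
rows = conn.execute(SELF_JOIN).fetchall()
for ingredient_name, dish_name in rows:
    print(ingredient_name, dish_name)
```

Each row pairs an ingredient with a dish that also contains tomato, which is the shape of result the question asks for.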

Related

Django annotate value based on another model field

I have these two models, Cases and Specialties, just like this:
class Case(models.Model):
    ...
    judge = models.CharField()
    ...

class Specialty(models.Model):
    name = models.CharField()
    sys_num = models.IntegerField()
I know this sounds like a really weird structure but try to bear with me:
The field judge in the Case model refers to the sys_num value of a Specialty instance (judge is a CharField, but it always carries an integer), and each Specialty instance has a unique sys_num. So I can get the Specialty name related to a specific Case instance using something like this:
my_pk = ...  # some number here
my_case_judge = Case.objects.get(pk=my_pk).judge
my_specialty_name = Specialty.objects.get(sys_num=my_case_judge).name
I know this sounds really weird but I can't change the underlying schema of the tables, just work around it with SQL and Django's ORM.
My problem is: I want to annotate the Specialty names in a queryset of Cases on which values() has already been called.
I only managed to get it working using Case and When but it's not dynamic. If I add more Specialty instances I'll have to manually alter the code.
cases.annotate(
    specialty=Case(
        When(judge=0, then=Value('name 0 goes here')),
        When(judge=1, then=Value('name 1 goes here')),
        When(judge=2, then=Value('name 2 goes here')),
        When(judge=3, then=Value('name 3 goes here')),
        ...
Can this be done dynamically? I looked through Django's query reference docs but couldn't produce a working solution with the tools specified there.
You can do this with a subquery expression:
from django.db.models import OuterRef, Subquery
Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=OuterRef('judge')).values('name')[:1]
    )
)
For some databases, casting might even be necessary:
from django.db.models import IntegerField, OuterRef, Subquery
from django.db.models.functions import Cast
Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=Cast(
            OuterRef('judge'),
            output_field=IntegerField()
        )).values('name')[:1]
    )
)
But the modeling is very bad. Usually it is better to work with a ForeignKey: this guarantees that judge can only point to a valid Specialty (referential integrity), creates an index on the field, and makes the Django ORM more effective, since it allows more advanced querying with (relatively) small queries.
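To make the idea of the subquery annotation more concrete, here is a rough sketch of the kind of correlated subquery involved, run through the standard-library sqlite3 module rather than Django. The table names (cases, specialty) and the sample rows are assumptions for illustration only, and the SQL Django emits will not be identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables mirroring the two models; judge is text on purpose.
    CREATE TABLE specialty (id INTEGER PRIMARY KEY, name TEXT, sys_num INTEGER UNIQUE);
    CREATE TABLE cases (id INTEGER PRIMARY KEY, judge TEXT);
    INSERT INTO specialty (name, sys_num) VALUES ('civil', 0), ('criminal', 1);
    INSERT INTO cases (judge) VALUES ('1'), ('0'), ('7');
""")

# For every case, look up the specialty whose sys_num equals the (cast) judge
# value. A case without a matching specialty gets NULL, i.e. None in Python.
rows = conn.execute("""
    SELECT c.id,
           c.judge,
           (SELECT s.name
            FROM specialty AS s
            WHERE s.sys_num = CAST(c.judge AS INTEGER)
            LIMIT 1) AS specialty
    FROM cases AS c
    ORDER BY c.id
""").fetchall()
print(rows)
```

Because the lookup happens inside the query, adding more Specialty rows needs no code change, unlike the hand-written Case/When chain.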

Django: Unique ID's across tables

I was curious if there was a good solution to implement unique ID's between tables.
class Voice(models.Model):
    id = ..              <------|
    slug = ...                  |
    name = ....                 |-- No duplicate IDs
                                |
class Group(models.Model):      |
    id = ..              <------|
    slug = ...
    name = ....
My hope is that when I get an ID in a view, selecting from one model will give me None but the other will always return the object (and vice versa). If there is a better approach feel free to share. Right now I am using the slug+id as query filters but would like to move away from that.
I'd worry less about the unique ids and consider the data model relationships. From what you're saying, it sounds like there's a commonality between the two and that model can have a voice, group or both associated with it.
class NewCommonModel(models.Model):
    # common fields go here.
    pass

class Voice(models.Model):
    new_common_model = models.OneToOneField(NewCommonModel, on_delete=models.CASCADE)
    # voice specific fields

class Group(models.Model):
    new_common_model = models.OneToOneField(NewCommonModel, on_delete=models.CASCADE)
    # group specific fields
Define id as an IntegerField instead of an auto-incrementing field. Voice always has even numbers as id and Group always odd. This way you will even know in advance which model to look in.
I would recommend using a UUID as your primary key. It solves your uniqueness problem, obfuscates your pk, is (for all practical purposes) unique across the universe, and is built into Django as well.
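A minimal sketch of why this works, using only the standard-library uuid module. The two dictionaries stand in for the Voice and Group tables and are purely hypothetical; in Django the usual pattern would be a field along the lines of models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False) on each model.

```python
import uuid

# Stand-ins for the two tables, keyed by primary key (hypothetical data).
voices = {}
groups = {}

def create(table, name):
    """Store a row under a freshly generated random UUID and return that key."""
    pk = uuid.uuid4()
    table[pk] = name
    return pk

voice_pk = create(voices, "alto")
group_pk = create(groups, "choir")

# Looking a key up in the "wrong" table finds nothing, in the "right" one
# it finds the row, which is the behaviour the question asks for.
print(voices.get(voice_pk), groups.get(voice_pk))
print(voices.get(group_pk), groups.get(group_pk))
```

The uniqueness is probabilistic rather than enforced by the database, but a collision between two random UUIDs is unlikely enough to ignore in practice.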
Since you mentioned slug, there are ways to have a unique slug per model where you'd never need to make a composite key with the pk. Or, you can include the slug in the url for cosmetic reasons and just filter in your view for only the pk, which should always be unique.
But ordinarily, having the same pk in two models shouldn't really ever be an issue, and without knowing more, I would be concerned you're doing something odd.

Error saving django model with OneToOne field - Column specified twice

This question has been asked before, but the answers there do not solve my problem.
I am using a legacy database; nothing can be changed.
Here are my Django models, with all but the relevant fields stripped off. Obviously, class Meta has managed = False in my actual code:
class AppCosts(models.Model):
    id = models.CharField(primary_key=True)
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()
    appcost = models.OneToOneField(AppCosts, db_column='id')

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    job_application = models.CharField()
    appcost = models.OneToOneField(AppCosts, to_field='id', db_column='job_application')
    app = models.OneToOneField(AppDefs, to_field='id', db_column='job_application')
The OneToOne fields work fine for querying, and I get the correct result using select_related()
But when I create a new record for the JobHistory table, when I call save(), I get:
DatabaseError: (1110, "Column 'job_application' specified twice")
I am using Django 1.4 and I do not quite get how OneToOneField works. I can't find any example where the primary keys are named differently and have these particular semantics.
I need the django model that would let me do this SQL:
select job_history.job_name, job_history.job_application, app_costs.cost from job_history, app_costs where job_history.job_application = app_costs.id;
You have defined appcost and app to have the same underlying database column, job_application, which is also the name of another existing field: so three fields share the same column. That makes no sense at all.
OneToOneFields are just foreign keys constrained to a single value on both ends. If you have foreign keys from JobHistory to AppCosts and AppDefs, then presumably you have actual columns in your database that contain those foreign keys. Those are the values you should be using for db_column for those fields, not "job_application".
Edit: I'm glad you said you didn't design this schema, because it is pretty horrible: you won't have any foreign key constraints, for example, which makes referential integrity impossible. But never mind, we can actually achieve what you want, more or less.
There are various issues with what you have, but the main one is that you don't need the separate "job_application" field at all. That is, as I said earlier, the foreign key, so let it be that. Also note it should be an actual foreign key field, not a one-to-one, since there are many histories to one app.
One constraint that we can't achieve easily in Django is to have the same field acting as FK for two tables. But that doesn't really matter, since we can get to AppCosts via AppDefs.
So the models could just look like this:
class AppCosts(models.Model):
    app = models.OneToOneField('AppDefs', primary_key=True, db_column='id')
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    app = models.ForeignKey(AppDefs, db_column='job_application')
Note that I've moved the one-to-one between Costs and Defs onto AppCosts, since it seems to make sense to have the canonical ID in Defs.
Now, given a JobHistory instance, you can do history.app to get the app instance, history.app.cost to get the app cost, and use the history.app_id to get the underlying app ID from the job_application column.
If you wanted to reproduce that SQL output more exactly, something like this would now work:
JobHistory.objects.values_list('job_name', 'app_id', 'app__appcosts__cost')
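As a sanity check on what that query is meant to return, the SQL from the question can be tried against a throwaway database. This sketch uses the standard-library sqlite3 module, not Django; the table layout follows the question, while the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Legacy-style tables: no foreign key constraints, string primary keys.
    CREATE TABLE app_defs (id TEXT PRIMARY KEY, data TEXT);
    CREATE TABLE app_costs (id TEXT PRIMARY KEY, cost REAL);
    CREATE TABLE job_history (job_name TEXT PRIMARY KEY, job_application TEXT);
    INSERT INTO app_defs VALUES ('app_a', 'first app'), ('app_b', 'second app');
    INSERT INTO app_costs VALUES ('app_a', 1.5), ('app_b', 2.0);
    INSERT INTO job_history VALUES
        ('job1', 'app_a'), ('job2', 'app_a'), ('job3', 'app_b');
""")

# Same join as the SQL in the question, written with an explicit JOIN.
rows = conn.execute("""
    SELECT job_history.job_name, job_history.job_application, app_costs.cost
    FROM job_history
    JOIN app_costs ON job_history.job_application = app_costs.id
    ORDER BY job_history.job_name
""").fetchall()
print(rows)
```

Note that job1 and job2 both point at app_a: several histories share one app, which is why a plain foreign key fits the job_application column better than a one-to-one.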

Including Duplicate Tables using Django's ORM Extra()

I'm trying to implement a simple triplestore using Django's ORM. I'd like to be able to search for arbitrarily complex triple patterns (e.g. as you would with SPARQL).
To do this, I'm attempting to use the .extra() method. However, even though the docs mention it can, in theory, handle duplicate references to the same table by automatically creating an alias for the duplicate table references, I've found it does not do this in practice.
For example, say I have the following model in my "triple" app:
class Triple(models.Model):
    subject = models.CharField(max_length=100)
    predicate = models.CharField(max_length=100)
    object = models.CharField(max_length=100)
and I have the following triples stored in my database:
subject  predicate  object
bob      has-a      hat    .
bob      knows      sue    .
sue      has-a      house  .
bob      knows      tom    .
Now, say I want to query the names of everyone bob knows who has a house. In SQL, I'd simply do:
SELECT t2.subject AS name
FROM triple_triple t1
INNER JOIN triple_triple t2 ON
t1.subject = 'bob'
AND t1.predicate = 'knows'
AND t1.object = t2.subject
AND t2.predicate = 'has-a'
AND t2.object = 'house'
I'm not completely sure what this would look like with Django's ORM, although I think it would be along the lines of:
q = Triple.objects.filter(subject='bob', predicate='knows')
q = q.extra(tables=['triple_triple'], where=["triple_triple.object=t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house'"])
q.values('t1.subject')
Unfortunately, this fails with the error "DatabaseError: no such column: t1.subject"
Running print q.query shows:
SELECT "triple_triple"."subject" FROM "triple_triple" WHERE ("triple_triple"."subject" = 'bob' AND "triple_triple"."predicate" = 'knows'
AND triple_triple.object = t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house')
which appears to show that the tables param in my call to .extra() is being ignored, as there's no second reference to triple_triple inserted anywhere.
Why is this happening? What's the appropriate way to refer to complex relationships between records in the same table using Django's ORM?
EDIT: I found this useful snippet for including custom SQL via .extra() so that it's usable inside a model manager.
I think what you're missing is the select parameter (for the extra method)
This seems to work:
qs = Triple.objects.filter(subject="bob", predicate="knows").extra(
    select={'known': "t1.subject"},
    tables=['"triple_triple" AS "t1"'],
    where=["""triple_triple.object = t1.subject
              AND t1.predicate = 'has-a' AND t1.object = 'house'"""])
qs.values("known")
I've had the same issue, where Django escapes my table names (adds quotes or back-ticks around them), meaning that I can't add an alias manually; the resulting FROM clause looks like this:
"mytable" AS T100
But at the same time, Django won't automatically create aliases for you if the table is already mentioned; instead it ignores the tables and just adds on the WHERE clauses as if they refer to the original tables.
The documentation for Django 1.8 suggests that .extra() will create aliases for you:
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.extra
But this doesn't appear to be the case for my query, possibly because the original table is part of a LEFT OUTER JOIN rather than a simple FROM x,y,z clause.
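If .extra() refuses to alias the table, one fallback is to run the self join as parameterized raw SQL (in Django, for example through Manager.raw() or a database cursor). The sketch below only demonstrates the query itself with the standard-library sqlite3 module; the table name triple_triple and the data come from the question, everything else is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triple_triple ("
    "id INTEGER PRIMARY KEY, subject TEXT, predicate TEXT, object TEXT)"
)
conn.executemany(
    "INSERT INTO triple_triple (subject, predicate, object) VALUES (?, ?, ?)",
    [
        ("bob", "has-a", "hat"),
        ("bob", "knows", "sue"),
        ("sue", "has-a", "house"),
        ("bob", "knows", "tom"),
    ],
)

# t1 holds the "bob knows X" triples, t2 the "X has-a house" triples.
# Values are passed as parameters instead of being pasted into the SQL.
rows = conn.execute(
    """
    SELECT t2.subject AS name
    FROM triple_triple AS t1
    INNER JOIN triple_triple AS t2 ON t1.object = t2.subject
    WHERE t1.subject = ? AND t1.predicate = ?
      AND t2.predicate = ? AND t2.object = ?
    """,
    ("bob", "knows", "has-a", "house"),
).fetchall()
names = [row[0] for row in rows]
print(names)
```

Only sue qualifies: tom is known to bob but has no house triple, and the hat triple does not match the predicate filter on t1.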

Django: How to filter Users that belong to a specific group

I'm looking to narrow the queryset for a form field that has a foreign key to the User table down to the users that belong to a specific group.
The groups have been previously associated by me. The model might have something like the following:
myuser = models.ForeignKey(User)
And my ModelForm is very bare bones:
class MyForm(ModelForm):
    class Meta:
        model = MyModel
So when I instantiate the form I do something like this in my views.py:
form = MyForm()
Now my question is, how can I take the myuser field, and filter it so only users of group 'foo' show up.. something like:
form.fields["myuser"].queryset = ???
The query in SQL looks like this:
mysql> SELECT * from auth_user INNER JOIN auth_user_groups ON auth_user.id = auth_user_groups.user_id INNER JOIN auth_group ON auth_group.id = auth_user_groups.group_id WHERE auth_group.name = 'client';
I'd like to avoid using raw SQL though. Is it possible to do so?
You'll want to use Django's convention for joining across relationships to join to the group table in your query set.
Firstly, I recommend giving your relationship a related_name. This makes the code more readable than what Django generates by default.
class Group(models.Model):
    myuser = models.ForeignKey(User, related_name='groups')
If you want only a single group, you can join across that relationship and compare the name field using either of these methods:
form.fields['myuser'].queryset = User.objects.filter(
    groups__name='foo')
form.fields['myuser'].queryset = User.objects.filter(
    groups__name__in=['foo'])
If you want to qualify multiple groups, use the in clause:
form.fields['myuser'].queryset = User.objects.filter(
    groups__name__in=['foo', 'bar'])
If you want to quickly see the generated SQL, you can do this:
qs = User.objects.filter(groups__name='foo')
print(qs.query)
This is a really old question, but for those googling the answer to this (like I did), please know that the accepted answer is no longer 100% correct. A user can belong to multiple groups, so to correctly check if a user is in some group, you should do:
qs = User.objects.filter(groups__name__in=['foo'])
Of course, if you want to check for multiple groups, you can add those to the list:
qs = User.objects.filter(groups__name__in=['foo', 'bar'])
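One thing worth checking when filtering on several groups: the underlying join produces one row per matching membership, so a user who is in both groups can show up twice. The sketch below reproduces the join from the question with the standard-library sqlite3 module; the table names follow that SQL, the users and groups are made up. In the ORM, the usual remedy would be to append .distinct() to the queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE auth_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE auth_user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO auth_group VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    -- alice is in foo and bar, bob only in foo, carol only in baz
    INSERT INTO auth_user_groups VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

JOIN = """
    SELECT {distinct} auth_user.username
    FROM auth_user
    INNER JOIN auth_user_groups ON auth_user.id = auth_user_groups.user_id
    INNER JOIN auth_group ON auth_group.id = auth_user_groups.group_id
    WHERE auth_group.name IN ('foo', 'bar')
    ORDER BY auth_user.username
"""

# Without DISTINCT, alice is returned once per matching group.
plain = [row[0] for row in conn.execute(JOIN.format(distinct=""))]
# With DISTINCT, every user appears at most once.
unique = [row[0] for row in conn.execute(JOIN.format(distinct="DISTINCT"))]
print(plain, unique)
```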
