I'm trying to implement a simple triplestore using Django's ORM. I'd like to be able to search for arbitrarily complex triple patterns (e.g. as you would with SPARQL).
To do this, I'm attempting to use the .extra() method. However, even though the docs mention it can, in theory, handle duplicate references to the same table by automatically creating an alias for the duplicate table references, I've found it does not do this in practice.
For example, say I have the following model in my "triple" app:
from django.db import models

class Triple(models.Model):
    subject = models.CharField(max_length=100)
    predicate = models.CharField(max_length=100)
    object = models.CharField(max_length=100)
and I have the following triples stored in my database:
subject  predicate  object
bob      has-a      hat    .
bob      knows      sue    .
sue      has-a      house  .
bob      knows      tom    .
Now, say I want to query the names of everyone bob knows who has a house. In SQL, I'd simply do:
SELECT t2.subject AS name
FROM triple_triple t1
INNER JOIN triple_triple t2 ON
    t1.subject = 'bob'
    AND t1.predicate = 'knows'
    AND t1.object = t2.subject
    AND t2.predicate = 'has-a'
    AND t2.object = 'house'
I'm not completely sure what this would look like with Django's ORM, although I think it would be along the lines of:
q = Triple.objects.filter(subject='bob', predicate='knows')
q = q.extra(tables=['triple_triple'], where=["triple_triple.object=t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house'"])
q.values('t1.subject')
Unfortunately, this fails with the error "DatabaseError: no such column: t1.subject"
Running print q.query shows:
SELECT "triple_triple"."subject" FROM "triple_triple" WHERE ("triple_triple"."subject" = 'bob' AND "triple_triple"."predicate" = 'knows'
AND triple_triple.object = t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house')
which appears to show that the tables param in my call to .extra() is being ignored, as there's no second reference to triple_triple inserted anywhere.
Why is this happening? What's the appropriate way to refer to complex relationships between records in the same table using Django's ORM?
EDIT: I found this useful snippet for including custom SQL via .extra() so that it's usable inside a model manager.
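The SQL self-join in the question generalizes to arbitrary triple patterns: every pattern gets its own alias of the same table, constants become WHERE conditions, and a variable shared by two patterns becomes an equality between the two aliased columns. A minimal sketch of that translation, run with Python's built-in sqlite3 rather than through Django; the patterns_to_sql helper and the SPARQL-like "?x" variable syntax are hypothetical:

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object")


def patterns_to_sql(patterns, select_var, table="triple_triple"):
    """Translate (subject, predicate, object) patterns into one self-joined SELECT.

    Hypothetical helper: terms starting with "?" are variables, anything else
    is a constant that must match exactly.
    """
    tables, conditions, params = [], [], []
    bound = {}  # variable -> the "alias.column" where it first appeared
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"  # each pattern gets its own alias of the same table
        tables.append(f"{table} AS {alias}")
        for column, term in zip(COLUMNS, pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in bound:
                    # Variable seen before: join the two aliased columns
                    conditions.append(f"{ref} = {bound[term]}")
                else:
                    bound[term] = ref
            else:
                # Constant: compare against a bound parameter
                conditions.append(f"{ref} = ?")
                params.append(term)
    sql = f"SELECT DISTINCT {bound[select_var]} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple_triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple_triple VALUES (?, ?, ?)",
    [("bob", "has-a", "hat"), ("bob", "knows", "sue"),
     ("sue", "has-a", "house"), ("bob", "knows", "tom")],
)

# "Whom does bob know who has a house?"
sql, params = patterns_to_sql([("bob", "knows", "?x"), ("?x", "has-a", "house")], "?x")
names = [row[0] for row in conn.execute(sql, params)]
print(names)  # ['sue']
```

Adding a third pattern only adds a third alias (t2) and its conditions, which is exactly the duplicate-table aliasing the question expects .extra() to do.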
I think what you're missing is the select parameter (of the extra() method).
This seems to work:
qs = Triple.objects.filter(subject="bob", predicate="knows").extra(
    select={'known': "t1.subject"},
    tables=['"triple_triple" AS "t1"'],
    where=["""triple_triple.object = t1.subject
        AND t1.predicate = 'has-a' AND t1.object = 'house'"""])
qs.values("known")
I've had the same issue: Django escapes my table names (adds backticks), so I can't add an alias manually; the resulting FROM clause looks like this:
"mytable" AS T100
But at the same time, Django won't automatically create aliases for you if the table is already mentioned; instead, it ignores the tables argument and just appends the WHERE clauses as if they referred to the original tables.
The documentation for Django 1.8 suggests that .extra() will create aliases for you:
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.extra
But this doesn't appear to be the case for my query, possibly because the original table is part of a LEFT OUTER JOIN rather than a simple FROM x, y, z clause.
I'm trying to set up a Django-native query that fetches all related rows when a record shows up on the other side of a many-to-many relationship.
I can explain with an example:
# Django models
class Ingredient(models.Model):
    id = ...
    name = ...
    ...

class Dish(models.Model):
    id = ...
    name = ...
    ...

class IngredientToDish(models.Model):
    # this is the "through" table of a many-to-many relationship
    ingredient_id = models.ForeignKey("Ingredient", ...)
    dish_id = models.ForeignKey("Dish", ...)
    ...
I'd like a Django-native way of: "For each dish that uses tomato, find all the ingredients that it uses".
I'm looking for a list of rows like this:
(cheese_id, pizza_id)
(sausage_id, pizza_id)
(tomato_id, pizza_id)
(cucumber_id, salad_id)
(tomato_id, salad_id)
I'd like to keep it to one DB query, for performance. In SQL this would be a simple JOIN of the IngredientToDish table with itself, but I couldn't find what the conventional approach in Django would be. It likely uses some form of select_related, but I haven't been able to make it work; I think part of the reason is that I haven't been able to express the problem succinctly enough to find the right documentation.
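For reference, the self-join described above might look like this in plain SQL: one alias of the link table finds the dishes that contain tomato, and a second alias of the same table lists every ingredient of those dishes. The sketch runs it with Python's built-in sqlite3; the table names (ingredient, dish, ingredienttodish) and the sample rows are hypothetical, and a Django project would prefix the tables with its app label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dish (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ingredienttodish (
        ingredient_id INTEGER REFERENCES ingredient(id),
        dish_id INTEGER REFERENCES dish(id)
    );
    INSERT INTO ingredient VALUES (1, 'cheese'), (2, 'sausage'), (3, 'tomato'), (4, 'cucumber');
    INSERT INTO dish VALUES (10, 'pizza'), (20, 'salad'), (30, 'cake');
    INSERT INTO ingredienttodish VALUES (1, 10), (2, 10), (3, 10), (4, 20), (3, 20), (1, 30);
""")

# "uses" selects the link rows for tomato; "other" is a second alias of the
# same link table that returns every ingredient of those dishes.
rows = conn.execute("""
    SELECT other.ingredient_id, other.dish_id
    FROM ingredienttodish AS uses
    JOIN ingredient AS i ON i.id = uses.ingredient_id
    JOIN ingredienttodish AS other ON other.dish_id = uses.dish_id
    WHERE i.name = 'tomato'
    ORDER BY other.dish_id, other.ingredient_id
""").fetchall()
print(rows)  # [(1, 10), (2, 10), (3, 10), (3, 20), (4, 20)]
```

The cake (dish 30) has no tomato, so none of its rows appear.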
You can use .filter(…) with:
Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
)
You can also add the primary key of the dish for which this holds with:
from django.db.models import F

Ingredient.objects.filter(
    ingredienttodish__dish_id__ingredienttodish__ingredient_id__name='Tomato'
).annotate(
    dish_id=F('ingredienttodish__dish_id')
)
The Ingredient objects that arise from this QuerySet will have an extra attribute dish_id that contains the primary key of the Dish for which these were used.
Note: Normally one does not add an …_id suffix to a ForeignKey field, since Django automatically adds a "twin" field with an …_id suffix. Therefore the field should be dish, not dish_id.
I have these two models, Case and Specialty, defined like this:
class Case(models.Model):
    ...
    judge = models.CharField()
    ...

class Specialty(models.Model):
    name = models.CharField()
    sys_num = models.IntegerField()
I know this sounds like a really weird structure, but try to bear with me:
The field judge in the Case model refers to the sys_num value of a Specialty instance (judge is a CharField, but it always holds an integer, and each Specialty instance has a unique sys_num). So I can get the Specialty name related to a specific Case instance using something like this:
my_pk = ...  # some number here
my_case_judge = Case.objects.get(pk=my_pk).judge
my_specialty_name = Specialty.objects.get(sys_num=my_case_judge).name
I know this sounds really weird, but I can't change the underlying schema of the tables; I can only work around it with SQL and Django's ORM.
My problem is: I want to annotate the Specialty names in a queryset of Cases that have already called values().
I only managed to get it working using Case and When but it's not dynamic. If I add more Specialty instances I'll have to manually alter the code.
cases.annotate(
    specialty=Case(
        When(judge=0, then=Value('name 0 goes here')),
        When(judge=1, then=Value('name 1 goes here')),
        When(judge=2, then=Value('name 2 goes here')),
        When(judge=3, then=Value('name 3 goes here')),
        ...
Can this be done dynamically? I looked through Django's query reference docs but couldn't produce a working solution with the tools described there.
You can do this with a subquery expression:
from django.db.models import OuterRef, Subquery

Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=OuterRef('judge')).values('name')[:1]
    )
)
For some databases, casting might even be necessary:
from django.db.models import IntegerField, OuterRef, Subquery
from django.db.models.functions import Cast

Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=Cast(
            OuterRef('judge'),
            output_field=IntegerField()
        )).values('name')[:1]
    )
)
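Roughly, that annotation becomes a correlated scalar subquery in the SELECT list. The sketch writes such SQL by hand and runs it with Python's built-in sqlite3 so the behavior can be checked; the table names app_case and app_specialty and the sample rows are hypothetical, and the SQL Django actually emits will differ in quoting and aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_specialty (id INTEGER PRIMARY KEY, name TEXT, sys_num INTEGER UNIQUE);
    CREATE TABLE app_case (id INTEGER PRIMARY KEY, judge VARCHAR(20));
    INSERT INTO app_specialty (name, sys_num) VALUES ('Civil', 0), ('Criminal', 1), ('Family', 2);
    INSERT INTO app_case (id, judge) VALUES (1, '1'), (2, '2'), (3, '7');
""")

# For every case row, the subquery looks up the specialty whose sys_num matches
# the text judge column; LIMIT 1 plays the role of the [:1] slice.
rows = conn.execute("""
    SELECT c.id, c.judge,
           (SELECT s.name FROM app_specialty AS s
            WHERE s.sys_num = CAST(c.judge AS INTEGER)
            LIMIT 1) AS specialty
    FROM app_case AS c
    ORDER BY c.id
""").fetchall()
print(rows)  # [(1, '1', 'Criminal'), (2, '2', 'Family'), (3, '7', None)]
```

A judge value with no matching Specialty yields NULL (None) rather than dropping the case. SQLite's type affinity would also compare the columns without the CAST, but stricter databases such as PostgreSQL reject comparing an integer with a varchar, which is where the Cast() version is needed.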
But the modeling is very bad. Usually it is better to work with a ForeignKey: this guarantees that judge can only point to a valid Specialty (referential integrity), creates an index on the field, and makes the Django ORM more effective, since it allows more advanced querying with (relatively) small queries.
A short introduction to the problem...
PostgreSQL has very neat array fields (int array, string array) and functions for them like UNNEST and ANY.
These fields are supported by Django (I am using djorm_pgarray for that), but functions are not natively supported.
One could use .extra(), but Django 1.8 introduced a new concept of database functions.
Here is a minimal example of what I am basically doing with all this. A Dealer has a list of makes that it supports. A Vehicle has a make and is linked to a dealer. But it inevitably happens that a Vehicle's make does not match its Dealer's make list.
MAKE_CHOICES = [('honda', 'Honda'), ...]

class Dealer(models.Model):
    make_list = TextArrayField(choices=MAKE_CHOICES)

class Vehicle(models.Model):
    dealer = models.ForeignKey(Dealer, null=True, blank=True)
    make = models.CharField(max_length=255, choices=MAKE_CHOICES, blank=True)
Having a database of dealers and makes, I want to count all vehicles whose make appears in their dealer's make list. This is how I do it while avoiding .extra():
from django.db.models import functions

class SelectUnnest(functions.Func):
    function = 'SELECT UNNEST'
    ...

Vehicle.objects.filter(
    make__in=SelectUnnest('dealer__make_list')
).count()
Resulting SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make"
IN (SELECT UNNEST("myapp_dealer"."make_list"))
It works, and it is much faster than the traditional M2M approach we could use in Django. BUT, for this task, UNNEST is not a very good solution: ANY is much faster. Let's try it.
class Any(functions.Func):
    function = 'ANY'
    ...

Vehicle.objects.filter(
    make=Any('dealer__make_list')
).count()
It generates the following SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make" =
(ANY("myapp_dealer"."make_list"))
And it fails, because the parentheses around ANY are bogus. If you remove them, the query runs in the psql console with no problems, and fast.
So my question is:
Is there any way to remove these parentheses? I could not find anything about that in the Django documentation.
If not, is there another way to phrase this query?
P. S. I think that an extensive library of database functions for different backends would be very helpful for database-heavy Django apps.
Of course, most of these will not be portable. But you rarely migrate such a project from one database backend to another. In our example, using array fields and PostGIS, we are stuck with PostgreSQL and do not intend to move.
Is anybody developing such a thing?
P. P. S. One might say that in this case we should be using a separate table for makes, and an intarray instead of a string array. That is correct and will be done, but the nature of the problem does not change.
UPDATE.
TextArrayField is defined in djorm_pgarray. In the linked source file, you can see how it works.
The value is a list of text strings. In Python, it is represented as a list, for example ['honda', 'mazda', 'anything else'].
Here is what it looks like in the database:
=# select id, make from appname_tablename limit 3;
id | make
---+----------------------
58 | {vw}
76 | {lexus,scion,toyota}
39 | {chevrolet}
And underlying PostgreSQL field type is text[].
I've managed to get (more or less) what you need using the following:
from django.db.models.lookups import BuiltinLookup
from django.db.models.fields import Field

class Any(BuiltinLookup):
    lookup_name = 'any'

    def get_rhs_op(self, connection, rhs):
        # Render the right-hand side as "= ANY(<array>)", without parentheses around ANY
        return " = ANY(%s)" % (rhs,)

Field.register_lookup(Any)
and the query:
from django.db.models import F

Vehicle.objects.filter(make__any=F('dealer__make_list')).count()
which produces:
SELECT COUNT(*) AS "__count" FROM "zz_vehicle"
INNER JOIN "zz_dealer" ON ("zz_vehicle"."dealer_id" = "zz_dealer"."id")
WHERE "zz_vehicle"."make" = ANY(("zz_dealer"."make_list"))
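By the way, instead of djorm_pgarray and TextArrayField you can use Django's native ArrayField (part of django.contrib.postgres since Django 1.8):
from django.contrib.postgres.fields import ArrayField

make_list = ArrayField(models.CharField(max_length=200), blank=True)
This simplifies your dependencies.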
I haven't been able to find an answer to this, but I'm sure it must be somewhere.
My question is similar to this question: sqlalchemy: how to join several tables by one query?
But I need a query result, not a tuple. I don't have access to the models, so I can't change it, and I can't modify the functions to use a tuple.
I have two tables, UserInformation and MemberInformation, both with a foreign key and relationship to Principal, but not to each other.
How can I get all the records and columns from both tables in one query?
I've tried:
query = DBSession.query(MemberInformation).join(UserInformation, MemberInformation.pId == UserInformation.pId)
but it only returns the columns of MemberInformation
and:
query = DBSession.query(MemberInformation, UserInformation).join(UserInformation, MemberInformation.pId == UserInformation.pId)
but that returns a tuple.
What am I missing here?
Old question, but worth answering because I see it has a lot of view activity.
You need to create a relationship and then tell SQLAlchemy how to load the related data. I'm not sure what your tables and relationship look like, but it might look something like this:
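from sqlalchemy.orm import foreign, joinedload, relationship

# Create relationship. The two tables share pId but have no foreign key to
# each other, so the join condition has to be spelled out with primaryjoin.
MemberInformation.user = relationship(
    "UserInformation",
    primaryjoin=MemberInformation.pId == foreign(UserInformation.pId),
    uselist=False,
    viewonly=True,
    lazy="joined",
)

# Execute query
query = DBSession.query(MemberInformation) \
    .options(joinedload(MemberInformation.user)) \
    .all()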
# All objects are in memory. Evaluating the following will NOT result in additional
# database interaction
for member in query:
print(f'Member: {member} User: {member.user}')
# member is a MemberInformation object, member.user is a UserInformation object
Ideally, the relationship would be defined in your models. It can, however, be defined at run time, as in the example above.
The only way I found to do this is to use a statement instead of a query:
from sqlalchemy import join, select

stmt = select([table1, table2.col.label('table2_col')]).select_from(join(table1, table2, table1.t1_id == table2.t2_id))
obj = session.execute(stmt).fetchall()
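With SQLAlchemy 1.4 or later, a select() over columns from both tables can be executed and read back through .mappings(), which yields dict-like rows keyed by column name instead of bare tuples. A minimal sketch; the model definitions, the level and email columns, and the sample data are hypothetical stand-ins for the question's tables:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Principal(Base):
    __tablename__ = "principal"
    id = Column(Integer, primary_key=True)


class MemberInformation(Base):
    __tablename__ = "member_information"
    id = Column(Integer, primary_key=True)
    pId = Column(Integer, ForeignKey("principal.id"))
    level = Column(String)  # hypothetical column


class UserInformation(Base):
    __tablename__ = "user_information"
    id = Column(Integer, primary_key=True)
    pId = Column(Integer, ForeignKey("principal.id"))
    email = Column(String)  # hypothetical column


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Principal(id=1),
        MemberInformation(id=10, pId=1, level="gold"),
        UserInformation(id=20, pId=1, email="ann@example.com"),
    ])
    session.commit()

    # Columns from both tables; the label keeps the two "id" columns apart.
    stmt = select(
        MemberInformation.pId,
        MemberInformation.level,
        UserInformation.id.label("user_id"),
        UserInformation.email,
    ).join_from(
        MemberInformation, UserInformation,
        MemberInformation.pId == UserInformation.pId,
    )
    records = session.execute(stmt).mappings().all()

print(records[0]["email"])  # ann@example.com
```

Each record behaves like a read-only dict, so code that expects named fields rather than tuple positions can use it directly.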
After building a few applications on the GAE platform, I find that I use some relationship between different models in the datastore in basically every application. And I often need to see which records have the same parent (like matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        pass  # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a subquery to fetch the Foo each time. The approach I now use is to store each Foo key as a string on Bar.parentFoo; this way I can string-compare it with someFooKeyFromSomePlace and get rid of all the subquery overhead.
Now I've started to look at Entity groups and wondering if this is even a better way to go? I can't really figure out how to use them.
And as for the two approaches above, are there any downsides to using them? Could storing the key as a string come back and bite me in the * * *? And last but not least, is there a faster way to do this?
Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
This avoids the extra lookup and just fetches the key from the ReferenceProperty.
See the Property class documentation.
I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more on modeling, you can refer to the App Engine data modeling documentation.
I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple entities in a transaction, since App Engine restricts transactions to entities within the same group; in addition, App Engine allows ancestor filters (http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor), which could be useful depending on what you need to do. With the code above, you could also easily use an ancestor query if you set the parent of each Bar to a Foo.
If your purposes still require a lot of "subquerying", as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine. It basically fetches all the referenced entities you need for your entity set in one big get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However, do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
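The prefetch idea can be sketched in plain Python: collect the distinct parent keys first, fetch them in one batched call, then attach each parent to its child. The in-memory DATASTORE dict and get_multi() below are hypothetical stand-ins for the datastore (in the old db API, a batch db.get() over a list of keys plays that role), and plain dicts stand in for the Bar entities.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the datastore: key -> entity
DATASTORE = {"foo1": {"name": "Foo one"}, "foo2": {"name": "Foo two"}}
CALLS = Counter()  # counts round trips so the batching is visible


def get_multi(keys):
    """Hypothetical batch get: one round trip no matter how many keys."""
    CALLS["get_multi"] += 1
    return [DATASTORE.get(key) for key in keys]


def prefetch_parents(bars):
    """Attach each bar's parent entity using a single batched lookup."""
    keys = list({bar["parent_key"] for bar in bars})  # distinct parent keys
    parents = dict(zip(keys, get_multi(keys)))
    for bar in bars:
        bar["parent"] = parents[bar["parent_key"]]
    return bars


bars = prefetch_parents([
    {"name": "a", "parent_key": "foo1"},
    {"name": "b", "parent_key": "foo2"},
    {"name": "c", "parent_key": "foo1"},
])
print(CALLS["get_multi"])  # 1
print(bars[2]["parent"]["name"])  # Foo one
```

Three children sharing two parents cost one batched call here instead of three individual lookups; the saving grows with the size of the result set.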
Not very specific, but that's all the info I can give you until you are more specific about exactly what you're trying to do here.
When you design your models, you also need to consider whether you want to be able to save them within a transaction. However, only do this if you actually need transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()

def _save_entities(foo_name, bar_name):
    """Save the model data"""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    # Giving Bar the Foo as its parent puts both in the same entity group
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item

def main():
    # Run the save in a transaction; if any put fails, it all rolls back
    foo_item = db.run_in_transaction(_save_entities, "foo name", "bar name")
    # Query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor", ancestor=foo_item.key()).fetch(1000):
        pass  # do stuff