Filtering with joined tables - python

I'm trying to get some query performance improved, but the generated query does not look the way I expect it to.
The results are retrieved using:
query = (
    session.query(SomeModel)
    .options(joinedload_all('foo.bar'))
    .options(joinedload_all('foo.baz'))
    .options(joinedload('quux.other'))
)
What I want to do is filter on the table joined via 'first', but this way doesn't work:
query = query.filter(FooModel.address == '1.2.3.4')
It results in a clause like this attached to the query:
WHERE foos.address = '1.2.3.4'
Which doesn't do the filtering in a proper way, since the generated joins attach tables foos_1 and foos_2. If I try that query manually but change the filtering clause to:
WHERE foos_1.address = '1.2.3.4' AND foos_2.address = '1.2.3.4'
It works fine. The question is of course - how can I achieve this with sqlalchemy itself?

If you want to filter on joins, you use join():
session.query(SomeModel).join(SomeModel.foos).filter(Foo.something=='bar')
joinedload() and joinedload_all() are only a means of loading related collections in one pass; they are not meant for filtering or ordering. Please read:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#joined-load (the note on "joinedload() is not a replacement for join()"), as well as:
http://docs.sqlalchemy.org/en/latest/orm/loading.html#the-zen-of-eager-loading
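To see why the WHERE foos.address clause from the question has no effect on the eagerly loaded rows, here is a minimal sketch using only the standard-library sqlite3 module. The tables some_models and foos, their columns, and the sample rows are hypothetical, and the SQL is a hand-written approximation of the shape of the generated queries, not SQLAlchemy's actual output.

```python
import sqlite3

# Hypothetical schema standing in for SomeModel and FooModel.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_models (id INTEGER PRIMARY KEY);
    CREATE TABLE foos (
        id INTEGER PRIMARY KEY,
        some_model_id INTEGER REFERENCES some_models(id),
        address TEXT
    );
    INSERT INTO some_models (id) VALUES (1), (2);
    INSERT INTO foos (id, some_model_id, address)
        VALUES (1, 1, '1.2.3.4'), (2, 2, '5.6.7.8');
""")

# Shape of "eager load + filter on the plain entity": the eager join uses
# the alias foos_1, while the WHERE clause pulls in a second, unrelated
# copy of foos that is merely cross-joined.
eager_plus_filter = conn.execute("""
    SELECT some_models.id, foos_1.address
    FROM foos, some_models
    LEFT OUTER JOIN foos AS foos_1 ON foos_1.some_model_id = some_models.id
    WHERE foos.address = '1.2.3.4'
    ORDER BY some_models.id
""").fetchall()

# Shape of an explicit join: the filtered table is the joined table.
explicit_join = conn.execute("""
    SELECT some_models.id, foos.address
    FROM some_models
    JOIN foos ON foos.some_model_id = some_models.id
    WHERE foos.address = '1.2.3.4'
    ORDER BY some_models.id
""").fetchall()

print(eager_plus_filter)  # both parents come back, the filter restricted nothing
print(explicit_join)      # only the parent whose foo has the address
```

In the first query the condition only limits the cross-joined copy of foos, so every some_models row still appears. Joining explicitly, as the answer suggests with join(), puts the filter on the table that is actually joined.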

Related

FastAPI in-memory filtering

I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
So I was thinking something like this (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => d.language_id == language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is to get the ids of all the doctors and then, at each filtering step, pass in the doctor ids from the last step and funnel them through different SQL queries. But this is filtering using the database and would result in one more query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined. The tutorial was using SQL statements. Anyways, to answer my own question: I would first need to define SQLAlchemy models before I can use them.
Each operation on a SQLAlchemy query object returns a new query object, so you can keep building out the query conditionally inside if-statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call all() at the end. If neither argument is given, you'll get all the doctors.
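Since the tutorial in the question uses plain SQL statements rather than ORM models, the same "add a condition only when the parameter is present" idea can also be sketched without an ORM. The following is a minimal illustration with the standard-library sqlite3 module; the doctors table, its columns, the sample rows, and the find_doctors helper are all hypothetical.

```python
import sqlite3

def find_doctors(conn, language_id=None, area_id=None):
    """Build one parameterized query from whichever filters were supplied."""
    clauses, params = [], []
    if language_id is not None:
        clauses.append("language_id = ?")
        params.append(language_id)
    if area_id is not None:
        clauses.append("area_id = ?")
        params.append(area_id)

    sql = "SELECT id, name FROM doctors"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    # Values are passed separately from the SQL text, never interpolated.
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctors ("
    "id INTEGER PRIMARY KEY, name TEXT, language_id INTEGER, area_id INTEGER)"
)
conn.executemany(
    "INSERT INTO doctors VALUES (?, ?, ?, ?)",
    [(1, "Ada", 1, 10), (2, "Ben", 1, 20), (3, "Cy", 2, 10)],
)

print(find_doctors(conn))                            # no filters: everyone
print(find_doctors(conn, language_id=1))             # one filter
print(find_doctors(conn, language_id=1, area_id=10)) # both filters
```

Either way, the database runs a single query no matter how many optional filters are supplied, so there is no need to fetch all rows and filter them in memory.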

What is the difference between with_entities and load_only in SQLAlchemy?

When querying my database, I only want to load specified columns. Creating a query with with_entities requires a reference to the model column attribute, while creating a query with load_only requires a string corresponding to the column name. I would prefer to use load_only because it is easier to create a dynamic query using strings. What is the difference between the two?
load_only documentation
with_entities documentation
There are a few differences. The most important one when discarding unwanted columns (as in the question) is that using load_only will still result in creation of an object (a Model instance), while using with_entities will just get you tuples with values of chosen columns.
>>> query = User.query
>>> query.options(load_only('email', 'id')).all()
[<User 1 using e-mail: n#d.com>, <User 2 using e-mail: n#d.org>]
>>> query.with_entities(User.email, User.id).all()
[('n#d.com', 1), ('n#d.org', 2)]
load_only
load_only() loads only the named columns of your model and defers the rest.
The deferred columns are left out of the query. You can still access them later, but an additional query will be issued in the background when you do.
"Load only" is useful when you store things like pictures of users in your database but you do not want to waste time transferring the images when not needed. For example, when displaying a list of users this might suffice:
User.query.options(load_only('name', 'fullname'))
with_entities
with_entities() can either add or remove (simply: replace) models or columns; you can even use it to modify the query, to replace selected entities with your own function like func.count():
query = User.query
count_query = query.with_entities(func.count(User.id))
count = count_query.scalar()
Note that the resulting query is not the same as that of query.count(), which would probably be slower, at least in MySQL (as it generates a subquery).
Another example of the extra capabilities of with_entities would be:
query = (
    Page.query
    .filter(<a lot of page filters>)
    .join(Author).filter(<some author filters>)
)
pages = query.all()
# ok, I got the pages. Wait, what? I want the authors too!
# how to do it without generating the query again?
pages_and_authors = query.with_entities(Page, Author).all()

How to replace columns in sqlalchemy query

I have the query:
q = Session.query(func.array_agg(Order.col))
The compiled query will be:
SELECT array_agg(orders.col) FROM orders
I want to dynamically replace the existing column. After the replacement, the query has to be:
SELECT group_concat(orders.col) FROM orders
I have to use Session and model. I don't have to use SQLAlchemy core. I don't have to use subqueries. And, of course, there can be some other columns, but I need to replace only one. I tried to replace objects in the column_descriptions property, and I tried to use q.selectable.replace (or something like this; sorry, I don't remember the exact names), but I didn't get the right result.
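The right method is with_entities(). It returns a new query rather than modifying q in place, so assign the result:
q = Session.query(func.array_agg(Order.col))
q = q.with_entities(func.group_concat(Order.col))
The compiled query will then be:
SELECT group_concat(orders.col) FROM orders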

Django ORM values_list with '__in' filter performance

What is the preferred way to filter query set with '__in' in Django?
providers = Provider.objects.filter(age__gt=10)
consumers = Consumer.objects.filter(consumer__in=providers)
or
providers_ids = Provider.objects.filter(age__gt=10).values_list('id', flat=True)
consumers = Consumer.objects.filter(consumer__in=providers_ids)
These should be totally equivalent. Underneath the hood Django will optimize both of these to a subselect query in SQL. See the QuerySet API reference on in:
This queryset will be evaluated as subselect statement:
SELECT ... WHERE consumer.id IN (SELECT id FROM ... WHERE _ IN _)
However you can force a lookup based on passing in explicit values for the primary keys by calling list on your values_list, like so:
providers_ids = list(Provider.objects.filter(age__gt=10).values_list('id', flat=True))
consumers = Consumer.objects.filter(consumer__in=providers_ids)
This could be more performant in some cases, for example, when you have few providers, but it will be totally dependent on what your data is like and what database you're using. See the "Performance Considerations" note in the link above.
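The difference between the two strategies can be sketched at the SQL level. The following uses the standard-library sqlite3 module as a stand-in for whatever the ORM would send; the providers and consumers tables, the provider_id column, the sample rows, and both helper functions are hypothetical, and the SQL is a hand-written approximation rather than Django's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE providers (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE consumers (id INTEGER PRIMARY KEY, provider_id INTEGER);
    INSERT INTO providers VALUES (1, 5), (2, 20), (3, 30);
    INSERT INTO consumers VALUES (1, 1), (2, 2), (3, 3), (4, 2);
""")

def consumers_via_subselect(conn, min_age):
    """One round trip: the database evaluates the inner query itself."""
    return conn.execute(
        "SELECT id FROM consumers WHERE provider_id IN "
        "(SELECT id FROM providers WHERE age > ?) ORDER BY id",
        (min_age,),
    ).fetchall()

def consumers_via_id_list(conn, min_age):
    """Two round trips: fetch the ids first, then pass them as literals."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM providers WHERE age > ?", (min_age,)
    )]
    if not ids:
        # An empty IN () list is not valid SQL on every database.
        return []
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        "SELECT id FROM consumers WHERE provider_id IN (%s) ORDER BY id"
        % placeholders,
        ids,
    ).fetchall()

print(consumers_via_subselect(conn, 10))
print(consumers_via_id_list(conn, 10))
```

Both functions return the same rows. Which one is faster depends on the database and on how many ids the first query produces, as the answer notes.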
I agree with Wilduck. However, a couple of notes:
You can combine a filter such as these into one like this:
consumers = Consumer.objects.filter(consumer__age__gt=10)
This would give you the same result set - in a single query.
The second thing: to analyze the generated query, you can use the .query attribute of the queryset.
Example:
print(Provider.objects.filter(age__gt=10).query)
would print the query the ORM would generate to fetch the result set.

Django: split the query parameter and pass the values in an IN clause

I have written a native SQL query in Django, and now I need to pass a filter condition in the WHERE clause.
I am passing a list of values in the URL, like (a,b,c,d), and I need to compare these with a column in the database and filter the data.
Example URL:
(//10.100.212.16:8000/test/&param1=a,b,c,d)
Example SQL:
select * from test where test like '%a%' or test like '%b%' or test like '%c%' or test like '%d%'
How can I write this in Django using native SQL?
I am using Postgres as the database.
Thanks
Given the URL as presented, but assuming you're using the correct ? character rather than & to mark the beginning of the querystring:
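import operator
from functools import reduce  # reduce is no longer a builtin in Python 3
from django.db.models import Q
param1_raw_string = request.GET.get('param1')
if param1_raw_string:
    param1_values = param1_raw_string.split(',')
    # OR together one Q object per value
    tests = Test.objects.filter(reduce(operator.or_, (Q(test__contains=param1) for param1 in param1_values)))
else:
    # do something reasonable when param1 is missing
    tests = Test.objects.none()  # placeholder: an empty queryset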
If you want case-insensitive comparison, use __icontains instead. Composing multiple Q objects using operator.or_ is the main point.
If you use ?param1=a&param1=b... you can skip the split and just write param1_values = request.GET.getlist('param1').
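Since the question asks about native SQL, the same OR-of-LIKE idea can also be sketched as a parameterized raw query. The example below uses the standard-library sqlite3 module as a stand-in for the Postgres connection; the like_any helper and the sample rows are hypothetical. With Django's own cursor the placeholder would be %s rather than ?.

```python
import sqlite3

def like_any(conn, raw_param):
    """Return rows whose `test` column contains any comma-separated value."""
    values = [v.strip() for v in raw_param.split(",") if v.strip()]
    if not values:
        return []
    # One "test LIKE ?" per value, joined with OR; the values themselves
    # travel as bound parameters, never as part of the SQL string.
    where = " OR ".join("test LIKE ?" for _ in values)
    params = ["%" + v + "%" for v in values]
    # Caveat: % and _ inside a value still act as LIKE wildcards here.
    return conn.execute(
        "SELECT id, test FROM test WHERE " + where + " ORDER BY id", params
    ).fetchall()

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, test TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "apple"), (2, "berry"), (3, "cherry"), (4, "fig")],
)

print(like_any(conn, "a,b"))
print(like_any(conn, "rr"))
```

Only the number of placeholders depends on the input; the user-supplied values are never interpolated into the SQL text, which avoids SQL injection. Note that LIKE in SQLite is case-insensitive for ASCII by default, while in Postgres it is case-sensitive.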
