What is the difference between with_entities and load_only in SQLAlchemy?

When querying my database, I only want to load specified columns. Creating a query with with_entities requires a reference to the model column attribute, while creating a query with load_only requires a string corresponding to the column name. I would prefer to use load_only because it is easier to create a dynamic query using strings. What is the difference between the two?
load_only documentation
with_entities documentation

There are a few differences. The most important one when discarding unwanted columns (as in the question) is that using load_only will still result in creation of an object (a Model instance), while using with_entities will just get you tuples with values of chosen columns.
>>> query = User.query
>>> query.options(load_only('email', 'id')).all()
[<User 1 using e-mail: n@d.com>, <User 2 using e-mail: n@d.org>]
>>> query.with_entities(User.email, User.id).all()
[('n@d.com', 1), ('n@d.org', 2)]
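To make the difference concrete, here is a minimal, self-contained sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer; the User model with its bio column is a hypothetical stand-in, not the asker's actual model.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class User(Base):  # hypothetical model, for illustration only
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)
    bio = Column(String)  # stands in for a large column we would rather not load


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        User(id=1, email="a@example.com", bio="long text"),
        User(id=2, email="b@example.com", bio="long text"),
    ])
    session.commit()

with Session(engine) as session:
    # load_only: the results are still User instances; bio is deferred.
    users = (
        session.query(User)
        .options(load_only(User.email))
        .order_by(User.id)
        .all()
    )
    users_are_instances = all(isinstance(u, User) for u in users)
    unloaded = inspect(users[0]).unloaded  # attribute names not fetched yet

    # with_entities: the results are plain rows that behave like tuples.
    rows = (
        session.query(User)
        .with_entities(User.email, User.id)
        .order_by(User.id)
        .all()
    )
    row_tuples = [tuple(r) for r in rows]
```

Accessing users[0].bio inside the session would trigger the extra background query for the deferred column.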
load_only
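load_only() restricts the SELECT to the listed columns (plus the primary key) and defers loading of all the other columns of your model.
The deferred columns remain accessible, but an additional query (in the background) will be performed the first time you access one of them. Note that SQLAlchemy 2.0 no longer accepts column names as strings here; pass the mapped attributes (e.g. User.email) instead.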
"Load only" is useful when you store things like pictures of users in your database but you do not want to waste time transferring the images when not needed. For example, when displaying a list of users this might suffice:
User.query.options(load_only('name', 'fullname'))
with_entities
with_entities() can either add or remove (simply: replace) models or columns; you can even use it to modify the query, to replace selected entities with your own function like func.count():
query = User.query
count_query = query.with_entities(func.count(User.id))
count = count_query.scalar()
Note that the resulting query is not the same as the one generated by query.count(), which would probably be slower, at least in MySQL, because count() wraps the original query in a subquery.
Another example of the extra capabilities of with_entities would be:
query = (
    Page.query
    .filter(<a lot of page filters>)
    .join(Author).filter(<some author filters>)
)
pages = query.all()
# ok, I got the pages. Wait, what? I want the authors too!
# how to do it without generating the query again?
pages_and_authors = query.with_entities(Page, Author).all()
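A runnable sketch of the same idea, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The Page and Author models and the single name filter are hypothetical placeholders for the elided filters above.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):  # hypothetical model
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Page(Base):  # hypothetical model
    __tablename__ = "page"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey("author.id"))
    author = relationship(Author)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Page(id=1, title="intro", author=Author(id=1, name="ann")),
        Page(id=2, title="draft", author=Author(id=2, name="bob")),
    ])
    session.commit()

    # Build the joins and filters once...
    base_query = (
        session.query(Page)
        .join(Page.author)
        .filter(Author.name == "ann")
    )
    pages = base_query.all()

    # ...then swap only the selected entities, keeping joins and filters.
    pairs = [
        (page.title, author.name)
        for page, author in base_query.with_entities(Page, Author).all()
    ]
    total = base_query.with_entities(func.count(Page.id)).scalar()
```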

Related

FastAPI in-memory filtering

I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
So I was thinking of something like this (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => d.language_id == language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is just getting the ids of all the doctors then at each filtering step, passing in the doctor ids from the last step and funneling them through different sql queries but this is filtering using the database and would result in one more query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined. The tutorial was using SQL statements. Anyways, to answer my own question: I would first need to define SQLAlchemy models before I can use them.

Each filtering method on a SQLAlchemy Query returns a new Query object, so you can keep building the query conditionally inside if-statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call all() at the end. If neither argument is given, you'll get all the doctors.
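Here is a self-contained sketch of that pattern that you can run against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer; the Doctor columns and the find_doctors helper are hypothetical names chosen for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Doctor(Base):  # hypothetical model
    __tablename__ = "doctors"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    language_id = Column(Integer)
    area_id = Column(Integer)


def find_doctors(session, language_id=None, area_id=None):
    """Add a WHERE condition only for each argument that was supplied."""
    query = session.query(Doctor)
    if language_id is not None:
        query = query.filter(Doctor.language_id == language_id)
    if area_id is not None:
        query = query.filter(Doctor.area_id == area_id)
    # Nothing has been sent to the database yet; all() runs one combined query.
    return query.order_by(Doctor.id).all()


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Doctor(id=1, name="Ann", language_id=1, area_id=10),
        Doctor(id=2, name="Bob", language_id=1, area_id=20),
        Doctor(id=3, name="Cy", language_id=2, area_id=10),
    ])
    session.commit()

    everyone = [d.name for d in find_doctors(session)]
    by_language = [d.name for d in find_doctors(session, language_id=1)]
    by_both = [d.name for d in find_doctors(session, language_id=1, area_id=10)]
```

Comparing against None rather than relying on truthiness is a deliberate choice here, so that an id of 0 would still be applied as a filter.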

Mongo Embedded Document Query

I've 2 DynamicDocuments:
class Tasks(db.DynamicDocument):
    task_id = db.UUIDField(primary_key=True, default=uuid.uuid4)
    name = db.StringField()
    flag = db.IntField()
class UserTasks(db.DynamicDocument):
    user_id = db.ReferenceField('User')
    tasks = db.ListField(db.ReferenceField('Tasks'), default=list)
I want to filter the UserTasks document by checking whether the flag value (from Tasks Document) of the given task_id is 0 or 1, given the task_id and user_id. So I query in the following way:-
obj = UserTasks.objects.get(user_id=user_id,tasks=task_id)
This fetches me a UserTasks object.
Now I loop over the task list, find the matching task, and check its flag value in the following manner.
task_list = obj.tasks
for t in task_list:
    if t['task_id'] == task_id:
        print(t['flag'])
Is there any better/direct way of querying the UserTasks document in order to fetch the flag value of the Tasks document?
PS: I could have fetched the flag value directly from the Tasks document, but I also need to check whether the task is associated with the user or not. Hence I queried the UserTasks document directly.

Can we directly filter on a document with the ReferenceField's fields in a single query?
No, it's not possible to directly filter a document by the fields of a ReferenceField, as doing this would require joins and MongoDB does not support joins.
As per MongoDB docs on database references:
MongoDB does not support joins. In MongoDB some data is denormalized,
or stored with related data in documents to remove the need for joins.
From another page on the official site:
If we were using a relational database, we could perform a join on
users and stores, and get all our objects in a single query. But
MongoDB does not support joins and so, at times, requires bit of
denormalization.
Relational purists may be feeling uneasy already, as if we were
violating some universal law. But let’s bear in mind that MongoDB
collections are not equivalent to relational tables; each serves a
unique design objective. A normalized table provides an atomic,
isolated chunk of data. A document, however, more closely represents
an object as a whole.
So in 1 query, we can't both filter tasks with a particular flag value and with the given user_id and task_id on the UserTasks model.
How to perform the filtering then?
To perform the filtering as per the required conditions, we will need to perform 2 queries.
In the first query we will try to filter the Tasks model with the given task_id and flag. Then, in the 2nd query, we will filter UserTasks model with the given user_id and the task retrieved from the first query.
Example:
Let's say we have a user_id and a task_id, and we need to check whether the related task has a flag value of 0.
1st Query
We will first retrieve my_task with the given task_id and flag as 0.
my_task = Tasks.objects.get(task_id=task_id, flag=0) # 1st query
2nd Query
Then, in the 2nd query, you need to filter on the UserTasks model with the given user_id and my_task object.
my_user_task = UserTasks.objects.get(user_id=user_id, tasks=my_task) # 2nd query
You should perform 2nd query only if you get a my_task object with the given task_id and flag value. Also, you will need to add error handling in case there are no matched objects.
What if we have used EmbeddedDocument for the Tasks model?
Let's say we had defined our Tasks document as an EmbeddedDocument and the tasks field in the UserTasks model as a list of EmbeddedDocumentField; then we could have done the desired filtering like this:
my_user_task = UserTasks.objects.get(user_id=user_id, tasks__task_id=task_id, tasks__flag=0)
Getting the particular my_task from the list of tasks
The above query will return a UserTasks document which will contain all the tasks. We then need to iterate over them to find the desired task.
To do that, we can use a list comprehension with enumerate().
The desired index is then the first element of the resulting one-element list.
my_task_index = [i for i, v in enumerate(my_user_task.tasks) if v.task_id == task_id and v.flag == 0][0]
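One caveat with indexing [0] on a list comprehension is that it raises IndexError when nothing matches. A small sketch of an alternative using next() with a default follows. It uses a plain dataclass as a hypothetical stand-in for the embedded task documents, so it runs without a MongoDB server; Task and find_task are illustrative names, not part of mongoengine.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:  # hypothetical stand-in for an embedded Tasks document
    task_id: uuid.UUID
    flag: int


def find_task(tasks: List[Task], task_id: uuid.UUID) -> Optional[Task]:
    """Return the first task with the given id, or None when it is absent."""
    return next((t for t in tasks if t.task_id == task_id), None)


wanted_id = uuid.uuid4()
tasks = [Task(uuid.uuid4(), 1), Task(wanted_id, 0)]

match = find_task(tasks, wanted_id)        # the second task
missing = find_task(tasks, uuid.uuid4())   # None, no exception raised
```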

@Praful, based on your schema you need two queries because MongoDB does not have joins, so if you want to get "all the data" in one query you need a schema which fits that case.
ReferenceField is a special field which does a lazy load of the other collection (it requires a query).
Based on the query you need, I recommend changing your schema to fit it. The idea behind NoSQL engines is "denormalization", so it is not bad to have a list of EmbeddedDocument. An EmbeddedDocument can be a smaller document (a denormalized version) with a subset of the fields instead of all of them.
If you do not want to load the whole document into memory while querying, you can exclude those fields using a "projection".
Supposing your UserTasks has a list of EmbeddedDocument with the tasks, you could do:
UserTasks.objects.exclude('tasks').filter(**filters)
I hope it helps you.
Good luck!

Django ORM values_list with '__in' filter performance

What is the preferred way to filter query set with '__in' in Django?
providers = Provider.objects.filter(age__gt=10)
consumers = Consumer.objects.filter(consumer__in=providers)
or
providers_ids = Provider.objects.filter(age__gt=10).values_list('id', flat=True)
consumers = Consumer.objects.filter(consumer__in=providers_ids)

These should be totally equivalent. Underneath the hood Django will optimize both of these to a subselect query in SQL. See the QuerySet API reference on in:
This queryset will be evaluated as subselect statement:
SELECT ... WHERE consumer.id IN (SELECT id FROM ... WHERE _ IN _)
However you can force a lookup based on passing in explicit values for the primary keys by calling list on your values_list, like so:
providers_ids = list(Provider.objects.filter(age__gt=10).values_list('id', flat=True))
consumers = Consumer.objects.filter(consumer__in=providers_ids)
This could be more performant in some cases, for example, when you have few providers, but it will be totally dependent on what your data is like and what database you're using. See the "Performance Considerations" note in the link above.

I agree with Wilduck. However, a couple of notes:
You can combine filters such as these into one, like this:
consumers = Consumer.objects.filter(consumer__age__gt=10)
This would give you the same result set in a single query.
Second, to analyze the generated query, you can use the .query attribute of the queryset.
Example:
print(Provider.objects.filter(age__gt=10).query)
would print the query the ORM would be generating to fetch the resultset.

'Stringing together' a pymongo query based on a set of conditions

I have a set of conditions that I need to use to retrieve some data from a mongodb database (using pymongo). Some of these conditions are optional, and others may have more than one possible value.
I'm wondering if there is a way of 'dynamically' constructing a pymongo query based on these conditions (instead of creating individual queries for each possible combination of conditions).
For example, assume that I have one query which has to be constrained to the following conditions:
tag contains any of this, is, a, tag
user is johnsmith
date_published is before today
...whereas another query may only be constrained to the following:
user is johnsmith
date_published is after today
Summary: Instead of having to create every possible combination of conditions, is there a way of stringing conditions together to form a query in pymongo?

A PyMongo query is just a Python dictionary, so you can use all the usual techniques to build one on the fly:
def find_things(tags=None, user=None, published_since=None):
    # all queries begin with something common, which may
    # be an empty dict, but here's an example
    query = {
        'is_published': True
    }
    if tags:
        # assume that it is an array of strings
        query['tags'] = {'$in': tags}
    if user:
        # assume that it is a string
        query['user'] = user
    if published_since:
        # assume that it is a datetime.datetime
        query['date_published'] = {'$gte': published_since}
    # etc...
    return db.collection.find(query)
The actual logic you implement obviously depends on what you want to vary your find calls by; these are just a few examples. You will also want to validate the input if it is coming from an untrusted source (e.g. a web application form, URL parameters, etc).
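As a sketch of how this covers the two cases from the question, the helper below only builds the filter dictionary, so it can be run and inspected without a database. The build_query name and its parameters are hypothetical; the resulting dict is what you would pass to find().

```python
from datetime import datetime


def build_query(tags=None, user=None, published_before=None, published_after=None):
    """Assemble a MongoDB filter dict from whichever conditions were given."""
    query = {}
    if tags:
        # match documents whose tags field contains any of the given values
        query["tags"] = {"$in": list(tags)}
    if user:
        query["user"] = user
    # both date bounds go into one sub-document so they can be combined
    date_range = {}
    if published_before is not None:
        date_range["$lt"] = published_before
    if published_after is not None:
        date_range["$gt"] = published_after
    if date_range:
        query["date_published"] = date_range
    return query


today = datetime(2024, 1, 1)  # fixed date so the example is reproducible

first = build_query(tags=["this", "is", "a", "tag"], user="johnsmith",
                    published_before=today)
second = build_query(user="johnsmith", published_after=today)
```

Either dictionary can then be passed as the filter argument of a collection's find() call.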

How to filter by joinloaded table in SqlAlchemy?

Let's say I have 2 models, Document and Person. Document has a relationship to Person via the "owner" property. Now:
session.query(Document)\
.options(joinedload('owner'))\
.filter(Person.is_deleted!=True)
This joins the Person table twice. One Person join is selected (eager-loaded) and the other one is filtered, which is not what I want, because this way the Document rows are not filtered.
What can I do to apply a filter on the joinedloaded table/model?

You are right, table Person will be used twice in the resulting SQL, but each of them serves different purpose:
one is used to apply the filter condition: filter(Person.is_deleted != True)
the other is to eager load the relationship: options(joinedload('owner'))
But the reason your query returns wrong results is because your filter condition is not complete. In order to make it produce the right results, you also need to JOIN the two models:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS IMPORTANT
       options(joinedload(Document.owner)).
       filter(Person.is_deleted != True)
       )
This will return the correct rows, even though it will still have 2 references (JOINs) to the Person table. The real solution is to use contains_eager instead of joinedload, which tells SQLAlchemy to populate Document.owner from the explicit join rather than adding a second one:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS STILL IMPORTANT
       options(contains_eager(Document.owner)).
       filter(Person.is_deleted != True)
       )
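A self-contained sketch of the contains_eager version, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The column names other than owner and is_deleted are hypothetical.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String)  # hypothetical column
    is_deleted = Column(Boolean, default=False, nullable=False)


class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)
    title = Column(String)  # hypothetical column
    owner_id = Column(Integer, ForeignKey("person.id"))
    owner = relationship(Person)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice = Person(id=1, name="alice", is_deleted=False)
    bob = Person(id=2, name="bob", is_deleted=True)
    session.add_all([
        Document(id=1, title="kept", owner=alice),
        Document(id=2, title="hidden", owner=bob),
    ])
    session.commit()

with Session(engine) as session:
    query = (
        session.query(Document)
        .join(Document.owner)                     # explicit JOIN used by the filter
        .options(contains_eager(Document.owner))  # reuse that JOIN to load owner
        .filter(Person.is_deleted != True)
        .order_by(Document.id)
    )
    sql = str(query)  # the SQL that would be emitted
    docs = query.all()
    titles = [doc.title for doc in docs]
    owner_names = [doc.owner.name for doc in docs]  # already loaded, no extra query
```

Printing sql should show the person table joined only once.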
