Django: How to make only one query with the queryset exists() method? - python

I'm trying to understand the Django documentation for the queryset exists() method:

Additionally, if some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.

What I'm doing:

if queryset.exists():
    do_something()
for element in queryset:
    do_something_else(element)

So I'm doing more overall work than just using bool(some_queryset). Does this code make only one query?

if bool(queryset):
    do_something()
for element in queryset:
    do_something_else(element)

If yes, where does Python put the results? In the queryset variable?
Thank you

From the .exists() docs themselves:

Additionally, if some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.

The results of an already evaluated queryset are cached by Django, so whenever data is required from the queryset, the cached results are used.
Related docs: Caching and QuerySets
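As a minimal sketch of what that caching means in practice (MyModel and the active filter are made up for illustration; do_something/do_something_else are the functions from the question):

queryset = MyModel.objects.filter(active=True)  # hypothetical queryset, not yet evaluated
if bool(queryset):        # first evaluation: one SQL query, results are cached
    do_something()
for element in queryset:  # iterates over the cached results, no new query
    do_something_else(element)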

It is quite easy to check the number of queries with assertNumQueries:
https://docs.djangoproject.com/en/1.3/topics/testing/#django.test.TestCase.assertNumQueries
In your case:

with self.assertNumQueries(1):
    if bool(queryset):
        do_something()
    for element in queryset:
        do_something_else(element)
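For comparison, and assuming the two-query behavior the quoted docs describe, the exists() variant from the question can be checked the same way:

with self.assertNumQueries(2):
    if queryset.exists():        # query 1: the existence check
        do_something()
    for element in queryset:     # query 2: fetching the actual rows
        do_something_else(element)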

Related

Django: Can we use .exclude() on .get() in Django querysets?

Can we use
MyClass.objects.get(description='hi').exclude(status='unknown')
Your code works as expected if you do the exclude() before the get():
MyClass.objects.exclude(status='unknown').get(description='hi')
As @Burhan Khalid points out, the call to .get() will only succeed if the resulting query returns exactly one row.
You could also use the Q object to specify the filter directly in the .get():

from django.db.models import Q

MyClass.objects.get(Q(description='hi') & ~Q(status='unknown'))

Note that the Q object is only necessary because you use a .exclude() (and Django's ORM does not have a not-equal field lookup, so you have to use .exclude()).
If your original code had been (note that .exclude has been replaced with .filter):
MyClass.objects.filter(status='unknown').get(description='hi')
... you could simply do:
MyClass.objects.get(status='unknown', description='hi')
You want instead:
MyClass.objects.filter(description='hi').exclude(status='unknown')
.get() will raise MultipleObjectsReturned if your query results in more than one matching row, which is likely to happen considering you are searching on something that isn't a primary key.
Using filter will give you a QuerySet, which you can later chain with other methods or simply step through to get the results.
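A rough sketch of the difference (DoesNotExist and MultipleObjectsReturned are the standard exception classes Django attaches to every model):

# .get() expects exactly one matching row and raises otherwise
try:
    obj = MyClass.objects.exclude(status='unknown').get(description='hi')
except MyClass.DoesNotExist:
    obj = None   # no row matched
except MyClass.MultipleObjectsReturned:
    obj = None   # several rows matched

# .filter() always returns a QuerySet, however many rows match
for obj in MyClass.objects.filter(description='hi').exclude(status='unknown'):
    pass  # process each obj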

Idiomatic/fast Django ORM check for existence on mysql/postgres

If I want to check for the existence of an object and, if possible, retrieve it, which of the following methods is faster? More idiomatic? And why? If neither of the two examples I list, how else would one go about doing this?

if Object.objects.filter(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)

my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about mysql and postgres for this.
Why not do this in a try/except block to avoid the multiple queries / query-then-an-if?

try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.
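A sketch of what that looks like with the fallback logic filled in (handle_missing and use_object are hypothetical placeholders):

try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    handle_missing()   # hypothetical: runs when no matching object exists
else:
    use_object(obj)    # exactly one object matched; only one query was issued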
Django provides a pretty good overview of exists().
Using your first example, it will do the query two times; according to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to be using the object after checking for existence, the docs suggest just using it and forcing evaluation one time, using:

if my_object:
    pass
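Putting that together, a minimal sketch of the single-query pattern (Object and kwargs as in the question):

my_objects = Object.objects.filter(**kwargs)
if my_objects:                  # evaluates the queryset once; results are cached
    my_object = my_objects[0]   # served from the cache, no second query
else:
    my_object = None            # hypothetical fallback when nothing matched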

Re-evaluate Django query after changes done to the database

I've got this long queryset statement in a view:

contributions = user_profile.contributions_chosen.all()\
    .filter(payed=False).filter(belongs_to=concert)\
    .filter(contribution_def__left__gt=0)\
    .filter(contribution_def__type_of='ticket')

which I use in my template:
context['contributions'] = contributions
Later in that view I make changes (add or remove a record) to the contributions_chosen table, and if I want my context['contributions'] updated, I need to re-query the database with the same lengthy query:

contributions = user_profile.contributions_chosen.all()\
    .filter(payed=False).filter(belongs_to=concert)\
    .filter(contribution_def__left__gt=0)\
    .filter(contribution_def__type_of='ticket')
And then update my context again:
context['contributions'] = contributions
So I was wondering whether there's any way I can avoid repeating myself and re-evaluate contributions so it actually reflects the real data in the database.
Ideally I would modify the queryset contributions and its values would be updated, and at the same time the database would reflect these changes, but I don't know how to do this.
UPDATE:
This is what I do between the two context['contributions'] = contributions lines.
I add a new contribution object to contributions_chosen (this is an m2m relation):

contribution = Contribution.objects.create(kwarg=something, kwarg2=somethingelse)
user_profile.contributions_chosen.add(contribution)
contribution.save()
user_profile.save()

And in some cases I delete a contribution object:

contribution = user_profile.contributions_chosen.get(id=1)
user_profile.contributions_chosen.get(id=request.POST['con
contribution.delete()

As you can see, I'm modifying the contributions_chosen table, so I have to reissue the query and update the context.
What am I doing wrong?
UPDATE
After seeing your comments about evaluation, I realize I do evaluate the queryset: I call len(contributions) between the two context['contributions'] = contributions lines, and that seems to be the problem.
I'll just move it after the database operations, and that's it. Thanks, guys.
Update
It seems you have not evaluated the queryset contributions, so there is no need to worry about updating it, because it still has not fetched data from the DB.
Can you post the code between the two context['contributions'] = contributions lines? Normally, before you evaluate the queryset contributions (for example by iterating over it or calling its __len__()), it does not contain anything read from the DB, hence you don't have to update its content.
To re-evaluate a queryset, you could:

# make a clone
contributions._clone()
# or any operation that makes a clone, for example
contributions.filter()
# or clear its cache
contributions._result_cache = None
# you could even directly add new items to contributions._result_cache,
# but that could cause unexpected behavior if you aren't careful
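If you'd rather avoid the private attributes, note that calling .all() on an evaluated queryset returns a clone without the result cache, so the next evaluation hits the database again:

# .all() returns a fresh copy of the queryset with an empty result cache
contributions = contributions.all()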
I don't know how you can avoid re-evaluating the query, but one way to save some repeated statements in your code would be to create a dict with all those filters and to specify the filter args as a dict:
query_args = dict(
    payed=False,
    belongs_to=concert,
    contribution_def__left__gt=0,
    contribution_def__type_of='ticket',
)
and then
contributions = user_profile.contributions_chosen.filter(**query_args)
This just removes some repeated code, but it does not solve the repeated query. If you need to change the args, just handle query_args as a normal Python dict; it is one, after all :)
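For instance (purely illustrative; flipping the existing payed key to build a variation of the filter):

query_args['payed'] = True   # hypothetical variation on the base filter
paid_contributions = user_profile.contributions_chosen.filter(**query_args)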

Completing object with its relations and avoiding unnecessary queries in sqlalchemy

I have some database structure; as most of it is irrelevant for us, I'll describe just some relevant pieces. Let's take the Item object as an example:
items_table = Table("invtypes", gdata_meta,
                    Column("typeID", Integer, primary_key = True),
                    Column("typeName", String, index=True),
                    Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
                    Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))

mapper(Item, items_table,
       properties = {"group" : relation(Group, backref = "items"),
                     "_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
                     "effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
                     "metaGroup" : relation(MetaType,
                                            primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
                                            uselist = False),
                     "ID" : synonym("typeID"),
                     "name" : synonym("typeName")})
I want to achieve some performance improvements in the sqlalchemy/database layer, and I have a couple of ideas:
1) Requesting the same item twice:

item = session.query(Item).get(11184)
item = None  # reference to item is lost, object is garbage collected
item = session.query(Item).get(11184)

Each request generates and issues an SQL query. To avoid this, I use two custom maps for item objects:
itemMapId = {}
itemMapName = {}

@cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
    if isinstance(lookfor, (int, float)):
        id = int(lookfor)
        if eager is None and id in itemMapId:
            item = itemMapId[id]
        else:
            item = session.query(Item).options(*processEager(eager)).get(id)
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    elif isinstance(lookfor, basestring):
        if eager is None and lookfor in itemMapName:
            item = itemMapName[lookfor]
        else:
            # Items have unique names, so we can fetch just the first result
            # w/o ensuring its uniqueness
            item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    return item
I believe sqlalchemy does similar object tracking, at least by primary key (item.ID). If it does, I can wipe both maps (although wiping the name map will require minor modifications to the application which uses these queries) so as not to duplicate functionality, and use the stock methods instead. The actual question is: if there is such functionality in sqlalchemy, how do I access it?
2) Eager loading of relationships often helps to save a lot of requests to the database. Say I'll definitely need the following set of item = Item() properties:
item.group (the Group object, according to the groupID of our item)
item.group.items (all items from the items list of our group)
item.group.items.metaGroup (the metaGroup object/relation for every item in the list)
If I have some item ID and no item is loaded yet, I can request it from the database, eagerly loading everything I need: sqlalchemy will join the group, its items and the corresponding metaGroups within a single query. If I accessed them with default lazy loading, sqlalchemy would need to issue 1 query to grab the item + 1 to get the group + 1*#items for all items in the list + 1*#items to get the metaGroup of each item, which is wasteful.
2.1) But what if I already have an Item object fetched, and some of the properties which I want to load are already loaded? As far as I understand, when I re-fetch some object from the database, its already-loaded relations do not become unloaded, am I correct?
2.2) If I have an Item object fetched and want to access its group, I can just getGroup using item.groupID, applying any eager statements I'll need ("items" and "items.metaGroup"). It should properly load the group and its requested relations w/o touching item stuff. Will sqlalchemy properly map this fetched group to item.group, so that when I access item.group it won't fetch anything from the underlying database?
2.3) If I have the following things fetched from the database: the original item, item.group and some portion of the items from the item.group.items list, some of which may have metaGroup loaded, what would be the best strategy for completing the data structure to the same state as the eager list above: re-fetch the group with an ("items", "items.metaGroup") eager load, or check each item from the items list individually, and if the item or its metaGroup is not loaded, load them? It seems to depend on the situation, because if everything has already been loaded some time ago, issuing such a heavy query is pointless. Does sqlalchemy provide a way to track whether some object relation is loaded, with the ability to look deeper than just one level?
As an illustration of 2.3: I can fetch the group with ID 83, eagerly fetching "items" and "items.metaGroup". Is there a way to determine from an item (which has a groupID of 83) whether it has "group", "group.items" and "group.items.metaGroup" loaded or not, using sqlalchemy tools (in this case all of them should be loaded)?
To force loading of lazy attributes, just access them. This is the simplest way, and it works fine for relations, but it is not as efficient for Columns (you will get a separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.
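A small sketch of that check (using the item object from the question; the attribute name 'group' is from the mapping above):

from sqlalchemy.orm.attributes import instance_state

state = instance_state(item)
if 'group' in state.unloaded:
    item.group   # touching the attribute triggers the lazy load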
You don't use deferred columns in your example, but I'll describe them here for completeness. The typical scenario for handling deferred columns is the following:
Decorate selected columns with deferred(). Combine them into one or several groups by using the group parameter of deferred().
Use the undefer() and undefer_group() options in queries when desired.
Accessing a deferred column that was put in a group will load all columns in that group.
Unfortunately this doesn't work in reverse: you can combine columns into groups without deferring their loading by default with column_property(Column(…), group=…), but the defer() option won't affect them (it works for Columns only, not column properties, at least in 0.6.7).
To force loading of deferred column properties, session.refresh(obj, attribute_names=…) suggested by Nathan Villaescusa is probably the best solution. The only disadvantage I see is that it expires the attributes first, so you have to ensure there are no loaded attributes among those passed in the attribute_names argument (e.g. by taking the intersection with state.unloaded).
Update
1) SQLAlchemy does track loaded objects. That's how the ORM works: there must be only one object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so an object is expunged from the cache as soon as there is no reference to it in your code. SQLAlchemy won't issue an SQL request for query.get(pk) when the object is already in the session. But this works for the get() method only, so query.filter_by(id=pk).first() will do an SQL request and refresh the object in the session with the loaded data.
2) Eager loading of relations will lead to fewer requests, but it's not always faster. You have to check this for your database and your data.
2.1) Re-fetching data from the database won't unload objects bound via relations.
2.2) item.group is loaded using the query.get() method, so it won't lead to an SQL request if the object is already in the session.
2.3) Yes, it depends on the situation. In most cases it's best to hope SQLAlchemy will use the right strategy :). For an already-loaded relation you can check whether the related objects' relations are loaded via state.unloaded, and so on recursively to any depth. But when a relation is not loaded yet, you can't know whether the related objects and their relations are already loaded: even when a relation is not yet loaded, the related object[s] might already be in the session (just imagine you request the first item, load its group, and then request another item that has the same group). For your particular example, I see no problem with just checking state.unloaded recursively.
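A rough, untested sketch of such a recursive check (the helper name is_loaded is made up; it walks dotted relation paths like "group.items.metaGroup" via state.unloaded):

from sqlalchemy.orm.attributes import instance_state

def is_loaded(obj, path):
    # Split "group.items.metaGroup" into "group" and "items.metaGroup"
    head, _, rest = path.partition('.')
    if head in instance_state(obj).unloaded:
        return False
    if not rest:
        return True
    related = getattr(obj, head)
    if related is None:
        return True                      # scalar relation is empty, nothing deeper to load
    if isinstance(related, dict):        # attribute_mapped_collection case
        related = list(related.values())
    if isinstance(related, (list, tuple, set)):
        return all(is_loaded(child, rest) for child in related)
    return is_loaded(related, rest)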
1)
From the Session documentation:

[The Session] is somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching. ... It's only when you say query.get({some primary key}) that the Session doesn't have to issue a query.
2.1) You are correct, relationships are not modified when you refresh an object.
2.2) Yes, the group will be in the identity map.
2.3) I believe your best bet will be to attempt to reload the entire group.items in a single query. From my experience it is usually much quicker to issue one large request than several smaller ones. The only time it would make sense to reload only a specific group.item is if there were exactly one of them that needed to be loaded. Though in that case you are doing one large query instead of one small one, so you don't actually reduce the number of queries.
I have not tried it, but I believe you should be able to use the sqlalchemy.orm.util.identity_key method to determine whether an object is in sqlalchemy's identity map. I would be interested to find out what calling identity_key(Group, 83) returns.
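A hedged sketch of that lookup (the exact shape of the key tuple varies between SQLAlchemy versions, so treat this as illustrative):

from sqlalchemy.orm.util import identity_key

key = identity_key(Group, 83)           # e.g. (Group, (83,)) on older versions
already_loaded = key in session.identity_map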
Initial Question)
If I understand correctly, you have an object that you fetched from the database where some of its relationships were eager-loaded, and you would like to fetch the rest of the relationships with a single query? I believe you may be able to use the Session.refresh() method, passing in the names of the relationships that you want to load.

Duplicate an AppEngine Query object to create variations of a filter without affecting the base query

In my AppEngine project I need to use a certain filter as a base, then apply various different extra filters to the end, retrieving the different result sets separately, e.g.:
base_query = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various subqueries separately:

subquery1 = base_query.filter('subfilter1', 'xyz')
# Do something with subquery1 results here
subquery2 = base_query.filter('subfilter2', 'abc')
# Do something with subquery2 results here

Unfortunately filter() affects the state of the base_query Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duplicating an object that could be used?
The extra filters are actually applied by the results of different forms dynamically within a wizard, and they use the 'running total' of the query in their branch to assess whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.
There's no officially approved (i.e., not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.
As Nick has said, you'd better create the query again, but you can still avoid repeating yourself. A good way to do that would be like this:
# inside a request handler
def create_base_query():
    return MyModel.all().filter('mainfilter', 123)

subquery1 = create_base_query().filter('subfilter1', 'xyz')
# Do something with subquery1 results here

subquery2 = create_base_query().filter('subfilter2', 'abc')
# Do something with subquery2 results here
