I was having a debate on this with some colleagues. Is there a preferred way to retrieve an object in Django when you're expecting only one?
The two obvious ways are:
try:
obj = MyModel.objects.get(id=1)
except MyModel.DoesNotExist:
# We have no object! Do something...
pass
And:
objs = MyModel.objects.filter(id=1)
if len(objs) == 1:
obj = objs[0]
else:
# We have no object! Do something...
pass
The first method seems behaviorally more correct, but uses exceptions in control flow which may introduce some overhead. The second is more roundabout but won't ever raise an exception.
Any thoughts on which of these is preferable? Which is more efficient?
get() is provided specifically for this case. Use it.
Option 2 is almost precisely how the get() method is actually implemented in Django, so there should be no "performance" difference (and the fact that you're thinking about it indicates you're violating one of the cardinal rules of programming, namely trying to optimize code before it's even been written and profiled -- until you have the code and can run it, you don't know how it will perform, and trying to optimize before then is a path of pain).
You can install a module called django-annoying and then do this:
from annoying.functions import get_object_or_None
obj = get_object_or_None(MyModel, id=1)
if not obj:
#omg the object was not found do some error stuff
1 is correct. In Python an exception has equal overhead to a return. For a simplified proof you can look at this.
2 This is what Django is doing in the backend. get calls filter and raises an exception if no item is found or if more than one object is found.
I'm a bit late to the party, but with Django 1.6 there is the first() method on querysets.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.first
Returns the first object matched by the queryset, or None if there is no matching object. If the QuerySet has no ordering defined, then the queryset is automatically ordered by the primary key.
Example:
p = Article.objects.order_by('title', 'pub_date').first()
Note that first() is a convenience method, the following code sample is equivalent to the above example:
try:
p = Article.objects.order_by('title', 'pub_date')[0]
except IndexError:
p = None
Why do all that work? Replace 4 lines with 1 builtin shortcut. (This does its own try/except.)
from django.shortcuts import get_object_or_404
obj = get_object_or_404(MyModel, id=1)
I can't speak with any experience of Django but option #1 clearly tells the system that you are asking for 1 object, whereas the second option does not. This means that option #1 could more easily take advantage of cache or database indexes, especially where the attribute you're filtering on is not guaranteed to be unique.
Also (again, speculating) the second option may have to create some sort of results collection or iterator object since the filter() call could normally return many rows. You'd bypass this with get().
Finally, the first option is both shorter and omits the extra temporary variable - only a minor difference but every little helps.
Some more info about exceptions. If they are not raised, they cost almost nothing. Thus if you know you are probably going to have a result, use the exception, since using a conditional expression you pay the cost of checking every time, no matter what. On the other hand, they cost a bit more than a conditional expression when they are raised, so if you expect not to have a result with some frequency (say, 30% of the time, if memory serves), the conditional check turns out to be a bit cheaper.
But this is Django's ORM, and probably the round-trip to the database, or even a cached result, is likely to dominate the performance characteristics, so favor readability, in this case, since you expect exactly one result, use get().
I've played with this problem a bit and discovered that the option 2 executes two SQL queries, which for such a simple task is excessive. See my annotation:
objs = MyModel.objects.filter(id=1) # This does not execute any SQL
if len(objs) == 1: # This executes SELECT COUNT(*) FROM XXX WHERE filter
obj = objs[0] # This executes SELECT x, y, z, .. FROM XXX WHERE filter
else:
# we have no object! do something
pass
An equivalent version that executes a single query is:
items = [item for item in MyModel.objects.filter(id=1)] # executes SELECT x, y, z FROM XXX WHERE filter
count = len(items) # Does not execute any query, items is a standard list.
if count == 0:
return None
return items[0]
By switching to this approach, I was able to substantially reduce number of queries my application executes.
.get()
Returns the object matching the given lookup parameters, which should
be in the format described in Field lookups.
get() raises MultipleObjectsReturned if more than one object was
found. The MultipleObjectsReturned exception is an attribute of the
model class.
get() raises a DoesNotExist exception if an object wasn't found for
the given parameters. This exception is also an attribute of the model
class.
.filter()
Returns a new QuerySet containing objects that match the given lookup
parameters.
Note
use get() when you want to get a single unique object, and filter()
when you want to get all objects that match your lookup parameters.
Interesting question, but for me option #2 reeks of premature optimisation. I'm not sure which is more performant, but option #1 certainly looks and feels more pythonic to me.
I suggest a different design.
If you want to perform a function on a possible result, you could derive from QuerySet, like this: http://djangosnippets.org/snippets/734/
The result is pretty awesome, you could for example:
MyModel.objects.filter(id=1).yourFunction()
Here, filter returns either an empty queryset or a queryset with a single item. Your custom queryset functions are also chainable and reusable. If you want to perform it for all your entries: MyModel.objects.all().yourFunction().
They are also ideal to be used as actions in the admin interface:
def yourAction(self, request, queryset):
queryset.yourFunction()
Option 1 is more elegant, but be sure to use try..except.
From my own experience I can tell you that sometimes you're sure there cannot possibly be more than one matching object in the database, and yet there will be two... (except of course when getting the object by its primary key).
Sorry to add one more take on this issue, but I am using the django paginator, and in my data admin app, the user is allowed to pick what to query on. Sometimes that is the id of a document, but otherwise it is a general query returning more than one object, i.e., a Queryset.
If the user queries the id, I can run:
Record.objects.get(pk=id)
which throws an error in django's paginator, because it is a Record and not a Queryset of Records.
I need to run:
Record.objects.filter(pk=id)
Which returns a Queryset with one item in it. Then the paginator works just fine.
".get()" can return one object:
{
"name": "John",
"age": "26",
"gender": "Male"
}
".filter()" can return **a list(set) of one or more objects:
[
{
"name": "John",
"age": "26",
"gender": "Male"
},
{
"name": "Tom",
"age": "18",
"gender": "Male"
},
{
"name": "Marry",
"age": "22",
"gender": "Female"
}
]
Related
Let's imagine I have this model and I would like to sort them by logical operation n1 != n2:
class Thing(Model):
n1 = IntegerField()
n2 = IntegerField()
...
def is_different(self):
return self.n1 != self.n2
If I sort them by sorted built-in function, I found that it does not return a Queryset, but a list:
things = Thing.objects.all()
sorted_things = sorted(things, key=lambda x: x.is_different())
Now, if I use annotate
sorted_things = things.annotate(diff=(F('n1') != F('n2'))).order_by('diff')
it raises the following error: AttributeError: 'bool' object has no attribute 'resolve_expression'.
I found a solution using extra queryset:
sorted_things = things.extra(select={'diff': 'n1!=n2'}).order_by('diff')
but following Django docs (https://docs.djangoproject.com/en/2.0/ref/models/querysets/#extra):
Use this method as a last resort
This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket using the QuerySet.extra keyword with your use case (please check the list of existing tickets first) so that we can enhance the QuerySet API to allow removing extra(). We are no longer improving or fixing bugs for this method.
Then, what is the optimal way to do it?
Thanks!
Conditional expressions
One option for it is to use conditional expressions. They provide simple way of checking conditions and providing one of values depending on them. In your case it will look like:
sorted_things = things.annotate(diff=Case(When(n1=F('n2'), then=True), default=False, output_field=BooleanField())).order_by('diff')
Q and ExpressionWrapper
There is another, a bit hacky way, to achieve that by combining usage of Q and ExpressionWrapper.
In django, Q is intended to be used inside filter(), exclude(), Case etc. but it simply creates condition that apparently can be used anywhere. It has only one drawback: it doesn't define what type is outputting (it's always boolean and django can assume that in every case when Q is intended to be used.
But there comes ExpressionWrapper that allows you to wrap any expression and define it's final output type. That way we can simply wrap Q expression (or more than one Q expresisons glued together using &, | and brackets) and define by hand what type it outputs.
Be aware that this is undocumented, so this behavior may change in future, but I've checked it using django versions 1.8, 1.11 and 2.0 and it works fine
Example:
sorted_things = things.annotate(diff=ExpressionWrapper(Q(n1=F('n2')), output_field=BooleanField())).order_by('diff')
You can work around it using Func() expressions.
from django.db.models import Func, F
class NotEqual(Func):
arg_joiner = '<>'
arity = 2
function = ''
things = Thing.objects.annotate(diff=NotEqual(F('n1'), F('n2'))).order_by('diff')
I have some topic to discuss. I have a fragment of code with 24 ifs/elifs. Operation is my own class that represents functionality similar to Enum. Here is a fragment of code:
if operation == Operation.START:
strategy = strategy_objects.StartObject()
elif operation == Operation.STOP:
strategy = strategy_objects.StopObject()
elif operation == Operation.STATUS:
strategy = strategy_objects.StatusObject()
(...)
I have concerns from readability point of view. Is is better to change it into 24 classes and use polymorphism? I am not convinced that it will make my code maintainable... From one hand those ifs are pretty clear and it shouldn't be hard to follow, on the other hand there are too many ifs.
My question is rather general, however I'm writing code in Python so I cannot use constructions like switch.
What do you think?
UPDATE:
One important thing is that StartObject(), StopObject() and StatusObject() are constructors and I wanted to assign an object to strategy reference.
You could possibly use a dictionary. Dictionaries store references, which means functions are perfectly viable to use, like so:
operationFuncs = {
Operation.START: strategy_objects.StartObject
Operation.STOP: strategy_objects.StopObject
Operation.STATUS: strategy_objects.StatusObject
(...)
}
It's good to have a default operation just in case, so when you run it use a try except and handle the exception (ie. the equivalent of your else clause)
try:
strategy = operationFuncs[operation]()
except KeyError:
strategy = strategy_objects.DefaultObject()
Alternatively use a dictionary's get method, which allows you to specify a default if the key you provide isn't found.
strategy = operationFuncs.get(operation(), DefaultObject())
Note that you don't include the parentheses when storing them in the dictionary, you just use them when calling your dictionary. Also this requires that Operation.START be hashable, but that should be the case since you described it as a class similar to an ENUM.
Python's equivalent to a switch statement is to use a dictionary. Essentially you can store the keys like you would the cases and the values are what would be called for that particular case. Because functions are objects in Python you can store those as the dictionary values:
operation_dispatcher = {
Operation.START: strategy_objects.StartObject,
Operation.STOP: strategy_objects.StopObject,
}
Which can then be used as follows:
try:
strategy = operation_dispatcher[operation] #fetch the strategy
except KeyError:
strategy = default #this deals with the else-case (if you have one)
strategy() #call if needed
Or more concisely:
strategy = operation_dispatcher.get(operation, default)
strategy() #call if needed
This can potentially scale a lot better than having a mess of if-else statements. Note that if you don't have an else case to deal with you can just use the dictionary directly with operation_dispatcher[operation].
You could try something like this.
For instance:
def chooseStrategy(op):
return {
Operation.START: strategy_objects.StartObject
Operation.STOP: strategy_objects.StopObject
}.get(op, strategy_objects.DefaultValue)
Call it like this
strategy = chooseStrategy(operation)()
This method has the benefit of providing a default value (like a final else statement). Of course, if you only need to use this decision logic in one place in your code, you can always use strategy = dictionary.get(op, default) without the function.
Starting from python 3.10
match i:
case 1:
print("First case")
case 2:
print("Second case")
case _:
print("Didn't match a case")
https://pakstech.com/blog/python-switch-case/
You can use some introspection with getattr:
strategy = getattr(strategy_objects, "%sObject" % operation.capitalize())()
Let's say the operation is "STATUS", it will be capitalized as "Status", then prepended to "Object", giving "StatusObject". The StatusObject method will then be called on the strategy_objects, failing catastrophically if this attribute doesn't exist, or if it's not callable. :) (I.e. add error handling.)
The dictionary solution is probably more flexible though.
If the Operation.START, etc are hashable, you can use dictionary with keys as the condition and the values as the functions to call, example -
d = {Operation.START: strategy_objects.StartObject ,
Operation.STOP: strategy_objects.StopObject,
Operation.STATUS: strategy_objects.StatusObject}
And then you can do this dictionary lookup and call the function , Example -
d[operation]()
Here is a bastardized switch/case done using dictionaries:
For example:
# define the function blocks
def start():
strategy = strategy_objects.StartObject()
def stop():
strategy = strategy_objects.StopObject()
def status():
strategy = strategy_objects.StatusObject()
# map the inputs to the function blocks
options = {"start" : start,
"stop" : stop,
"status" : status,
}
Then the equivalent switch block is invoked:
options["string"]()
Is there a way to provide a hint for an upsert in a bulk in MongoDB / Python?
I would like to add a hint in a query like: Bulk.find(<query>).upsert().update(<update>).
I have tried:
Bulk.find(<query>).hint(<index>).upsert().update(<update>): .hint() method does not exist.
Bulk.find({'$query': <query>, '$hint': <hint>}).upsert().update(<update>): one cannot mix {$query: <query>} syntax with method chaining (see this & this for example).
Am I missing something?
This is not so much about Bulk Operations but is rather about the general behavior of queries in "update" statements. See SERVER-1599.
So the same format of operations supported by the basic Op_Query which is linked to .find() has never been supported in update statements. This is also true of the Bulk API because the .find() method there is it's own method and belongs to the Bulk API where it is not related to the basic collection method, hence the lacking .hint() method.
So using the special forms as with $query does not work even with .update() in a basic form. But there is something you can do as of MongoDB 2.6 to influence the index chosen by the query.
The new addition here is "index filters", this allows you to set up a list of indexes to be considered for a given "query shape". The main definition here is through the planCacheSetFilter command. This allows you do do something like the following ( just in shell for brevity ):
db.junk.ensureIndex({ "b": 1, "a": 1 })
db.runCommand({
"planCacheSetFilter": "junk",
"query": { "a": 1 },
"indexes": [
{ "b": 1, "a": 1 }
]
})
The values provided in the "query" argument there are irrelevant, but what is important is the "shape". So regardless of what data is being queried for, as long as the "shape" is basically the same then the filter set is considered. i.e:
db.junk.find({ "a": 1 }).explain(1).filterSet; // returns true
db.junk.find({ "a": 2 }).explain(1).filterSet; // returns true
db.junk.find({ "b": 1 }).explain(1).filterSet; // returns false, different shape
Unlike the direct form of $hint, this will work with both .update() statements or in the Bulk .find().update() chain as a way to provide an index choice for the query operation.
Beware though that this is not a "permanent" setting, nor is it able to be isolated to a singular operation or sequence of operations. This "filter" will stay in the plan cache once set until the server instance is restarted. You can alternately clear it with the planCacheClearFilters command.
So until that JIRA Issue is resolved, "filters" are the only possible way like what you are asking to achieve without factoring in other queries to narrow down additional filtering parameters to optimize on the likely selected index.
Can we use
MyClass.objects.get(description='hi').exclude(status='unknown')
Your code works as expected if you do the exclude() before the get():
MyClass.objects.exclude(status='unknown').get(description='hi')
As #Burhan Khalid points out, the call to .get will only succeed if the resulting query returns exactly one row.
You could also use the Q object to get specify the filter directly in the .get:
MyClass.objects.get(Q(description='hi') & ~Q(status='unknown'))
Note that the Q object is only necessary because you use a .exclude (and Django's ORM does not have a not equal field lookup so you have to use .exclude).
If your original code had been (note that .exclude has been replaced with .filter):
MyClass.objects.filter(status='unknown').get(description='hi')
... you could simply do:
MyClass.objects.get(status='unknown', description='hi')
You want instead:
MyClass.objects.filter(description='hi').exclude(status='unknown')
.get() will raise MultipleObjectsReturned if your query results in more than one matching set; which is likely to happen considering you are searching on something that isn't a primary key.
Using filter will give you a QuerySet, which you can later chain with other methods or simply step through to get the results.
If I want to check for the existence and if possible retrieve an object, which of the following methods is faster? More idiomatic? And why? If not either of the two examples I list, how else would one go about doing this?
if Object.objects.get(**kwargs).exists():
my_object = Object.objects.get(**kwargs)
my_object = Object.objects.filter(**kwargs)
if my_object:
my_object = my_object[0]
If relevant, I care about mysql and postgres for this.
Why not do this in a try/except block to avoid the multiple queries / query then an if?
try:
obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
pass
Just add your else logic under the except.
django provides a pretty good overview of exists
Using your first example it will do the query two times, according to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to be using the object, after checking for existance, the docs suggest just using it and forcing evaluation 1 time using
if my_object:
pass