I'm trying to update two IntegerFields in Django 1.8.4, so I've decided to use atomic transactions, but I have some doubts:
1- Is it a good idea to use atomic transactions in this case? What is the real benefit of using them? How much more efficient are they?
2- How can I check whether these two pieces of code behave the same?
A.
@transaction.atomic
class LinkManager(models.Manager):
    def vote_up(self, pk, increment=True):
        if increment:
            <update field 1, incrementing by 1>
        else:
            <update field 1, decrementing by 1>
B.
class LinkManager(models.Manager):
    def vote_up(self, pk, increment=True):
        if increment:
            with transaction.atomic():
                <update field 1, incrementing by 1>
        else:
            with transaction.atomic():
                <update field 1, decrementing by 1>
Is it a good idea to use atomic transactions in this case?
No, the atomic decorator makes sure that either all or none of the updates in the transaction are executed. It's probably completely useless in this case.
What's the benefit of atomic?
Assuming you're updating a few models from a form, the atomic decorator will ensure that either all models get updated or, if there's an error, none at all.
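For example (a minimal sketch; profile and settings stand in for whichever model instances a form updates):
from django.db import transaction

# either both saves commit, or an error rolls both back
with transaction.atomic():
    profile.save()
    settings.save()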
Is it more efficient?
No, absolutely not. It's a data-safety feature; it's actually less efficient and slower than a regular update, since it needs to create a transaction for every block.
How can it work?
Update within the database: instead of fetching the value and writing it back, just let the database increment it for you.
Something like this:
from django.db.models import F
SomeModel.objects.filter(pk=123).update(some_field=F('some_field') + 1)
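Putting it together, the whole manager could look like this (a sketch; it assumes the counter field is called votes, so adjust to your model):
from django.db import models
from django.db.models import F

class LinkManager(models.Manager):
    def vote_up(self, pk, increment=True):
        delta = 1 if increment else -1
        # a single UPDATE statement; the increment happens inside the database
        self.filter(pk=pk).update(votes=F('votes') + delta)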
Related
I'm trying to understand the Django documentation for the queryset exists() method:
Additionally, if some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.
What I'm doing:
if queryset.exists():
    do_something()
for element in queryset:
    do_something_else(element)
So I'm doing more overall work than just using bool(some_queryset).
Does this code make only one query?
if bool(queryset):
    do_something()
for element in queryset:
    do_something_else(element)
If yes, where does Python put the results? In the queryset variable?
Thank you
From the .exists() docs themselves:
Additionally, if some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.
The results of an already evaluated queryset are cached by Django, so whenever data is required from the queryset, the cached results are used.
Related docs: Caching and QuerySets
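To answer the last question: yes, the results are stored on the queryset object itself. A quick sketch (SomeModel is hypothetical; _result_cache is a private attribute, shown only for illustration):
qs = SomeModel.objects.all()         # no query yet; querysets are lazy
bool(qs)                             # evaluates: one query, results cached
print(qs._result_cache is not None)  # True: the cache lives on the queryset
list(qs)                             # served from the cache, no second query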
It is quite easy to check the number of queries with assertNumQueries:
https://docs.djangoproject.com/en/1.3/topics/testing/#django.test.TestCase.assertNumQueries
In your case:
with self.assertNumQueries(1):
    if bool(queryset):
        do_something()
    for element in queryset:
        do_something_else(element)
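Inside a full test case that might look like this (a sketch; SomeModel and the helper functions are hypothetical):
from django.test import TestCase

class QueryCountTest(TestCase):
    def test_bool_then_iterate_makes_one_query(self):
        queryset = SomeModel.objects.all()
        with self.assertNumQueries(1):
            if bool(queryset):
                do_something()
            for element in queryset:
                do_something_else(element)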
If I want to check for the existence of an object and, if possible, retrieve it, which of the following methods is faster? More idiomatic? And why? If neither of the two examples I list, how else would one go about doing this?
if Object.objects.filter(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)

my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about MySQL and Postgres for this.
Why not do this in a try/except block, to avoid the multiple queries / the query-then-if?
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.
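For instance, with a hypothetical fallback in the except branch:
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    obj = None  # or create a default, raise Http404, redirect, etc.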
Django provides a pretty good overview of exists().
Using your first example, it will run the query two times; according to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to be using the object after checking for existence, the docs suggest just using it and forcing evaluation once, using:
if my_object:
    pass
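Put together with your second example, that's one query in total:
my_object = Object.objects.filter(**kwargs)
if my_object:                # evaluates the queryset: one query, results cached
    my_object = my_object[0]  # served from the cache, no extra query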
I've got this long queryset statement in a view:
contributions = user_profile.contributions_chosen.all()\
.filter(payed=False).filter(belongs_to=concert)\
.filter(contribution_def__left__gt=0)\
.filter(contribution_def__type_of='ticket')
that I use in my template:
context['contributions'] = contributions
Later in that view I make changes (add or remove a record) to the table contributions_chosen, and if I want my context['contributions'] updated, I need to re-query the database with the same lengthy query:
contributions = user_profile.contributions_chosen.all()\
.filter(payed=False).filter(belongs_to=concert)\
.filter(contribution_def__left__gt=0)\
.filter(contribution_def__type_of='ticket')
And then update my context again:
context['contributions'] = contributions
So I was wondering if there's any way I can avoid repeating myself and re-evaluate contributions so it actually reflects the real data in the database.
Ideally I would modify the queryset contributions and its values would be updated, and at the same time the database would reflect these changes, but I don't know how to do this.
UPDATE:
This is what I do between the two context['contributions'] = contributions lines. I add a new contribution object to contributions_chosen (this is an m2m relation):
contribution = Contribution.objects.create(kwarg=something, kwarg2=somethingelse)
user_profile.contributions_chosen.add(contribution)
contribution.save()
user_profile.save()
And in some cases I delete a contribution object (the id may also come from request.POST):
contribution = user_profile.contributions_chosen.get(id=1)
contribution.delete()
As you can see, I'm modifying the table contributions_chosen, so I have to reissue the query and update the context.
What am I doing wrong?
UPDATE
After seeing your comments about evaluating, I realize I do evaluate the queryset: I call len(contributions) between the two context['contributions'] assignments, and that seems to be the problem. I'll just move it after the database operations and that's it. Thanks, guys.
Update: it seems you have not evaluated the queryset contributions, so there is no need to worry about updating it; it still has not fetched data from the DB.
Can you post the code between the two context['contributions'] = contributions lines? Normally, before you evaluate the queryset contributions (for example by iterating over it or calling its __len__()), it does not contain anything read from the DB, so you don't have to update its contents.
To re-evaluate a queryset, you could:
# make a clone
contributions = contributions._clone()
# or use any operation that returns a clone, for example
contributions = contributions.filter()
# or clear its result cache so the next evaluation hits the DB again
contributions._result_cache = None
# you could even add new items to contributions._result_cache directly,
# but that can cause unexpected behavior if you're not careful
I don't know how you can avoid re-evaluating the query, but one way to cut down the repeated statements in your code would be to collect all those filters in a dict and pass the filter args as keyword arguments:
query_args = dict(
    payed=False,
    belongs_to=concert,
    contribution_def__left__gt=0,
    contribution_def__type_of='ticket',
)
and then
contributions = user_profile.contributions_chosen.filter(**query_args)
This just removes some repeated code; it does not solve the repeated query. If you need to change the args, just handle query_args as a normal Python dict; it is one, after all :)
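For example, to tweak one of the filters before re-querying (flipping payed here is just illustrative):
query_args['payed'] = True
paid_contributions = user_profile.contributions_chosen.filter(**query_args)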
I have a repeating pattern in my code where a model has a related model (one-to-many) that tracks its history/status. This related model can have many objects, each representing a point-in-time snapshot of the model's state.
For example:
import datetime

from django.db import models

class Profile(models.Model):
    pass

class Subscription(models.Model):
    profile = models.ForeignKey(Profile)
    data_point = models.IntegerField()
    created = models.DateTimeField(default=datetime.datetime.now)

# Example objects
p = Profile()
subscription1 = Subscription(profile=p, data_point=32, created=datetime.datetime(2011, 7, 1))
subscription2 = Subscription(profile=p, data_point=2, created=datetime.datetime(2011, 8, 1))
subscription3 = Subscription(profile=p, data_point=3, created=datetime.datetime(2011, 9, 1))
subscription4 = Subscription(profile=p, data_point=302, created=datetime.datetime(2011, 10, 1))
I often need to query these models to find all of the Profile objects that haven't had a subscription update in the last 3 days, or similar. I've been using subselect queries to accomplish this:
q = Subscription.objects.filter(created__gt=datetime.datetime.now() - datetime.timedelta(days=3)).values('id').query
Profile.objects.exclude(subscription__id__in=q).distinct()
The problem is that this is terribly slow when large tables are involved. Is there a more efficient pattern for a query such as this? Maybe some way to make Django use a JOIN instead of a SUBSELECT (it seems like getting rid of all those inner nested loops would help)?
I'd like to use the ORM, but if needed I'd be willing to use the .extra() method or even raw SQL if the performance boost is compelling enough.
I'm running against Django 1.4alpha (SVN Trunk) and Postgres 9.1.
from datetime import datetime, timedelta

from django.db.models import Max

Profile.objects.annotate(
    last_update=Max('subscription__created')
).filter(last_update__lt=datetime.now() - timedelta(days=3))
Aggregation (and annotation) is awesome-sauce, see: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
Add a DB index to created:
created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
As a rule of thumb, any column that is used in queries for lookups or sorting should be indexed, unless you are write-heavy (in that case you should think about a separate search index, maybe).
Queries on columns without indexes are only so fast. If you want to analyze the query bottlenecks in more detail, turn on logging for longer-running statements (e.g. 200 ms and above), and run EXPLAIN ANALYZE (Postgres) on the long-running queries.
EDIT:
I've only now seen in your comment that you have an index on the field. In that case, all the more reason to look at the output of EXPLAIN ANALYZE:
- to make sure that the index is really used, and to its full extent
- to check whether Postgres is unnecessarily writing to disk instead of using memory
See the Postgres docs on query planning: http://www.postgresql.org/docs/current/static/runtime-config-query.html
Maybe this helps as an intro: http://blog.it-agenten.com/2015/11/tuning-django-orm-part-2-many-to-many-queries/
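If it helps, you can also grab the SQL Django generates and run it through EXPLAIN ANALYZE in psql yourself (a sketch using the queryset from the question; note that str(qs.query) is only approximately the executed SQL):
qs = Profile.objects.exclude(subscription__id__in=q).distinct()
print(qs.query)  # paste the output into psql, prefixed with EXPLAIN ANALYZE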
I'm a Django newbie. I'm making a crude hit counter as an assignment for a course in web programming at uni. I made a class HitCount:
from django.db import models

# Create your models here.
class HitCount(models.Model):
    count = models.IntegerField()
And then I use this code in the views file:
from django.db.models import F
from django.shortcuts import render_to_response

def index(request):
    # try getting a hit counter; if there is none, create one
    try:
        hc = HitCount.objects.get(pk=1)
    except:
        hc = HitCount(count=0)
        hc.save()
    # get a queryset containing our counter
    hc = HitCount.objects.filter(pk=1)
    # increment its count and update it in the db
    hc.update(count=F('count') + 1)
    # at the moment hc is a queryset, so hc.count() would just return how
    # many counters are in the queryset (1). So we have to get the actual
    # counter object
    hc = HitCount.objects.get(pk=1)
    # and return its count
    return render_to_response('hitcount/index.html', {'count': hc.count})
This is my index.html file:
<p>{{count}}</p>
This seems to work just fine, but I wonder:
Is this a reasonable way of doing this? Should the code for incrementing really live in the views file, or should I move it into a method on the class?
Is this concurrency-safe, or do I need to use some kind of lock? Part of the assignment is making the counter concurrency-safe. I use SQLite, which uses transactions, so I figured it should be all right, but I may be missing something.
Off topic, but you should be catching HitCount.DoesNotExist in your try/except, since you really only want to execute the code in the except block if the HitCount object doesn't exist yet.
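Something like this (get_or_create would work too):
try:
    hc = HitCount.objects.get(pk=1)
except HitCount.DoesNotExist:
    hc = HitCount(count=0)
    hc.save()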
If it's possible, you might want to look at something like Redis (or another key/val store) to do your hit counter.
Redis provides a command called INCR that atomically increments a value by 1. It's super fast and a great fit for a hit counter like this. All you need to do is create a key related to the page, and you can increment it by 1.
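A sketch using the redis-py client (it assumes a Redis server on localhost; the key scheme is just illustrative):
import redis

r = redis.Redis()

def count_hit(path):
    # INCR is atomic on the Redis server, so concurrent requests can't race
    return r.incr('hits:%s' % path)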
It might also make more sense to use a middleware class to track page hits (a sketch follows below); that's much easier than adding it to every view. If you need to display the count on every page, you can use a context processor (more info) to add the page's hit count to the context. There will be less code repetition this way.
Edit
I initially missed that this was for a uni project, so this might be heavily over-engineered for what you need. However, if you were ever to build a hit counter for a production environment, the above is what I'd recommend. You can still use middleware/context processors to do the hit counting/retrieval in a DRY manner.
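A sketch of what such a middleware could look like (old-style Django 1.x middleware; PageHit is a hypothetical model with path and hits fields):
from django.db.models import F

class HitCountMiddleware(object):
    def process_request(self, request):
        # get_or_create covers the first hit for a page; the F() update is atomic
        PageHit.objects.get_or_create(path=request.path, defaults={'hits': 0})
        PageHit.objects.filter(path=request.path).update(hits=F('hits') + 1)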
Locking is possible in Python using the following:
from threading import Lock

lock = Lock()
lock.acquire()
try:
    ...  # access the shared resource
finally:
    lock.release()  # release the lock, no matter what
Keep in mind that this method is not safe in a multi-process or multi-server environment, though.
You could also build a more extensible 'logging' solution that stores each hit as a row in the DB with associated info, and then count/query hits, even over a particular date range.
You could create a new database row for each hit and call HitCount.objects.count() to get the count.
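A sketch of that row-per-hit approach (Hit is a hypothetical model):
from django.db import models

class Hit(models.Model):
    path = models.CharField(max_length=255)
    created = models.DateTimeField(auto_now_add=True, db_index=True)

# total hits
Hit.objects.count()
# hits for one page within a date range
Hit.objects.filter(path='/', created__range=(start, end)).count()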