Django page hit counter concurrency - python

I'm a Django newbie. I'm making a crude hit counter as an assignment for a web programming course at uni. I made a class HitCount:
from django.db import models

# Create your models here.
class HitCount(models.Model):
    count = models.IntegerField()
And then I use this code in the views file:
from django.db.models import F
from django.shortcuts import render_to_response

from .models import HitCount


def index(request):
    # try getting a hit counter; if there is none, create one
    try:
        hc = HitCount.objects.get(pk=1)
    except:
        hc = HitCount(pk=1, count=0)
        hc.save()
    # get a queryset containing our counter
    hc = HitCount.objects.filter(pk=1)
    # increment its count and update it in the db
    hc.update(count=F('count') + 1)
    # at this point hc is a queryset, so hc.count() would just return how many
    # counters are in the queryset (1). So we have to get the
    # actual counter object
    hc = HitCount.objects.get(pk=1)
    # and return its count
    return render_to_response('hitcount/index.html', {'count': hc.count})
This is my index.html file:
<p>{{count}}</p>
This seems to work just fine, but I wonder:
Is this a reasonable way of doing this? Should the code for incrementation really reside in the views file? Or should I move it into a method in the class?
Is this concurrency safe or do I need to use some kind of lock? Part of the assignment is making the counter concurrency safe. I use SQLite, which uses transactions, so I figured it should be all right, but I may be missing something.

Off topic, but you should be catching HitCount.DoesNotExist in your try/except, since you really only want to execute the code in the except if the HitCount object doesn't exist yet.
If it's possible, you might want to look at something like Redis (or another key/val store) to do your hit counter.
Redis provides a command called INCR that atomically increments a value by 1. It's super fast and a great solution for a hit counter like this. All you need is a key tied to each page, which you then increment by 1.
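Below is a minimal sketch of the per-page INCR idea using the redis-py client. It assumes a Redis server is reachable. The `record_hit` helper and the `hits:` key prefix are hypothetical names. The client is passed in as a parameter, so the function can be exercised without a server.

```python
def record_hit(client, path):
    """Increment and return the hit count for one page.

    `client` is expected to be a redis.Redis instance (or any object with a
    compatible incr() method). INCR runs atomically on the Redis server, so
    concurrent requests cannot overwrite each other's increments.
    """
    key = "hits:" + path  # hypothetical naming scheme: one key per page
    return client.incr(key)


# Usage, assuming redis-py is installed and Redis runs locally:
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   record_hit(r, "/index/")  # returns the new count as an int
```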
It might also make more sense to use a middleware class to track page hits; that's much easier than adding the code to every view. If you need to display the count on every page, you can use a context processor to add the page's hit count to the context. There will be less code repetition this way.
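Here is a sketch of the middleware plus context processor combination. It assumes Django 1.10 or later, where middleware is a callable class built with `get_response`. `InProcessHitStore`, `HIT_STORE`, `HitCountMiddleware` and `hit_count` are all hypothetical names. The in-process store only counts hits seen by one Python process. In a real deployment you would swap in a shared store such as Redis, which also has an `incr()` method.

```python
import threading


class InProcessHitStore:
    """Hypothetical stand-in for a shared store such as Redis.

    Counts only the hits seen by this one process; several worker processes
    would each keep their own separate counts.
    """

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


HIT_STORE = InProcessHitStore()  # hypothetical module-level store


class HitCountMiddleware:
    """Counts a hit for every request path (new-style Django middleware)."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Record the hit before the view runs so templates can display it.
        request.hit_count = HIT_STORE.incr("hits:" + request.path)
        return self.get_response(request)


def hit_count(request):
    """Context processor: exposes the current page's count as {{ hit_count }}."""
    return {"hit_count": getattr(request, "hit_count", None)}
```

To wire it up, list the middleware class in `MIDDLEWARE` and the function under the `context_processors` option of `TEMPLATES`. Dotted paths such as `hitcount.middleware.HitCountMiddleware` are hypothetical. Templates must also be rendered with the request (for example via `django.shortcuts.render`) so that context processors run.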
Edit
I initially missed that this was for a uni project, so this might be heavy over-engineering for what you need. However, if you ever build a hit counter for a production environment, this is what I'd recommend. You can still use the middleware/context processors to do the hit counts/retrieval in a DRY manner.

Locking is possible in Python using the following:
from threading import Lock

lock = Lock()
lock.acquire()
try:
    # ... access shared resource
    pass
finally:
    lock.release()  # release lock, no matter what
Keep in mind that this lock only coordinates threads within a single Python process, so it is not safe in a multi-process or multi-server environment.
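The same acquire/release pattern is usually written with a `with` statement. This small sketch has several threads in one process increment a shared counter. The names `record_hit`, `_worker` and `simulate` are hypothetical. Like the snippet above, it only coordinates threads inside a single process.

```python
import threading

lock = threading.Lock()
hits = 0  # shared counter, protected by `lock`


def record_hit():
    global hits
    # `with lock:` acquires the lock and always releases it, even if the body
    # raises, just like the acquire()/try/finally/release() version above.
    with lock:
        hits += 1


def _worker(n):
    for _ in range(n):
        record_hit()


def simulate(threads=8, hits_per_thread=1000):
    """Run several threads that each record `hits_per_thread` hits."""
    workers = [threading.Thread(target=_worker, args=(hits_per_thread,)) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return hits
```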
You could also create a more extensible 'logging' solution that tracks each hit as a row in the db with associated info, and then be able to count/query even at a particular date range.
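Below is a sketch of the one-row-per-hit idea. It uses Python's built-in sqlite3 module instead of the Django ORM so it is self-contained. The `hit` table and the helper names are hypothetical. In Django the same idea would be a model with a DateTimeField, counted with something like `.filter(...).count()`. Because each hit is a separate INSERT, concurrent requests never overwrite each other's rows.

```python
import sqlite3
from datetime import datetime


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hit (path TEXT NOT NULL, ts TEXT NOT NULL)")
    return conn


def log_hit(conn, path, when):
    # One INSERT per request; there is no shared value to read and write back.
    conn.execute("INSERT INTO hit (path, ts) VALUES (?, ?)", (path, when.isoformat()))


def count_hits(conn, path, start, end):
    # ISO-8601 strings in one timezone sort chronologically, so a string
    # comparison selects the half-open range [start, end).
    row = conn.execute(
        "SELECT COUNT(*) FROM hit WHERE path = ? AND ts >= ? AND ts < ?",
        (path, start.isoformat(), end.isoformat()),
    ).fetchone()
    return row[0]
```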

You could create a new database row for each hit and call HitCount.objects.count() to get the count.

Related

Understanding atomic transactions in Django

I'm trying to update two IntegerFields in Django 1.8.4, so I've decided to use atomic transactions, but I have some doubts:
1- Is it a good idea to use atomic transactions in this case? What is the real benefit of using them? How much more efficient are they?
2- How can I check whether these two pieces of code behave the same?
A.
from django.db import models, transaction

class LinkManager(models.Manager):
    @transaction.atomic
    def vote_up(self, pk, increment=True):
        if increment:
            <update field 1, incrementing by 1>
        else:
            <update field 1, decrementing by 1>
B.
class LinkManager(models.Manager):
    def vote_up(self, pk, increment=True):
        if increment:
            with transaction.atomic():
                <update field 1, incrementing by 1>
        else:
            with transaction.atomic():
                <update field 1, decrementing by 1>
Is it a good idea to use atomic transactions in this case?
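No. The atomic decorator makes sure that either all or none of the updates inside the transaction are executed. Each branch here runs a single UPDATE statement, which the database already applies atomically, so it's probably completely useless in this case.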
What's the benefit of atomic?
Assuming you're updating a few models from a form, the atomic decorator ensures that either all models get updated or, if there's an error, none at all.
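The all-or-nothing behaviour is easiest to see with two related updates. This sketch uses Python's built-in sqlite3 module instead of Django. A sqlite3 connection used as a context manager commits on success and rolls back on an exception, which is similar in spirit to `transaction.atomic`. The `link`/`voter` tables, the CHECK constraint and the `vote` helper are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, votes INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE voter (id INTEGER PRIMARY KEY, karma INTEGER NOT NULL CHECK (karma >= 0))"
    )
    conn.execute("INSERT INTO link VALUES (1, 0)")
    conn.execute("INSERT INTO voter VALUES (1, 0)")
    conn.commit()
    return conn


def vote(conn, karma_delta):
    # Both UPDATEs are committed together, or both are rolled back if either fails.
    with conn:
        conn.execute("UPDATE link SET votes = votes + 1 WHERE id = 1")
        conn.execute("UPDATE voter SET karma = karma + ? WHERE id = 1", (karma_delta,))
```

If the second UPDATE violates the constraint, the first one is undone as well, so the two tables never disagree.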
Is it more efficient?
No, absolutely not. It's a data safety thing; it's actually less efficient and slower than a regular update, as it needs to create a transaction for every block.
How can it work?
Do the update within the database: instead of fetching the value and writing it back, just let the database increment it for you.
Something like this:
from django.db.models import F
SomeModel.objects.filter(pk=123).update(some_field=F('some_field') + 1)
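To see why letting the database do the increment matters, here is a small sketch using Python's built-in sqlite3 module in place of Django. Django turns `filter(pk=1).update(count=F('count') + 1)` into a single UPDATE statement of roughly the form `UPDATE ... SET count = count + 1`. The table and helper functions below are hypothetical. The first function replays two interleaved read-modify-write requests deterministically. The second performs the same two hits inside the database.

```python
import sqlite3


def make_db():
    # In-memory stand-in for a counter table with one row (id = 1).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hitcount (id INTEGER PRIMARY KEY, count INTEGER NOT NULL)")
    conn.execute("INSERT INTO hitcount (id, count) VALUES (1, 0)")
    return conn


def read_count(conn):
    return conn.execute("SELECT count FROM hitcount WHERE id = 1").fetchone()[0]


def read_modify_write(conn):
    # Two requests both read the value before either one writes it back.
    seen_by_a = read_count(conn)
    seen_by_b = read_count(conn)
    conn.execute("UPDATE hitcount SET count = ? WHERE id = 1", (seen_by_a + 1,))
    conn.execute("UPDATE hitcount SET count = ? WHERE id = 1", (seen_by_b + 1,))
    return read_count(conn)  # one of the two hits is lost


def increment_in_database(conn):
    # The same two hits, each a single UPDATE the database evaluates itself.
    for _ in range(2):
        conn.execute("UPDATE hitcount SET count = count + 1 WHERE id = 1")
    return read_count(conn)
```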

GAE Python NDB .put not synchronous on development (but works in production)?

The code below should create a Counter model and use (deferred) tasks to increment the counter to 10. Visiting '/' ought to create a single Counter object with count = 10. This happens in production. In development (localhost), multiple Counter objects are created, the largest having count = 10.
I suspect this is because the put is not synchronous on development (but appears to always be on production). Is there a way to make them synchronous?
Code snippet below:
from flask import Flask
from google.appengine.ext import deferred, ndb

app = Flask(__name__)


class Counter(ndb.Model):
    count = ndb.IntegerProperty(indexed=False)


def reset():
    ndb.delete_multi(Counter.query().fetch(keys_only=True, use_cache=False, use_memcache=False))


def increment():
    counter = Counter.query().get(use_cache=False, use_memcache=False)
    if not counter:
        counter = Counter(count=0)
    counter.count += 1
    counter.put()
    if counter.count < 10:
        deferred.defer(increment)


@app.route('/')
def hello():
    """Return a friendly HTTP greeting."""
    reset()
    deferred.defer(increment)
    return 'Hello World!'
I have a git repo that reproduces this behavior here. You can find the commit that makes the last change here.
The apparent synchronicity in production is not guaranteed (with your approach). It can always happen that a newly created counter is not yet found by the query, so your code could create multiple counters.
More details in this Balancing Strong and Eventual Consistency with Google Cloud Datastore article.
You should retrieve your counter by key; then you avoid eventual consistency, especially as you seem to create only a single Counter object. Note that this won't scale if you have a large number of concurrent writes.
It would also pay to read the article linked to in the other answer. There are a number of problems with your approach.
It seems odd to me that you would even consider using queries for this functionality. Specifying the key also guarantees a single counter entity.

How to ensure gae memcache results are same in post method as in get method

For the get method, a list is generated from memcache.
For the post method, the user chooses a post from the list from memcache and does something to it. (I retrieve the memcache list again.) How do I make sure that the memcache list retrieved for post is the same as the one retrieved for get?
The situation I'm worried about is another user submitting a new post and changing the memcache right before the post method is run.
code:
def get(self):
    sells = memcache.get("SELLS")
    if sells is None:
        *do some stuff*
    else:
        logging.error("OFFERS IN MC")
    sells.sort(key=lambda x: x.price)
    count = 1
    self.render("buy.html", sells=sells, count=count)

def post(self):
    first_name = self.request.get('first_name')
    num = int(self.request.get('num'))
    sells = memcache.get("SELLS")
    if first_name and num:
        *do some stuff*
        self.redirect('/contact?first_name=' + first_name + "&amount=" + amount + "&price=" + price)
    else:
        cart_error = "fill in all the boxes"
        self.render("buy.html", cart_error=cart_error, sells=list(sells))
You could try using the gets and cas methods of the memcache Client object.
See: GAE Memcache Reference
I'm not entirely clear on what exactly is necessary in your case though. Is the flow you're worried about supposed to be:
1) User in get request gets to see a list of options
2) He then selects one of those options in the post
If this is the case, just using gets and cas won't do it, since you need to track the object across requests. Instead, you could save another variable in memcache that tracks when the list was last updated, and also send that value to the browser. When the user posts, compare the value from the browser with the value currently in memcache; if they differ, you know an update occurred.
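Here is a minimal sketch of the version-number idea. It uses a plain dict as a stand-in for memcache so it runs anywhere. The `SELLS_VERSION` key and the function names are hypothetical. In a real handler the version would travel to the browser in a hidden form field and come back with the POST. Checking the version and then acting on it are two separate steps, so this detects a changed list but does not by itself stop two simultaneous POSTs.

```python
# Hypothetical: a dict stands in for memcache.
cache = {}


def publish_sells(sells):
    """Writer side: store a new list and bump its version number."""
    cache["SELLS"] = list(sells)
    cache["SELLS_VERSION"] = cache.get("SELLS_VERSION", 0) + 1


def handle_get():
    """GET: return the list plus the version the user is looking at."""
    return cache["SELLS"], cache["SELLS_VERSION"]


def handle_post(seen_version, index):
    """POST: act only if the list has not changed since the user's GET."""
    if cache.get("SELLS_VERSION") != seen_version:
        return None  # the list changed; re-render the page instead
    return cache["SELLS"][index]
```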
A different, but also problematic case might be:
1) User in get request looks at the list
2) User selects one of those options in the post
3) Simultaneously another user selects the same option
And with that you'll have yourself a nice little race condition. In this case the gets/cas model works really well, since it only lets you update the object if nobody else has touched it in the meantime.
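The gets/cas pattern is usually written as a retry loop. This sketch takes the client as a parameter; on App Engine that would be a `memcache.Client()` instance, whose `gets()` and `cas()` methods are mentioned above. `cas_update` and `max_retries` are hypothetical names. The sketch assumes the key is already present in memcache; populating a missing key would need a separate `add()` or `set()`. The `update` function should return a new value rather than mutating the old one.

```python
def cas_update(client, key, update, max_retries=10):
    """Apply update(old_value) to `key` using optimistic concurrency.

    cas() only succeeds if nobody else has written the key since this
    client's gets(); otherwise re-read the fresh value and try again.
    Returns True on success, False if every attempt lost the race.
    """
    for _ in range(max_retries):
        current = client.gets(key)
        if client.cas(key, update(current)):
            return True
    return False


# Usage on App Engine (assumed):
#   from google.appengine.api import memcache
#   cas_update(memcache.Client(), "SELLS", lambda sells: sells + [new_sell])
```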
My guess is you actually want to handle both of these potential cases. Race conditions are a bitch, what can I say.

Why would Django get request with long url lock python?

I have a strange error using the built-in webserver in Django (I haven't tested against Apache, as I'm in active development). I have a URL pattern that works for short URL parameters (e.g. Chalk%20Hill), but locks up Python on this one:
http://localhost:8000/chargeback/checkDuplicateProject/Bexar%20Street%20Phase%20IV%20Brigham%20Ln%20to%20Myrtle%20St
The get request just says pending, and never returns, and I have to force quit python to get the server to function again. What am I doing wrong?
EDIT:
On further testing, something strange: if I just enter the URL directly, it returns the correct JSON response, and then it locks up Python. When I make the request from within the website, though, it never returns and locks up Python.
urls:
url(r'^chargeback/checkDuplicateProject/(?P<aProjectName>(\w+)((\s)?(-)?(\w+)?)*)/$', 'chargeback.views.isProjectDuplicate'),
views:
from django.http import HttpResponse

from chargeback.models import Project

def isProjectDuplicate(request, aProjectName):
    # count the number of matching project names
    p = Project.objects.filter(projectName__exact=aProjectName).count()
    # if > 0, the project is a duplicate
    if p > 0:
        return HttpResponse('{"results":["Duplicate"]}', content_type='application/json')
    else:
        return HttpResponse('{"results":["Not Duplicate"]}', content_type='application/json')
Model:
class Project(models.Model):
    projectName = models.TextField('project name')
    department = models.ForeignKey('Department')

    def __unicode__(self):
        return self.projectName
The accepted answer is spot on about the regex, but since we're discussing optimization, I thought I should note that the code for checking whether a project exists could be modified to generate a much quicker query, especially in other contexts where you could be counting millions of rows needlessly. Call this 'best practices' advice, if you will.
p = Project.objects.filter(projectName__exact = aProjectName).count()
if p > 0:
could instead be
if Project.objects.filter(projectName__exact=aProjectName).exists():
for two reasons.
First, you're not using p for anything else, so there's no need to store it in a variable. Dropping it improves readability (p is an obscure variable name), and the best code is no code at all.
Secondly, this way the database only has to find a single matching row instead of counting every match. Please see the official QuerySet API docs, a related question on Stack Overflow, and the discussion about the latter on the django-developers group.
Additionally, it is customary in Python (and Django, naturally) to name your fields in lowercase_separated_by_underscores. See the Python Style Guide (PEP 8) for more on this.
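The difference can be sketched in plain SQL with Python's built-in sqlite3 module. Django's `exists()` issues a query of roughly the form `SELECT 1 ... LIMIT 1`, while `count()` issues `SELECT COUNT(*) ...`. The `project` table, its `project_name` column and both helpers are hypothetical stand-ins for the model above.

```python
import sqlite3


def project_exists(conn, name):
    # Roughly what QuerySet.exists() asks for: stop at the first matching row.
    return conn.execute(
        "SELECT 1 FROM project WHERE project_name = ? LIMIT 1", (name,)
    ).fetchone() is not None


def project_count(conn, name):
    # Roughly what QuerySet.count() asks for: the database counts every match.
    return conn.execute(
        "SELECT COUNT(*) FROM project WHERE project_name = ?", (name,)
    ).fetchone()[0]
```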
Since you are going to check whether aProjectName already exists in the database, there's no need for you to make the regex so complicated.
I suggest you simplify the regex to
url(r'^chargeback/checkDuplicateProject/(?P<aProjectName>[\w+\s-]*)/$', 'chargeback.views.isProjectDuplicate'),
For a further explanation, see the question url regex keeps django busy/crashing on the django-users group.
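You can check the simplified pattern outside Django with Python's `re` module. Match it against the decoded path; Django matches URL patterns against the path without its leading slash, with %20 already decoded to spaces. Because the character class sits under a single quantifier, a path that does not match fails after roughly linear work. An example is a path missing the trailing slash that `/$` requires. Nested optional groups like `((\s)?(-)?(\w+)?)*` can instead trigger exponential backtracking on such a path. The `SIMPLE` name is hypothetical.

```python
import re

# The simplified pattern from the answer, without Django's url() wrapper.
SIMPLE = re.compile(r'^chargeback/checkDuplicateProject/(?P<aProjectName>[\w+\s-]*)/$')
```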

exception handling with NameError

I want to append new input to the list SESSION_U without erasing its content. I tried this:
...
try:
    SESSION_U.append(UNIQUES)
except NameError:
    SESSION_U = []
    SESSION_U.append(UNIQUES)
...
I expected the first attempt to raise NameError, so that the SESSION_U list would be created and appended to, and that the second attempt would then succeed. But it does not. Do you know why? If this is not clear, let me know and I will post the script. Thanks.
Edit
# save string s submitted from form to list K:
K = []
s = self.request.get('sentence')
K.append(s)
# clean up K and create 2 new lists with unique items only and find their frequency
K = K[0].split('\r\n')
UNIQUES = f2(K)
COUNTS = lcount(K, UNIQUES)
# append UNIQUES and COUNTS TO session lists.
# Session lists should not be initialized with each new submission
SESSION_U.append(UNIQUES)
SESSION_C.append(COUNTS)
If I put SESSION_U and SESSION_C after K = [], their content is erased with each submission; if I don't, I get a NameError. I am looking for help with the standard way to handle this situation. Thank you. (I am working on Google App Engine.)
It appears that the code you posted is probably contained within a request handler. What are your requirements regarding this SESSION_U list? Clearly you want it to be preserved across requests, but there are several ways to do this and the best choice depends on your requirements.
I suspect you want to store SESSION_U in the datastore. You will need to use a transaction to atomically update the list (since multiple requests may try to simultaneously update it). Storing SESSION_U in the datastore makes it durable (i.e., it will persist across requests).
Alternatively, you could use memcache if you aren't worried about losing the list periodically. You could even store the list in a global variable (due to app caching, it will be maintained between requests to a particular instance and will be lost when the instance terminates).
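Suppose the code runs inside a request handler method, as the answer suspects. Then the try/except fails for a plain Python reason. Because the function assigns to SESSION_U, Python treats SESSION_U as a local name throughout that function. The append therefore raises UnboundLocalError (a subclass of NameError) on every call, and a fresh list is created each time. The sketch below contrasts that with the global-variable option mentioned above. `broken_handler`, `handler` and `_session_lock` are hypothetical names. The module-level list lives per instance and is lost when the instance terminates.

```python
import threading

# Module-level ("global") list: survives between calls within one process
# (one App Engine instance) and is lost when that instance shuts down.
SESSION_U = []
_session_lock = threading.Lock()  # requests may be served on several threads


def broken_handler(uniques):
    # The assignment below makes SESSION_U local to this whole function, so
    # the append raises UnboundLocalError (caught as NameError) on every call.
    try:
        SESSION_U.append(uniques)
    except NameError:
        SESSION_U = []
        SESSION_U.append(uniques)
    return SESSION_U


def handler(uniques):
    # No assignment to SESSION_U here, so the name refers to the module-level list.
    with _session_lock:
        SESSION_U.append(uniques)
        return list(SESSION_U)
```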
