I am implementing a really lightweight web project that has just one page, showing data in a diagram. I use Django as the web server and d3.js as the plotting routine for this diagram. As you can imagine, there are just a few simple time series that have to be served by the Django server, so I was wondering if I could simply hold this variable in RAM. My first test was positive; I had something like this in my views.py:
import json
import numpy as np
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

X = np.array([123, 23, 1, 32, 123, 1])

@csrf_exempt
def getGraph(request):
    global X
    return HttpResponse(json.dumps(X.tolist()))  # ndarrays aren't JSON-serializable directly
Note that X is updated by another function every now and then, but all user access is read-only. Do I have to deal with:
security issues by defining a global variable?
inconsistencies in general?
I found a thread discussing global variables in Django, but in that case the difficulty lies in handling multiple write accesses.
To answer potential questions about why I don't want to store the data in a database: all the data in my X is already stored in a huge remote database, and this web app just needs to display it.
Storing it in a variable does indeed have threading implications (and also scalability ones - what if you have two Django servers running the same app?). The advice from the Django community is: don't!
This sounds like a good fit for the Django cache system, though. Just cache your getGraph view with @cache_page and the job is done. No need to use memcached; the built-in in-memory memory-cache cache-backend¹ will work fine. Put a very high number as the timeout on the cache (years).
This way you are storing the HTTP response (the JSON), not the value of X. But from your code sample, that is not a problem: if you need to re-calculate X you need to re-calculate the JSON, and if you need to re-calculate the JSON you will need to re-calculate X.
https://docs.djangoproject.com/en/dev/topics/cache/?from=olddocs/
¹ Or just 'built-in memory backend', I couldn't resist.
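For instance, a minimal sketch of that suggestion applied to the view from the question (the one-year timeout is arbitrary):

from django.views.decorators.cache import cache_page

@cache_page(60 * 60 * 24 * 365)  # cache the rendered JSON response for a year
def getGraph(request):
    return HttpResponse(json.dumps(X.tolist()))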
Related
I have a Django application that logs the character sequences from an autocomplete interface. Each time a call is made to the server, the parameters are added to a list, and when the user submits the query, the list is written to a file.
Since I am not sure how to preserve the list between subsequent calls, I relied on a global variable, say query_logger. Now I can preserve the list in the following way:
query_logger = None  # module-level, so it persists between requests

def log_query(query, completions, submitted=False):
    global query_logger
    if query_logger is None:
        query_logger = []
    # list.append takes a single argument, so store the fields as a tuple
    query_logger.append((query, completions, submitted))
    if submitted:
        # write query_logger to the file here, then reset it
        query_logger = None
While this hack works for a single client sending requests, I don't think it is a stable solution when requests come from multiple clients. My question is twofold:
What is the order of execution of requests: do they follow first-come-first-served (especially if the requests are asynchronous)?
What is a better approach for doing this?
If your Django server is single-threaded, then yes, it will respond to requests as it receives them. If you're using WSGI or another proxy, that becomes more complicated. Regardless, I think you'll want to use a DB to store the information.
I encountered a similar problem and ended up using SQLite to store the data temporarily, because that's super simple and easy to manage. You'll want to use IP addresses or create a unique ID passed as a URL parameter in order to identify clients on subsequent requests.
I also scheduled a daily task (using cron on Ubuntu) that goes through and removes any requests that haven't been completed (excluding those started in the last hour).
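As a sketch of that approach (the model and field names here are invented for illustration):

from django.db import models

class QueryLogEntry(models.Model):
    client_id = models.CharField(max_length=64)  # IP address or URL-passed unique ID
    query = models.TextField()
    completions = models.TextField()
    submitted = models.BooleanField(default=False)
    created = models.DateTimeField(auto_now_add=True)  # lets the cron job find stale rows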
You must not use global variables for this.
The proper answer is to use the session - that is exactly what it is for.
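A minimal sketch of the session-based version (the session key name is arbitrary, and write_to_file is a hypothetical helper that persists the list):

def log_query(request, query, completions, submitted=False):
    entries = request.session.get('query_log', [])
    entries.append((query, completions, submitted))
    if submitted:
        write_to_file(entries)  # hypothetical helper
        entries = []
    request.session['query_log'] = entries  # reassign so the session is marked dirty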
The simplest (bad) solution would be to have a global variable; one way or another, you need some in-memory location or a DB to store this info.
Summary: is there a race condition in Django sessions, and how do I prevent it?
I have an interesting problem with Django sessions which I think involves a race condition due to simultaneous requests by the same user.
It has occurred in a script for uploading several files at the same time, tested on localhost. I think this makes simultaneous requests from the same user quite likely (low response times due to localhost, long requests due to file uploads). It's still possible for normal requests outside localhost, though, just less likely.
I am sending several (file post) requests that I think do this:
1. Django automatically retrieves the user's session
2. Unrelated code that takes some time
3. Get request.session['files'] (a dictionary)
4. Append data about the current file to the dictionary
5. Store the dictionary in request.session['files'] again
6. Check that it has indeed been stored
7. More unrelated code that takes time
8. Django automatically stores the user's session
Here the check at step 6 will indicate that the information has indeed been stored in the session. However, future requests indicate that sometimes it has, sometimes it has not.
What I think is happening is that two of these requests (A and B) happen simultaneously. Request A retrieves request.session['files'] first; then B does the same, changes it, and stores it. When A finally finishes, it overwrites the session changes made by B.
Two questions:
Is this indeed what is happening? Is the Django development server multithreaded? On Google I'm finding pages about making it multithreaded, suggesting that by default it is not. Otherwise, what could be the problem?
If this race condition is the problem, what would be the best way to solve it? It's an inconvenience but not a security concern, so I'd already be happy if the chance can be decreased significantly.
Retrieving the session data right before the changes and saving it right after should decrease the chance significantly, I think. However, I have not found a way to do this for request.session, only a workaround using django.contrib.sessions.backends.db.SessionStore directly. And I figure that if I change it that way, Django will just overwrite it with request.session at the end of the request.
So I need a request.session.reload() and request.session.commit(), basically.
Yes, it is possible for a request to start before another has finished. You can check this by printing something at the start and end of a view and launching a bunch of requests at the same time.
Indeed the session is loaded before the view and saved after the view. You can reload the session using request.session = engine.SessionStore(session_key) and save it using request.session.save().
Reloading the session however does discard any data added to the session before that (in the view or before it). Saving before reloading would destroy the point of loading late. A better way would be to save the files to the database as a new model.
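Putting those two calls together, a sketch of the reload-modify-save cycle (this narrows the race window considerably, but it does not lock the session; the helper name is invented):

from importlib import import_module
from django.conf import settings

def append_to_session_files(request, name, info):
    engine = import_module(settings.SESSION_ENGINE)
    # reload the session right before changing it...
    request.session = engine.SessionStore(request.session.session_key)
    files = request.session.get('files', {})
    files[name] = info
    request.session['files'] = files
    # ...and write it back immediately
    request.session.save()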
The essence of the answer is in the discussion on Thomas's answer; since that was incomplete, I've posted the complete answer here.
Mark just nailed it; the only minor addition from me is how to reload that session:
for key in list(session.keys()):  # if you have potential removals; copy the keys since we delete while iterating
    del session[key]
session.update(session.load())  # re-read the stored session from the backend
session.modified = False  # just making it clean
The first loop is optional; you only need it if certain values might have been removed from the session in the meantime.
The last line is also optional; if you update the session afterwards, it does not really matter.
That is true. You can confirm it by having a look at django.contrib.sessions.middleware.SessionMiddleware.
Basically, request.session is loaded before the request hits your view (in process_request), and it is updated in the session backend (if needed) after the response has left your view (in process_response).
If what I mean is unclear, you might want to have a look at the django documentation for Middleware.
The best way to solve the issue will depend on what you're trying to achieve with that information. I'll update my answer if you provide that information!
On a website I'm making, there's a section that hits the database pretty hard. Harder than I want. The data that's being retrieved is all very static. It will rarely change. So I want to cache it.
I came across http://wiki.pylonshq.com/display/pylonsdocs/Caching+in+Templates+and+Controllers and, after a good read, have been making use of template caching using:
return render('tmpl.html', cache_expire='never')
That works great until I modify the HTML. The only way I've found to delete the cache is to remove the cache_expire parameter from render() and delete the cache folder. But, meh, it works.
What I want to be able to, however, is cache Lists, Tuples and Dictionaries. From reading the above wiki page, it seems this isn't possible?
I want to be able to do something like:
data = [i for i in range(0, 2000000)]
mycache = cache.get_cache('cachename')
value = mycache.get(key='dataset1', list=data, type='memory', expiretime='3600')
print value
Allowing me to do some CPU intensive work (list generation, in this example) and then cache it.
Can this be done with Pylons?
As an alternative to a traditional cache, you can use the app globals object. Load the data into a variable once at server startup, then use it in your actions or directly in templates.
http://pylonsbook.com/en/1.1/exploring-pylons.html#app-globals-object
You can also code some action to update this global variable through the admin interface or on other events.
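A sketch of that, following the Pylons 1.x layout (the attribute name is invented):

# lib/app_globals.py
class Globals(object):
    def __init__(self, config):
        # computed once at server startup, shared by all requests
        self.dataset1 = [i for i in range(0, 2000000)]

# in a controller:
from pylons import app_globals as g
print g.dataset1[:10]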
Why not use memcached?
Look at this question on SO on how to use it with pylons: Pylons and Memcached
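For example, a hedged sketch using the Beaker cache that Pylons exposes, with a memcached backend (the namespace, key, and server address are assumptions):

from pylons import cache

def get_dataset1():
    mycache = cache.get_cache('cachename', type='ext:memcached',
                              url='127.0.0.1:11211')
    # createfunc only runs on a cache miss
    return mycache.get_value(key='dataset1',
                             createfunc=lambda: [i for i in range(0, 2000000)],
                             expiretime=3600)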
I'm creating a Django-powered, newspaper-ish site. The least obvious and common-sense task that I have come across in getting the site together is how best to generate a "top articles" list for the sidebar of the page.
The first thing that came to mind was some sort of database column that is updated (based on what?) with every view. That seems (to my instincts) ridiculously database-intensive and impractical, so I'd like to find another solution.
Thanks all.
I would give Celery a try (with django-celery). While it's not as easy to configure and use as the cache, it enables you to queue tasks like incrementing counters and run them in the background. It can even be combined with the cache technique: in views, increment counters in the cache, and define a PeriodicTask that runs every now and then, resetting the counters and writing them to the database.
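A rough sketch of that combination (the model, field, and key names are invented, and the decorator is from the django-celery era):

from datetime import timedelta
from celery.task import periodic_task
from django.core.cache import cache
from django.db.models import F
from news.models import Article  # hypothetical model with a view_count field

@periodic_task(run_every=timedelta(minutes=10))
def flush_view_counters():
    for pk in Article.objects.values_list('pk', flat=True):
        key = 'views:%d' % pk
        count = cache.get(key, 0)
        if count:
            Article.objects.filter(pk=pk).update(view_count=F('view_count') + count)
            cache.set(key, 0)  # not perfectly race-free, but close enough for counters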
I just remembered: I once found this blog entry, which provides a nice way of incrementing a 'viewed_count' (or similar) column in the database with an AJAX JS call. If you don't have heavy traffic, maybe it's a good idea?
Also mentioned in this post is django-tracking, but I don't know much about it; I've never used it myself (yet).
Premature optimization: first try the DB way and see whether it really is too database-intensive. Any decent database has caches good enough that it probably won't matter very much. And even if it is a problem, take a look at the other DB/cache suggestions here.
By the way, it is most likely that each view already issues far more intensive DB queries than a simple view-count update.
If you do something like sort by top views, it would be fast if you index the view column in the DB. Another option is to only collect the top x articles every hour or so, and toss that value into Django's cache framework.
The nice thing about caching the list is that the algorithm you use to determine top articles can be as complex as you like without hitting the DB hard with every page view. Django's cache framework can use memory, db, or file system. I prefer DB, but many others prefer memory. I believe it uses pickle, so you can also store Python objects directly. It's easy to use, recommended.
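Something like this, as a sketch (the model, field, and key names are invented):

from django.core.cache import cache
from news.models import Article  # hypothetical

def top_articles():
    top = cache.get('top_articles')
    if top is None:
        # the query (or a much fancier ranking) only runs on a cache miss
        top = list(Article.objects.order_by('-view_count')[:10])
        cache.set('top_articles', top, 60 * 60)  # recompute at most hourly
    return top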
An index wouldn't help, as the main problem, I believe, is not so much getting the sorted list as having a DB write with every page view of an article. Another index actually makes that problem worse, albeit only a little.
So I'd go with the cache. I think Django's cache shim is a problem here because it requires timeouts on all keys. I'm not sure if that's imposed by memcached; if not, then go with Redis. Actually, just go with Redis anyway: the Python library is great, I've used it from Django projects before, and it has atomic increments and powerful sorting - everything you need.
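A sketch of the Redis version, using a sorted set (the key name is invented; note the redis-py 3.x argument order for zincrby):

import redis

r = redis.Redis()  # assumes a local Redis server

def record_view(article_id):
    r.zincrby('article_views', 1, article_id)  # atomic increment

def top_article_ids(n=10):
    return r.zrevrange('article_views', 0, n - 1)  # highest scores first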
I'm trying to keep my RESTful site DRY, and I can't come up with a good way to factor out the code that dynamically selects each "user's" separate database. We've got a separate database for each client; the client comes in as part of the URL and is passed into each view as a keyword arg. I want to give each and every view the behavior of accessing the corresponding database WITHOUT having to make sure each programmer writing a view remembers to use
Thing.objects.using(user).all()
and
t = Thing()
t.save(using=user)
every time. It seems like there ought to be some way to intercept the request and set the default database based on the args to the view before it hits the view, allowing us to use the usual
Thing.objects.all()
This would also have the advantage of factoring out all the user resolution code into a more appropriate place.
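For illustration, here is a sketch of the kind of interception I have in mind: a middleware that stashes the client name in a thread-local, and a database router that reads it (the names are hypothetical, and each client database would need an entry in settings.DATABASES):

import threading

_local = threading.local()

class ClientDatabaseMiddleware(object):
    """Stash the client name from the URL kwargs before the view runs."""
    def process_view(self, request, view_func, view_args, view_kwargs):
        _local.db = view_kwargs.get('user')  # hypothetical kwarg name

class ClientDatabaseRouter(object):
    """Send reads and writes to the per-client database."""
    def db_for_read(self, model, **hints):
        return getattr(_local, 'db', None)

    def db_for_write(self, model, **hints):
        return getattr(_local, 'db', None)

With the router listed in DATABASE_ROUTERS, the plain Thing.objects.all() calls would route automatically.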
We do this by the following technique.
Apache picks off the first part of the path and routes this to a specific mod_wsgi Daemon.
Each mod_wsgi daemon is a different customer's installation.
We have many parallel customers, each with (nearly) identical code, all based off a single common installation of the base software.
Each customer has a separate settings.py with their unique configuration.
They don't (actually can't) know about each other because Apache has peeled off the top layer of the path for us.