In one of my views I have several steps that take 5 to 7 minutes to finish in total, so I was wondering if there is a way to print the status of the view in the browser, like:
"Calculating models..."
"Post processing models..."
"Making DB..."
"Cleaning old tables..."
Is there a way to do that?
Thanks!
Such heavy-duty work should probably not be part of a Django view. You might want to look into django-celery for asynchronous task management.
However, you can do something like that just fine by polling your server. The easy setup is short polling: a JavaScript loop that triggers an AJAX request to the server every few seconds, retrieving a status response* which you can use to show your user anything.
*You'll have to set up a URL and a view that calculates the status somehow, or, if you're using Celery, you can use its asynchronous result.
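For the status URL itself, a minimal sketch could look like this (assuming the slow work has been moved into a Celery task and the client already knows its id; the view name and JSON shape are just placeholders):

    import json
    from celery.result import AsyncResult
    from django.http import HttpResponse

    def task_status(request, task_id):
        # The JavaScript loop hits this URL every few seconds and reads the state.
        result = AsyncResult(task_id)
        payload = {"state": result.state, "info": str(result.info or "")}
        return HttpResponse(json.dumps(payload), content_type="application/json")

Your page can then swap "Calculating models...", "Post processing models..." and so on in and out based on whatever the task reports.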
Related
I created a web application to process large amounts of data using the Python Flask framework. I created a form, and when the user submits it I get the input parameters for the data-consuming program. The program takes about 20-30 minutes to complete, and I'd like to show its status to the user on a webpage.
How do I achieve this?
After the user submits the data, do I need to run the program on a separate thread and redirect to a success page on a different thread? How do I communicate between the program and the POST page? How do I refresh the page at continuous intervals?
Thanks in advance.
For tasks that are too long-lived to exist entirely within the request/response cycle, task queues are a good option. Check out celery.
You can string tasks together to update your app's state when your task is finished, then poll your app with the setInterval() and clearInterval() browser APIs.
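Roughly, the Flask side could look like this (untested sketch; the broker URL, routes and progress metadata are assumptions, not a fixed API):

    from celery import Celery
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    celery = Celery(app.name, broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

    @celery.task(bind=True)
    def crunch_data(self, params):
        # The long-running job; report progress so the browser can poll for it.
        for step, label in enumerate(["loading", "processing", "writing results"]):
            self.update_state(state="PROGRESS", meta={"step": step, "label": label})
            # ... do the actual work for this step ...
        return {"done": True}

    @app.route("/submit", methods=["POST"])
    def submit():
        task = crunch_data.delay(dict(request.form))
        return jsonify({"task_id": task.id})

    @app.route("/status/<task_id>")
    def status(task_id):
        result = crunch_data.AsyncResult(task_id)
        return jsonify({"state": result.state, "meta": result.info or {}})

The page you show after the form submit just calls /status/<task_id> with setInterval() and stops with clearInterval() once the state is SUCCESS.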
I happened to have done the same thing several months ago. The code is here. Basically you use Celery+rabbitmq to run your task in background. Hope it helps.
I have a simple Django project.
Each time a user hits the homepage, some operations are performed, based on which the view is generated. The problem is that when a user hits the homepage, the operations sometimes take a long time depending on network connectivity. If, in the meantime, a new user hits the homepage, he has to wait for the previous user's request to be serviced before his page gets rendered.
I found that Celery is used for task scheduling and queuing, but I wonder if Celery is what I need. I need each user's request to be processed independently, not queued.
My project is a single-app project and will receive a maximum of 100 users at a time.
Thanks.
If the long process needs to be done in order to serve the request and generate the proper response then you cannot use Celery.
The debug web server that ships with Django is a multi-threaded, single-process server, but it is really very limited and should not be used in production.
If you use gunicorn or another WSGI server you can run your application in multiple processes, but you will hit that limit quickly if you're doing heavy processing.
The solution, in my opinion, is to change the way you're processing things: either prepare the results ahead of time, or serve the request and do the processing in the background while showing the user a "Please wait..." message; here you can use Celery to do the processing.
The other solution would be to use an event-based web server like Twisted or cyclone or others.
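If you go the background route, the view could look roughly like this (illustrative only; the task, template and how you later fetch the result are up to you):

    from django.shortcuts import render
    from myapp.tasks import heavy_homepage_work  # hypothetical Celery task

    def homepage(request):
        # Offload the slow, network-bound work and respond immediately.
        task = heavy_homepage_work.delay()
        # The template can poll a status URL with the task id and swap in
        # the real content once the task is done.
        return render(request, "please_wait.html", {"task_id": task.id})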
We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data as well as some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that take on the order of a minute to return an HTTP response. The API has a separate call to get the status of the current operation, but that means the user has to initiate the long-running operation and then have a separate process poll for status. We don't want our API to work that way; we want the initial request to just return a response saying that it's in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
Does this seem like a reasonable way to approach this? Which Python libraries should we be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or twisted, or something a little heavier-weight like celery. Celery seems to be the normal recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago. You have probably already solved your problem by now, but for anyone else facing the same situation I'll post what I did at the time.
Basically I used the http://www.celeryproject.org/ library: I dispatched the long-running operation asynchronously and immediately returned a successful HTTP response containing the Celery job id. The async operation would register its status and job id in a SQLite database (which was enough for what I was doing), and a client would query the status of the job via REST.
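Stripped down, the pattern was roughly this (names, table and URLs are placeholders from memory, not the exact code):

    import json
    import sqlite3
    from django.http import HttpResponse
    from myservice.tasks import long_remote_call  # hypothetical Celery task

    def start_operation(request):
        # Dispatch the slow call and immediately hand the client its job id.
        job = long_remote_call.delay(request.POST.get("payload"))
        return HttpResponse(json.dumps({"job_id": job.id, "state": "in progress"}),
                            content_type="application/json")

    def operation_status(request, job_id):
        # The task writes its state into this table as it progresses.
        conn = sqlite3.connect("job_status.db")
        row = conn.execute("SELECT state, result FROM jobs WHERE id = ?",
                           (job_id,)).fetchone()
        conn.close()
        state, result = row if row else ("unknown", None)
        return HttpResponse(json.dumps({"job_id": job_id, "state": state,
                                        "result": result}),
                            content_type="application/json")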
I am developing a Django webserver to which another machine (with a known IP) can upload a spreadsheet. After the spreadsheet has been uploaded, I want to trigger some processing/validation/analysis on it (which can take >5 minutes --- too long for the other server to reasonably wait for a response) and then send the other machine (with a known IP) an HttpResponse indicating that the data processing is finished.
I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:
    # processing.py
    def spreadsheet(*args, **kwargs):
        print "[robot voice] processing spreadsheet........."
        views.finished_processing_spreadsheet()

    # views.py
    def upload_spreadsheet(request):
        print "save the spreadsheet somewhere"
        return HttpResponse("started processing spreadsheet")
        processing.data()  # never reached; this is the call I can't figure out how to make

    def finished_processing_spreadsheet():
        print "send good news to other server (with known IP)"
I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?
I tried using Django's request_finished signaling framework, but this does not trigger the processing.spreadsheet() method after returning the HttpResponse. I tried using a decorator on views.upload_spreadsheet with the same problem.
I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.
Thanks for your help!
In fact, Django has a synchronous model. If you want to do real async processing, you need a message queue. The one most used with Django is Celery; it may look a bit "overkill" but it's a good answer.
Why do we need this? Because in a WSGI app, Apache hands the request to the executable and the executable returns text. It's only once the executable finishes its execution that Apache acknowledges the end of the request.
The problem with your implementation is that if the number of spreadsheets being processed equals the number of workers, your website will not respond anymore.
You should use a background task queue, basically have 2 processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can do model_with_spreadsheet.processed = datetime.datetime.now().
You should use a background job manager like django-ztask (very easy setup), celery (very powerful, probably overkill in your case) or even uwsgi spooler (which obviously requires uwsgi deployment).
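With Celery, for example, the shape of it would be something like this (the model, field names and callback URL are placeholders, not your actual code):

    import datetime
    import requests  # assuming requests is available for the outgoing call
    from celery import shared_task
    from django.http import HttpResponse

    @shared_task
    def process_spreadsheet(spreadsheet_id):
        sheet = Spreadsheet.objects.get(pk=spreadsheet_id)  # hypothetical model
        # ... the >5 minute processing/validation/analysis goes here ...
        sheet.processed = datetime.datetime.now()
        sheet.save()
        # Tell the other machine (with the known IP) that we're done.
        requests.post("http://KNOWN_IP/processing-finished/",
                      data={"id": spreadsheet_id})

    def upload_spreadsheet(request):
        sheet = Spreadsheet.objects.create(file=request.FILES["spreadsheet"])
        process_spreadsheet.delay(sheet.pk)
        return HttpResponse("started processing spreadsheet")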
I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the POST logic to a task using http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process to set a status as it goes (it could use memcache)
Using AJAX, query the memcache status every 10 seconds or so; the exact interval is up to you
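Put together, that looks roughly like this on App Engine (webapp2 handlers; the memcache key, URLs and call_remote_api are invented for the example):

    import json
    import webapp2
    from google.appengine.api import memcache, taskqueue

    class StartHandler(webapp2.RequestHandler):
        def post(self):
            memcache.set("job_progress", {"done": 0, "total": 40})
            taskqueue.add(url="/worker")          # run the slow API calls in a task
            self.response.write("started")

    class WorkerHandler(webapp2.RequestHandler):
        def post(self):
            for i in range(40):
                call_remote_api(i)                # your ~1 second remote API call
                memcache.set("job_progress", {"done": i + 1, "total": 40})

    class ProgressHandler(webapp2.RequestHandler):
        def get(self):                            # polled from the page via AJAX
            self.response.write(json.dumps(memcache.get("job_progress")))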
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task will periodically update a key that resides in memcache. If you don't have the time to learn the Channel API, you can make the page returned by your post periodically GET some URL in the app that returns a progress report based on the memcache data, and then update your progress bar. When the job is complete your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.
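Server side, the Channel API version differs only in how progress gets out (the client id and message format below are placeholders):

    import json
    import webapp2
    from google.appengine.api import channel, taskqueue

    class StartHandler(webapp2.RequestHandler):
        def post(self):
            client_id = "progress-" + self.request.get("session_id")
            token = channel.create_channel(client_id)
            taskqueue.add(url="/worker", params={"client_id": client_id})
            # The page opens a channel with this token and listens for messages.
            self.response.write(json.dumps({"token": token}))

    class WorkerHandler(webapp2.RequestHandler):
        def post(self):
            client_id = self.request.get("client_id")
            for i in range(40):
                call_remote_api(i)  # your ~1 second remote API call
                channel.send_message(client_id,
                                     json.dumps({"done": i + 1, "total": 40}))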