I created a web application that processes large amounts of data using the Python Flask framework. I created a form, and when the user submits it I get the input parameters for the data-processing program. The program takes about 20-30 minutes to complete, and I'd like to show its status to the user on a web page.
How do I achieve this?
After the user submits the data, do I need to run the program on a separate thread and redirect to a success page on a different thread? How do I communicate between the program and the POST page? How do I refresh the page at continuous intervals?
Thanks in advance.
For tasks that are too long-lived to exist entirely within the request/response cycle, task queues are a good option. Check out Celery.
You can chain tasks so that your app's state is updated when the task finishes, then poll your app from the browser with the setInterval() and clearInterval() APIs.
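As a rough sketch of that submit-then-poll shape, the following uses a plain thread and an in-memory dictionary as stand-ins for a Celery worker and its result backend. The names `submit_job`, `job_status` and `_jobs` are made up for illustration; they are not part of Flask or Celery.

```python
import threading
import uuid

# In-memory job table (assumption). A real app would keep this in a database
# or in the task queue's result backend so that it survives restarts.
_jobs = {}
_lock = threading.Lock()


def _run(job_id, work, params):
    """Run `work` and record the outcome in the job table."""
    try:
        update = {"state": "done", "result": work(params)}
    except Exception as exc:  # report failures instead of losing them
        update = {"state": "failed", "error": str(exc)}
    with _lock:
        _jobs[job_id].update(update)


def submit_job(work, params):
    """Called from the form's POST handler: start the work, return an id at once."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"state": "running"}
    threading.Thread(target=_run, args=(job_id, work, params), daemon=True).start()
    return job_id


def job_status(job_id):
    """Called from the status URL that the page polls."""
    with _lock:
        return dict(_jobs.get(job_id, {"state": "unknown"}))
```

The POST handler would return or redirect with the job id, and the status page would request `job_status` for that id on a timer until the state is no longer "running".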
I happened to have done the same thing several months ago. The code is here. Basically you use Celery + RabbitMQ to run your task in the background. Hope it helps.
This is a question about architecture. Say I have a long-running process on a server, such as a machine learning model in the middle of training. Since this runs on an external machine, I would like a tool to quickly check the results from time to time. I thought the best way would be a website that connects to the process, for example using RPC, and displays the results, since this lets me check in at any time. The question is how a Django view should gather the information from the server process:
1) Using RPC calls such as rpyc directly in the views?
2) Using some kind of messaging queue such as Celery?
3) Or in a completely different way that I am not seeing?
There are at least two possible ways to do this.
Implement your data-refreshing function as a view and call it with AJAX on a JavaScript timer. While the page containing that JavaScript is open, it fetches your data silently and updates the page. However, this solution does not work well when you need to record all the data at a given frequency, because the AJAX call and the view only run while the web page is open.
Use a messaging queue like selcuk suggests. Alongside Celery, APScheduler is also a good choice because it's easier to install and use. You can implement a task queue (as a model) with a status field (queued/done/stopped/whatever), check the tasks at the frequency you want, save the data you retrieved and do everything else there.
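To make the second option a bit more concrete, here is a minimal sketch of a task record with a status field and a function that a scheduler could call at a fixed interval. The `Task` dataclass stands in for a Django model, and `run_pending` and the status names are hypothetical, not APScheduler or Celery API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    QUEUED = "queued"
    DONE = "done"
    STOPPED = "stopped"


@dataclass
class Task:
    """Stand-in for a model row with a status field (assumption)."""
    name: str
    fetch: Callable[[], float]
    status: Status = Status.QUEUED
    result: Optional[float] = None


def run_pending(tasks: List[Task]) -> int:
    """One scheduler tick: run every queued task and store what it fetched."""
    ran = 0
    for task in tasks:
        if task.status is not Status.QUEUED:
            continue
        try:
            task.result = task.fetch()
            task.status = Status.DONE
        except Exception:
            # Mark the task as stopped so the view can show that it failed.
            task.status = Status.STOPPED
        ran += 1
    return ran
```

The view then only reads the stored rows, so the data keeps being recorded whether or not anyone has the page open.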
I need to execute a command on a simple button press event in my Django project (for which I'm using "subprocess.Popen()" in my views.py ).
Once started, the script may take anywhere from 2 to 5 minutes to complete. While the script executes I need to disable the HTML button, but I want users to be able to keep using other web pages while the script finishes in the background. The real problem is that I want to re-enable the HTML button when the process finishes!
I've been stuck on this for many days. Any help or suggestion is really appreciated.
I think you have to use one of the "realtime" libraries for Django. I personally know django-realtime (the simple one) and swampdragon (less simple, but more functional). With both of these libraries you can create a web-socket connection and send messages from the server to clients that way. The message may be a command to enable the HTML button, a JavaScript alert or whatever you want.
In your case I'd advise the first option, because you can send a message to the client directly from any view, whereas swampdragon needs a model to track changes, as far as I know.
Like valentjedi suggested, you should be using swampdragon for real time with django.
You should take the first tutorial here: http://swampdragon.net/tutorial/part-1-here-be-dragons-and-thats-a-good-thing/
Then read this as it holds knowledge required to accomplish what you want:
http://swampdragon.net/tutorial/building-a-real-time-server-monitor-app-with-swampdragon-and-django/
However, your situation differs from the example above in a few ways:
Use Celery or any other task queue. Since the action you are waiting for takes a long time to finish, you will need to move it to the background. (You can also make these tasks run one after another if you don't want to freeze your system with enormous memory usage.)
Move the part of the code that runs the script to your Celery task. In this case, Popen should be called in your Celery task and not in your view (router in swampdragon).
You then create a channel with the user's unique identifier, and add the relevant swampdragon JavaScript code to your HTML file so that the button subscribes to that user's channel (also consider disabling the feature in your view (router), since front-end code can be tampered with).
The channel's role will be to pull the Celery task state; you then disable or enable the button according to the state of the task.
Overview:
Create a Celery task for your script.
Create a user-unique channel that pulls the task state.
Disable or enable the button on the front end according to the state of the task. Consider displaying a failure message if the script fails, so that the user can start it again.
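The state-to-button mapping in the overview could look roughly like the sketch below. It leaves Celery and swampdragon out entirely and only shows the `subprocess` side; `start_script` and `button_state` are hypothetical helper names, and the returned dictionary is an assumed message format for whatever channel you use.

```python
import subprocess


def start_script(argv):
    """Start the script without blocking (in the setup above, inside the Celery task)."""
    return subprocess.Popen(argv)


def button_state(proc):
    """Translate the process state into what the front end needs to know."""
    code = proc.poll()  # None while the process is still running
    if code is None:
        return {"state": "running", "button_enabled": False}
    if code == 0:
        return {"state": "success", "button_enabled": True}
    # Non-zero exit: re-enable the button and let the page show a failure message.
    return {"state": "failure", "button_enabled": True, "exit_code": code}
```

Because the decision is made on the server, the view can also refuse to start a second run while the state is "running", whatever the front end claims.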
Hope this helps!
I have an app on GAE that takes csv input from a web form and stores it to a blob, does some stuff to obtain new information using input from the csv file, then uses csv.writer on self.response.out to write a new csv file and prompt the user to download it. It works well, but my problem is that if it takes over 60 seconds, it times out. I've tried to set up the "do some stuff" part as a task in the task queue, and it would work, except that I can't make the user wait while this is running, and there's no way of automatically calling the post that writes out the new csv file when the task is complete. Having the user periodically push a button to see if it is done is less than optimal.
Is there a better solution to a problem like this other than using the task queue and having the user have to manually push a button periodically to see if the task is complete?
You have many options:
Use a timer in your client to check periodically (e.g. every 15 seconds) whether the file is ready. This is the simplest option and requires only a few lines of code.
Use the Channel API. It's elegant, but it's overkill unless you face similar problems frequently.
Email the results to the user.
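For the first option, the timer logic is small enough to sketch. In a browser it would be written with setInterval(); the version below is a Python stand-in for a scripted client, and `wait_until_ready`, its parameters and the 30-minute default timeout are assumptions for illustration.

```python
import time


def wait_until_ready(is_ready, interval=15.0, timeout=1800.0, sleep=time.sleep):
    """Call is_ready() every `interval` seconds until it returns True or time runs out.

    Returns True if the file became ready, False if `timeout` seconds of
    waiting passed first. `sleep` is injectable so the loop can be tested.
    """
    waited = 0.0
    while True:
        if is_ready():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Here `is_ready` would be a small function that asks the app whether the output file exists, and a True result is the cue to request the download.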
If your problem is the 60-second limit for requests, you could consider using App Engine Modules, which let you control the scaling type of a module/version. Basically, there are three scaling types available.
Manual Scaling
Such a module runs continuously. Requests can run indefinitely.
Basic Scaling
Such a module creates an instance when the application receives a request. The instance will be turned down when the app becomes idle. Requests can run indefinitely.
Automatic Scaling
The same scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other application metrics. There is a 60-second deadline for HTTP requests.
You can find more details here.
One of my views runs several steps that take 5 to 7 minutes in total to finish, so I was wondering if there is a way to print the status of the view in the browser, like:
"Calculating models..."
"Post processing models..."
"Making DB..."
"Cleaning old tables..."
Is there a way to do that?
Thanks!
Such heavy work should probably not be part of a Django view. You might want to look into django celery for asynchronous task management.
However, you can do something like that just fine by polling your server. The easy setup is short polling: basically a JavaScript loop that triggers an AJAX request to the server every i seconds and retrieves a status response*, which you can use to show your user anything.
*You'll have to set up a URL and a function that calculates the status somehow, or, if you're using Celery, you can use its asynchronous result.
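One possible shape for the status function mentioned in the footnote is sketched below: the job publishes the label of its current step, and the polled URL simply returns that label. `StageTracker` and `run_pipeline` are hypothetical names, and the in-memory storage is an assumption that only holds when the job and the view run in the same process.

```python
import threading


class StageTracker:
    """Holds the label of the step a long-running job is currently on."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stage = "Waiting to start..."

    def set(self, label):
        with self._lock:
            self._stage = label

    def get(self):
        with self._lock:
            return self._stage


def run_pipeline(tracker, steps):
    """Run (label, function) pairs in order, publishing each label before its step."""
    for label, step in steps:
        tracker.set(label)
        step()
    tracker.set("Finished")
```

With several worker processes, the same idea applies but the label would have to live in a shared place such as the cache or the database.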
I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process to set a status (it could be using memcache)
Using AJAX, query the memcache status every 10 seconds or so; the interval is up to you
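The three steps above can be sketched as follows. This is not App Engine code: `FakeCache` is a dictionary stand-in for memcache, and `run_calls`, `progress_json` and the `progress:<job id>` key format are assumptions made for the example.

```python
import json


class FakeCache:
    """Dictionary stand-in for a shared cache such as memcache (assumption)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def run_calls(cache, job_id, calls):
    """The queued task: make each remote call and record how many have finished."""
    results = []
    for done, call in enumerate(calls, start=1):
        results.append(call())
        cache.set("progress:" + job_id, {"done": done, "total": len(calls)})
    return results


def progress_json(cache, job_id):
    """The handler behind the URL that the page's AJAX request hits."""
    status = cache.get("progress:" + job_id) or {"done": 0, "total": 0}
    percent = 100 * status["done"] // status["total"] if status["total"] else 0
    return json.dumps({"done": status["done"], "total": status["total"], "percent": percent})
```

Since a cache may evict the key at any time, the handler treats a missing key as "no progress yet" rather than as an error.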
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task periodically updates a key that resides in memcache. If you don't have the time to learn the Channel API, you can have the page returned by your post periodically GET some URL in the app that returns a progress report based on the memcache data, and then update your progress bar from it. When the job is complete, your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.
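To show how the push approach differs from polling, here is a small sketch in which a `queue.Queue` plays the role of the channel. It does not use the Channel API itself; `run_task`, `listen` and `demo` are hypothetical names, and the `None` sentinel is just one assumed way to signal completion.

```python
import queue
import threading


def run_task(channel, calls):
    """The background task: push a message after every call instead of waiting to be asked."""
    total = len(calls)
    for done, call in enumerate(calls, start=1):
        call()
        channel.put({"done": done, "total": total})
    channel.put(None)  # sentinel: nothing more will be sent


def listen(channel, on_update):
    """The page side: block until a message arrives, then react to it."""
    while True:
        message = channel.get()
        if message is None:
            break
        on_update(message)


def demo():
    """Wire the two sides together with three no-op calls."""
    channel = queue.Queue()
    received = []
    worker = threading.Thread(target=run_task, args=(channel, [lambda: None] * 3))
    worker.start()
    listen(channel, received.append)
    worker.join()
    return received
```

The listener never asks for status; it only reacts when the task has something to say, which is the property that makes the channel approach attractive for long jobs.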