When I get a GET request from a user, I send them the response and then spend maybe a second logging stuff about that request. Is there a way to close the connection when I have the response ready, but continue doing that logging part, so that the user wouldn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
App Engine does not support sending
data to the user's browser before
exiting the handler. Some web servers
use this technique to "stream" data to
the user's browser over a period of
time in response to a single request.
App Engine does not support this
streaming technique.
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will requiring bundling your data up and sending it to the task queue to do your processing and logging, so
you may not save much time, and
the results may not look much like you'd want - for example, you'd be logging from a different request, so might need to radically alter the logging
Still, you could try.
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, so long as you have less than 10k of data (which is the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.
Related
I'm looking for a good solution to implement a handshake between a python backend server and a react frontend connected through a websocket.
The frontend allows the user to upload a file and then send it to the backend for processing. Now it might be possible that the processing encounters some issues and likes the user's confirmation to proceed or cancel - and that's where I'm stuck.
My current implementation has different "endpoints" in the backend which call then different function implementations and a queue which is continuously processed and content (messages) is sent to the frontend. But these are always complete actions, they either succeed or fail and the returned message is accordingly. I have no system in place to interupt a running task (e.g. file processing), send a request to the frontend and then wait for response before I continue the function.
Is there a design pattern or common approach for this kind of problem?
How long it takes to process? Maybe a good solution is set up a message broker like RabbitMQ and create a queue for this process. In the front-end you have to create a panel to see the state of the process, which is running in an async task, and if it has found some issues, let the user know and ask what to do.
In one of the views in my django application, I need to perform a relatively lengthy network IO operation. The problem is other requests must wait for this request to be completed even though they have nothing to do with it.
I did some research and stumbled upon Celery but as I understand, it is used to perform background tasks independent of the request. (so I can not use the result of the task for the response to the request)
Is there a way to process views asynchronously in django so while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using django rest framework. So the result of a view is a json response not a page that I can later modify using AJAX.
The usual solution here is to offload the task to celery, and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uwsgi to provide the WSGI interface between my python application and nginx. In uwsgi I used the asynchronous functions to suspend my long running connection when there was time to wait on the IO connections. The methods allow you to ask it to suspend things until there is something to read or write and then allow other connections to be serviced.
The above mentioned async calls use "green threads". It's much lighter weight then regular threads and you have control over when you move from thread to thread.
I am not saying that it is a good solution for your scenario[1], but the simple answer is using the following pattern:
async_result = some_task.delay(arg1)
result = async_result.get()
Check documentation for the get method. And instead of using the delay method you can use anything that returns an AsyncResult (like the apply_async method
[1] Why it may be a bad idea? Having an ongoing connection waiting a lot is bad for Django (it is not ready for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere) and may be identified as a timeout from the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django Rest Framework view.
I have an app on GAE that takes csv input from a web form and stores it to a blob, does some stuff to obtain new information using input from the csv file, then uses csv.writer on self.response.out to write a new csv file and prompt the user to download it. It works well, but my problem is if it takes over 60 seconds it times out. I've tried to setup the do some stuff part as a task in task queue, and it would work, except I can't make the user wait while this is running, and there's no way of calling the post that would write out the new csv file automatically when the task queue is complete, and having the user periodically push a button to see if it is done is less than optimal.
Is there a better solution to a problem like this other than using the task queue and having the user have to manually push a button periodically to see if the task is complete?
You have many options:
Use a timer in your client to check periodically (i.e. every 15 seconds) if the file is ready. This is the simplest option that requires only a few lines of code.
Use the Channel API. It's elegant, but it's an overkill unless you face similar problems frequently.
Email the results to the user.
If your problem is 60s limit for requests, you could consider to use App Engine Modules that allow you to control scaling type of a module/version. Basically there are three scaling types available.
Manual Scaling
Such a module runs continuously. Requests can run indefinitely.
Basic Scaling
Such a module creates an instance when the application receives a request. The instance will be turned down when the app becomes idle. Requests can run indefinitely.
Automatic Scaling
The same scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other application metrics. There is 60-second deadline for HTTP requests.
You can find more details here.
We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data as well as some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that don't return an HTTP response for on the order of a minute. The API has a separate API call to get status of the current operation, but that means that the user has to initiate the long-running operation, then have a separate process poll for status. We don't want our API to work that way, we want the initial request to just return a response that says that it's in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
Does this seem like a reasonable way to approach this? Which python libraries we should be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or twisted, or something a little heavier-weight like celery. Celery seems to be the normal recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago, probably you already solved your problem, but for other person facing the same situation I'll post what I did at that time.
Basically I used the http://www.celeryproject.org/ library, dispatching in a asynchronous way a long running operation returning a successful HTTP response the celery job id, the asynch operation would register the status and job id in a sqlite database (was enough for what I was doing), and a client was querying (using rest) the status of the job.
I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process to set a status (it could be using memcache)
Using AJAX query memcache status each 10 seconds, more or less, it's up to you
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task will periodically update a key that resides in memcache. If you don't have the time to learn the channel API, you can make the page returned by your post to periodically GET some URL in the app that returns a progress report based on the memcache data and you can then update your progress bar. When the job is complete your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.