Progress bar in Google App Engine - python

I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.

Send the post logic to a task using the Task Queue API: http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process so that it records its status as it goes (memcache would work for this).
Using AJAX, query the memcache status every 10 seconds or so; the exact interval is up to you.

You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client

Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the Channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task periodically updates a key that resides in memcache. If you don't have the time to learn the Channel API, you can have the page returned by your post periodically GET some URL in the app that returns a progress report based on the memcache data, and use that report to update your progress bar. When the job is complete, your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.
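The task-plus-status-key pattern described above can be sketched as follows. This is a minimal sketch, not App Engine code: `InMemoryCache` is a hypothetical stand-in that offers only `get` and `set` (the two memcache calls the pattern needs), a plain thread stands in for the task queue, and the names `run_job` and `progress_report` are made up for illustration.

```python
import threading
import time


class InMemoryCache:
    """Hypothetical stand-in for memcache; only get() and set() are used."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def run_job(cache, job_id, calls):
    """Body of the background task: run each call and record progress."""
    total = len(calls)
    cache.set(job_id, {"done": 0, "total": total, "finished": total == 0})
    for index, call in enumerate(calls, start=1):
        call()  # one slow remote API call
        cache.set(job_id, {"done": index, "total": total,
                           "finished": index == total})


def progress_report(cache, job_id):
    """What a polling GET handler could return to the page."""
    status = cache.get(job_id)
    if status is None:
        return {"percent": 0, "finished": False}
    percent = 100 * status["done"] // max(status["total"], 1)
    return {"percent": percent, "finished": status["finished"]}


# Usage: the thread plays the role of the queued task.
cache = InMemoryCache()
slow_calls = [lambda: time.sleep(0.01)] * 5
worker = threading.Thread(target=run_job, args=(cache, "job-1", slow_calls))
worker.start()
worker.join()
print(progress_report(cache, "job-1"))
```

On App Engine the post handler would enqueue the task and return immediately, and the page would call the progress URL on a timer until `finished` is true.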

Related

Update single database value on a website with many users

For this question, I'm particularly struggling with how to structure this:
User accesses website
User clicks button
Value x in database increments
My issue is that multiple people could potentially be on the website at the same time and click the button - I want to make sure each user is able to click the button, and update the value and read the incremented value too, but I don't know how to circumvent any synchronisation/concurrency issues.
I'm using flask to run my website backend, and I'm thinking of using MongoDB or Redis to store my single value that needs to be updated.
Please comment if there is any lack of clarity in my question, but this is a problem I've really been struggling with how to solve.
Thanks :)
Redis: I think you can use the atomic INCR command (or HINCRBY if the counter is a field in a hash). Alternatively, create a distributed lock so that there is only one writer at a time and only the writer holding the lock can make the update from your Flask app. Make sure the lock is released once the writer is done with it, or after a timeout.
MySQL: you can start a transaction, make the update, and commit the change so that the data stays correct.
To solve this problem I would suggest you follow a micro service architecture.
A service called worker would handle the flask route that's called when the user clicks on the link/button on the website. It would generate a message to be sent to another service called queue manager that maintains a queue of increment/decrement messages from the worker service.
There can be multiple worker service instances running concurrently but the queue manager is a singleton service that takes the messages from each service and adds them to the queue. If the queue manager is busy the worker service will either timeout and retry or return a failure message to the user. If the queue is full a response is sent back to the worker to retry n number of times, and you can count down that n.
A third service, called the storage manager, runs whenever the queue is not empty. It sends the messages to the storage solution (Mongo, Redis, or good ol' SQL) and ensures the increment/decrement messages are handled in the order they were received in the queue. You could also include a timestamp from the worker service in the message if you wanted to use that to sort the queue.
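The core idea (many producers, one consumer that applies messages in arrival order) can be sketched in a single process. This is only an approximation under stated assumptions: threads stand in for separate services, the class name `QueueManager` and its methods are hypothetical, and a plain attribute stands in for the storage solution.

```python
import queue
import threading


class QueueManager:
    """Single consumer that applies increment/decrement messages in order."""

    def __init__(self, maxsize=100):
        self._messages = queue.Queue(maxsize=maxsize)
        self.value = 0  # stand-in for the stored value
        self._consumer = threading.Thread(target=self._drain, daemon=True)
        self._consumer.start()

    def submit(self, delta, timeout=1.0):
        """Called by a worker; returns False if the queue stayed full."""
        try:
            self._messages.put(delta, timeout=timeout)
            return True
        except queue.Full:
            return False  # the caller may retry or report a failure

    def _drain(self):
        """Storage-manager role: the only code that ever writes the value."""
        while True:
            delta = self._messages.get()
            self.value += delta
            self._messages.task_done()

    def wait_until_applied(self):
        """Block until every queued message has been applied."""
        self._messages.join()


manager = QueueManager()
manager.submit(+1)
manager.submit(+1)
manager.submit(-1)
manager.wait_until_applied()
print(manager.value)
```

Because only the consumer thread touches the value, no lock is needed around the update itself; the queue is the synchronisation point.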
Generally, whatever hosting environment you pick for Flask will run a production web server such as gunicorn and support multiple concurrent worker instances to handle the HTTP requests, and this would naturally be your worker service.
How you build and coordinate the queue manager and storage manager is down to implementation preference, for instance you could use something like Google Cloud pub/sub system to send messages between different deployed services but that's just off the top of my head. There's a load of different ways to do it, and you're in the best position to decide that.
Without knowing more details about what you're trying to achieve and what's the requirements for concurrent traffic I can't go into greater detail, but that's roughly how I've approached this type of problem in the past. If you need to handle more concurrent users at the website, you can pick a hosting solution with more concurrent workers. If you need the queue to be longer, you can pick a host with more memory, or else write the queue to an intermediate storage. This will slow it down but will make recovering from a crash easier.
You also need to consider handling when messages fail between different services, how to recover from a service crashing or the queue filling up.
EDIT: Been thinking about this over the weekend and a much simpler solution is to just create a new record in a table directly from the flask route that handles user clicks. Then to get your total you just get a count from this table. Your bottlenecks are going to be how many concurrent workers your flask hosting environment supports and how many concurrent connections your storage supports. Both of these can be solved by throwing more resources at them.
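The simpler approach from the EDIT might look roughly like this. It is a minimal sketch that again assumes `sqlite3` as the storage; the table name `clicks` and the function names are hypothetical.

```python
import sqlite3


def create_clicks_table(conn):
    """One row per click; the total is derived, never updated in place."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, user_id TEXT NOT NULL)"
    )
    conn.commit()


def record_click(conn, user_id):
    """Called from the route that handles the button click."""
    with conn:  # commit the insert, or roll back on error
        conn.execute("INSERT INTO clicks (user_id) VALUES (?)", (user_id,))


def total_clicks(conn):
    """The current value is simply the number of recorded clicks."""
    return conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0]


clicks_db = sqlite3.connect(":memory:")
create_clicks_table(clicks_db)
record_click(clicks_db, "alice")
record_click(clicks_db, "bob")
print(total_clicks(clicks_db))
```

Since every request only appends a row, two simultaneous clicks never compete to overwrite the same value.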

GAE Request Timeout when user uploads csv file and receives new csv file as response

I have an app on GAE that takes CSV input from a web form and stores it to a blob, does some stuff to obtain new information using input from the CSV file, then uses csv.writer on self.response.out to write a new CSV file and prompt the user to download it. It works well, but my problem is that if it takes over 60 seconds it times out. I've tried to set up the "do some stuff" part as a task in the task queue, and it would work, except that I can't make the user wait while it is running, and there's no way of automatically calling the post that would write out the new CSV file when the task is complete. Having the user periodically push a button to see if it is done is less than optimal.
Is there a better solution to a problem like this other than using the task queue and having the user have to manually push a button periodically to see if the task is complete?
You have many options:
Use a timer in your client to check periodically (e.g. every 15 seconds) whether the file is ready. This is the simplest option and requires only a few lines of code.
Use the Channel API. It's elegant, but it's overkill unless you face similar problems frequently.
Email the results to the user.
If your problem is the 60-second limit for requests, you could consider using App Engine Modules, which allow you to control the scaling type of a module/version. Basically, there are three scaling types available.
Manual Scaling
Such a module runs continuously. Requests can run indefinitely.
Basic Scaling
Such a module creates an instance when the application receives a request. The instance will be turned down when the app becomes idle. Requests can run indefinitely.
Automatic Scaling
The same scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other application metrics. There is a 60-second deadline for HTTP requests.
You can find more details here.

How to provide an asynchronous RESTful API wrapping a synchronous API

We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data as well as some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that don't return an HTTP response for on the order of a minute. The API has a separate API call to get status of the current operation, but that means that the user has to initiate the long-running operation, then have a separate process poll for status. We don't want our API to work that way, we want the initial request to just return a response that says that it's in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
Does this seem like a reasonable way to approach this? Which Python libraries should we be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or twisted, or something a little heavier-weight like celery. Celery seems to be the normal recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago. You have probably solved your problem already, but for anyone else facing the same situation, here is what I did at the time.
Basically, I used the Celery library (http://www.celeryproject.org/) to dispatch the long-running operation asynchronously and immediately returned a successful HTTP response containing the Celery job id. The asynchronous operation registered its status and job id in a SQLite database (which was enough for what I was doing), and the client queried the status of the job over REST.
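The shape of that flow (start the job, hand back an id, record status in SQLite, answer status queries) can be sketched with the standard library alone. Note the substitution: a plain thread stands in for the Celery worker here, and the table name `jobs`, the path `DB_PATH`, and the functions `start_job` and `job_status` are all hypothetical.

```python
import os
import sqlite3
import tempfile
import threading
import uuid

# Hypothetical location for the status database.
DB_PATH = os.path.join(tempfile.mkdtemp(), "jobs.sqlite")


def _connect():
    """Open a fresh connection; each thread uses its own."""
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(id TEXT PRIMARY KEY, status TEXT, result TEXT)"
    )
    return conn


def _set_status(job_id, status, result=None):
    conn = _connect()
    try:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO jobs (id, status, result) "
                "VALUES (?, ?, ?)",
                (job_id, status, result),
            )
    finally:
        conn.close()


def start_job(func, *args):
    """Initial request: register the job, start it, return its id at once."""
    job_id = uuid.uuid4().hex
    _set_status(job_id, "in_progress")

    def run():
        try:
            _set_status(job_id, "complete", str(func(*args)))
        except Exception as exc:
            _set_status(job_id, "failed", str(exc))

    thread = threading.Thread(target=run)
    thread.start()
    # The thread is returned only so this demo can wait for it.
    return job_id, thread


def job_status(job_id):
    """Status poll: report whatever the job last recorded."""
    conn = _connect()
    try:
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"status": "unknown"}
    return {"status": row[0], "result": row[1]}
```

In a web service, `start_job` would sit behind the endpoint that accepts the long-running request and `job_status` behind the status endpoint, so the caller only ever sees quick responses.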

how to bring a background task to foreground in google app engine?

Currently I have tasks running in background. After the tasks are done executing I need to show output. How do I do this in Google App Engine?
Once the tasks are done the only thing I can do is create another task which is supposed to show output or is there any other way?
You can't "bring a task to the foreground" -- it is a webserver. The server responds to requests from the client.
But, you have a couple choices to accomplish something similar:
Use the Channel API to send the client notice that the work is finished, or even the results of the processing.
Write status info to memcache or the datastore and poll from the client to determine when the work is finished.
This won't work directly as you describe it.
Once a background task is started, it's a background task for its entire existence. If you want to return some information from the background task to the user, you'll have to add it to the datastore, and have a foreground handler check the datastore for that information.
You may also be able to use the Channel API to have a background task send messages directly to the browser, but I'm not sure if this will work or not (I haven't tried it).
If you give a little more information about exactly what you're trying to accomplish I can try to give more details about how to get it done.

Indicating that GET response is complete w/ Python AppEngine

When I get a GET request from a user, I send them the response and then spend maybe a second logging stuff about that request. Is there a way to close the connection when I have the response ready, but continue doing that logging part, so that the user wouldn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
App Engine does not support sending data to the user's browser before exiting the handler. Some web servers use this technique to "stream" data to the user's browser over a period of time in response to a single request. App Engine does not support this streaming technique.
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will require bundling your data up and sending it to the task queue to do your processing and logging, so
you may not save much time, and
the results may not look much like what you'd want. For example, you'd be logging from a different request, so you might need to radically alter the logging.
Still, you could try.
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, so long as you have less than 10k of data (which is the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.
