Allowing user to cancel long running API call - python

A django project I work on allows a user to preview a document made on our site before downloading it. The process takes some time on the backend which is why we allow the user to cancel the operation on the frontend. But even after the user cancels the process, it still keeps running on the server wasting time and compute. We use a simple django-rest-framework api function with no socket connection or anything asynchronous that allows the frontend to track the progress of the task.
I would like to know if there is some way we can abort the function execution on the backend if the user decides to cancel the operation on the frontend.
I'd share any code but I don't think it'll be useful because the function just prepares the document and then sends it.

Related

Django Handling Public API use (anonymouse users making calls to API)

I am making a simple website with as Django as backend.
Ideally, you should be able to use it without creating an account and then all of your saved items ('notes') in there would be visible to anyone.
For now I have created a dummy user on Django, and every time an anonymous user makes an API call to add/delete/modify a notes, on Django's side it selects dummy user as a user.
It would work okey (I think) but one of Django's API can take a really long time to run (~1-2 minutes). Meaning that if multiple people are trying to make API calls while being anonymous, at some point the server will freeze until the long API finishes run.
Is there a way such case should be handed on the server side to prevent freezing of server ?
As Sorin answered in comments, I would go into Celery way. Basically you can create model that collect the data when the last Celery task was ran - and, for example if it was not running since last 24 hours, and user visits the website, you run task again by async way.
You are not even forced to use AJAX calls for that, because sending task to Celery is fast enough to call it on get_context_data() or dispatch() methods. You could overload others, but these are the fastest and safest to override.
You mentioned that one of your API endpoints perform task that take really long time which means you already have a task or process that can be block http request-response cycle. What you can do is to run the time-consuming (or long running) task asynchronously using celery or other task queues.

Interupt backend function and wait for frontend response

I'm looking for a good solution to implement a handshake between a python backend server and a react frontend connected through a websocket.
The frontend allows the user to upload a file and then send it to the backend for processing. Now it might be possible that the processing encounters some issues and likes the user's confirmation to proceed or cancel - and that's where I'm stuck.
My current implementation has different "endpoints" in the backend which call then different function implementations and a queue which is continuously processed and content (messages) is sent to the frontend. But these are always complete actions, they either succeed or fail and the returned message is accordingly. I have no system in place to interupt a running task (e.g. file processing), send a request to the frontend and then wait for response before I continue the function.
Is there a design pattern or common approach for this kind of problem?
How long it takes to process? Maybe a good solution is set up a message broker like RabbitMQ and create a queue for this process. In the front-end you have to create a panel to see the state of the process, which is running in an async task, and if it has found some issues, let the user know and ask what to do.

Best Way to Handle user triggered task (like import data) in Django

I need your opinion on a challenge that I'm facing. I'm building a website that uses Django as a backend, PostgreSQL as my DB, GraphQL as my API layer and React as my frontend framework. Website is hosted on Heroku. I wrote a python script that logs me in to my gmail account and parse few emails, based on pre-defined conditions, and store the parsed data into Google Sheet. Now, I want the script to be part of my website in which user will specify what exactly need to be parsed (i.e. filters) and then display the parsed data in a table to review accuracy of the parsing task.
The part that I need some help with is how to architect such workflow. Below are few ideas that I managed to come up with after some googling:
generate a graphQL mutation that stores a 'task' into a task model. Once a new task entry is stored, a Django Signal will trigger the script. Not sure yet if Signal can run custom python functions, but from what i read so far, it seems doable.
Use Celery to run this task asynchronously. But i'm not sure if asynchronous tasks is what i'm after here as I need this task to run immediately after the user trigger the feature from the frontend. But i'm might be wrong here. I'm also not sure if I need Redis to store the task details or I can do that on PostgreSQL.
What is the best practice in implementing this feature? The task can be anything, not necessarily parsing emails; it can also be importing data from excel. Any task that is user generated rather than scheduled or repeated task.
I'm sorry in advance if this question seems trivial to some of you. I'm not a professional developer and the above project is a way for me to sharpen my technical skills and learn new techniques.
Looking forward to learn from your experiences.
You can dissect your problem into the following steps:
User specifies task parameters
System executes task
System displays result to the User
You can either do all of these:
Sequentially and synchronously in one swoop; or
Step by step asynchronously.
Synchronously
You can run your script when generating a response, but it will come with the following downsides:
The process in the server processing your request will block until the script is finished. This may or may not affect the processing of other requests by that same server (this will depend on the number of simultaneous requests being processed, workload of the script, etc.)
The client (e.g. your browser) and even the server might time out if the script takes too long. You can fix this to some extent by configuring your server appropriately.
The beauty of this approach however is it's simplicity. For you to do this, you can just pass the parameters through the request, server parses and does the script, then returns you the result.
No setting up of a message queue, task scheduler, or whatever needed.
Asynchronously
Ideally though, for long-running tasks, it is best to have this executed outside of the usual request-response loop for the following advantages:
The server responding to the requests can actually serve other requests.
Some scripts can take a while, some you don't even know if it's going to finish
Script is no longer dependent on the reliability of the network (imagine running an expensive task, then your internet connection skips or is just plain intermittent; you won't be able to do anything)
The downside of this is now you have to set more things up, which increases the project's complexity and points of failure.
Producer-Consumer
Whatever you choose, it's usually best to follow the producer-consumer pattern:
Producer creates tasks and puts them in a queue
Consumer takes a task from the queue and executes it
The producer is basically you, the user. You specify the task and the parameters involved in that task.
This queue could be any datastore: in-memory datastore like Redis; a messaging queue like RabbitMQ; or an relational database management system like PostgreSQL.
The consumer is your script executing these tasks. There are multiple ways of running the consumer/script: via Celery like you mentioned which runs multiple workers to execute the tasks passed through the queue; via a simple time-based job scheduler like crontab; or even you manually triggering the script
The question is actually not trivial, as the solution depends on what task you are actually trying to do. It is best to evaluate the constraints, parameters, and actual tasks to decide which approach you will choose.
But just to give you a more relevant guideline:
Just keep it simple, unless you have a compelling reason to do so (e.g. server is being bogged down, or internet connection is not reliable in practice), there's really no reason to be fancy.
The more blocking the task is, or the longer the task takes or the more dependent it is to third party APIs via the network, the more it makes sense to push this to a background process add reliability and resiliency.
In your email import script, I'll most likely push that to the background:
Have a page where you can add a task to the database
In the task details page, display the task details, and the result below if it exists or "Processing..." otherwise
Have a script that executes tasks (import emails from gmail given the task parameters) and save the results to the database
Schedule this script to run every few minutes via crontab
Yes the above has side effects, like crontab running the script in multiple times at the same time and such, but I won't go into detail without knowing more about the specifics of the task.

GAE Request Timeout when user uploads csv file and receives new csv file as response

I have an app on GAE that takes csv input from a web form and stores it to a blob, does some stuff to obtain new information using input from the csv file, then uses csv.writer on self.response.out to write a new csv file and prompt the user to download it. It works well, but my problem is if it takes over 60 seconds it times out. I've tried to setup the do some stuff part as a task in task queue, and it would work, except I can't make the user wait while this is running, and there's no way of calling the post that would write out the new csv file automatically when the task queue is complete, and having the user periodically push a button to see if it is done is less than optimal.
Is there a better solution to a problem like this other than using the task queue and having the user have to manually push a button periodically to see if the task is complete?
You have many options:
Use a timer in your client to check periodically (i.e. every 15 seconds) if the file is ready. This is the simplest option that requires only a few lines of code.
Use the Channel API. It's elegant, but it's an overkill unless you face similar problems frequently.
Email the results to the user.
If your problem is 60s limit for requests, you could consider to use App Engine Modules that allow you to control scaling type of a module/version. Basically there are three scaling types available.
Manual Scaling
Such a module runs continuously. Requests can run indefinitely.
Basic Scaling
Such a module creates an instance when the application receives a request. The instance will be turned down when the app becomes idle. Requests can run indefinitely.
Automatic Scaling
The same scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other application metrics. There is 60-second deadline for HTTP requests.
You can find more details here.

Progress bar in Google App Engine

I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process to set a status (it could be using memcache)
Using AJAX query memcache status each 10 seconds, more or less, it's up to you
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task will periodically update a key that resides in memcache. If you don't have the time to learn the channel API, you can make the page returned by your post to periodically GET some URL in the app that returns a progress report based on the memcache data and you can then update your progress bar. When the job is complete your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.

Categories

Resources