Django saving progress to the Session in subprocess - python

So I'm wondering how this is done right.
I'm trying to save the progress of a long-running task in the request.session object, and then read the status of that process from another view.
I'm using the Pool class to run my long-running task asynchronously:
MyCalculation.py
def longrunning(x, request):
    request.session['status'] = 5
    return x * x
views.py
def dolongrunning(request, x):
    pool = Pool(processes=1)
    result = pool.apply_async(MyCalculation.longrunning, [x, request])
    return JsonResponse(..)

def status(request):
    return JsonResponse(request.session.get('status'))
So this doesn't work. My async job does get executed, but the request object doesn't receive my progress information.
How could I accomplish that, or is there another way?
I have the feeling that passing the request object is a bad idea in general.
What would be a good practice for storing the status of a long-running operation in Django/Python?

Different processes do not share the same memory space; each one gets its own copy of the data.
In your case, the request object received by the worker process in the longrunning function is a copy of the one created in the parent process. Changes done on one of the processes do not affect the others.
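A small sketch of why the worker only sees a copy: arguments handed to a Pool task are pickled in the parent and rebuilt in the worker process. The round trip below imitates that inside a single process. The plain dict is a hypothetical stand-in for the request object, not Django's actual class.

```python
import pickle

# Hypothetical stand-in for the request object and its session.
original = {'session': {}}

# Imitate what happens to a Pool task argument: pickled, then rebuilt.
worker_copy = pickle.loads(pickle.dumps(original))

# The "worker" writes its progress into its own copy only.
worker_copy['session']['status'] = 5

print(original)     # {'session': {}}
print(worker_copy)  # {'session': {'status': 5}}
```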
What you want to do is send updates from the worker process to the parent one and then, within the parent, update the status stored in the request's session.
from multiprocessing import Manager, Pool
from django.http import JsonResponse

def worker(task, message_queue):  # longrunning
    # do something
    message_queue.put(5)
    # do something else
    message_queue.put(42)

def request_handler(request, task, message_queue):  # dolongrunning
    result = pool.apply_async(worker, [task, message_queue])
    return JsonResponse(..)

def status(request):
    status = message_queue.get()  # this blocks if there are no messages in the queue
    request.session['status'] = status
    # JsonResponse expects a dict unless safe=False is passed
    return JsonResponse({'status': request.session['status']})

pool = Pool(processes=1)
# A plain multiprocessing.Queue cannot be passed as a Pool task argument;
# a Manager queue proxy can be pickled and sent to the worker.
message_queue = Manager().Queue()
This is quite simplified, and it blocks on status requests if no status has been sent yet, but it gives an idea.
A better way would be to store the status updates in a buffer and keep the message queue drained with a thread. Each time a status request is received, the last status update received from the workers is returned.
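The buffering idea can be sketched as follows. This is a minimal illustration, not a drop-in solution: StatusBuffer and its method names are hypothetical, and a queue.Queue stands in for the inter-process queue, since the sketch only needs a blocking get().

```python
import queue
import threading


class StatusBuffer:
    """Keeps only the most recent status taken from a message queue."""

    def __init__(self, message_queue):
        self._queue = message_queue
        self._lock = threading.Lock()
        self._latest = None
        # Daemon thread, so it never keeps the interpreter alive on shutdown.
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            status = self._queue.get()  # blocks here instead of in the view
            with self._lock:
                self._latest = status

    def latest(self):
        """Return the last status seen, or None if nothing has arrived yet."""
        with self._lock:
            return self._latest
```

A status view would then return buffer.latest() immediately instead of calling message_queue.get() itself.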

Related

Celery get state of two tasks at the same time not working

I have an asynchronous API with flask, celery and RabbitMQ configured as a backend and broker. I have three endpoints, /task1, /task2 and /progress/<task_id>.
In /task1 and /task2 I have something like this:
task = task1.delay()
return task.id
And /progress/<task_id>:
@app.route('/progress/<task_id>', methods=['GET'])
def progress(task_id):
    def progress_stream():
        p = ""
        while p != "SUCCESS":
            res = cel_app.AsyncResult(task_id, app=cel_app)
            p = res.status
            yield p
    return Response(progress_stream(), mimetype='text/event-stream')
And in the client I have the following script:
var eventSource = new EventSource("http://localhost:5000/progress/[task_id]")
eventSource.onmessage = function(e) {
    container.innerHTML = e.data;
    console.log(e.data)
    if(e.data == "SUCCESS"){
        eventSource.close();
    }
};
And I run celery with the following command:
celery -A celery_app worker --loglevel=info --concurrency=10
This works fine when I make a single POST request to either task and then wait for its status. The problem comes when I make two or more requests to /task1 or /task2 and then try to get the results of both tasks at the same time. In that case it works for the first request, but the second one never updates its status or result, even though Celery is actually executing it.
I have read the documentation and I have found the following:
The RPC result backend (rpc://) is special as it doesn’t actually store the states, but rather sends them as messages. This is an important difference as it means that a result can only be retrieved once, and only by the client that initiated the task. Two different processes can’t wait for the same result.
However, in this case the two clients are waiting for different tasks, with different ids, so I am not sure why this is happening.
I also tried adding this to the celery configuration:
result_persistent=True
But it doesn't seem to make a difference

How to send a task to Celery without waiting?

I have the following code in my tasks.py file:
@app.task(bind=True)
def create_car(self, car):
    if car is None:
        return False
    status = subprocess.run(["<some_command_to_run>"])
    return True
It should run the command <some_command_to_run>, but for some reason the website waits for it to finish. I thought the whole point of Celery was that the task runs in the background and reports a status. How can I submit this task asynchronously? The desired behaviour: when a user asks to create a new car instance, a task is added to the queue and the view returns True, indicating that the car was requested correctly. In the background, Celery runs the command and reports the status somewhere (I'm not sure where yet). How can I do this?
You just need to call create_car.delay(instance.pk); delay() makes the call asynchronous.
The arguments are JSON encoded, so make sure to pass only a primary key or other JSON-serializable data (a model instance is not).
Be careful, because post_save is not async either :)
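To illustrate the serialization point without needing a Celery installation, here is a small sketch using only the standard json module. Car is a hypothetical stand-in for a model instance, and is_json_serializable is a helper invented for this example.

```python
import json


class Car:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


def is_json_serializable(value):
    """Return True if json.dumps accepts the value."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


car = Car(pk=7, name="roadster")
print(is_json_serializable(car))     # False: arbitrary objects are rejected
print(is_json_serializable(car.pk))  # True: an integer primary key is fine
```

The task would then look the object up again by its primary key inside the worker.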

Background tasks in flask

I am writing a web application that does some heavy work. With that in mind, I decided to run the tasks in the background (non-blocking) so that other requests are not blocked by earlier ones.
I went with daemonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished. Now, if a user sends a request, my code immediately tells them that their request is in progress; it runs in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask, abort
from flask import request
import threading

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():  # assumed view name
    try:
        begin = threadClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution
    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not going to be used in production).
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I do better than what I have with the threading module?
EDIT
I went through Python's multiprocessing package; it is similar to threading.
from flask import Flask, abort
from flask import request
from multiprocessing import Process

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True  # Daemonize it
        p.start()        # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():  # assumed view name
    try:
        begin = processClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution
    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in flask is with Celery as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The code below is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10-second sleep timer. If you call the endpoint three times in a row, it takes 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}


def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202
    return wrapped


@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}


@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']


if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]
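The memory leak mentioned in the warning comes from entries never leaving the global tasks dictionary. One possible way to bound it is sketched below. It assumes each finished entry also records a 'finished_at' timestamp, which the code above does not do; that key and the prune_tasks helper are hypothetical additions for illustration only.

```python
import time


def prune_tasks(tasks, max_age_seconds, now=None):
    """Drop finished entries older than max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    # Snapshot the items first, because other threads may add entries meanwhile.
    stale = [
        task_id
        for task_id, entry in list(tasks.items())
        if 'result' in entry
        and now - entry.get('finished_at', now) > max_age_seconds
    ]
    for task_id in stale:
        tasks.pop(task_id, None)
    return len(stale)
```

Such a function could be called at the start of each request or from a periodic timer. Unfinished tasks are never removed, so a client polling for them still gets a 202.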

Simple threading in Django

I need to implement threading in Django. I require three simple APIs:
/work?process=data1&jobid=1&jobtype=nonasync
/status
/kill?jobid=1
The API descriptions are:
The work API takes a process and spawns a thread that processes it. For now, we can assume it to be a simple sleep(10) call. It names the thread jobid-1, and the thread should be retrievable by this name. A new thread cannot be created if the jobid already exists. The jobtype can be async, i.e. the API call returns HTTP status code 200 immediately after spawning the thread, or nonasync, in which case the API waits for the thread to complete and returns the result.
The status API should just show the status of each running process.
The kill API should kill a process based on its jobid. The status API should no longer show this job.
Here is my Django code:
processList = []

class Processes(threading.Thread):
    """ The work api can instantiate a process object and monitor its completion"""
    threadBeginTime = time.time()

    def __init__(self, timeout, threadName, jobType):
        threading.Thread.__init__(self)
        self.totalWaitTime = timeout
        self.threadName = threadName
        self.jobType = jobType

    def beginThread(self):
        self.thread = threading.Thread(target=self.execution,
                                       name = self.threadName)
        self.thread.start()

    def execution(self):
        time.sleep(self.totalWaitTime)

    def calculatePercentDone(self):
        """Gets the current percent done for the thread."""
        temp = time.time()
        secondsDone = float(temp - self.threadBeginTime)
        percentDone = float((secondsDone) * 100 / self.totalWaitTime)
        return (secondsDone, percentDone)

    def killThread(self):
        pass
        # time.sleep(self.totalWaitTime)

def work(request):
    """ Django process initiation view """
    data = {}
    timeout = int(request.REQUEST.get('process'))
    jobid = int(request.REQUEST.get('jobid'))
    jobtype = int(request.REQUEST.get('jobtype'))
    myProcess = Processes(timeout, jobid, jobtype)
    myProcess.beginThread()
    processList.append(myProcess)
    return render_to_response('work.html',{'data':data}, RequestContext(request))

def status(request):
    """ Django process status view """
    data = {}
    for p in processList:
        print p.threadName, p.calculatePercentDone()
    return render_to_response('server-status.html',{'data':data}, RequestContext(request))

def kill(request):
    """ Django process kill view """
    data = {}
    jobid = int(request.REQUEST.get('jobid'))
    # find jobid in processList and kill it
    return render_to_response('server-status.html',{'data':data}, RequestContext(request))
There are several implementation issues in the above code. The thread is not spawned properly. I am not able to retrieve the process status in the status function. Also, the kill function is still unimplemented, as I could not retrieve a thread from its job id. I need help refactoring.
Update: I am doing this example for learning purposes, not to write production code, so I would rather not use any off-the-shelf queueing library. The objective is to understand how multithreading works in conjunction with a web framework and which edge cases have to be dealt with.
As @Daniel Roseman mentioned above -- doing threading INSIDE of a Django request / response cycle is a very bad idea for many reasons.
What you're actually looking for here is task queueing.
There are a few libraries out there which make this sort of thing fairly simple -- I'll list them below in order of ease-of-use (the simplest ones are listed first):
Django-RQ (https://github.com/ui/django-rq) -- A very awesome, simple API that uses Redis to handle queueing and asynchronous tasks.
Celery (http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html) -- A very powerful, flexible, and large project which handles queueing and supports many different technologies for backends. I'd recommend this for large projects, but for everything else I'd use RQ as it's quite a bit simpler.
Just my two cents.
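For the learning exercise described in the question, the three operations can be sketched outside of any web framework. This is only an illustration under stated assumptions: Job and JobRegistry are hypothetical names, the work is simulated by a wait, and because Python offers no supported way to kill a thread from outside, "kill" is implemented as a cooperative stop flag that the job checks.

```python
import threading
import time


class Job(threading.Thread):
    """A simulated job that runs for `duration` seconds unless asked to stop."""

    def __init__(self, job_id, duration):
        super().__init__(name=f"jobid-{job_id}", daemon=True)
        self.duration = duration
        self.started_at = None
        self._stop_requested = threading.Event()

    def run(self):
        self.started_at = time.monotonic()
        # Event.wait returns early as soon as stop() sets the flag.
        self._stop_requested.wait(self.duration)

    def stop(self):
        self._stop_requested.set()

    def percent_done(self):
        if self.started_at is None:
            return 0.0
        elapsed = time.monotonic() - self.started_at
        return min(100.0, 100.0 * elapsed / self.duration)


class JobRegistry:
    """Thread-safe mapping from job id to Job."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def start(self, job_id, duration):
        with self._lock:
            if job_id in self._jobs:
                raise ValueError(f"job {job_id} already exists")
            job = Job(job_id, duration)
            self._jobs[job_id] = job
            job.start()
        return job

    def status(self):
        with self._lock:
            return {job_id: job.percent_done() for job_id, job in self._jobs.items()}

    def kill(self, job_id):
        with self._lock:
            job = self._jobs.pop(job_id)
        job.stop()
        job.join()
```

A nonasync request would call join() on the returned job before responding. Note that a registry like this lives in one process only, so it stops working as soon as the web server runs several worker processes, which is one of the reasons the answer above recommends a task queue.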

Running Thread even after the main thread terminates using Django

I am working on a Django project where the user submits a bunch of images that need to be processed in the backend.
The images are already uploaded to the server, but once the user hits submit, it takes a long time to process the request: about 15 seconds pass before a 'thank you for using us' message is displayed.
What I want to do is move the time-consuming part of the process into a different thread and display the thank-you message right away. My code looks like this:
def processJob(request):
    ...
    threading.Thread(target=processInBackground, args=(username, jobID)).start()
    context = {}
    context.update(csrf(request))
    return render_to_response('checkout.html', context)

def processInBackground(username, jobID):
    ...
    # (processing the rest of the job)
However, when I run it, it creates a new thread, but that thread terminates the second the main thread terminates. Is there any way to process the work in the backend while the user gets the thank-you message right away?
PROCESSORS = []  # empty list of background processors
Inherit from threading.Thread:
class ImgProcessor(threading.Thread):
    def __init__(self, img, username, jobID):
        threading.Thread.__init__(self)
        self.img = img
        self.username = username
        self.jobID = jobID
        # Initialize the flag before start() so run() cannot race with it
        self.readyflag = False
        self.start()

    def run(self):
        # ... process the image ...
        self.readyflag = True
Then, when receiving the request:
def processJob(request):
    PROCESSORS.append(ImgProcessor(img, username, jobID))
    # ... remove all objects in PROCESSORS that have readyflag == True
This isn't necessarily what you're looking for, but on my photographer website platform we upload the photos asynchronously using JavaScript (with PLUpload), which gives us callbacks for the status of multiple photo uploads at a time. Once uploaded, each file is saved to a folder under a unique name and recorded in a database queue. A cron job then runs through the queue, picks up anything that is not yet completed, and processes it. I use Django's custom management commands, so I get all the benefits of the Django framework, minus the web part.
The benefits are that you get a record of every upload, you can display status with a polling request, and you can perform the processing on any server you wish.
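The queue-table pattern described above can be sketched with the standard sqlite3 module. This is not the poster's actual code: the table name, column names and function names are all hypothetical, and in a Django project the table would be a model and process_pending would be called from a custom management command run by cron.

```python
import sqlite3


def create_queue(conn):
    """Create the hypothetical upload queue table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS upload_queue "
        "(id INTEGER PRIMARY KEY, path TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def enqueue(conn, path):
    """Called by the web request: record the upload and return at once."""
    conn.execute("INSERT INTO upload_queue (path) VALUES (?)", (path,))
    conn.commit()


def process_pending(conn, handler):
    """Called by the periodic job: process every row not yet marked done."""
    rows = conn.execute(
        "SELECT id, path FROM upload_queue WHERE done = 0 ORDER BY id"
    ).fetchall()
    for row_id, path in rows:
        handler(path)
        conn.execute("UPDATE upload_queue SET done = 1 WHERE id = ?", (row_id,))
        conn.commit()
    return len(rows)
```

A status view can then answer polling requests by reading the done column for the rows that belong to the user.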
