Simple threading in Django - python

I need to implement threading in Django. I require three simple APIs:
/work?process=data1&jobid=1&jobtype=nonasync
/status
/kill?jobid=1
The API descriptions are:
The work api will take a process and spawn a thread that processes it. For now, we can assume it to be a simple sleep(10) method. It will name the thread jobid-1, and the thread should be retrievable by this name. A new thread cannot be created if the jobid already exists. The jobtype can be async, i.e., the api immediately returns http status code 200 after spawning the thread, or nonasync, i.e., the api waits for the server to finish the thread and returns the result.
The status api should just show the status of each running process.
The kill api should kill a process based on its jobid; the status api should no longer show that job.
Here is my Django code:
import threading
import time

from django.shortcuts import render_to_response
from django.template import RequestContext

processList = []

class Processes(threading.Thread):
    """The work api can instantiate a process object and monitor its completion."""
    threadBeginTime = time.time()

    def __init__(self, timeout, threadName, jobType):
        threading.Thread.__init__(self)
        self.totalWaitTime = timeout
        self.threadName = threadName
        self.jobType = jobType

    def beginThread(self):
        self.thread = threading.Thread(target=self.execution,
                                       name=self.threadName)
        self.thread.start()

    def execution(self):
        time.sleep(self.totalWaitTime)

    def calculatePercentDone(self):
        """Gets the current percent done for the thread."""
        secondsDone = float(time.time() - self.threadBeginTime)
        percentDone = float(secondsDone * 100 / self.totalWaitTime)
        return (secondsDone, percentDone)

    def killThread(self):
        pass
        # time.sleep(self.totalWaitTime)
def work(request):
    """Django process initiation view."""
    data = {}
    timeout = int(request.REQUEST.get('process'))
    jobid = int(request.REQUEST.get('jobid'))
    jobtype = request.REQUEST.get('jobtype')   # 'async' or 'nonasync', so no int() cast
    myProcess = Processes(timeout, jobid, jobtype)
    myProcess.beginThread()
    processList.append(myProcess)
    return render_to_response('work.html', {'data': data}, RequestContext(request))

def status(request):
    """Django process status view."""
    data = {}
    for p in processList:
        print p.threadName, p.calculatePercentDone()
    return render_to_response('server-status.html', {'data': data}, RequestContext(request))

def kill(request):
    """Django process kill view."""
    data = {}
    jobid = int(request.REQUEST.get('jobid'))
    # find jobid in processList and kill it
    return render_to_response('server-status.html', {'data': data}, RequestContext(request))
There are several implementation issues in the above code. The thread spawning is not done in a proper way. I am not able to retrieve the status of the processes in the status function. Also, the kill function is still unimplemented, as I could not grab a thread by its job id. I need help refactoring.
Update: I am doing this example for learning purposes, not for writing production code, hence I will not favour any off-the-shelf queueing libraries. The objective here is to understand how multithreading works in conjunction with a web framework and what edge cases have to be dealt with.
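For illustration only (this is not the code from the post), one common pattern is a module-level registry keyed by jobid, with each worker exposing its own progress; since Python threads cannot be killed forcibly, "kill" becomes a cooperative stop flag. All names below are made up.

import threading
import time

# Hypothetical registry: jobid -> worker thread. A lock guards access from
# concurrent Django requests.
JOBS = {}
JOBS_LOCK = threading.Lock()

class Job(threading.Thread):
    def __init__(self, jobid, timeout):
        super().__init__(name='jobid-%s' % jobid)
        self.jobid = jobid
        self.timeout = timeout
        self.started_at = None
        self.stop_event = threading.Event()   # cooperative "kill" switch

    def run(self):
        self.started_at = time.time()
        deadline = self.started_at + self.timeout
        # sleep in small slices so a kill request is noticed quickly
        while time.time() < deadline and not self.stop_event.is_set():
            time.sleep(0.1)

    def percent_done(self):
        if self.started_at is None:
            return 0.0
        elapsed = time.time() - self.started_at
        return min(100.0, elapsed * 100.0 / self.timeout)

def start_job(jobid, timeout):
    """Returns False if the jobid already exists, otherwise spawns the thread."""
    with JOBS_LOCK:
        if jobid in JOBS:
            return False
        job = Job(jobid, timeout)
        JOBS[jobid] = job
        job.start()
        return True

def kill_job(jobid):
    with JOBS_LOCK:
        job = JOBS.pop(jobid, None)   # status stops showing it immediately
    if job is not None:
        job.stop_event.set()          # thread exits at its next loop check

With this shape, the work view calls start_job() and, for jobtype=nonasync, simply join()s the thread before returning; the status view iterates over JOBS under the lock and reports percent_done() for each entry.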

As @Daniel Roseman mentioned above -- doing threading INSIDE of a Django request / response cycle is a very bad idea for many reasons.
What you're actually looking for here is task queueing.
There are a few libraries out there which make this sort of thing fairly simple -- I'll list them below in order of ease-of-use (the simplest ones are listed first):
Django-RQ (https://github.com/ui/django-rq) -- A very awesome, simple API that uses Redis to handle queueing and asynchronous tasks.
Celery (http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html) -- A very powerful, flexible, and large project which handles queueing and supports many different backend technologies. I'd recommend this for large projects, but for everything else I'd use RQ as it's quite a bit simpler.
Just my two cents.
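For a feel of the django-rq route, a minimal sketch might look like the following (the queue name, task function and view are made up here, not part of the question):

# tasks.py: a plain function; django-rq serialises the call onto Redis
import time

def process_job(timeout):
    time.sleep(timeout)
    return 'done'

# views.py
import django_rq
from django.http import JsonResponse
from .tasks import process_job

def work(request):
    queue = django_rq.get_queue('default')
    job = queue.enqueue(process_job, int(request.GET.get('process', 10)))
    return JsonResponse({'job_id': job.id})

A separate worker process (python manage.py rqworker) picks the jobs up, so the request/response cycle is never blocked.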

Related

wanted to run a python function in background while the main program exits execution

I have a POST method which accepts user inputs, saves them in the db and then sends emails as an alert to all the users. Sending the emails takes too much time and holds up the POST method. I want the send-email class to run in the background while the POST method completes.
I have little understanding of Process, thread or subprocess; here's what I tried.
import threading

class SendEmail():
    def __init__(self, data):
        self.data = data

    def draft_email(self):
        # do something
        pass

    def run(self):
        threading.Thread(target=self.draft_email).start()
From what I understand this will start a background job; however, it will wait for the main thread to complete, which again takes time.
Is there any way I can make this efficient, and what options do I have? I am using the Flask framework in Python. Thanks for any help or suggestions.
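For what it's worth, a plain thread started from the view does let the response return immediately; the request only waits if you join() the thread. A minimal sketch under that assumption (the send_emails function is hypothetical):

import threading

from flask import Flask, request, jsonify

app = Flask(__name__)

def send_emails(data):
    # hypothetical slow work: draft and send the alert emails
    ...

@app.route('/submit', methods=['POST'])
def submit():
    data = request.get_json()
    # save to the db here, then hand the slow part to a background thread
    threading.Thread(target=send_emails, args=(data,), daemon=True).start()
    return jsonify({'status': 'accepted'}), 202

The trade-off is that a crash inside that thread is invisible to the client and in-flight work is lost if the server restarts, which is why the answers below reach for Celery.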
You could do this using background tasks with Celery. Once you complete the setup as explained in the documentation, it is quite simple to use.
For example, I use a background task to deploy web apps on the server. This can take up to 2 minutes, so it's too long to have an HTTP request depend on it. This is a snippet of my code:
@celery.task()
def celery_task_install_app(install_event_id, users_host):
    e = InstallEvents.query.filter_by(id=install_event_id).first()
    e.status = 'initializing'
    db.session.commit()
    cmd = '%s %d' % (app.config['WR_INST_SH'], install_event_id)
    subprocess.run(['ssh', 'root@%s' % users_host, "%s" % cmd])
Then I invoke it as follows:
@app.route('/users/api/<op>', methods=['GET', 'POST'])
def users_api(op):
    if _is_admin():
        if op == 'blahblah':
            ...
        elif op == 'install':
            ...
            celery_task_install_app.delay(new_ie.id, inst_user_host)
            ...
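If the caller later needs to know whether the deploy finished, the AsyncResult returned by delay() has an id that can be handed back to the client and polled; a rough sketch (the route name is made up, and returning a dict needs Flask 1.1+):

@app.route('/users/api/install_status/<task_id>')
def install_status(task_id):
    # look the task up in the configured result backend
    result = celery_task_install_app.AsyncResult(task_id)
    return {'task_id': task_id, 'state': result.state}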

Background tasks in flask

I am writing a web application which would do some heavy work. With that in mind I thought of making the tasks background tasks (non-blocking), so that other requests are not blocked by the previous ones.
I went with daemonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished. Now if a user sends a request, my code will immediately tell them that their request is in progress, it'll be running in the background, and the application is ready to serve other requests.
My current application code looks something like this:
import threading

from flask import Flask
from flask import request, abort

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True   # Daemonize thread
        thread.start()         # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():
    # the decorator needs a view function to wrap
    try:
        begin = threadClass()
    except Exception:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through Python's multithreading package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I demonize a process using multi-processing? How can I achieve better than what I have with threading module?
EDIT:
I went through the multiprocessing package of Python; it is similar to threading.
from multiprocessing import Process

from flask import Flask
from flask import request, abort

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True   # Daemonize it
        p.start()         # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():
    # again, the decorator needs a view function to wrap
    try:
        begin = processClass()
    except Exception:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in Flask is with Celery, as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
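For orientation, the Flask/Celery wiring is roughly the following; this is only a sketch, with a placeholder broker URL and task body:

from flask import Flask
from celery import Celery

app = Flask(__name__)
# assumes a Redis broker running locally; any broker Celery supports will do
celery = Celery(app.name, broker='redis://localhost:6379/0')

@celery.task
def heavy_task(arg):
    # the long-running work runs in the Celery worker process, not in Flask
    ...

@app.route('/start', methods=['POST'])
def start():
    heavy_task.delay('some-arg')
    return 'Task is in progress', 202

A separate worker process (celery -A yourmodule.celery worker, module name being a placeholder) executes the task, so the web process never blocks.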
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his Pycon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The below code is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}

def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped

@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}

@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']

if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]

time.sleep(11)

results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

Django saving progress to the Session in subscript

So I'm wondering how this is done right.
I try to save the progress of a long-running task inside the request.session object, and then be able to get the status of the process with another view method.
I'm using the Pool class to make my long-running process async:
MyCalculation.py

def longrunning(x, request):
    request.session['status'] = 5
    return x * x

views.py

def dolongrunning(request, x):
    pool = Pool(processes=1)
    result = pool.apply_async(MyCalculation.longrunning, [x, request])
    return JsonResponse(..)

def status(request):
    return JsonResponse(request.session.get('status'))
So this doesn't work. My async job does execute, but the request object doesn't get my progress information.
How could I accomplish that, or is there another way?
I have the feeling that passing the request object is a bad idea in general.
What would be a good practice for storing the status of a long-running operation in Django/Python?
Different processes do not share the same memory space; each one gets its own copy.
In your case, the request object received by the worker process in the longrunning function is a copy of the one created in the parent process. Changes done on one of the processes do not affect the others.
What you want to do, is to send updates from the worker process to the parent one and then, within the parent one, update the request status.
from multiprocessing import Pool, Queue

def worker(task, message_queue):  # longrunning
    # do something
    message_queue.put(5)
    # do something else
    message_queue.put(42)

def request_handler(request, task, message_queue):  # dolongrunning
    result = pool.apply_async(worker, [task, message_queue])
    return JsonResponse(..)

def status(request):
    status = message_queue.get()  # this is blocking if no messages in queue
    request.session['status'] = status
    return JsonResponse(request.session['status'])

pool = Pool(processes=1)
message_queue = Queue()
This is quite simplified, and it actually blocks on status requests if no status has been set, but it gives the idea.
A better way would be storing the updates in a buffer and keeping the message queue empty with a thread: each time a status request is received, the last status update received from the workers would be returned.
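Continuing the simplified snippet above, that drain thread could look roughly like this (still only a sketch):

import threading

latest_status = {'value': None}

def drain_queue(message_queue):
    # runs forever in a daemon thread, keeping only the newest update
    while True:
        latest_status['value'] = message_queue.get()   # blocks until a worker sends something

drainer = threading.Thread(target=drain_queue, args=(message_queue,), daemon=True)
drainer.start()

def status(request):
    # never blocks: just report whatever the workers last reported
    return JsonResponse({'status': latest_status['value']})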

Tornado celery integration hacks

Since nobody has provided a solution to this post, plus the fact that I desperately need a workaround, here is my situation and some abstract solutions/ideas for debate.
My stack:
Tornado
Celery
MongoDB
Redis
RabbitMQ
My problem: Find a way for Tornado to dispatch a celery task (solved) and then asynchronously gather the result (any ideas?).
Scenario 1: (request/response hack plus webhook)
Tornado receives a (user)request, then saves in local memory (or in Redis) a { jobID : (user)request} to remember where to propagate the response, and fires a celery task with jobID
When celery completes the task, it performs a webhook at some url and tells tornado that this jobID has finished ( plus the results )
Tornado retrieves the (user)request and forwards a response to the (user)
Can this happen? Does it have any logic?
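For concreteness, the webhook side of Scenario 1 might look roughly like this in Tornado (only a sketch: the handler names and the fire_celery_task helper are made up, and it uses the old-style @tornado.web.asynchronous API that appears in the answers below):

import tornado.web

# jobID -> the RequestHandler still waiting for a response
pending = {}

class WorkHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        job_id = fire_celery_task(self.request.body)   # hypothetical helper
        pending[job_id] = self    # keep the connection open until the webhook fires

class TaskDoneHandler(tornado.web.RequestHandler):
    def post(self):
        # Celery's webhook: POST /task_done with the job id and its result
        job_id = self.get_argument('job_id')
        handler = pending.pop(job_id, None)
        if handler is not None:
            handler.write(self.get_argument('result'))
            handler.finish()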
Scenario 2: (tornado plus long-polling)
Tornado dispatches the celery task and returns some primary json data to the client (jQuery)
jQuery does some long-polling upon receipt of the primary json, say, every x microseconds, and tornado replies according to some database flag. When the celery task completes, this database flag is set to True, then jQuery "loop" is finished.
Is this efficient?
Any other ideas/schemas?
My solution involves polling from tornado to celery:
class CeleryHandler(tornado.web.RequestHandler):

    @tornado.web.asynchronous
    def get(self):
        task = yourCeleryTask.delay(**kwargs)

        def check_celery_task():
            if task.ready():
                self.write({'success': True})
                self.set_header("Content-Type", "application/json")
                self.finish()
            else:
                tornado.ioloop.IOLoop.instance().add_timeout(
                    datetime.timedelta(0.00001), check_celery_task)

        tornado.ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(0.00001), check_celery_task)
Here is a post about it.
Here is our solution to the problem. Since we look for results in several handlers in our application, we made the celery lookup a mixin class.
This also makes code more readable with the tornado.gen pattern.
from functools import partial

class CeleryResultMixin(object):
    """
    Adds a callback function which could wait for the result asynchronously
    """
    def wait_for_result(self, task, callback):
        if task.ready():
            callback(task.result)
        else:
            # TODO: Is this going to be too demanding on the result backend?
            # Probably there should be a timeout before each add_callback
            tornado.ioloop.IOLoop.instance().add_callback(
                partial(self.wait_for_result, task, callback)
            )

class ARemoteTaskHandler(CeleryResultMixin, tornado.web.RequestHandler):
    """Execute a task asynchronously over a celery worker.
    Wait for the result without blocking.
    When the result is available, send it back.
    """
    @tornado.web.asynchronous
    @tornado.web.authenticated
    @tornado.gen.engine
    def post(self):
        """Test the provided Magento connection"""
        task = expensive_task.delay(
            self.get_argument('somearg'),
        )

        result = yield tornado.gen.Task(self.wait_for_result, task)

        self.write({
            'success': True,
            'result': result.some_value
        })
        self.finish()
I stumbled upon this question and hitting the results backend repeatedly did not look optimal to me. So I implemented a Mixin similar to your Scenario 1 using Unix Sockets.
It notifies Tornado as soon as the task finishes (to be accurate, as soon as the next task in the chain runs) and only hits the results backend once. Here is the link.
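The idea, very roughly, is the sketch below (not the linked code; the socket path, the wiring and the notifier task are all assumptions): Tornado watches a Unix socket on its IOLoop, and a tiny Celery task chained after the real one connects and reports which job finished.

import os
import socket
import tornado.ioloop

SOCKET_PATH = '/tmp/celery_done.sock'   # made-up path

# Tornado side: listen on the socket; Celery connects and writes a job id.
if os.path.exists(SOCKET_PATH):
    os.unlink(SOCKET_PATH)
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(SOCKET_PATH)
listener.listen(16)
listener.setblocking(False)

def on_notify(fd, events):
    conn, _ = listener.accept()
    conn.setblocking(True)
    job_id = conn.recv(64).decode().strip()
    conn.close()
    # look the waiting handler up (e.g. a pending dict as in Scenario 1)
    # and hit the result backend exactly once here

tornado.ioloop.IOLoop.instance().add_handler(
    listener.fileno(), on_notify, tornado.ioloop.IOLoop.READ)

# Celery side: chained after the real task, e.g.
#   real_task.apply_async(args=[...], link=notify_done.s(job_id))
# A link callback receives the parent task's result as its first argument.
@celery.task
def notify_done(parent_result, job_id):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(SOCKET_PATH)
    s.sendall(str(job_id).encode())
    s.close()
    return parent_result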
Now, https://github.com/mher/tornado-celery comes to the rescue...
class GenAsyncHandler(web.RequestHandler):

    @asynchronous
    @gen.coroutine
    def get(self):
        response = yield gen.Task(tasks.sleep.apply_async, args=[3])
        self.write(str(response.result))
        self.finish()

Running Thread even after the main thread terminates using Django

I am working on a django project where the user submits a bunch of images that need to be processed in the backend.
The images are already uploaded on the server, but once the user hits submit it takes a lot of time to process the request, about 15 seconds, until a 'thank you for using us' message is displayed.
What I want to do is put the time-consuming part of the process into a different thread and display the thank-you message right away. My code looks like this:
def processJob(request):
    ...
    threading.Thread(target=processInBackground, args=(username, jobID)).start()
    context = {}
    context.update(csrf(request))
    return render_to_response('checkout.html', context)

def processInBackground(username, jobID):
    ...
    # (processing the rest of the job)
However, once I run it, it creates a new thread, but that thread terminates the second the main thread terminates. Is there any way I can process the stuff in the backend while the user gets the thank-you message right away?
PROCESSORS = []  # empty list of background processors

Inherit from threading.Thread:

class ImgProcessor(threading.Thread):
    def __init__(self, img, username, jobID):
        self.img = img
        self.username = username
        self.jobID = jobID
        threading.Thread.__init__(self)
        self.readyflag = False   # set before starting so run() can flip it safely
        self.start()

    def run(self):
        # ... process the image ...
        self.readyflag = True
Then, when receiving the request:
def processJob(request):
    PROCESSORS.append(ImgProcessor(img, username, jobID))
    # remove all objects in PROCESSORS that have readyflag == True
    PROCESSORS[:] = [p for p in PROCESSORS if not p.readyflag]
This isn't necessarily what you're looking for, but with my photographer website platform we upload the photo asynchronously using Javascript (with PLUpload), which allows us to have callbacks for multiple photo upload statuses at a time. Once uploaded, the file is saved out to a folder with a unique name, and also into a database queue, where it is then picked up by a cron job that runs through the queue for anything that's not yet completed, and processes it. I am using Django's custom management commands so I get all the benefit of the Django framework, minus the web part.
Benefits of this are that you get a record of every upload, you can display status with a polling request, and you can perform the processing on any server you wish.
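As a sketch of that cron-driven piece (the model, field and command names are invented), a custom management command is just a class with a handle() method:

# myapp/management/commands/process_uploads.py
from django.core.management.base import BaseCommand

from myapp.models import UploadJob          # hypothetical queue model
from myapp.processing import process_image  # hypothetical processing helper

class Command(BaseCommand):
    help = 'Process any queued uploads that are not finished yet'

    def handle(self, *args, **options):
        for job in UploadJob.objects.filter(completed=False):
            process_image(job)
            job.completed = True
            job.save()

cron then runs python manage.py process_uploads every minute or so, completely outside the request cycle.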
