Threading in Django is not working in production

I have a function in my Django views.py that looks like this:

def process(request):
    form = ProcessForm(request.POST, request.FILES)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.requested_by = request.user
        instance.save()
        t = threading.Thread(target=utils.background_match, args=(instance,), kwargs={})
        t.setDaemon(True)
        t.start()
    return HttpResponseRedirect(reverse('mart:processing'))
Here, I'm trying to call a function 'background_match' in a separate thread when ProcessForm is submitted. Since this thread takes some time to complete, I redirect the user to another page named 'mart:processing'.
The problem I am facing is that it all works fine on my local machine but doesn't work on the production server, which is an AWS EC2 instance. The thread doesn't start at all: there's a for loop inside the background_match function which doesn't move forward.
However, if I refresh (CTRL + R) the 'mart:processing' page, it does move forward by 1 or 2 iterations. So, for a complete loop of 1000 iterations to run, I would need to refresh the page 1000 times. If, say, after 100 iterations I stop refreshing, it gets stuck at that point and never reaches the 101st iteration. Please help!

Wrong architecture. Django and other web apps should not be spawning threads like this. The correct way is to create an async task using a task queue. The most popular task queue for Django happens to be Celery.
The mart:processing page should then check the async result to determine whether the task has completed. A rough sketch is as follows:

from celery.result import AsyncResult
from myapp.tasks import my_task
...
if form.is_valid():
    ...
    task_id = my_task.delay().id
    request.session['task_id'] = task_id
    return HttpResponseRedirect(reverse('mart:processing'))
...
On the subsequent page:

task_id = request.session.get('task_id')
if task_id:
    task = AsyncResult(task_id)
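To round out the sketch, the task itself would live in myapp/tasks.py, and the mart:processing view can poll its state. A minimal version follows; the task body and view name are illustrative assumptions, not code from the original answer:

# myapp/tasks.py -- a minimal task definition (illustrative stub).
from celery import shared_task

@shared_task
def my_task():
    # ... the slow background work goes here ...
    return 'done'

# views.py -- the mart:processing view polls the stored task id.
from celery.result import AsyncResult
from django.shortcuts import render

def processing(request):
    task_id = request.session.get('task_id')
    task = AsyncResult(task_id) if task_id else None
    # ready() is True once the worker has finished (or failed) the task.
    return render(request, 'mart/processing.html',
                  {'ready': task.ready() if task else False})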

Related

Asynchronous execution of a function inside Django

This is how the views.py file would look, for example. The user makes a POST request that triggers a CPU-intensive execution which takes a long time to finish. I want to return a response to the user right away, with some message denoting that the execution started, and maybe a unique execution id.
The point is that the user does not need to wait for the execution to end. So I am starting the time-consuming function in a separate thread, and whenever it finishes execution, it will write an entry to some remote database.
Is this a good approach, or are there any potential vulnerabilities with it?
Note: although the function takes a long time to finish, it is essentially a small service, with probably only one instance needed in production.
import threading
import time

from rest_framework.views import APIView
from rest_framework.response import Response


def async_function(x):
    time.sleep(10)
    print(f'[*] Task {x} executed...')


class MainFunctionView(APIView):
    def get(self, request):
        return Response({'val': 1})

    def post(self, request):
        t1 = threading.Thread(target=async_function, args=(request.data.get('val'),))
        t1.start()
        return Response('execution started')
Thanks in advance.
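One detail the post mentions but the snippet omits is the unique execution id. A rough way to add one to the post method, assuming a uuid-based id (the id scheme is my assumption, not from the original code):

import uuid

class MainFunctionView(APIView):
    def post(self, request):
        execution_id = str(uuid.uuid4())  # hypothetical unique id for this run
        t1 = threading.Thread(target=async_function,
                              args=(request.data.get('val'),))
        t1.start()
        # Return immediately; the thread keeps running in this worker process.
        return Response({'detail': 'execution started',
                         'execution_id': execution_id})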

How to send a task to Celery without waiting?

I have the following code in my tasks.py file:
import subprocess

@app.task(bind=True)
def create_car(self, car):
    if car is None:
        return False
    status = subprocess.run(["<some_command_to_run>"])
    return True
It should run the command <some_command_to_run>, but for some reason the website waits for it to finish. I thought the whole point of Celery was that it would run in the background and return a status. How can I submit this task asynchronously? The desired behaviour: the user asks to create a new car instance; the request adds a task to the queue and returns True, indicating that the car was requested correctly. In the background it runs that command and records the status somewhere (not sure yet where). How do I do that?
You just need to call create_car.delay(instance.pk); delay() makes it async.
The arguments are JSON-encoded, so make sure to pass only a primary key or other JSON-serializable data (a model instance is not).
Be careful, because post_save is not async either :)
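Putting that together, the calling view might look like this; the view body is a sketch, and only the .delay() call itself comes from the answer:

from django.http import JsonResponse

from myapp.tasks import create_car

def request_car(request):
    car_pk = request.POST.get('car_pk')  # pass a primary key, not a model instance
    result = create_car.delay(car_pk)    # returns an AsyncResult immediately
    return JsonResponse({'requested': True, 'task_id': result.id})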

In python+aiohttp, are the routes multithreaded? How can I communicate between the routes while their code is running?

I have a python-aiohttp web service. I want to have the routes interact with each other in this way:
# http://base/pause
async def pause_server(self, request):
    self._is_paused = True
    return json_response(data={'paused': True})

# http://base/resume
async def resume_server(self, request):
    self._is_paused = False
    return json_response(data={'paused': False})

# http://base/getData
async def get_data(self, request):
    while self._is_paused:
        time.sleep(0.1)
    return json_response(data=data)
I've found that if I call pause_server and then, in a separate tab, call get_data, the server sleeps as I expect; but once I call resume_server, the resume_server code never gets called and the server continues to sleep indefinitely in the get_data code. Is this something I can do with python-aiohttp? I would have guessed that each route runs on its own thread, but if that were the case, I'd have expected this code to work.
Why do I want to do this? I'm doing some behave testing of a python application which uses aiohttp to host REST services. While my page is loading, I want to show some "Loading..." text on the screen. When I write my behave tests now, they look like this:
Scenario: Load the page
    Given the server takes 5 seconds to respond
    And I go to the page
    Then the data is 'Loading...'
    Given I wait 5 seconds
    Then the data is '<data>'
This relies on the server taking a certain amount of time to run. If I set the server to wait too long, the tests take forever to run. If I set the wait too short, the tests start to fail because it takes a while to actually run them.
I'd rather do something like this:
Scenario: Load the page
    Given I pause the server     # calls pause_server endpoint
    And I go to the page         # calls get_data endpoint
    Then the data is 'Loading...'
    Given I resume the server    # calls resume_server endpoint
    Then the data is '<data>'
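For what it's worth, aiohttp handlers are coroutines multiplexed on a single event-loop thread, not separate threads, so the blocking time.sleep() in get_data stalls every other handler, including resume_server. A minimal sketch of the usual fix is to yield to the event loop instead (same handler as above, only the sleep changed):

import asyncio

async def get_data(self, request):
    while self._is_paused:
        await asyncio.sleep(0.1)  # yields control so pause/resume handlers can run
    return json_response(data=data)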

Simple threading in Django

I need to implement threading in Django. I require three simple APIs:
/work?process=data1&jobid=1&jobtype=nonasync
/status
/kill?jobid=1
The API descriptions are:
The work api will take a process and spawn a thread to process it. For now, we can assume it to be a simple sleep(10) method. It will name the thread jobid-1, and the thread should be retrievable by this name. A new thread cannot be created if its jobid already exists. The jobtype can be async, i.e. the api call immediately returns HTTP status 200 after spawning the thread, or nonasync, in which case the api waits for the server to complete the thread and returns the result.
The status api should just show the status of each running process.
The kill api should kill a process based on jobid; the status api should no longer show this job.
Here is my Django code:
processList = []


class Processes(threading.Thread):
    """The work api can instantiate a process object and monitor its completion."""
    threadBeginTime = time.time()

    def __init__(self, timeout, threadName, jobType):
        threading.Thread.__init__(self)
        self.totalWaitTime = timeout
        self.threadName = threadName
        self.jobType = jobType

    def beginThread(self):
        self.thread = threading.Thread(target=self.execution,
                                       name=self.threadName)
        self.thread.start()

    def execution(self):
        time.sleep(self.totalWaitTime)

    def calculatePercentDone(self):
        """Gets the current percent done for the thread."""
        temp = time.time()
        secondsDone = float(temp - self.threadBeginTime)
        percentDone = float(secondsDone * 100 / self.totalWaitTime)
        return (secondsDone, percentDone)

    def killThread(self):
        pass
        # time.sleep(self.totalWaitTime)


def work(request):
    """Django process initiation view."""
    data = {}
    timeout = int(request.REQUEST.get('process'))
    jobid = int(request.REQUEST.get('jobid'))
    jobtype = request.REQUEST.get('jobtype')
    myProcess = Processes(timeout, jobid, jobtype)
    myProcess.beginThread()
    processList.append(myProcess)
    return render_to_response('work.html', {'data': data}, RequestContext(request))


def status(request):
    """Django process status view."""
    data = {}
    for p in processList:
        print p.threadName, p.calculatePercentDone()
    return render_to_response('server-status.html', {'data': data}, RequestContext(request))


def kill(request):
    """Django process kill view."""
    data = {}
    jobid = int(request.REQUEST.get('jobid'))
    # find jobid in processList and kill it
    return render_to_response('server-status.html', {'data': data}, RequestContext(request))
There are several implementation issues in the above code. The thread spawning is not done in a proper way; I am not able to retrieve each process's status in the status function; and the kill function is still unimplemented, as I could not grab a thread by its job id. I need help refactoring this.
Update: I am doing this example for learning purposes, not for writing production code, hence I will not favour any off-the-shelf queueing libraries. The objective here is to understand how multithreading works in conjunction with a web framework and what edge cases there are to be dealt with.
As @Daniel Roseman mentioned above -- doing threading INSIDE of a Django request / response cycle is a very bad idea for many reasons.
What you're actually looking for here is task queueing.
There are a few libraries out there which make this sort of thing fairly simple -- I'll list them below in order of ease-of-use (the simplest ones are listed first):
Django-RQ (https://github.com/ui/django-rq) -- A very awesome, simple API that uses Redis to handle queueing and asynchronous tasks.
Celery (http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html) -- A very powerful, flexible, and large project which handles queueing and supports many different backend technologies. I'd recommend this for large projects, but for everything else I'd use RQ, as it's quite a bit simpler.
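As a concrete starting point, here is a rough Django-RQ sketch of the question's three endpoints (function names and response shapes are illustrative; it assumes django_rq is installed with a default Redis queue configured):

import time

import django_rq
from django.http import JsonResponse


def long_job(data):
    time.sleep(10)  # stand-in for the real processing


def work(request):
    queue = django_rq.get_queue('default')
    # job_id lets the job be fetched by name later, as the question requires.
    job = queue.enqueue(long_job, request.GET.get('process'),
                        job_id=request.GET.get('jobid'))
    return JsonResponse({'jobid': job.id})


def status(request):
    queue = django_rq.get_queue('default')
    job = queue.fetch_job(request.GET.get('jobid'))
    return JsonResponse({'status': job.get_status() if job else 'unknown'})


def kill(request):
    queue = django_rq.get_queue('default')
    job = queue.fetch_job(request.GET.get('jobid'))
    if job:
        job.cancel()  # removes a queued job from the queue
    return JsonResponse({'killed': job is not None})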
Just my two cents.

Running Thread even after the main thread terminates using Django

I am working on a django project where the user submits a bunch of images that need to be processed in the backend.
The images are already uploaded to the server, but once the user hits submit it takes a long time to process the request, about 15 seconds, until a 'thank you for using us' message is displayed.
What I want to do is move the time-consuming part of the process to a different thread and display the thank-you message right away. My code looks like this:
def processJob(request):
    ...
    threading.Thread(target=processInBackground, args=(username, jobID)).start()
    context = {}
    context.update(csrf(request))
    return render_to_response('checkout.html', context)


def processInBackground(username, jobID):
    ...
    (processing the rest of the job)
However, once I run it, it creates a new thread, but the thread terminates the second the main thread terminates. Is there any way I can process the job in the background while the user gets the thank-you message right away?
PROCESSORS = [] # empty list of background processors
Inherit from threading.Thread:

class ImgProcessor(threading.Thread):
    def __init__(self, img, username, jobID):
        self.img = img
        self.username = username
        self.jobID = jobID
        self.readyflag = False  # set before start(), so run() cannot be raced
        threading.Thread.__init__(self)
        self.start()

    def run(self):
        ... process the image ...
        self.readyflag = True
Then, when receiving the request:
def processJob(request):
    PROCESSORS.append(ImgProcessor(img, username, jobID))
    # ... remove all objects in PROCESSORS that have readyflag == True
This isn't necessarily what you're looking for, but on my photographer website platform we upload the photos asynchronously using JavaScript (with PLUpload), which lets us have callbacks for multiple photo upload statuses at a time. Once uploaded, each file is saved out to a folder with a unique name and recorded in a database queue, where it is then picked up by a cron job that runs through the queue for anything not yet completed and processes it. I am using Django's custom management commands, so I get all the benefits of the Django framework, minus the web part.
The benefits of this are that you get a record of every upload, you can display status with a polling request, and you can perform the processing on any server you wish.
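For readers unfamiliar with that pattern, a rough sketch of such a cron-driven management command might look like this (the UploadJob model and process_file helper are made up for illustration):

# myapp/management/commands/process_uploads.py
from django.core.management.base import BaseCommand

from myapp.models import UploadJob  # hypothetical queue model


def process_file(path):
    pass  # stand-in for the real image processing


class Command(BaseCommand):
    help = 'Process any uploads still pending in the database queue'

    def handle(self, *args, **options):
        for job in UploadJob.objects.filter(processed=False):
            process_file(job.file_path)
            job.processed = True
            job.save()

Cron then just runs python manage.py process_uploads on whatever schedule suits the workload.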
