I have the following code in my tasks.py file:
import subprocess

@app.task(bind=True)
def create_car(self, car):
    if car is None:
        return False
    status = subprocess.run(["<some_command_to_run>"])
    return True
It should run the command <some_command_to_run>, but for some reason the website waits for it to finish. I thought the whole point of Celery was that the task runs in the background and just returns a status. How can I submit this task asynchronously? The desired behaviour: the user asks to create a new car instance, the view adds a task to the queue and returns true to indicate that the car was requested correctly. In the background the task runs that command and reports its status somewhere (I'm not sure where yet). How do I do that?
You just need to call create_car.delay(instance.pk); delay() makes the call asynchronous.
Task arguments are JSON encoded, so make sure to pass only a primary key or other JSON-serializable data (a model instance is not).
Be careful: post_save signal handlers are not asynchronous either :)
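Putting that together, a minimal sketch could look like the following (the Car model, the myapp module and the post_save wiring are assumptions for illustration):
# tasks.py -- sketch only; pass the pk and re-fetch the instance inside the task
import subprocess

from celery import shared_task
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import Car  # hypothetical app/model


@shared_task(bind=True)
def create_car(self, car_pk):
    if not Car.objects.filter(pk=car_pk).exists():
        return False
    subprocess.run(["<some_command_to_run>"])
    return True


@receiver(post_save, sender=Car)
def queue_create_car(sender, instance, created, **kwargs):
    if created:
        # .delay() enqueues the task and returns immediately.
        create_car.delay(instance.pk)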
I have a POST method which accepts user input, saves it in the db and then sends emails as an alert to all the users. Sending the emails takes too much time and holds up the POST method. I want the email sending to run in the background while the POST method completes.
I have little understanding of Process, thread or subprocess; here's what I tried.
import threading

class SendEmail():
    def __init__(self, data):
        self.data = data

    def draft_email(self):
        # do something
        pass

    def run(self):
        threading.Thread(target=self.draft_email).start()
From what I understand this will start a background job; however, it still waits for the main work to complete, which again takes time.
Is there any way I can make this more efficient, and what options do I have? I am using the Flask framework in Python. Thanks for any help or suggestions.
You could do this using background tasks with Celery. Once you complete the setup as explained in the documentation, it is quite simple to use.
For example, I use a background task to deploy web apps on the server. This can take up to 2 minutes, so it's too long for an HTTP request to depend on it. This is a snippet of my code:
@celery.task()
def celery_task_install_app(install_event_id, users_host):
    e = InstallEvents.query.filter_by(id=install_event_id).first()
    e.status = 'initializing'
    db.session.commit()
    cmd = '%s %d' % (app.config['WR_INST_SH'], install_event_id)
    subprocess.run(['ssh', 'root@%s' % users_host, "%s" % cmd])
Then I invoke it as follows:
@app.route('/users/api/<op>', methods=['GET', 'POST'])
def users_api(op):
    if _is_admin():
        if op == 'blahblah':
            ...
            ...
        elif op == 'install':
            ...
            celery_task_install_app.delay(new_ie.id, inst_user_host)
            ...
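For completeness, the setup the documentation walks through boils down to roughly this (the module name and the Redis broker URL are assumptions, not the answerer's actual config):
# celery_app.py -- minimal Flask + Celery wiring sketch
from celery import Celery
from flask import Flask

app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

# A worker is started separately, e.g.:
#   celery -A celery_app.celery worker --loglevel=info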
I have a function in my Django views.py that looks like this.
def process(request):
    form = ProcessForm(request.POST, request.FILES)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.requested_by = request.user
        instance.save()
        t = threading.Thread(target=utils.background_match, args=(instance,), kwargs={})
        t.setDaemon(True)
        t.start()
    return HttpResponseRedirect(reverse('mart:processing'))
Here, I'm trying to call a function 'background_match' in a separate thread when ProcessForm is submitted. Since this thread takes some time to complete, I redirect the user to another page named 'mart:processing'.
The problem I am facing is that it all works fine on my local machine but doesn't work on the production server, which is an AWS EC2 instance. The thread doesn't start at all. There's a for loop inside the background_match function which doesn't move forward.
However, if I refresh (CTRL + R) the 'mart:processing' page, it does move by 1 or 2 iterations. So, for a complete loop of 1000 iterations to run, I need to refresh the page 1000 times. If, after say 100 iterations, I don't refresh the page, it gets stuck at that point and doesn't move on to the 101st iteration. Please help!
Wrong architecture. Django and other web apps should not be spawning threads like this. The correct way is to create an async task using a task queue. The most popular task queue for Django happens to be Celery.
The mart:processing page should then check the async result to determine whether the task has completed. A rough sketch is as follows.
from celery.result import AsyncResult
from myapp.tasks import my_task

...

if form.is_valid():
    ...
    task = my_task.delay()
    request.session['task_id'] = task.id
    return HttpResponseRedirect(reverse('mart:processing'))
...
On the subsequent page
task_id = request.session.get('task_id')
if task_id:
    task = AsyncResult(task_id)
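Fleshing that out, the mart:processing view could look roughly like this (the view name and the JSON shape are assumptions):
from celery.result import AsyncResult
from django.http import JsonResponse

def processing(request):
    task_id = request.session.get('task_id')
    if not task_id:
        return JsonResponse({'state': 'UNKNOWN'})
    task = AsyncResult(task_id)
    # task.state moves through PENDING/STARTED/SUCCESS/FAILURE;
    # task.result holds the return value once the task has finished.
    payload = {'state': task.state}
    if task.successful():
        payload['result'] = task.result
    return JsonResponse(payload)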
So I'm wondering how this is done right.
I try to save the progress of a long-running task inside the request.session object, and then get the status of the process from another view method.
I'm using the Pool class to make my long-running operation async:
MyCalculation.py
def longrunning(x, request):
    request.session['status'] = 5
    return x * x
views.py
def dolongrunning(request, x):
    pool = Pool(processes=1)
    result = pool.apply_async(MyCalculation.longrunning, [x, request])
    return JsonResponse(..)

def status(request):
    return JsonResponse(request.session.get('status'))
So this doesn't work. My async job does execute, but the request object doesn't get my progress information.
How could I accomplish that, or is there another way?
I have the feeling that passing the request object is a bad idea in general.
What would be a good practice for storing the status of a long-running operation in Django/Python?
Different processes do not share the same memory space; each one gets its own copy.
In your case, the request object received by the worker process in the longrunning function is a copy of the one created in the parent process. Changes made in one process do not affect the others.
What you want to do is send updates from the worker process to the parent one and then, within the parent process, update the request status.
from multiprocessing import Pool, Queue

def worker(task, message_queue):  # longrunning
    # do something
    message_queue.put(5)
    # do something else
    message_queue.put(42)

def request_handler(request, task, message_queue):  # dolongrunning
    result = pool.apply_async(worker, [task, message_queue])
    return JsonResponse(..)

def status(request):
    status = message_queue.get()  # this is blocking if no messages in queue
    request.session['status'] = status
    return JsonResponse(request.session['status'])

pool = Pool(processes=1)
message_queue = Queue()
This is quite simplified, and it actually blocks on status requests if no status has been set, but it gives the idea.
A better way would be to store the updates in a buffer and keep the message queue empty with a thread. Each time a status request is received, the last status update received from the workers would be returned.
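A rough sketch of that buffering idea, under the same simplifying assumptions as above (one module-level queue and a daemon drain thread), might be:
import threading
from multiprocessing import Queue

from django.http import JsonResponse

message_queue = Queue()
latest_status = {'value': None}  # last update seen from any worker

def drain_queue():
    # Runs forever in the background, keeping the queue empty.
    while True:
        latest_status['value'] = message_queue.get()

threading.Thread(target=drain_queue, daemon=True).start()

def status(request):
    # Non-blocking: report whatever the drain thread saw last.
    return JsonResponse({'status': latest_status['value']})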
Since nobody has provided a solution to this post, and I desperately need a workaround, here is my situation and some abstract solutions/ideas for debate.
My stack:
Tornado
Celery
MongoDB
Redis
RabbitMQ
My problem: find a way for Tornado to dispatch a Celery task (solved) and then asynchronously gather the result (any ideas?).
Scenario 1: (request/response hack plus webhook)
Tornado receives a (user) request, then saves a { jobID : (user)request } mapping in local memory (or in Redis) to remember where to propagate the response, and fires a Celery task with the jobID
When Celery completes the task, it performs a webhook to some URL and tells Tornado that this jobID has finished (plus the results)
Tornado retrieves the (user)request and forwards a response to the (user)
Can this happen? Does it have any logic?
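It can work; a rough sketch of the webhook variant could look like this (the callback URL, the celery app name and the helpers do_work / finish_pending_request are assumptions for illustration):
import requests
import tornado.escape
import tornado.web

@celery.task
def long_job(job_id, payload):
    result = do_work(payload)  # hypothetical: the actual long-running work
    # Webhook back to Tornado when done; the URL is an assumption.
    requests.post('http://localhost:8888/celery-callback',
                  json={'job_id': job_id, 'result': result})
    return result

class CeleryCallbackHandler(tornado.web.RequestHandler):
    def post(self):
        data = tornado.escape.json_decode(self.request.body)
        # Look up the saved {jobID: (user)request} entry and push the
        # response to the waiting client.
        finish_pending_request(data['job_id'], data['result'])  # hypothetical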
Scenario 2: (tornado plus long-polling)
Tornado dispatches the Celery task and returns some initial JSON data to the client (jQuery)
jQuery does some long-polling upon receipt of the initial JSON, say every x microseconds, and Tornado replies according to some database flag. When the Celery task completes, this database flag is set to True and the jQuery "loop" finishes.
Is this efficient?
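Scenario 2 boils down to a status endpoint that the client polls; a minimal sketch (the flag lookup and the URL routing are assumptions) might be:
import tornado.web

class JobStatusHandler(tornado.web.RequestHandler):
    # Mapped to something like r'/status/(\w+)' in the application routes.
    def get(self, job_id):
        done = lookup_job_flag(job_id)  # hypothetical: read the database flag
        self.write({'job_id': job_id, 'done': done})  # Tornado serializes dicts to JSON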
Any other ideas/schemas?
My solution involves polling from tornado to celery:
import datetime

import tornado.ioloop
import tornado.web


class CeleryHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        task = yourCeleryTask.delay(**kwargs)

        def check_celery_task():
            if task.ready():
                self.write({'success': True})
                self.set_header("Content-Type", "application/json")
                self.finish()
            else:
                # The first positional argument of timedelta is days,
                # so 0.00001 days is roughly 0.86 seconds between polls.
                tornado.ioloop.IOLoop.instance().add_timeout(
                    datetime.timedelta(0.00001), check_celery_task)

        tornado.ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(0.00001), check_celery_task)
Here is a post about it.
Here is our solution to the problem. Since we look up results in several handlers in our application, we made the Celery lookup a mixin class.
This also makes the code more readable with the tornado.gen pattern.
from functools import partial

import tornado.gen
import tornado.ioloop
import tornado.web


class CeleryResultMixin(object):
    """
    Adds a callback function which could wait for the result asynchronously
    """
    def wait_for_result(self, task, callback):
        if task.ready():
            callback(task.result)
        else:
            # TODO: Is this going to be too demanding on the result backend?
            # Probably there should be a timeout before each add_callback
            tornado.ioloop.IOLoop.instance().add_callback(
                partial(self.wait_for_result, task, callback)
            )


class ARemoteTaskHandler(CeleryResultMixin, tornado.web.RequestHandler):
    """Execute a task asynchronously over a celery worker.
    Wait for the result without blocking.
    When the result is available send it back.
    """

    @tornado.web.asynchronous
    @tornado.web.authenticated
    @tornado.gen.engine
    def post(self):
        """Test the provided Magento connection"""
        task = expensive_task.delay(
            self.get_argument('somearg'),
        )
        result = yield tornado.gen.Task(self.wait_for_result, task)
        self.write({
            'success': True,
            'result': result.some_value
        })
        self.finish()
I stumbled upon this question, and hitting the results backend repeatedly did not look optimal to me, so I implemented a mixin similar to your Scenario 1 using Unix sockets.
It notifies Tornado as soon as the task finishes (to be accurate, as soon as the next task in the chain runs) and only hits the results backend once. Here is the link.
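The general shape of that idea, not the linked mixin itself (the socket path, celery app name and chain wiring here are assumptions), is roughly:
# A notification task chained after the real work writes the finished job id
# to a Unix socket that the Tornado process listens on.
import socket

SOCKET_PATH = '/tmp/celery-tornado.sock'  # hypothetical path

@celery.task
def notify_tornado(result, job_id):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(SOCKET_PATH)
    s.sendall(job_id.encode('utf-8'))
    s.close()
    return result

# Dispatched as a chain so the notification runs right after the real task:
#   (expensive_task.s(some_arg) | notify_tornado.s(job_id)).delay()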
Now, https://github.com/mher/tornado-celery comes to the rescue...
from tornado import gen, web
from tornado.web import asynchronous

import tasks  # the module containing the celery tasks, e.g. tasks.sleep


class GenAsyncHandler(web.RequestHandler):
    @asynchronous
    @gen.coroutine
    def get(self):
        response = yield gen.Task(tasks.sleep.apply_async, args=[3])
        self.write(str(response.result))
        self.finish()
I am working on a Django project where the user submits a bunch of images that need to be processed in the backend.
The images are already uploaded to the server, but once the user hits submit it takes a lot of time to process the request, about 15 seconds, until a 'thank you for using us' message is displayed.
What I wanted to do is put the time-consuming part of the process into a different thread and display the thank-you message right away. My code looks like this:
def processJob(request):
    ...
    threading.Thread(target=processInBackground, args=(username, jobID)).start()
    context = {}
    context.update(csrf(request))
    return render_to_response('checkout.html', context)

def processInBackground(username, jobID):
    ...
    (processing the rest of the job)
However, once I run it, it creates a new thread, but that thread terminates the second the main thread terminates. Is there any way I can process the stuff in the backend while the user gets the thank-you message right away?
PROCESSORS = []  # empty list of background processors

Inherit from threading.Thread:

class ImgProcessor(threading.Thread):
    def __init__(self, img, username, jobID):
        self.img = img
        self.username = username
        self.jobID = jobID
        threading.Thread.__init__(self)
        # Set the flag before starting the thread so run() cannot race it.
        self.readyflag = False
        self.start()

    def run(self):
        ... process the image ...
        self.readyflag = True
Then, when receiving the request:
def processJob(request):
    PROCESSORS.append(ImgProcessor(img, username, jobID))
    # .... remove all objects in PROCESSORS that have readyflag == True
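The cleanup hinted at in that last comment could be as simple as a list comprehension, for example:
# Drop processors whose work has finished so PROCESSORS does not grow forever.
def cleanup_processors():
    global PROCESSORS
    PROCESSORS = [p for p in PROCESSORS if not p.readyflag]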
This isn't necessarily what you're looking for, but on my photographer website platform we upload the photos asynchronously using JavaScript (with PLUpload), which allows us to have callbacks for multiple photo upload statuses at a time. Once uploaded, the file is saved out to a folder with a unique name and also into a database queue, where it is then picked up by a cron job that runs through the queue for anything that's not yet completed and processes it. I am using Django's custom management commands, so I get all the benefit of the Django framework, minus the web part.
The benefits of this are that you get a record of every upload, you can display status with a polling request, and you can perform the processing on any server you wish.
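A minimal sketch of such a management command (the UploadJob model and process_image helper are assumptions, not the platform's real schema):
# myapp/management/commands/process_uploads.py
from django.core.management.base import BaseCommand

from myapp.models import UploadJob  # hypothetical queue model with a `completed` flag


class Command(BaseCommand):
    help = "Process any queued uploads that have not been completed yet"

    def handle(self, *args, **options):
        for job in UploadJob.objects.filter(completed=False):
            process_image(job.file_path)  # hypothetical processing step
            job.completed = True
            job.save()
Cron then runs it periodically, for example: python manage.py process_uploads.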