The workflow of my app is -
User submits a file
On receiving -> process_file()
return response
This could result in timeouts if process_file() takes a long time, so how can I send the response back first, then process the file and send the desired output to the user later?
I have checked out django-celery, but I think it's quite heavy for the small app I am trying to build.
Update: I searched around a bit on the internet, and if anyone would like to use Celery, here is a nice blog post that could help you solve this situation: [Link]
You can use Celery for this:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
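If it helps, here is a rough sketch of what that could look like for this workflow. It assumes Celery is already configured for the Django project; process_file() is the function from the question, and save_uploaded_file() and the module paths are hypothetical:

# tasks.py -- sketch only; assumes a working Celery setup for the project
from celery import shared_task
from myapp.processing import process_file   # wherever process_file() actually lives

@shared_task
def process_file_task(path):
    process_file(path)   # the slow work now runs in a worker, not in the request

# views.py
from django.http import HttpResponse
from myapp.tasks import process_file_task

def upload(request):
    path = save_uploaded_file(request.FILES['file'])   # hypothetical helper
    process_file_task.delay(path)                      # returns immediately
    return HttpResponse("File received; it will be processed shortly.")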
I am developing my final degree project and I am facing some problems with Python, Flask, socketIO and background threads.
My solution takes some files as input, processes them, does some calculations, and generates an image and a CSV file. Those files are then uploaded to a storage service. I want to do the processing of the files in a background thread and notify my clients (web, Android, and iOS) using websockets. Right now, I am using Flask-SocketIO with eventlet as the async_mode of my socket. When a client uploads the files, the process is started in a background thread (using socketio.start_background_task), but that heavy process (it takes about 30 minutes to finish) seems to take control of the main thread; as a result, when I try to make an HTTP request to the server, the response loads indefinitely.
I would like to know if there is a way to make this work using eventlet or maybe using another different approach.
Thank you in advance.
Eventlet uses cooperative multitasking, which means that you cannot have a task using the CPU for long periods of time, as this prevents other tasks from running.
In general, it is a bad idea to include CPU-heavy tasks in an eventlet process, so one possible solution would be to offload the CPU-heavy work to an external process, maybe through Celery or RQ. Another option that sometimes works (but not always) is to add calls to socketio.sleep(0) inside your CPU-heavy task as frequently as possible. The sleep call interrupts the function for a moment and allows other functions waiting for the CPU to run.
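To illustrate the second option, here is a rough sketch (the function names and the chunking are made up for the example); the idea is just to yield to the event loop between pieces of work:

def generate_outputs(socketio, chunks):
    # hypothetical background task started with socketio.start_background_task()
    for i, chunk in enumerate(chunks):
        crunch(chunk)                    # one slice of the CPU-heavy work (hypothetical)
        socketio.sleep(0)                # let eventlet serve other requests and sockets
        socketio.emit('progress', {'done': i + 1, 'total': len(chunks)})
    socketio.emit('finished', {'total': len(chunks)})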
I have a Flask app that is using external scripts to perform certain actions. In one of the scripts, I am using threading to run the threads.
I am using the following code for the actual threading:
for a_device in get_devices:
    my_thread = threading.Thread(target=DMCA.do_connect, args=(self, a_device, cmd))
    my_thread.start()

main_thread = threading.currentThread()
for some_thread in threading.enumerate():
    if some_thread != main_thread:
        some_thread.join()
However, when this script gets run (from a form), the process will hang and I will get a continuous loading cycle on the webpage.
Is there another way to use multithreading within the app?
Implementing threading by myself in a Flask app has always ended in some kind of disaster for me. You might want to use a distributed task queue such as Celery. Even though it might be tempting to spin off threads by yourself to get it finished faster, you will start to face all kinds of problems along the way and just end up wasting a lot of time (IMHO).
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Here are some good resources that you can use to get started:
Using Celery With Flask - Miguel Grinberg
Celery Background Tasks - Flask Documentation
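For reference, a minimal sketch of that setup (the broker URL, route, and helper names are placeholders; the tutorials above cover the details):

from flask import Flask, jsonify, request
from celery import Celery

app = Flask(__name__)
celery = Celery(app.name, broker='redis://localhost:6379/0')   # placeholder broker

@celery.task
def connect_to_devices(devices, cmd):
    for device in devices:
        do_connect(device, cmd)    # hypothetical stand-in for DMCA.do_connect

@app.route('/run', methods=['POST'])
def run():
    devices = request.form.getlist('device')
    task = connect_to_devices.delay(devices, request.form.get('cmd'))
    return jsonify({'task_id': task.id}), 202    # respond right away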
In my application, I have Python Celery tasks that connect to a REST API. Simple.
The problem I have is that the API does not allow multiple requests with the same credentials.
Is there a way to have these api tasks blocking in the queue? Meaning, If multiple requests are made around the same time, can I have the tasks sit in the queue and execute one by one, waiting for the first in the queue to finish?
Currently, in the RabbitMQ message queue (with one worker), I see the tasks go through (spawned) and not wait.
I looked over documentation but could not find a simple solution.
Thanks.
With one worker, it's impossible for Celery to do more than one task at a time. What you may be seeing is called prefetching, which allows the worker to reserve tasks.
http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits
The default prefetch multiplier is 4; turn it down to one and see if that fixes it.
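Roughly, that would mean something like the following (setting names differ between Celery versions, so treat this as a sketch):

# celeryconfig.py
# Celery 4+ uses worker_prefetch_multiplier; older versions use CELERYD_PREFETCH_MULTIPLIER.
worker_prefetch_multiplier = 1    # reserve only one message per worker process

# Start a single worker process so tasks run strictly one at a time:
#   celery -A yourproject worker --concurrency=1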
I need to handle a large (time- and memory-consuming) process asynchronously in a web2py application, called inside a controller method.
My specific use case is to call a process via stdlib.subprocess and wait for it to exit without blocking the web server, but I am open to alternative methods.
Hands-on examples would be a plus.
3rd-party library recommendations are welcome.
CRON scheduling is not required/wanted.
Assuming you'll need to start multiple, possibly simultaneous, instances of the background task, the solution is a task queue. I've heard good things about Celery and RabbitMQ, if you're looking for 3rd-party options, and web2py includes its own task queue system that might be sufficient for your needs.
With either tool, you'll define a function that encapsulates the operation you want the background process to perform. Then bring the task queue workers online. The web2py manual and forums indicate this can be done with a @reboot statement in the web2py cron system, which is triggered whenever the web server starts. There are probably other ways to start the workers if this is unsatisfactory.
In your controller you'll insert a task into the task queue, passing any necessary parameters as inputs to the function (the background function will not run in the same environment as the controller, so it won't have access to the session, DB, etc. unless you explicitly pass the appropriate values into the task function).
Now, to get the output of the background operation to the user. When you insert a task into the task queue, you should get back a unique ID for the task. You would then implement controller logic (either something that expects an AJAX call, or a page that keeps refreshing until the task completes) that calls the task queue's API to check the status of the specified task. If the task's status is "finished", return the data to the user. If not, keep waiting.
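With the built-in web2py scheduler, that pattern might look roughly like this (the model/controller names and the status check are illustrative; see the scheduler chapter of the book for the exact API):

# models/scheduler.py
from gluon.scheduler import Scheduler

def make_report(input_path):
    # long-running work; only what is passed in via pvars is available here
    return run_big_job(input_path)   # hypothetical helper

scheduler = Scheduler(db)

# controllers/default.py
def start():
    task = scheduler.queue_task(make_report, pvars=dict(input_path=request.vars.path))
    return dict(task_id=task.id)

def check():
    status = scheduler.task_status(int(request.vars.task_id), output=True)
    if status and status.scheduler_task.status == 'COMPLETED':
        return dict(done=True, result=status.result)
    return dict(done=False)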
Maybe review the book section on running tasks in the background. You can use the new scheduler or create a homemade queue (email example). There's also a web2py-celery plugin, though I'm not sure what state that is in.
This is more difficult than one might expect. Note the deadlock warnings in the stdlib.subprocess documentation. It's easy if you don't mind blocking---use Popen.communicate. To work around the blocking, you can manage the process using stdlib.subprocess from a thread.
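A bare-bones sketch of that thread-plus-subprocess approach (the command and callback are placeholders):

import subprocess
import threading

def run_in_background(cmd, on_done):
    def target():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()   # blocks this worker thread only, not the web server
        on_done(proc.returncode, out, err)
    threading.Thread(target=target).start()

# e.g. run_in_background(['python', 'heavy_job.py', 'input.csv'], notify_user)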
My favorite way to deal with subprocesses is to use Twisted's spawnProcess. But, it is not easy to get Twisted to play nicely with other frameworks.
I've been searching for an answer to this for a while; it's possible that I haven't been searching for the right information, though.
I'm trying to send data to a server, and once it is received, the server executes a Python script based on that data. I have been trying to spawn a thread and return, but I can't figure out how to "detach" the thread. I simply have to wait until the thread returns before I can return an HttpResponse(). This is unacceptable, as the website interface has many other things that need to remain usable while the thread runs on the server.
I'm not certain that was a clear explanation but I'll be more than happy to clarify if any part is confusing.
Have a look at Celery. It's quite nice in that you can accept the request, offload it quickly to workers, and return. It's simple to use.
http://celeryproject.org/
Most simply, you can do this with subprocess.Popen. See here for some information regarding the subprocess module:
http://docs.python.org/library/subprocess.html
There are other (possibly better) methods to doing this, but this one seems to fit your requirements.
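For example, something along these lines in the view (the script path and argument are placeholders):

import subprocess
from django.http import HttpResponse

def receive_data(request):
    # Popen returns immediately; the script keeps running after the response is sent
    subprocess.Popen(['python', '/path/to/script.py', request.POST.get('data', '')])
    return HttpResponse('Accepted; processing has started.')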
Use a message queue system, like Celery (django-celery may help you).
Use an RDBMS and a background process (or processes) that is invoked periodically by cron or is always running.
First, the web server inserts the data required by the background job into a database table. Then the background process (always running, or run periodically by cron) picks up the latest inserted row(s) and processes them.
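A sketch of that second approach (the table name, schema, and process_payload are made up; the web request only does something like INSERT INTO jobs (payload, status) VALUES (?, 'pending'), and a separate script drains the table):

# worker.py -- run by cron, or kept running as a simple daemon
import sqlite3
import time

def worker_loop(db_path='jobs.db'):
    conn = sqlite3.connect(db_path)
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row:
            job_id, payload = row
            process_payload(payload)    # the actual background work (hypothetical)
            conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
            conn.commit()
        else:
            time.sleep(5)               # nothing pending; poll again shortly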
Spawn a thread.
worker_thread = threading.Thread(target=do_background_job, args=args)
worker_thread.daemon = False
worker_thread.start()
return HttpResponse()
Even after the HttpResponse is sent, do_background_job keeps running. However, because the web server (Apache) may kill any of its threads, execution of the background job is not guaranteed.