Threading an external script within a Flask app - python

I have a Flask app that uses external scripts to perform certain actions. In one of the scripts, I am using the threading module to run several threads.
I am using the following code for the actual threading:
for a_device in get_devices:
    my_thread = threading.Thread(target=DMCA.do_connect, args=(self, a_device, cmd))
    my_thread.start()

main_thread = threading.currentThread()
for some_thread in threading.enumerate():
    if some_thread != main_thread:
        some_thread.join()
However, when this script gets run (from a form), the process hangs and the webpage just shows a continuous loading cycle.
Is there another way to use multithreading within the app?

Implementing threading by myself in a Flask app has always ended in some kind of disaster for me. You might want to use a distributed task queue such as Celery. Even though it might be tempting to spin off threads by yourself to get it finished faster, you will start to face all kinds of problems along the way and just end up wasting a lot of time (IMHO).
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Here are some good resources that you can use to get started:
Using Celery With Flask - Miguel Grinberg
Celery Background Tasks - Flask Documentation
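
For a rough idea of what that looks like in practice, here is a minimal sketch of the device loop above rewritten as a Celery task instead of raw threads (the broker URL and the connect_device wrapper are my assumptions, not your code):

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed Redis broker

@app.task
def connect_device(a_device, cmd):
    # the existing DMCA.do_connect logic for one device would be called here
    return f"connected {a_device} with {cmd}"

The Flask view would then just dispatch one task per device with connect_device.delay(a_device, cmd) and return immediately, while a separately started Celery worker does the actual connecting.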

Related

Long running task with healthchecks

I'm currently implementing a containerized python app to process messages from a queue.
The main process would poll the queue every n seconds and then process all the messages it receives. However, I would also like this app to expose an API with healthchecks and other endpoints that could send jobs to the main process.
I was wondering what are the standard libraries to do this in python, if they exist. I have seen some examples using Background tasks on FastAPI but this would not meet my requirements as the service should poll the queue on startup without any request to its endpoints.
I have also seen the Celery library mentioned, but it seems like a large complexity leap beyond what I need.
Is there a simple way to run a FastAPI application 'side-by-side' with a long running process in a way that both can communicate?
The multiprocessing module has its own version of queues. In your calling program, first create a queue, like this:
import multiprocessing as mp
self.outq_log = mp.Queue()
Then pass this Queue object to the process by putting it in the arguments when you call mp.Process() to start your long-running task.
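As a rough sketch of that step (the run_worker function and its body are illustrative, not the original code):

import multiprocessing as mp

def run_worker(outq):
    # the long-running task; it reports progress back through the queue
    outq.put("worker started")

outq_log = mp.Queue()
worker = mp.Process(target=run_worker, args=(outq_log,), daemon=True)
worker.start()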
Then a function in your calling program to check for messages in the queue would look like this:
import queue  # needed for queue.Empty; mp.Queue.get raises it when the queue is empty

def service_queues(self):
    # look for data from the process and write it to the window
    try:
        if self.outq_log is not None:
            x = self.outq_log.get(block=False)
            self.logtxt.write(x)
    except queue.Empty:
        pass
Finally, in your long-running process, you can send items back to the caller with:
outq_log.put(stuff)
If you want to send messages the other way, to the task from the caller, you can create a separate queue and do the "put"ting in the caller and the "get"ting in the task.
You can run FastAPI programmatically like in this example, after starting your other tasks in a thread or using asyncio. This way you should be able to communicate from the server endpoints to whatever objects you've started before.
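A minimal sketch of that approach (uvicorn, the /health endpoint, and the poll_loop function are assumptions on my part, not the original answer's code):

import multiprocessing as mp
import time

import uvicorn
from fastapi import FastAPI

app = FastAPI()
outq = mp.Queue()

@app.get("/health")
def health():
    return {"status": "ok"}

def poll_loop(q):
    while True:
        q.put("polled the queue")  # stand-in for the real polling work
        time.sleep(5)

if __name__ == "__main__":
    mp.Process(target=poll_loop, args=(outq,), daemon=True).start()
    uvicorn.run(app, host="127.0.0.1", port=8000)  # the API runs in the main process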

Having trouble with flask-SocketIO and eventlet

I am developing my final degree project and I am facing some problems with Python, Flask, socketIO and background threads.
My solution takes some files as input, processes them, makes some calculations, and generates an image and a CSV file. Those files are then uploaded to some storage service. I want to do the processing of the files on a background thread and notify my clients (web, Android, and iOS) using websockets. Right now, I am using Flask-SocketIO with eventlet as the async_mode of my socket. When a client uploads the files, the process is started in a background thread (using socketio.start_background_task), but that heavy process (it takes about 30 minutes to finish) seems to take control of the main thread; as a result, when I try to make an HTTP request to the server, the response loads indefinitely.
I would like to know if there is a way to make this work using eventlet or maybe using another different approach.
Thank you in advance.
Eventlet uses cooperative multitasking, which means that you cannot have a task using the CPU for long periods of time, as this prevents other tasks from running.
In general it is a bad idea to include CPU heavy tasks in an eventlet process, so one possible solution would be to offload the CPU heavy work to an external process, maybe through Celery or RQ. Another option that sometimes works (but not always) is to add calls to socketio.sleep(0) inside your CPU heavy task as frequently as possible. The sleep call interrupts the function for a moment and allows other functions waiting for the CPU to run.
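Here is a rough sketch of that second option; process_chunk is just a placeholder for one slice of the 30-minute job:

def process_chunk(chunk):
    # placeholder for one piece of the real CPU-heavy work
    return sum(x * x for x in chunk)

def generate_outputs(socketio, chunks):
    # started with socketio.start_background_task(generate_outputs, socketio, chunks)
    for i, chunk in enumerate(chunks):
        process_chunk(chunk)
        if i % 10 == 0:
            socketio.sleep(0)  # hand control back to the eventlet loop briefly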

Running a python program which uses concurrent.futures within a celery based web app

I am working on a Django application which uses Celery for its distributed async processes. Now I have been tasked with integrating a process which was originally written with concurrent.futures. So my question is: can this job, with its concurrent.futures processing, work inside the Celery task queue? Would it cause any problems? If so, what would be the best way to go forward? The concurrent process that was written earlier is resource intensive, since it is able to avoid the GIL, and it is very fast because of that. On top of that, the process uses a concurrent.futures.ProcessPoolExecutor and, inside it, a few (<5) concurrent.futures.ThreadPoolExecutor jobs.
So the real question is: should we extract all the core functions of the process and rewrite them by breaking them up into Celery app tasks, or keep the original code and run it as one big piece of code within the Celery queue?
By the design of the system, a user can submit several such Celery tasks, each of which would contain the concurrent.futures code.
Any help will be appreciated.
Your library should work without modification. There's no harm in having threaded code running within Celery, unless you are mixing in gevent with non-gevent compatible code for example.
Reasons to break the code up would be for resource management (reduce memory/CPU overhead). With threading, the thing you want to monitor is CPU load. Once your concurrency causes enough load (e.g. threads doing CPU intensive work), the OS will start swapping between threads, and your processing gets slower, not faster.
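To make that concrete, a hedged sketch of existing thread-pool code dropped into a Celery task unchanged (the broker URL and fetch_item are placeholders, not the question's code):

from concurrent.futures import ThreadPoolExecutor
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker

def fetch_item(item):
    # stand-in for the original threaded work
    return item

@app.task
def run_batch(items):
    # threaded code can run inside the task as-is
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_item, items))

ThreadPoolExecutor is shown here because it maps directly onto the "threaded code within Celery" point above; the ProcessPoolExecutor layer from the question is left out of the sketch.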

Redis or threads in Python/Heroku?

I am new to this, so bear with me if I'm asking something completely stupid.
I am developing a basic web app and using Heroku+flask+python.
For the background tasks, Heroku recommends using a worker. I wonder if I could just create new threads for those background tasks? Or is there a reason why a worker+redis is a better solution?
Those background tasks are not critical, really.
The main benefit to doing this in a separate worker is you'd be completely decoupling your app from your background tasks, so if one breaks it can't affect the other. That said, if you don't care about that, or need your background tasks more tightly coupled to your app for whatever reason, you can use APScheduler to have the background tasks run as separate threads without spinning up another worker. A simple example of that to run a background job every 10 seconds is as follows:
from apscheduler.schedulers.background import BackgroundScheduler

def some_job():
    print("successfully finished job!")

apsched = BackgroundScheduler()
apsched.start()
apsched.add_job(some_job, 'interval', seconds=10)
If you want tasks run asynchronously instead of on a schedule, you can use RQ, which has great examples of how to use it on its homepage. RQ is backed by Redis, but you don't need to run it in a separate worker process, although you can if you like.
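For completeness, a minimal RQ sketch (it assumes a local Redis instance, and count_words stands in for your real job):

from redis import Redis
from rq import Queue

def count_words(text):
    # stand-in job; in a real project it should live in a module the rq worker can import
    return len(text.split())

q = Queue(connection=Redis())
job = q.enqueue(count_words, "some text to process")  # picked up and run by an rq worker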

Django request/response multithreading

The workflow of my app is -
User submits a file
On receiving -> process_file()
return response
This could result in timeouts if process_file() takes a long time, so how could I send back the response first, then process the file and send the desired output to the user later?
I have checked out django-celery but I think it's quite heavy for a small app which I am trying to build.
Update: I searched around a bit on the internet, and if anyone would like to use Celery, here is a nice blog post that could help you solve this situation - [Link]
You can use Celery for that:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
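A hedged sketch of how that could look for the file upload above (process_file, save_upload, and the broker URL are placeholders, not the asker's code): the view enqueues the work and answers immediately, and a second endpoint lets the client poll for the output later.

from celery import Celery
from celery.result import AsyncResult
from django.http import JsonResponse

app = Celery("proj", broker="redis://localhost:6379/0")  # assumed broker

@app.task
def process_file(path):
    # the heavy processing runs in the Celery worker, not in the request cycle
    return {"path": path, "status": "done"}

def save_upload(f):
    # hypothetical helper: write the uploaded file somewhere the worker can read it
    path = f"/tmp/{f.name}"
    with open(path, "wb") as out:
        for chunk in f.chunks():
            out.write(chunk)
    return path

def upload(request):
    path = save_upload(request.FILES["file"])
    result = process_file.delay(path)  # returns a task id right away
    return JsonResponse({"task_id": result.id})

def task_status(request, task_id):
    res = AsyncResult(task_id, app=app)
    return JsonResponse({"ready": res.ready(),
                         "output": res.result if res.ready() else None})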
