I'm currently implementing a containerized Python app to process messages from a queue.
The main process would poll the queue every n seconds and then process all the messages it receives. However, I would also like this app to expose an API with healthchecks and other endpoints that could send jobs to the main process.
I was wondering what the standard libraries for doing this in Python are, if they exist. I have seen some examples using background tasks in FastAPI, but this would not meet my requirements, as the service should poll the queue on startup without any request to its endpoints.
I have also seen the Celery library mentioned, but it seems like a large complexity leap from what I need.
Is there a simple way to run a FastAPI application 'side-by-side' with a long running process in a way that both can communicate?
The multiprocessing module has its own version of queues. In your calling program, first create a queue, like this:
import multiprocessing as mp
self.outq_log = mp.Queue()
Then pass this Queue object to the process by putting it in the arguments when you call mp.Process() to start your long-running task.
A function in your calling program to check for messages in the queue would then look like this (note that the empty-queue exception lives in the standard queue module):
import queue  # multiprocessing queues raise the standard queue.Empty

def service_queues(self):
    # look for data from the process and write it to the log window
    try:
        if self.outq_log is not None:
            x = self.outq_log.get(block=False)   # non-blocking read
            self.logtxt.write(x)
    except queue.Empty:
        pass
Finally, in your long-running process you can send items to the caller using:
outq_log.put(stuff)
If you want to send messages the other way, to the task from the caller, you can create a separate queue and do the "put"ting in the caller and the "get"ting in the task.
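Putting it together, a minimal two-queue sketch; the worker function and the message format here are just placeholders:

import multiprocessing as mp

def worker(inq, outq):
    # placeholder long-running task: echo commands back as results
    while True:
        cmd = inq.get()               # blocks until the caller sends work
        if cmd is None:               # simple shutdown convention
            break
        outq.put("done: " + str(cmd))

if __name__ == "__main__":
    inq, outq = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inq, outq))
    proc.start()
    inq.put("job-1")
    print(outq.get())                 # -> done: job-1
    inq.put(None)                     # ask the worker to exit
    proc.join()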
You can run FastAPI programmatically, as in this example, after starting your other tasks in a thread or with asyncio. That way the server endpoints can communicate with whatever objects you started beforehand.
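To make that concrete, here is a minimal sketch of the whole pattern; the endpoint names are made up, and poll_queue stands in for your real poll-and-process logic:

import multiprocessing as mp
import queue
import time

import uvicorn
from fastapi import FastAPI

app = FastAPI()
jobs = mp.Queue()                          # shared channel: API -> worker

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/jobs")
def submit_job(payload: dict):
    jobs.put(payload)                      # hand a job to the worker process
    return {"queued": True}

def poll_queue(jobs):
    # stand-in for your real poller; starts at boot, needs no request
    while True:
        try:
            job = jobs.get(block=False)
            print("processing", job)
        except queue.Empty:
            time.sleep(5)                  # poll every n seconds

if __name__ == "__main__":
    worker = mp.Process(target=poll_queue, args=(jobs,), daemon=True)
    worker.start()
    uvicorn.run(app, host="0.0.0.0", port=8000)   # FastAPI, run programmatically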
I made a script that does web scraping and API requests, and I wanted to add discord.py to send the results to my Discord server, but execution stops after this line:
client.run('token')
Is there any way to fix this?
You need to use threads.
Python threading allows you to have different parts of your program run concurrently and can simplify your design.
What Is a Thread?
A thread is a separate flow of execution. This means that your program will have two things happening at once.
Getting multiple tasks running simultaneously requires a non-standard implementation of Python, writing some of your code in a different language, or using multiprocessing which comes with some extra overhead.
Starting a thread
The Python standard library provides the threading module:
import threading

def thread_function(name):
    print("thread", name, "running")   # placeholder target

x = threading.Thread(target=thread_function, args=(1,))
x.start()
Wrapping up
You need two threads, one for each loop: create and run the Discord client in one thread, and use another thread for the web scraping and API requests. A rough sketch follows.
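A rough sketch of that layout, assuming discord.py 2.x; scrape_loop is a placeholder for your own code:

import threading
import time
import discord

client = discord.Client(intents=discord.Intents.default())

def run_discord():
    client.run('token')                # blocking, so it gets its own thread

def scrape_loop():
    while True:
        ...                            # your web scraping and API requests
        time.sleep(60)

threading.Thread(target=run_discord, daemon=True).start()
scrape_loop()                          # main thread stays free for this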
The run method is completely blocking, so there are two ways I can see to solve this issue:
create and run the client in a separate thread, and use a queue of some sort to communicate between the client and the rest
use the start method, which returns an async coroutine that you can wrap into a task and multiplex with your scraping and API requests, assuming those also use async coroutines (see the sketch below)
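A rough sketch of the second option, assuming your scraping is (or can be wrapped as) an async coroutine:

import asyncio
import discord

client = discord.Client(intents=discord.Intents.default())

async def scrape():
    while True:
        ...                            # async scraping / API requests
        await asyncio.sleep(60)

async def main():
    # multiplex the client and the scraper on one event loop
    await asyncio.gather(client.start('token'), scrape())

asyncio.run(main())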
client.run seems to be a blocking operation; i.e., code placed after client.run is not executed until the client stops.
You can try using loop.create_task() as described here, to create another coroutine that runs in the background and feeds messages into your client.
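For instance, something along these lines; CHANNEL_ID and the message payload are placeholders:

import asyncio
import discord

CHANNEL_ID = 123456789              # placeholder channel id
client = discord.Client(intents=discord.Intents.default())

async def feed_messages():
    await client.wait_until_ready()
    channel = client.get_channel(CHANNEL_ID)
    while True:
        await channel.send("scrape results ...")   # placeholder payload
        await asyncio.sleep(60)

@client.event
async def on_ready():
    client.loop.create_task(feed_messages())       # background coroutine

client.run('token')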
I have a Flask app that uses external scripts to perform certain actions. In one of the scripts, I am using the threading module to run the work in threads.
I am using the following code for the actual threading:
for a_device in get_devices:
    my_thread = threading.Thread(target=DMCA.do_connect, args=(self, a_device, cmd))
    my_thread.start()

main_thread = threading.current_thread()
for some_thread in threading.enumerate():
    if some_thread != main_thread:
        some_thread.join()
However, when this script gets run (from a form), the process hangs and I get a continuous loading cycle on the webpage.
Is there another way to use multithreading within the app?
Implementing threading myself in a Flask app has always ended in some kind of disaster for me. You might want to use a distributed task queue such as Celery instead. Even though it might be tempting to spin off threads yourself to get things finished faster, you will start to face all kinds of problems along the way and just end up wasting a lot of time (IMHO).
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Here are some good resources that you can use to get started:
Using Celery With Flask - Miguel Grinberg
Celery Background Tasks - Flask Documentation
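A minimal sketch of that setup, assuming a local Redis broker; do_connect_task and the device list are placeholders standing in for DMCA.do_connect and get_devices:

from celery import Celery
from flask import Flask

app = Flask(__name__)
celery = Celery(app.name, broker="redis://localhost:6379/0")

@celery.task
def do_connect_task(device, cmd):
    ...                                # the slow per-device work goes here

@app.route("/run", methods=["POST"])
def run():
    devices = ["dev1", "dev2"]         # placeholder for get_devices
    for device in devices:
        do_connect_task.delay(device, "show version")   # returns immediately
    return "started", 202

The request handler returns right away; a separate celery worker process picks the tasks off the broker and runs them.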
I'm working on a REST service that is basically a wrapper around a library. I'm using Flask and gunicorn. Basically, each endpoint in the service maps to a different function in the library.
It happens that some of the calls to the library can take a long time to return, and that is making my service run out of workers once the service starts receiving a few requests. Right now I'm using the default gunicorn workers (sync).
I wanted to use gevent workers in order to be able to receive more requests, because not every endpoint takes that long to execute. However the function in the library does not use any of the patchable gevent functions, meaning that it won't cooperatively schedule to another green thread.
I had this idea of using a pool of threads or processes to handle the calls to the library asynchronously, with each green thread produced by gunicorn sleeping until the work is finished. Does this idea make sense at all?
Is it possible to use multiprocessing.Process with gevent, and have the join method give up control to another green thread, returning only when the process is finished?
Yes, it makes perfect sense to use (real) threads or processes from within gevent for code that needs to be asynchronous but can't be monkeypatched by gevent.
Of course it can be tricky to get right—first, because you may have monkeypatched threading, and second, because you want your cooperative threads to be able to block on a pool or a pool result without blocking the whole main thread.
But that's exactly what gevent.threadpool is for.
If you would have used concurrent.futures.ThreadPoolExecutor in a non-gevent app, monkeypatch threading and then use gevent.threadpool.ThreadPoolExecutor.
If you would have used multiprocessing.dummy.Pool in a non-gevent app, monkeypatch threading and then use gevent.threadpool.ThreadPool.
Either way, methods like map, submit, apply_async, etc. work pretty much the way you'd expect. The Future and AsyncResult objects play nice with greenlets; you can gevent.wait things, or attach callbacks (which will run as greenlets), etc. Most of the time it just works like magic, and the rest of the time it's not too hard to figure out.
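For example, a hedged sketch of pushing a blocking library call onto gevent's thread pool; blocking_call stands in for the real library function:

import time
from gevent.threadpool import ThreadPool

pool = ThreadPool(maxsize=4)

def blocking_call(x):
    time.sleep(2)                      # stands in for the slow library call
    return x * 2

# the calling greenlet blocks here, but the gevent hub keeps running,
# so other requests are still served in the meantime
result = pool.apply(blocking_call, args=(21,))
print(result)                          # 42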
Using processes instead of threads is doable, but not as nice. AFAIK, there are no wrappers for anything as complete as multiprocessing.Process or multiprocessing.Pool, and trying to use the normal multiprocessing just hangs. You can manually fork if you're not on Windows, but that's about all that's built in. If you really need multiprocessing, you may need to do some multi-layered thing, where your greenlets don't talk to a process, but instead talk to a thread that creates a pipe, forks, execs, and then proxies between the gevent world and the child process.
If the calls are taking a long time because they're waiting on I/O from a backend service, or waiting on a subprocess, or doing GIL-releasing numpy work, I wouldn't bother trying to do multiprocessing. But if they're taking a long time because they're burning CPU… well, then you either need to get multiprocessing working, or go lower-level and just spin off a subprocess.Popen([sys.executable, 'workerscript.py']).
I am implementing an MQTT worker in Python with paho-mqtt.
Are the on_message() callbacks run in different threads, so that if one of the tasks is time-consuming, other messages can still be processed?
If not, how can I achieve this behaviour?
The python client doesn't actually start any threads, that's why you have to call the loop function to handle network events.
In Java you would use the onMessage callback to put the incoming message on to a local queue that a separate pool of threads will handle.
Because of the GIL, Python threads won't run CPU-bound work in parallel, but the standard library also supports spawning processes with a thread-like API. Details of the multiprocessing module can be found here:
https://docs.python.org/2.7/library/multiprocessing.html
EDIT:
Looking at the paho Python code a little closer, it appears it can actually start a new thread (using the loop_start() function) to handle the network side of things previously handled by the loop functions. This does not change the fact that all calls to the on_message callback will happen on that thread. If you need to do large amounts of work in this callback, you should definitely look at spinning up a pool of new threads to do it.
http://www.tutorialspoint.com/python/python_multithreading.htm
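A hedged sketch of that queue-plus-worker-pool pattern, assuming the classic paho-mqtt 1.x callback API; the broker address and topic are placeholders:

import queue
import threading
import paho.mqtt.client as mqtt

work = queue.Queue()

def on_message(client, userdata, msg):
    work.put(msg.payload)              # keep the network thread unblocked

def worker():
    while True:
        payload = work.get()
        ...                            # the time-consuming processing goes here
        work.task_done()

for _ in range(4):                     # small pool of worker threads
    threading.Thread(target=worker, daemon=True).start()

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.subscribe("jobs/#")
client.loop_forever()                  # network loop; callbacks fire on this thread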
I've been searching for an answer to this for a while; it's possible that I haven't been searching for the right information, though.
I'm trying to send data to a server, and once it is received the server executes a Python script based on that data. I have been trying to spawn a thread and return, but I can't figure out how to "detach" the thread. I simply have to wait until the thread returns before I can return an HttpResponse(). This is unacceptable, as the website interface has many other things that need to remain usable while the thread runs on the server.
I'm not certain that was a clear explanation but I'll be more than happy to clarify if any part is confusing.
Have a look at Celery. It's quite nice in that you can accept the request, offload it quickly to workers, and return. It's simple to use.
http://celeryproject.org/
Most simply, you can do this with subprocess.Popen. See here for some information regarding the subprocess module:
http://docs.python.org/library/subprocess.html
There are other (possibly better) methods to doing this, but this one seems to fit your requirements.
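A minimal sketch of that approach in a Django view; 'process_data.py' is a placeholder script name:

import subprocess
import sys
from django.http import HttpResponse

def start_job(request):
    # fire and forget: the child runs on while the response returns at once
    subprocess.Popen([sys.executable, "process_data.py"])
    return HttpResponse("started")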
Use a message queue system, like Celery (django-celery may help you).
Use an RDBMS and background process(es), either periodically invoked by cron or always running.
First, the web server inserts the data required by the background job into a database table. Then, the background process (always running, or run periodically by cron) picks up the latest inserted row(s) and processes them.
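A hedged sketch of the background half, assuming a 'jobs' table with 'id', 'payload' and 'done' columns; sqlite3 is used only for illustration:

import sqlite3

def process_pending():
    conn = sqlite3.connect("app.db")
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0"
    ).fetchall()
    for job_id, payload in rows:
        ...                            # do the work for this row
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    process_pending()                  # invoked periodically by cron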
Spawn a thread.
import threading
from django.http import HttpResponse

worker_thread = threading.Thread(target=do_background_job, args=args)
worker_thread.daemon = False     # non-daemon: keeps running after the response
worker_thread.start()
return HttpResponse()
Even after the HttpResponse is sent, do_background_job keeps being processed. However, because the web server (Apache) may kill threads, execution of do_background_job is not guaranteed.