paho-mqtt : callback thread - python

I am implementing an MQTT worker in Python with paho-mqtt.
Are all the on_message() callbacks run in separate threads, so that if one task is time-consuming, other messages can still be processed?
If not, how can I achieve this behaviour?

The Python client doesn't start any threads on its own; that's why you have to call one of the loop functions to handle network events.
In Java you would use the onMessage callback to put the incoming message onto a local queue that a separate pool of threads then handles.
Python does have native threads (the threading module), but because of the GIL only one thread executes Python bytecode at a time, so for CPU-heavy work you can spawn processes that act like threads instead. Details of the multiprocessing module can be found here:
https://docs.python.org/2.7/library/multiprocessing.html
EDIT:
On looking at the paho Python code a little closer, it appears it can actually start a new thread (using the loop_start() function) to handle the network side of things that previously required calling the loop functions. This does not change the fact that all calls to the on_message callback happen on that one thread. If you need to do large amounts of work in this callback, you should definitely look at spinning up a pool of new threads to do that work.
http://www.tutorialspoint.com/python/python_multithreading.htm
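The queue-plus-worker-pool idea can be sketched in Python with concurrent.futures.ThreadPoolExecutor, which keeps its own internal queue of pending tasks. This is a minimal sketch, not paho's own mechanism: process_message is a hypothetical handler, the pool size of 4 is an arbitrary assumption, and the paho wiring is shown only in comments (client.on_message and client.loop_start() are the paho-mqtt client API).

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Worker threads that do the slow work; max_workers=4 is an assumption.
executor = ThreadPoolExecutor(max_workers=4)
results = []  # list.append is atomic under CPython's GIL


def process_message(topic, payload):
    """Hypothetical slow handler standing in for real per-message work."""
    time.sleep(0.1)
    results.append((topic, payload))


def on_message(client, userdata, msg):
    # Runs on paho's network thread: hand the work off and return quickly,
    # so the network loop can keep receiving messages.
    executor.submit(process_message, msg.topic, msg.payload)


# With paho-mqtt installed you would wire it up roughly like this:
#   client.on_message = on_message
#   client.loop_start()
```

Because on_message only submits and returns, a slow process_message no longer stalls the network loop. Note that once several workers run, messages may finish out of order.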

Related

Is Python's asyncio `loop.create_task(...)` threadsafe?

I have matplotlib running on the main thread with a live plot of data coming in from an external source. To handle the incoming data I have a simple UDP listener listening for packets using asyncio, with the event loop running on a separate thread.
I now want to add more sources and I'd like to run their listeners on the same loop/thread as the first one. To do this I'm just passing the loop object to the classes implementing the listeners and their constructor adds a task to the loop that will initialize and run the listener.
However since these classes are initialized in the main thread I'm calling the loop.create_task(...) function from there instead of the loop's thread. Will this cause any issues?
Yes, it can: loop.create_task(...) is not threadsafe, so scheduling a coroutine with it from a thread other than the loop's is unsafe. Use asyncio.run_coroutine_threadsafe(...) instead.
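A minimal sketch of the threadsafe hand-off, assuming the loop runs in a daemon thread created by the hypothetical helper start_background_loop(); listener() is a hypothetical stand-in for the real UDP listener setup.

```python
import asyncio
import threading


def start_background_loop():
    """Create an event loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def listener(name):
    # Hypothetical stand-in for initialising and running a UDP listener.
    await asyncio.sleep(0.01)
    return f"{name} ready"


loop, thread = start_background_loop()

# Called from the main thread: the coroutine is scheduled on the loop's own
# thread, and we get back a concurrent.futures.Future to wait on.
future = asyncio.run_coroutine_threadsafe(listener("source-2"), loop)
print(future.result(timeout=5))  # → source-2 ready

# Shutting down also has to go through a threadsafe call.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

If you would rather keep create_task, loop.call_soon_threadsafe(loop.create_task, coro) is also safe, but it does not hand a future back to the calling thread.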

Listening for events on a network and handling callbacks robustly

I am developing a small Python program for the Raspberry Pi that listens for some events on a Zigbee network.
The way I've written this is rather simplistic: I have a while(True): loop checking for a Unique ID (UID) from the Zigbee. If a UID is received, it's looked up in a dictionary containing some callback methods. So, for instance, in the dictionary the key 101 is tied to a method called PrintHello().
So if that key/UID is received method PrintHello will be executed - pretty simple, like so:
if UID in self.expectedCallBacks:
    self.expectedCallBacks[UID]()
I know this approach is probably too simplistic. My main concern is, what if the system is busy handling a method and the system receives another message?
On an embedded MCU I can handle this easily with a circular buffer + interrupts, but I'm a bit lost when it comes to doing this on an RPi. Do I need to implement a new thread for the Zigbee module that basically fills a buffer that the callback handler can then retrieve/read from?
I would appreciate any suggestions on how to implement this more robustly.
Threads can definitely help to some degree here. Here's a simple example using a ThreadPool:
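from multiprocessing.pool import ThreadPool

pool = ThreadPool(2)  # Create a 2-thread pool
while True:
    uid = zigbee.get_uid()  # read the next UID from the Zigbee module
    if uid in self.expectedCallBacks:
        # Run the callback on a pool thread so this loop can keep reading
        pool.apply_async(self.expectedCallBacks[uid])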
That will kick off the callback in a thread in the thread pool, and should help prevent events from getting backed up before you can send them to a callback handler. The ThreadPool will internally handle queuing up any tasks that can't be run when all the threads in the pool are already doing work.
However, remember that early Raspberry Pi models (and the Pi Zero) have only one CPU core, so they can't execute more than one CPU-bound operation concurrently, and that's even ignoring the limitations of threading in Python caused by the GIL, which is normally worked around by using multiple processes instead of threads. On a single core, no matter how many threads or processes you have, only one can use the CPU at a time. For that reason, if your callbacks are CPU-bound you probably don't want more than one thread running them, since adding more just slows things down as the OS constantly switches between threads. (Callbacks that spend most of their time waiting on I/O are a different story; extra threads can help there.)
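On the buffer question itself: queue.Queue plays the role of the MCU ring buffer, and it is already safe to share between threads. A minimal sketch, assuming a producer that replays a list of UIDs instead of reading the Zigbee module, and a single dispatcher thread running the callbacks (in line with keeping one thread on CPU-bound callbacks). The callbacks here are made-up examples.

```python
import queue
import threading

# Bounded FIFO shared between threads; plays the role of an MCU ring buffer.
events = queue.Queue(maxsize=100)


def reader(uids):
    """Producer. In real code this would be its own thread polling the Zigbee
    module (e.g. with zigbee.get_uid()); here it just replays a list."""
    for uid in uids:
        events.put(uid)  # blocks if the buffer is full instead of dropping
    events.put(None)  # sentinel: tells the dispatcher to stop


def dispatcher(callbacks, log):
    """Consumer. Runs callbacks one at a time, in the order UIDs arrived."""
    while True:
        uid = events.get()
        if uid is None:
            break
        handler = callbacks.get(uid)
        if handler is not None:  # unknown UIDs are ignored
            log.append(handler())


log = []
callbacks = {101: lambda: "hello", 102: lambda: "world"}
consumer = threading.Thread(target=dispatcher, args=(callbacks, log))
consumer.start()
reader([101, 999, 102, 101])
consumer.join()
print(log)  # → ['hello', 'world', 'hello']
```

While a callback is busy, new UIDs simply wait in the queue, which is the behaviour the interrupt-plus-circular-buffer design gives you on an MCU.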

How to use Tornado with APScheduler?

I am running Python's APScheduler and periodically want to do some work POST-ing to some HTTP resources, which will involve using Tornado's AsyncHTTPClient as a scheduled job. Each job will do several POSTs. When each HTTP request responds, a callback is then called (I think that Tornado uses a future to accomplish this).
I am concerned with thread-safety here since APScheduler runs jobs in various threads. I have not been able to find a well-explained example of how Tornado would best be used across multiple threads in this context.
How can I best use apscheduler with tornado in this manner?
Specific concerns:
Which tornado ioloop to use? The docs say that AsyncHTTPClient "works like magic". Well, magic scares me. Do I need to use AsyncHTTPClient from within the current thread or can I use the main one (it can be specified)?
Are there thread-safety issues with my callback with respect to which ioloop I use?
Not clear to me what happens when a thread completes but there is still a pending callback/future that needs to be called. Are there issues here?
Since apscheduler is run as threads in-process, and python has the GIL, then is it pretty much the same to have one IOLoop from the main thread - as opposed to multiple loops from different threads (with respect to performance)?
All of Tornado's utilities work around Tornado's IOLoop - this includes the AsyncHTTPClient as well. And an IOLoop is not considered thread safe. Therefore, it is not a great idea to be running AsyncHTTPClient from any thread other than the thread running your main IOLoop. For more details on how to use the IOLoop, read this.
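If you use tornado.ioloop.IOLoop.instance(), you get the global (main thread's) IOLoop, which is what you want unless you specifically intend not to add callbacks to the main thread's IOLoop. (In Tornado 5 and later, IOLoop.instance() is a deprecated alias for IOLoop.current().) You can use tornado.ioloop.IOLoop.current() to refer to the right IOLoop instance for the current thread. The one IOLoop method that is safe to call from another thread is add_callback, which is how a job thread should hand work back to the loop. Adding a callback to a non-main thread's IOLoop from yet another non-main thread requires far too much bookkeeping; it will just get too messy.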
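A sketch of that hand-off, assuming Tornado 5 or newer (where the IOLoop runs on top of asyncio) and Python 3.7+. scheduled_job is a hypothetical stand-in for a function APScheduler would run on one of its worker threads, and do_posts is a hypothetical coroutine where the AsyncHTTPClient requests would go.

```python
import asyncio
import threading

from tornado.ioloop import IOLoop


async def main():
    loop = IOLoop.current()  # the IOLoop wrapping the running asyncio loop
    loop_thread = threading.get_ident()
    done = asyncio.Event()
    seen = []

    async def do_posts(job_id):
        # Runs on the IOLoop's thread. Real code would await
        # AsyncHTTPClient().fetch(url, method="POST", body=...) here.
        seen.append((job_id, threading.get_ident() == loop_thread))
        done.set()

    def scheduled_job():
        # Stand-in for an APScheduler job on a worker thread. add_callback is
        # documented as safe to call from any thread, so it is the only
        # IOLoop method this thread touches.
        loop.add_callback(do_posts, "job-1")

    worker = threading.Thread(target=scheduled_job)
    worker.start()
    await done.wait()
    worker.join()
    return seen


print(asyncio.run(main()))  # → [('job-1', True)]
```

The True in the output confirms that the POST logic ran on the loop's thread even though the job was triggered from another thread.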
I don't quite get this. But the way I understand it, there are two scenarios. Either you are talking about a thread with an IOLoop or without an IOLoop. If the thread does not have an IOLoop running, then after whatever the thread does to reach completion, whatever callback has to be executed by the IOLoop in some other thread (perhaps main thread) will be executed. The other scenario is that the thread you are talking about has an IOLoop running. Then the thread won't complete unless you have stopped the IOLoop. And therefore, execution of the callback will really depend on when you stop the IOLoop.
Honestly, I don't see much point in using threads with Tornado. Because of the GIL, only one thread runs Python code at a time, so there won't be any performance gain (PyPy has a GIL as well, so switching interpreters doesn't change that). If it is a web server, you might as well run multiple processes of your Tornado app and use Nginx as a reverse proxy and load balancer. Since you have brought in APScheduler, I would suggest using IOLoop's add_timeout instead: it does pretty much what you need, and it is native to Tornado, so it plays much nicer with it. Callbacks are hard enough to debug on their own; combine them with Python's threading and you can end up with a massive mess. If you are ready to consider another option, move all the async processing out of this process entirely, which will make life much easier. Think of something like Celery for this.
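The add_timeout suggestion can be sketched like this, assuming Tornado 5 or newer and Python 3.7+. The 50 ms interval and the three runs are arbitrary assumptions for the demo, and job is a hypothetical placeholder where the POSTs would go. Each run re-arms its own timer, which replaces APScheduler's interval trigger.

```python
import asyncio
from datetime import timedelta

from tornado.ioloop import IOLoop

INTERVAL = timedelta(milliseconds=50)  # assumed schedule; choose your own


async def main(total_runs=3):
    loop = IOLoop.current()
    finished = asyncio.Event()
    runs = []

    async def job(run):
        # Hypothetical scheduled job; real code would await several
        # AsyncHTTPClient().fetch(url, method="POST", body=...) calls here.
        runs.append(run)
        if run + 1 < total_runs:
            # Re-arm the timer on the same IOLoop: no threads involved.
            loop.add_timeout(INTERVAL, job, run + 1)
        else:
            finished.set()

    loop.add_callback(job, 0)
    await finished.wait()
    return runs


print(asyncio.run(main()))  # → [0, 1, 2]
```

Since every run executes on the IOLoop thread, the shared runs list needs no lock. For a fixed interval, tornado.ioloop.PeriodicCallback is a ready-made alternative.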

Multi-Threading and Asynchronous sockets in python

I'm quite new to python threading/network programming, but have an assignment involving both of the above.
One of the requirements of the assignment is that for each new request, I spawn a new thread, but I need to both send and receive at the same time to the browser.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread for each request, and I was wondering whether using both threads and asynchronous I/O is overkill, or the correct way to do it?
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a Proxy Server, and not sure if my client is persistent. My client is my browser (using firefox for simplicity)
It seems to reconnect for each request. My problem is that if I open a tab with http://www.google.com in it, and http://www.stackoverflow.com in it, I only get one request at a time from each tab, instead of multiple requests from google, and from SO.
I answered a question that sounds amazingly similar to yours, where someone had a homework assignment to create a client/server setup with each connection being handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
An extra bit about asyncore vs threading
From the asyncore docs:
There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
As this quote suggests, using asyncore and threading should be for the most part mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) does a blocking call to accept a new client. And when it gets one, it spawns a thread which will then continue to handle the communication, and the server goes back into a blocking call again.
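The thread-per-connection pattern can be sketched as a small echo server standing in for the proxy. It assumes Python 3.8+ for socket.create_server, and the names handle_client, serve and start_server are hypothetical; a real proxy would forward data to the remote host instead of echoing it.

```python
import socket
import threading


def handle_client(conn, addr):
    """Per-connection thread: echo data back until the client closes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            conn.sendall(data)


def serve(server_sock):
    """Server loop: block in accept(), hand each client to a new thread."""
    while True:
        conn, addr = server_sock.accept()
        threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()


def start_server(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; a real proxy would use a fixed one.
    server_sock = socket.create_server((host, port))
    threading.Thread(target=serve, args=(server_sock,), daemon=True).start()
    return server_sock.getsockname()[1]
```

The server loop runs in its own thread here so the caller can keep going. Each browser connection gets its own handler thread, so a slow tab does not block the others.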
With asyncore, you would instead run its event loop (asyncore.loop()), which in turn calls your own registered handlers for whatever activity occurs. There is no threading here, only polling of all the open sockets for activity. Everything feels concurrent, but under the hood it is all scheduled serially on one thread. (Note that asyncore was deprecated in Python 3.6 and removed in 3.12; asyncio is its modern replacement.)
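Because asyncore no longer ships with Python 3.12+, this sketch swaps in asyncio to show the same single-threaded, callback-driven model. It is an echo server rather than a proxy, and it assumes Python 3.8+. One thread serves two simulated clients concurrently.

```python
import asyncio


async def handle_client(reader, writer):
    """Called by the event loop for each connection; no threads involved."""
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, msg):
    """Simulated browser connection: send one message, read the echo."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(msg)
    await writer.drain()
    reply = await reader.read(4096)
    writer.close()
    await writer.wait_closed()
    return reply


async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # Both clients are served concurrently by the single event-loop thread.
    replies = await asyncio.gather(client(port, b"google"), client(port, b"stackoverflow"))
    server.close()
    await server.wait_closed()
    return replies


print(asyncio.run(main()))  # → [b'google', b'stackoverflow']
```

Mixing this model with a thread per request is usually unnecessary. Pick one: threads for blocking code, or an event loop for non-blocking code.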

How to evaluate whether a Python module is thread-safe or not?

Well, the initial thing on my mind was how to make sure whether pydispatcher or pubsub is thread-safe. pubsub might be a little tricky or complex to figure out, but pydispatcher seems simple to reason about. Then I started to wonder how to figure out whether a Python module is thread-safe in general. Any heuristics?
For determining if a library or application is thread safe, without author input, I would look for mechanisms for synchronizing threads: http://effbot.org/zone/thread-synchronization.htm
or for use of the threading module: http://docs.python.org/library/threading.html
However, none of that will tell you how to use the API in a thread safe manner. Practically anything can be stuffed inside a thread object and communicated to using thread synchronization objects.
For something like pubsub one could create a class that wraps the API and communicates over Queues exclusively. If pubsub lived in the same thread as wx for example, then an API could be created to inject messages into the Queue using a threading API for sending messages. Then a pubsub loop or timer could be monitoring the Queue. It would then send out messages. One of the issues with wrapping something like pubsub is that somewhere it will require polling. It could be made transparent if the polling were done by timers. Each thread would have to allocate a timer to receive messages if pubsub did not reside in that thread. There might be more elegant approaches to this, but I am not aware of them.
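The Queue-wrapping idea can be sketched with a small class that confines a non-thread-safe listener registry to one owner thread. This is a minimal sketch: QueuedPubSub and its methods are hypothetical names, and the registry is a toy stand-in for pubsub or pydispatcher, not their real API. pump() is the polling step the answer mentions, which a GUI timer on the owner thread would call.

```python
import queue
import threading
from collections import defaultdict


class QueuedPubSub:
    """Confines a non-thread-safe listener registry to one owner thread.

    Other threads only call post(), which touches nothing but a thread-safe
    queue.Queue. The owner thread calls pump() (e.g. from a GUI timer) to
    deliver whatever has been queued.
    """

    def __init__(self):
        self._inbox = queue.Queue()
        self._listeners = defaultdict(list)  # only touched by the owner thread

    def subscribe(self, topic, listener):
        self._listeners[topic].append(listener)  # call from owner thread only

    def post(self, topic, payload):
        self._inbox.put((topic, payload))  # safe from any thread

    def pump(self):
        """Deliver everything queued so far; returns the number delivered."""
        delivered = 0
        while True:
            try:
                topic, payload = self._inbox.get_nowait()
            except queue.Empty:
                return delivered
            for listener in self._listeners[topic]:
                listener(payload)
            delivered += 1


bus = QueuedPubSub()
received = []
bus.subscribe("temperature", received.append)
# Worker threads publish; they never touch the listener registry.
workers = [threading.Thread(target=bus.post, args=("temperature", n)) for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(bus.pump(), sorted(received))  # → 3 [0, 1, 2]
```

The cost of this design is latency: messages are only delivered when the owner thread next calls pump().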
From a theoretical point of view: there is no algorithm that does this for an arbitrary program; in general the question is undecidable, like the halting problem.
You can inspect the modules it uses and check whether they are guaranteed to be thread-safe. But there is no general way to check the bytecode of a module for thread safety.
