Well, my first thought was how to verify whether pydispatcher or pubsub is thread-safe. pubsub might be tricky or complex to figure out, but pydispatcher seems simple enough to inspect. That led me to a more general question: how do you determine whether a Python module is thread-safe at all? Any heuristics?
To determine whether a library or application is thread-safe without input from its author, I would look for mechanisms for synchronizing threads: http://effbot.org/zone/thread-synchronization.htm
or for use of threading primitives: http://docs.python.org/library/threading.html
However, none of that will tell you how to use the API in a thread-safe manner. Practically anything can be wrapped in a thread object and communicated with using thread-synchronization primitives.
For something like pubsub, you could create a class that wraps the API and communicates exclusively over Queues. If pubsub lived in the same thread as wx, for example, an API could be created to inject messages into the Queue from other threads, while a pubsub loop or timer monitors the Queue and sends out the messages. One issue with wrapping something like pubsub is that somewhere it will require polling; that can be made transparent if the polling is done by timers. Each thread in which pubsub does not reside would have to allocate a timer to receive messages. There might be more elegant approaches, but I am not aware of them.
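The wrapping idea above can be sketched with the standard library alone. Note that this does not use pubsub's real API: `ThreadSafePublisher` and its methods are illustrative names, and `poll` stands in for whatever a wx timer would call periodically on the owning thread.

```python
import queue
import threading

class ThreadSafePublisher:
    """Sketch: wrap a single-threaded pub/sub system behind a Queue."""

    def __init__(self):
        self._queue = queue.Queue()
        self._subscribers = {}          # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def send_message(self, topic, data):
        # Safe to call from any thread: only the Queue is touched.
        self._queue.put((topic, data))

    def poll(self):
        # Called periodically from the owning thread (e.g. by a wx timer);
        # drains the queue and dispatches on that thread only.
        while True:
            try:
                topic, data = self._queue.get_nowait()
            except queue.Empty:
                break
            for callback in self._subscribers.get(topic, []):
                callback(data)

pub = ThreadSafePublisher()
received = []
pub.subscribe("status", received.append)

# Any worker thread may publish without locks on the subscriber side.
t = threading.Thread(target=pub.send_message, args=("status", "done"))
t.start()
t.join()
pub.poll()                              # dispatch happens here
```

All actual callback execution happens inside `poll`, so subscribers never need to be thread-safe themselves.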
From a theoretical point of view: there is no algorithm that can decide this for an arbitrary program. It is like the halting problem.
You can inspect the modules it uses and check whether those are guaranteed to be thread-safe, but there is no general way to check a module's byte code for thread safety.
Related
I am implementing a MQTT worker in python with paho-mqtt.
Are the on_message() callbacks run in different threads, so that if one task is time-consuming, other messages can still be processed?
If not, how to achieve this behaviour?
The Python client doesn't actually start any threads; that's why you have to call the loop function to handle network events.
In Java you would use the onMessage callback to put the incoming message on to a local queue that a separate pool of threads will handle.
Python threads are limited by the GIL for CPU-bound work, but the multiprocessing module can spawn processes that behave like threads. Details of multiprocessing can be found here:
https://docs.python.org/2.7/library/multiprocessing.html
EDIT:
On looking at the paho Python code a little closer, it appears it can actually start a new thread (using the loop_start() function) to handle the network side of things that previously required calling the loop functions. This does not change the fact that all calls to the on_message callback will happen on this one thread. If you need to do large amounts of work in this callback, you should definitely look at spinning up a pool of new threads to do the work.
http://www.tutorialspoint.com/python/python_multithreading.htm
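The hand-off described above can be sketched like this. It assumes paho's standard `on_message(client, userdata, msg)` callback signature, but paho itself is not imported here; `process` is a placeholder for your slow, application-specific work.

```python
import queue
import threading

results = []

def process(msg):
    # Placeholder for the time-consuming, application-specific work.
    results.append(msg)

work_queue = queue.Queue()

def worker():
    while True:
        msg = work_queue.get()
        if msg is None:            # sentinel: shut this worker down
            break
        process(msg)
        work_queue.task_done()

def on_message(client, userdata, msg):
    # Runs on paho's network thread (started by loop_start()):
    # do nothing here but enqueue, so the network loop never blocks.
    work_queue.put(msg)

pool = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for t in pool:
    t.start()
```

You would pass `on_message` to the client as usual (`client.on_message = on_message`); slow handlers then run on the pool instead of the network thread.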
I am running python's apscheduler and periodically want to do some work POST-ing to some http resources which will involve using tornado's AsyncHttpClient as a scheduled job. Each job will do several POSTs. When each http request responds a callback is then called (I think that Tornado uses a future to accomplish this).
I am concerned with thread-safety here since Apscheduler runs jobs in various threads. I have not been able to find a well explained example of how tornado would best be used across multiple threads in this context.
How can I best use apscheduler with tornado in this manner?
Specific concerns:
Which tornado ioloop to use? The docs say that AsyncHTTPClient "works like magic". Well, magic scares me. Do I need to use AsyncHTTPClient from within the current thread or can I use the main one (it can be specified)?
Are there thread-safety issues with my callback with respect to which ioloop I use?
Not clear to me what happens when a thread completes but there is still a pending callback/future that needs to be called. Are there issues here?
Since apscheduler is run as threads in-process, and python has the GIL, then is it pretty much the same to have one IOLoop from the main thread - as opposed to multiple loops from different threads (with respect to performance)?
All of Tornado's utilities work around Tornado's IOLoop, and this includes AsyncHTTPClient as well. An IOLoop is not considered thread-safe. Therefore, it is not a great idea to run AsyncHTTPClient from any thread other than the thread running your main IOLoop. For more details on how to use the IOLoop, read this.
If you use tornado.ioloop.IOLoop.instance(), you will run into trouble if your intention is not to add callbacks to the main thread's IOLoop. You can use tornado.ioloop.IOLoop.current() to correctly refer to the right IOLoop instance for the current thread. And you would have to do far too much bookkeeping to add a callback to one non-main thread's IOLoop from another non-main thread's IOLoop; it just gets messy.
I don't quite get this, but the way I understand it there are two scenarios: the thread either has an IOLoop running or it does not. If it does not, then once the thread finishes its work, whatever callback has to be executed by the IOLoop in some other thread (perhaps the main thread) will still be executed. If the thread does have an IOLoop running, then it won't complete until you stop that IOLoop, so execution of the callback really depends on when you stop the IOLoop.
Honestly, I don't see much point in using threads with Tornado. There won't be any performance gain unless you are running on PyPy, and I am not sure Tornado plays well with it (not everything is known to work on it, and honestly I don't know about Tornado's status there either). You might as well run multiple processes of your Tornado app, if it is a web server, and use Nginx as a proxy and load balancer. Since you have brought in apscheduler, I would suggest using IOLoop's add_timeout, which does pretty much the same thing you need and is native to Tornado, so it plays much nicer with it. Callbacks are hard enough to debug; combine them with Python's threading and you can have a massive mess. If you are ready to consider another option, move all the async processing out of this process entirely; it will make life much easier. Think of something like Celery for this.
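For completeness: the one operation Tornado documents as safe to call from another thread is IOLoop.add_callback, which hands a callable to the loop's own thread. Conceptually it is a single loop thread draining a queue of callables; here is a stdlib-only sketch of that idea (no Tornado imported, all names illustrative), showing how scheduler threads could hand HTTP work to the loop thread instead of touching the client directly.

```python
import queue
import threading

class MiniLoop:
    """Toy stand-in for an IOLoop: one thread runs all callbacks."""

    def __init__(self):
        self._callbacks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def add_callback(self, fn, *args):
        # Thread-safe: any thread may enqueue work for the loop thread.
        self._callbacks.put((fn, args))

    def stop(self):
        self._callbacks.put(None)      # sentinel stops the loop
        self._thread.join()

    def _run(self):
        while True:
            item = self._callbacks.get()
            if item is None:
                break
            fn, args = item
            fn(*args)                  # always runs on the loop thread

loop = MiniLoop()
loop.start()

seen = []
# Simulated scheduler threads: each submits work to the single loop.
subs = [threading.Thread(target=loop.add_callback, args=(seen.append, i))
        for i in range(3)]
for t in subs:
    t.start()
for t in subs:
    t.join()
loop.stop()
```

Because every callback runs on the one loop thread, nothing the callbacks touch (like an HTTP client) needs to be thread-safe.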
I am creating an application using Python.
I first designed an API, that is working fine.
I am now designing my GUI. The GUI starts a Thread that is used to perform tasks against the API.
Up to now, I used the Observer pattern to handle communication through the different layers.
Basically, communication can be of two types (mainly):
- The GUI asking the Thread (and the API subsequently) to START/STOP
- The API giving information back to the Thread, that propagates to the GUI.
Here is a simple schema of the current architecture I am talking about.
One arrow means "notify", basically.
My concern is that when the application Thread communicates, both the Gui and the API receive the message because they subscribed. Thing is, each message is only meant to be read by one of the two.
What I did to solve that was to send each message together with an ID. Each of the three elements has an ID, so each knows whether a message is meant for it or not.
But I am not sure this is the "correct" (read: nicest) way to do it. What if I have more parties in the future?
I started thinking about some kind of manager handling communication, but it would then have to sit at the top of the architecture, and I am not sure how to organize it further.
I am not asking for a complete solution, but mainly for ideas or best practices from more experienced people ;)
I can keep handling multiple Observer patterns in this simple case.
But I was thinking about porting my code to a server. In that case I am likely to have far more than one application thread, and handling API calls this way will become practically impossible.
Link to the code I am talking about :
GUI, ApplicationThread and Application API.
You will want to look at the notify and update methods.
Thanks for any piece of advice!
One of the nicest implementations of the observer pattern I've encountered is the signal/slot system in Qt. Objects have signals, and slots (which are actually methods) can be connected to signals. The connected slots are called when the signals are emitted.
It seems to me that some of your problems stem from having a single communication channel in each of your objects. This forces you to have a dispatch mechanism in every update method, which makes the code quite complex.
Taking inspiration from Qt, you could have a different signal for each kind of message and recipient. The code for a signal could look like:
class Signal:
    def __init__(self):
        self.subs = []

    def subscribe(self, s):
        self.subs.append(s)

    def signal(self, *args, **kwargs):
        for s in self.subs:
            s(*args, **kwargs)
For example, the GUI would have a signal stop_signal, and the thread a method to handle it:
def handle_gui_stop(self):
    self.console_logger.debug("Facemovie is going to stop")
    self.my_logger.debug("Facemovie is going to stop")
    self.stop_process = True
    # ...
Somewhere in the initialization code, we would tie everything together:
gui.stop_signal.subscribe(thread.handle_gui_stop)
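End to end, the wiring might look like this. The Signal class is repeated so the sketch runs standalone, and Gui and WorkerThread are toy stand-ins for the real GUI and application thread (their names and methods are illustrative, not from the linked code).

```python
class Signal:
    # Minimal signal, as defined above.
    def __init__(self):
        self.subs = []

    def subscribe(self, s):
        self.subs.append(s)

    def signal(self, *args, **kwargs):
        for s in self.subs:
            s(*args, **kwargs)

class Gui:
    def __init__(self):
        self.stop_signal = Signal()    # one signal per kind of message

    def on_stop_button(self):
        self.stop_signal.signal()      # no IDs, no dispatch logic

class WorkerThread:
    def __init__(self):
        self.stop_process = False

    def handle_gui_stop(self):
        self.stop_process = True

gui = Gui()
thread = WorkerThread()
gui.stop_signal.subscribe(thread.handle_gui_stop)
gui.on_stop_button()                   # the worker is now asked to stop
```

Because each signal carries exactly one kind of message to exactly its subscribers, the ID-checking in every update method disappears.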
I recently created a GUI app with a similar architecture (GUI thread + a separate worker thread), and I ended up creating an explicit protocol between the threads, in the form of two queues (from Python's Queue module). One queue is for requests made by the GUI and is consumed by the worker thread(s); the other is for answers produced by the worker threads and consumed by the GUI.
I find it much clearer when communication between threads is explicit: you have full control over when and where updating is done (GUI methods can then be called only from the GUI thread).
A natural extension of this model in a server environment is a message queue protocol like AMQP.
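The two-queue protocol described above can be sketched like this; the command names and the doubling "API call" are made up for illustration.

```python
import queue
import threading

# Two explicit queues form the protocol between GUI and worker:
# requests flow one way, answers flow back the other.
requests = queue.Queue()
answers = queue.Queue()

def worker():
    while True:
        cmd, payload = requests.get()
        if cmd == "quit":
            break
        if cmd == "compute":
            # Stand-in for a real API call done off the GUI thread.
            answers.put(("result", payload * 2))

threading.Thread(target=worker, daemon=True).start()

# GUI side: enqueue a request, then drain `answers` from the GUI
# thread (e.g. in a timer callback) so widgets are only ever touched
# from the GUI thread.
requests.put(("compute", 21))
```

Shutting down is just another message: `requests.put(("quit", None))`.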
The application thread must be more explicit about communication, since it is the mediator between the GUI and the application API. This can be achieved by separating the working sets (queues) coming from the GUI and from the application API. The application thread must also be able to handle a delivery pattern with command senders and recipients, which includes managing communication between the different queues (e.g. the GUI queue has a pending command awaiting a command in the application API queue; once that one completes, the application thread passes results back between the queues). Each queue is then an observer itself.
In terms of extending the application, it seems to me that you want to add more GUIs in the future, and the request/response (or sender/receiver) pattern described above will suffice for that.
If you plan to add more layers vertically instead of horizontally, then you should not use the same application thread to communicate between the new upper layers. Physically it can be the same thread, but logically it must be different, which again comes down to separate queues. By adding queues dynamically, you open up the possibility of adding a new layer: a new layer then simply corresponds to a new queue.
Especially with GUIs, I recommend another pattern: MVC. It includes the Observer pattern and is more robust than the Observer alone.
It addresses your concern because it separates concerns: each layer has a very specific role, and you can change any of them as long as you don't change the interfaces between them.
I'm quite new to python threading/network programming, but have an assignment involving both of the above.
One of the requirements of the assignment is that I spawn a new thread for each new request, but I need to both send to and receive from the browser at the same time.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread per request, and I was wondering whether using both threads and asynchronous I/O is overkill, or the correct way to do it.
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a Proxy Server, and not sure if my client is persistent. My client is my browser (using firefox for simplicity)
It seems to reconnect for each request. My problem is that if I open one tab with http://www.google.com and another with http://www.stackoverflow.com, I only get one request at a time from each tab, instead of multiple requests from Google and from SO.
I answered a question that sounds amazingly similar to yours, where someone had a homework assignment to create a client/server setup with each connection handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
An extra bit about asyncore vs threading
From the asyncore docs:
There are only two ways to have a program on a single processor do
“more than one thing at a time.” Multi-threaded programming is the
simplest and most popular way to do it, but there is another very
different technique, that lets you have nearly all the advantages of
multi-threading, without actually using multiple threads. It’s really
only practical if your program is largely I/O bound. If your program
is processor bound, then pre-emptive scheduled threads are probably
what you really need. Network servers are rarely processor bound,
however.
As this quote suggests, asyncore and threading should for the most part be treated as mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) makes a blocking call to accept a new client. When it gets one, it spawns a thread that continues to handle the communication, and the server goes back into the blocking call again.
In the pattern of using asyncore, you would instead use its async loop which will in turn call your own registered callbacks for various activity that occurs. There is no threading here, but rather a polling of all the open file handles for activity. You get the sense of doing things all concurrently, but under the hood it is scheduling everything serially.
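The threading approach from the linked answer can be sketched as a minimal thread-per-connection echo server (the one-message echo "protocol" here is just for illustration; a real proxy would relay data in both directions):

```python
import socket
import threading

def handle_client(conn):
    # Each connection gets its own thread, so a blocking recv() here
    # does not stall the accept loop or other clients.
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)             # toy protocol: echo one message

def serve(server_sock):
    while True:
        try:
            conn, _addr = server_sock.accept()   # blocking accept
        except OSError:                # socket closed: stop serving
            break
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a port
server.listen(5)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()
```

With this structure, two browser tabs each get their own handler thread, so a slow response to one tab never blocks the other.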
I'm using Suds to access a SOAP web service from python. If I have multiple threading.Thread threads of execution, can each of them safely access the same suds.client.Client instance concurrently, or must I create separate Client objects for each thread?
As far as I know, they are NOT thread-safe. You can safely share the same client object as long as you use a queue or thread pool, so that when one thread is done with the client, the next one can use it.
For network-based events, however, you should probably ask yourself which is better: threading or asynchronous network programming? There was recently a patch proposed to SUDS to enable support for async sockets for use with event-based packages such as Twisted, greenlets, etc.
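The queue-based sharing suggested above can be sketched with the standard library. `FakeClient` is a stand-in for suds.client.Client (which is not imported here), and its `call` method is invented for the example:

```python
import queue
import threading

class FakeClient:
    # Stand-in for suds.client.Client, assumed NOT thread-safe.
    def call(self, x):
        return x + 1

# A small pool of clients behind a Queue: a thread checks one out,
# uses it exclusively, and returns it, so no client instance is ever
# used by two threads at once.
pool = queue.Queue()
for _ in range(2):
    pool.put(FakeClient())

results = []
results_lock = threading.Lock()

def do_request(x):
    client = pool.get()                # blocks until a client is free
    try:
        r = client.call(x)
    finally:
        pool.put(client)               # hand it to the next thread
    with results_lock:
        results.append(r)

threads = [threading.Thread(target=do_request, args=(i,))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a pool size of one, this degenerates to simply serializing all access to a single shared client.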