I want to implement a master slave multithreaded architecture where a master thread keeps refreshing a list of tasks that needs to be done by the slave threads. Sometimes, the list can contain new tasks, or other taks be removed, which means it has to update the slave thread list.
Is there a way to control the thread save list from the master thread? For example, if the task list after refresh doesn't contain the thread "5", it must be stopped and be removed from redoing it, while the others already in the threadpool are ignored, or it would create, start and add to the pool a new thread that doesn't have the said task in the pool.
The slave tasks are a while loop listeners that would eventually write in a DB.
Which is the safest way to stop a thread without crashing the whole pool?
Thank you!
PS: I'm sorry about the slave/master terminology, but I found the children/parent just as bad!
Related
I have to set up a worker which handles some data after a certain event happens. I know I can start the worker with python manage.py runworker my_worker, but what I would need is to start the worker in the same process as the main Django app on a separate thread.
Why do I need it in a separate thread and not in a separate process? Because the worker would perform a pretty light-weight job which would not overload the server's resources, and, moreover, the effort of making the set up for a new process in the production is not worth the gain in performance. In other words, I would prefer to keep it in the Django's process if possible.
Why not perform the job synchronously? Because it is a separate logic that needs the possibility to be extended, and it is out of the main HTTP request-reply scope. It is a post-processing task which doesn't interfere with the main logic. I need to decouple this task from an infrastructural point-of-view, not only logical (e.g. with plain signals).
Is there any possibility provided by Django Channels to run a worker in such a way?
Would there be any downsides to start the worker manually on a separate thread?
Right now I have the setup for a message broker consumer thread (without using Channels), so I have the entry point for starting a new worker thread. But as I've seen from the Channel's runworker command, it loads the whole application, so it doesn't seem like a naïve worker.run() call is the proper way to do it (I might be wrong with this one).
I found an answer to my question.
The answer is no, you can't just start a worker within the same process. This is because the consumer needs to run inside an event loop thread and it is not good at all to have more than one event loop thread in the same process (Django WSGI application already runs the main thread with an event loop).
The best you can do is to start the worker in a separate process. As I mentioned in my question, I started a message broker consumer on a separate thread, which was not a good approach either, so I changed my configuration to start the consumers as separate processes.
I am in a situation where I have two endpoints I can ask for a value, and one may be faster than the other. The calls to the endpoints are blocking. I want to wait for one to complete and take that result without waiting for the other to complete.
My solution was to issue the requests in separate threads and have those threads set a flag to true when they complete. In the main thread, I continuously check the flags (I know it is a busy wait, but that is not my primary concern right now) and when one completes it takes that value and returns it as the result.
The issue I have is that I never clean up the other thread. I can't find any way to do it without using .join(), which would just block and defeat the purpose of this whole thing. So, how can I clean up that other, slower thread that is blocking without joining it from the main thread?
What you want is to make your threads daemons, so when you get the result and finish your main, the other running thread will be forced to finish. You do that by changing the daemon keyword to True:
tr = threading.Thread(daemon=True)
From the threading docs:
The significance of this flag is that the entire Python program exits
when only daemon threads are left.
Although:
Daemon threads are abruptly stopped at shutdown. Their resources (such
as open files, database transactions, etc.) may not be released
properly. If you want your threads to stop gracefully, make them
non-daemonic and use a suitable signalling mechanism such as an Event.
I don't have any particular experience with Events so can't elaborate on that. Feel free to click the link and read on.
One bad and dirty solution is to implement a methode for the threads which close the socket which is blocking. Now you have to catch the exception in the main thread.
I'm pretty new to multiprocessing in Python and I've done a lot of digging around, but can't seem to find exactly what I'm looking for. I have a bit of a consumer/producer problem where I have a simple server with an endpoint that consumes from a queue and a function that produces onto the queue. The queue can be full, so the producer doesn't always need to be running.
While the queue isn't full, I want the producer task to run but I don't want it to block the server from receiving or servicing requests. I tried using multithreading but this producing process is very slow and the GIL slows it down too much. I want the server to be running all the time, and whenever the queue is no longer full (something has been consumed), I want to kick off this producer task as a separate process and I want it to run until the queue is full again. What is the best way to share the queue so that the producer process can access the queue used by the main process?
What is the best way to share the queue so that the producer process can access the queue used by the main process?
If this is the important part of your question (which seems like it's actually several questions), then multiprocessing.Queue seems to be exactly what you need. I've used this in several projects to have multiple processes feed a queue for consumption by a separate process, so if that's what you're looking for, this should work.
I have read lots about python threading and the various means to 'talk' across thread boundaries. My case seems a little different, so I would like to get advice on the best option:
Instead of having many identical worker threads waiting for items in a shared queue, I have a handful of mostly autonomous, non-daemonic threads with unique identifiers going about their business. These threads do not block and normally do not care about each other. They sleep most of the time and wake up periodically. Occasionally, based on certain conditions, one thread needs to 'tell' another thread to do something specific - an action -, meaningful to the receiving thread. There are many different combinations of actions and recipients, so using Events for every combination seems unwieldly. The queue object seems to be the recommended way to achieve this. However, if I have a shared queue and post an item on the queue having just one recipient thread, then every other thread needs monitor the queue, pull every item, check if it is addressed to it, and put it back in the queue if it was addressed to another thread. That seems a lot of getting and putting items from the queue for nothing. Alternatively, I could employ a 'router' thread: one shared-by-all queue plus one queue for every 'normal' thread, shared with the router thread. Normal threads only ever put items in the shared queue, the router pulls every item, inspects it and puts it on the addressee's queue. Still, a lot of putting and getting items from queues....
Are there any other ways to achieve what I need to do ? It seems a pub-sub class is the right approach, but there is no such thread-safe module in standard python, at least to my knowledge.
Many thanks for your suggestions.
Instead of having many identical worker threads waiting for items in a shared queue
I think this is the right approach to do this. Just remove identical and shared from the above statement. i.e.
having many worker threads waiting for items in queues
So I would suggest using Celery for this approach.
Occasionally, based on certain conditions, one thread needs to 'tell'
another thread to do something specific - an action, meaningful to the receiving thread.
This can be done by calling another celery task from within the calling task. All the tasks can have separate queues.
Thanks for the response. After some thoughts, I have decided to use the approach of many queues and a router-thread (hub-and-spoke). Every 'normal' thread has its private queue to the router, enabling separate send and receive queues or 'channels'. The router's queue is shared by all threads (as a property) and used by 'normal' threads as a send-only-channel, ie they only post items to this queue, and only the router listens to it, ie pulls items. Additionally, each 'normal' thread uses its own queue as a 'receive-only-channel' on which it listens and which is shared only with the router. Threads register themselves with the router on the router queue/channel, the router maintains a list of registered threads including their queues, so it can send an item to a specific thread after its registration.
This means that peer to peer communication is not possible, all communication is sent via the router.
There are several reasons I did it this way:
1. There is no logic in the thread for checking if an item is addressed to 'me', making the code simpler and no constant pulling, checking and re-putting of items on one shared queue. Threads only listen on their queue, when a message arrives the thread can be sure that the message is addressed to it, including the router itself.
2. The router can act as a message bus, do vocabulary translation and has the possibility to address messages to external programs or hosts.
3. Threads don't need to know anything about other threads capabilities, ie they just speak the language of the router. In a peer-to-peer world, all peers must be able to understand each other, and since my threads are of many different classes, I would have to teach each class all other classes' vocabulary.
Hope this helps someone some day when faced with a similar challenge.
So I have the following program:
https://github.com/eWizardII/homobabel/blob/master/Experimental/demo_async_falcon.py
However, when it's run I only get two active threads are running, how can I make it so that there are more threads running. I have tried doing stuff like urlv2 = birdofprey(ip2) where ip2 = str(host+1) however that just ends up sending the same thing to two threads. Any help would be appreciated.
Thanks,
Active count=2 means that you have one of your designed thread (birdofprey), and the main thread. This is because you use lock, so the second birdofprey thread waits for the first and so on. I didn't get deeper into the algorithm, but it seems that you don't need to lock birdofprey threads, since they don't share any data (I could get wrong). If they share, you should make exclusive access to the shared data, and not to lock the whole body of run.
Update upon comment
remove locks (if there is no shared data. storage_i is not a shared data.);
in the for loop` create threads, start them, append to a list;
make the second loop over the list of threads, call join collect the information you need.
line 75, urlv.join() blocks until the thread finishes. So you actually create one thread, wait until it's done and then start the next. The other thread is the main thread.
I think the problem is that you need to pull urlv.join() out of the for loop. Right now, because of the join, you're waiting for your new thread to complete before starting the next one.
But for general readability, maintainability, etc., you might want to consider using Python's Queue class to set up a work queue and have a pool of worker threads pulling from it.