Python - Best practices to queue / dequeue threads - python

I have a queue which can contain max 4 queued objects. These objects are threads running web service requests. The thread part is OK.
I have followed many tutorials which talk about consumer and producer threads used to fill in and out the queue object.
My question is about the consumer part. In all these tutorials and regarding the python doc, the only way I have found to pile out objects from the queue is :
while len(requltArray) < amountOfThreads:
thread = q.get(True)
thread.join()
Imagine the q.get(True) piles out a thread with an invalid web service request. And imagine this thread have to wait for urllib timeout to end. My consumer will be blocked for some seconds. As my queue is limited to 4 threads and maybe the 3 others have ended yet, I waste time until consumer can continue the pile-outs (and producer can fill the queue).
Is there any way or well-known design pattern to avoid this waste of time ?
Thanks for your help

Maybe you could use conditions
Imagine you want to put a new thread into your queue but it is full. Then you could wait() for the condition object (pseudo code):
condition.acquire()
while not queue.has_free_place():
condition.wait()
add_new_thread_to_queue()
condition.release()
And inside your queued threads you could place something like the following code at the end of execution:
condition.acquire()
remove_myself_from_queue()
condition.notify()
condition.release()

Related

Is it reasonable to terminate multi-threaded processing via a queue by means of a special object in the queue?

I am quite experienced in single-threaded Python as well as embarrasingly parallel multi-processing, but this is the first time I attempt processing something with a producer- and a consumer-thread via a shared queue.
The producer thread is going to download data items from URLs and put them in a queue. Simultaneously, a consumer thread is going to process the data items as they arrive on the queue.
Eventually, there will be no more data items to download and the program should terminate. I wish for the consumer thread to be able to distinguish whether it should keep waiting at an empty queue, because more items may be coming in, or it should terminate, because the producer thread is done.
I am considering signaling the latter situation by placing a special object on the queue in the producer thread when there are no more data items to download. When the consumer thread sees this object, it then stops waiting at the queue and terminates.
Is this a sensible approach?

threading and multithreading in python with an example

I am a beginner in python and unable to get an idea about threading.By using simple example could someone please explain threading and multithreading in python?
-Thanks
Here is Alex Martelli's answer about multithreading, as linked above.
He uses a simple program that tries some URLs then returns the contents of first one to respond.
import Queue
import threading
import urllib2
# called by each thread
def get_url(q, url):
q.put(urllib2.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com"]
q = Queue.Queue()
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
print s
This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won't keep the process up if main thread ends -- that's more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they're daemon threads).
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.

prevent thread starvation Python

I have some function which does some file writing. The semaphore is for limiting a number of threads to 2. The total number of threads are 3. How can I prevent from the 3 threads a starvation? Is the queue is an option for that?
import time
import threading
sema = threading.Semaphore(2)
def write_file(file,data):
sema.acquire()
try:
f=open(file,"a")
f.write(data)
f.close()
finally:
sema.release()
I have to object to the accepted question. It is true that Condition queues the waits, but the more important part is when it tries to acquire the Condition lock.
The order in which threads are released is not deterministic
The implementation may pick one at random, so the order in which blocked threads are awakened should not be relied on.
In the case of three threads, there I agree, it's very unlikely that two are trying to acquire the lock at the same time (one working, one in wait, one acquiring the lock), but there still might be interferences.
A good solution for your problem IMO would be a thread that's single purpose is to read your data from a queue and write it to a file. All other threads can write to the queue and continue working.
If a thread is waiting to acquire the semaphore, either of the other two threads will be done writing and release the semaphore.
If you are worried that if there is a lot of writing going on, the writers might reacquire the semaphore before the waiting thread is notified. This can not happen, I think.
The Semaphore object in Python (2.7) uses a Condition. The Condition adds waiting threads (actually a lock, which the waiting thread is blocking on) to the end of an waiters list and when notifying threads, the notified threads are taken from the beginning of the list. So the list acts like a FIFO-queue.
It looks something like this:
def wait(self, timeout=None):
self.__waiters.append(waiter)
...
def notify(self, n=1):
...
waiters = self.__waiters[:n]
for waiter in waiters:
waiter.release()
...
My understanding, after reading the source code, is that Python's Semaphores are FIFO. I couldn't find any other information about this, so please correct me if I'm wrong.

How to wakeup thread from thread pool in python?

I new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from ThreadPool is started, it needs to wait for some message from RabbitMQ.
How can I implement this logic in Python to make this thread from the pool wait for event messages? Basically I need to wake up a waiting thread once I receive message from RabbitMQ (i.e wait and notify implementation on ThreadPool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
q.put(msg)
rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode because I've not used RabbitMQ nor whatever Python bindings for RabbitMQ, but you should be able to easily figure out how to adapt the snippet to your real application code; the key part to pay attention to is q.put(msg)—just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
msg = q.get()
process_message(msg)
P.S. I would suggest looking into Gevent and how to combine it with RabbitMQ in your Python application so as to be able to get rid of threads and use more lightweight and scalable green threading mechanism instead without ever having to manage a threadpool (because you can just have tens of thousands of greenlets spawned and killed on the fly):
# this thing always called in a green thread; forget about pools and queues.
def on_message(msg):
# you're in a green thread now; just process away!
benefit_from("all the gevent goodness!")
spawn_and_join_10_sub_greenlets()
rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))

Synchronising multiple threads in python

I have a problem where I need x threads to wait until they have all reached a synchronization point. My solution uses the synchronise method below which is called by each threaded function when they need to synchronise.
Is there a better way to do this?
thread_count = 0
semaphore = threading.Semaphore()
event = threading.Event()
def synchronise(count):
""" All calls to this method will block until the last (count) call is made """
with semaphore:
thread_count += 1
if thread_count == count:
event.set()
event.wait()
def threaded_function():
# Do something
# Block until 4 threads have reached this point
synchronise(4)
# Continue doing something else
Note that Barrier has been implemented as of Python 3.2
Example of using barriers:
from threading import Barrier, Thread
def get_votes(site):
ballots = conduct_election(site)
all_polls_closed.wait() # do not count until all polls are closed
totals = summarize(ballots)
publish(site, totals)
all_polls_closed = Barrier(len(sites))
for site in sites:
Thread(target=get_votes, args=(site,)).start()
There are many ways to synchronize threads. Many.
In addition to synchronize, you can do things like the following.
Break your tasks into two steps around the synchronization point. Start threads doing the pre-sync step. Then use "join" to wait until all threads finish step 1. Start new threads doing the post-sync step. I prefer this, over synchronize.
Create a queue; acquire a synchronization lock. Start all threads. Each thread puts an entry in the queue and waits on the synchronization lock. The "main" thread sits in a loop dequeueing items from the queue. When all threads have put an item in the queue, the "main" thread releases the synchronization lock. All other threads are now free to run again.
There are a number of interprocess communication (IPC) techniques -- all of which can be used for thread synchronization.
The functionality you want is called a "barrier". (Unfortunately that term has 2 meanings when talking about threading. So if you Google it, just ignore articles that talk about "memory barriers" - that's a very different thing).
Your code looks quite reasonable - it's simple and safe.
I couldn't find any "standard" implementations of barriers for Python, so I suggest you keep using your code.

Categories

Resources