Queue with Thread hangs when worker crashes - python

I am using Python Queue with Thread. I noticed when a worker crashes, the script hangs and doesn't let me terminate. The following is an example:
from Queue import Queue
from threading import Thread

num_worker_threads = 2

def worker():
    while True:
        item = q.get()
        1/item
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

q.put(0)
q.join()

I fixed this by wrapping the job in a try/except. I would have thought that when a worker crashes, the script would exit. This is not the case: it looks like q.task_done() never gets called, so it hangs on q.join().
Solution:
from Queue import Queue
from threading import Thread

num_worker_threads = 2

def worker():
    while True:
        item = q.get()
        try:
            1/item
        except Exception as e:
            print e
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

q.put(0)
q.join()

An even cleaner approach is to have task_done() in a finally block instead, so it runs even when the job raises:
from Queue import Queue
from threading import Thread

num_worker_threads = 2

def worker():
    while True:
        item = q.get()
        try:
            1/item
        except Exception as e:
            print e
        finally:
            q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

q.put(0)
q.join()
Added suggestion by rsy.

You're using while True, which keeps trying to get elements from the Queue with q.get() in your thread even when there may not be any items left in it. With a non-blocking get this throws an Empty exception, since there's nothing in the queue to actually get.
Your loop should be while not q.empty():, or you should be catching the Queue.Empty exception.
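For completeness, here is a minimal sketch (my own, not from the question or answer) of that non-blocking variant: get_nowait() raises Queue.Empty once the queue is drained, which lets each worker fall out of its loop instead of blocking forever.
from Queue import Queue, Empty
from threading import Thread

num_worker_threads = 2

def worker():
    while True:
        try:
            item = q.get_nowait()  # non-blocking get; raises Empty when the queue is drained
        except Empty:
            break                  # nothing left to do, let the thread finish
        try:
            1/item
        except Exception as e:
            print e
        finally:
            q.task_done()

q = Queue()
q.put(0)  # enqueue the work before starting the workers

threads = [Thread(target=worker) for _ in range(num_worker_threads)]
for t in threads:
    t.start()

q.join()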

Related

Quit signal when waiting for blocking read from queue.Queue

In many cases I have a worker thread which pops data from a Queue and acts on it. At some point, signalled by an event, I want my worker thread to stop. The simple solution is to add a timeout to the get call and check the Event/flag every time the get times out. This however has two problems:
It causes an unnecessary context switch
It delays the shutdown until a timeout occurs
Is there any better way to listen both for a stop event and for new data in the Queue? Is it possible to listen to two Queues at the same time and block until there's data in one of them? (In that case one can use a second Queue just to trigger the shutdown.)
The solution I'm currently using:
from queue import Queue, Empty
from threading import Event, Thread
from time import sleep

def worker(exit_event, queue):
    print("Worker started.")
    while not exit_event.is_set():
        try:
            data = queue.get(timeout=10)
            print("got {}".format(data))
        except Empty:
            pass
    print("Worker quit.")

if __name__ == "__main__":
    exit_event = Event()
    queue = Queue()

    th = Thread(target=worker, args=(exit_event, queue))
    th.start()

    queue.put("Testing")
    queue.put("Hello!")

    sleep(2)

    print("Asking worker to quit")
    exit_event.set()
    th.join()
    print("All done..")
I guess you may easily reduce the timeout to 0.1...0.01 sec. A slightly different solution is to use the queue to send both data and control commands to the thread:
import queue
import threading
import time

THREADSTOP = 0

class ThreadControl:
    def __init__(self, command):
        self.command = command

def worker(q):
    print("Worker started.")
    while True:
        data = q.get()
        if isinstance(data, ThreadControl):
            if data.command == THREADSTOP:
                break
        print("got {}".format(data))
    print("Worker quit.")

if __name__ == '__main__':
    q = queue.Queue()

    th = threading.Thread(target=worker, args=(q,))
    th.start()

    q.put("Testing")
    q.put("Hello!")

    time.sleep(2)

    print("Asking worker to quit")
    q.put(ThreadControl(command=THREADSTOP))  # sending command
    th.join()
    print("All done..")
Another option is to use sockets instead of queues.
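The socket idea is only hinted at above, so here is a minimal sketch under my own assumptions (not from the original answer): a pair of socketpair() connections lets the worker block in a single selectors select() call on both "new data" and "please stop", with no polling timeout.
import selectors
import socket
import threading
import time

def worker(data_sock, stop_sock):
    sel = selectors.DefaultSelector()
    sel.register(data_sock, selectors.EVENT_READ, "data")
    sel.register(stop_sock, selectors.EVENT_READ, "stop")
    while True:
        for key, _ in sel.select():  # blocks until either socket is readable
            if key.data == "stop":
                print("Worker quit.")
                return
            # note: a stream socket may coalesce several sends into one recv
            print("got {}".format(data_sock.recv(4096).decode()))

if __name__ == "__main__":
    data_r, data_w = socket.socketpair()
    stop_r, stop_w = socket.socketpair()

    th = threading.Thread(target=worker, args=(data_r, stop_r))
    th.start()

    data_w.sendall(b"Testing")
    data_w.sendall(b"Hello!")

    time.sleep(2)

    print("Asking worker to quit")
    stop_w.sendall(b"x")
    th.join()
    print("All done..")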

Should you join() a completed daemon Thread?

I have some worker threads consuming data from a pre-populated input queue, and putting results into another queue.
import queue
import threading

worker_count = 8

input_queue = queue.Queue()
output_queue = queue.Queue()

threads = []
for _ in range(worker_count):
    thread = threading.Thread(target=perform_work, args=(input_queue, output_queue))
    thread.daemon = True
    thread.start()
    threads.append(thread)
I am processing the results in the main thread, and I want to make sure I process all of the results.
while True:
    try:
        result = output_queue.get(True, 0.1)
    except queue.Empty:
        pass
    else:
        process_result(result)

    if not any(t.is_alive() for t in threads) and output_queue.empty():
        # All results have been processed, stop.
        break
Is it safe to just use .is_alive() in this case? Or is there a particular reason to use .join() instead?
NOTE: I'm making my threads daemon = True because it makes it easier to debug and terminate the program.
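No answer is recorded here, but one common way to sidestep the is_alive() question entirely (a sketch under my own assumptions, not from the original thread; the perform_work body below is a hypothetical stand-in) is to have each worker put a sentinel on the output queue when it runs out of input, so the main thread knows exactly when all results are in:
import queue
import threading

SENTINEL = object()
worker_count = 8

def perform_work(input_queue, output_queue):
    # hypothetical worker: drain the input queue, then signal completion
    while True:
        try:
            item = input_queue.get_nowait()
        except queue.Empty:
            break
        output_queue.put(item * 2)  # placeholder for the real work
    output_queue.put(SENTINEL)      # one sentinel per worker

input_queue = queue.Queue()
output_queue = queue.Queue()
for i in range(100):
    input_queue.put(i)

threads = []
for _ in range(worker_count):
    thread = threading.Thread(target=perform_work, args=(input_queue, output_queue))
    thread.daemon = True
    thread.start()
    threads.append(thread)

results = []
finished = 0
while finished < worker_count:
    result = output_queue.get()  # blocks; no timeout or is_alive() polling needed
    if result is SENTINEL:
        finished += 1
    else:
        results.append(result)

for t in threads:
    t.join()  # the workers have already signalled completion, so this returns promptly
print(len(results), "results processed")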

python avoid busy wait in event processing thread

How can I avoid a busy wait in the event consumer thread, using asyncio?
I have a main thread which generates events that are processed by another thread. My event thread busy-waits, as it keeps checking whether the event queue has an item in it...
from Queue import Queue
from threading import Thread
import threading
import time

def do_work(p):
    print("print p - %s %s" % (p, threading.current_thread()))

def worker():
    print("starting %s" % threading.current_thread())
    while True:  # <------------ busy wait
        item = q.get()
        do_work(item)
        time.sleep(1)
        q.task_done()

q = Queue()
t = Thread(target=worker)
t.daemon = True
t.start()

for item in range(20):
    q.put(item)

q.join()  # block until all tasks are done
How can I achieve something similar to the above code using asyncio?
asyncio makes sense only if you are working with I/O, for example running an HTTP server or client. In the following example asyncio.sleep() simulates I/O calls. If you have a bunch of I/O tasks it can be as simple as this:
import asyncio
import random

async def do_work(i):
    print("[#{}] work part 1".format(i))
    await asyncio.sleep(random.uniform(0.5, 2))
    print("[#{}] work part 2".format(i))
    await asyncio.sleep(random.uniform(0.1, 1))
    print("[#{}] work part 3".format(i))
    return "#{}".format(i)

loop = asyncio.get_event_loop()
tasks = [do_work(item + 1) for item in range(20)]

print("Start...")
results = loop.run_until_complete(asyncio.gather(*tasks))
print("...Done!")
print(results)

loop.close()
see also ensure_future and asyncio.Queue.
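Since asyncio.Queue is only mentioned in passing, here is a minimal sketch of a queue-based rewrite of the original example (my own, in the same pre-3.7 event-loop style as the answer above): await q.get() suspends the coroutine instead of busy waiting, and q.join() blocks until every item has been marked done.
import asyncio

async def worker(name, q):
    while True:
        item = await q.get()            # suspends the coroutine instead of busy waiting
        print("{} got {}".format(name, item))
        await asyncio.sleep(1)          # simulated I/O-bound work
        q.task_done()

async def main():
    q = asyncio.Queue()
    workers = [asyncio.ensure_future(worker("w{}".format(i), q)) for i in range(2)]
    for item in range(20):
        q.put_nowait(item)
    await q.join()                      # block until all tasks are done
    for w in workers:
        w.cancel()                      # the workers loop forever, so cancel them
    await asyncio.gather(*workers, return_exceptions=True)

asyncio.get_event_loop().run_until_complete(main())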

queue and thread, while loop in thread

I'm writing something in Python 3 to get proxies from sites and check if the proxies are valid.
I used queue and threading module to make the check procedure faster.
However, the result was weird.
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()
This is an example from the queue module's documentation. My code is based on this example.
So, my question is:
When will the while loop in worker() end?
When the number of items in the queue is more than 200, the queue seems to keep blocking: one item in the queue never gets processed, one thread keeps doing q.get(), and the other threads report that the queue is empty.
Please help me out. Thanks.
And sorry about my poor English. I'm still working on it.
----Update ---------------------------------------------------------------------
I tried ThreadPoolExecutor, and it worked just like threading and queue did, but the blocking situation didn't change.
After about 20 minutes, one trial run of the code finally ended and printed the expected output.
I found that the check procedure finishes in 2 or 3 minutes (for 100 proxies), and the code then just keeps blocking for about 10 minutes before it ends.
And the second question:
What may cause this?
Thank you! :)
----Update----------------------------------------------------------------------
Problem solved!!
I thought it was the threading that caused the blocking, but it turns out that the connection and transfer time was the cause.
I use pycurl for the proxy check, and pycurl's default TIMEOUT is 300.
I had only set CONNECTTIMEOUT to 5 and ignored TIMEOUT, which limits the whole transfer time.
And this is the new code I use for proxy check:
import pycurl

c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.PROXY, proxy)
c.setopt(c.WRITEFUNCTION, lambda x: None)
c.setopt(c.CONNECTTIMEOUT, 5)
c.setopt(c.TIMEOUT, 5)  # the newly added option
c.perform()
c.close()
However, setting TIMEOUT to 5 reduced the number of valid proxies significantly. I will keep trying for the best TIMEOUT value.
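If a hard 5-second TIMEOUT discards too many slow-but-working proxies, one possible middle ground (my own suggestion, not from the original post) is libcurl's low-speed abort: the transfer is dropped only when it stalls below a byte rate for a sustained period, rather than after a fixed wall-clock limit.
import pycurl

c = pycurl.Curl()
c.setopt(c.URL, url)               # url, headers and proxy come from the surrounding code
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.PROXY, proxy)
c.setopt(c.WRITEFUNCTION, lambda x: None)
c.setopt(c.CONNECTTIMEOUT, 5)
c.setopt(c.LOW_SPEED_LIMIT, 1000)  # abort if the transfer drops below 1000 bytes/s ...
c.setopt(c.LOW_SPEED_TIME, 10)     # ... for 10 consecutive seconds
c.perform()
c.close()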
A while True loop like that will never end on its own, and your threads will never quit. You have to tell your threads explicitly when to exit.
A way of doing this is by using a sentinel, like this:
end_of_queue = object()

def worker():
    while True:
        item = q.get()
        if item is end_of_queue:
            q.task_done()
            break
        do_work(item)
        q.task_done()

q = Queue()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

for i in range(num_worker_threads):
    q.put(end_of_queue)

q.join()
What I've done here is add a few end_of_queue elements to your queue, one for each thread. When a thread sees the end_of_queue object, it knows it has to quit and breaks out of the loop.
If you prefer a different approach, you can consider using an Event object to notify the threads when they have to quit, like this:
from queue import Queue, Empty
from threading import Thread, Event

quit_event = Event()

def worker():
    while not q.empty() or not quit_event.is_set():
        try:
            item = q.get(timeout=.1)
        except Empty:
            continue
        do_work(item)
        q.task_done()

q = Queue()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

quit_event.set()

q.join()
The drawback of this solution is that you have to get() with a timeout.
Last but not least, your code looks like it could benefit from using a thread pool, like this:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=num_worker_threads) as executor:
    executor.map(do_work, source())
(For reference, ThreadPoolExecutor uses the end_of_queue approach internally; the only two differences are that end_of_queue is None and that each thread is responsible for notifying the other ones.)
Just another example of using a thread, a queue and a loop from a class:
import threading
import Queue

q = Queue.Queue()

class listener(object):
    def __init__(self):
        thread = threading.Thread(target=self.loop)
        # thread.daemon = True
        thread.start()

    def loop(self):
        for i in xrange(0, 13):
            q.put(i)

class ui(object):
    def __init__(self):
        listener()
        while True:
            item = q.get()
            print item
            if item == 10:
                break

ui()

In Producer/Consumer pattern, how could I kill the consumer thread?

I run the consumer in another worker thread; the code is as follows:
def Consumer(self):
    while True:
        condition.acquire()
        if not queue:
            condition.wait()
        json = queue.pop()
        clients[0].write_message(json)
        condition.notify()
        condition.release()

t = threading.Thread(target=self.Consumer)
t.start()
However, I find that I cannot kill this worker thread: it stays blocked in wait() forever after the job is done...
I am trying to send a signal from the Producer to the Consumer whenever the producer's work is finished; if the consumer receives the signal, the worker thread should exit. Is it possible to do that?
My standard way to notify a consumer thread that it should stop its work is to send a fake message (I rewrote your example to make it runnable):
import threading

condition = threading.Condition()
queue = []

class Client():
    def write_message(self, msg):
        print(msg)

clients = [Client()]
jobdone = object()

def Consumer():
    while True:
        condition.acquire()
        try:
            if not queue:
                condition.wait()
            json = queue.pop()
            if json is jobdone:
                break
            clients[0].write_message(json)
        finally:
            condition.release()

t = threading.Thread(target=Consumer)
t.start()

import time
time.sleep(2)

condition.acquire()
queue.append(jobdone)
condition.notify()
condition.release()
Anyway, consider using queue.Queue, which is standard and makes the synchronization simple. Here is how my example becomes:
import threading
import queue
import time

queue = queue.Queue()

class Client():
    def write_message(self, msg):
        print(msg)

clients = [Client()]
jobdone = object()

def Consumer():
    while True:
        json = queue.get()
        if json is jobdone:
            break
        clients[0].write_message(json)

t = threading.Thread(target=Consumer)
t.start()

queue.put("Hello")
queue.put("Word")
time.sleep(2)
queue.put(jobdone)
t.join()
# You can also use q.join()
print("Job Done")
