I have a program with two threads: the main thread, and one additional thread that handles jobs from a FIFO queue.
Something like this:
import queue
import threading

q = queue.Queue()

def _worker():
    while True:
        msg = q.get(block=True)  # block until an item is available
        print(msg)
        q.task_done()

t = threading.Thread(target=_worker)
# t.daemon = True
t.start()

q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
What I want to accomplish is basically to make sure the queue is emptied before the main thread exits.
If I set t.daemon to True, the program exits before the queue is emptied; if it's set to False, the program never exits. Is there some way to make sure the thread running _worker() clears the queue when the main thread exits?
The comments touch on using .join(), but depending on your use case, a join may make threading pointless.
I assume that your main thread will be doing things other than adding items to the queue, and may be shut down at any point; you just want to ensure that the queue is empty before shutdown completes.
At the end of your main thread, you could add a simple empty check in a loop:

from time import sleep

while not q.empty():
    sleep(1)
If you don't set t.daemon = True, the program will never exit, because the worker thread never finishes. Setting the thread as a daemon means it does not keep your program running once the main thread finishes.
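Putting the two together, a minimal sketch of this answer's approach (reusing q and _worker from the question; note that q.empty() becomes true as soon as the last item has been dequeued, while q.join() would additionally wait for the matching task_done() calls):

from time import sleep

t = threading.Thread(target=_worker)
t.daemon = True  # the worker alone won't keep the process alive
t.start()

q.put('asdf-1')
q.put('asdf-2')

# ... the main thread does its other work here ...

# Before exiting, wait until the worker has drained the queue.
while not q.empty():
    sleep(1)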
Put a special item (e.g. None) in the queue that signals the worker thread to stop.
import queue
import threading

q = queue.Queue()

def _worker():
    while True:
        msg = q.get(block=True)
        if msg is None:
            return  # sentinel received: exit the thread
        print(msg)  # do your stuff here

t = threading.Thread(target=_worker)
# t.daemon = True
t.start()

q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')

q.put(None)  # signal the worker to stop
t.join()     # wait for the worker to finish
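Note that t.join() returns only after the worker has seen the sentinel, and since the queue is FIFO, everything queued before the None is guaranteed to have been processed before the main thread exits.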
Related
I have some worker threads consuming data from a pre-populated input queue, and putting results into another queue.
import queue
import threading

worker_count = 8
input_queue = queue.Queue()
output_queue = queue.Queue()

threads = []
for _ in range(worker_count):
    thread = threading.Thread(target=perform_work, args=(input_queue, output_queue))
    thread.daemon = True
    thread.start()
    threads.append(thread)
I am processing the results in the main thread, and I want to make sure I process all of the results.
while True:
    try:
        result = output_queue.get(True, 0.1)
    except queue.Empty:
        pass
    else:
        process_result(result)

    if not any(t.is_alive() for t in threads) and output_queue.empty():
        # All workers have exited and all results have been processed; stop.
        break
Is it safe to just use .is_alive() in this case? Or is there a particular reason to use .join() instead?
NOTE: I'm making my threads daemon = True because it makes it easier to debug and terminate the program.
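For reference, the join()-based alternative the question alludes to might look like the sketch below; it assumes perform_work() is changed so that each worker exits when it reads a None sentinel (the snippet above does not do that yet):

for _ in range(worker_count):
    input_queue.put(None)  # one exit sentinel per worker
for t in threads:
    t.join()  # past this point, no worker can add more results

# The workers are gone, so draining the output queue is now safe.
while not output_queue.empty():
    process_result(output_queue.get())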
I'm writing something in Python 3 to fetch proxies from sites and check whether the proxies are valid.
I used the queue and threading modules to make the check procedure faster.
However, the behavior was weird:
from queue import Queue
from threading import Thread

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()
This is an example from the queue documentation; my code is based on it.
So, my question is: when will the while loop in worker() end?
When the number of items in the queue exceeds 200, q keeps blocking: one item in the queue never gets processed, and one thread keeps doing q.get() while the other threads report that q is empty.
Please help me out. Thanks. And sorry about my poor English; I'm still working on it.
---- Update ----
I tried ThreadPoolExecutor, and it worked just like threading plus queue. But the blocking situation didn't change.
After a 20-minute wait, one trial run of the code ended and printed the expected output.
I found that the check procedure finishes in 2 or 3 minutes (for 100 proxies), but the code then keeps blocking for about 10 minutes before it ends.
So my second question is: what may cause this?
Thank you! :)
---- Update ----
Problem solved!
I thought it was the threading that caused the blocking, but it turns out the connection and transfer time was the cause.
I use pycurl for the proxy check, and pycurl's default TIMEOUT is 300 seconds. I had only set CONNECTTIMEOUT to 5 and ignored TIMEOUT, which limits the total transfer time.
And this is the new code I use for the proxy check:

import pycurl

c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.PROXY, proxy)
c.setopt(c.WRITEFUNCTION, lambda x: None)  # discard the response body
c.setopt(c.CONNECTTIMEOUT, 5)
c.setopt(c.TIMEOUT, 5)  # the missing option: caps the whole transfer time
c.perform()
c.close()
However, setting TIMEOUT to 5 reduced the number of valid proxies significantly. I will keep trying for the best TIMEOUT value.
A while True loop like that will never end on its own, so your threads will never quit. You have to tell your threads explicitly when to exit.
One way of doing this is with a sentinel, like this:
from queue import Queue
from threading import Thread

end_of_queue = object()

def worker():
    while True:
        item = q.get()
        if item is end_of_queue:
            q.task_done()
            break
        do_work(item)
        q.task_done()

q = Queue()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)
for i in range(num_worker_threads):
    q.put(end_of_queue)  # one sentinel per worker

q.join()
What I've done here is add a few end_of_queue sentinels to your queue, one for each thread. When a thread sees the end_of_queue object, it knows it has to quit and breaks out of the loop.
If you prefer a different approach, you can use an Event object to notify the threads when they have to quit, like this:
from queue import Queue, Empty
from threading import Thread, Event

quit_event = Event()

def worker():
    while not q.empty() or not quit_event.is_set():
        try:
            item = q.get(timeout=.1)
        except Empty:
            continue
        do_work(item)
        q.task_done()

q = Queue()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)
quit_event.set()

q.join()
The drawback of this solution is that you have to get() with a timeout; the periodic wakeup is what lets each worker re-check quit_event.
Last but not least, your code could benefit from using a thread pool, like this:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=num_worker_threads) as executor:
    executor.map(do_work, source())
(For reference: ThreadPoolExecutor uses the end_of_queue approach internally; the only two differences are that end_of_queue is None and that each thread is responsible for notifying the others.)
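A simplified sketch of that internal pattern (an illustration, not the real concurrent.futures source): None is the sentinel, and each worker re-posts it before exiting so that its sibling threads wake up and exit too:

def worker(q):
    while True:
        item = q.get()
        if item is None:
            q.put(None)  # hand the sentinel on to the next worker
            return       # this worker exits
        do_work(item)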
Just another example of using a thread, a queue, and a loop from a class:
import threading
import queue

q = queue.Queue()

class listener(object):
    def __init__(self):
        # Start a background thread that feeds the queue.
        thread = threading.Thread(target=self.loop)
        # thread.daemon = True
        thread.start()

    def loop(self):
        for i in range(0, 13):
            q.put(i)

class ui(object):
    def __init__(self):
        listener()
        while True:
            item = q.get()  # block until the listener produces an item
            print(item)
            if item == 10:
                break

ui()
I want to use threads to do some blocking work. What should I do to:
Spawn a thread safely
Do useful work
Wait until the thread finishes
Continue with the function
Here is my code:
import threading

def my_thread():
    # Wait for the server to respond...
    pass

def main():
    a = threading.Thread(target=my_thread)
    a.start()
    # Do other stuff here
You can use Thread.join. A few lines from the docs:
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
For your example, it will look like this:
def main():
    a = threading.Thread(target=my_thread)
    a.start()
    a.join()
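If you don't want to block forever, join() also accepts the optional timeout mentioned in the docs excerpt above. A small sketch: join() returns None whether or not the timeout expired, so check is_alive() afterwards:

def main():
    a = threading.Thread(target=my_thread)
    a.start()
    a.join(timeout=5.0)  # wait at most 5 seconds
    if a.is_alive():
        print('my_thread is still running after the timeout')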
On completion of a work item, we call task_done() to mark it as done for a Queue object. But if a socket error happens, such as "connection reset by peer", what should we emit to mark the thread's exit?
You would be better off using threading.Event objects rather than trying to fit another signal into the Queue. Just pass the event to the threads, set the event flag when you are done, and have the threads check the Event regularly, or use threading.Event.wait() if your threads have nothing better to do.
Check out the example below.
import time
import threading

def main():
    job_done = threading.Event()
    thread1 = threading.Thread(target=job, args=(job_done, "Fe Fye"))
    thread2 = threading.Thread(target=job, args=(job_done, "Fo Fum"))
    thread1.start()
    thread2.start()
    time.sleep(2)
    job_done.set()  # signal both threads to stop

def job(job_done, message):
    # Keep printing until the main thread sets the event.
    while not job_done.is_set():
        print(message)

main()
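To tie this back to the Queue in the question, here is a sketch of a consumer that checks the Event and still guarantees task_done() even when the socket errors out (stop_event, work_queue, and handle() are assumed names, not from the original code):

import queue
import threading

def consumer(work_queue, stop_event):
    while not stop_event.is_set():
        try:
            item = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            handle(item)  # may raise, e.g. on "connection reset by peer"
        finally:
            work_queue.task_done()  # always mark the item done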
import threading
import Queue
import urllib2
import time

class ThreadURL(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()
            self.queue.task_done()

hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()

def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()

if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)
I've been trying to get my head around how to perform threading, and after a few tutorials I've come up with the above.
What it's supposed to do is:
Initialise the queue
Create my thread pool and then queue up the list of hosts
My ThreadURL class should then begin work once a host is in the queue and read the website data
The program should finish
What I want to know first off is: am I doing this correctly? Is this the best way to handle threads?
Secondly, my program fails to exit. It prints out the "Elapsed time" line and then hangs there; I have to kill my terminal for it to go away. I'm assuming this is due to my incorrect use of queue.join()?
Your code looks fine and is quite clean.
The reason your application still "hangs" is that the worker threads are still running, waiting for the main application to put something in the queue, even though your main thread has finished.
The simplest way to fix this is to mark the threads as daemons, by doing t.daemon = True before your call to start(). This way, the threads will not block the program from stopping.
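Applied to the main() from the question, it is a one-line change:

def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.daemon = True  # the fix: daemon threads die with the main thread
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()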
Looks fine. yann is right about the daemon suggestion; that will fix your hang. My only question is: why use the queue at all? You're not doing any cross-thread communication, so it seems like you could just pass the host info as an argument to ThreadURL.__init__() and drop the queue.
Nothing wrong with it, just wondering.
One thing: in the thread's run function, inside the while True loop, if some exception happens, task_done() may not be called even though get() has already been called, so queue.join() may never return.
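A sketch of one way to guard against that in ThreadURL.run, using try/finally so task_done() always runs:

def run(self):
    while True:
        host = self.queue.get()
        try:
            sock = urllib2.urlopen(host)
            data = sock.read()
        finally:
            # Mark the item done even if urlopen() or read() raised,
            # so that queue.join() cannot block forever.
            self.queue.task_done()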