I'm writing something in Python 3 to get proxies from sites and check if the proxies are valid.
I used the queue and threading modules to make the checking faster.
However, the result was strange.
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()
q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
q.join()
This is an example from the queue documentation. My code is based on it.
So, my question is:
When will the while loop in worker() end?
When the queue holds more than 200 items, q keeps blocking the code: one item in the queue never gets processed and one thread stays stuck in q.get(), while the other threads report that q is empty.
Please help me out. Thanks.
And sorry about my poor English. I'm still working on it.
----Update ---------------------------------------------------------------------
I tried ThreadPoolExecutor, and it worked, like threading and queue. But the blocking situation didn't change.
After about 20 minutes (I played a game while waiting), one trial run of the code ended and printed the expected output.
I found that the check procedure ends in 2 or 3 minutes (for 100 proxies), and the code just kept blocking for about 10 minutes before it ended.
And the second question:
What may cause this?
Thank you! :)
----Update----------------------------------------------------------------------
Problem solved!!
I thought the threads were causing the blocking, but it turns out that the connection and transfer time was the cause.
I use pycurl for the proxy check, and pycurl's default TIMEOUT is 300.
I had only set CONNECTTIMEOUT to 5 and ignored TIMEOUT, which limits the whole transfer time.
And this is the new code I use for proxy check:
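import pycurl
# url, headers and proxy are defined elsewhere in my script
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.HTTPHEADER, headers)
c.setopt(c.PROXY, proxy)
c.setopt(c.WRITEFUNCTION, lambda x: None)
c.setopt(c.CONNECTTIMEOUT, 5)  # limit for the connection phase only
c.setopt(c.TIMEOUT, 5)  # new line: limit for the whole transfer
c.perform()
c.close()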
However, setting TIMEOUT to 5 reduced the number of valid proxies significantly. I will keep trying for the best TIMEOUT value.
A while True loop like that will never end, and your threads will never quit. You have to tell your threads explicitly when to exit.
A way of doing this is by using a sentinel, like this:
end_of_queue = object()
def worker():
    while True:
        item = q.get()
        if item is end_of_queue:
            q.task_done()
            break
        do_work(item)
        q.task_done()
q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
for i in range(num_worker_threads):
    q.put(end_of_queue)
q.join()
What I've done here is add a few end_of_queue elements to your queue, one for each thread. When a thread sees the end_of_queue object, it knows it has to quit and breaks out of the loop.
If you prefer a different approach, you can consider using an Event object to notify the threads when they have to quit, like this:
quit_event = Event()
def worker():
    while not q.empty() or not quit_event.is_set():
        try:
            item = q.get(timeout=.1)
        except Empty:
            continue
        do_work(item)
        q.task_done()
q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
quit_event.set()
q.join()
The drawback of this solution is that you have to get() with a timeout.
Last but not least, your code looks like it could benefit from a thread pool, like this:
with ThreadPoolExecutor(max_workers=num_worker_threads) as executor:
    executor.map(do_work, source())
(For reference, ThreadPoolExecutor uses the end_of_queue approach; the only two differences are that end_of_queue is None and each thread is responsible for notifying the other ones.)
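To make the thread pool suggestion concrete for the proxy-checking case, here is a minimal sketch. The names check_proxy and find_valid are made up for illustration, and the check itself is a placeholder (it only looks at the port), so swap in a real network check. It relies on executor.map returning results in the same order as the input.

```python
from concurrent.futures import ThreadPoolExecutor


def check_proxy(proxy):
    # Hypothetical stand-in for a real check (e.g. a request through the proxy).
    # Here a proxy counts as "valid" when it uses port 8080.
    return proxy, proxy.endswith(":8080")


def find_valid(proxies, num_worker_threads=4):
    # The with block waits for all submitted work before it exits,
    # so no sentinel, Event or q.join() is needed.
    with ThreadPoolExecutor(max_workers=num_worker_threads) as executor:
        results = list(executor.map(check_proxy, proxies))
    return [proxy for proxy, ok in results if ok]


if __name__ == "__main__":
    print(find_valid(["1.1.1.1:8080", "2.2.2.2:3128", "3.3.3.3:8080"]))
```

Collecting the return values of map is also how an exception raised inside a worker would surface in the main thread, instead of being lost.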
Just another example of using a thread, a queue and a loop from a class (written for Python 3, as in the question):
import threading
import queue
q = queue.Queue()
class listener(object):
    def __init__(self):
        thread = threading.Thread(target=self.loop)
        # thread.daemon = True
        thread.start()
    def loop(self):
        # producer: put the numbers 0..12 on the queue
        for i in range(0, 13):
            q.put(i)
class ui(object):
    def __init__(self):
        listener()
        # consumer: stop as soon as the value 10 arrives
        while True:
            item = q.get()
            print(item)
            if item == 10:
                break
ui()
Related
In many cases I have a worker thread which pops data from a Queue and acts on it. At some kind of event I want my worker thread to stop. The simple solution is to add a timeout to the get call and check the Event/flag every time the get times out. This, however, has two problems:
Causes an unnecessary context switch
Delays the shutdown until a timeout occurs
Is there a better way to listen both for a stop event and for new data in the Queue? Is it possible to listen to two Queues at the same time and block until there's data in either one? (In this case one could use a second Queue just to trigger the shutdown.)
The solution I'm currently using:
from queue import Queue, Empty
from threading import Event, Thread
from time import sleep
def worker(exit_event, queue):
    print("Worker started.")
    while not exit_event.is_set():
        try:
            data = queue.get(timeout=10)
            print("got {}".format(data))
        except Empty:
            pass
    print("Worker quit.")
if __name__ == "__main__":
    exit_event = Event()
    queue = Queue()
    th = Thread(target=worker, args=(exit_event, queue))
    th.start()
    queue.put("Testing")
    queue.put("Hello!")
    sleep(2)
    print("Asking worker to quit")
    exit_event.set()
    th.join()
    print("All done..")
I guess you could easily reduce the timeout to somewhere between 0.1 and 0.01 sec. A slightly different solution is to use the queue to send both data and control commands to the thread:
import queue
import threading
import time
THREADSTOP = 0
class ThreadControl:
    def __init__(self, command):
        self.command = command
def worker(q):
    print("Worker started.")
    while True:
        data = q.get()
        if isinstance(data, ThreadControl):
            if data.command == THREADSTOP:
                break
        print("got {}".format(data))
    print("Worker quit.")
if __name__ == '__main__':
    q = queue.Queue()
    th = threading.Thread(target=worker, args=(q,))
    th.start()
    q.put("Testing")
    q.put("Hello!")
    time.sleep(2)
    print("Asking worker to quit")
    q.put(ThreadControl(command=THREADSTOP)) # sending command
    th.join()
    print("All done..")
Another option is to use sockets instead of queues.
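A rough sketch of what the socket variant could look like, assuming a POSIX system where socket.socketpair() gives two connected local sockets. One pair carries data and a second pair is used only as a stop signal; the worker blocks in select() on both, so no timeout is needed. The function names and the "data"/"stop" labels are invented for this example, and it ignores message framing (a stream socket may merge or split messages).

```python
import selectors
import socket
import threading


def worker(data_sock, stop_sock, received):
    # Block until either socket becomes readable; no polling timeout.
    sel = selectors.DefaultSelector()
    sel.register(data_sock, selectors.EVENT_READ, "data")
    sel.register(stop_sock, selectors.EVENT_READ, "stop")
    try:
        while True:
            stop_requested = False
            for key, _ in sel.select():
                if key.data == "data":
                    # Small reads only; a real protocol needs framing.
                    received.append(key.fileobj.recv(4096))
                else:
                    stop_requested = True
            # Data that was ready in the same batch is handled before quitting.
            if stop_requested:
                return
    finally:
        sel.close()


def demo():
    data_tx, data_rx = socket.socketpair()
    stop_tx, stop_rx = socket.socketpair()
    received = []
    th = threading.Thread(target=worker, args=(data_rx, stop_rx, received))
    th.start()
    data_tx.sendall(b"Testing")
    data_tx.sendall(b"Hello!")
    stop_tx.sendall(b"x")  # any byte on the stop socket wakes the worker
    th.join()
    for s in (data_tx, data_rx, stop_tx, stop_rx):
        s.close()
    return b"".join(received)


if __name__ == "__main__":
    print(demo())
```

The trade-off is that you give up Queue's object passing: everything sent this way has to be serialized to bytes.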
I have some worker threads consuming data from a pre-populated input queue, and putting results into another queue.
import queue
import threading
worker_count = 8
input_queue = queue.Queue()
output_queue = queue.Queue()
threads = []
for _ in range(worker_count):
    # perform_work is my worker function (defined elsewhere)
    thread = threading.Thread(target=perform_work, args=(input_queue, output_queue))
    thread.daemon = True
    thread.start()
    threads.append(thread)
I am processing the results in the main thread, and I want to make sure I process all of the results.
while True:
    try:
        result = output_queue.get(True, 0.1)
    except queue.Empty:
        pass
    else:
        process_result(result)
    if not any(t.is_alive() for t in threads) and output_queue.empty():
        # All workers have exited and all results have been processed, stop.
        break
Is it safe to just use .is_alive() in this case? Or is there a particular reason to use .join() instead?
NOTE: I'm making my threads daemon = True because it makes it easier to debug and terminate the program.
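One way to avoid polling is_alive() altogether, sketched here under assumptions: each worker puts a "done" marker on the output queue when the pre-populated input queue is exhausted, and the main thread stops once it has seen one marker per worker. The DONE object, the squaring work and the run() helper are made up for this illustration.

```python
import queue
import threading

DONE = object()  # hypothetical marker a worker sends when it has finished


def perform_work(input_queue, output_queue):
    # The input queue is filled before the workers start, so Empty means "no more work".
    while True:
        try:
            item = input_queue.get_nowait()
        except queue.Empty:
            break
        output_queue.put(item * item)  # placeholder for the real work
    output_queue.put(DONE)


def run(items, worker_count=4):
    input_queue = queue.Queue()
    output_queue = queue.Queue()
    for item in items:
        input_queue.put(item)
    for _ in range(worker_count):
        threading.Thread(target=perform_work,
                         args=(input_queue, output_queue),
                         daemon=True).start()
    results = []
    finished = 0
    # Plain blocking get(): no timeout and no is_alive() checks.
    while finished < worker_count:
        result = output_queue.get()
        if result is DONE:
            finished += 1
        else:
            results.append(result)
    return results


if __name__ == "__main__":
    print(sorted(run(range(10))))
```

Because each worker sends its marker only after its last result, the main thread cannot miss a result that was queued just before a thread exited.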
I am using Python Queue with Thread. I noticed when a worker crashes, the script hangs and doesn't let me terminate. The following is an example:
from Queue import Queue
from threading import Thread
num_worker_threads = 2
def worker():
    while True:
        item = q.get()
        1/item
        q.task_done()
q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
q.put(0)
q.join()
An even cleaner approach would be to have task_done() in the finally block instead.
I fixed this by wrapping the job in a try/except block. I would have thought that when a worker crashes, the script would exit. This is not the case. It looks like q.task_done() never gets called, so the script hangs on q.join().
Solution:
from Queue import Queue
from threading import Thread
num_worker_threads = 2
def worker():
    while True:
        item = q.get()
        try:
            1/item
        except Exception as e:
            print e
        finally:
            q.task_done()
q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
q.put(0)
q.join()
Added suggestion by rsy.
Your thread runs a while True loop that keeps calling q.get(), but there may not be any items in the queue. By default q.get() then blocks until an item arrives; if you call it with block=False or with a timeout, it raises an Empty exception instead.
Your loop should be while not q.empty(): or you should call q.get() with a timeout and catch the Queue.Empty exception.
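A minimal Python 3 sketch of the catch-the-exception variant, assuming all items are queued before the workers start (otherwise a worker could give up too early). The doubling step and the drain() helper are placeholders for illustration.

```python
import queue
import threading


def worker(q, results):
    while True:
        try:
            # With a timeout, get() raises queue.Empty instead of blocking forever.
            item = q.get(timeout=0.1)
        except queue.Empty:
            break  # nothing left to do, let the thread end
        try:
            results.append(item * 2)  # placeholder for the real work
        finally:
            q.task_done()


def drain(items, num_worker_threads=2):
    q = queue.Queue()
    results = []
    for item in items:
        q.put(item)
    threads = [threading.Thread(target=worker, args=(q, results))
               for _ in range(num_worker_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(drain([1, 2, 3])))
```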
I have a program that has two threads, the main thread and one additional that works on handling jobs from a FIFO queue.
Something like this:
import queue
import threading
q = queue.Queue()
def _worker():
    while True:
        msg = q.get(block=True)
        print(msg)
        q.task_done()
t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
What I want to accomplish is basically to make sure the queue is emptied before the main thread exits.
If I set t.daemon to be True the program will exit before the queue is emptied, however if it's set to False the program will never exit. Is there some way to make sure the thread running the _worker() method clears the queue on main thread exit?
The comments touch on using .join(), but depending on your use case, using a join may make threading pointless.
I assume that your main thread will be doing things other than adding items to the queue - and may be shut down at any point, you just want to ensure that your queue is empty before shutting down is complete.
At the end of your main thread, you could add a simple empty check in a loop.
from time import sleep
while not q.empty():
    sleep(1)
If you don't set t.daemon = True then the thread will never finish. Setting the thread as a daemon thread will mean that the thread does not cause your program to stay running when the main thread finishes.
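A variation on the same idea, as a sketch: q.empty() becomes true as soon as the last item has been taken off the queue, which can be before the worker has finished handling it. q.join() instead waits until task_done() has been called for every item. The run() wrapper and the handled list are only there to make the example self-contained.

```python
import queue
import threading


def run(messages):
    q = queue.Queue()
    handled = []

    def _worker():
        while True:
            msg = q.get()
            try:
                handled.append(msg)  # placeholder for the real processing
            finally:
                q.task_done()

    # Daemon thread: it will not keep the program alive on its own.
    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    for msg in messages:
        q.put(msg)
    # At the end of the main thread: wait until every item is fully processed.
    q.join()
    return handled


if __name__ == "__main__":
    print(run(['asdf-1', 'asdf-2', 'asdf-3']))
```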
Put a special item (e.g. None) in the queue, that signals the worker thread to stop.
import queue
import threading
q = queue.Queue()
def _worker():
    while True:
        msg = q.get(block=True)
        if msg is None:
            return
        print(msg) # do your stuff here
t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
q.put(None)
t.join()
I am trying to implement a Python (2.6.x/2.7.x) thread pool that checks for network connectivity (ping or whatever). All the threads in the pool must be killed/terminated as soon as the check succeeds.
So I am thinking of creating a pool of, let's say, 10 worker threads. If any one of them pings successfully, the main thread should terminate all the rest.
How do I implement this?
This is not runnable code; it is just meant to give you an idea of how to make threads communicate.
Communication between processes or threads happens through queues, pipes and some other mechanisms. Here I'm using queues.
It works like this: I send IP addresses through in_queue and put the responses on out_queue. My main thread monitors out_queue and, if it gets the desired result, marks all the threads for termination.
Below is the pinger thread definition:
import threading
from Queue import Queue, Empty
# A thread that pings the IPs it takes from in_queue.
class Pinger(threading.Thread):
    def __init__(self, kwargs=None, name=None):
        threading.Thread.__init__(self, name=name)
        self.kwargs = kwargs
        self.stop_pinging = False
    def run(self):
        ip_queue = self.kwargs.get('in_queue')
        out_queue = self.kwargs.get('out_queue')
        while not self.stop_pinging:
            try:
                data = ip_queue.get(timeout=1)
                ping_status = ping(data)
                # ping() is pseudo code, you have to take care of
                # your own ping.
                if ping_status:
                    out_queue.put('success')
                    # you can even break here if you don't want to
                    # continue after one success
                else:
                    out_queue.put('failure')
                if ip_queue.empty():
                    break
            except Empty:
                pass
Here is the main thread block:
# Create the shared queues and launch the pinger threads
in_queue = Queue()
out_queue = Queue()
ip_list = ['ip1', 'ip2', '....']
# This is to add all the ips to the queue or you can
# customize to add through some producer way.
for ip in ip_list:
    in_queue.put(ip)
pinger_pool = []
for i in xrange(1, 10):
    pinger_worker = Pinger(kwargs={'in_queue': in_queue, 'out_queue': out_queue}, name=str(i))
    pinger_pool.append(pinger_worker)
    pinger_worker.start()
while 1:
    if out_queue.get() == 'success':
        # ask every thread in the pool to stop
        for pinger in pinger_pool:
            pinger.stop_pinging = True
        break
Note: This is pseudo code; you should adapt it until it works for your case.
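For comparison, here is a self-contained Python 3 sketch of the same idea that uses one shared threading.Event instead of a per-thread flag. The fake_ping function is a made-up stand-in that treats a single hard-coded address as reachable, and first_reachable is a name invented for this example; replace the ping with a real check.

```python
import queue
import threading


def fake_ping(ip):
    # Hypothetical stand-in for a real connectivity check.
    return ip == "10.0.0.3"


def pinger(in_queue, out_queue, stop_event, ping):
    # Every worker checks the shared event before taking the next address.
    while not stop_event.is_set():
        try:
            ip = in_queue.get_nowait()
        except queue.Empty:
            break
        if ping(ip):
            out_queue.put(ip)
            stop_event.set()  # tells all the other workers to stop


def first_reachable(ips, num_workers=4, ping=fake_ping):
    in_queue = queue.Queue()
    out_queue = queue.Queue()
    stop_event = threading.Event()
    for ip in ips:
        in_queue.put(ip)
    workers = [threading.Thread(target=pinger,
                                args=(in_queue, out_queue, stop_event, ping))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    try:
        return out_queue.get_nowait()
    except queue.Empty:
        return None  # no address answered


if __name__ == "__main__":
    print(first_reachable(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]))
```

Note that threads are not killed here; a worker that is in the middle of a ping finishes that one call and then sees the event, so a slow ping still delays shutdown.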