Understanding Python Queues and setting threads to run as daemon threads - python

Let's say I have the code below:
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    while True:
        if queue.empty(): break
        print "Starting %s" % (threading.currentThread().getName()) + "\n"
        item = queue.get()
        ## do_work on item, which might take 10-15 minutes to complete
        queue.task_done()
        print "Ending %s" % (threading.currentThread().getName()) + "\n"

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        t = threading.Thread(target=basic_worker, args=(queue, "thread-%d" % i))
        t.daemon = True
        t.start()
    queue.join()  # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"
My question is: if I set t.daemon = True, will the program exit and kill the threads that are still spending 10-15 minutes working on an item from the queue? From what I have read, the program exits when only daemonic threads are left alive, so my understanding is that threads still in the middle of a long task would be killed before completing. If I don't set t.daemon = True, my program hangs forever and doesn't exit when there are no items left in the queue.

The reason the program hangs forever when t.daemon = False is that the following line ...
if queue.empty(): break
... leads to a race condition.
Imagine there is only one item left in the queue and two threads evaluate the condition above nearly simultaneously. The condition evaluates to False for both threads, so neither of them breaks.
The faster thread gets the last item, while the slower one blocks forever in the statement item = queue.get().
Since daemon mode is False, the program waits for all threads to finish. That never happens.
From my point of view, the code you provided (with t.daemon = True) works fine.
The following sentence may be what confuses you:
The entire Python program exits when no alive non-daemon threads are left.
... but consider: if you start all threads from the main thread with t.daemon = True, the only non-daemon thread is the main thread itself. So the program exits when the main thread is finished ...
... and that does not happen until the queue is empty, because of the queue.join() statement. So your long-running computations inside the child threads will not be interrupted.
There is no need to check queue.empty() when using daemon threads together with queue.join().
This should be enough:
#!/usr/bin/env python
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    print "Starting %s" % (threading.currentThread().getName()) + "\n"
    while True:
        item = queue.get()
        ## do_work on item, which might take 10-15 minutes to complete
        time.sleep(5)  # to simulate work
        queue.task_done()

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        print 'starting worker', i
        t = threading.Thread(target=basic_worker, args=(queue, i))
        t.daemon = True
        t.start()
    queue.join()  # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"

Related

killing Finished threads in python

My multi-threading script raises this error:
thread.error : can't start new thread
when it reaches 460 threads:
threading.active_count() = 460
I assume the old threads keep stacking up, since the script didn't kill them. This is my code:
import threading
import Queue
import time
import os
import csv

def main(worker):
    # Do work
    print worker
    return

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    global q
    global city
    q = Queue.Queue()
    for x in range(20):
        t = threading.Thread(target=threader)
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()
How do I kill the old threads when their job is done? (Is return not enough?)
I'm sure the old threads' work is done, since I'm printing the results, but I'm not sure why they are still active afterwards. Is there a direct way to kill a thread after it finishes its work?
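No answer is reproduced here, but the code itself points at a likely cause: every call to main_threader builds a fresh queue and 20 new daemon threads, while the threads from earlier calls stay blocked forever in q.get() on their now-abandoned queues. A sketch of one possible fix, creating the queue and worker pool once and reusing them across batches (the pool size of 20 is kept from the original; the repeated calls to main_threader are an assumption about how the script is driven):

import threading
import Queue

q = Queue.Queue()

def main(worker):
    # stand-in for the real per-item work from the question
    print worker

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def start_pool(n=20):
    # start the daemon workers exactly once, not once per batch
    for _ in range(n):
        t = threading.Thread(target=threader)
        t.daemon = True
        t.start()

def main_threader(workers):
    # enqueue a batch onto the shared queue and wait for it to drain
    for worker in workers:
        q.put(worker)
    q.join()

start_pool()
main_threader(range(5))      # the same pool handles every batch
main_threader(range(5, 10))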

Wait and notify multiple threads at the same time python

I am new to threading and Python, and I want to hit a server with multiple (10) HTTP requests at the same time. I have a utility for sending the request. I wrote the following code:
import time
import threading

def send_req():
    start = time.time()
    response = http_lib.request(ip, port, headers, body, url)
    end = time.time()
    response_time = end - start
    print "Response Time: ", response_time

def main():
    thread_list = []
    for thread in range(10):
        t = threading.Thread(target=send_req)
        t.start()
        thread_list.append(t)
    for i in thread_list:
        i.join()

if __name__ == "__main__":
    main()
It runs and prints out the response times. But since I am creating the threads one after the other, their execution seems to be sequential rather than concurrent. Can I create all 10 threads at the same time and then let them execute together, or create the threads one by one, keep the created ones waiting until all of them exist, and then start them at the same time?
What do you mean by "at the same time"? Threads do run in parallel, but you cannot start them at the exact same instant, because the interpreter executes statements one at a time.
One possible solution is to start the threads one by one and have each thread wait on a shared global flag before doing its work. Once the flag becomes True, all threads begin their work at (nearly) the same time. Make sure to set the flag to True only AFTER starting all of the threads, i.e.:
flag = False  # shared start signal, defined at module level

def send_req():
    global flag
    while not flag:
        pass  # stay here until the flag becomes True
    start = time.time()
    response = http_lib.request(ip, port, headers, body, url)
    end = time.time()
    response_time = end - start
    print "Response Time: ", response_time

def main():
    global flag
    flag = False
    thread_list = []
    for thread in range(10):
        t = threading.Thread(target=send_req)  # creating threads one by one
        #t.start()
        thread_list.append(t)
    for j in thread_list:  # now starting threads (still one by one)
        j.start()
    flag = True  # release every thread at once by flipping the flag from False to True
    for i in thread_list:
        i.join()
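A note on the design: the busy-wait loop burns CPU in every thread until the flag flips. The standard-library alternative is threading.Event, whose wait() blocks without spinning and whose set() wakes all waiters at once. A minimal sketch of the same start-together pattern, with a sleep standing in for the asker's http_lib request utility:

import threading
import time

start_signal = threading.Event()

def send_req(n):
    start_signal.wait()  # block here until set() is called; no busy-waiting
    start = time.time()
    time.sleep(0.1)      # stand-in for the real HTTP request
    print "Thread %d response time: %f" % (n, time.time() - start)

threads = [threading.Thread(target=send_req, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
start_signal.set()       # release every waiting thread at once
for t in threads:
    t.join()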

how to quit python script after multiprocessing processes are done?

Update: with the help of dano, I solved this problem.
I hadn't made the producers daemonic, which made my script hang.
I only needed to add one line, as dano said:
...
producer = multiprocessing.Process(target=produce,args=(file_queue,row_queue))
producer.daemon = True
producer.start()
...
Old script:
import multiprocessing
import Queue
import gzip
import csv
import sys

QUEUE_SIZE = 2000

def produce(file_queue, row_queue):
    while not file_queue.empty():
        src_file = file_queue.get()
        zip_reader = gzip.open(src_file, 'rb')
        try:
            csv_reader = csv.reader(zip_reader, delimiter=SDP_DELIMITER)
            for row in csv_reader:
                new_row = process_sdp_row(row)
                if new_row:
                    row_queue.put(new_row)
        finally:
            zip_reader.close()

def consume(row_queue):
    '''processes all rows; once the queue is empty, break the infinite loop'''
    while True:
        try:
            # takes a row from the queue and processes it
            pass
        except multiprocessing.TimeoutError as toe:
            print "timeout, all rows have been processed, quit."
            break
        except Queue.Empty:
            print "all rows have been processed, quit."
            break
        except Exception as e:
            print "critical error"
            print e
            break

def main(args):
    file_queue = multiprocessing.Queue()
    row_queue = multiprocessing.Queue(QUEUE_SIZE)
    file_queue.put(file1)
    file_queue.put(file2)
    file_queue.put(file3)
    # starts 3 producers
    for i in xrange(3):
        producer = multiprocessing.Process(target=produce, args=(file_queue, row_queue))
        producer.start()
    # starts 1 consumer
    consumer = multiprocessing.Process(target=consume, args=(row_queue,))
    consumer.start()
    # blocks main thread until consumer process finishes
    consumer.join()
    # prints statistics results after consumer is done
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1:])
Purpose:
I am using Python 2.7 multiprocessing to spawn 3 producers that read 3 files at the same time and put the file lines into a row_queue, plus 1 consumer that does further processing on all the rows. Statistics are printed in the main thread after the consumer is done, which is why I use the join() method, and finally sys.exit(0) is invoked to quit the script.
Problem:
The script cannot quit.
I tried replacing sys.exit(0) with print "the end", and "the end" was displayed on the console. Am I doing something wrong? Why doesn't the script quit, and how do I make it quit? Thanks
Your producers do not have the multiprocessing.Process.daemon property set:
daemon
The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.daemon
Just add producer.daemon = True:
...
producer = multiprocessing.Process(target=produce,args=(file_queue,row_queue))
producer.daemon = True
producer.start()
...
That should make it possible for the whole program to end when the consumer is joined.
By the way, you should probably join the producers too.
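For example, a sketch of that suggestion applied to the main() function above, keeping references to the producer processes so they can be joined once the consumer is done (the daemon flag from the fix is kept; the explicit joins just make the shutdown order clear):

    producers = []
    for i in xrange(3):
        producer = multiprocessing.Process(target=produce, args=(file_queue, row_queue))
        producer.daemon = True
        producer.start()
        producers.append(producer)
    consumer = multiprocessing.Process(target=consume, args=(row_queue,))
    consumer.start()
    consumer.join()            # wait for the consumer to finish first
    for producer in producers:
        producer.join()        # then reap the producers explicitly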

Python threading app not terminating

I have a simple Python app that will not terminate if I use queue.join(). Below is the code:
import threading
import Queue

q = Queue.Queue()
for i in range(5):
    q.put("BLAH")

def worker():
    while True:
        print q.qsize()
        a = q.get()
        print q.qsize()
        q.task_done()
        print q.qsize()

for i in range(2):
    t = threading.Thread(target=worker())
    t.daemon = True
    t.start()

q.join()
I've also created a watchdog thread that prints threading.enumerate() and then sleeps for 2 seconds. The only thread left is the MainThread, and the queue size is in fact 0. This script never terminates; I have to Ctrl+Z and then kill it. What's going on?
t = threading.Thread(target=worker)
You want to pass a reference to the worker function; you should not call it. As written, target=worker() calls worker immediately in the main thread, which drains the queue and then blocks forever on q.get() before any thread is ever constructed.
The worker function never exits, therefore the thread will never join. Second, you probably want to join the threads, not the queue.
I'm not an expert in Python threading, but the queue is just for passing data between threads.
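For reference, a sketch of the first answer's fix applied to the script, passing the function object so each thread runs worker itself (everything else is kept from the question, minus the qsize prints):

import threading
import Queue

q = Queue.Queue()
for i in range(5):
    q.put("BLAH")

def worker():
    while True:
        a = q.get()
        q.task_done()

for i in range(2):
    t = threading.Thread(target=worker)  # pass the function, don't call it
    t.daemon = True                      # daemon threads die when the main thread exits
    t.start()

q.join()  # returns once task_done() has been called for every item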

How can I implement a multi-producer, multi-consumer paradigm in Gevent?

I have some producer functions that rely on I/O-heavy blocking calls, and some consumer functions that also rely on I/O-heavy blocking calls. To speed them up, I used the Gevent micro-threading library as glue.
Here's what my paradigm looks like:
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0, 2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

# This doesn't work.
for j in range(2):
    producers.append(gevent.spawn(producer))

# Uncommenting this makes this script work.
#producer()

q.join()
I have four consumers and would like to have two producers. The producers exit when they receive a signal, i.e. the value 10. The consumers keep feeding off this queue, and the whole task finishes when the producers and consumers are done.
However, this doesn't work. If I comment out the for loop that spawns multiple producers and use only a single producer, the script runs fine.
I can't seem to figure out what I've done wrong.
Any ideas?
Thanks
You don't actually want to quit when the queue has no unfinished work, because conceptually that's not when the application should finish.
You want to quit when the producers have finished, and then when there is no unfinished work.
# Wait for all producers to finish producing
gevent.joinall(producers)
# *Now* we want to make sure there's no unfinished work
q.join()
# We don't care about workers. We weren't paying them anything, anyways
gevent.killall(workers)
# And, we're done.
I think it calls q.join() before anything is put in the queue and exits immediately. Try joining all the producers before joining the queue.
What you want to do is block the main program while the producers and workers communicate. Blocking on the queue waits until the queue is empty and then yields, which could happen immediately. Put this at the end of your program instead of q.join():
gevent.joinall(producers)
I have met the same issue as yours. The main problem with your code is that the producers are spawned as gevent greenlets, so the workers may run before any task has been put on the queue.
I suggest running producer() directly in the main flow instead of spawning it as a greenlet, so that the tasks are pushed onto the queue immediately:
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0, 2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

producer()

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

q.join()  # wait until all queued tasks have been processed
The code above makes sense. :)
