Python threading app not terminating

I have a simple Python app that will not terminate if I use queue.join(). Below is the code:
import threading
import Queue

q = Queue.Queue()
for i in range(5):
    q.put("BLAH")

def worker():
    while True:
        print q.qsize()
        a = q.get()
        print q.qsize()
        q.task_done()
        print q.qsize()

for i in range(2):
    t = threading.Thread(target=worker())
    t.daemon = True
    t.start()

q.join()
I've also created a watchdog thread that prints threading.enumerate(), then sleeps for 2 seconds. The only thread left is the MainThread, and the queue size is in fact 0. This script never terminates; I have to Ctrl+Z and then kill it. What's going on?
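(For reference, a minimal sketch of such a watchdog thread, since the asker's exact watchdog code isn't shown; Python 2 to match the script above:)
import threading
import time

def watchdog():
    # Report the live threads every 2 seconds.
    while True:
        print threading.enumerate()
        time.sleep(2)

wd = threading.Thread(target=watchdog)
wd.daemon = True  # don't let the watchdog itself keep the program alive
wd.start()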

t = threading.Thread(target=worker)
You want to pass a reference to the worker function; you should not call it. With target=worker(), worker() is executed in the main thread while the Thread object is still being constructed: it drains the queue and then blocks forever on q.get(), so no worker threads are ever started. That is why the watchdog only ever sees MainThread.
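For completeness, here is a sketch of the script above with just that fix applied; with daemon workers and q.join(), it terminates once task_done() has been called once per put():
import threading
import Queue

q = Queue.Queue()
for i in range(5):
    q.put("BLAH")

def worker():
    while True:
        a = q.get()
        q.task_done()

for i in range(2):
    t = threading.Thread(target=worker)  # pass the function itself, don't call it
    t.daemon = True
    t.start()

q.join()  # returns once all five items have been marked task_done()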

The worker function does not exit, therefore the thread will not join. Second, you probably want to join the thread, not the queue.
I'm not an expert in Python threading, but the queue is just for passing data between threads.

Related

Why is the queue still joining after I called task_done()?

Python 3.6
First I put some items in a queue, then started a thread and called the queue's join() in the main thread, then called get() in the thread's loop; when the queue size reached 0, I called task_done(), broke out of the loop, and exited the thread. But join() still blocks in the main thread. I can't figure out what's wrong.
Below is the code.
Thanks
import queue
import threading

def worker(work_queue):
    while True:
        if work_queue.empty():
            print("Task 1 Over!")
            work_queue.task_done()
            break
        else:
            _ = work_queue.get()
            print(work_queue.qsize())
            # do actual work

def main():
    work_queue = queue.Queue()
    for i in range(10):
        work_queue.put("Item %d" % (i + 1))
    t = threading.Thread(target=worker, args=(work_queue,))
    t.setDaemon(True)
    t.start()
    print("Main Thread 1")
    work_queue.join()
    print("Main Thread 2")
    t.join()
    print("Finish!")

if __name__ == "__main__":
    main()
task_done() should be called once for each work item that is dequeued and processed, not once the queue is entirely empty. (There'd be no reason for that; the queue already knows when it's empty.) join() will block until task_done() has been called as many times as put() was called.
So:
def worker(work_queue):
    while True:
        if work_queue.empty():
            print("Task 1 Over!")
            break
        else:
            _ = work_queue.get()
            print(work_queue.qsize())
            # do actual work
            work_queue.task_done()  # once per processed item, so join() can return
Note that it's weird for a worker to exit as soon as it sees an empty queue. Normally it would get() with blocking, and only exit when it got a "time to exit" work item out of the queue.
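For illustration, a minimal sketch of that sentinel pattern adapted to this question (the SENTINEL name and single-worker setup are assumptions, not part of the original post):
import queue
import threading

SENTINEL = None  # any unique "time to exit" marker works

def worker(work_queue):
    while True:
        item = work_queue.get()    # block until an item arrives
        if item is SENTINEL:
            work_queue.task_done()  # account for the sentinel itself
            break
        # do actual work on item
        print("processed", item)
        work_queue.task_done()

work_queue = queue.Queue()
for i in range(10):
    work_queue.put("Item %d" % (i + 1))
work_queue.put(SENTINEL)  # one sentinel per worker thread

t = threading.Thread(target=worker, args=(work_queue,))
t.start()
work_queue.join()  # 11 puts matched by 11 task_done() calls
t.join()
With one sentinel per worker, each worker exits cleanly and both join() calls return.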

Killing finished threads in Python

My multi-threading script raises this error:
thread.error: can't start new thread
when it reaches 460 threads:
threading.active_count() == 460
I assume the old threads keep stacking up, since the script doesn't kill them. This is my code:
import threading
import Queue
import time
import os
import csv

def main(worker):
    #Do Work
    print worker
    return

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    global q
    global city
    q = Queue.Queue()
    for x in range(20):
        t = threading.Thread(target=threader)
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()
How do I kill the old threads when their job is done? (Is return not enough?)
I'm sure the old threads' work is done, as I'm printing the results, but I'm not sure why they are still active afterwards. Is there a direct way to kill a thread after it finishes its work?

Understanding Python Queues and Setting threads to run as a Daemon thread

Let's say I have the below code:
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    while True:
        if queue.empty(): break
        print "Starting %s" % (threading.currentThread().getName()) + "\n"
        item = queue.get()
        ##do_work on item which might take 10-15 minutes to complete
        queue.task_done()
        print "Ending %s" % (threading.currentThread().getName()) + "\n"

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        t = threading.Thread(target=basic_worker, args=(queue, tName,))
        t.daemon = True
        t.start()
    queue.join()  # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"
print "End of program"
My question is: if I set t.daemon = True, will the program exit, killing the threads that are still taking 10-15 minutes to work on an item from the queue? From what I have read, the program will exit if there are any daemonic threads alive. My understanding is that the threads working on a long-running item would then also exit incompletely. If I don't set t.daemon = True, my program hangs forever and doesn't exit when there are no items left in the queue.
The reason why the program hangs forever if t.daemon = False is that the following code block ...
if queue.empty(): break
... leads to a race condition.
Imagine there is only one item left in the queue and two threads evaluate the condition above nearly simultaneously. The condition evaluates to False for both threads, so neither breaks.
The faster thread gets the last item, while the slower one hangs forever in the statement item = queue.get().
Since daemon mode is off, the program waits for all threads to finish. That never happens.
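As a side note (a sketch, not part of the original answer): if one did want workers to stop when the queue drains, a non-blocking get() avoids this race, because the check and the dequeue happen in a single call:
import Queue

def basic_worker(queue, thread_name):
    while True:
        try:
            item = queue.get(block=False)  # atomic check-and-dequeue
        except Queue.Empty:
            break  # queue drained; exit cleanly
        ##do_work on item
        queue.task_done()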
From my point of view, the code you provided (with t.daemon = True) works fine.
The following sentence may be what confuses you:
The entire Python program exits when no alive non-daemon threads are left.
... but consider: if you start all threads from the main thread with t.daemon = True, the only non-daemon thread is the main thread itself. So the program exits when the main thread is finished.
... and that does not happen until the queue is empty, because of the queue.join() statement. So your long-running computations inside the child threads will not be interrupted.
There is no need to check queue.empty() when using daemon threads and queue.join().
This should be enough:
#!/bin/python
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    print "Starting %s" % (threading.currentThread().getName()) + "\n"
    while True:
        item = queue.get()
        ##do_work on item which might take 10-15 minutes to complete
        time.sleep(5)  # to simulate work
        queue.task_done()

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        print 'enqueuing', i
        t = threading.Thread(target=basic_worker, args=(queue, i))
        t.daemon = True
        t.start()
    queue.join()  # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"

Mutual exclusion thread locking, with dropping of queued functions upon mutex/lock release, in Python?

This is the problem I have: I'm using Python 2.7, and I have code which runs in a thread and which has a critical region that only one thread should execute at a time. That code currently has no mutex mechanism, so I wanted to ask what I could use for my specific use case, which involves "dropping" "queued" functions. I've tried to simulate the behavior I want with the following minimal working example:
useThreading = False  # True
if useThreading: from threading import Thread, Lock
else: from multiprocessing import Process, Lock
mymutex = Lock()
import time
tstart = None

def processData(data):
    #~ mymutex.acquire()
    try:
        print('thread {0} [{1:.5f}] Do some stuff'.format(data, time.time()-tstart))
        time.sleep(0.5)
        print('thread {0} [{1:.5f}] 1000'.format(data, time.time()-tstart))
        time.sleep(0.5)
        print('thread {0} [{1:.5f}] done'.format(data, time.time()-tstart))
    finally:
        #~ mymutex.release()
        pass

# main:
tstart = time.time()
for ix in xrange(0, 3):
    if useThreading: t = Thread(target=processData, args=(ix,))
    else: t = Process(target=processData, args=(ix,))
    t.start()
    time.sleep(0.001)
Now, if you run this code, you get a printout like this:
thread 0 [0.00173] Do some stuff
thread 1 [0.00403] Do some stuff
thread 2 [0.00642] Do some stuff
thread 0 [0.50261] 1000
thread 1 [0.50487] 1000
thread 2 [0.50728] 1000
thread 0 [1.00330] done
thread 1 [1.00556] done
thread 2 [1.00793] done
That is to say, the three threads quickly get "queued" one after another (something like 2-3 ms apart). Actually, they don't really get queued; they simply start executing in parallel, 2-3 ms apart.
Now, if I enable the mymutex.acquire()/.release() commands, I get what would be expected:
thread 0 [0.00174] Do some stuff
thread 0 [0.50263] 1000
thread 0 [1.00327] done
thread 1 [1.00350] Do some stuff
thread 1 [1.50462] 1000
thread 1 [2.00531] done
thread 2 [2.00547] Do some stuff
thread 2 [2.50638] 1000
thread 2 [3.00706] done
Basically, now with locking the threads don't run in parallel; they run one after another thanks to the lock. As long as one thread is working, the others block at .acquire(). But this is not exactly what I want to achieve, either.
What I want to achieve is this: let's assume that when .acquire() is first triggered by a thread function, it registers an id of that function (say, a pointer to it) in a queue. After that, the behavior is basically the same as with the Lock: while the one thread works, the others block at .acquire(). When the first thread is done, it goes into the finally: block, and here I'd like to check how many threads are waiting in the queue; then I'd like to delete/drop all waiting threads except for the very last one, and finally .release() the lock, so that what was the last thread in the queue executes next. I imagine I'd want to write something like the following pseudocode:
...
finally:
    if len(mymutex.queue) > 2:  # more than this instance plus one other waiting:
        while len(mymutex.queue) > 2:
            mymutex.queue.pop(1)  # leave alone [0]=this instance, remove next element
    # at this point, there should be only queue[0]=this instance, and queue[1]=what was the last thread queued previously
    mymutex.release()  # once we release, queue[0] should be gone, and the next in the queue should acquire the mutex/lock..
    pass
...
With that, I'd expect a printout like this:
thread 0 [0.00174] Do some stuff
thread 0 [0.50263] 1000
thread 0 [1.00327] done
# here upon lock release, thread 1 would be deleted - and the last one in the queue, thread 2, would acquire the lock next:
thread 2 [1.00350] Do some stuff
thread 2 [1.50462] 1000
thread 2 [2.00531] done
What would be the most straightforward way to achieve this in Python?
Seems like you want a queue-like behaviour, so why not use Queue?
import threading
from Queue import Queue
import time

# threads advertise to this queue when they're waiting
wait_queue = Queue()
# threads get their task from this queue
task_queue = Queue()

def do_stuff():
    print "%s doing stuff" % str(threading.current_thread())
    time.sleep(5)

def queue_thread(sleep_time):
    # advertise current thread waiting
    time.sleep(sleep_time)
    wait_queue.put("waiting")
    # wait for permission to pass
    message = task_queue.get()
    print "%s got task: %s" % (threading.current_thread(), message)
    # unregister current thread waiting
    wait_queue.get()
    if message == "proceed":
        do_stuff()
        # kill size-1 threads waiting
        for _ in range(wait_queue.qsize() - 1):
            task_queue.put("die")
        # release last
        task_queue.put("proceed")
    if message == "die":
        print "%s died without doing stuff" % threading.current_thread()

t1 = threading.Thread(target=queue_thread, args=(1,))
t2 = threading.Thread(target=queue_thread, args=(2,))
t3 = threading.Thread(target=queue_thread, args=(3,))
t4 = threading.Thread(target=queue_thread, args=(4,))

# allow first thread to pass
task_queue.put("proceed")

t1.start()
t2.start()
t3.start()
t4.start()
Thread 1 arrives first and "acquires" the section; the other threads arrive later and wait at the queue (advertising that they're waiting). Then, when thread 1 leaves, it gives permission to the last thread in the queue by telling all the other waiting threads to die and the last one to proceed.
You can have finer control using different messages; a typical one would be a thread id in the wait_queue (so you know who is waiting, and the order in which they arrived).
You can probably utilize the non-blocking operations (queue.put(block=False) and queue.get(block=False)) in your favour once you're set on what you need.
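For example (a rough sketch, not part of the original answer; drain_waiters is a hypothetical helper, and it changes the protocol so that the finishing thread, rather than each waiter, empties wait_queue):
from Queue import Queue, Empty

def drain_waiters(wait_queue, task_queue):
    # Count the waiting threads without blocking.
    waiting = 0
    while True:
        try:
            wait_queue.get(block=False)
            waiting += 1
        except Empty:
            break
    # Tell all but the last arrival to die, then let one proceed.
    for _ in range(max(waiting - 1, 0)):
        task_queue.put("die")
    if waiting:
        task_queue.put("proceed")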

How can I implement a multi-producer, multi-consumer paradigm in Gevent?

I have some producer functions which rely on I/O-heavy blocking calls and some consumer functions which also rely on I/O-heavy blocking calls. In order to speed them up, I used the gevent micro-threading library as glue.
Here's what my paradigm looks like:
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0, 2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

# This doesn't work:
for j in range(2):
    producers.append(gevent.spawn(producer))

# Uncommenting this makes the script work:
#producer()

q.join()
I have four consumers and would like to have two producers. The producers exit when they receive a signal, i.e. 10. The consumers keep feeding off this queue, and the whole task finishes when the producers and consumers are done.
However, this doesn't work. If I comment out the for loop which spawns multiple producers and use only a single producer, the script runs fine.
I can't seem to figure out what I've done wrong.
Any ideas?
Thanks
You don't actually want to quit when the queue has no unfinished work, because conceptually that's not when the application should finish.
You want to quit when the producers have finished, and then when there is no unfinished work.
# Wait for all producers to finish producing
gevent.joinall(producers)
# *Now* we want to make sure there's no unfinished work
q.join()
# We don't care about workers. We weren't paying them anything, anyways
gevent.killall(workers)
# And, we're done.
I think it does q.join() before anything is put in the queue and therefore exits immediately. Try joining all the producers before joining the queue.
What you want to do is block the main program while the producers and workers communicate. Blocking on the queue will wait until the queue is empty and then yield, which could happen immediately. Put this at the end of your program instead of q.join():
gevent.joinall(producers)
I have run into the same issue. The main problem with your code was that the producer was spawned in a gevent greenlet, which meant the workers couldn't get any tasks immediately.
I suggest running producer() in the main flow rather than spawning it in a greenlet, so that the tasks are already pushed by the time the workers run:
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0, 2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

producer()  # runs in the main flow, so the queue is filled before the workers start

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

q.join()  # restored from the question's script; without it the main greenlet exits before the workers run
The code above makes sense. :)
