When I use the eventlet package to run a task with multiple coroutines, the program does not continue even after the coroutine pool is empty; it gets stuck in the loop. My code is below; the last line never gets executed.
import eventlet
global count
post_id=[]
last_id=0
def download(post_id):
    global count
    print "coroutines :",post_id
    if count<last_id:
        count=count+1
        q.put(count) # put new coroutines in the queue
pool = eventlet.GreenPool()
q = eventlet.Queue()
for i in range(100,200):
    post_id.append(i)
for i in range(0,5):
    q.put(post_id[i]) # keep 6 coroutines in the pool
count=post_id[5]
last_id=200
while not q.empty() or pool.running()!=0:
    pool.spawn_n(download,q.get()) # start coroutines
print "The end" # never reaches this line
The last line never gets executed because your final call to q.get() blocks forever, waiting for something to be added to the queue. There are a few ways you could fix this, including passing a timeout value to get. I think the cleanest solution is to wait for the current tasks to finish whenever the queue is empty, before attempting another iteration of the loop:
while not q.empty():
    pool.spawn_n(download, q.get())
    if q.empty(): pool.waitall()
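The timeout alternative mentioned above can be sketched as follows. This is only an illustration in Python 3 using the standard library's queue.Queue as a stand-in for eventlet.Queue (an assumption made so the snippet runs without eventlet); the drain helper and its 0.1 second timeout are hypothetical.

```python
import queue

def drain(q, timeout=0.1):
    """Collect items until get() times out instead of blocking forever."""
    items = []
    while True:
        try:
            # Give up after `timeout` seconds if nothing arrives
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return items

q = queue.Queue()
for i in range(3):
    q.put(i)

drained = drain(q)
print(drained)
```

The trade-off is that a timeout has to be long enough for slow producers, which is why waiting on the pool is usually the cleaner fix here.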
Related
So basically, I have this function th() which counts up to a certain number and then prints "done".
I want to start n such threads at the same time, running simultaneously.
So I wrote:
thread_num = 3 #here n is 3, but I'd normally want something way higher
thrds = []
i = 0
while i < thread_num:
    thr = Thread(target=th, args=())
    thrds.append(thr)
    i += 1
    print("thread", str(i), "added")
for t in thrds:
    t.start()
    t.join()
I want all the threads to print "done" at the same time, but there is a noticeable lag between them. They print "thread i started" at seemingly the same time, but print "done" with quite a bit of lag.
Why is this happening?
Edit: Since someone asked me to add th() function as well, here it is:
def th():
    v = 0
    num = 10**7
    while v < num:
        v += 1
    print("done")
This happens because you call t.join() on each thread before starting the next one. t.join() blocks the calling thread until thread t has finished, so each thread starts only after the previous one has finished.
You first have to start all the threads, then join all the threads in separate for loops; otherwise, each thread starts but runs to completion due to join before starting another thread.
for t in thrds: # start all the threads
    t.start()
for t in thrds: # wait for all threads to finish
    t.join()
If you only have a simple counting thread, you may need to add a short sleep to actually see the threads' output intermingle, as they may still run fast enough to complete before another thread starts.
Because you start and join each thread sequentially, one thread will run to completion before the next even starts. You'd be better off running a thread pool which is a more comprehensive implementation that handles multiple issues in multithreading.
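As a minimal sketch of the thread-pool suggestion, assuming Python 3's concurrent.futures.ThreadPoolExecutor is the kind of pool meant (the answer does not name one) and using a hypothetical count_to helper in place of th():

```python
from concurrent.futures import ThreadPoolExecutor

def count_to(num):
    """Hypothetical stand-in for th(): count up to num and return the total."""
    v = 0
    while v < num:
        v += 1
    return v

# The pool starts the workers, and leaving the with-block waits for all of
# them, so there is no separate start()/join() bookkeeping to get wrong.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(count_to, [10**4] * 3))

print(results)
```

pool.map returns results in submission order, regardless of which worker finishes first.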
Because of memory management and object reference count issues, Python (CPython, specifically) only lets a single thread execute bytecode at a time. Periodically, each thread will release and reacquire the Global Interpreter Lock (GIL) to let other threads run. Exactly which thread runs at any given time is up to the operating system and you may find one gets more slices than another, causing staggered results.
To get them all to print "done" at the same time, you could use a control structure like a barrier for threads to wait until all are done. With a barrier, all threads must call wait before any can continue.
import threading
from threading import Thread
thread_num = 3 #here n is 3, but I'd normally want something way higher
wait_done = threading.Barrier(thread_num)
def th(waiter):
    x = 1 # do what you want
    waiter.wait() # block until all thread_num threads have reached this point
    print("done")
thrds = []
i = 0
while i < thread_num:
    thr = Thread(target=th, args=(wait_done,))
    thrds.append(thr)
    i += 1
    print("thread", str(i), "added")
for t in thrds:
    t.start()
for t in thrds:
    t.join()
I have some simple code modelling a more complicated problem I need to solve. There are three functions: a worker, a task submitter (which looks for tasks and puts them on a queue as new ones arrive), and a function that creates a pool and adds new tasks to it. But the code does not finish running after the queue becomes empty and all the tasks in the list are done. I cannot figure out why the while loop does not terminate on its condition. I have tried several different ways to code this, and nothing works.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading
def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)
def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')
def execution():
    executor = Tpe(max_workers=4)
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = [executor.submit(worker, q.get())]
    execution_finished = False
    while not execution_finished: #all([task.done() for task in tasks]):
        if not all([task.done() for task in tasks]):
            print(' still in progress .....................')
            tasks.append(executor.submit(worker, q.get()))
        else:
            print(' all done!')
            executor.shutdown()
            execution_finished = True
execution()
It doesn't terminate because you are trying to remove an item from an empty queue. The problem is here:
while not execution_finished:
    if not all([task.done() for task in tasks]):
        print(' still in progress .....................')
        tasks.append(executor.submit(worker, q.get()))
The last line here submits a new work item to the executor. Suppose that happens to be the last item in the queue. At that moment, the executor is not finished and will not be finished for a few seconds. Your main thread goes back to the while not execution_finished line, and the if statement evaluates true because some of the tasks are still running. So you try to submit one more item but you can't, because the queue is now empty. The call to q.get blocks the main loop until the queue contains an item, which never happens. The other threads finish but the program doesn't exit because the main thread is blocked.
Perhaps you should check for an empty queue, but I'm not sure that's the right idea because I probably don't understand your requirements. In any case, that's why your script doesn't exit.
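One way to sketch that idea, assuming the goal is simply to run every queued task and then exit: have the producer put a sentinel on the queue when it has nothing more to add, so the consumer loop knows when to stop calling q.get(). The _DONE sentinel, the n_tasks parameter, and the shortened delays are assumptions for illustration, not part of the original script.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # hypothetical sentinel meaning "no more tasks will arrive"

def task_submit(q, n_tasks=3):
    for i in range(n_tasks):
        q.put(0.01 * (i + 1))  # short durations so the sketch finishes quickly
        time.sleep(0.01)
    q.put(_DONE)               # tell the consumer to stop waiting

def worker(t):
    time.sleep(t)
    return t

def execution():
    q = queue.Queue()
    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.submit(task_submit, q)
        tasks = []
        # iter(q.get, _DONE) keeps calling q.get() until it returns _DONE
        for t in iter(q.get, _DONE):
            tasks.append(executor.submit(worker, t))
        # leaving the with-block waits for the remaining workers
    return [task.result() for task in tasks]

finished = execution()
print(len(finished), "tasks finished")
```

With a sentinel, the main thread never has to guess whether an empty queue means "finished" or "nothing yet", which is the ambiguity that makes polling q.empty() fragile.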
I have the following script (ignore the contents):
import _thread
def func1(arg1, arg2):
    print("Write to CLI")
def verify_result():
    func1()
for _ in range (4):
    _thread.start_new_thread(func1, (DUT1_CLI, '0'))
verify_result()
I want to run func1() concurrently in, say, 4 threads; in my case it includes a function call that can take a while to execute. Only after the last thread has finished its work do I want to execute verify_result().
Currently, all threads do finish their job, but verify_result() is executed before they have all finished.
I have even tried the following code under the for loop (with threading imported, of course), but that didn't work either (ignore the arguments):
t = threading.Thread(target = Enable_WatchDog, args = (URL_List[x], 180, Terminal_List[x], '0'))
t.start()
t.join()
Your last threading example is close, but you have to collect the threads in a list, start them all at once, then wait for them to complete all at once. Here's a simplified example:
import threading
import time
# Lock to serialize console output
output = threading.Lock()
def threadfunc(a,b):
    for i in range(a,b):
        time.sleep(.01) # sleep to make the "work" take longer
        with output:
            print(i)
# Collect the threads
threads = []
for i in range(10,100,10):
    # Create 9 threads counting 10-19, 20-29, ... 90-99.
    thread = threading.Thread(target=threadfunc,args=(i,i+10))
    threads.append(thread)
# Start them all
for thread in threads:
    thread.start()
# Wait for all to complete
for thread in threads:
    thread.join()
Say you have a list of threads.
You loop over them and call
for each_thread in thread_pool:
    each_thread.start()
to start execution of the run function within each thread.
In the same way, you write another loop after you have started all the threads:
for each_thread in thread_pool:
    each_thread.join()
What join() does is block the calling thread until the given thread has finished; it does not hold back the other threads, which keep running in the meantime.
The threads run concurrently; the join() loop just makes the caller wait until every one of them is done.
In your case specifically, you can run the join() loop and then call verify_result().
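Put together for the question's case, this might look like the sketch below. The bodies of func1 and verify_result are placeholders (the real ones are not shown in the question), and the results list and its lock are hypothetical additions so the outcome can be checked.

```python
import threading

results = []
results_lock = threading.Lock()

def func1(arg1, arg2):
    # Placeholder for the slow call; it only records that it ran
    with results_lock:
        results.append((arg1, arg2))

def verify_result():
    # Placeholder check: report how many func1 calls completed
    return len(results)

threads = [threading.Thread(target=func1, args=("DUT1_CLI", "0")) for _ in range(4)]
for t in threads:   # start every thread first
    t.start()
for t in threads:   # then wait for every thread
    t.join()

verified = verify_result()  # runs only after all four threads are done
print(verified)
```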
This code prints nothing:
def foo(i):
    print i
def main():
    pool = eventlet.GreenPool(size=100)
    for i in xrange(100):
        pool.spawn_n(foo, i)
    while True:
        pass
But this code prints numbers:
def foo(i):
    print i
def main():
    pool = eventlet.GreenPool(size=100)
    for i in xrange(100):
        pool.spawn_n(foo, i)
    pool.waitall()
    while True:
        pass
The only difference is pool.waitall(). In my mind, waitall() means wait until all greenthreads in the pool are finished working, but an infinite loop waits for every greenthread, so pool.waitall() is not necessary.
So why does this happen?
Reference: http://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.waitall
The threads created in an eventlet GreenPool are green threads. This means that they all exist within one thread at the operating-system level, and eventlet itself, rather than the operating system, handles switching between them. This switching can only happen when one thread either yields (deliberately provides an opportunity for other threads to run) or is waiting for I/O.
When your code runs:
while True:
    pass
… that thread of execution is blocked – stuck on that code – and no other green threads can get scheduled.
When you instead run:
pool.waitall()
… eventlet makes sure that it yields while waiting.
You could emulate this same behaviour by modifying your while loop slightly to call the eventlet.sleep function, which yields:
while True:
    eventlet.sleep()
This could be useful if you wanted to do something else in the while True: loop while waiting for the threads in your pool to complete. Otherwise, just use pool.waitall() – that’s what it’s for.
I am trying to get some code working where I can implement logging into a multi-threaded program using gevent. What I'd like to do is set up custom logging handlers to put log events into a Queue, while a listener process is continuously watching for new log events to handle appropriately. I have done this in the past with Multiprocessing, but never with Gevent.
I'm having an issue where the program is getting caught up in the infinite loop (listener process), and not allowing the other threads to "do work"...
Ideally, after the worker processes have finished, I can pass an arbitrary value to the listener process to tell it to break the loop, and then join all the processes together. Here's what I have so far:
import gevent
from gevent.pool import Pool
import Queue
import random
import time
def listener(q):
    while True:
        if not q.empty():
            num = q.get()
            print "The number is: %s" % num
            if num <= 100:
                print q.get()
            # got passed 101, break out
            else:
                break
        else:
            continue
def worker(pid,q):
    if pid == 0:
        listener(q)
    else:
        gevent.sleep(random.randint(0,2)*0.001)
        num = random.randint(1,100)
        q.put(num)
def main():
    q = Queue.Queue()
    all_threads = []
    all_threads = [gevent.spawn(worker, pid,q) for pid in xrange(10)]
    gevent.wait(all_threads[1:])
    q.put(101)
    gevent.joinall(all_threads)
if __name__ == '__main__':
    main()
As I said, the program seems to be getting hung up on that first process and does not allow the other workers to do their thing. I have also tried spawning the listener process completely separately itself (which is actually how I would rather do it), but that didn't seem to work either so I tried this way.
Any help would be appreciated, feel like I am probably just missing something obvious about gevent's back end.
Thanks
The first problem is that your listener is never yielding if the queue is initially empty. The first task you spawn is your listener. When it starts, there's a while True:, the q will be empty, so you go to the else branch, which just continues, looping back to the start of the while loop, and then the q is still empty. So you just sit in the first thread constantly checking the q is empty.
The key thing here is that gevent does not use "native" threads or processes. Unlike "real" threads, which can be switched to at any time by something behind the scenes (like your OS scheduler), gevent uses 'greenlets', which require that you do something to "yield control" to another task. That something is whatever gevent thinks would block, such as read from the network, disk, or use one of the blocking gevent operations.
One crude fix would be to start your listener when pid == 9 rather than 0. By making it spawn last, there will be items in the q, and it will go into the main if branch. The downside is that this doesn't fix the logic problem, so the first time the queue is empty, you'll get stuck in your infinite loop again.
A more correct fix would be to put gevent.sleep() instead of continue. sleep is a blocking operation, so your other tasks will get a chance to run. Without arguments, it waits for no time, but still gives gevent the chance to decide to switch to another task if it is ready to run. This still isn't very efficient, though, as if the Queue is empty, it's going to spend a lot of pointless time checking that over and over and asking to run again as soon as it can. sleep'ing for longer than the default of 0 will be more efficient, but would delay processing your log messages.
However, you can instead take advantage of the fact that many of gevent's types, such as Queue, can be used in more Pythonic ways and make your code a lot simpler and easier to understand, as well as more efficient.
import random
import gevent
from gevent.queue import Queue
def listener(q):
    # Iterating over the queue blocks (and yields) until an item arrives
    for msg in q:
        print "the number is %d" % msg
def worker(pid,q):
    gevent.sleep(random.randint(0,2)*0.001)
    num = random.randint(1,100)
    q.put(num)
def main():
    q = Queue()
    listener_task = gevent.spawn(listener, q)
    worker_tasks = [gevent.spawn(worker, pid, q) for pid in xrange(1, 10)]
    gevent.wait(worker_tasks)
    q.put(StopIteration) # sentinel that ends the listener's for loop
    listener_task.join() # wait for the listener to exit
Here, Queue can operate as an iterator in a for loop. As long as there are messages, it will get an item, run the loop, and then wait for another item. If there are no items, it will just block and hang around until the next one arrives. Since it blocks, though, gevent will switch to one of your other tasks to run, avoiding the infinite loop problem your example code has.
Because this version is using the Queue as a for loop iterator, there's also automatically a nice sentinel value we can put in the queue to make the listener task quit. If a for loop gets StopIteration from its iterator, it will exit cleanly. So when our for loop that's reading from q gets StopIteration from the q, it exits, and then the function exits, and the spawned task is finished.