I have a simple piece of code modelling a more complicated problem I need to solve. There are three functions here: a worker, a task submitter (which looks for new tasks and puts them on a queue as it finds them), and a function that creates the pool and feeds it the tasks. But the run never finishes, even once the queue is empty and every task in the list is done. I can't for the life of me see why the while loop with that condition never terminates. I have tried coding this several different ways; nothing works.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading
def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)

def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')

def execution():
    executor = Tpe(max_workers=4)
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = [executor.submit(worker, q.get())]
    execution_finished = False
    while not execution_finished:  # all([task.done() for task in tasks]):
        if not all([task.done() for task in tasks]):
            print(' still in progress .....................')
            tasks.append(executor.submit(worker, q.get()))
        else:
            print(' all done!')
            executor.shutdown()
            execution_finished = True

execution()
It doesn't terminate because you are trying to remove an item from an empty queue. The problem is here:
while not execution_finished:
    if not all([task.done() for task in tasks]):
        print(' still in progress .....................')
        tasks.append(executor.submit(worker, q.get()))
The last line here submits a new work item to the executor. Suppose that happens to be the last item in the queue. At that moment, the executor is not finished and will not be finished for a few seconds. Your main thread goes back to the while not execution_finished line, and the if statement evaluates true because some of the tasks are still running. So you try to submit one more item but you can't, because the queue is now empty. The call to q.get blocks the main loop until the queue contains an item, which never happens. The other threads finish but the program doesn't exit because the main thread is blocked.
Perhaps you should check for an empty queue, but I'm not sure that's the right idea because I probably don't understand your requirements. In any case, that's why your script doesn't exit.
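For what it's worth, here is one way that empty-queue idea could look: have task_submit put a sentinel on the queue when it has finished, so the main loop knows no more work is coming. This is only a sketch against the code above; the sentinel (None) and the reworked loop are my additions, not part of the original question:

SENTINEL = None  # assumed marker meaning "no more tasks will arrive"

def task_submit(q):
    for i in range(7):
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)
    q.put_nowait(SENTINEL)  # tell the main loop that submission is over

def execution():
    q = queue.Queue(maxsize=100)
    with Tpe(max_workers=4) as executor:
        executor.submit(task_submit, q)
        while True:
            t = q.get()  # blocks only until the next task or the sentinel
            if t is SENTINEL:
                break
            executor.submit(worker, t)
    # leaving the with-block waits for all submitted workers to finish
    print(' all done!')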
I am trying to build a server that handles long-running task jobs, so I made a global variable tasks; a request can return quickly by just putting the task info into tasks, and I use threading to run a function that handles the long-running jobs.
However, test() never sees the change to tasks. Why does this happen?
import time
import threading
from collections import OrderedDict

tasks = OrderedDict()

def request():
    # network gross
    # ...
    global tasks
    tasks['zdx'] = 2

def test():
    print('test runing')
    while True:
        if tasks:
            task = tasks.popitem()
            print('I get the source!')
        # very long time resolve task
        time.sleep(1)

def init():
    threading.Thread(target=test, daemon=True).start()

init()
time.sleep(3)
request()
You may want to review what daemon=True does for a thread. Effectively, right after you call request() to put an entry into tasks, your program exits, and the thread is terminated (because it has daemon=True set) before it has finished sleeping, so it never gets a chance to find out whether anything is in tasks and never gets to run the work. To correct for this, putting a time.sleep(3) after request() at the end will give the loop in the thread more than enough time to finish sleeping and process the check.
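As a minimal sketch of that fix, only the final sleep is new relative to the question's code; it keeps the main thread alive long enough for the daemon thread to observe the new entry:

init()
time.sleep(3)
request()
time.sleep(3)  # keep the process alive so the daemon thread can pick up 'zdx'

In real code a non-daemon worker thread with an explicit shutdown signal would be more robust than sleeping, but the extra sleep is enough to demonstrate the behaviour here.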
I'm using a concurrent.futures.ThreadPoolExecutor with a Queue, the code is something like this:
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
def func(parent):
    return parent//2, parent//2, parent<=2

def worker(parent, q):
    child1, child2, end = func(parent)
    print(parent)
    if not end:
        q.put(child1)
        q.put(child2)

if __name__ == "__main__":
    q = Queue()
    q.put(100)
    executor = ThreadPoolExecutor(max_workers=6)
    while True:
        parent = q.get()
        future = executor.submit(worker, parent, q)
        if q.empty() and future.done():
            break
The problem with this code is that future.done() is never True, and I cannot get out of this infinite while loop.
My expected outcome is to wait until there is nothing left to process, i.e. the queue is empty, all workers have done their job, and nothing further will be put into the queue. Then I can stop the loop and do something else.
P.S. The actual func I'm using is more complex than the above example, but the problem is the same.
Your problem is that you're not patient enough: the call to executor.submit returns immediately, even if the implied call to worker hasn't happened yet -- that is exactly the point of asynchronous constructs like futures. So when you check future.done() directly afterwards, there is a good chance the check runs before the executor has had time to execute worker, meaning your future won't be done yet.
You can verify this by inserting the following code between the call to submit and your if statement:
import time
time.sleep(0.1)
This does achieve what you're looking for, but it doesn't really solve your problem in an elegant way.
Looking deeper, your problem is that your scheduled tasks may generate new tasks, and you only know whether they did so once they have completed. This means you have to wait until the task you just scheduled has executed before you can decide whether to stop scheduling new tasks:
if __name__ == "__main__":
    q = Queue()
    q.put(100)
    with ThreadPoolExecutor(max_workers=6) as executor:
        while not q.empty():
            parent = q.get()
            future = executor.submit(worker, parent, q)
            future.result()  # wait for the task to finish
Also make sure to call Executor.shutdown or, better, use the executor as a context manager (as shown above) so that all resources are freed correctly once you're done.
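Note that calling future.result() on every iteration serializes the tasks. If you need them to actually run concurrently, one possible approach -- a sketch of my own, not part of the answer above -- is to track the in-flight futures and stop only when both the queue and the in-flight set are empty:

from queue import Empty

if __name__ == "__main__":
    q = Queue()
    q.put(100)
    pending = set()
    with ThreadPoolExecutor(max_workers=6) as executor:
        while True:
            try:
                parent = q.get(timeout=0.1)
                pending.add(executor.submit(worker, parent, q))
            except Empty:
                # nothing queued right now; drop finished futures and
                # stop once no task is still running and nothing new arrived
                pending = {f for f in pending if not f.done()}
                if not pending and q.empty():
                    break

The timeout is an arbitrary trade-off between responsiveness and busy-waiting; any small value works.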
I am trying to get some code working where I can implement logging into a multi-threaded program using gevent. What I'd like to do is set up custom logging handlers to put log events into a Queue, while a listener process is continuously watching for new log events to handle appropriately. I have done this in the past with Multiprocessing, but never with Gevent.
I'm having an issue where the program is getting caught up in the infinite loop (listener process), and not allowing the other threads to "do work"...
Ideally, after the worker processes have finished, I can pass an arbitrary value to the listener process to tell it to break the loop, and then join all the processes together. Here's what I have so far:
import gevent
from gevent.pool import Pool
import Queue
import random
import time

def listener(q):
    while True:
        if not q.empty():
            num = q.get()
            print "The number is: %s" % num
            if num <= 100:
                print q.get()
            else:
                # got passed 101, break out
                break
        else:
            continue

def worker(pid, q):
    if pid == 0:
        listener(q)
    else:
        gevent.sleep(random.randint(0, 2) * 0.001)
        num = random.randint(1, 100)
        q.put(num)

def main():
    q = Queue.Queue()
    all_threads = []
    all_threads = [gevent.spawn(worker, pid, q) for pid in xrange(10)]
    gevent.wait(all_threads[1:])
    q.put(101)
    gevent.joinall(all_threads)

if __name__ == '__main__':
    main()
As I said, the program seems to get hung up on that first process and never lets the other workers do their thing. I have also tried spawning the listener completely separately (which is actually how I would rather do it), but that didn't seem to work either, so I tried this way.
Any help would be appreciated, feel like I am probably just missing something obvious about gevent's back end.
Thanks
The first problem is that your listener is never yielding if the queue is initially empty. The first task you spawn is your listener. When it starts, there's a while True:, the q will be empty, so you go to the else branch, which just continues, looping back to the start of the while loop, and then the q is still empty. So you just sit in the first thread constantly checking the q is empty.
The key thing here is that gevent does not use "native" threads or processes. Unlike "real" threads, which can be switched to at any time by something behind the scenes (like your OS scheduler), gevent uses 'greenlets', which require that you do something to "yield control" to another task. That something is whatever gevent thinks would block, such as read from the network, disk, or use one of the blocking gevent operations.
One crude fix would be to start your listener when pid == 9 rather than 0. By making it spawn last, there will be items in the q, and it will go into the main if branch. The downside is that this doesn't fix the logic problem, so the first time the queue is empty, you'll get stuck in your infinite loop again.
A more correct fix would be to call gevent.sleep() instead of continue. sleep is a blocking operation, so your other tasks will get a chance to run. Without arguments it waits for no time, but still gives gevent the chance to switch to another task if one is ready to run. This still isn't very efficient, though: if the Queue is empty, it will spend a lot of pointless time checking that over and over, asking to run again as soon as it can. Sleeping for longer than the default of 0 would be more efficient, but would delay processing your log messages.
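Applied to the question's listener, that intermediate fix is a one-line change (a sketch only; the rest of the loop is kept as in the question):

def listener(q):
    while True:
        if not q.empty():
            num = q.get()
            print "The number is: %s" % num
            if num <= 100:
                print q.get()
            else:
                # got passed 101, break out
                break
        else:
            gevent.sleep()  # yield control instead of busy-looping on an empty queue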
However, you can instead take advantage of the fact that many of gevent's types, such as Queue, can be used in more Pythonic ways and make your code a lot simpler and easier to understand, as well as more efficient.
import random

import gevent
from gevent.queue import Queue

def listener(q):
    for msg in q:
        print "the number is %d" % msg

def worker(pid, q):
    gevent.sleep(random.randint(0, 2) * 0.001)
    num = random.randint(1, 100)
    q.put(num)

def main():
    q = Queue()
    listener_task = gevent.spawn(listener, q)
    worker_tasks = [gevent.spawn(worker, pid, q) for pid in xrange(1, 10)]
    gevent.wait(worker_tasks)
    q.put(StopIteration)
    listener_task.join()  # there is no gevent.join(); join the greenlet itself
Here, Queue can operate as an iterator in a for loop. As long as there are messages, it will get an item, run the loop, and then wait for another item. If there are no items, it will just block and hang around until the next one arrives. Since it blocks, though, gevent will switch to one of your other tasks to run, avoiding the infinite loop problem your example code has.
Because this version is using the Queue as a for loop iterator, there's also automatically a nice sentinel value we can put in the queue to make the listener task quit. If a for loop gets StopIteration from its iterator, it will exit cleanly. So when our for loop that's reading from q gets StopIteration from the q, it exits, and then the function exits, and the spawned task is finished.
The docs of both eventlet and gevent have several examples of how to asynchronously spawn IO tasks and get the results later.
But so far, in all the examples where a value should be returned from the async call, I always find a blocking call after all the calls to spawn(): either join(), joinall(), wait(), or waitall().
This assumes that calling the functions that use IO is immediate, and that we can jump right to the point where we wait for the results.
But in my case I want to get the jobs from a generator that can be slow, arbitrarily large, or even infinite.
I obviously can't do this:
pile = eventlet.GreenPile(pool)
for url in mybiggenerator():
    pile.spawn(fetch_title, url)
titles = '\n'.join(pile)
because mybiggenerator() can take a long time before it is exhausted. So I have to start consuming the results while I am still spawning async calls.
This is probably usually done with queues, but I'm not really sure how. Say I create a queue to hold jobs, push a bunch of jobs from a greenlet called P, and pop them from another greenlet C.
When in C, if I find that the queue is empty, how do I know whether P has pushed every job it had to push or whether it is just in the middle of an iteration?
Alternatively, eventlet lets me loop over a pile to get the return values, but can I start doing this without having spawned all the jobs I need to spawn? How? That would be a simpler alternative.
You don't need any pool or pile by default; they're just convenient wrappers for implementing a particular strategy. First you should get an idea of how exactly your code must work under all circumstances, that is: when and why you start another green thread, and when and why you wait for something.
When you have some answers to these questions and doubts about others, ask away. In the meantime, here's a prototype that processes an infinite "generator" (actually a queue):
import eventlet

queue = eventlet.queue.Queue(10000)
wait = eventlet.semaphore.CappedSemaphore(1000)

def fetch(url):
    # httplib2.Http().request
    # or requests.get
    # or urllib.urlopen
    # or whatever API you like
    return response

def crawl(url):
    with wait:
        response = fetch(url)
        links = parse(response)
        for url in links:
            queue.put(url)

def spawn_crawl_next():
    try:
        url = queue.get(block=False)
    except eventlet.queue.Empty:
        return False
    # use another CappedSemaphore here to limit number of outstanding connections
    eventlet.spawn(crawl, url)
    return True

def crawler():
    while True:
        if spawn_crawl_next():
            continue
        while wait.balance != 0:
            eventlet.sleep(1)
        # if the last spawned `crawl` enqueued more links -- process them
        if not spawn_crawl_next():
            break

def main():
    queue.put('http://initial-url')
    crawler()
Re: "concurrent.futures from Python3 does not really apply to "eventlet or gevent" part."
In fact, eventlet can be combined with concurrent.futures to run a ThreadPoolExecutor as a green-thread executor.
See: https://github.com/zopefiend/green-concurrent.futures-with-eventlet/commit/aed3b9f17ac27eeaf8c56210e0c8e4aff2ecbdb5
I had the same problem, and it was super difficult to find any answers.
I managed to get something working by having a consumer run on a separate thread and using an Event for synchronization. It seems to work fine.
The only caveat is that you have to be careful with monkey-patching: if you monkey-patch threading facilities, this will probably not work.
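For instance, if the rest of your program uses gevent's monkey-patching, leaving the thread module unpatched keeps the threading.Thread and threading.Event used below as real OS-level primitives (a sketch of that caveat using the standard gevent.monkey API):

# Patch everything except threading, so the consumer thread and the
# Event used for synchronization stay native instead of becoming greenlets.
from gevent import monkey
monkey.patch_all(thread=False)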
import gevent
import gevent.queue
import threading
import time

q = gevent.queue.JoinableQueue()
queue_not_empty = threading.Event()

def run_task(task):
    print(f"Started task {task} # {time.time()}")
    # Use whatever has been monkey-patched with gevent here
    gevent.sleep(1)
    print(f"Finished task {task} # {time.time()}")

def consumer():
    while True:
        print("Waiting for item in queue")
        queue_not_empty.wait()
        try:
            task = q.get()
            print(f"Dequed task {task} for consumption # {time.time()}")
        except gevent.exceptions.LoopExit:
            queue_not_empty.clear()
            continue
        try:
            gevent.spawn(run_task, task)
        finally:
            q.task_done()
        gevent.sleep(0)  # Kickstart task

def enqueue(item):
    q.put(item)
    queue_not_empty.set()

# Run consumer on separate thread
consumer_thread = threading.Thread(target=consumer, daemon=True)
consumer_thread.start()

# Add some tasks
for i in range(5):
    enqueue(i)

time.sleep(2)
Output:
Waiting for item in queue
Dequed task 0 for consumption # 1643232632.0220542
Started task 0 # 1643232632.0222237
Waiting for item in queue
Dequed task 1 for consumption # 1643232632.0222733
Started task 1 # 1643232632.0222948
Waiting for item in queue
Dequed task 2 for consumption # 1643232632.022315
Started task 2 # 1643232632.02233
Waiting for item in queue
Dequed task 3 for consumption # 1643232632.0223525
Started task 3 # 1643232632.0223687
Waiting for item in queue
Dequed task 4 for consumption # 1643232632.022386
Started task 4 # 1643232632.0224123
Waiting for item in queue
Finished task 0 # 1643232633.0235817
Finished task 1 # 1643232633.0236874
Finished task 2 # 1643232633.0237293
Finished task 3 # 1643232633.0237558
Finished task 4 # 1643232633.0237799
Waiting for item in queue
With the new concurrent.futures module in Py3k, I would say (assuming that the processing you want to do is actually something more complex than join):
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=foo) as wp:
    res = [wp.submit(fetchtitle, url) for url in mybiggenerator()]
    # as_completed yields futures, so take .result() to get the title strings
    ans = '\n'.join(a.result() for a in concurrent.futures.as_completed(res))
This will allow you to start processing results before all of your fetchtitle calls complete. However, it will require you to exhaust mybiggenerator before you continue -- it's not clear how you want to get around this, unless you want to set some max_urls parameter or similar. That would still be something you could do with your original implementation, though.
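One way to avoid exhausting the generator up front, building on the max_urls idea, is to submit in bounded batches and drain each batch's results before pulling more URLs. This is only an illustrative sketch; mybiggenerator, fetchtitle, and the batch size are assumptions carried over from the discussion above:

import concurrent.futures
import itertools

def fetch_all_titles(max_urls=100, max_workers=8):
    gen = mybiggenerator()  # assumed to yield URLs, possibly slowly or forever
    titles = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as wp:
        while True:
            batch = list(itertools.islice(gen, max_urls))  # pull at most max_urls
            if not batch:
                break
            futures = [wp.submit(fetchtitle, url) for url in batch]
            for f in concurrent.futures.as_completed(futures):
                titles.append(f.result())
    return titles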
When I use the eventlet package to run a multi-coroutine task, the program does not continue even once the coroutine pool is empty; it just gets stuck in a loop. Below is my code; the last line never gets executed.
import eventlet

global count
post_id = []
last_id = 0

def download(post_id):
    global count
    print "coroutines :", post_id
    if count < last_id:
        count = count + 1
        q.put(count)  # put new coroutines in the queue

pool = eventlet.GreenPool()
q = eventlet.Queue()

for i in range(100, 200):
    post_id.append(i)
for i in range(0, 5):
    q.put(post_id[i])  # keep 6 coroutines in the pool

count = post_id[5]
last_id = 200

while not q.empty() or pool.running() != 0:
    pool.spawn_n(download, q.get())  # start coroutines

print "The end"  # never reaches this line
The last line never gets executed because your final call to q.get() blocks forever, waiting for something to be added to the queue. There are a few ways you could fix this, including passing a timeout value to get. I think the cleanest solution is, whenever the queue is empty, to wait for the current tasks to finish before attempting another iteration of the loop:
while not q.empty():
    pool.spawn_n(download, q.get())
    if q.empty():
        pool.waitall()