automatically assign asyncio task.result to variable when finished - python

I'm a lazy guy. Instead of having to manually do:
res = task.result()
after a scheduled task
loop = asyncio.get_event_loop()
task = loop.create_task(some_func())
has finished, I would like to have some blocking code after task = ... that waits until the task is done and directly assigns its result to the variable res.
Besides the convenience of not having to call task.result() manually every time, I'm also running a bunch of tasks sequentially as independent Jupyter notebook cells, and I only want the next task to start after the previous one has completely finished. It is OK if the Jupyter notebook kernel is blocked by the task while it's running. This doesn't work:
loop = asyncio.get_event_loop()
task = loop.create_task(some_func())
# blocking code to wait for task to finish
# ...
# res = task.result()
loop = asyncio.get_event_loop()
task = loop.create_task(some_other_func())
# blocking code to wait for task to finish
# ...
# res = task.result()
as it would schedule both coroutines for execution immediately and, I think, run them at the same time.
Obviously, the variable res will be overwritten by subsequent tasks, but that's ok, because I only need to look at res when I'm running a single instance of this task. The main requirement is to get the tasks running as a sequential chain and to just be able to do print(res) after the task has finished.
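One straightforward pattern (a minimal sketch, not from the original post) is to block on the coroutine itself rather than scheduling it as a background task: in a plain script, asyncio.run() (or loop.run_until_complete()) blocks and returns the result; in a Jupyter cell, where the kernel's loop is already running, top-level await does the same.
import asyncio

async def some_func():
    await asyncio.sleep(1)
    return 42

# Plain script: asyncio.run blocks until the coroutine finishes and
# returns its result directly.
res = asyncio.run(some_func())
print(res)  # 42

# Jupyter/IPython cell: the kernel's event loop is already running, so
# use top-level await instead:
#     res = await some_func()
#     print(res)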

Related

Python, ThreadPoolExecutor, pool execution doesn't terminate

I have some simple code modelling a more complicated problem I need to solve. There are three functions: a worker, a task submitter (which looks for new tasks and puts them on a queue as it finds them), and a function that creates a pool and adds new tasks to it. But the code doesn't finish running after the queue becomes empty and all the tasks in the list are finished. I can't figure out why the while loop's condition never lets it terminate. I have tried coding this several different ways and nothing works.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading

def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)

def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')

def execution():
    executor = Tpe(max_workers=4)
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = [executor.submit(worker, q.get())]
    execution_finished = False
    while not execution_finished:  # all([task.done() for task in tasks]):
        if not all([task.done() for task in tasks]):
            print(' still in progress .....................')
            tasks.append(executor.submit(worker, q.get()))
        else:
            print(' all done!')
            executor.shutdown()
            execution_finished = True

execution()
It doesn't terminate because you are trying to remove an item from an empty queue. The problem is here:
while not execution_finished:
    if not all([task.done() for task in tasks]):
        print(' still in progress .....................')
        tasks.append(executor.submit(worker, q.get()))
The last line here submits a new work item to the executor. Suppose that happens to be the last item in the queue. At that moment, the executor is not finished and will not be finished for a few seconds. Your main thread goes back to the while not execution_finished line, and the if statement evaluates true because some of the tasks are still running. So you try to submit one more item but you can't, because the queue is now empty. The call to q.get blocks the main loop until the queue contains an item, which never happens. The other threads finish but the program doesn't exit because the main thread is blocked.
Perhaps you should check for an empty queue, but I'm not sure that's the right idea because I probably don't understand your requirements. In any case, that's why your script doesn't exit.
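A rough sketch of that idea (hypothetical, since the exact requirements are unclear): pull from the queue with a timeout instead of blocking forever, and only finish once the submitter is done, the queue has drained, and every worker has completed.
# Hypothetical replacement for the while loop inside execution():
while not execution_finished:
    try:
        t = q.get(timeout=1)  # don't block forever on an empty queue
        tasks.append(executor.submit(worker, t))
    except queue.Empty:
        # Queue looks drained: finish only once the submitter thread and
        # all workers are done; otherwise keep polling.
        if q_thread.done() and all(task.done() for task in tasks):
            print(' all done!')
            executor.shutdown()
            execution_finished = True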

How to run a blocking code independently from asyncio loop

My project requires me to run blocking code (from another library) whilst continuing my asyncio while True loop. The code looks something like this:
async def main():
    while True:
        session_timeout = aiohttp.ClientTimeout()
        async with aiohttp.ClientSession() as session:
            # Do async stuff like session.get and so on.
            # At a certain point, I have blocking code that I need to execute.
            # blocking_code() starts here. The blocking code needs time to get the return value.
            # Running blocking_code() is the last thing to do in my main() function.
            # My objective is to run the blocking code separately.
            # Such that whilst blocking_code() runs, I would like my loop to start from the beginning again,
            # and not having to wait until blocking_code() completes and returns.
            # In other words, go back to the top of the while loop.
            # Separately, blocking_code() will continue to run independently, which would eventually complete
            # and return. When it returns, nothing in main() will need the return value; rather the returned
            # result continues to be used in blocking_code().

asyncio.run(main())
I have tried using pool = ThreadPool(processes=1) and thread = pool.apply_async(blocking_code, params). It sort of works if there are things that need to be done after blocking_code() within main(); but blocking_code() is the last thing in main(), and it causes the whole while loop to pause until blocking_code() completes before starting back from the top.
I don't know if this is possible, and if it is, how it's done; but the ideal scenario is this.
Run main(), then run blocking_code() in its own instance, as if executing another .py file. So once the loop reaches blocking_code() in main(), it triggers the blocking_code.py file, and whilst the blocking_code.py script runs, the while loop continues from the top again.
If, on the second pass of the while loop, it reaches blocking_code() again and the previous run has not completed, another instance of blocking_code() should run independently.
Does what I say make sense? Is it possible to achieve the desired outcome?
Thank you!
This is possible with threads. So that you don't block your main loop, you'll need to wrap your thread in an asyncio task. You can wait for the return values once your loop is finished if you need to. You can do this with a combination of asyncio.create_task and asyncio.to_thread:
import aiohttp
import asyncio
import time

def blocking_code():
    print('Starting blocking code.')
    time.sleep(5)
    print('Finished blocking code.')

async def main():
    blocking_code_tasks = []
    while True:
        session_timeout = aiohttp.ClientTimeout()
        async with aiohttp.ClientSession() as session:
            print('Executing GET.')
            result = await session.get('https://www.example.com')
            blocking_code_task = asyncio.create_task(asyncio.to_thread(blocking_code))
            blocking_code_tasks.append(blocking_code_task)
    # do something with blocking_code_tasks, wait for them to finish, extract errors, etc.

asyncio.run(main())
The above code runs the blocking code in a thread and then wraps that in an asyncio task. We then add this to the blocking_code_tasks list to keep track of all the currently running tasks. Later on, you can get the values or errors out with something like asyncio.gather.
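A small self-contained sketch (not part of the original answer, names are illustrative) of how asyncio.gather can collect those thread-backed tasks once the loop is given an exit condition:
import asyncio
import time

def blocking_code(n):
    time.sleep(1)  # stand-in for the slow library call
    return n * 2

async def main():
    # Schedule a few blocking calls in worker threads without awaiting them yet.
    tasks = [asyncio.create_task(asyncio.to_thread(blocking_code, i)) for i in range(3)]
    # ... other async work would go here ...
    # Collect return values (and any exceptions) once you stop looping.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    print(results)  # [0, 2, 4]

asyncio.run(main())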

Run the same Celery task in a loop

How do I run this kind of Celery task properly?
@app.task
def add(x):
    return x + 1

def some_func():
    result = 'result'
    for i in range(10):
        task_id = uuid()
        add.apply_async((i,), task_id=task_id)
    return result
I need all tasks to be performed sequentially after the previous one is completed.
I tried using time.sleep(), but in that case returning result waits until all tasks are completed. I need result to be returned immediately while all 10 tasks run sequentially in the background.
There is group() in Celery, but it runs tasks in parallel.
Finally, I solved it by using immutable signatures and chain:
tasks = [
    add.si(x).set(task_id=uuid())
    for x in range(10)
]
chain(*tasks).apply_async()
If some_func() is executed outside Celery (say, a script is used as a "producer" to just send those tasks to be executed), then nothing stops you from calling .get() on the AsyncResult to wait for the task to finish, and looping that as much as you like.
If, however, you want to execute that loop as some sort of Celery workflow, then you have to build a Chain and use it.
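For the first case, a minimal sketch of such a producer script (assuming the add task above and a configured result backend):
# Hypothetical producer script: send one task at a time and block on each
# result before queuing the next (requires a result backend).
for i in range(10):
    async_result = add.apply_async((i,))
    async_result.get(timeout=60)  # wait for this task to finish first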

Is it possible to detect when all async tasks are suspended?

I'm trying to test some async code, but I'm having trouble because of the complex connection between some tasks.
The context I need this is some code which reads a file in parallel to it being written by another process. There's some logic in the code where reading a truncated record will make it back off and wait() on an asyncio.Condition to be later released by an inotify event. This code should let it recover by re-reading the record when a future write has been completed by another process. I specifically want to test that this recovery works.
So my plan would be:
write a partial file
run the event loop until it suspends on the condition
write the rest of the file
run the event loop to completion
I had thought this was the answer: Detect an idle asyncio event loop
However, a trial test shows that it exits too soon:
import asyncio
import random
import socket

def test_ping_pong():
    async def ping_pong(idx: int, oth_idx: int):
        for i in range(random.randint(100, 1000)):
            counters[idx] += 1
            async with conditions[oth_idx]:
                conditions[oth_idx].notify()
            async with conditions[idx]:
                await conditions[idx].wait()

    async def detect_iowait():
        loop = asyncio.get_event_loop()
        rsock, wsock = socket.socketpair()
        wsock.close()
        try:
            await loop.sock_recv(rsock, 1)
        finally:
            rsock.close()

    conditions = [asyncio.Condition(), asyncio.Condition()]
    counters = [0, 0]
    loop = asyncio.get_event_loop()
    loop.create_task(ping_pong(0, 1))
    loop.create_task(ping_pong(1, 0))
    loop.run_until_complete(loop.create_task(detect_iowait()))
    assert counters[0] > 10
    assert counters[1] > 10
After digging through the source code for Python's event loops, I've found nothing exposed that can do this publicly.
It is, however, possible to use the _ready deque created by the BaseEventLoop. See here. This contains every task that is immediately ready to run. When a task is run, it is popped from the _ready deque. When a suspended task is released by another task (e.g. by calling future.set_result()), the suspended task is immediately added back to the deque. This has existed since Python 3.5.
One thing that you can do is repeatedly inject a callback that checks how many items are in _ready. When all other tasks are suspended, there will be nothing left in the deque at the moment the callback runs.
The callback will run at most once per iteration of the event loop:
async def wait_for_deadlock(empty_loop_threshold: int = 0):
    def check_for_deadlock():
        nonlocal empty_loop_count
        # pylint: disable=protected-access
        if loop._ready:
            empty_loop_count = 0
            loop.call_soon(check_for_deadlock)
        elif empty_loop_count < empty_loop_threshold:
            empty_loop_count += 1
            loop.call_soon(check_for_deadlock)
        else:
            future.set_result(None)

    empty_loop_count = 0
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    asyncio.get_running_loop().call_soon(check_for_deadlock)
    await future
In the above code, empty_loop_threshold is not really necessary in most cases, but it exists for cases where tasks communicate via IO. For example, if one task communicates with another through IO, there may be a moment when all tasks are suspended even though one has data ready to read. Setting empty_loop_threshold = 1 should get around this.
Using this is relatively simple. You can:
loop.run_until_complete(wait_for_deadlock())
Or as requested in my question:
def some_test():
    async def async_test():
        await wait_for_deadlock()
        inject_something()
        await wait_for_deadlock()

    loop = asyncio.get_event_loop()
    loop.create_task(task_to_test())
    loop.run_until_complete(loop.create_task(async_test()))
    assert something

How do I feed an infinite generator to eventlet (or gevent)?

The docs of both eventlet and gevent have several examples of how to asynchronously spawn IO tasks and get the results later.
But so far, in all the examples where a value should be returned from the async call, I always find a blocking call after all the calls to spawn(): either join(), joinall(), wait(), or waitall().
This assumes that calling the functions that use IO is immediate and we can jump right to the point where we are waiting for the results.
But in my case I want to get the jobs from a generator that can be slow, arbitrarily large, or even infinite.
I obviously can't do this
pile = eventlet.GreenPile(pool)
for url in mybiggenerator():
    pile.spawn(fetch_title, url)
titles = '\n'.join(pile)
because mybiggenerator() can take a long time before it is exhausted. So I have to start consuming the results while I am still spawning async calls.
This is probably usually done with recourse to queues, but I'm not really sure how. Say I create a queue to hold jobs, push a bunch of jobs from a greenlet called P, and pop them from another greenlet C.
When in C, if I find that the queue is empty, how do I know if P has pushed every job it had to push or if it is just in the middle of an iteration?
Alternatively, eventlet allows me to loop through a pile to get the return values, but can I start doing this without having spawned all the jobs I need to spawn? How? This would be a simpler alternative.
You don't need any pool or pile by default. They're just convenient wrappers to implement a particular strategy. First you should get an idea of how exactly your code must work under all circumstances, that is: when and why you start another green thread, and when and why you wait for something.
When you have some answers to these questions and doubts about others, ask away. In the meantime, here's a prototype that processes an infinite "generator" (actually a queue).
import eventlet

queue = eventlet.queue.Queue(10000)
wait = eventlet.semaphore.CappedSemaphore(1000)

def fetch(url):
    # httplib2.Http().request
    # or requests.get
    # or urllib.urlopen
    # or whatever API you like
    return response

def crawl(url):
    with wait:
        response = fetch(url)
    links = parse(response)  # parse() is a placeholder for your own link extraction
    for url in links:
        queue.put(url)

def spawn_crawl_next():
    try:
        url = queue.get(block=False)
    except eventlet.queue.Empty:
        return False
    # use another CappedSemaphore here to limit number of outstanding connections
    eventlet.spawn(crawl, url)
    return True

def crawler():
    while True:
        if spawn_crawl_next():
            continue
        while wait.balance != 0:
            eventlet.sleep(1)
        # if last spawned `crawl` enqueued more links -- process them
        if not spawn_crawl_next():
            break

def main():
    queue.put('http://initial-url')
    crawler()
Re: "concurrent.futures from Python3 does not really apply to "eventlet or gevent" part."
In fact, eventlet can be combined to deploy the concurrent.futures ThreadPoolExecutor as a GreenThread executor.
See: https://github.com/zopefiend/green-concurrent.futures-with-eventlet/commit/aed3b9f17ac27eeaf8c56210e0c8e4aff2ecbdb5
I had the same problem and it has been super difficult to find any answers.
I think I managed to get something working by having a consumer running on a separate thread and using Event for synchronization. Seems to work fine.
The only caveat is that you have to be careful with monkey patching: if you monkey-patch threading facilities, this will probably not work.
import gevent
import gevent.queue
import threading
import time

q = gevent.queue.JoinableQueue()
queue_not_empty = threading.Event()

def run_task(task):
    print(f"Started task {task} # {time.time()}")
    # Use whatever has been monkey-patched with gevent here
    gevent.sleep(1)
    print(f"Finished task {task} # {time.time()}")

def consumer():
    while True:
        print("Waiting for item in queue")
        queue_not_empty.wait()
        try:
            task = q.get()
            print(f"Dequed task {task} for consumption # {time.time()}")
        except gevent.exceptions.LoopExit:
            queue_not_empty.clear()
            continue
        try:
            gevent.spawn(run_task, task)
        finally:
            q.task_done()
            gevent.sleep(0)  # Kickstart task

def enqueue(item):
    q.put(item)
    queue_not_empty.set()

# Run consumer on separate thread
consumer_thread = threading.Thread(target=consumer, daemon=True)
consumer_thread.start()

# Add some tasks
for i in range(5):
    enqueue(i)

time.sleep(2)
Output:
Waiting for item in queue
Dequed task 0 for consumption # 1643232632.0220542
Started task 0 # 1643232632.0222237
Waiting for item in queue
Dequed task 1 for consumption # 1643232632.0222733
Started task 1 # 1643232632.0222948
Waiting for item in queue
Dequed task 2 for consumption # 1643232632.022315
Started task 2 # 1643232632.02233
Waiting for item in queue
Dequed task 3 for consumption # 1643232632.0223525
Started task 3 # 1643232632.0223687
Waiting for item in queue
Dequed task 4 for consumption # 1643232632.022386
Started task 4 # 1643232632.0224123
Waiting for item in queue
Finished task 0 # 1643232633.0235817
Finished task 1 # 1643232633.0236874
Finished task 2 # 1643232633.0237293
Finished task 3 # 1643232633.0237558
Finished task 4 # 1643232633.0237799
Waiting for item in queue
With the new concurrent.futures module in Py3k, I would say (assuming that the processing you want to do is actually something more complex than join):
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=foo) as wp:
    res = [wp.submit(fetchtitle, url) for url in mybiggenerator()]
    ans = '\n'.join(a.result() for a in concurrent.futures.as_completed(res))
This will allow you to start processing results before all of your fetchtitle calls complete. However, it will require you to exhaust mybiggenerator before you continue -- it's not clear how you want to get around this, unless you want to set some max_urls parameter or similar. That would still be something you could do with your original implementation, though.
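One hypothetical way around exhausting an unbounded generator up front is to submit it in bounded chunks, e.g. with itertools.islice (a sketch; fetchtitle, the chunk size and worker count are placeholders):
import concurrent.futures
import itertools

def titles_in_chunks(url_gen, chunk_size=100, max_workers=8):
    # Submit at most chunk_size jobs at a time, so results start flowing
    # long before the generator is exhausted.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as wp:
        while True:
            chunk = list(itertools.islice(url_gen, chunk_size))
            if not chunk:
                break
            futures = [wp.submit(fetchtitle, url) for url in chunk]
            for fut in concurrent.futures.as_completed(futures):
                yield fut.result()

# titles = '\n'.join(titles_in_chunks(mybiggenerator()))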
