Waiting on condition variable with timeout: lock not reacquired in time - python

I have an asyncio.Condition named cond. I wish to wait on it, but only for so long before giving up. As asyncio.Condition.wait does not take a timeout, this cannot be done directly. The docs state that asyncio.wait_for should be used to wrap and provide a timeout instead:
The asyncio.wait_for() function can be used to cancel a task after a timeout.
Thus, we arrive at the following solution:
async def coro():
    print("Taking lock...")
    async with cond:
        print("Lock acquired.")
        print("Waiting!")
        await asyncio.wait_for(cond.wait(), timeout=999)
        print("Was notified!")
    print("Lock released.")
Now assume that coro itself is cancelled five seconds after being run. This throws CancelledError in the wait_for, which cancels the cond.wait before re-raising the error. The error then propagates to coro, which due to the async with block, implicitly attempts to release the lock in cond. However, the lock is not currently held; cond.wait has been cancelled but hasn't had a chance to process that cancellation and re-acquire the lock. Thus, we get an ugly exception like the following:
Taking lock...
Lock acquired.
Waiting!
ERROR:asyncio:Task exception was never retrieved
future: <Task finished coro=<coro() done, defined at [REDACTED]> exception=RuntimeError('Lock is not acquired.',)>
Traceback (most recent call last):
  [REDACTED], in coro
    await asyncio.wait_for(cond.wait(), timeout=999)
  [REDACTED], in wait_for
    yield from waiter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  [REDACTED], in coro
    print("Was notified!")
  [REDACTED], in coro
    res = func(*args, **kw)
  [REDACTED], in __aexit__
    self.release()
  [REDACTED], in release
    raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
In other words, whilst handling the CancelledError, coro raised a RuntimeError by trying to release a lock that wasn't held. The reason the stacktrace shows the print("Was notified!") line is that it is the last line of the offending async with block.
This doesn't feel like something I can fix; I'm starting to suspect it's a bug in the library itself. However, I can't think of any way to avoid the issue or create a workaround, so any ideas would be appreciated.
Whilst writing this question and investigating further, I stumbled into similar problems on the Python bug tracker, ended up inspecting the asyncio source code, and determined that this is, in fact, a bug in asyncio itself.
I've submitted it to the issue tracker here for those who have the same problem, and answered my own question with a workaround that I have created.
EDIT: As requested by ParkerD, here is the full runnable example that produces the above issue:
EDIT 2: updated example to use the new asyncio.run and asyncio.create_task features from Python 3.7+
import asyncio

async def coro():
    cond = asyncio.Condition()
    print("Taking lock...")
    async with cond:
        print("Lock acquired.")
        print("Waiting!")
        await asyncio.wait_for(cond.wait(), timeout=999)
        print("Was notified!")
    print("Lock released.")

async def cancel_after_5(c):
    task = asyncio.create_task(c)
    await asyncio.sleep(5)
    task.cancel()
    await asyncio.wait([task])

asyncio.run(cancel_after_5(coro()))

As stated at the end of the question, I've determined that the issue is actually a bug in the library. I'll reiterate that the issue tracker for that bug is here, and present my workaround.
The following function is based on wait_for itself (source here), and is a version of it specialised for waiting on conditions, with the added guarantee that cancelling it is safe.
Calling wait_on_condition_with_timeout(cond, timeout) is roughly equivalent to asyncio.wait_for(cond.wait(), timeout).
async def wait_on_condition_with_timeout(condition: asyncio.Condition, timeout: float) -> bool:
    loop = asyncio.get_event_loop()

    # Create a future that will be triggered by either completion or timeout.
    waiter = loop.create_future()

    # Callback to trigger the future. The varargs are there to consume and void any arguments passed.
    # This allows the same callback to be used in loop.call_later and wait_task.add_done_callback,
    # which automatically passes the finished future in.
    def release_waiter(*_):
        if not waiter.done():
            waiter.set_result(None)

    # Set up the timeout
    timeout_handle = loop.call_later(timeout, release_waiter)

    # Launch the wait task
    wait_task = loop.create_task(condition.wait())
    wait_task.add_done_callback(release_waiter)

    try:
        await waiter  # Returns on wait complete or timeout
        if wait_task.done():
            return True
        else:
            raise asyncio.TimeoutError()
    except (asyncio.TimeoutError, asyncio.CancelledError):
        # If timeout or cancellation occur, clean up, cancel the wait, let it handle the cancellation,
        # then re-raise.
        wait_task.remove_done_callback(release_waiter)
        wait_task.cancel()
        await asyncio.wait([wait_task])
        raise
    finally:
        timeout_handle.cancel()
The crucial part is that if a timeout or cancellation occurs, the method waits for the condition to re-acquire the lock before re-raising the exception:
    except (asyncio.TimeoutError, asyncio.CancelledError):
        # If timeout or cancellation occur, clean up, cancel the wait, let it handle the cancellation,
        # then re-raise.
        wait_task.remove_done_callback(release_waiter)
        wait_task.cancel()
        await asyncio.wait([wait_task])  # This line is missing from the real wait_for
        raise
I've tested this on Python 3.6.9 and it works perfectly. The same bug exists in 3.7 and 3.8 too, so I imagine it is also useful for those versions. If you want to know when the bug will be fixed, check the issue tracker above. If you want a version for things other than Conditions, it should be trivial to change the parameter and create_task line.
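To illustrate, here is a minimal usage sketch (my own addition; the consumer coroutine and the 5-second timeout are arbitrary, and it assumes wait_on_condition_with_timeout from above is in scope):

import asyncio

async def consumer(cond: asyncio.Condition):
    async with cond:
        try:
            # Nothing ever notifies cond here, so this times out after 5 seconds.
            await wait_on_condition_with_timeout(cond, timeout=5.0)
            print("Was notified!")
        except asyncio.TimeoutError:
            # The lock was safely re-acquired before the exception reached us,
            # so the async with block can release it without a RuntimeError.
            print("Timed out.")
    print("Lock released cleanly.")

async def main():
    cond = asyncio.Condition()  # created inside the running loop on purpose
    await consumer(cond)

asyncio.run(main())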

Related

How to handle Timeout Exception by the asyncio task itself just before asyncio raise TimeoutError

One of my long-running functions executes multiple instructions, but I have to cap its total execution time. If the function cannot finish within the allocated time, it should clean up its progress and return.
Let's have a look at a sample code below:
import asyncio

async def eternity():
    # Sleep for one hour
    try:
        await asyncio.sleep(3600)
        print('yay!, everything is done..')
    except Exception as e:
        print("I have to clean up lot of thing in case of Exception or not able to finish by the allocated time")

async def main():
    try:
        ref = await asyncio.wait_for(eternity(), timeout=5)
    except asyncio.exceptions.TimeoutError:
        print('timeout!')

asyncio.run(main())
The function eternity is the long-running function. The catch is that, in case of some exception or reaching the maximum allocated time, the function needs to clean up the mess it has made.
P.S. eternity is an independent function, and only it knows what needs cleaning up.
I am looking for a way to raise an exception inside my task just before the timeout, OR send some interrupt or terminate signal to the task and handle it.
Basically, I want to execute some piece of code in my task before asyncio raises the TimeoutError and takes control.
Also, I am using Python 3.9.
Hope I was able to explain the problem.
What you need is an async context manager:
import asyncio

class MyClass(object):
    async def eternity(self):
        # Sleep for one hour
        await asyncio.sleep(3600)
        print('yay!, everything is done..')

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        print("I have to clean up lot of thing in case of Exception or not able to finish by the allocated time")

async def main():
    try:
        async with MyClass() as my_class:
            ref = await asyncio.wait_for(my_class.eternity(), timeout=5)
    except asyncio.exceptions.TimeoutError:
        print('timeout!')

asyncio.run(main())
And this is the output:
I have to clean up lot of thing in case of Exception or not able to finish by the allocated time
timeout!
For more details, take a look here.

How to timeout asyncio.to_thread?

I am experimenting with the new asyncio features in Python 3.9, and have the following code:
import asyncio

async def inc(start):
    i = start
    while True:
        print(i)
        i += 2
    return None

async def main():
    x = asyncio.gather(
        asyncio.to_thread(inc, 0),
        asyncio.to_thread(inc, 1)
    )
    try:
        await asyncio.wait_for(x, timeout=5.0)
    except asyncio.TimeoutError:
        print("timeout!")

asyncio.run(main())
My expectation is that the program will print numbers starting from 0 and terminate after 5 seconds. However, when I execute the program here, I see no output and the following warning:
/usr/lib/python3.9/asyncio/events.py:80: RuntimeWarning: coroutine 'inc' was never awaited
  self._context.run(self._callback, *self._args)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.9/concurrent/futures/thread.py:85: RuntimeWarning: coroutine 'inc' was never awaited
  del work_item
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
asyncio.to_thread takes a regular function and runs it in a separate thread; the call itself returns a coroutine that can be awaited.
All you need is to change async def inc to def inc. You can do this because there is no await inside; it is not a real coroutine.
However, you cannot kill a thread. The timeout will stop the waiting, not the computing of numbers.
UPDATE: In detail, asyncio.gather awaits the coroutines until the timeout. When cancelled by the timeout, the gather cancels all still-running coroutines and waits for their termination. Cancelling a coroutine means delivering an exception at the nearest await statement. In this case there is no await, so the coroutines never receive the cancellation exception, and the program hangs at that point.
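To make that concrete, here is a minimal sketch of a corrected program (my own addition, not from the original answer): inc becomes a plain function, and since the threads cannot be killed, a threading.Event serves as a cooperative stop flag.

import asyncio
import threading

def inc(start, stop):  # plain function: asyncio.to_thread expects a callable, not a coroutine
    i = start
    while not stop.is_set():  # threads can't be killed, so poll a stop flag instead
        print(i)
        i += 2

async def main():
    stop = threading.Event()
    x = asyncio.gather(
        asyncio.to_thread(inc, 0, stop),
        asyncio.to_thread(inc, 1, stop),
    )
    try:
        await asyncio.wait_for(x, timeout=5.0)
    except asyncio.TimeoutError:
        print("timeout!")
        stop.set()  # wait_for only stops the waiting; this asks the threads to stop computing

asyncio.run(main())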

Exception in python asyncio.Task not raised until main Task complete

Here is the code. I thought the program would crash at once because of the uncaught exception, but instead it waited the full 10 seconds until the main task coro2 completed.
import asyncio

@asyncio.coroutine
def coro1():
    print("coro1 primed")
    yield
    raise Exception("abc")

@asyncio.coroutine
def coro2(loop):
    try:
        print("coro2 primed")
        ts = [asyncio.Task(coro1(), loop=loop) for _ in range(2)]
        res = yield from asyncio.sleep(10)
        print(res)
    except Exception as e:
        print(e)
        raise

loop = asyncio.get_event_loop()
loop.run_until_complete(coro2(loop))
I think this is a serious problem, because in more complicated programs it makes the process get stuck forever instead of crashing with exception information.
Besides, I set a breakpoint in the except block in the source code of run_until_complete, but it's never triggered. I am interested in which piece of code handled that exception in Python asyncio.
First, there is no reason to use generator-based coroutines in Python with the async/await syntax available for many years, and the coroutine decorator now deprecated and scheduled for removal. Also, you don't need to pass the event loop down to each coroutine, you can always use asyncio.get_event_loop() to obtain it when you need it. But these are unrelated to your question.
The except block in coro2 didn't trigger because the exception raised in coro1 didn't propagate to coro2. This is because you explicitly ran coro1 as a task, which executed it in the background, and didn't await it. You should always ensure that your tasks are awaited and then exceptions won't pass unnoticed; doing this systematically is sometimes referred to as structured concurrency.
The correct way to write the above would be something like:
async def coro1():
    print("coro1 primed")
    await asyncio.sleep(0)  # yield to the event loop
    raise Exception("abc")

async def coro2():
    try:
        print("coro2 primed")
        ts = [asyncio.create_task(coro1()) for _ in range(2)]
        res = await asyncio.sleep(10)
        # ensure we pick up results of the tasks that we've started
        for t in ts:
            await t
        print(res)
    except Exception as e:
        print(e)
        raise

asyncio.run(coro2())
Note that this will run sleep() to completion and only then propagate the exceptions raised by the background tasks. If you wanted to propagate immediately, you could use asyncio.gather(), in which case you wouldn't have to bother with explicitly creating tasks in the first place:
async def coro2():
    try:
        print("coro2 primed")
        res, *ignored = await asyncio.gather(
            asyncio.sleep(10),
            *[coro1() for _ in range(2)]
        )
        print(res)
    except Exception as e:
        print(e)
        raise
I am interested in which piece of code handled that exception in python asyncio.
An exception raised by a coroutine which is not handled is caught by asyncio and stored in the task object. This allows you to await the task or (if you know it's completed) obtain its result using the result() method, either of which will propagate (re-raise) the exception. Since your code never accessed the task's result, the exception instance remained forgotten inside the task object. Python goes so far to notice this and print a "Task exception was never retrieved" warning when the task object is destroyed along with a traceback, but this warning is provided on a best-effort basis, usually comes too late, and should not be relied upon.
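A minimal sketch of that mechanism (my own illustration, not from the question):

import asyncio

async def boom():
    raise Exception("abc")

async def main():
    task = asyncio.create_task(boom())
    await asyncio.sleep(0)  # let the task run and fail in the background
    # The exception is now stored inside the task object;
    # retrieving the result re-raises it.
    try:
        task.result()
    except Exception as e:
        print("retrieved:", e)

asyncio.run(main())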

RuntimeError: Cannot close a running event loop

I'm trying to resolve this error: RuntimeError: Cannot close a running event loop in my asyncio process. I believe it's happening because there's a failure while tasks are still pending, and then I try to close the event loop. I'm thinking I need to await the remaining responses prior to closing the event loop, but I'm not sure how to accomplish that correctly in my specific situation.
def start_job(self):
    if self.auth_expire_timestamp < get_timestamp():
        api_obj = api_handler.Api('Api Name', self.dbObj)
        self.api_auth_resp = api_obj.get_auth_response()
        self.api_attr = api_obj.get_attributes()
    try:
        self.queue_manager(self.do_stuff(json_data))
    except aiohttp.ServerDisconnectedError as e:
        logging.info("Reconnecting...")
        api_obj = api_handler.Api('API Name', self.dbObj)
        self.api_auth_resp = api_obj.get_auth_response()
        self.api_attr = api_obj.get_attributes()
        self.run_eligibility()

async def do_stuff(self, data):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for row in data:
            task = asyncio.ensure_future(self.async_post('url', session, row))
            tasks.append(task)
        result = await asyncio.gather(*tasks)
    self.load_results(result)

def queue_manager(self, method):
    self.loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(method)
    self.loop.run_until_complete(future)

async def async_post(self, resource, session, data):
    async with session.post(self.api_attr.api_endpoint + resource, headers=self.headers, data=data) as response:
        resp = []
        try:
            headers = response.headers['foo']
            content = await response.read()
            resp.append(headers)
            resp.append(content)
        except KeyError as e:
            logging.error('KeyError at async_post response')
            logging.error(e)
    return resp

def shutdown(self):
    # Need to do something here to await the remaining tasks, then re-start a new
    # event loop (which I think I can do); I just don't know how to appropriately
    # stop the current one.
    self.loop.close()
    return True
How can I handle the error and properly close the event loop so I can start a new one and essentially re-boot the whole program and continue on.
EDIT:
This is what I'm trying now, based on this SO answer. Unfortunately, this error only happens rarely, so unless I can force it, I will have to wait and see if it works. In my queue_manager method I changed it to this:
try:
    self.loop.run_until_complete(future)
except Exception as e:
    future.cancel()
    self.loop.run_until_complete(future)
    future.exception()
UPDATE:
I got rid of the shutdown() method and added this to my queue_manager() method instead, and it seems to be working without issue:
try:
    self.loop.run_until_complete(future)
except Exception as e:
    future.cancel()
    self.check_in_records()
    self.reconnect()
    self.start_job()
    future.exception()
To answer the question as originally stated, there is no need to close() a running loop, you can reuse the same loop for the whole program.
Given the code in the update, your queue_manager could look like this:
try:
    self.loop.run_until_complete(future)
except Exception as e:
    self.check_in_records()
    self.reconnect()
    self.start_job()
Cancelling the future is not necessary and, as far as I can tell, has no effect. This is different from the referenced answer, which specifically reacts to KeyboardInterrupt, special because it is raised by asyncio itself. KeyboardInterrupt can be propagated by run_until_complete without the future having actually completed. Handling Ctrl-C correctly in asyncio is very hard or even impossible (see here for details), but fortunately this question is not about Ctrl-C at all; it is about exceptions raised by the coroutine. (Note that KeyboardInterrupt doesn't inherit from Exception, so in case of Ctrl-C the except body won't even execute.)
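A quick REPL check confirms that last point:

>>> issubclass(KeyboardInterrupt, Exception)
False
>>> issubclass(KeyboardInterrupt, BaseException)
True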
I was canceling the future because in this instance there are remaining tasks pending, and I want to essentially remove those tasks and start a fresh event loop.
This is a correct thing to want to do, but the code in the (updated) question is only canceling a single future, the one already passed to run_until_complete. Recall that a future is a placeholder for a result value that will be provided at a later point. Once the value is provided, it can be retrieved by calling future.result(). If the "value" of the future is an exception, future.result() will raise that exception. run_until_complete has the contract that it will run the event loop for as long as it takes for the given future to produce a value, and then it returns that value. If the "value" is in fact an exception to raise, then run_until_complete will re-raise it. For example:
loop = asyncio.get_event_loop()
fut = loop.create_future()
loop.call_soon(fut.set_exception, ZeroDivisionError)

# raises ZeroDivisionError, as that is the future's result,
# manually set
loop.run_until_complete(fut)
When the future in question is in fact a Task, an asyncio-specific object that wraps a coroutine into a Future, the result of such future is the object returned by the coroutine. If the coroutine raises an exception, then retrieving the result will re-raise it, and so will run_until_complete:
async def fail():
    1/0

loop = asyncio.get_event_loop()
fut = loop.create_task(fail())

# raises ZeroDivisionError, as that is the future's result,
# because the coroutine raises it
loop.run_until_complete(fut)
When dealing with a task, run_until_complete finishing means that the coroutine has finished as well, having either returned a value or raised an exception, as determined by run_until_complete returning or raising.
On the other hand, cancelling a task works by arranging for the task to be resumed and the await expression that suspended it to raise CancelledError. Unless the task specifically catches and suppresses this exception (which well-behaved asyncio code is not supposed to do), the task will stop executing and the CancelledError will become its result. However, if the coroutine is already finished when cancel() is called, then cancel() cannot do anything because there is no pending await to inject CancelledError into.
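A small sketch of that last point (my own illustration): cancelling a task whose coroutine has already finished is a no-op, and cancel() returns False to signal it.

import asyncio

async def quick():
    return 42

async def main():
    task = asyncio.create_task(quick())
    await task            # the coroutine completes here
    print(task.cancel())  # False: there is no pending await to inject CancelledError into
    print(task.result())  # 42, the result is unaffected

asyncio.run(main())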
I got the same error below:
RuntimeError: Cannot close a running event loop
When I called loop.close() in test() as shown below:
import asyncio

async def test(loop):
    print("Test")
    loop.stop()
    loop.close() # Here

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.create_task(test(loop))
loop.run_forever()
So, I used loop.close() after loop.run_forever() with try: and finally: as shown below, and the error was resolved:
import asyncio

async def test(loop):
    print("Test")
    loop.stop()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.create_task(test(loop))
try:
    loop.run_forever()
finally:
    loop.close() # Here

How to detect write failure in asyncio?

As a simple example, consider the network equivalent of /dev/zero, below. (Or more realistically, just a web server sending a large file.)
If a client disconnects early, you get a barrage of log messages:
WARNING:asyncio:socket.send() raised exception.
But I'm not finding any way to catch said exception. The hypothetical server continues reading gigabytes from disk and sending them to a dead socket, with no effort on the client's part, and you've got yourself a DoS attack.
The only thing I've found from the docs is to yield from a read, with an empty string indicating closure. But that's no good here because a normal client isn't going to send anything, blocking the write loop.
What's the right way to detect failed writes, or be notified that the TCP connection has been closed, with the streams API or otherwise?
Code:
from asyncio import *
import logging

@coroutine
def client_handler(reader, writer):
    while True:
        writer.write(bytes(1))
        yield from writer.drain()

logging.basicConfig(level=logging.INFO)
loop = get_event_loop()
coro = start_server(client_handler, '', 12345)
server = loop.run_until_complete(coro)
loop.run_forever()
I did some digging into the asyncio source to expand on dano's answer on why the exceptions aren't being raised without explicitly passing control to the event loop. Here's what I've found.
Calling yield from writer.drain() hands control over to the StreamWriter.drain coroutine. This coroutine checks for and raises any exceptions that the StreamReaderProtocol set on the StreamReader. But since we passed control to drain, the protocol hasn't had a chance to set the exception yet. drain then hands control to the FlowControlMixin._drain_helper coroutine, which returns immediately because certain flags haven't been set yet, and control ends up back with the coroutine that called yield from writer.drain().
And so we have gone full circle without giving control to the event loop to allow it to handle other coroutines and bubble the exceptions up to writer.drain().
Yielding before a drain() gives the transport/protocol a chance to set the appropriate flags and exceptions.
Here's a mock up of what's going on, with all the nested calls collapsed:
import asyncio as aio

def set_exception(ctx, exc):
    ctx["exc"] = exc

@aio.coroutine
def drain(ctx):
    if ctx["exc"] is not None:
        raise ctx["exc"]
    return

@aio.coroutine
def client_handler(ctx):
    i = 0
    while True:
        i += 1
        print("write", i)
        # yield  # Uncommenting this allows the loop.call_later call to be scheduled.
        yield from drain(ctx)

CTX = {"exc": None}

loop = aio.get_event_loop()
# Set the exception in 5 seconds
loop.call_later(5, set_exception, CTX, Exception("connection lost"))
loop.run_until_complete(client_handler(CTX))
loop.close()
This should probably be fixed upstream in the Streams API by the asyncio developers.
This is a little bit strange, but you can actually allow an exception to reach the client_handler coroutine by forcing it to yield control to the event loop for one iteration:
import asyncio
import logging

@asyncio.coroutine
def client_handler(reader, writer):
    while True:
        writer.write(bytes(1))
        yield  # Yield to the event loop
        yield from writer.drain()

logging.basicConfig(level=logging.INFO)
loop = asyncio.get_event_loop()
coro = asyncio.start_server(client_handler, '', 12345)
server = loop.run_until_complete(coro)
loop.run_forever()
If I do that, I get this output when I kill the client connection:
ERROR:asyncio:Task exception was never retrieved
future: <Task finished coro=<client_handler() done, defined at aio.py:4> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/tasks.py", line 238, in _step
    result = next(coro)
  File "aio.py", line 9, in client_handler
    yield from writer.drain()
  File "/usr/lib/python3.4/asyncio/streams.py", line 301, in drain
    raise exc
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 700, in write
    n = self._sock.send(data)
ConnectionResetError: [Errno 104] Connection reset by peer
I'm really not quite sure why you need to explicitly let the event loop get control for the exception to get through; I don't have time at the moment to dig into it. I assume some bit needs to get flipped to indicate the connection dropped, and calling yield from writer.drain() (which can short-circuit going through the event loop) in a loop is preventing that from happening, but I'm really not sure. If I get a chance to investigate, I'll update the answer with that info.
The stream-based API doesn't have a callback you can specify for when the connection is closed, but the Protocol API does, so use it instead: https://docs.python.org/3/library/asyncio-protocol.html#connection-callbacks
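For example, here is a minimal sketch of the /dev/zero server rewritten against the Protocol API (my own illustration, untested in the asker's setup): connection_lost fires as soon as the peer disconnects, and pause_writing/resume_writing provide flow control in place of drain().

import asyncio

class ZeroServer(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.paused = False
        self.closed = False
        self._write_chunk()

    def _write_chunk(self):
        if self.closed or self.paused:
            return
        self.transport.write(bytes(4096))
        # Reschedule rather than loop, so the event loop stays responsive.
        asyncio.get_event_loop().call_soon(self._write_chunk)

    def pause_writing(self):
        self.paused = True  # the transport's buffer is full; stop producing

    def resume_writing(self):
        self.paused = False
        self._write_chunk()

    def connection_lost(self, exc):
        self.closed = True  # called as soon as the peer disconnects
        print("client disconnected:", exc)

loop = asyncio.get_event_loop()
server = loop.run_until_complete(loop.create_server(ZeroServer, '', 12345))
loop.run_forever()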
