I'm writing a spider to crawl web pages. I know asyncio may be my best choice, so I use coroutines to process the work asynchronously. Now I'm scratching my head over how to quit the program on a keyboard interrupt. The program shuts down fine after all the work is done. The source code runs on Python 3.5 and is attached below.
import asyncio
import aiohttp
from contextlib import suppress


class Spider(object):
    def __init__(self):
        self.max_tasks = 2
        self.task_queue = asyncio.Queue(self.max_tasks)
        self.loop = asyncio.get_event_loop()
        self.counter = 1

    def close(self):
        for w in self.workers:
            w.cancel()

    async def fetch(self, url):
        try:
            async with aiohttp.ClientSession(loop=self.loop) as self.session:
                with aiohttp.Timeout(30, loop=self.session.loop):
                    async with self.session.get(url) as resp:
                        print('get response from url: %s' % url)
        except:
            pass
        finally:
            pass

    async def work(self):
        while True:
            url = await self.task_queue.get()
            await self.fetch(url)
            self.task_queue.task_done()

    def assign_work(self):
        print('[*]assigning work...')
        url = 'https://www.python.org/'
        if self.counter > 10:
            return 'done'
        for _ in range(self.max_tasks):
            self.counter += 1
            self.task_queue.put_nowait(url)

    async def crawl(self):
        self.workers = [self.loop.create_task(self.work())
                        for _ in range(self.max_tasks)]
        while True:
            if self.assign_work() == 'done':
                break
            await self.task_queue.join()
        self.close()


def main():
    loop = asyncio.get_event_loop()
    spider = Spider()
    try:
        loop.run_until_complete(spider.crawl())
    except KeyboardInterrupt:
        print('Interrupt from keyboard')
        spider.close()
        pending = asyncio.Task.all_tasks()
        for w in pending:
            w.cancel()
            with suppress(asyncio.CancelledError):
                loop.run_until_complete(w)
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()


if __name__ == '__main__':
    main()
But if I press Ctrl+C while it's running, some strange errors may occur. I mean, sometimes the program shuts down gracefully on Ctrl+C with no error message. However, in some cases the program keeps running after I press Ctrl+C and won't stop until all the work is done. If I press Ctrl+C again at that point, 'Task was destroyed but it is pending!' shows up.

I have read some topics about asyncio and added some code in main() to close the coroutines gracefully, but it doesn't work. Has someone else had similar problems?
I bet the problem happens here:

except:
    pass

You should never do such a thing, and your situation is one more example of what can happen otherwise.

When you cancel a task and await its cancellation, asyncio.CancelledError is raised inside the task and shouldn't be suppressed anywhere inside it. The line where you await the task's cancellation should raise this exception; otherwise the task will continue execution.
That's why you do

task.cancel()
with suppress(asyncio.CancelledError):
    loop.run_until_complete(task)  # this line should raise CancelledError,
                                   # otherwise the task will continue

to actually cancel the task.
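Applied to the question's code, fetch should catch only the errors it actually expects and let CancelledError propagate. A minimal sketch, assuming the same aiohttp-1.x-era API the question uses (aiohttp.Timeout, loop arguments) and that aiohttp.ClientError is available at the top level:

async def fetch(self, url):
    try:
        async with aiohttp.ClientSession(loop=self.loop) as session:
            with aiohttp.Timeout(30, loop=self.loop):
                async with session.get(url) as resp:
                    print('get response from url: %s' % url)
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        # Swallow only the errors we expect from the request itself.
        # CancelledError now propagates, so the task can be cancelled.
        print('fetch failed for %s: %s' % (url, e))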
Upd:

But I still hardly understand why the original code sometimes manages to quit cleanly on 'Ctrl+C'?

It depends on the state of your tasks:

If at the moment you press 'Ctrl+C' all tasks are done, none of them will raise CancelledError on awaiting and your code will finish normally.

If at the moment you press 'Ctrl+C' some tasks are pending but close to finishing their execution, your code will block briefly on task cancellation and finish once those tasks finish shortly after.

If at the moment you press 'Ctrl+C' some tasks are pending and far from being finished, your code will get stuck trying to cancel these tasks (which can't be done). Another 'Ctrl+C' will interrupt the cancellation process, but the tasks won't be cancelled or finished then, and you'll get the warning 'Task was destroyed but it is pending!'.
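Here is a tiny self-contained demo of that failure mode (my own sketch, not from the question): a task whose bare except swallows CancelledError can never be cancelled, and you get the same 'Task was destroyed but it is pending!' warning on exit.

import asyncio

async def stubborn():
    while True:
        try:
            await asyncio.sleep(1)
        except:     # bare except also catches CancelledError...
            pass    # ...so this task can never actually be cancelled

async def main():
    task = asyncio.ensure_future(stubborn())
    await asyncio.sleep(0.1)   # let the task reach its first await
    task.cancel()
    done, pending = await asyncio.wait({task}, timeout=3)
    print('still pending:', bool(pending))  # True: the cancel was swallowed

asyncio.get_event_loop().run_until_complete(main())
# at interpreter exit, Python warns: 'Task was destroyed but it is pending!'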
I assume you are using some flavor of Unix; if this is not the case, my comments might not apply to your situation.

Pressing Ctrl-C in a terminal sends all processes associated with this tty the signal SIGINT. A Python process catches this Unix signal and translates it into a KeyboardInterrupt exception. In a threaded application (I'm not sure whether the async machinery uses threads internally, but it very much sounds like it does), typically only one thread (the main thread) receives this signal and thus reacts in this fashion. If it is not prepared for this situation, it will terminate due to the exception.

Then the threading administration will wait for the still-running fellow threads to terminate before the Unix process as a whole terminates with an exit code. This can take quite a long time. See this question about killing fellow threads and why this isn't possible in general.

What you want to do, I assume, is kill your process immediately, killing all threads in one step.

The easiest way to achieve this is to press Ctrl-\. This sends a SIGQUIT instead of a SIGINT, which typically also affects the fellow threads and causes them to terminate.

If this is not enough (because for whatever reason you need to react properly to Ctrl-C), you can send yourself a signal:

import os, signal

os.kill(os.getpid(), signal.SIGQUIT)

This should terminate all running threads unless they specifically catch SIGQUIT, in which case you can still use SIGKILL to perform a hard kill. That doesn't give them any chance to react, though, and might lead to problems.
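A sketch of that escalation, assuming main() is your program's entry point (hypothetical name):

import os
import signal

try:
    main()   # hypothetical entry point
except KeyboardInterrupt:
    print('cleaning up...')                # react to Ctrl-C first
    os.kill(os.getpid(), signal.SIGQUIT)   # then take down all threads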
Related
import signal
import asyncio
from typing import Optional
from types import FrameType


async def main() -> None:
    server = (
        await asyncio.start_server(
            lambda reader, writer: None,
            '127.0.0.1',
            7070
        )
    )

    tasks = (
        asyncio.create_task(server.serve_forever()),
        asyncio.create_task(event.wait())
    )

    async with server:
        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Some shutdown logic goes here.


def handle_signal(signal: int, frame: Optional[FrameType]) -> None:
    event.set()


if __name__ == '__main__':
    event = asyncio.Event()
    signal.signal(signal.SIGINT, handle_signal)
    asyncio.run(main())
I want to gracefully terminate an asyncio server when the user sends SIGINT (or, in other words, presses Ctrl+C).

For this reason, asyncio.Event is used: asyncio.wait in the main coroutine waits until either the server has stopped or SIGINT has been received, and the signal handler is set accordingly.

The problem is that this solution does not work (tested on Alpine Linux). Can somebody explain exactly why? Can I work around it somehow?
An interrupt can happen at any time and the handler is called between two Python bytecode instructions. In general, only a few simple functions are safe to call inside a signal handler, because buffers or internal data may be in an inconsistent state. The recommendation is to only set a flag that is periodically checked in the program's main loop.
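The classic flag pattern looks like this (a generic sketch, not tied to the question's code; do_work_step is a hypothetical unit of work):

import signal

stop_requested = False

def handle_sigint(signum, frame):
    global stop_requested
    stop_requested = True   # set the flag and return; do no real work here

signal.signal(signal.SIGINT, handle_sigint)

while not stop_requested:
    do_work_step()          # the flag is checked once per loop iteration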
In asyncio, we can handle the interrupt like something happening in another thread. Technically it happens in the same thread, but the point is that it is not controlled by the event loop.

Asyncio is not thread-safe, but there are a few helpers for this. call_soon_threadsafe schedules a callback to be called as soon as possible (like call_soon), but in addition it wakes up the event loop.
def handle_signal(signal: int, frame: Optional[FrameType]) -> None:
    asyncio.get_running_loop().call_soon_threadsafe(evset)

def evset():
    event.set()
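Putting it together, a corrected version of the whole script might look like this (my sketch: the Event and the signal registration are moved inside main so that everything binds to the loop started by asyncio.run):

import asyncio
import signal

async def main() -> None:
    loop = asyncio.get_running_loop()
    event = asyncio.Event()

    def handle_signal(signum, frame):
        # Runs outside the loop's control: only schedule the
        # event to be set, and wake the event loop up.
        loop.call_soon_threadsafe(event.set)

    signal.signal(signal.SIGINT, handle_signal)

    server = await asyncio.start_server(
        lambda reader, writer: None, '127.0.0.1', 7070)
    serve_task = asyncio.create_task(server.serve_forever())
    stop_task = asyncio.create_task(event.wait())

    async with server:
        await asyncio.wait({serve_task, stop_task},
                           return_when=asyncio.FIRST_COMPLETED)
        # Some shutdown logic goes here.
    serve_task.cancel()

if __name__ == '__main__':
    asyncio.run(main())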
I have created a proxy checker that works fine when it is left alone to check all the proxies, but I would like it to cancel all pending proxy-checking coroutines and exit gracefully upon a keyboard interrupt. In its current state, upon keyboard interrupt the program does not exit gracefully and I get the error message "Task was destroyed but it is pending!"

After doing some research, I realize that this is happening because I am closing the event loop before the coroutines have finished cancelling. I have tried to implement the solution found in this Stack Overflow post:

What's the correct way to clean up after an interrupted event loop?

However, my implementation does not work; execution seems to get stuck in loop.run_forever(), because upon keyboard interrupt my terminal hangs.

If possible, I would truly appreciate a solution that does not involve waiting for pending tasks to finish. The target behavior is that upon keyboard interrupt, the program drops everything, issues a report, and exits.

Also, I am new to asyncio, so any constructive criticism of how I've structured my program is truly appreciated.
async def check_proxy(self, id, session):
    proxy = self.plist[0]
    del self.plist[0]
    try:
        self.stats["tries"] += 1
        async with session.head(self.test_url, proxy=proxy, timeout=self.timeout) as response:
            if response and response.status == 200:
                self.stats["alive"] += 1
                self.alive.append(proxy)
                print(f"{id} has found a live proxy : " + proxy)
    except Exception:
        pass

async def main(self):
    tasks = []
    connector = ProxyConnector()
    async with aiohttp.ClientSession(connector=connector, request_class=ProxyClientRequest) as session:
        while len(self.plist) > 0:
            if len(self.plist) >= self.threads:
                for i in range(self.threads):
                    tasks.append(asyncio.ensure_future(self.check_proxy(i+1, session)))
            else:
                for i in range(len(self.plist)):
                    tasks.append(asyncio.ensure_future(self.check_proxy(i+1, session)))
            await asyncio.gather(*tasks)

def start(self):
    self.load_proxies()
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(self.main())
    except KeyboardInterrupt:
        for task in asyncio.Task.all_tasks():
            task.cancel()
        loop.run_forever()
        asyncio.Task.all_tasks().exception()
    finally:
        loop.close()
        self.report_stats()
Ideally, the output should look something like this:
...
55 has found a live proxy : socks5://81.10.222.118:1080
83 has found a live proxy : socks5://173.245.239.223:16938
111 has found a live proxy : socks5://138.68.41.90:1080
^C
# of Tries: 160
# of Alive: 32
It took me a while because I was branching out on the wrong track, but here is the fix in case others run into the same problem.
except KeyboardInterrupt:
    for task in asyncio.Task.all_tasks():
        task.cancel()
    loop.stop()  # -- rather unintuitive in my opinion, works though.
    loop.run_forever()
finally:
    loop.close()
A very simple fix; rather unfortunate that I didn't realize it sooner. Hopefully this can save someone else from a similar descent into madness. Cheers!
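One note for readers on newer interpreters (my addition, not part of the original fix): asyncio.Task.all_tasks() was deprecated in Python 3.7 and removed in 3.9, so the same fix there uses the module-level asyncio.all_tasks():

except KeyboardInterrupt:
    for task in asyncio.all_tasks(loop):  # replaces asyncio.Task.all_tasks()
        task.cancel()
    loop.stop()
    loop.run_forever()  # gives the loop a chance to deliver the cancellations
finally:
    loop.close()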
Similar question (but the answer does not work for me): How to cancel long-running subprocesses running using concurrent.futures.ProcessPoolExecutor?

Unlike the question linked above and the solution provided there, in my case the computation itself is rather long (CPU-bound) and cannot be run in a loop to check whether some event has happened.
Reduced version of the code below:
import asyncio
import concurrent.futures as futures
import time


class Simulator:
    def __init__(self):
        self._loop = None
        self._lmz_executor = None
        self._tasks = []
        self._max_execution_time = time.monotonic() + 60
        self._long_running_tasks = []

    def initialise(self):
        # Initialise the main asyncio loop
        self._loop = asyncio.get_event_loop()
        self._loop.set_default_executor(
            futures.ThreadPoolExecutor(max_workers=3))

        # Run separate processes of long computation task
        self._lmz_executor = futures.ProcessPoolExecutor(max_workers=3)

    def run(self):
        self._tasks.extend(
            [self.bot_reasoning_loop(bot_id) for bot_id in [1, 2, 3]]
        )

        try:
            # Gather bot reasoner tasks
            _reasoner_tasks = asyncio.gather(*self._tasks)
            # Send the reasoner tasks to main monitor task
            asyncio.gather(self.sample_main_loop(_reasoner_tasks))
            self._loop.run_forever()
        except KeyboardInterrupt:
            pass
        finally:
            self._loop.close()

    async def sample_main_loop(self, reasoner_tasks):
        """This is the main monitor task"""
        await asyncio.wait_for(reasoner_tasks, None)
        for task in self._long_running_tasks:
            try:
                await asyncio.wait_for(task, 10)
            except asyncio.TimeoutError:
                print("Oops. Some long operation timed out.")
                task.cancel()  # Doesn't cancel and has no effect
                task.set_result(None)  # Doesn't seem to have an effect

        self._lmz_executor.shutdown()
        self._loop.stop()
        print('And now I am done. Yay!')

    async def bot_reasoning_loop(self, bot):
        import math

        _exec_count = 0
        _sleepy_time = 15
        _max_runs = math.floor(self._max_execution_time / _sleepy_time)

        self._long_running_tasks.append(
            self._loop.run_in_executor(
                self._lmz_executor, really_long_process, _sleepy_time))

        while time.monotonic() < self._max_execution_time:
            print("Bot#{}: thinking for {}s. Run {}/{}".format(
                bot, _sleepy_time, _exec_count, _max_runs))
            await asyncio.sleep(_sleepy_time)
            _exec_count += 1

        print("Bot#{} Finished Thinking".format(bot))


def really_long_process(sleepy_time):
    print("I am a really long computation.....")
    _large_val = 9729379273492397293479237492734 ** 344323
    print("I finally computed this large value: {}".format(_large_val))


if __name__ == "__main__":
    sim = Simulator()
    sim.initialise()
    sim.run()
The idea is that there is a main simulation loop that runs and monitors three bot threads. Each of these bot threads performs some reasoning, but also starts a really long background process using ProcessPoolExecutor, which may end up running longer than the bots' own threshold/max execution time for reasoning.

As you can see in the code above, I attempted to .cancel() these tasks when a timeout occurs. This does not actually cancel the computation, which keeps happening in the background, and the asyncio loop doesn't terminate until all the long-running computations have finished.
How do I terminate such long running CPU-bound computations within a method?
Other similar SO questions, but not necessarily related or helpful:
asyncio: Is it possible to cancel a future been run by an Executor?
How to terminate a single async task in multiprocessing if that single async task exceeds a threshold time in Python
Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?
How do I terminate such long running CPU-bound computations within a method?
The approach you tried doesn't work because the futures returned by ProcessPoolExecutor are not cancellable. Although asyncio's run_in_executor tries to propagate the cancellation, it is simply ignored by Future.cancel once the task starts executing.
There is no fundamental reason for that. Unlike threads, processes can be safely terminated, so it would be perfectly possible for ProcessPoolExecutor.submit to return a future whose cancel terminated the corresponding process. Asyncio coroutines have well-defined cancellation semantics and could automatically make use of it. Unfortunately, ProcessPoolExecutor.submit returns a regular concurrent.futures.Future, which assumes the lowest common denominator of the underlying executors, and treats a running future as untouchable.
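You can watch cancel() being ignored in isolation with a few lines (a quick demo of this claim, my own sketch):

import concurrent.futures
import time

def work():
    time.sleep(5)   # stand-in for a long CPU-bound computation

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as ex:
        fut = ex.submit(work)
        time.sleep(0.5)       # give the worker time to start the task
        print(fut.cancel())   # False: a running future ignores cancel()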
As a result, to cancel tasks executed in subprocesses, one must circumvent the ProcessPoolExecutor altogether and manage one's own processes. The challenge is how to do this without reimplementing half of multiprocessing. One option offered by the standard library is to (ab)use multiprocessing.Pool for this purpose, because it supports reliable shutdown of worker processes. A CancellablePool could work as follows:
Instead of spawning a fixed number of processes, spawn a fixed number of 1-worker pools.
Assign tasks to pools from an asyncio coroutine. If the coroutine is canceled while waiting for the task to finish in the other process, terminate the single-process pool and create a new one.
Since everything is coordinated from the single asyncio thread, don't worry about race conditions such as accidentally killing a process which has already started executing another task. (This would need to be prevented if one were to support cancellation in ProcessPoolExecutor.)
Here is a sample implementation of that idea:
import asyncio
import multiprocessing


class CancellablePool:
    def __init__(self, max_workers=3):
        self._free = {self._new_pool() for _ in range(max_workers)}
        self._working = set()
        self._change = asyncio.Event()

    def _new_pool(self):
        return multiprocessing.Pool(1)

    async def apply(self, fn, *args):
        """
        Like multiprocessing.Pool.apply_async, but:
         * is an asyncio coroutine
         * terminates the process if cancelled
        """
        while not self._free:
            await self._change.wait()
            self._change.clear()
        pool = usable_pool = self._free.pop()
        self._working.add(pool)

        loop = asyncio.get_event_loop()
        fut = loop.create_future()
        def _on_done(obj):
            loop.call_soon_threadsafe(fut.set_result, obj)
        def _on_err(err):
            loop.call_soon_threadsafe(fut.set_exception, err)
        pool.apply_async(fn, args, callback=_on_done, error_callback=_on_err)

        try:
            return await fut
        except asyncio.CancelledError:
            pool.terminate()
            usable_pool = self._new_pool()
        finally:
            self._working.remove(pool)
            self._free.add(usable_pool)
            self._change.set()

    def shutdown(self):
        for p in self._working | self._free:
            p.terminate()
        self._free.clear()
A minimalistic test case showing cancellation:
def really_long_process():
    print("I am a really long computation.....")
    large_val = 9729379273492397293479237492734 ** 344323
    print("I finally computed this large value: {}".format(large_val))

async def main():
    loop = asyncio.get_event_loop()
    pool = CancellablePool()

    tasks = [loop.create_task(pool.apply(really_long_process))
             for _ in range(5)]
    for t in tasks:
        try:
            await asyncio.wait_for(t, 1)
        except asyncio.TimeoutError:
            print('task timed out and cancelled')

    pool.shutdown()

asyncio.get_event_loop().run_until_complete(main())
Note how the CPU usage never exceeds 3 cores, and how it starts dropping near the end of the test, indicating that the processes are being terminated as expected.
To apply it to the code from the question, make self._lmz_executor an instance of CancellablePool and change self._loop.run_in_executor(...) to self._loop.create_task(self._lmz_executor.apply(...)).
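In other words, something along these lines (a hypothetical, untested adaptation of the question's Simulator):

# in Simulator.initialise()
self._lmz_executor = CancellablePool(max_workers=3)

# in Simulator.bot_reasoning_loop()
self._long_running_tasks.append(
    self._loop.create_task(
        self._lmz_executor.apply(really_long_process, _sleepy_time)))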
I've read every post I could find about how to gracefully handle a script with an asyncio event loop getting terminated with Ctrl-C, and I haven't been able to get any of them to work without printing one or more tracebacks in the process. The answers are pretty much all over the place, and I haven't been able to implement any of them into this small script:
import asyncio
import datetime
import functools
import signal


async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)


def stopper(signame, loop):
    print("Got %s, stopping..." % signame)
    loop.stop()


loop = asyncio.get_event_loop()
for signame in ('SIGINT', 'SIGTERM'):
    loop.add_signal_handler(getattr(signal, signame),
                            functools.partial(stopper, signame, loop))

loop.run_until_complete(display_date(loop))
loop.close()
What I want to happen is for the script to exit without printing any tracebacks following a Ctrl-C (or SIGTERM/SIGINT sent via kill). This code prints RuntimeError: Event loop stopped before Future completed. In the MANY other forms I've tried based on previous answers, I've gotten a plethora of other types of exception classes and error messages with no idea how to fix them. The code above is minimal right now, but some of the attempts I made earlier were anything but, and none of them were correct.
If you're able to modify the script so that it terminates gracefully, an explanation of why your way of doing it is the right way would be greatly appreciated.
Use signal handlers:
import asyncio
from signal import SIGINT, SIGTERM


async def main_coro():
    try:
        await awaitable()
    except asyncio.CancelledError:
        do_cleanup()


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    main_task = asyncio.ensure_future(main_coro())
    for signal in [SIGINT, SIGTERM]:
        loop.add_signal_handler(signal, main_task.cancel)
    try:
        loop.run_until_complete(main_task)
    finally:
        loop.close()
Stopping the event loop while it is running will never be valid.

Here, you need to catch the Ctrl-C to indicate to Python that you wish to handle it yourself instead of displaying the default stack trace. This can be done with a classic try/except:
coro = display_date(loop)
try:
    loop.run_until_complete(coro)
except KeyboardInterrupt:
    print("Received exit, exiting")
And, for your use-case, that's it!
For a more real-life program, you would probably need to clean up some resources. See also Graceful shutdown of asyncio coroutines
I've seen many topics about this particular problem, but I still can't figure out why I'm not catching a SIGINT in my main thread.
Here is my code:
def connect(self, retry=100):
    tries = retry
    logging.info('connecting to %s' % self.path)
    while True:
        try:
            self.sp = serial.Serial(self.path, 115200)
            self.pileMessage = pilemessage.Pilemessage()
            self.pileData = pilemessage.Pilemessage()
            self.reception = reception.Reception(self.sp, self.pileMessage, self.pileData)
            self.reception.start()
            self.collisionlistener = collisionListener.CollisionListener(self)
            self.message = messageThread.Message(self.pileMessage, self.collisionlistener)
            self.datastreaminglistener = dataStreamingListener.DataStreamingListener(self)
            self.datastreaming = dataStreaming.Data(self.pileData, self.datastreaminglistener)
            return
        except serial.serialutil.SerialException:
            logging.info('retrying')
            if not retry:
                raise SpheroError('failed to connect after %d tries' % (tries-retry))
            retry -= 1

def disconnect(self):
    self.reception.stop()
    self.message.stop()
    self.datastreaming.stop()
    while not self.pileData.isEmpty():
        self.pileData.pop()
    self.datastreaminglistener.remove()
    while not self.pileMessage.isEmpty():
        self.pileMessage.pop()
    self.collisionlistener.remove()
    self.sp.close()

if __name__ == '__main__':
    import time
    try:
        logging.getLogger().setLevel(logging.DEBUG)
        s = Sphero("/dev/rfcomm0")
        s.connect()
        s.set_motion_timeout(65525)
        s.set_rgb(0, 255, 0)
        s.set_back_led_output(255)
        s.configure_locator(0, 0)
    except KeyboardInterrupt:
        s.disconnect()
In the main function I call connect(), which launches threads over which I don't have direct control.

When I launch this script, I would like to be able to stop it by hitting Ctrl+C, which should call the disconnect() function and stop all the other threads.

In the code I provided it doesn't work, because there is no thread left in the main function. I already tried putting all the instructions from main in a thread with a while loop, without success.

Is there a simple way to solve my problem?

Thanks
Your indentation is messed up, but there's enough to go on.
Your main thread isn't catching SIGINT because it's not alive. There is nothing that stops your main thread from continuing past the try block, seeing no more code, and closing up shop.
I am not familiar with Sphero. I just attempted to google its docs and was linked to a bunch of 404 pages, so I'll tell you what you would normally do in a threaded environment - join your threads to the main thread so that the main thread can't finish execution before the worker threads.
for t in my_thread_list:
    t.join()  # main thread can't get past here until all the threads finish
If your Sphero object doesn't provide join-like functionality, you could hack something in that blocks, e.g.
raw_input('Press Enter to disconnect')
s.disconnect()
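Another common variant is to keep the main thread alive in a sleep loop so that it stays around to receive the Ctrl+C itself (a sketch built on the question's s object):

import time

try:
    while True:
        time.sleep(1)    # keep the main thread alive to receive SIGINT
except KeyboardInterrupt:
    s.disconnect()       # stop the worker threads, then exit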