We suddenly started seeing "Interrupted system call" errors on Queue operations like this:
Exception in thread Thread-2:
Traceback (most recent call last):
[ . . . ]
result = self.pager.results.get(True, self.WAIT_SECONDS)
File "/usr/lib/python2.5/site-packages/processing-0.52-py2.5-linux-x86_64.egg/processing/queue.py", line 128, in get
if not self._poll(block and (deadline-time.time()) or 0.0):
IOError: [Errno 4] Interrupted system call
This is a Fedora 10 / Python 2.5 machine that recently had a security update. Prior to that, our software had run for about a year without incident; now it is crashing daily.
Is it correct/necessary to catch this exception and retry the Queue operation?
We don't set any signal handlers ourselves, but this is a Tkinter app, so maybe it installs some. Is it safe to clear the SIGINT handler, and would that solve the problem? Thanks.
Based on this thread on comp.lang.python and this reply from Dan Stromberg, I wrote a RetryQueue, which is a drop-in replacement for Queue and which does the job for us:
from multiprocessing.queues import Queue
import errno

def retry_on_eintr(function, *args, **kw):
    while True:
        try:
            return function(*args, **kw)
        except IOError, e:
            if e.errno == errno.EINTR:
                continue
            else:
                raise

class RetryQueue(Queue):
    """Queue which will retry if interrupted with EINTR."""
    def get(self, block=True, timeout=None):
        return retry_on_eintr(Queue.get, self, block, timeout)
Finally this got fixed in Python itself, so the other solution is to update to a newer Python:
http://bugs.python.org/issue17097
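For completeness, the same retry idea can be restated in Python 3 syntax; note that since Python 3.5 (PEP 475) the interpreter retries interrupted system calls itself, so a wrapper like this is only needed on older versions. The flaky helper below is a made-up stand-in for a call that gets interrupted once:

```python
import errno


def retry_on_eintr(function, *args, **kw):
    """Call function, retrying whenever it fails with EINTR."""
    while True:
        try:
            return function(*args, **kw)
        except (IOError, OSError) as e:
            if e.errno != errno.EINTR:
                raise


calls = []

def flaky():
    # Simulated syscall: interrupted once, then succeeds.
    calls.append(1)
    if len(calls) < 2:
        raise IOError(errno.EINTR, "Interrupted system call")
    return "ok"

print(retry_on_eintr(flaky))
```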
I have a place in my code where I made a mistake in the name of a dict key. It took some time to understand why the code was not running past that place, because no traceback was thrown.
The code is below, I put it for completeness, highlighting with →→→ the place where the issue is:
class Alert:
    lock = threading.Lock()
    sent_alerts = {}

    @staticmethod
    def start_alert_listener():
        # load existing alerts to keep persistence
        try:
            with open("sent_alerts.json") as f:
                json.load(f)
        except FileNotFoundError:
            # there is no file, never mind - it will be created at some point
            pass
        # start the listener
        log.info("starting alert listener")
        client = paho.mqtt.client.Client()
        client.on_connect = Alert.mqtt_connection_alert
        client.on_message = Alert.alert
        client.connect("mqtt.XXXX", 1883, 60)
        client.loop_forever()

    @staticmethod
    def mqtt_connection_alert(client, userdata, flags, rc):
        if rc:
            log.critical(f"error connecting to MQTT: {rc}")
            sys.exit()
        topic = "monitor/+/state"
        client.subscribe(topic)
        log.info(f"subscribed alert to {topic}")

    @staticmethod
    def alert(client, userdata, msg):
        event = json.loads(msg.payload)
        log.debug(f"received alert: {event}")
→→→     if event['ok']:
            # remove existing sent flag, not thread safe!
            with Alert.lock:
                Alert.sent_alerts.pop(msg['id'], None)
            return
        (...)
The log coming from the line just above is
2021-01-14 22:03:02,617 [monitor] DEBUG received alert: {'full_target_name': 'ThisAlwaysFails → a failure', 'isOK': False, 'why': 'explicit fail', 'extra': None, 'id': '6507a61c9688199a34cb006b354c8433', 'last': '2021-01-14T22:03:02.612912+01:00', 'last_ko': '2021-01-14T22:03:02.612912+01:00'}
This is the dict in which I am erroneously trying to access ok, which should raise an exception and a traceback. But nothing happens; the code goes no further than that point, as if the error were silently discarded (and the method silently fails).
I tried putting a raise Exception("wazaa") between the log.debug() and the if: same thing, the method fails at that point but no exception is raised.
I am at a loss as to how an exception could occur without a traceback being shown.
The alert() method is called in a separate thread, if this matters. For completeness, I tried the following code just to make sure threading does not interfere, but no (and I do not see a reason why it should):
import threading

class Hello:
    @staticmethod
    def a():
        raise Exception("I am an exception")

threading.Thread(target=Hello.a).start()
outputs
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python38\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\Python38\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:/Users/yop/AppData/Roaming/JetBrains/PyCharm2020.3/scratches/scratch_1.py", line 7, in a
raise Exception("I am an exception")
Exception: I am an exception
The paho-mqtt client appears to call your callback within a try and then log the error:
try:
    self.on_message(self, self._userdata, message)
except Exception as err:
    self._easy_log(
        MQTT_LOG_ERR, 'Caught exception in on_message: %s', err)
    if not self.suppress_exceptions:
        raise
What I can't explain, however, is why the exception isn't being re-raised. I can't see why self.suppress_exceptions would be true for you, since you never set it, but try:
- Manually setting suppress_exceptions using client.suppress_exceptions = False. This shouldn't be necessary, since that appears to be the default, but it's worth a try.
- Checking the log that it apparently maintains. You'll need to refer to the docs on how to do that, since I've never touched this library before.
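As a stopgap while debugging, one option (my own sketch, not part of the paho API) is to wrap the callback so the full traceback is reported before paho's try block can swallow it:

```python
import functools
import logging
import traceback

log = logging.getLogger("alerts")


def loud_callback(func):
    """Wrap an MQTT message callback so exceptions are reported, then re-raised."""
    @functools.wraps(func)
    def wrapper(client, userdata, msg):
        try:
            return func(client, userdata, msg)
        except Exception:
            # Report the full traceback ourselves, whatever paho does with it.
            log.error("exception in %s:\n%s", func.__name__, traceback.format_exc())
            raise
    return wrapper


# usage sketch: client.on_message = loud_callback(Alert.alert)
```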
I'm having problems catching a custom exception when thrown from a signal handler callback when using asyncio.
If I throw ShutdownApp from within do_io() below, I am able to properly catch it in run_app(). However, when the exception is raised from handle_sig(), I can't seem to catch it.
Minimal, Reproducible Example Tested using Python 3.8.5:
import asyncio
from functools import partial
import os
import signal
from signal import Signals

class ShutdownApp(BaseException):
    pass

os.environ["PYTHONASYNCIODEBUG"] = "1"

class App:
    def __init__(self):
        self.loop = asyncio.get_event_loop()

    def _add_signal_handler(self, signal, handler):
        self.loop.add_signal_handler(signal, handler, signal)

    def setup_signals(self) -> None:
        self._add_signal_handler(signal.SIGINT, self.handle_sig)

    def handle_sig(self, signum):
        print(f"\npid: {os.getpid()}, Received signal: {Signals(signum).name}, raising error for exit")
        raise ShutdownApp("Exiting")

    async def do_io(self):
        print("io start. Press Ctrl+C now.")
        await asyncio.sleep(5)
        print("io end")

    def run_app(self):
        print("Starting Program")
        try:
            self.loop.run_until_complete(self.do_io())
        except ShutdownApp as e:
            print("ShutdownApp caught:", e)
            # TODO: do other shutdown related items
        except:
            print("Other error")
        finally:
            self.loop.close()

if __name__ == "__main__":
    my_app = App()
    my_app.setup_signals()
    my_app.run_app()
    print("Finished")
The output after pressing CTRL+C (for SIGINT) with asyncio debug mode:
(env_aiohttp) anav@anav-pc:~/Downloads/test$ python test_asyncio_signal.py
Starting Program
io start. Press Ctrl+C now.
^C
pid: 20359, Received signal: SIGINT, raising error for exit
Exception in callback App.handle_sig(<Signals.SIGINT: 2>)
handle: <Handle App.handle_sig(<Signals.SIGINT: 2>) created at /home/anav/miniconda3/envs/env_aiohttp/lib/python3.8/asyncio/unix_events.py:99>
source_traceback: Object created at (most recent call last):
File "test_asyncio_signal.py", line 50, in <module>
my_app.setup_signals()
File "test_asyncio_signal.py", line 25, in setup_signals
self._add_signal_handler(signal.SIGINT, self.handle_sig)
File "test_asyncio_signal.py", line 22, in _add_signal_handler
self.loop.add_signal_handler(signal, handler, signal)
File "/home/anav/miniconda3/envs/env_aiohttp/lib/python3.8/asyncio/unix_events.py", line 99, in add_signal_handler
handle = events.Handle(callback, args, self, None)
Traceback (most recent call last):
File "/home/anav/miniconda3/envs/env_aiohttp/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "test_asyncio_signal.py", line 31, in handle_sig
raise ShutdownApp("Exiting")
ShutdownApp: Exiting
io end
Finished
Expected output:
Starting Program
io start. Press Ctrl+C now.
^C
pid: 20359, Received signal: SIGINT, raising error for exit
ShutdownApp caught: Exiting
io end
Finished
Is it possible to raise a custom exception from a signal handler in asyncio? If so, how do I properly catch/except it?
handle_sig is a callback, so it runs directly off the event loop and its exceptions are just reported to the user via a global hook. If you want the exception raised there to be caught elsewhere in the program, you need to use a future to transfer the exception from handle_sig to where you want it noticed.
To catch the exception at top-level, you probably want to introduce another method, let's call it async_main(), that waits for either self.do_io() or the previously-created future to complete:
def __init__(self):
    self.loop = asyncio.get_event_loop()
    self.done_future = self.loop.create_future()

async def async_main(self):
    # wait for do_io or done_future, whatever happens first
    io_task = asyncio.create_task(self.do_io())
    await asyncio.wait([self.done_future, io_task],
                       return_when=asyncio.FIRST_COMPLETED)
    if self.done_future.done():
        io_task.cancel()
        await self.done_future  # propagate the exception, if raised
    else:
        self.done_future.cancel()
To raise the exception from inside handle_sig, you just need to set the exception on the future object:
def handle_sig(self, signum):
    print(f"\npid: {os.getpid()}, Received signal: {Signals(signum).name}, raising error for exit")
    self.done_future.set_exception(ShutdownApp("Exiting"))
Finally, you modify run_app to pass self.async_main() to run_until_complete, and you're all set:
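Condensed into a standalone sketch, the whole pattern looks roughly like this; the signal delivery is simulated with loop.call_later so it runs anywhere, and the class is flattened into plain functions for brevity:

```python
import asyncio


class ShutdownApp(BaseException):
    pass


async def do_io():
    print("io start")
    await asyncio.sleep(10)  # stands in for the real work


async def async_main(loop, done_future):
    # Race the I/O task against the shutdown future.
    io_task = loop.create_task(do_io())
    await asyncio.wait([done_future, io_task],
                       return_when=asyncio.FIRST_COMPLETED)
    if done_future.done():
        io_task.cancel()
        # let the task actually process the cancellation
        await asyncio.gather(io_task, return_exceptions=True)
        await done_future  # re-raises ShutdownApp here
    else:
        done_future.cancel()


loop = asyncio.new_event_loop()
done_future = loop.create_future()
# Simulate handle_sig firing shortly after startup:
loop.call_later(0.05, done_future.set_exception, ShutdownApp("Exiting"))
caught = None
try:
    loop.run_until_complete(async_main(loop, done_future))
except ShutdownApp as e:
    caught = str(e)
    print("ShutdownApp caught:", e)
finally:
    loop.close()
```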
$ python3 x.py
Starting Program
io start. Press Ctrl+C now.
^C
pid: 2069230, Received signal: SIGINT, raising error for exit
ShutdownApp caught: Exiting
Finished
In closing, note that reliably catching keyboard interrupts is a notoriously tricky undertaking and the above code might not cover all the corner cases.
If I throw ShutdownApp from within do_io() below, I am able to properly catch it in run_app(). However, when the exception is raised from handle_sig(), I can't seem to catch it.
In response to the query above, here is a custom exception implementation:
class RecipeNotValidError(Exception):
    def __init__(self):
        self.message = "Your recipe is not valid"

try:
    raise RecipeNotValidError
except RecipeNotValidError as e:
    print(e.message)
In the method handle_sig, add a try/except block. You can also customize the messages in your custom exceptions.
def handle_sig(self, signum):
    try:
        print(f"\npid: {os.getpid()}, Received signal: {Signals(signum).name}, raising error for exit")
        raise ShutdownApp("Exiting")
    except ShutdownApp as e:
        print(e)
In response to your second query:
Is it possible to raise a custom exception from a signal handler in asyncio? If so, how do I properly catch/except it?
asyncio has built-in error handling; for more, see the documentation at https://docs.python.org/3/library/asyncio-exceptions.html
import asyncio

async def f():
    try:
        while True:
            await asyncio.sleep(0)
    except asyncio.CancelledError:
        print('I was cancelled!')
    else:
        return 111
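A minimal driver for that snippet might look like this (the cancelled list is added here just to make the outcome observable; it is not part of the original example):

```python
import asyncio

cancelled = []  # records that the except branch ran (for illustration)


async def f():
    try:
        while True:
            await asyncio.sleep(0)
    except asyncio.CancelledError:
        cancelled.append(True)
        print('I was cancelled!')


async def main():
    task = asyncio.ensure_future(f())
    await asyncio.sleep(0.01)
    task.cancel()
    # swallow whatever propagates out of the cancelled task
    await asyncio.gather(task, return_exceptions=True)


asyncio.run(main())
```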
I cannot understand why sometimes I cannot catch the TimedOutError inside my flush_serial_buffer method.
When running my program I sometimes get a TimedOutError that is not caught, and I cannot understand why. Below is the code of the signal handler and the method where the TimedOutError is not caught. How could this be happening?
This is the code for my signal handler definition and callback function.
Basically if the time ends, the signal handler is called and raises a timeout error.
def signal_handler(signum, frame):
    print "PUM"
    raise TimedOutError("Time out Error")

signal.signal(signal.SIGALRM, signal_handler)
The flush_serial_buffer method blocks if there is no answer to
answer = xbee.wait_read_frame()
The idea is to drain everything in the buffer until there are no more messages. When there are no more messages, it just waits for the SIGALRM to fire and raise the timeout error.
def flush_serial_buffer(xbee):
    # Flush coordinators serial buffer if problem happened before
    logging.info(" Flashing serial buffer")
    try:
        signal.alarm(1)  # Seconds
        while True:
            answer = xbee.wait_read_frame()
            signal.alarm(1)
            logging.error(" Mixed messages in buffer")
    except TimedOutError:
        signal.alarm(0)  # Seconds
        logging.error(" No more messages in buffer")
    signal.alarm(0)  # Supposedly it never leaves without using Except, but...
Is there a case where the TimedOutError might be raised but not caught by the try: statement?
Here is my error class definition:
class TimedOutError(Exception):
    pass
I was able to reproduce the error. I really cannot understand why the try does not catch it.
INFO:root: Flashing serial buffer
PUM
Traceback (most recent call last):
File "/home/ls/bin/pycharm-community-4.0.6/helpers/pydev/pydevd.py", line 1458, in trace_dispatch
if self._finishDebuggingSession and not self._terminationEventSent:
File "/home/ls/PiProjects/Deployeth/HW-RPI-API/devices.py", line 42, in signal_handler
raise TimedOutError("Time out Error")
TimedOutError: Time out Error
In this case I would recommend replacing the try/except code with this:
try:
    signal.alarm(1)  # Seconds
    while True:
        answer = xbee.wait_read_frame()
        signal.alarm(1)
        logging.error(" Mixed messages in buffer")
except:
    signal.alarm(0)  # Seconds
    logging.error(" No more messages in buffer")
PS: You don't need to name a specific error type in your except statement.
The Python program below starts one thread and then continues to perform actions in the main thread. I wrap the whole main thread in a try/except block so I can tear down all running threads if an exception occurs.
When I run the script using Python 2.7.5 and trigger a KeyboardInterrupt at a random point during the program's execution, the exception is raised but not caught. The program continues to run.
$ python test.py
Running server ...
Searching for servers ...
^CTraceback (most recent call last):
File "test.py", line 50, in <module>
main()
File "test.py", line 40, in main
app_main()
File "test.py", line 35, in app_main
searchservers()
File "test.py", line 26, in searchservers
time.sleep(0.0005)
KeyboardInterrupt
I am missing the line in the output that main() prints when an exception occurs.
Code
import time
import threading

thread_pool = []
running = False

def stop():
    global running
    running = False

def runserver():
    print "Running server ..."
    global running
    running = True
    while running:
        time.sleep(0.07)

def searchservers():
    print "Searching for servers ..."
    for i in xrange(256):
        for j in xrange(256):
            time.sleep(0.0005)

def app_main():
    server = threading.Thread(target=runserver)
    thread_pool.append(server)
    server.start()
    time.sleep(0.1)
    searchservers()
    stop()

def main():
    try:
        app_main()
    except Exception as exc:
        stop()
        print "%s occured, joining all threads..." % exc.__class__.__name__
        for thread in thread_pool:
            thread.join()
        raise exc

if __name__ == "__main__":
    main()
Why is the KeyboardInterrupt not caught? What is the proper way of catching exceptions in a threaded program and tearing down the complete process?
KeyboardInterrupt is a special exception; like GeneratorExit and SystemExit, it does not derive from the base Exception class (all three derive directly from BaseException).
Catching just Exception is thus not enough; you'd normally catch it explicitly:
except (Exception, KeyboardInterrupt) as exc:
However, you are also trying to catch exceptions raised in threads; threads have their own separate stacks, and you cannot simply catch exceptions thrown in those stacks from your main thread. You'd have to catch exceptions in that thread:
def runserver():
    print "Running server ..."
    global running
    running = True
    try:
        while running:
            time.sleep(0.07)
    except (Exception, KeyboardInterrupt) as exc:
        print "Error in the runserver thread"
To handle this in a generic way and 'pass' exceptions to the main thread, you'd need some sort of inter-thread communication. See Catch a thread's exception in the caller thread in Python for a full solution to that.
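A minimal version of that inter-thread handoff, sketched in Python 3 with a queue carrying the exception object back to the main thread (not the linked answer's exact code):

```python
import queue
import threading

errors = queue.Queue()


def worker():
    try:
        raise ValueError("boom in thread")  # stands in for real work failing
    except Exception as exc:
        errors.put(exc)  # hand the exception object to the main thread


t = threading.Thread(target=worker)
t.start()
t.join()

try:
    raise errors.get_nowait()  # re-raise in the main thread
except ValueError as e:
    print("caught from thread:", e)
```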
Currently I have a basic HTTP server set up using BaseHTTPRequestHandler, and I use its do_GET method. I'd like a function check to be invoked if a request does not come in for 5 seconds.
I'm considering using multiprocessing along with the time module for this, but I'm concerned about its reliability. Are there any suggestions for best practices?
Thanks.
[EDIT]
Martijn's solution is really cool, but I end up with the following traceback:
Traceback (most recent call last):
File "test.py", line 89, in <module>
main()
File "test.py", line 83, in main
server.serve_forever()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/SocketServer.py", line 224, in serve_forever
r, w, e = select.select([self], [], [], poll_interval)
select.error: (4, 'Interrupted system call')
[EDIT 2]
I tried it on Python 2.7 but the error still occurs.
[EDIT 3]
Traceback (most recent call last):
File "test.py", line 90, in <module>
main()
File "test.py", line 84, in main
server.serve_forever()
File "/usr/local/lib/python2.7/SocketServer.py", line 225, in serve_forever
r, w, e = select.select([self], [], [], poll_interval)
select.error: (4, 'Interrupted system call')
For a simple server such as one based on BaseHTTPRequestHandler you could use a signal handler:
import time
import signal
import sys

last_request = sys.maxint  # arbitrary high value to *not* trigger until there has been 1 request at least

def itimer_handler(signum, frame):
    print 'itimer heartbeat'
    if time.time() - last_request > 300:  # 5 minutes have passed at least with no request
        # do stuff now to log, kill, restart, etc.
        print 'Timeout, no requests for 5 minutes!'

signal.signal(signal.SIGALRM, itimer_handler)
signal.setitimer(signal.ITIMER_REAL, 30, 30)  # check for a timeout every 30 seconds

# ...

def do_GET(..):
    global last_request
    last_request = time.time()  # reset the timer again
The signal.setitimer() call causes the OS to send a periodic SIGALRM signal to our process. This isn't too precise; here it is set for 30-second intervals. Any incoming request resets a global timestamp, and itimer_handler, called every 30 seconds, checks whether 5 minutes have passed since the timestamp was last set.
The SIGALRM signal will interrupt a running request as well, so whatever you do in that handler needs to finish quickly. When the handler function returns, the normal Python code flow resumes.
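The mechanics can be seen in a tiny standalone demo (Unix-only, Python 3 syntax, with the intervals shortened so it finishes quickly):

```python
import signal
import time

beats = []  # timestamps of each SIGALRM delivery


def heartbeat(signum, frame):
    beats.append(time.time())


signal.signal(signal.SIGALRM, heartbeat)
signal.setitimer(signal.ITIMER_REAL, 0.05, 0.05)  # fire every 50 ms

time.sleep(0.18)  # on Python >= 3.5 the sleep is resumed after each signal

signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
print("heartbeats:", len(beats))
```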
Note that this requires at least Python 2.7.4 to work; see issue 7978. Since 2.7.4 is not yet released, you can either download the SocketServer.py file that will be included in Python 2.7.4, or you can apply the following backport to add the errno.EINTR handling introduced in that version:
'''Backport of 2.7.4 EINTR handling'''
import errno
import select
import SocketServer

def _eintr_retry(func, *args):
    """restart a system call interrupted by EINTR"""
    while True:
        try:
            return func(*args)
        except (OSError, select.error) as e:
            if e.args[0] != errno.EINTR:
                raise

def serve_forever(self, poll_interval=0.5):
    """Handle one request at a time until shutdown.

    Polls for shutdown every poll_interval seconds. Ignores
    self.timeout. If you need to do periodic tasks, do them in
    another thread.
    """
    self._BaseServer__is_shut_down.clear()
    try:
        while not self._BaseServer__shutdown_request:
            # XXX: Consider using another file descriptor or
            # connecting to the socket to wake this up instead of
            # polling. Polling reduces our responsiveness to a
            # shutdown request and wastes cpu at all other times.
            r, w, e = _eintr_retry(select.select, [self], [], [],
                                   poll_interval)
            if self in r:
                self._handle_request_noblock()
    finally:
        self._BaseServer__shutdown_request = False
        self._BaseServer__is_shut_down.set()

def handle_request(self):
    """Handle one request, possibly blocking.

    Respects self.timeout.
    """
    # Support people who used socket.settimeout() to escape
    # handle_request before self.timeout was available.
    timeout = self.socket.gettimeout()
    if timeout is None:
        timeout = self.timeout
    elif self.timeout is not None:
        timeout = min(timeout, self.timeout)
    fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
    if not fd_sets[0]:
        self.handle_timeout()
        return
    self._handle_request_noblock()

# patch in updated methods
SocketServer.BaseServer.serve_forever = serve_forever
SocketServer.BaseServer.handle_request = handle_request