Python: wait for SIGCHLD outside the main thread

I'm working on a task scheduler which should run tasks described in an XML file.
The scheduler should be multithreaded and asynchronous to minimize the overhead spent waiting.
The problem is that the os.wait* functions raise OSError when no child process is running.
So, here is my code:
print("Waiting")
try:
(pid, exit_status) = os.waitpid(-1, 0)
exit_status >>= 8
except OSError:
print("Got error")
#when wait is called, and there is no child processes
#OSError is being raised
import time
time.sleep(1)
continue
I wonder if it is possible to eliminate the sleep() call.
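One way to avoid both the polling sleep and the OSError dance (a sketch of my own, not from the original thread; it assumes Python 3.3+ on Unix) is to block SIGCHLD and sleep in signal.sigwait() until a child actually changes state:

import os
import signal
import threading

# Block SIGCHLD before any worker threads start; threads created afterwards
# inherit the mask, so the signal stays pending until sigwait() collects it.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})

def reaper():
    while True:
        signal.sigwait({signal.SIGCHLD})  # sleeps until some child exits
        # Several exits can coalesce into one pending SIGCHLD, so reap
        # in a loop with WNOHANG.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:  # no children exist at all
                break
            if pid == 0:               # children exist, but none have exited yet
                break
            print("child %d exited with status %d" % (pid, status >> 8))

threading.Thread(target=reaper, daemon=True).start()

The reaper thread then spends its idle time blocked in sigwait() rather than in a sleep/retry loop.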

Related

signal handler hangs in Popen.wait(timeout) if an infinite wait() was started already

I have come across a Python subprocess issue, replicated on Python 3.6 and 3.7, that I do not understand. I have a program, call it Main, that launches an external process using subprocess.Popen(), call it "Slave". The Main program registers a SIGTERM signal handler. Main waits on the Slave process to complete using either proc.wait(None) or proc.wait(timeout). The Slave process can be interrupted by sending a SIGTERM signal to Main. The SIGTERM handler sends a SIGINT signal to the Slave and calls wait(30) for it to terminate. If Main is using wait(None), the SIGTERM handler's wait(30) waits the full 30 seconds even though the Slave process has terminated. If Main is using the wait(timeout) version, the SIGTERM handler's wait(30) returns as soon as the Slave terminates.
Here is a small test app that demonstrates the issue. Run it via python wait_test.py to use the non-timeout wait(None). Run it via python wait_test.py <timeout value> to provide a specific timeout to the Main wait.
Once the program is running, execute kill -15 <pid> and see how the app reacts.
#
# Save this to a file called wait_test.py
#
import signal
import subprocess
import sys
from datetime import datetime

slave_proc = None

def sigterm_handler(signum, stack):
    print("Process received SIGTERM signal {} while processing job!".format(signum))
    print("slave_proc is {}".format(slave_proc))
    if slave_proc is not None:
        try:
            print("{}: Sending SIGINT to slave.".format(datetime.now()))
            slave_proc.send_signal(signal.SIGINT)
            slave_proc.wait(30)
            print("{}: Handler wait completed.".format(datetime.now()))
        except subprocess.TimeoutExpired:
            slave_proc.terminate()
        except Exception as exception:
            print('Sigterm Exception: {}'.format(exception))
            slave_proc.terminate()
            slave_proc.send_signal(signal.SIGKILL)

def main(wait_val=None):
    with open("stdout.txt", 'w+') as stdout:
        with open("stderr.txt", 'w+') as stderr:
            proc = subprocess.Popen(["python", "wait_test.py", "slave"],
                                    stdout=stdout,
                                    stderr=stderr,
                                    universal_newlines=True)

            print('Slave Started')

            global slave_proc
            slave_proc = proc

            try:
                proc.wait(wait_val)  # If this is a no-timeout wait, i.e. wait(None), it will hang in sigterm_handler.
                print('Slave Finished by itself.')
            except subprocess.TimeoutExpired as te:
                print(te)
                print('Slave finished by timeout')
                proc.send_signal(signal.SIGINT)
                proc.wait()

    print("Job completed")

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'slave':
        while True:
            pass

    signal.signal(signal.SIGTERM, sigterm_handler)
    main(int(sys.argv[1]) if len(sys.argv) > 1 else None)
    print("{}: Exiting main.".format(datetime.now()))
Here is an example of the two runs:
Note the 30-second delay here
--------------------------------
[mkurtz@localhost testing]$ python wait_test.py
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7f79b50e8d90>
2022-03-30 11:08:15.526319: Sending SIGINT to slave. <--- 11:08:15
Slave Finished by itself.
Job completed
2022-03-30 11:08:45.526942: Exiting main. <--- 11:08:45
Note the instantaneous shutdown here
-------------------------------------
[mkurtz@localhost testing]$ python wait_test.py 100
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7fa2412a2dd0>
2022-03-30 11:10:03.649931: Sending SIGINT to slave. <--- 11:10:03.649
2022-03-30 11:10:03.653170: Handler wait completed. <--- 11:10:03.653
Slave Finished by itself.
Job completed
2022-03-30 11:10:03.673234: Exiting main. <--- 11:10:03.673
These specific tests were run using Python 3.7.9 on CentOS 7.
Can someone explain this behavior?
The Popen class has an internal lock for wait operations:
# Held while anything is calling waitpid before returncode has been
# updated to prevent clobbering returncode if wait() or poll() are
# called from multiple threads at once. After acquiring the lock,
# code must re-check self.returncode to see if another thread just
# finished a waitpid() call.
self._waitpid_lock = threading.Lock()
The major difference between wait() and wait(timeout=...) is that the former waits indefinitely while holding the lock, whereas the latter is a busy loop that releases the lock on each iteration.
if timeout is not None:
    ...
    while True:
        if self._waitpid_lock.acquire(False):
            try:
                ...
                # wait without any delay
                (pid, sts) = self._try_wait(os.WNOHANG)
                ...
            finally:
                self._waitpid_lock.release()
        ...
        time.sleep(delay)
else:
    while self.returncode is None:
        with self._waitpid_lock:  # acquire lock unconditionally
            ...
            # wait indefinitely
            (pid, sts) = self._try_wait(0)
This isn't a problem for regular concurrent code – i.e. threading – since the thread running wait() and holding the lock will be woken up as soon as the subprocess finishes. This in turn allows all other threads waiting on the lock/subprocess to proceed promptly.
However, things look different when a) the main thread holds the lock in wait() and b) a signal handler attempts to wait. A subtle point of signal handlers is that they interrupt the main thread:
signal: Signals and Threads
Python signal handlers are always executed in the main Python thread of the main interpreter, even if the signal was received in another thread. […]
Since the signal handler runs in the main thread, the main thread's regular code execution is paused until the signal handler finishes!
By running wait in the signal handler, a) the signal handler blocks waiting for the lock and b) the lock holder blocks waiting for the signal handler. Only once the signal handler's wait times out does the "main thread" resume, receive confirmation that the subprocess finished, set the return code, and release the lock.
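A workaround sketch (my own naming, not from the question): if every wait goes through a timeout, the main thread only ever takes the lock-releasing code path shown above, so a wait() issued from the signal handler can complete as soon as the subprocess does:

import subprocess
import time

def wait_interruptible(proc, timeout=None):
    # Wait in short slices: each slice uses Popen.wait(timeout=...), i.e. the
    # polling branch that releases _waitpid_lock on every iteration, instead
    # of the indefinite branch that holds it across the whole wait.
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return proc.wait(timeout=1)
        except subprocess.TimeoutExpired:
            if deadline is not None and time.monotonic() >= deadline:
                raise

Calling wait_interruptible(proc) in main() in place of proc.wait(None) should then behave like the wait(100) run above.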

Immediately raise exceptions in concurrent.futures

I run several threads concurrently using concurrent.futures. All of them must complete successfully for the next steps in the code to succeed.
While I can surface any exceptions at the end by calling .result() on each future, ideally an exception raised in a single thread would immediately stop all threads. This would help identify bugs in any task sooner, rather than waiting for all the long-running tasks to complete.
Is this possible?
It's possible to exit after the first exception and not submit any new jobs to the Executor. However, once a job has been submitted it can't be cancelled; you have to wait for all submitted jobs to finish (or time out). See this question for details. Here's a short example that cancels any unsubmitted jobs once the first exception occurs. It still waits for the already-submitted jobs to finish. This uses the FIRST_EXCEPTION constant listed in the concurrent.futures docs.
import time
import concurrent.futures

def example(i):
    print(i)
    assert i != 1
    time.sleep(i)
    return i

if __name__ == "__main__":
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for number in range(5):
            futures.append(executor.submit(example, number))
        # wait() returns (done, not_done); check the done futures for errors.
        done, not_done = concurrent.futures.wait(futures, return_when="FIRST_EXCEPTION")
        for future in done:
            try:
                future.result()
            except Exception as e:
                for f in futures:
                    print(f.cancel())  # cancel all unstarted futures
                raise e
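As a side note, on Python 3.9+ Executor.shutdown() accepts a cancel_futures flag that performs the same mass-cancel in a single call; a sketch under that assumption:

import concurrent.futures

def example(i):
    assert i != 1
    return i

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(example, n) for n in range(5)]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_EXCEPTION)
        for future in done:
            exc = future.exception()
            if exc is not None:
                # Cancels every future that hasn't started running yet;
                # already-running ones still finish, as described above.
                executor.shutdown(wait=False, cancel_futures=True)
                raise exc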

How to gracefully shut down coroutines with Ctrl+C?

I'm writing a spider to crawl web pages. I know asyncio may be my best choice, so I use coroutines to process the work asynchronously. Now I'm scratching my head over how to quit the program with a keyboard interrupt. The program shuts down cleanly once all the work is done. The source code runs on Python 3.5 and is attached below.
import asyncio
import aiohttp
from contextlib import suppress

class Spider(object):
    def __init__(self):
        self.max_tasks = 2
        self.task_queue = asyncio.Queue(self.max_tasks)
        self.loop = asyncio.get_event_loop()
        self.counter = 1

    def close(self):
        for w in self.workers:
            w.cancel()

    async def fetch(self, url):
        try:
            async with aiohttp.ClientSession(loop=self.loop) as self.session:
                with aiohttp.Timeout(30, loop=self.session.loop):
                    async with self.session.get(url) as resp:
                        print('get response from url: %s' % url)
        except:
            pass
        finally:
            pass

    async def work(self):
        while True:
            url = await self.task_queue.get()
            await self.fetch(url)
            self.task_queue.task_done()

    def assign_work(self):
        print('[*]assigning work...')
        url = 'https://www.python.org/'
        if self.counter > 10:
            return 'done'
        for _ in range(self.max_tasks):
            self.counter += 1
            self.task_queue.put_nowait(url)

    async def crawl(self):
        self.workers = [self.loop.create_task(self.work()) for _ in range(self.max_tasks)]
        while True:
            if self.assign_work() == 'done':
                break
            await self.task_queue.join()
        self.close()

def main():
    loop = asyncio.get_event_loop()
    spider = Spider()
    try:
        loop.run_until_complete(spider.crawl())
    except KeyboardInterrupt:
        print('Interrupt from keyboard')
        spider.close()
        pending = asyncio.Task.all_tasks()
        for w in pending:
            w.cancel()
            with suppress(asyncio.CancelledError):
                loop.run_until_complete(w)
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()

if __name__ == '__main__':
    main()
But if I press Ctrl+C while it's running, strange errors may occur. I mean, sometimes the program shuts down gracefully on Ctrl+C with no error message. However, in some cases the program keeps running after Ctrl+C and won't stop until all the work is done. If I press Ctrl+C at that point, 'Task was destroyed but it is pending!' appears.
I have read some topics about asyncio and added some code in main() to close the coroutines gracefully, but it doesn't work. Has anyone else had similar problems?
I bet the problem happens here:
except:
    pass
You should never do that, and your situation is one more example of what can happen otherwise.
When you cancel a task and await its cancellation, asyncio.CancelledError is raised inside the task and shouldn't be suppressed anywhere inside it. The line where you await your task's cancellation should raise this exception; otherwise the task will continue execution.
That's why you do
task.cancel()
with suppress(asyncio.CancelledError):
    loop.run_until_complete(task)  # this line should raise CancelledError,
                                   # otherwise the task will continue
to actually cancel the task.
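Applied to the spider above, fetch() must let CancelledError escape. A sketch of a safer fetch() (rewritten as a free coroutine against a current aiohttp; the original used the long-removed aiohttp.Timeout helper). Note that on Python versions before 3.8, asyncio.CancelledError inherits from Exception, so even except Exception: would swallow it:

import asyncio
import aiohttp

async def fetch(url):
    try:
        timeout = aiohttp.ClientTimeout(total=30)  # aiohttp 3.x timeout API
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as resp:
                print('got response from url: %s' % url)
    except asyncio.CancelledError:
        raise  # never swallow cancellation, or task.cancel() can't take effect
    except aiohttp.ClientError as e:
        print('request to %s failed: %s' % (url, e))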
Upd:

But I still hardly understand why the original code sometimes manages to quit cleanly on Ctrl+C.

It depends on the state of your tasks:

If at the moment you press Ctrl+C all tasks are done, none of them will raise CancelledError when awaited and your code will finish normally.

If at the moment you press Ctrl+C some tasks are pending but close to finishing, your code will block briefly on task cancellation and finish shortly after the tasks do.

If at the moment you press Ctrl+C some tasks are pending and far from finished, your code will get stuck trying to cancel these tasks (which can't be done). Another Ctrl+C will interrupt the cancellation process, but the tasks won't be cancelled or finished, and you'll get the warning 'Task was destroyed but it is pending!'.
I assume you are using some flavor of Unix; if this is not the case, my comments might not apply to your situation.
Pressing Ctrl-C in a terminal sends SIGINT to all processes associated with that tty. A Python process catches this Unix signal and translates it into a KeyboardInterrupt exception. In a threaded application (I'm not sure if the async machinery internally uses threads, but it very much sounds like it does), typically only one thread (the main thread) receives this signal and reacts in this fashion. If it is not prepared for this situation, it will terminate due to the exception.
Then the threading machinery will wait for the still-running fellow threads to terminate before the Unix process as a whole exits with an exit code. This can take quite a long time. See this question about killing fellow threads and why this isn't possible in general.
What you want to do, I assume, is kill your process immediately, killing all threads in one step.
The easiest way to achieve this is to press Ctrl-\. This sends SIGQUIT instead of SIGINT, which typically also affects the fellow threads and causes them to terminate.
If this is not enough (because for whatever reason you need to react properly on Ctrl-C), you can send yourself a signal:
import os, signal
os.kill(os.getpid(), signal.SIGQUIT)
This should terminate all running threads unless they specifically catch SIGQUIT, in which case you can still use SIGKILL to perform a hard kill on them. That doesn't give them any chance to react, though, and might lead to problems.

Can't catch SIGINT in multithreaded program

I've seen many topics about this particular problem, but I still can't figure out why I'm not catching a SIGINT in my main thread.
Here is my code:
def connect(self, retry=100):
    tries = retry
    logging.info('connecting to %s' % self.path)
    while True:
        try:
            self.sp = serial.Serial(self.path, 115200)
            self.pileMessage = pilemessage.Pilemessage()
            self.pileData = pilemessage.Pilemessage()
            self.reception = reception.Reception(self.sp, self.pileMessage, self.pileData)
            self.reception.start()
            self.collisionlistener = collisionListener.CollisionListener(self)
            self.message = messageThread.Message(self.pileMessage, self.collisionlistener)
            self.datastreaminglistener = dataStreamingListener.DataStreamingListener(self)
            self.datastreaming = dataStreaming.Data(self.pileData, self.datastreaminglistener)
            return
        except serial.serialutil.SerialException:
            logging.info('retrying')
            if not retry:
                raise SpheroError('failed to connect after %d tries' % (tries - retry))
            retry -= 1

def disconnect(self):
    self.reception.stop()
    self.message.stop()
    self.datastreaming.stop()
    while not self.pileData.isEmpty():
        self.pileData.pop()
    self.datastreaminglistener.remove()
    while not self.pileMessage.isEmpty():
        self.pileMessage.pop()
    self.collisionlistener.remove()
    self.sp.close()

if __name__ == '__main__':
    import time
    try:
        logging.getLogger().setLevel(logging.DEBUG)
        s = Sphero("/dev/rfcomm0")
        s.connect()
        s.set_motion_timeout(65525)
        s.set_rgb(0, 255, 0)
        s.set_back_led_output(255)
        s.configure_locator(0, 0)
    except KeyboardInterrupt:
        s.disconnect()
In the main function I call connect(), which launches threads over which I don't have direct control.
When I launch this script, I would like to be able to stop it by hitting Ctrl+C, calling the disconnect() function, which stops all the other threads.
In the code I provided it doesn't work because there is no thread in the main function. I already tried putting all the instructions from main() in a thread with a while loop, without success.
Is there a simple way to solve my problem?
Thanks
Your indentation is messed up, but there's enough to go on.
Your main thread isn't catching SIGINT because it's not alive. There is nothing that stops your main thread from continuing past the try block, seeing no more code, and closing up shop.
I am not familiar with Sphero. I just attempted to google its docs and was linked to a bunch of 404 pages, so I'll tell you what you would normally do in a threaded environment: join your threads to the main thread so that the main thread can't finish execution before the worker threads do.
for t in my_thread_list:
    t.join()  # main thread can't get past here until all the threads finish
If your Sphero object doesn't provide join-like functionality, you could hack something in that blocks, e.g.:
raw_input('Press Enter to disconnect')
s.disconnect()
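Putting that together, a sketch of the main block (Sphero, connect(), and disconnect() are the question's own names; the sleep loop is a stand-in for join-like functionality):

import time

if __name__ == '__main__':
    s = Sphero("/dev/rfcomm0")
    s.connect()
    try:
        # Keep the main thread alive so it is around to catch the
        # KeyboardInterrupt raised by Ctrl+C.
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        s.disconnect()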

How to kill a child thread with Ctrl+C?

I would like to stop the execution of a process with Ctrl+C in Python. But I have read somewhere that KeyboardInterrupt exceptions are only raised in the main thread. I have also read that the main thread is blocked while the child thread executes. So how can I kill the child thread?
For instance, Ctrl+C has no effect with the following code:
import sys
import threading

def main():
    try:
        thread = threading.Thread(target=f)
        thread.start()  # thread is totally blocking (e.g. while True)
        thread.join()
    except KeyboardInterrupt:
        print("Ctrl+C pressed...")
        sys.exit(1)

def f():
    while True:
        pass  # do the actual work
If you want the main thread to receive the Ctrl+C signal while joining, you can do so by adding a timeout to the join() call.
The following seems to work (don't forget to add daemon=True if you want main to actually end):
thread1.start()
while True:
    thread1.join(600)
    if not thread1.is_alive():
        break
The problem there is that you are using thread1.join(), which causes your program to wait until that thread finishes before continuing.
The signals will always be caught by the main process, because it's the one that receives the signals; it's the process that has threads.
Doing it as you show, you are basically running a 'normal' application, without thread features, as you start one thread and wait until it finishes before continuing.
KeyboardInterrupt exceptions are raised only in the main thread of each process. But the method Thread.join blocks the calling thread, and that includes delivery of KeyboardInterrupt exceptions. That is why Ctrl+C seems to have no effect.
A simple solution to your problem is to give the Thread.join call a timeout so it stops blocking KeyboardInterrupt exceptions, and to make the child thread daemonic so the parent thread kills it at exit (non-daemonic child threads are not killed but joined by their parent at exit):
import sys
import threading

def main():
    try:
        thread = threading.Thread(target=f)
        thread.daemon = True  # let the parent kill the child thread at exit
        thread.start()
        while thread.is_alive():
            thread.join(1)  # time out not to block KeyboardInterrupt
    except KeyboardInterrupt:
        print("Ctrl+C pressed...")
        sys.exit(1)

def f():
    while True:
        pass  # do the actual work
A better solution if you control the code of the child thread is to notify the child thread to exit gracefully (instead of abruptly like with the simple solution), for instance using a threading.Event:
import sys
import threading

def main():
    try:
        event = threading.Event()
        thread = threading.Thread(target=f, args=(event,))
        thread.start()
        event.wait()  # wait without blocking KeyboardInterrupt
    except KeyboardInterrupt:
        print("Ctrl+C pressed...")
        event.set()  # notify the child thread to exit
        sys.exit(1)

def f(event):
    while not event.is_set():
        pass  # do the actual work
