I'm trying to make something like a supervisor for my Python daemon process and found out that the same code works in python2 but doesn't work in python3.
Generally, I've come to this minimal example code.
daemon.py
#!/usr/bin/env python
import signal
import sys
import os
def stop(*args, **kwargs):
    print('daemon exited', os.getpid())
    sys.exit(0)

signal.signal(signal.SIGTERM, stop)
print('daemon started', os.getpid())

while True:
    pass
supervisor.py
import os
import signal
import subprocess
from time import sleep
parent_pid = os.getpid()
commands = [
    [
        './daemon.py'
    ]
]
popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)

def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)
        while True:
            popen_return_code = popen.poll()
            if popen_return_code is not None:
                break
            sleep(5)

signal.signal(signal.SIGTERM, stop_workers)

for popen in popen_list:
    print('wait_main', popen.wait())
If you run supervisor.py and then call kill -15 on its pid, it will hang in an infinite loop, because popen_return_code never becomes anything other than None. I discovered that this is basically because of the threading.Lock added around the waitpid operation (source), but how can I rewrite the code so that it handles the child's exit correctly?
This is an interesting case.
I've spent a few hours trying to figure out why this happens, and the only thing I've come up with so far is that the implementation of wait() and poll() changed between python2.7 and python3.
Looking into the source code of the python3 subprocess.py implementation, we can see that a lock is acquired when you call the wait() method of a Popen object; see
https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1402.
This lock prevents further poll() calls from working as expected until the lock acquired by wait() is released; see
https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1355
and the comment there:
Something else is busy calling waitpid. Don't allow two
at once. We know nothing yet.
There is no such lock in python2.7/subprocess.py, so this looks like the reason why it works in python2.7 and doesn't work in python3.
However, I don't see a reason why you are trying to poll() inside the signal handler. Try rewriting your supervisor.py as follows; this should work as expected on both python3 and python2.7.
supervisor.py
import os
import signal
import subprocess
from time import sleep
parent_pid = os.getpid()
commands = [
    [
        './daemon.py'
    ]
]
popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)

def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)

signal.signal(signal.SIGTERM, stop_workers)

for popen in popen_list:
    print('wait_main', popen.wait())
Hope this helps
Generally, I agree with the answer from @risboo6909, but I also have some thoughts on how to fix this situation.
You can change subprocess.Popen to psutil.Popen.
In the main loop, instead of popen.wait(), you can just run an infinite loop, because the process will exit in the signal handler.
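Here is a minimal sketch of that idea, assuming psutil is installed; the exit handling inside the handler is my own addition, filling in the details the suggestion leaves open:

import os
import sys
import signal
from time import sleep

import psutil  # third-party: pip install psutil

popen_list = [psutil.Popen(['./daemon.py'], preexec_fn=os.setsid)]

def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)
        popen.wait()  # reap the child; nothing else is holding a wait() here
    sys.exit(0)

signal.signal(signal.SIGTERM, stop_workers)

# instead of popen.wait(), the main loop just idles; the handler above exits the supervisor
while True:
    sleep(1)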
Related
Right now, I'm using subprocess to run a long-running job in the background. For multiple reasons (PyInstaller + AWS CLI) I can't use subprocess anymore.
Is there an easy way to achieve the same thing as below? Running a long-running python function in a multiprocessing pool (or something else) and doing real-time processing of stdout/stderr?
import subprocess
process = subprocess.Popen(
    ["python", "long-job.py"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=True,
)

while True:
    out = process.stdout.read(2000).decode()
    if not out:
        err = process.stderr.read().decode()
    else:
        err = ""
    if (out == "" or err == "") and process.poll() is not None:
        break
    live_stdout_process(out)
Thanks
Getting this to work cross-platform is messy. First of all, the Windows implementation of non-blocking pipes is not user-friendly or portable.
One option is to just have your application read its command line arguments and conditionally execute a file; you get to keep using subprocess, since you will be launching yourself with a different argument (sketched below).
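A sketch of that option (the --run-job flag and long_job() below are hypothetical names, not part of the original question):

import sys
import subprocess

def long_job():
    print("pretend this is the long-running work")

if __name__ == "__main__":
    if "--run-job" in sys.argv:
        long_job()          # child mode: do the work and exit
        sys.exit(0)

    # parent mode: relaunch ourselves with the flag. Under PyInstaller,
    # sys.executable is the frozen app itself; as a plain script we also
    # need to pass the script path.
    cmd = ([sys.executable, "--run-job"] if getattr(sys, "frozen", False)
           else [sys.executable, sys.argv[0], "--run-job"])
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)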
But to keep it to multiprocessing:
the output must be logged to queues instead of pipes.
you need the child to execute a python file; this can be done using runpy to execute the file as __main__.
that runpy call should run under a multiprocessing child, and this child must first redirect its stdout and stderr in the initializer.
when an error happens, your main application must catch it, but if it is too busy reading the output it won't be able to wait for the error, so a child thread has to start the multiprocessing pool and wait for the error.
the main process has to create the queues, launch the child thread, and read the output.
putting it all together:
import multiprocessing
from multiprocessing import Queue
import sys
import concurrent.futures
import threading
import traceback
import runpy
import time
class StdoutQueueWrapper:
    def __init__(self, queue: Queue):
        self._queue = queue

    def write(self, text):
        self._queue.put(text)

    def flush(self):
        pass


def function_to_run():
    # runpy.run_path("long-job.py", run_name="__main__")  # run long-job.py
    print("hello")  # print something
    raise ValueError  # error out


def initializer(stdout_queue: Queue, stderr_queue: Queue):
    sys.stdout = StdoutQueueWrapper(stdout_queue)
    sys.stderr = StdoutQueueWrapper(stderr_queue)


def thread_function(child_stdout_queue, child_stderr_queue):
    with concurrent.futures.ProcessPoolExecutor(
            1, initializer=initializer,
            initargs=(child_stdout_queue, child_stderr_queue)) as pool:
        result = pool.submit(function_to_run)
        try:
            result.result()
        except Exception as e:
            child_stderr_queue.put(traceback.format_exc())


if __name__ == "__main__":
    child_stdout_queue = multiprocessing.Queue()
    child_stderr_queue = multiprocessing.Queue()

    child_thread = threading.Thread(
        target=thread_function,
        args=(child_stdout_queue, child_stderr_queue),
        daemon=True)
    child_thread.start()

    while True:
        while not child_stdout_queue.empty():
            var = child_stdout_queue.get()
            print(var, end='')
        while not child_stderr_queue.empty():
            var = child_stderr_queue.get()
            print(var, end='')
        if not child_thread.is_alive():
            break
        time.sleep(0.01)  # check output every 0.01 seconds
Note that a direct consequence of running the job as a multiprocessing child is that if the child runs into a segmentation fault or some other unrecoverable error, the parent will also die; hence running yourself under subprocess might be the better option if segfaults are expected.
Is there any argument or options to setup a timeout for Python's subprocess.Popen method?
Something like this:
subprocess.Popen(['..'], ..., timeout=20) ?
I would advise taking a look at the Timer class in the threading module. I used it to implement a timeout for a Popen.
First, create a callback:
def timeout( p ):
    if p.poll() is None:
        print 'Error: process taking too long to complete--terminating'
        p.kill()
Then open the process:
proc = Popen( ... )
Then create a timer that will call the callback, passing the process to it.
t = threading.Timer( 10.0, timeout, [proc] )
t.start()
t.join()
Somewhere later in the program, you may want to add the line:
t.cancel()
Otherwise, the python program will keep running until the timer has finished running.
EDIT: I was advised that there is a race condition: the subprocess p may terminate between the p.poll() and p.kill() calls. I believe the following code can fix that:
import errno
def timeout( p ):
    if p.poll() is None:
        try:
            p.kill()
            print 'Error: process taking too long to complete--terminating'
        except OSError as e:
            if e.errno != errno.ESRCH:
                raise
Though you may want to clean up the exception handling so that it handles only the particular exception that occurs when the subprocess has already terminated normally.
subprocess.Popen doesn't block so you can do something like this:
import time
import subprocess

p = subprocess.Popen(['...'])
time.sleep(20)
if p.poll() is None:
    p.kill()
    print 'timed out'
else:
    print p.communicate()
It has a drawback in that you must always wait at least 20 seconds for it to finish.
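A slight variant (my own sketch, not part of the original answer) avoids that by polling in small increments instead of sleeping the full 20 seconds:

import time
import subprocess

p = subprocess.Popen(['...'])
deadline = time.time() + 20
while p.poll() is None and time.time() < deadline:
    time.sleep(0.5)          # check twice a second instead of sleeping 20 seconds
if p.poll() is None:
    p.kill()
    print 'timed out'
else:
    print p.communicate()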
import subprocess, threading
class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)
The output of this should be:
Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15
where it can be seen that, in the first execution, the process finished correctly (return code 0), while in the second one the process was terminated (return code -15).
I haven't tested on Windows, but aside from updating the example command, I think it should work, since I haven't found anything in the documentation saying that thread.join or process.terminate is not supported.
You could do
from twisted.internet import reactor, protocol, error, defer
class DyingProcessProtocol(protocol.ProcessProtocol):
    def __init__(self, timeout):
        self.timeout = timeout

    def connectionMade(self):
        @defer.inlineCallbacks
        def killIfAlive():
            try:
                yield self.transport.signalProcess('KILL')
            except error.ProcessExitedAlready:
                pass

        d = reactor.callLater(self.timeout, killIfAlive)
reactor.spawnProcess(DyingProcessProtocol(20), ...)
using Twisted's asynchronous process API.
A python subprocess auto-timeout is not built in, so you're going to have to build your own.
This works for me on Ubuntu 12.10 running python 2.7.3
Put this in a file called test.py
#!/usr/bin/python
import subprocess
import threading
class RunMyCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = subprocess.Popen(self.cmd)
        self.p.wait()

    def run_the_process(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()  # if your process needs a kill -9 to make
                                # it go away, use self.p.kill() here instead.
            self.join()

RunMyCmd(["sleep", "20"], 3).run_the_process()
Save it, and run it:
python test.py
The sleep 20 command takes 20 seconds to complete. If it doesn't terminate in 3 seconds (it won't) then the process is terminated.
el@apollo:~$ python test.py
el@apollo:~$
There are three seconds between when the process is run and when it is terminated.
As of Python 3.3, there is also a timeout argument to the blocking helper functions in the subprocess module.
https://docs.python.org/3/library/subprocess.html
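For example, a quick sketch with Popen.wait(timeout=...), which raises subprocess.TimeoutExpired if the process is still running when the timeout elapses:

import subprocess

p = subprocess.Popen(['sleep', '60'])
try:
    p.wait(timeout=20)           # Python 3.3+
except subprocess.TimeoutExpired:
    p.kill()
    p.wait()                     # reap the killed process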
Unfortunately, there isn't such a solution. I managed to do this using a threaded timer that launched along with the process and killed it after the timeout, but I ran into some stale file descriptor issues because of zombie processes or some such.
No, there is no timeout. I guess what you are looking for is to kill the subprocess after some time. Since you are able to signal the subprocess, you should be able to kill it too.
A generic approach to sending a signal to a subprocess:
import os
import sys
import time
import signal
import subprocess

proc = subprocess.Popen([command])
time.sleep(1)
print 'signaling child'
sys.stdout.flush()
os.kill(proc.pid, signal.SIGUSR1)
You could use this mechanism to terminate the process after a timeout period.
Yes, https://pypi.python.org/pypi/python-subprocess2 extends Popen with two additional methods,
Popen.waitUpTo(timeout=seconds)
This will wait up to a certain number of seconds for the process to complete, otherwise return None.
also,
Popen.waitOrTerminate
This will wait up to a point, and then call .terminate(), then .kill(), one or the other, or some combination of both; see the docs for full details:
http://htmlpreview.github.io/?https://github.com/kata198/python-subprocess2/blob/master/doc/subprocess2.html
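A rough usage sketch based only on the description above (the import name and the return value semantics are assumptions; check the linked documentation):

from subprocess2 import Popen    # assumed import name for python-subprocess2

proc = Popen(['sleep', '60'])
if proc.waitUpTo(timeout=20) is None:   # described above: returns None if not finished
    proc.terminate()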
For Linux, you can use a signal. This is platform-dependent, so another solution is required for Windows. It may work on Mac, though.
def launch_cmd(cmd, timeout=0):
    '''Launch an external command

    It launches the program, redirecting the program's STDIO
    to a communication pipe, and appends those responses to
    a list. Waits for the program to exit, then returns the
    output lines.

    Args:
        cmd: command line of the external program to launch
        timeout: time to wait for the command to complete, 0 for indefinitely

    Returns:
        A list of the response lines from the program
    '''
    import subprocess
    import signal

    class Alarm(Exception):
        pass

    def alarm_handler(signum, frame):
        raise Alarm

    lines = []
    if not launch_cmd.init:
        launch_cmd.init = True
        signal.signal(signal.SIGALRM, alarm_handler)
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    signal.alarm(timeout)  # timeout sec
    try:
        for line in p.stdout:
            lines.append(line.rstrip())
        p.wait()
        signal.alarm(0)  # disable alarm
    except:
        print "launch_cmd taking too long!"
        p.kill()
    return lines

launch_cmd.init = False
I am using python 2.7, and a Python thread doesn't kill its process after the main program exits (I'm checking this with the ps -ax command on an Ubuntu machine).
I have the below thread class,
import os
import threading
class captureLogs(threading.Thread):
    '''
    initialize the constructor
    '''
    def __init__(self, deviceIp, fileTag):
        threading.Thread.__init__(self)
        super(captureLogs, self).__init__()
        self._stop = threading.Event()
        self.deviceIp = deviceIp
        self.fileTag = fileTag

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    '''
    define the run method
    '''
    def run(self):
        '''
        Make the thread capture logs
        '''
        cmdTorun = "adb logcat > " + self.deviceIp + '_' + self.fileTag + '.log'
        os.system(cmdTorun)
And I am creating a thread in another file sample.py,
import logCapture
import os
import time
c = logCapture.captureLogs('100.21.143.168','somefile')
c.setDaemon(True)
c.start()
print "Started the log capture. now sleeping. is this a dameon?", c.isDaemon()
time.sleep(5)
print "Sleep tiime is over"
c.stop()
print "Calling stop was successful:", c.stopped()
print "Thread is now completed and main program exiting"
I get the below output from the command line:
Started the log capture. now sleeping. is this a daemon? True
Sleep time is over
Calling stop was successful: True
Thread is now completed and main program exiting
And the sample.py exits.
But when I use the below command in a terminal,
ps -ax | grep "adb"
I still see the process running. (I am killing them manually now using the kill -9 17681 17682)
Not sure what I am missing here.
My question is,
1) why is the process still alive when I already killed it in my program?
2) Will it create any problem if I don't bother about it?
3) is there any other better way to capture logs using a thread and monitor the logs?
EDIT: As suggested by @bug Killer, I added the below method to my thread class,
def getProcessID(self):
    return os.getpid()
and used os.kill(c.getProcessID(), SIGTERM) in my sample.py. The program doesn't exit at all.
It is likely because you are using os.system in your thread. The spawned process from os.system will stay alive even after the thread is killed. Actually, it will stay alive forever unless you explicitly terminate it in your code or by hand (which it sounds like you are doing ultimately) or the spawned process exits on its own. You can do this instead:
import atexit
import subprocess
deviceIp = '100.21.143.168'
fileTag = 'somefile'
# this is spawned in the background, so no threading code is needed
cmdTorun = "adb logcat > " + deviceIp +'_'+fileTag+'.log'
proc = subprocess.Popen(cmdTorun, shell=True)
# or register proc.kill if you feel like living on the edge
atexit.register(proc.terminate)
# Here is where all the other awesome code goes
Since all you are doing is spawning a process, creating a thread to do it is overkill and only complicates your program logic. Just spawn the process in the background as shown above and then let atexit terminate it when your program exits. And/or call proc.terminate explicitly; it should be fine to call repeatedly (much like close on a file object) so having atexit call it again later shouldn't hurt anything.
I'm trying to write a small script which will use plink.exe (from the same folder) to create an SSH tunnel (on Windows).
I'm basically using os.system to launch the command:
import os
import time
import threading
from os.path import join, dirname, realpath

pc_tunnel_command = '-ssh -batch -pw xxxx -N -L 1234:host1:5678 user@host2'

if __name__ == '__main__':
    t = threading.Thread(target=os.system,
                         args=(join(dirname(realpath(__file__)), 'plink.exe ') + pc_tunnel_command,))
    t.daemon = True
    t.start()
    # without this line it will die. I guess that plink doesn't have enough time to start.
    time.sleep(5)
    print 'Should die now'
However, it seems that the thread (and plink.exe) keep running. Why is this happening? Any way to force the thread to close? Better way to launch plink?
I want plink.exe to die when my program ends. Using a daemon thread was my plan of having the tunnel run in the background, and then dying when my main code exits.
BTW - same thing happens with subprocess.call.
You can use the atexit and signal modules to register callbacks that will explicitly kill the process when your program exits normally or receives SIGTERM, respectively:
import sys
import time
import atexit
import signal
import subprocess
from functools import partial
from os.path import join, dirname, realpath
pc_tunnel_command = '-ssh -batch -pw xxxx -N -L 1234:host1:5678 user@host2'

def handle_exit(p, *args):
    print("killing it")
    p.terminate()
    sys.exit(0)

if __name__ == '__main__':
    p = subprocess.Popen(join(dirname(realpath(__file__)), 'plink.exe ') + pc_tunnel_command, shell=True)

    func = partial(handle_exit, p)
    signal.signal(signal.SIGTERM, func)
    atexit.register(func)

    print 'Should die now'
The one thing that is odd about the behavior you described is that I would have expected your program to exit after your sleep call but leave plink running in the background, rather than hang until the os.system call completes. That's the behavior I see on Linux, at least. In any case, explicitly terminating the child process should solve the issue for you.
os.system does not return until the child process exits. The same is true for subprocess.call. That's why your thread is sitting there, waiting for plink to finish. You can probably use subprocess.Popen to launch the process asynchronously and then exit. In any case, the additional thread you are creating is unnecessary.
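For example, a sketch of that approach (reusing the hypothetical command line from the question):

import subprocess
from os.path import join, dirname, realpath

pc_tunnel_command = '-ssh -batch -pw xxxx -N -L 1234:host1:5678 user@host2'

# Popen returns immediately; plink keeps running in the background
p = subprocess.Popen(join(dirname(realpath(__file__)), 'plink.exe ') + pc_tunnel_command, shell=True)

# ... do the rest of your work ...

p.terminate()   # tear down the tunnel explicitly when you are done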
Can I use Popen from Python's subprocess module to close a started process? For example, with Popen I run some application. At some point in my code, I have to close the app that was started.
For example, from console in Linux I do:
./some_bin
... It works and logs stdout here ...
Ctrl + C and it breaks
I need something like Ctrl + C but in my program code.
from subprocess import Popen

process = Popen(['slow', 'running', 'program'])
while process.poll() is None:          # poll() returns None while the process is still running
    if raw_input() == 'Kill':
        if process.poll() is None:
            process.kill()
kill() will kill a process. See more here: Python subprocess module
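If what you want is specifically the Ctrl+C behaviour rather than a hard kill, sending SIGINT is closer to what the terminal does (POSIX only; a small sketch):

import signal
from subprocess import Popen

process = Popen(['./some_bin'])
# ... later, when you want to stop it ...
process.send_signal(signal.SIGINT)   # the same signal Ctrl+C delivers in a terminal
process.wait()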
Use the subprocess module.
import subprocess
# all arguments must be passed one at a time inside a list
# they must all be string elements
arguments = ["sleep", "3600"] # first argument is the program's name
process = subprocess.Popen(arguments)
# do whatever you want
process.terminate()
Some time ago I needed a 'gentle' shutdown for a process by sending CTRL+C in Windows console.
Here's what I have:
import win32api
import win32con
import subprocess
import time
import shlex
cmdline = 'cmd.exe /k "timeout 60"'
args = shlex.split(cmdline)
myprocess = subprocess.Popen(args)
pid = myprocess.pid
print(myprocess, pid)
time.sleep(5)
win32api.GenerateConsoleCtrlEvent(win32con.CTRL_C_EVENT, pid)
# ^^^^^^^^^^^^^^^^^^^^ instead of myprocess.terminate()