I am using the python-daemon module to manage the daemon process of my Python script.
However, I am running into an error when running the script that I simply can't figure out, nor do I really know how to begin debugging it.
I have the code:
from daemon import runner

def run_application():
    # Do something here...
    pass

class App():
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = 'stdout.txt'
        self.stderr_path = 'stdlog.log'
        self.pidfile_path = 'filelock.pid'
        self.pidfile_timeout = 5

    def run(self):
        run_application()

app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()
When run, it always writes the following to stdlog.log:
Traceback (most recent call last):
File "MyApp.py", line 335, in <module>
daemon_runner.do_action()
File "/anaconda/lib/python2.7/site-packages/daemon/runner.py", line 189, in do_action
func(self)
File "/anaconda/lib/python2.7/site-packages/daemon/runner.py", line 124, in _start
self.daemon_context.open()
File "/anaconda/lib/python2.7/site-packages/daemon/daemon.py", line 346, in open
self.pidfile.__enter__()
File "/anaconda/lib/python2.7/site-packages/lockfile/__init__.py", line 229, in __enter__
self.acquire()
File "/anaconda/lib/python2.7/site-packages/daemon/pidfile.py", line 42, in acquire
super(TimeoutPIDLockFile, self).acquire(timeout, *args, **kwargs)
File "/anaconda/lib/python2.7/site-packages/lockfile/pidlockfile.py", line 88, in acquire
self.path)
lockfile.LockTimeout: Timeout waiting to acquire lock for /MyApp/filelock.pid
So it appears to be timing out when attempting to lock filelock.pid. I have no idea why this is. I have deleted filelock.pid, I've changed permissions; same error every time.
How can I begin to debug this? I'm at a loss.
I am using python-daemon version 1.6 (if it matters).
Update:
Following the advice here, I now see that there is already a process running. Now: how can I determine the PID of the running daemon process?
I agree with @ExploWare as far as how he demonstrates capturing those LockTimeout exceptions.
As for a way to debug and see what process is holding on to this lock, here is an external bit of code you can run...
import daemon.pidfile
import os
import lockfile
# We know the lockfile name.
pidfile = daemon.pidfile.PIDLockFile(
    os.path.join("/MyApp/", "filelock.pid"))
# This current process id...
os.getpid()
# 46337
So what process has acquired this lock if any?
pidfile.is_locked()
# True
pidfile.read_pid()
# 96856
When our PIDLockFile instance tries to "acquire",
pidfile.__dict__
# {'unique_name': '/MyApp/filelock.pid', 'lock_file': '/MyApp/filelock.pid.lock', 'hostname':
# 'MyMachine.local', 'pid': 46337, 'timeout': None, 'tname': '', 'path': '/MyApp/filelock.pid'}
pidfile.acquire()
#
# (Had to Control-C quit because I didn't set a timeout on PIDLockFile)
#
# ^CTraceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/michal/venf/lib/python2.7/site-packages/lockfile/pidlockfile.py", line 92, in acquire
# time.sleep(timeout is not None and timeout/10 or 0.1)
# KeyboardInterrupt
So instead, use @ExploWare's exception catching.
# Wait only 5 seconds.
pidfile.timeout = 5
try:
    pidfile.acquire()
except lockfile.LockTimeout:
    print 'locked . need to wait or move on.'
#
# locked . need to wait or move on.
I found a nice way to handle this exception, so maybe it's helpful for you as well:
Add
from lockfile import LockTimeout
to the beginning of the script and surround daemon_runner.do_action() like this:
try:
    daemon_runner.do_action()
except LockTimeout:
    print "Error: couldn't acquire lock"
    # you can exit here or try something else
Related
I'm facing a strange situation; I've searched on Google without any good results.
I'm running a Python script as a subprocess from a parent process with nohup, using the subprocess package:
import os
import subprocess
import sys

cmd = list()
cmd.append("nohup")
cmd.append(sys.executable)
cmd.append(os.path.abspath(script))
cmd.append(os.path.abspath(conf_path))

_env = os.environ.copy()
if env:
    _env.update({k: str(v) for k, v in env.items()})

p = subprocess.Popen(cmd, env=_env, cwd=os.getcwd())
After some time the parent process exits, while the subprocess (the one with the nohup) continues to run.
After another minute or two the process with the nohup exits and, for obvious reasons, becomes a zombie.
When running it on my local PC with Python 3.6 and Ubuntu 18.04, I can run the following code and everything works like a charm:
import psutil

comp_process = psutil.Process(pid)
if comp_process.status() == "zombie":
    comp_status_code = comp_process.wait(timeout=10)
As I said, everything works like a charm: the zombie process is removed and I get the status code of the mentioned process.
But for some reason, when doing the same in a Docker container with the same Python and Ubuntu versions, it fails after the timeout (it doesn't matter whether the timeout is 10 seconds or 10 minutes).
The error:
psutil.TimeoutExpired timeout after 60 seconds (pid=779)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 84, in wait_pid
    retpid, status = waitcall()
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 75, in waitcall
    return os.waitpid(pid, os.WNOHANG)
ChildProcessError: [Errno 10] No child processes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".py", line 41, in run
    comp_status_code = comp_process.wait(timeout=60)
  File "/usr/local/lib/python3.6/dist-packages/psutil/__init__.py", line 1383, in wait
    return self._proc.wait(timeout)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_pslinux.py", line 1517, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_pslinux.py", line 1725, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 96, in wait_pid
    delay = check_timeout(delay)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 68, in check_timeout
    raise TimeoutExpired(timeout, pid=pid, name=proc_name)
psutil.TimeoutExpired: psutil.TimeoutExpired timeout after 60 seconds (pid=779)
One possibility may be the lack of an init process to reap zombies. You can fix this by running with docker run --init, or using e.g. tini. See https://hynek.me/articles/docker-signals/
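If changing the container startup isn't an option, a rough pure-Python alternative (my own sketch, not taken from the linked article) is to make whatever process runs as PID 1 reap its children itself, for example with a SIGCHLD handler:
import os
import signal

# Sketch: assumes this Python process is PID 1 in the container.
# Reap any exited children so they do not linger as zombies.
def _reap_children(signum, frame):
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break   # no children at all
        if pid == 0:
            break   # children exist, but none have exited yet

signal.signal(signal.SIGCHLD, _reap_children)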
How can I stop a python process in such a way that any active context managers will gracefully call their __exit__ function before closing?
I use context managers (__enter__() and __exit__()) to reliably and safely close connections to optical hardware. This has been working great, although we are now starting to execute routines that run for hours. Often we will realize shortly after starting one that we have a bug, and would rather stop the process short.
I have been running code from PyCharm, which allows you to "stop" a running process. This seems to instantly kill the process, whether I'm in debug or run. The __exit__ functions don't seem to get called.
Also, the computer that controls the hardware runs Windows, if that somehow comes into play. ***
***Indeed it comes into play. macOS seems to call the exit function while Windows does not.
I decided to write a basic test:
from abc import *
import time

class Test(object):
    __metaclass__ = ABCMeta

    def __init__(self, *args, **kwargs):
        print("Init called.")

    def __enter__(self, *args, **kwargs):
        print("enter called")

    def __exit__(self, type, value, traceback):
        print("Exit called")

with Test() as t:
    time.sleep(100)
    print("Should never get here.")
I run this code from PyCharm, and while it is in the sleep statement I press the stop button in PyCharm. Here is the output on both platforms:
Macosx:
Init called.
enter called
Exit called
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/.../Library/Preferences/PyCharmCE2017.1/scratches/scratch_25.py", line 22, in <module>
time.sleep(100)
KeyboardInterrupt
Windows
Init called.
enter called
Process finished with exit code 1
I found a workaround on the PyCharm bug tracker:
https://youtrack.jetbrains.com/issue/PY-17252
In PyCharm, go to Help -> Edit Custom Properties
Agree to create the idea.properties file (if it asks)
Add the following line: kill.windows.processes.softly=true
If you use NumPy or SciPy, you will also need to set the following environment variable (see the sketch after these steps):
os.environ['FOR_DISABLE_CONSOLE_CTRL_HANDLER'] = "1"
Restart PyCharm
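For what it's worth, here is a minimal sketch of how I read that NumPy/SciPy step: the variable has to be in place before the scientific libraries are imported, because (my assumption, not stated in the bug tracker) their bundled Fortran runtime installs its own console Ctrl-C handler at import time:
import os

# Set before importing numpy/scipy (assumption: their runtime registers a
# console Ctrl-C handler on import, which this variable disables).
os.environ['FOR_DISABLE_CONSOLE_CTRL_HANDLER'] = "1"

import numpy  # imported only after the variable is set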
Now when I run my test with this applied (on Windows!) I get the following output:
Init called.
enter called
Exit called
Traceback (most recent call last):
File "C:/Users/.../.PyCharmCE2017.1/config/scratches/scratch_3.py", line 20, in <module>
time.sleep(100)
KeyboardInterrupt
Process finished with exit code 1
I wrote this function that handles the "rate limit error" of a Tweepy cursor in order to keep downloading from the Twitter APIs.
def limit_handled(cursor, user):
    over = False
    while True:
        try:
            if (over == True):
                print "Routine reactivated, serving user number:", user
                over = False
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Limit reached, routine paused"
            threading.Event.wait(15*60 + 15)
            over = True
        except tweepy.TweepError:
            print "TweepError"
            threading.Event.wait(5)
Since I am using several threads to connect, I would like to stop each one of them when a RateLimitError is raised and restart them after 15 minutes.
I previously used the function:
time.sleep(x)
But I understood that it doesn't work well for threads (the counter does not increase while the thread is not active), so I tried to use:
threading.Event.wait(x)
But then this error is raised:
Exception in thread Thread-15:
Traceback (most recent call last):
File "/home/xor/anaconda/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/home/xor/anaconda/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/xor/spyder/algo/HW2/hw2.py", line 117, in work
storeFollowersOnMDB(ids, api, k)
File "/home/xor/spyder/algo/HW2/hw2.py", line 111, in storeFollowersOnMDB
for followersPag in limit_handled(tweepy.Cursor(api.followers_ids, id = user, count=5000).pages(), user):
File "/home/xor/spyder/algo/HW2/hw2.py", line 52, in limit_handled
threading.Event.wait(15*60 + 15)
AttributeError: 'function' object has no attribute 'wait'
How can I "sleep/wait" my threads being sure that they will wake up at the right moment?
Try doing it like this instead:
import threading
dummy_event = threading.Event()
dummy_event.wait(timeout=1)
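As a sketch of how that instance-based wait could slot into the limit_handled generator from the question (pause_event is a name I'm introducing here; it is never set(), so wait() simply blocks the calling thread until the timeout expires):
import threading
import tweepy

pause_event = threading.Event()  # never set(); used only for its timed wait

def limit_handled(cursor, user):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Rate limit reached, pausing this thread"
            pause_event.wait(15 * 60 + 15)   # blocks only this thread
        except tweepy.TweepError:
            print "TweepError"
            pause_event.wait(5)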
Also, try Googling first next time: Issues with time.sleep and Multithreading in Python
I'm trying to execute ssh commands using paramiko from inside a python daemon process.
I'm using the following implementation for the daemon: https://pypi.python.org/pypi/python-daemon/
When the program is started, PyCrypto raises an IOError with "Bad file descriptor" as soon as paramiko tries to connect.
If I remove the daemon code (just uncomment the last line and comment out the two above it), the SSH connection is established as expected.
The code for a short test program looks like this:
#!/usr/bin/env python2
from daemon import runner
import paramiko

class App():
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/testdaemon.pid'
        self.pidfile_timeout = 5

    def run(self):
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.load_system_host_keys()
        ssh.connect("hostname", username="username")
        ssh.close()

app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()
#app.run()
The trace looks like this:
Traceback (most recent call last):
File "./daemon-test.py", line 31, in <module>
daemon_runner.do_action()
File "/usr/lib/python2.7/site-packages/daemon/runner.py", line 189, in do_action
func(self)
File "/usr/lib/python2.7/site-packages/daemon/runner.py", line 134, in _start
self.app.run()
File "./daemon-test.py", line 22, in run
ssh.connect("hostname", username="username")
File "/usr/lib/python2.7/site-packages/paramiko/client.py", line 311, in connect
t.start_client()
File "/usr/lib/python2.7/site-packages/paramiko/transport.py", line 460, in start_client
Random.atfork()
File "/usr/lib/python2.7/site-packages/Crypto/Random/__init__.py", line 37, in atfork
_UserFriendlyRNG.reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 224, in reinit
_get_singleton().reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 171, in reinit
return _UserFriendlyRNG.reinit(self)
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 99, in reinit
self._ec.reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 62, in reinit
block = self._osrng.read(32*32)
File "/usr/lib/python2.7/site-packages/Crypto/Random/OSRNG/rng_base.py", line 76, in read
data = self._read(N)
File "/usr/lib/python2.7/site-packages/Crypto/Random/OSRNG/posix.py", line 65, in _read
d = self.__file.read(N - len(data))
IOError: [Errno 9] Bad file descriptor
I'm guessing this has something to do with the stream redirection when the daemon spawns. I've tried to set them all to /dev/tty or even to a normal file but nothing works.
When I run the program with strace I can see that something tries to close a file twice and that's when I get the error. But I couldn't find out which file the descriptor actually points to (strace shows a memory location that doesn't seem to be set anywhere).
This is a known issue that I am actually experiencing myself (which is what led me to this question). Basically, it has to do with the definition of a UNIX daemon process and the way paramiko implements its random number generator (RNG).
If you refer to PEP 3143 - Standard daemon process library, the first step in becoming a correct daemon is to "close all open file descriptors." Unfortunately, this closes the file descriptor to /dev/urandom which is used in the Crypto module's RNG which is in turn used by paramiko.
There are some workarounds for the moment, but the author has indicated that he doesn't currently have time to pursue this bug (although the last post in the first link is by the author and is 8 days old as of this writing).
Daemonizing after importing paramiko breaks the random number generator
EAGAIN on file when using RNG after daemon fork
In summary, if you import paramiko after your process becomes a daemon, then it should work as desired because the file descriptor will have been opened after the daemonizing closes all file descriptors.
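A minimal sketch of that deferred-import approach, assuming the same App class as in the question (only run() changes, and the module-level import paramiko is removed):
#!/usr/bin/env python2
from daemon import runner
# Note: no top-level "import paramiko" here.

class App():
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/testdaemon.pid'
        self.pidfile_timeout = 5

    def run(self):
        # Imported only after DaemonRunner has daemonized, so PyCrypto opens
        # /dev/urandom on a file descriptor that is still valid.
        import paramiko
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.load_system_host_keys()
        ssh.connect("hostname", username="username")
        ssh.close()

app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()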
The user @xraj also had a hackish, yet clever workaround for finding and preserving the file descriptor to /dev/urandom when daemonizing (first link above):
import os
from resource import getrlimit, RLIMIT_NOFILE

def files_preserve_by_path(*paths):
    wanted = []
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            wanted.append(os.fstat(fd)[1:3])
        finally:
            os.close(fd)

    def fd_wanted(fd):
        try:
            return os.fstat(fd)[1:3] in wanted
        except OSError:
            return False

    fd_max = getrlimit(RLIMIT_NOFILE)[1]
    return [fd for fd in xrange(fd_max) if fd_wanted(fd)]

daemon_context.files_preserve = files_preserve_by_path('/dev/urandom')
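As a hedged usage note: the runner used in the question exposes its DaemonContext as daemon_runner.daemon_context (visible in the traceback above), so the helper's result can be assigned there before do_action() is called:
app = App()
daemon_runner = runner.DaemonRunner(app)
# Keep the descriptor(s) backing /dev/urandom open across daemonization.
daemon_runner.daemon_context.files_preserve = files_preserve_by_path('/dev/urandom')
daemon_runner.do_action()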
This has recently been happening with daemons and multithreaded applications that perform a mass close() in a loop from a separate thread. I found the problem in the pipe.PosixPipe class: there is no synchronization between its set() and close() methods, so PosixPipe methods can read/write and close the socket descriptor at the same time.
Issue was created: https://github.com/paramiko/paramiko/issues/692
Pull was requested: https://github.com/paramiko/paramiko/pull/691/files
Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])

class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt

def interrupt():
    was_interrupted.append(1)

def interruptable(func):
    """
    Decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map.
    **Note**, this won't actually cause the map to be interrupted,
    it will merely cause the following functions to not be executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc

@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return

@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i)/10.)
    return i

# Must wrap func here, otherwise it won't be found in __main__'s dict.
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)

if __name__ == "__main__":
    # This is an attempt to use signals -- I also attempted something similar where
    # the signals were only caught in the child processes... Or only on the main process...
    #
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)

    # Try 2 with signals (only catch signal on main process)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)
    #def startup(): signal.signal(signal.SIGINT, signal.SIG_IGN)
    #p = multi.Pool(processes=4, initializer=startup)

    # Try 3 with signals (only catch signal on child processes)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, signal.SIG_IGN)
    #def startup(): signal.signal(signal.SIGINT, onSigInt)
    #p = multi.Pool(processes=4, initializer=startup)

    p = multi.Pool(4)
    try:
        out = p.map(wrapper, range(30))
        #out = p.map_async(wrapper, range(30)).get()  # This doesn't work either...
        # The following lines don't work either.
        # Effectively trying to roll my own p.map() with p.apply_async:
        #results = [p.apply_async(wrapper, args=(i,)) for i in range(30)]
        #out = [r.get() for r in results]
    except KeyboardInterrupt:
        print("Hello!")
        out = None
    finally:
        p.terminate()
        p.join()

    print(out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try/except/finally clause, it gets rid of the first exception; the other ones are still raised in atexit, however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the updated list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried because the communication sockets between processes were broken for some reason. The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory ... something like tempfile.tempdir=tempfile.mkdtemp() and then register a function to delete the temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
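For illustration, a minimal sketch of that approach as I understand it (names like work and cleanup_tempdir are mine, not from the original code):
import atexit
import multiprocessing as multi
import shutil
import tempfile

# Set the shared temp directory *before* the pool forks its workers,
# so every process writes into the same place.
tempfile.tempdir = tempfile.mkdtemp()

@atexit.register
def cleanup_tempdir():
    # One directory to remove at exit, instead of a shared list of files.
    shutil.rmtree(tempfile.tempdir, ignore_errors=True)

def work(i):
    # Each worker writes its temporary file into the shared directory.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as f:
        f.write(str(i).encode())
    return i

if __name__ == "__main__":
    pool = multi.Pool(4)
    try:
        print(pool.map(work, range(10)))
    finally:
        pool.terminate()
        pool.join()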