Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])

class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt

def interrupt():
    was_interrupted.append(1)

def interruptable(func):
    """
    Decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map.
    **Note**: this won't actually cause the map to be interrupted;
    it will merely cause the following functions not to be executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc

@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return

@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i)/10.)
    return i

# Must wrap func here, otherwise it won't be found in __main__'s dict.
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)

if __name__ == "__main__":
    # This is an attempt to use signals -- I also attempted something similar where
    # the signals were only caught in the child processes... Or only on the main process...
    #
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)

    # Try 2 with signals (only catch signal on main process)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)
    #def startup(): signal.signal(signal.SIGINT, signal.SIG_IGN)
    #p = multi.Pool(processes=4, initializer=startup)

    # Try 3 with signals (only catch signal on child processes)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, signal.SIG_IGN)
    #def startup(): signal.signal(signal.SIGINT, onSigInt)
    #p = multi.Pool(processes=4, initializer=startup)

    p = multi.Pool(4)
    try:
        out = p.map(wrapper, range(30))
        #out = p.map_async(wrapper, range(30)).get()  # This doesn't work either...
        # The following lines don't work either.
        # Effectively trying to roll my own p.map() with p.apply_async:
        #results = [p.apply_async(wrapper, args=(i,)) for i in range(30)]
        #out = [r.get() for r in results]
    except KeyboardInterrupt:
        print("Hello!")
        out = None
    finally:
        p.terminate()
        p.join()

    print(out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) call in a try/except/finally clause, it gets rid of the first exception; the others are still raised from atexit, however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried, because the communication sockets between processes were broken for some reason.

The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory (something like tempfile.tempdir = tempfile.mkdtemp()) and then register a function to delete that temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
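In code, the workaround looks roughly like this (a sketch only; the directory handling is as described above, but make_temp_file is a simplified stand-in for my real worker function):

import atexit
import os
import shutil
import tempfile
import multiprocessing as multi

# Point all tempfile users at one private directory; forked pool workers inherit it.
tempfile.tempdir = tempfile.mkdtemp()

# Remove the whole directory on exit instead of tracking individual file names.
atexit.register(shutil.rmtree, tempfile.tempdir, True)  # True = ignore_errors

def make_temp_file(i):
    # Each worker writes into the shared temporary directory.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    return path

if __name__ == "__main__":
    p = multi.Pool(4)
    try:
        print(p.map(make_temp_file, range(10)))
    finally:
        p.terminate()
        p.join()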
Related
I am trying to use the example Pika Async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class subclass multiprocessing.Process). However, I'm running into some issues with gracefully shutting down everything.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
    proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
    self.consumers.append(proc)  # collect the consumers so they can be started/joined below
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I do the normal python way of dealing with starting processes:
try:
    for proc in self.consumers:
        proc.start()

    for proc in self.consumers:
        proc.join()
except KeyboardInterrupt:
    for proc in self.consumers:
        proc.terminate()
        proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the "terminate" method to call the consumer's "stop" method and then continue with the normal Process terminate. With this structure, I am getting some strange attribute errors.
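The override looks roughly like this (a sketch only; the real class is the pika ExampleConsumer subclassing multiprocessing.Process, and the method bodies here are simplified stand-ins):

import multiprocessing
import time

class ExampleConsumer(multiprocessing.Process):
    """Toy stand-in for the pika-based consumer; names are illustrative only."""

    def __init__(self):
        super(ExampleConsumer, self).__init__()
        self._connection = None            # only assigned later, inside run()

    def run(self):
        self._connection = "connection"    # real code: build the pika connection, start the ioloop
        while True:
            time.sleep(1)

    def stop(self):
        # Real code stops consuming and calls self._connection.ioloop.start().
        print(self._connection)

    def terminate(self):
        self.stop()                                   # try a graceful stop first...
        super(ExampleConsumer, self).terminate()      # ...then the normal Process.terminate()

The traceback I get: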
Traceback (most recent call last):
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
main()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
mybot.start()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
self.__stop_consumers__()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
self.consumers[0].terminate()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In the particular case above, _connection is initialized as None, but then gets set when the Consumer is started. However, when the "stop" method is called, it has already reverted back to None (even though nothing explicitly sets it back). I'm also observing other strange behavior, such as times when things appear to get called twice (even though "stop" is called once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
Thanks!
I wrote this function that handles the "rate limit error" of a Tweepy cursor in order to keep downloading from the Twitter APIs.
import threading
import tweepy

def limit_handled(cursor, user):
    over = False
    while True:
        try:
            if over == True:
                print "Routine Riattivata, Serviamo il numero:", user   # "Routine resumed, serving number:"
                over = False
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Raggiunto Limite, Routine in Pausa"                  # "Limit reached, routine paused"
            threading.Event.wait(15*60 + 15)
            over = True
        except tweepy.TweepError:
            print "TweepError"
            threading.Event.wait(5)
Since I am using several threads to connect, I would like to stop each one of them when a RateLimitError is raised and restart them after 15 minutes.
I previously used the function:
time.sleep(x)
But I understood that it doesn't work well for threads (the counter does not advance while the thread is not active), so I tried to use:
threading.Event.wait(x)
But then this error is raised:
Exception in thread Thread-15:
Traceback (most recent call last):
File "/home/xor/anaconda/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/home/xor/anaconda/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/xor/spyder/algo/HW2/hw2.py", line 117, in work
storeFollowersOnMDB(ids, api, k)
File "/home/xor/spyder/algo/HW2/hw2.py", line 111, in storeFollowersOnMDB
for followersPag in limit_handled(tweepy.Cursor(api.followers_ids, id = user, count=5000).pages(), user):
File "/home/xor/spyder/algo/HW2/hw2.py", line 52, in limit_handled
threading.Event.wait(15*60 + 15)
AttributeError: 'function' object has no attribute 'wait'
How can I "sleep/wait" my threads being sure that they will wake up at the right moment?
Try doing it like this instead:
import threading
dummy_event = threading.Event()
dummy_event.wait(timeout=1)
Also, try googling it first next time: Issues with time.sleep and Multithreading in Python
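Applied to the handler from the question, the rate-limit loop would wait on an Event instance rather than on the class itself; a sketch, reusing the question's names with one module-level event shared by the workers:

import threading
import tweepy

pause_event = threading.Event()   # an *instance*; threading.Event is a factory function in Python 2

def limit_handled(cursor, user):
    over = False
    while True:
        try:
            if over:
                print "Routine resumed, serving:", user
                over = False
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Rate limit reached, pausing"
            pause_event.wait(15 * 60 + 15)   # blocks this thread, returns after the timeout
            over = True
        except tweepy.TweepError:
            print "TweepError"
            pause_event.wait(5)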
I am using python-daemon module to manage the daemon process of my Python script.
However, I am running into a headache when running the script that I simply can't figure out. Nor do I really know how to begin to debug it.
I have the code:
def run_application():
#Do something here...
class App():
def __init__(self):
self.stdin_path = '/dev/null'
self.stdout_path = 'stdout.txt'
self.stderr_path = 'stdlog.log'
self.pidfile_path = 'filelock.pid'
self.pidfile_timeout = 5
def run(self):
run_application()
app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()
When run, it always writes the following to stdlog.log:
Traceback (most recent call last):
File "MyApp.py", line 335, in <module>
daemon_runner.do_action()
File "/anaconda/lib/python2.7/site-packages/daemon/runner.py", line 189, in do_action
func(self)
File "/anaconda/lib/python2.7/site-packages/daemon/runner.py", line 124, in _start
self.daemon_context.open()
File "/anaconda/lib/python2.7/site-packages/daemon/daemon.py", line 346, in open
self.pidfile.__enter__()
File "/anaconda/lib/python2.7/site-packages/lockfile/__init__.py", line 229, in __enter__
self.acquire()
File "/anaconda/lib/python2.7/site-packages/daemon/pidfile.py", line 42, in acquire
super(TimeoutPIDLockFile, self).acquire(timeout, *args, **kwargs)
File "/anaconda/lib/python2.7/site-packages/lockfile/pidlockfile.py", line 88, in acquire
self.path)
lockfile.LockTimeout: Timeout waiting to acquire lock for /MyApp/filelock.pid
So it appears to be timing out when attempting to lock filelock.pid. I have no idea why this is. I have deleted filelock.pid, I've changed permissions; same error every time.
How can I begin to debug this??? I'm at a loss.
I am using python-daemon version 1.6 (if it matters).
Update:
Following the advice here, I now see that there is already a process running. Now, how can I determine the PID of the running daemon process?
I agree with @ExploWare as far as how he demonstrates that you can capture those LockTimeout exceptions.
So, as a way to debug and see which process is holding on to this lock, here is an external bit of code you can run...
import daemon.pidfile
import os
import lockfile
# We know the lockfile name.
pidfile = daemon.pidfile.PIDLockFile(
    os.path.join("/MyApp/", "filelock.pid"))
# This current process id...
os.getpid()
# 46337
So what process has acquired this lock if any?
pidfile.is_locked()
# True
pidfile.read_pid()
# 96856
When our PIDLockFile instance tries to "acquire",
pidfile.__dict__
# {'unique_name': '/MyApp/filelock.pid', 'lock_file': '/MyApp/filelock.pid.lock', 'hostname':
# 'MyMachine.local', 'pid': 46337, 'timeout': None, 'tname': '', 'path': '/MyApp/filelock.pid'}
pidfile.acquire()
#
# (Had to Control-C quit because I didnt set a timeout on PIDLockFile )
#
# ^CTraceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/michal/venf/lib/python2.7/site-packages/lockfile/pidlockfile.py", line 92, in acquire
# time.sleep(timeout is not None and timeout/10 or 0.1)
# KeyboardInterrupt
So instead, use @ExploWare's exception catching.
# Wait only 5 seconds.
pidfile.timeout = 5

try:
    pidfile.acquire()
except lockfile.LockTimeout:
    print 'locked . need to wait or move on.'
#
# locked . need to wait or move on.
I found a nice way to handle this exception, so maybe it's helpful for you as well:
Add
from lockfile import LockTimeout
to the beginning of the script and surround daemon_runner.do_action() like this:
try:
    daemon_runner.do_action()
except LockTimeout:
    print "Error: couldn't acquire lock"
    # you can exit here or try something else
I'm trying to execute ssh commands using paramiko from inside a python daemon process.
I'm using the following implementation for the daemon: https://pypi.python.org/pypi/python-daemon/
When the program is started pycrypto raises an IOError with a Bad file descriptor when paramiko tries to connect.
If I remove the daemon code (just uncomment the last line and comment out the two above it), the ssh connection is established as expected.
The code for a short test program looks like this:
#!/usr/bin/env python2
from daemon import runner
import paramiko

class App():
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/testdaemon.pid'
        self.pidfile_timeout = 5

    def run(self):
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.load_system_host_keys()
        ssh.connect("hostname", username="username")
        ssh.close()

app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()
#app.run()
The trace looks like this:
Traceback (most recent call last):
File "./daemon-test.py", line 31, in <module>
daemon_runner.do_action()
File "/usr/lib/python2.7/site-packages/daemon/runner.py", line 189, in do_action
func(self)
File "/usr/lib/python2.7/site-packages/daemon/runner.py", line 134, in _start
self.app.run()
File "./daemon-test.py", line 22, in run
ssh.connect("hostname", username="username")
File "/usr/lib/python2.7/site-packages/paramiko/client.py", line 311, in connect
t.start_client()
File "/usr/lib/python2.7/site-packages/paramiko/transport.py", line 460, in start_client
Random.atfork()
File "/usr/lib/python2.7/site-packages/Crypto/Random/__init__.py", line 37, in atfork
_UserFriendlyRNG.reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 224, in reinit
_get_singleton().reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 171, in reinit
return _UserFriendlyRNG.reinit(self)
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 99, in reinit
self._ec.reinit()
File "/usr/lib/python2.7/site-packages/Crypto/Random/_UserFriendlyRNG.py", line 62, in reinit
block = self._osrng.read(32*32)
File "/usr/lib/python2.7/site-packages/Crypto/Random/OSRNG/rng_base.py", line 76, in read
data = self._read(N)
File "/usr/lib/python2.7/site-packages/Crypto/Random/OSRNG/posix.py", line 65, in _read
d = self.__file.read(N - len(data))
IOError: [Errno 9] Bad file descriptor
I'm guessing this has something to do with the stream redirection when the daemon spawns. I've tried to set them all to /dev/tty or even to a normal file but nothing works.
When I run the program with strace I can see that something tries to close a file twice and that's when I get the error. But I couldn't find out which file the descriptor actually points to (strace shows a memory location that doesn't seem to be set anywhere).
This is a known issue that I am actually experiencing myself (which is what led me to this question). Basically, it has to do with the definition of a UNIX daemon process and the way paramiko implements its random number generator (RNG).
If you refer to PEP 3143 - Standard daemon process library, the first step in becoming a correct daemon is to "close all open file descriptors." Unfortunately, this closes the file descriptor to /dev/urandom which is used in the Crypto module's RNG which is in turn used by paramiko.
There are some workarounds for the moment, but the author has indicated that he doesn't currently have time to pursue this bug (although the last post in the first link is by the author and is 8 days old as of this writing).
Daemonizing after importing paramiko breaks the random number generator
EAGAIN on file when using RNG after daemon fork
In summary, if you import paramiko after your process becomes a daemon, then it should work as desired because the file descriptor will have been opened after the daemonizing closes all file descriptors.
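A minimal sketch of that approach, reusing the runner setup from the question; the only change is that the paramiko import happens inside run(), i.e. after daemonization has already closed the inherited file descriptors:

#!/usr/bin/env python2
from daemon import runner

class App():
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/testdaemon.pid'
        self.pidfile_timeout = 5

    def run(self):
        # Import here, after the fork and fd cleanup, so pycrypto opens
        # /dev/urandom on a file descriptor that won't be closed underneath it.
        import paramiko
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.load_system_host_keys()
        ssh.connect("hostname", username="username")
        ssh.close()

daemon_runner = runner.DaemonRunner(App())
daemon_runner.do_action()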
The user @xraj also had a hackish, yet clever workaround for finding and preserving the file descriptor to /dev/urandom when daemonizing (first link above):
import os
from resource import getrlimit, RLIMIT_NOFILE

def files_preserve_by_path(*paths):
    wanted = []
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            wanted.append(os.fstat(fd)[1:3])
        finally:
            os.close(fd)

    def fd_wanted(fd):
        try:
            return os.fstat(fd)[1:3] in wanted
        except OSError:
            return False

    fd_max = getrlimit(RLIMIT_NOFILE)[1]
    return [fd for fd in xrange(fd_max) if fd_wanted(fd)]

daemon_context.files_preserve = files_preserve_by_path('/dev/urandom')
This has recently been seen in daemons and multithreaded applications that close file descriptors in bulk from a separate thread. I found the problem in the pipe.PosixPipe class: there is no synchronization between its set() and close() methods, so PosixPipe methods can read/write the socket's descriptor and close it at the same time.
An issue was created: https://github.com/paramiko/paramiko/issues/692
A pull request was opened: https://github.com/paramiko/paramiko/pull/691/files
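To illustrate the kind of race being described (a generic sketch, not the actual paramiko patch): if one thread can close the descriptor while another is still writing to it, the writer can hit a closed or reused fd, so the usual fix is to guard both operations with a lock.

import threading

class GuardedPipe(object):
    """Generic illustration: serialize writes and close so they cannot interleave."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()
        self._closed = False

    def set(self, data):
        with self._lock:
            if not self._closed:           # never touch a closed (possibly reused) descriptor
                self._sock.sendall(data)

    def close(self):
        with self._lock:
            if not self._closed:
                self._closed = True
                self._sock.close()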
I'm trying to debug an application which makes use of the pynetdicom library. I'm not sure how relevant that specific detail is, however what IS relevant is that it makes heavy use of multithreading to run background socket listener tasks without blocking the main thread. The storescp.py example can be used to reproduce this.
Whenever I place a breakpoint that gets encountered (regardless of what thread, main or child, it gets encountered in), I get the following traceback:
Traceback (most recent call last):
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd.py", line 1397, in <module>
debugger.run(setup['file'], None, None)
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd.py", line 1090, in run
pydev_imports.execfile(file, globals, locals) #execute the script
File "/Users/alexw/Development/Python/kreport2/KReport2/dicomdatascraper.py", line 183, in <module>
oldDicomList = copy.copy(newData)
File "/Users/alexw/Development/Python/kreport2/KReport2/dicomdatascraper.py", line 183, in <module>
oldDicomList = copy.copy(newData)
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd_frame.py", line 135, in trace_dispatch
self.doWaitSuspend(thread, frame, event, arg)
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd_frame.py", line 25, in doWaitSuspend
self._args[0].doWaitSuspend(*args, **kwargs)
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd.py", line 832, in doWaitSuspend
self.processInternalCommands()
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd.py", line 360, in processInternalCommands
thread_id = GetThreadId(t)
File "/Applications/Aptana Studio 3/plugins/org.python.pydev_2.7.0.2013012902/pysrc/pydevd_constants.py", line 140, in GetThreadId
return thread.__pydevd_id__
File "/Users/alexw/.virtualenvs/kreport2dev/devlibs/pynetdicom/source/netdicom/applicationentity.py", line 73, in __getattr__
obj = eval(attr)()
File "<string>", line 1, in <module>
NameError: name '__pydevd_id__' is not defined
My thought is that, perhaps, in order to make things work, PyDev monkey-patches a __pydevd_id__ into spawned threads, but fails to patch it into these threads because they are, in fact, subclasses rather than direct instances of threading.Thread (in this case, the worker is an instance of class Association(threading.Thread)).
Of course, I don't know PyDev well enough to confirm this theory, or else fix it. And it seems neither does the internet.
Is subclassing Thread so rarely used a pattern that it's simply not considered in the PyDev architecture? Without re-architecting the library, how could this issue be remedied?
I simply needed to look harder at that traceback.
The pynetdicom library, in its subclassing of threading.Thread, overrode __getattr__ and somewhat broke it. The problem was:
def __getattr__(self, attr):
    #while not self.AssociationEstablished:
    #    time.sleep(0.001)
    obj = eval(attr)
    # do some stuff
    return obj
When a nonexistent attribute is passed, eval raises a NameError. That isn't caught by pydevd's monkey-patching routine, which relies on the normal protocol: if reading thread.__pydevd_id__ raises AttributeError, assign thread.__pydevd_id__ and carry on.
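Roughly, the pattern pydevd relies on looks like this (a paraphrase for illustration, not the actual pydevd source):

def get_thread_id(thread):
    try:
        return thread.__pydevd_id__        # normally raises AttributeError on first access
    except AttributeError:
        thread.__pydevd_id__ = id(thread)  # so the debugger assigns an id and carries on
        return thread.__pydevd_id__

Because the overridden __getattr__ raised NameError instead of AttributeError, that except clause never fired.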
The solution was to update that section thusly:
def __getattr__(self, attr):
    #while not self.AssociationEstablished:
    #    time.sleep(0.001)
    try:
        obj = eval(attr)
    except NameError:
        raise AttributeError
    # do some stuff
    return obj
This intercepts the NameError and raises an AttributeError instead, as __getattr__ should if the queried attribute doesn't exist.