Python multiprocess debugging

Python multiprocess debugging - python

I'm trying to debug a simple python application but no luck so far.
import multiprocessing
def worker(num):
for a in range(0, 10):
print a
if __name__ == '__main__':
for i in range(5):
p = multiprocessing.Process(target=worker, args=(i,))
p.start()
I want to set a breakpoint inside the for-loop to track the values of 'a' but non of the tools that I tried are able to do that.
So far I tried debuging with:
PyCharm and get the following error: ImportError: No module named
pydevd - http://youtrack.jetbrains.com/issue/PY-6649 It looks like
they are still working on a fix for this and from what I understand, no ETA on this
I also tried debuging with Winpdb - http://winpdb.org but it simply won't go inside my 'worker' method and just print the values of 'a'
I would really appreciate any help with this!

I found it very useful to replace multiprocessing.Process() with threading.Thread() when I'm going to set breakpoints. Both classes have similar arguments so in most cases they are interchangeable.
Usually my scripts use Process() until I specify command line argument --debug which effectively replaces those calls with Thread(). That allows me to debug those scripts with pdb.

You should be able to do it with remote-pdb.
from multiprocessing import Pool
def test(thing):
from remote_pdb import set_trace
set_trace()
s = thing*2
print(s)
return s
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(test,['dog','cat','bird']))
Then just telnet to the port thats listed in the log.
Example:
RemotePdb session open at 127.0.0.1:54273, waiting for connection ...
telnet 127.0.0.1 54273
<telnet junk>
-> s = thing*2
(Pdb)
or
nc -tC 127.0.0.1 54273
-> s = thing * 2
(Pdb)
You should be able to debug the process at that point.

I copied everything in /Applications/PyCharm\ 2.6\ EAP.app/helpers/pydev/*.py to site-packages in my virtualenv and it worked for my (I'm debugging celery/kombu, breakpoints work as expected).

It would be great if regular pdb/ipdb would work with multiprocessing. If I can get away with it, I handle calls to multiprocessing serially if the number of configured processes is 1.
if processes == 1:
for record in data:
worker_function(data)
else:
pool.map(worker_function, data)
Then when debugging, configure the application to only use a single process. This doesn't cover all cases, especially when dealing with concurrency issues, but it might help.

I've rarely needed to use a traditional debugger when attempting to debug Python code, preferring instead to liberally sprinkle my code with trace statements. I'd change your code to the following:
import multiprocessing
import logging
def worker(num):
for a in range(0, 10):
logging.debug("(%d, %d)" % (num, a))
if __name__ == '__main__':
logging.basicConfig(level=logging.DEBUG)
for i in range(5):
p = multiprocessing.Process(target=worker, args=(i,))
logging.info("Starting process %d" % i)
p.start()
In production, you disable the debug trace statements by setting the trace level to logging.WARNING so you only log warnings and errors.
There's a good basic and advanced logging tutorial on the official Python site.

If you are trying to debug multiple processes running simultaneously, as shown in your example, then there's no obvious way to do that from a single terminal: which process should get the keyboard input? Because of this, Python always connects sys.stdin in the child process to os.devnull. But this means that when the debugger tries to get input from stdin, it immediately reaches end-of-file and reports an error.
If you can limit yourself to one subprocess at a time, at least for debugging, then you could get around this by setting sys.stdin = open(0) to reopen the main stdin, as described here.
But if multiple subprocesses may be at breakpoints simultaneously, then you will need a different solution, since they would all end up fighting over input from the single terminal. In that case, RemotePdb is probably your best bet, as described by #OnionKnight.

WingIDE Pro provides this functionality right out-of-the-box.
No additional code (e.g., use of the traceback module) is needed. You just run your program, and the Wing debugger will not only print stdout from subprocesses, but it will break on errors in a subprocess and instantly create and an interactive shell so you can debug the offending thread. It doesn't get any easier than this, and I know of no other IDE that exposes subprocesses in this way.
Yes, it's a commercial product. But I have yet to find any other IDE that provides a debugger to match. PyCharm Professional, Visual Studio Community, Komodo IDE - I've tried them all. WingIDE also leads in parsing source documentation as well, in my opinion. And the Eye Ease Green color scheme is something I can't live without now.
(Yes, I realize this question is 5+ years old. I'm answering it anyway.)

Related

Print output from python to C# application

I have made a C# app which calls a python script.
C# app uses Process object to call python script.
I also have redirected the sub-process standard output so I can process the output from python script.
But the problem is:
The output(via print function) from python will always arrive at once when the script terminates.
I want the output to arrive in real time while script running.
I can say I have tried almost all of method can get from google, like add flush of sys.out, redirect sysout in python, C# event driven message receiving or just using while to wait message etc,.
How to flush output of print function?
PyInstaller packaged application works fine in Console mode, crashes in Window mode
I am very wondering that like PyCharm or other python IDE, they run python script inside, but they can print the output one by one without hacking original python script, how they do that?
The python version is 2.7.
Hope to have advise.
Thank you!

I just use very stupid but working method to resolve it:
using thread to periodically flush the sys.out, the code piece is like this:
import sys
import os
import threading
import time
run_thread = False
def flush_print():
while run_thread:
# print 'something'
sys.stdout.flush()
time.sleep(1)
in main function:
if __name__ == '__main__':
thread = threading.Thread(target=flush_print)
run_thread = True
thread.start()
# my big functions with some prints, the function will block until completed
run_thread = False
thread.join()
Apparently this is ugly, but I have no better method to make work done .

using pycharm + gevent greenlet.join() no longer blocks

so after the upgrade of gevent to 1.1rc4 (from 1.0.2) while running through pycharm, I can't get greenlets to join properly... take this code for example:
from gevent import monkey, Greenlet, hub
import gevent
hub.Hub.resolver_class = ['gevent.resolver_ares.Resolver']
monkey.patch_all()
def sleepy(time):
gevent.sleep(time)
print "done like a good script"
if __name__ == '__main__':
g = gevent.spawn(sleepy,10)
g.start()
g.join()
print "if this is the only log line, then join didn't work"
will output:
"if this is the only log line, then join didn't work"
from the IDE, it executes normally using the same interpreter from the CLI
I have followed the code in the cli and gui and there is a difference in behavior greenlet.join() caused by a change in behavior in hub.switch() :
def switch(self):
switch_out = getattr(getcurrent(), 'switch_out', None)
if switch_out is not None:
switch_out()
return greenlet.switch(self)
where that last line, will return immediately, before the greenlet is executed...
the pycharm debugger wont let me step into that code...
any help would be great... coroutine flow control is hard enough when it works...

Pycharm does some magic (not always nicely) with stdin.
Try to flush your stdin right after each print statements by doing this:
import sys
sys.stdin.flush()
Also add time.sleep(1) at the very end.
It might be that your script exit before the print statement was outputed. I don't know if that will work, but definitely worse a try.
If it does not work, try outputting to a file and see if that works properly.

I opened a bug at jetbrains, it was marked as a duplicate of: https://youtrack.jetbrains.com/issue/PY-14992
I haven't looked at all of my debug issues with gevent, but this simple case was fixed with the 2016.1 release that went out 3/23/2016

Why does the billiard multiprocessing module require the "if name=='main'" line?

If I have the following code:
def f():
print 'ok!'
import sys
sys.exit()
if __name__=='__main__':
import billiard
billiard.forking_enable(0)
p = billiard.Process( target=f)
p.start()
while p.is_alive():
pass
The script behaves as expected, printing "ok!" and ending. But if I omit the if __name__=='__main__': line and de-indent the following lines, my machine (OS X) goes crazy, continually spawning tons of Python processes until I killall Python. Any idea what's going on here?
(To those marking this as a duplicate, note that while the other question asks the purpose of if __name__=='__main__' generally, I'm specifically asking why failure to use it here causes dramatically unexpected behaviour)

You're disabling fork support with the line:
billiard.forking_enable(0)
That means that the library will need to spawn (instead of fork) your child process, and have it re-import the __main__ module to run f, just like Windows does. Without the if __name__ ... guard, re-importing the __main__ module in the children will also mean re-running your code that creates the billiard.Process, which creates an infinite loop.
If you leave fork enabled, the re-import in the child process isn't necessary, so everything works fine with or without the if __name__ ... guard.

Unable to open a Python subprocess in Web2py (SIGABRT)

I've got an Apache2/web2py server running using the wsgi handler functionality. Within one of the controllers, I am trying to run an external executable to perform some processing on 2 files.
My approach to this is to use the subprocess module to kick off the executable. I have simplified the code to a bare-bones implementation with little success.
from subprocess import *
p = Popen(("echo", "Hello"), shell=False)
ret = p.wait()
print "Process ended with status %s" % ret
When running the above code on its own (create new file and running via python command line), it works exactly as expected.
However, as soon as I place the exact same code into my web2py controller, the external process stops working. Instead of the process returning with code 0 as is expected in the above example, it always returns -6 and "Hello" is not printed to stdout.
After doing some digging, I found that negative results from p.wait() implies that a signal caused the process to end abnormally. And, according to some docs I found, -6 corresponds to the SIGABRT signal.
I would have expected this signal to be a result of some poorly executed code in my child process. However, since this is only running echo (and since it works outside of web2py) I have my doubts that the child process is signalling itself.
Is there some web2py limitation/configuration that causes Popen() requests to always fail? If so, how can I modify my logic so that the controller (or whatever) is actually able to spawn this external process?
** EDIT: Looks like web2py applications may not like the subprocess module. According to a reply to a message reply in the web2py email group:
"You should not use subprocess in a web2py application (if you really need too, look into the admin/controllers/shell.py) but you can use it in a web2py program running from shell (web2py.py -R myprogram.py)."
I will be checking out some options based on the note here and see if any solution presents itself.

In the end, the best I was able to come up with involved setting up a simple XML RPC server and call the functions from that:
my_server.py
#my_server.py
from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler
from subprocess import *
proc_srvr = xmlrpclib.ServerProxy("http://localhost:12345")
def echo_fn():
p = Popen(("echo", "hello"), shell=False)
ret = p.wait()
print "Process ended with status %s" % ret
return True # RPC Server doesn't like to return None
def main():
server = SimpleXMLRPCServer(("localhost", 12345), ErrorHandler)
server.register_function(echo_fn, "echo_fn")
while True:
server.handle_request()
if __name__ == "__main__":
main()
web2py_controller.py
#web2py_controller.py
def run_echo():
proc_srvr = xmlrpclib.ServerProxy("http://localhost:12345")
proc_srvr.echo_fn()
I'll be honest, I'm not a Python nor SimpleRPCServer guru, so the overall code may not be up to best-practice standards. However, going this route did allow me to, in effect, call a subprocess from a controller in web2py.
(Note, this was a quick and dirty simplification of the code that I have in my project. I have not validated it is in a working state, so it may require some tweaks.)

Python: How to determine subprocess children have all finished running

I am trying to detect when an installation program finishes executing from within a Python script. Specifically, the application is the Oracle 10gR2 Database. Currently I am using the subprocess module with Popen. Ideally, I would simply use the wait() method to wait for the installation to finish executing, however, the documented command actually spawns child processes to handle the actual installation. Here is some sample code of the failing code:
import subprocess
OUI_DATABASE_10GR2_SUBPROCESS = ['sudo',
'-u',
'oracle',
os.path.join(DATABASE_10GR2_TMP_PATH,
'database',
'runInstaller'),
'-ignoreSysPrereqs',
'-silent',
'-noconfig',
'-responseFile '+ORACLE_DATABASE_10GR2_SILENT_RESPONSE]
oracle_subprocess = subprocess.Popen(OUI_DATABASE_10GR2_SUBPROCESS)
oracle_subprocess.wait()
There is a similar question here: Killing a subprocess including its children from python, but the selected answer does not address the children issue, instead it instructs the user to call directly the application to wait for. I am looking for a specific solution that will wait for all children of the subprocess. What if there are an unknown number of subprocesses? I will select the answer that addresses the issue of waiting for all children subprocesses to finish.
More clarity on failure: The child processes continue executing after the wait() command since that command only waits for the top level process (in this case it is 'sudo'). Here is a simple diagram of the known child processes in this problem:
Python subprocess module -> Sudo -> runInstaller -> java -> (unknown)

Ok, here is a trick that will work only under Unix. It is similar to one of the answers to this question: Ensuring subprocesses are dead on exiting Python program. The idea is to create a new process group. You can then wait for all processes in the group to terminate.
pid = os.fork()
if pid == 0:
os.setpgrp()
oracle_subprocess = subprocess.Popen(OUI_DATABASE_10GR2_SUBPROCESS)
oracle_subprocess.wait()
os._exit(0)
else:
os.waitpid(-pid)
I have not tested this. It creates an extra subprocess to be the leader of the process group, but avoiding that is (I think) quite a bit more complicated.
I found this web page to be helpful as well. http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/

You can just use os.waitpid with the the pid set to -1, this will wait for all the subprocess of the current process until they finish:
import os
import sys
import subprocess
proc = subprocess.Popen([sys.executable,
'-c',
'import subprocess;'
'subprocess.Popen("sleep 5", shell=True).wait()'])
pid, status = os.waitpid(-1, 0)
print pid, status
This is the result of pstree <pid> of different subprocess forked:
python───python───sh───sleep
Hope this can help :)

Check out the following link http://www.oracle-wiki.net/startdocsruninstaller which details a flag you can use for the runInstaller command.
This flag is definitely available for 11gR2, but I have not got a 10g database to try out this flag for the runInstaller packaged with that version.
Regards

Everywhere I look seems to say it's not possible to solve this in the general case. I've whipped up a library called 'pidmon' that combines some answers for Windows and Linux and might do what you need.
I'm planning to clean this up and put it on github, possibly called 'pidmon' or something like that. I'll post a link if/when I get it up.
EDIT: It's available at http://github.com/dbarnett/python-pidmon.
I made a special waitpid function that accepts a graft_func argument so that you can loosely define what sort of processes you want to wait for when they're not direct children:
import pidmon
pidmon.waitpid(oracle_subprocess.pid, recursive=True,
graft_func=(lambda p: p.name == '???' and p.parent.pid == ???))
or, as a shotgun approach, to just wait for any processes started since the call to waitpid to stop again, do:
import pidmon
pidmon.waitpid(oracle_subprocess.pid, graft_func=(lambda p: True))
Note that this is still barely tested on Windows and seems very slow on Windows (but did I mention it's on github where it's easy to fork?). This should at least get you started, and if it works at all for you, I have plenty of ideas on how to optimize it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.