Why does the billiard multiprocessing module require the "if __name__=='__main__'" line?

If I have the following code:
def f():
    print 'ok!'
    import sys
    sys.exit()

if __name__=='__main__':
    import billiard
    billiard.forking_enable(0)

    p = billiard.Process(target=f)
    p.start()

    while p.is_alive():
        pass
The script behaves as expected, printing "ok!" and ending. But if I omit the if __name__=='__main__': line and de-indent the following lines, my machine (OS X) goes crazy, continually spawning tons of Python processes until I killall Python. Any idea what's going on here?
(To those marking this as a duplicate, note that while the other question asks the purpose of if __name__=='__main__' generally, I'm specifically asking why failure to use it here causes dramatically unexpected behaviour)

You're disabling fork support with the line:
billiard.forking_enable(0)
That means that the library will need to spawn (instead of fork) your child process, and have it re-import the __main__ module to run f, just like Windows does. Without the if __name__ ... guard, re-importing the __main__ module in the children will also mean re-running your code that creates the billiard.Process, which creates an infinite loop.
If you leave fork enabled, the re-import in the child process isn't necessary, so everything works fine with or without the if __name__ ... guard.
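For comparison, here is a minimal sketch (my own illustration, using the standard multiprocessing module rather than billiard) of the same situation: with the spawn start method the child re-imports __main__, so any unguarded module-level Process creation would start yet another child, recursively.

# Minimal sketch, assuming Python 3 and the standard library multiprocessing
# module (not billiard): "spawn" makes the child re-import __main__, just like
# forking_enable(0) does, so the guard below is what stops the recursion.
import multiprocessing

def f():
    print('ok!')

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=f)
    p.start()
    p.join()

Move the last four lines to module level and the spawned child would execute them again on re-import, reproducing the runaway process spawning you saw.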

Related

Utilizing multiprocessing.Pipe() with subprocess.Popen/run as stdin/stdout

I'm currently working on a POC with the following desired results:
a Python script working as a parent, meaning it starts a child process while running
the child process is oblivious to the fact that another script is running it; the very same child script can also be executed as the main script by the user
a comfortable way to read the subprocess's output (written to sys.stdout via print), while the parent's input is sent to the child's sys.stdin (read via input)
I've already done some research on the topic and I'm aware that I can pass subprocess.PIPE to Popen/run and call it a day.
However, I saw that multiprocessing.Pipe() produces a connected pair of objects which lets me send whole objects through them, so I don't need to work out when to stop reading a stream and continue afterward.
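For reference, a minimal sketch of that object-passing behaviour in isolation, assuming the standard multiprocessing API (separate from my actual POC code below):

# Minimal sketch: Pipe() returns two connected Connection objects, and
# send()/recv() transfer whole (picklable) Python objects, no stream parsing.
import multiprocessing

def child(conn):
    conn.send({'greeting': 'howdy', 'count': 1})  # a whole dict, not raw bytes
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # -> {'greeting': 'howdy', 'count': 1}
    p.join()

My actual POC attempt is below: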
# parent.py
import multiprocessing
import subprocess
import os

pipe1, pipe2 = multiprocessing.Pipe()

if os.fork():
    while True:
        print(pipe1.recv())
    exit()  # avoid fork collision

if os.fork():
    # subprocess.run is a busy wait
    subprocess.run(['python3', 'child.py'], stdin=pipe2.fileno(), stdout=pipe2.fileno())
    exit()  # avoid fork collision

while True:
    user_input = input('> ')
    pipe1.send(user_input)
# child.py
import os
import time

if os.fork():
    while True:
        print('child sends howdy')
        time.sleep(1)

with open('child.txt', 'w') as file:
    while True:
        user_input = input('> ')
        # We supposedly can't write to sys.stdout because parent.py took control of it
        file.write(f'{user_input}\n')
So to finally reach the essence of the problem: child.py is installed as a package,
meaning parent.py doesn't call the actual file to run the script.
The subprocess is run by calling the package.
And for some bizarre reason, when child.py is run as a package rather than as a script, the code written above doesn't seem to work.
child.py's sys.stdin and sys.stdout fail to work entirely: parent.py is unable to receive ANY of child.py's prints (even sys.stdout.write(<some_data>) followed by sys.stdout.flush()),
and the same applies to sys.stdin.
If anyone can shed any light on how to solve it, I would be delighted!
Side Note
When calling a package, you don't call its __main__.py directly;
you call a Python entry-point script which actually starts up the package.
I assume something fishy might be happening over there when that happens and that's what causes the interference, but that's just a theory.

Python doctest hangs using ProcessPoolExecutor

This code runs fine under regular CPython 3.5:
import concurrent.futures

def job(text):
    print(text)

with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")
But if you run it as python -m doctest myfile.py, it hangs. Changing submit(job to submit(print makes it not hang, as does using ThreadPoolExecutor instead of ProcessPoolExecutor.
Why does it hang when run under doctest?
So I think the issue is because of your with statement. When you have the code below

with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")

it forces the pool to be executed and closed then and there. When you run this as the main process it works and gives the pool time to execute the job. But when the file is imported as a module, the background worker doesn't get a chance to run, and the shutdown on the pool waits for the work to be executed, hence a deadlock.
So the workaround that you can use is below
import concurrent.futures

def job(text):
    print(text)

pool = concurrent.futures.ProcessPoolExecutor(1)
pool.submit(job, "hello")

if __name__ == "__main__":
    pool.shutdown(True)
This will prevent the deadlock and will let you run doctest as well as import the module if you want
The problem is that importing a module acquires a lock (which lock depends on your python version), see the docs for imp.lock_held.
Locks are shared over multiprocessing so your deadlock occurs because your main process, while it is importing your module, loads and waits for a subprocess which attempts to import your module, but can't acquire the lock to import it because it is currently being imported by your main process.
In step form:
Main process acquires lock to import myfile.py
Main process starts importing myfile.py (it has to import myfile.py because that is where your job() function is defined, which is why it didn't deadlock for print()).
Main process starts and blocks on subprocess.
Subprocess tries to acquire lock to import myfile.py
=> Deadlock.
doctest imports your module in order to process it. Try adding this to prevent execution on import:
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(1) as pool:
        pool.submit(job, "hello")
This should actually be a comment, but it's too long to be one.
Your code fails if it's imported as a module too, with the same error as doctest. I get _pickle.PicklingError: Can't pickle <function job at 0x7f28cb0d2378>: import of module 'a' failed (I named the file as a.py).
Your lack of if __name__ == "__main__": violates the programming guidelines for multiprocessing:
https://docs.python.org/3.6/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
I guess that the child processes will also try to import the module, which then tries to start another child process (because the pool unconditionally executes). But I'm not 100% sure about this.
I'm also not sure why the error you get is can't pickle <function>.
The issue here seems to be that you want the module to auto start a process on import. I'm not sure if this is possible.

Python Process which is joined will not call atexit

I thought Python Processes call their atexit functions when they terminate. Note that I'm using Python 2.7. Here is a simple example:
from __future__ import print_function
import atexit
from multiprocessing import Process

def test():
    atexit.register(lambda: print("atexit function ran"))

process = Process(target=test)
process.start()
process.join()
I'd expect this to print "atexit function ran" but it does not.
Note that this question:
Python process won't call atexit
is similar, but it involves Processes that are terminated with a signal, and the answer involves intercepting that signal. The Processes in this question are exiting gracefully, so (as far as I can tell anyway) that question & answer do not apply (unless these Processes are exiting due to a signal somehow?).
I did some research by looking at how this is implemented in CPython. This is assumes you are running on Unix. If you are running on Windows the following might not be valid as the implementation of processes in multiprocessing differs.
It turns out that os._exit() is always called at the end of the process. That, together with the following note from the documentation for atexit, should explain why your lambda isn't running.
Note: The functions registered via this module are not called when the
program is killed by a signal not handled by Python, when a Python
fatal internal error is detected, or when os._exit() is called.
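A quick way to see that note in action (my own minimal sketch, not part of the original code):

# Minimal sketch: os._exit() terminates the interpreter immediately and skips
# atexit handlers; a normal exit (sys.exit or falling off the end) runs them.
import atexit
import os

def handler():
    print("atexit function ran")

atexit.register(handler)
os._exit(0)  # prints nothing; a plain sys.exit(0) (with import sys) would run the handler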
Here's an excerpt from the Popen class for CPython 2.7, used for forking processes. Note that the last statement of the forked process is a call to os._exit().
# Lib/multiprocessing/forking.py
class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(code)
In Python 3.4, the os._exit() call is still there if you are using a fork start method, which is the default. But it seems like you can change it; see Contexts and start methods for more information. I haven't tried it, but perhaps using the spawn start method would work? It's not available for Python 2.7 though.
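An untested sketch of that suggestion (Python 3.4+ only; this is my illustration of what the attempt would look like, not something I have verified):

# Untested sketch: with the "spawn" start method the child exits through a
# normal interpreter shutdown rather than os._exit(), which should give
# registered atexit handlers a chance to run.
import atexit
import multiprocessing

def test():
    atexit.register(lambda: print("atexit function ran"))

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    process = ctx.Process(target=test)
    process.start()
    process.join()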

Python multiprocess debugging

I'm trying to debug a simple python application but no luck so far.
import multiprocessing

def worker(num):
    for a in range(0, 10):
        print a

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()
I want to set a breakpoint inside the for-loop to track the values of 'a', but none of the tools that I tried are able to do that.
So far I have tried debugging with:
PyCharm, which gives the following error: ImportError: No module named pydevd - http://youtrack.jetbrains.com/issue/PY-6649. It looks like they are still working on a fix for this and, from what I understand, there is no ETA.
Winpdb - http://winpdb.org, but it simply won't go inside my 'worker' method and print the values of 'a'.
I would really appreciate any help with this!
I found it very useful to replace multiprocessing.Process() with threading.Thread() when I'm going to set breakpoints. Both classes have similar arguments so in most cases they are interchangeable.
Usually my scripts use Process() until I specify command line argument --debug which effectively replaces those calls with Thread(). That allows me to debug those scripts with pdb.
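A minimal sketch of that pattern (my illustration; the --debug flag name is just the convention described above):

# Sketch: run workers in threads when --debug is passed, so plain pdb
# breakpoints inside worker() work; otherwise use real processes.
import argparse
import multiprocessing
import threading

def worker(num):
    for a in range(0, 10):
        print(num, a)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--debug', action='store_true')
    args = parser.parse_args()

    # Thread and Process accept the same target/args keywords, so they are
    # interchangeable here.
    runner = threading.Thread if args.debug else multiprocessing.Process
    workers = [runner(target=worker, args=(i,)) for i in range(5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()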
You should be able to do it with remote-pdb.
from multiprocessing import Pool

def test(thing):
    from remote_pdb import set_trace
    set_trace()
    s = thing*2
    print(s)
    return s

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(test, ['dog', 'cat', 'bird']))
Then just telnet to the port that's listed in the log.
Example:
RemotePdb session open at 127.0.0.1:54273, waiting for connection ...
telnet 127.0.0.1 54273
<telnet junk>
-> s = thing*2
(Pdb)
or
nc -tC 127.0.0.1 54273
-> s = thing * 2
(Pdb)
You should be able to debug the process at that point.
I copied everything in /Applications/PyCharm\ 2.6\ EAP.app/helpers/pydev/*.py to site-packages in my virtualenv and it worked for me (I'm debugging celery/kombu; breakpoints work as expected).
It would be great if regular pdb/ipdb would work with multiprocessing. If I can get away with it, I handle calls to multiprocessing serially if the number of configured processes is 1.
if processes == 1:
    for record in data:
        worker_function(record)
else:
    pool.map(worker_function, data)
Then when debugging, configure the application to only use a single process. This doesn't cover all cases, especially when dealing with concurrency issues, but it might help.
I've rarely needed to use a traditional debugger when attempting to debug Python code, preferring instead to liberally sprinkle my code with trace statements. I'd change your code to the following:
import multiprocessing
import logging

def worker(num):
    for a in range(0, 10):
        logging.debug("(%d, %d)" % (num, a))

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        logging.info("Starting process %d" % i)
        p.start()
In production, you disable the debug trace statements by setting the trace level to logging.WARNING so you only log warnings and errors.
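For example, switching the earlier basicConfig call to:

logging.basicConfig(level=logging.WARNING)  # DEBUG and INFO records are now suppressed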
There's a good basic and advanced logging tutorial on the official Python site.
If you are trying to debug multiple processes running simultaneously, as shown in your example, then there's no obvious way to do that from a single terminal: which process should get the keyboard input? Because of this, Python always connects sys.stdin in the child process to os.devnull. But this means that when the debugger tries to get input from stdin, it immediately reaches end-of-file and reports an error.
If you can limit yourself to one subprocess at a time, at least for debugging, then you could get around this by setting sys.stdin = open(0) to reopen the main stdin, as described here.
But if multiple subprocesses may be at breakpoints simultaneously, then you will need a different solution, since they would all end up fighting over input from the single terminal. In that case, RemotePdb is probably your best bet, as described by #OnionKnight.
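A sketch of that single-subprocess workaround (my illustration, combining the sys.stdin = open(0) trick with plain pdb):

# Sketch: reopen file descriptor 0 inside the child, since multiprocessing
# points the child's sys.stdin at os.devnull; with one child at a time, pdb
# can then read commands from the terminal again.
import multiprocessing
import pdb
import sys

def worker(num):
    sys.stdin = open(0)  # reconnect this child's stdin to the terminal
    pdb.set_trace()      # breakpoint now works inside the child
    for a in range(0, 10):
        print(num, a)

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker, args=(0,))
    p.start()
    p.join()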
WingIDE Pro provides this functionality right out-of-the-box.
No additional code (e.g., use of the traceback module) is needed. You just run your program, and the Wing debugger will not only print stdout from subprocesses, but it will also break on errors in a subprocess and instantly create an interactive shell so you can debug the offending thread. It doesn't get any easier than this, and I know of no other IDE that exposes subprocesses in this way.
Yes, it's a commercial product. But I have yet to find any other IDE that provides a debugger to match. PyCharm Professional, Visual Studio Community, Komodo IDE - I've tried them all. WingIDE also leads in parsing source documentation, in my opinion. And the Eye Ease Green color scheme is something I can't live without now.
(Yes, I realize this question is 5+ years old. I'm answering it anyway.)

Child processes created with python multiprocessing module won't print

I have a problem with the code below, and with any code that uses the print function in the child processes. I can't see any printed statements, even if I use sys.std[err|out].write('worker') instead of print.
This is the code (from the official python documentation):
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
The output is blank.
Note: The following code uses the threading module and it prints the output:
import threading

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = threading.Thread(target=f, args=('bob',))
    p.start()
    p.join()
Output: hello bob
Can you please point me to the solution? Thanks in advance.
Try this:
from multiprocessing import Process
import sys

def f(name):
    print 'hello', name
    sys.stdout.flush()

...
AFAIK the standard output of processes spawned by the multiprocessing module is buffered, hence you will see the output only if the buffer becomes full or you explicitly flush sys.stdout.
The docs for multiprocessing clearly explain why this won't work!
"Note: Functionality within this package requires that the __main__ method be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter."
Having run into this issue myself, sometimes this can be because the child process is actually silently failing before ever getting to the print statement. If this is the case, wrapping the child process code in a try-except block and returning the exception object (to be printed in the parent process) is an effective way to debug this.
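A sketch of that pattern (my illustration; here the exception is passed back through a Queue as a formatted traceback rather than returned directly):

# Sketch: catch the child's own failure and hand it back to the parent, which
# can always print, so silent child crashes become visible.
from multiprocessing import Process, Queue
import traceback

def f(name, errors):
    try:
        # stand-in for the real work; imagine it fails before reaching any print
        raise RuntimeError('something broke before the print')
    except Exception:
        errors.put(traceback.format_exc())

if __name__ == '__main__':
    errors = Queue()
    p = Process(target=f, args=('bob', errors))
    p.start()
    p.join()
    if not errors.empty():
        print(errors.get())  # the parent prints the child's traceback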
I was using PyCharm IDE, and by checking the "Emulate terminal in output console" field in Run/Debug Configurations, it printed the desired result.
Hope it helps if you're using PyCharm.
