On a standard installation of Python (e.g. via Miniconda), I run this script (also pasted below) and get the following output:
python test_python_multiprocessing.py
arg1: called directly
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
The two exes:
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
This is what I expect.
When I run the same script with a Python from a virtual environment whose base is a Python embedded in a scriptable application, I get the following result:
arg1: called directly
sys.executable: C:\[virtual_environment_path]\Scripts\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\[application_with_python]\contrib\Python37\python.exe
-----
The two exes:
C:\[virtual_environment_path]\Scripts\python.exe
C:\[application_with_python]\contrib\Python37\python.exe
Traceback (most recent call last):
File ".\test_python_multiprocessing.py", line 67, in <module>
test_exes()
File ".\test_python_multiprocessing.py", line 64, in test_exes
assert exe1 == exe2
AssertionError
Crucially, the child sys.executable does not match the parent sys.executable, but instead matches the base of the parent.
I suspected that the Python that ships with the application had been altered, perhaps to have the spawned process point to a hard-coded Python path.
I have taken a look at the python standard libraries that ship with the application, and I do not find any discrepancy that explains this difference in behavior.
I tried manually setting the executable to what the default should be, by calling multiprocessing.set_executable(sys.executable) or multiprocessing.get_context("spawn").set_executable(sys.executable) before creating the multiprocessing.Process. Neither has any effect.
What are possible explanations for the difference in behavior between a standard python installation, and this python that is embedded within a scriptable application? How can I investigate the cause, and force the correct python executable to be used when spawning child processes?
test_python_multiprocessing.py:
import multiprocessing

def functionality(arg1):
    import sys
    print("arg1: " + str(arg1))
    print("sys.executable: " + str(sys.executable))
    print("-----\n")
    return sys.executable

def worker(queue, arg1):
    import traceback
    try:
        retval = functionality(arg1)
        queue.put({"retval": retval})
    except Exception as e:
        queue.put({"exception": e, "traceback_str": traceback.format_exc()})
        raise
    finally:
        pass

def spawn_worker(arg1):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue, arg1,))
    p.start()
    p.join()
    err_or_ret = queue.get()
    handle_worker_err(err_or_ret)
    if p.exitcode != 0:
        raise RuntimeError("Subprocess failed with code " + str(p.exitcode) + ", but no exception was thrown.")
    return err_or_ret["retval"]

def handle_worker_err(err_or_ret):
    if "retval" in err_or_ret:
        return None
    err = err_or_ret
    #import traceback
    if (err is not None):
        #traceback.print_tb(err["traceback"])  # TODO use e.g. tblib to get traceback
        print("The exception was thrown in the child process, reraised in parent process:")
        print(err["traceback_str"])
        raise err["exception"]

def test_exes():
    exe1 = functionality("called directly")
    exe2 = spawn_worker("called via multiprocessing")
    print("The two exes:")
    print(exe1)
    print(exe2)
    assert exe1 == exe2

if __name__ == "__main__":
    test_exes()
[EDIT] The fact that I detected the issue on a Python embedded in the scriptable application is a red herring. A virtual environment with a "standard install" Python 3.7.4 base has the same issue.
Long story short: using the "virtual" interpreter caused bugs in multiprocessing, so the developers decided to redirect virtual environments to the base interpreter.
link to issue 35797
This comment is pulled from popen_spawn_win32.py:
# bpo-35797: When running in a venv, we bypass the redirect
# executor and launch our base Python.
One solution is to use subprocess instead, and connect your "pipes" through a socket to a manager rather than using multiprocessing's own plumbing. You can see how to connect to a manager through a socket in the BaseManager documentation; Python makes it as simple as plugging in its address and port number.
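For illustration, here is a minimal, untested sketch of that approach; the address, port, and authkey are arbitrary placeholders. A plain queue is exposed through a BaseManager socket: the server half runs in one process, and the child you launch with subprocess connects as a client.

import sys
import queue
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

ADDRESS, AUTHKEY = ('127.0.0.1', 50000), b'secret'  # placeholder values

if __name__ == '__main__':
    if sys.argv[1:] == ['server']:
        # server half: expose a plain queue over the socket
        shared_queue = queue.Queue()
        QueueManager.register('get_queue', callable=lambda: shared_queue)
        manager = QueueManager(address=ADDRESS, authkey=AUTHKEY)
        manager.get_server().serve_forever()
    else:
        # client half: run this in the child launched with subprocess
        QueueManager.register('get_queue')
        manager = QueueManager(address=ADDRESS, authkey=AUTHKEY)
        manager.connect()
        manager.get_queue().put('hello from the child')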
You can also try pathos, as its multiprocessing implementation is "different" (I think its pools use sockets, but I haven't dug into it; it has other problems stemming from the way it spawns new workers, but it can work in a few weird environments where multiprocessing fails).
Edit: another nice parallelizing alternative that actually uses sockets is Dask, but you have to start the workers separately, not through its built-in pool.
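A rough sketch of that setup, assuming a scheduler and workers have already been started separately (e.g. via the dask-scheduler and dask-worker commands); the address is a placeholder:

from dask.distributed import Client

# connect to an already-running scheduler (started separately,
# e.g. `dask-scheduler` plus one or more `dask-worker` processes)
client = Client('tcp://127.0.0.1:8786')

future = client.submit(lambda x: x + 1, 41)  # executes on a worker
print(future.result())  # 42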
Assume the following Python program from the official docs (The Process class):
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Running this code against my 3.7.7 interpreter on my Windows machine works, as expected, without any problems. However, running the same code against a subinterpreter created in C++ fails with the following error (no exception is actually raised; the following error just gets printed to the console):
unrecognised option '-c'
I assume that the reason for this error is to be found in spawn.py (within the multiprocessing module, line 89):
...
return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
...
I could create my new process via Popen. This works, but the spawned process should be a child process, not a completely independent process.
My question:
Why does this error occur? Is there any way to spawn a child process within a Subinterpreter via multiprocessing.Process?
Thank you!
UPDATE 1
As suggested, adding freeze_support fixes the error, but a new one occurs:
unrecognised option '--multiprocessing-fork'
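For reference, a minimal sketch of where the freeze_support() call goes, per the multiprocessing docs, applied to the example above:

from multiprocessing import Process, freeze_support

def f(name):
    print('hello', name)

if __name__ == '__main__':
    freeze_support()  # first statement under the __main__ guard
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()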
I thought Python Processes call their atexit functions when they terminate. Note that I'm using Python 2.7. Here is a simple example:
from __future__ import print_function
import atexit
from multiprocessing import Process

def test():
    atexit.register(lambda: print("atexit function ran"))

process = Process(target=test)
process.start()
process.join()
I'd expect this to print "atexit function ran" but it does not.
Note that this question:
Python process won't call atexit
is similar, but it involves Processes that are terminated with a signal, and the answer involves intercepting that signal. The Processes in this question are exiting gracefully, so (as far as I can tell anyway) that question & answer do not apply (unless these Processes are exiting due to a signal somehow?).
I did some research by looking at how this is implemented in CPython. This assumes you are running on Unix; if you are running on Windows, the following might not be valid, as the implementation of processes in multiprocessing differs.
It turns out that os._exit() is always called at the end of the process. That, together with the following note from the documentation for atexit, should explain why your lambda isn't running.
Note: The functions registered via this module are not called when the
program is killed by a signal not handled by Python, when a Python
fatal internal error is detected, or when os._exit() is called.
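The note is easy to demonstrate directly; this snippet prints nothing, because os._exit() bypasses the atexit machinery:

from __future__ import print_function
import atexit
import os

atexit.register(lambda: print("atexit function ran"))
os._exit(0)  # exits immediately; the handler above never runs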
Here's an excerpt from the Popen class for CPython 2.7, used for forking processes. Note that the last statement of the forked process is a call to os._exit().
# Lib/multiprocessing/forking.py

class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(code)
In Python 3.4, the os._exit() call is still there if you are starting a forking process, which is the default. But it seems like you can change that; see Contexts and start methods for more information. I haven't tried it, but perhaps using the spawn start method would work? It's not available for Python 2.7, though.
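If someone wants to try that, a minimal sketch on Python 3.4+ would look like this (untested here, per the caveat above; whether the atexit handler fires under spawn is exactly what you would be testing):

import atexit
import multiprocessing

def test():
    atexit.register(lambda: print("atexit function ran"))

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')  # Python 3.4+ only
    process = ctx.Process(target=test)
    process.start()
    process.join()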
While working on a multi-threaded, cross-platform Python 3.3 application, I came across some weird behavior I was not expecting. The issue is that on Windows 8, calling input() in one thread blocks other threads until it completes. I have tested the example script below on three Linux, two Windows 7, and one Windows 8 computers, and this behavior is only observed on the Windows 8 computer. Is this expected behavior for Windows 8?
test.py:
import subprocess, threading, time

def ui():
    i = input("-->")
    print(i)

def loop():
    i = 0
    f = 'sky.{}'.format(i)
    p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    t = time.time()
    while time.time() < t+15:
        if p.poll() != None:
            print(i)
            time.sleep(3)
            i+=1
            f = 'sky.{}'.format(i)
            p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    p.terminate()
    p.wait()

def start():
    t1 = threading.Thread(target=ui)
    t2 = threading.Thread(target=loop)
    t1.start()
    t2.start()
    return t2

t2 = start()
t2.join()
print('done')
copy.py:
import shutil
import sys
src = sys.argv[1]
dst = sys.argv[2]
print('Copying \'{0}\' to \'{1}\''.format(src, dst))
shutil.copy(src, dst)
Update:
While trying out one of the suggestions I realized that I rushed to a conclusion missing something obvious. I apologize for getting off to a false start.
As Schollii suggested, just using threads (no subprocess or Python files) results in all threads making forward progress, so the problem actually is that using input() in one Python process causes other Python processes to block or not run (I do not know exactly what is going on). Furthermore, it appears that only Python processes are affected. If I use the same code shown above (with some modifications) to execute non-Python executables with subprocess.Popen, they run as expected.
To summarize:
Using subprocess to execute a non-Python executable: works as expected, with and without calls to input().
Using subprocess to execute a Python executable: created processes appear not to run if a call to input() is made in the original process.
Using subprocess to create Python processes with a call to input() in a new process rather than the original process: the call to input() blocks all Python processes spawned by the 'main' process.
Side note: I do not have a Windows 8 platform, so debugging/testing can be a little slow.
Because there were several problems with input() in Python 3.0-3.2, this method was affected by a few changes.
It's possible that we have a new bug again.
Can you try the following variant, which is a "back port" of the old Python 2.x input() behavior (raw_input() plus eval, where raw_input() was available in Python 2.x):
...
i = eval(input("-->"))
...
This is a very good problem to work on. Since you depend on the input() method, which needs the console, and since you have threads, all the threads end up trying to communicate with the console. So I advise you either to use the producer-consumer pattern (see the sketch below) or to define all your inputs in a text file and pass that file to the program.
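As a rough sketch of the producer-consumer idea (names are illustrative): one thread owns the console and feeds every line it reads into a queue, and the other threads consume from the queue instead of touching the console.

import queue
import threading

commands = queue.Queue()

def console_reader():
    # the only thread that ever touches the console
    while True:
        line = input("-->")
        commands.put(line)
        if line == "quit":
            break

def worker():
    while True:
        cmd = commands.get()  # blocks on the queue, not on the console
        if cmd == "quit":
            commands.put(cmd)  # propagate shutdown to other workers
            break
        print("worker handling:", cmd)

threading.Thread(target=console_reader).start()
threading.Thread(target=worker).start()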
I've got an Apache2/web2py server running using the wsgi handler functionality. Within one of the controllers, I am trying to run an external executable to perform some processing on 2 files.
My approach to this is to use the subprocess module to kick off the executable. I have simplified the code to a bare-bones implementation with little success.
from subprocess import *
p = Popen(("echo", "Hello"), shell=False)
ret = p.wait()
print "Process ended with status %s" % ret
When running the above code on its own (in a new file, run via the python command line), it works exactly as expected.
However, as soon as I place the exact same code into my web2py controller, the external process stops working. Instead of the process returning with code 0 as is expected in the above example, it always returns -6 and "Hello" is not printed to stdout.
After doing some digging, I found that a negative result from p.wait() implies that a signal caused the process to end abnormally. And, according to some docs I found, -6 corresponds to the SIGABRT signal.
I would have expected this signal to be a result of some poorly executed code in my child process. However, since this is only running echo (and since it works outside of web2py) I have my doubts that the child process is signalling itself.
Is there some web2py limitation/configuration that causes Popen() requests to always fail? If so, how can I modify my logic so that the controller (or whatever) is actually able to spawn this external process?
** EDIT: It looks like web2py applications may not like the subprocess module. According to a reply in the web2py email group:
"You should not use subprocess in a web2py application (if you really need too, look into the admin/controllers/shell.py) but you can use it in a web2py program running from shell (web2py.py -R myprogram.py)."
I will be checking out some options based on the note here and see if any solution presents itself.
In the end, the best I was able to come up with involved setting up a simple XML RPC server and call the functions from that:
my_server.py
#my_server.py
from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler
from subprocess import *

def echo_fn():
    p = Popen(("echo", "hello"), shell=False)
    ret = p.wait()
    print "Process ended with status %s" % ret
    return True  # RPC Server doesn't like to return None

def main():
    server = SimpleXMLRPCServer(("localhost", 12345), SimpleXMLRPCRequestHandler)
    server.register_function(echo_fn, "echo_fn")
    while True:
        server.handle_request()

if __name__ == "__main__":
    main()
web2py_controller.py
#web2py_controller.py
import xmlrpclib

def run_echo():
    proc_srvr = xmlrpclib.ServerProxy("http://localhost:12345")
    proc_srvr.echo_fn()
I'll be honest, I'm not a Python nor SimpleXMLRPCServer guru, so the overall code may not be up to best-practice standards. However, going this route did allow me, in effect, to call a subprocess from a controller in web2py.
(Note, this was a quick and dirty simplification of the code that I have in my project. I have not validated it is in a working state, so it may require some tweaks.)
I want to call a process via a Python program; however, this process needs some specific environment variables that are set by another process. How can I get the first process's environment variables to pass them to the second?
This is what the program look like:
import subprocess
subprocess.call(['proc1']) # this set env. variables for proc2
subprocess.call(['proc2']) # this must have env. variables set by proc1 to work
but the two processes don't share the same environment. Note that these programs aren't mine (the first is a big and ugly .bat file and the second a proprietary soft), so I can't modify them (OK, I could extract all that I need from the .bat, but it's very cumbersome).
N.B.: I am using Windows, but I prefer a cross-platform solution (but my problem wouldn't happen on a Unix-like ...)
Here's an example of how you can extract environment variables from a batch or cmd file without creating a wrapper script. Enjoy.
from __future__ import print_function
import sys
import subprocess
import itertools

def validate_pair(ob):
    try:
        if not (len(ob) == 2):
            print("Unexpected result:", ob, file=sys.stderr)
            raise ValueError
    except:
        return False
    return True

def consume(iter):
    try:
        while True: next(iter)
    except StopIteration:
        pass

def get_environment_from_batch_command(env_cmd, initial=None):
    """
    Take a command (either a single command or list of arguments)
    and return the environment created after running that command.
    Note that the command must be a batch file or .cmd file, or the
    changes to the environment will not be captured.
    If initial is supplied, it is used as the initial environment passed
    to the child process.
    """
    if not isinstance(env_cmd, (list, tuple)):
        env_cmd = [env_cmd]
    # construct the command that will alter the environment
    env_cmd = subprocess.list2cmdline(env_cmd)
    # create a tag so we can tell in the output when the proc is done
    tag = 'Done running command'
    # construct a cmd.exe command to accomplish this
    cmd = 'cmd.exe /s /c "{env_cmd} && echo "{tag}" && set"'.format(**vars())
    # launch the process
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, env=initial)
    # parse the output sent to stdout
    lines = proc.stdout
    # consume whatever output occurs until the tag is reached
    consume(itertools.takewhile(lambda l: tag not in l, lines))
    # define a way to handle each KEY=VALUE line
    handle_line = lambda l: l.rstrip().split('=', 1)
    # parse key/values into pairs
    pairs = map(handle_line, lines)
    # make sure the pairs are valid
    valid_pairs = filter(validate_pair, pairs)
    # construct a dictionary of the pairs
    result = dict(valid_pairs)
    # let the process finish
    proc.communicate()
    return result
So to answer your question, you would create a .py file that does the following:
env = get_environment_from_batch_command('proc1')
subprocess.Popen('proc2', env=env)
As you say, processes don't share the environment, so what you literally ask for is not possible, not only in Python, but in any programming language.
What you can do is to put the environment variables in a file, or in a pipe, and either
have the parent process read them, and pass them to proc2 before proc2 is created, or
have proc2 read them, and set them locally
The latter would require cooperation from proc2; the former requires that the variables become known before proc2 is started.
Since you're apparently in Windows, you need a Windows answer.
Create a wrapper batch file, eg. "run_program.bat", and run both programs:
@echo off
call proc1.bat
proc2
The script will run and set its environment variables. Both programs run in the same interpreter (cmd.exe instance), so the variables proc1.bat sets will be set when proc2 is executed.
Not terribly pretty, but it'll work.
(Unix people, you can do the same thing in a bash script: "source file.sh".)
You can use Process in psutil to get the environment variables for that process.
If you want to implement it yourself, you can refer to the internal implementation of psutil, which adapts to several operating systems.
Currently supported operating systems are:
AIX
FreeBSD, OpenBSD, NetBSD
Linux
macOS
Sun Solaris
Windows
E.g., on Linux you can find the environment variables of pid 7877 in the file /proc/7877/environ; just open it in 'rt' mode to read it.
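A minimal sketch of reading it by hand on Linux (the pid is the example's placeholder; entries in the file are NUL-separated KEY=VALUE strings):

# read the environment of pid 7877 on Linux
with open('/proc/7877/environ', 'rt') as f:
    env = dict(
        entry.split('=', 1)
        for entry in f.read().split('\x00')
        if '=' in entry
    )
print(env)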
Of course the best way to do this is to:
import os
from typing import Dict
from psutil import Process
process = Process(pid=os.getpid())
process_env: Dict = process.environ()
print(process_env)
You can find the other platform implementations in the source code.
Hope this helps.
The Python standard module multiprocessing has a Queue system that allows you to pass picklable objects between processes. Processes can also exchange messages (pickled objects) using os.pipe. Remember that resources (e.g. database connections) and handles (e.g. file handles) can't be pickled.
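A minimal example of passing a picklable object between processes through such a queue:

from multiprocessing import Process, Queue

def worker(q):
    q.put({'status': 'done'})  # any picklable object works

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # {'status': 'done'}
    p.join()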
You may find this link interesting:
Communication between processes with multiprocessing
Also, the PyMOTW page about multiprocessing is worth mentioning:
multiprocessing Basics
sorry for my spelling
Two things spring to mind: (1) make the processes share the same environment by combining them somehow into the same process, or (2) have the first process produce output that contains the relevant environment variables, so that Python can read them and construct the environment for the second process. I think (though I'm not 100% sure) that there isn't any way to get the environment from a subprocess as you're hoping to do.
Environment is inherited from the parent process. Set the environment you need in the main script, not a subprocess (child).
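For example (a minimal sketch; the variable name is illustrative, and proc2 stands for the program that needs the variables):

import os
import subprocess

os.environ['MY_SETTING'] = 'value'  # set in the parent...
subprocess.call(['proc2'])          # ...inherited by the child

# or pass an explicit environment without mutating the parent's:
subprocess.call(['proc2'], env=dict(os.environ, MY_SETTING='value'))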