multiprocessing.freeze_support() - python

Why does the multiprocessing module need to call a specific function to work when being "frozen" to produce a windows executable?

The reason is that Windows lacks fork() (which is not entirely true, but close enough). Because of this, on Windows the fork is simulated by creating a new process in which the code that, on Linux, would run in the child process is run instead. Since that code runs in a technically unrelated process, it has to be delivered there before it can execute: it is pickled and sent through a pipe from the original process to the new one. In addition, the new process is told that it must run the code passed through the pipe by being given the --multiprocessing-fork command line argument. If you look at the implementation of freeze_support(), its task is to check whether the process it is running in is supposed to execute the code passed through the pipe or not.
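A typical usage pattern (the worker function here is just a placeholder) looks roughly like the example in the multiprocessing documentation:

import multiprocessing

def worker():
    print("running in the child process")

if __name__ == "__main__":
    # On Windows this should be called right after the if __name__ == "__main__"
    # guard of the main module of a frozen executable; when the program is run
    # normally (not frozen), it is a no-op.
    multiprocessing.freeze_support()
    multiprocessing.Process(target=worker).start()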

Related

Python: is it ok to call subprocess.Popen in a thread?

Note this question is not the same as Python Subprocess.Popen from a thread, because that question didn't seek an explanation on why it is ok.
If I understand correctly, subprocess.Popen() creates a new process by forking the current process and calling execv() to run the new program.
However, if the current process is multithreaded and we call subprocess.Popen() in one of the threads, won't it duplicate all the threads in the current process (because it calls the fork() syscall)? If so, even though these duplicated threads are wiped out by the subsequent execv(), there is a window in which the duplicated threads could do a bunch of nasty things.
A case in point is gtest_parallel.py, where the program creates a bunch of threads in execute_tasks(), and in each thread task_manager.run_task(task) will call task.run(), which calls subprocess.Popen() to run a task. Is it ok?
The question applies to other fork-in-thread programs, not just Python.
Forking only results in the calling thread being active in the fork, not all threads. Most of the pitfalls related to forking in a multi-threaded program involve mutexes held by other threads that will never be released in the fork. When you're using Popen, you're going to launch some unrelated process once you execv, so that's not really a concern. There is a warning in the Popen docs about being careful with multiple threads and the preexec_fn parameter, which runs before the execv call happens:
Warning: The preexec_fn parameter is not safe to use in the presence of threads in your application. The child process could deadlock before exec is called. If you must use it, keep it trivial! Minimize the number of libraries you call into.
I'm not aware of any other pitfalls to watch out for with Popen, at least in recent versions of Python. Python 2.7's subprocess module does seem to have flaws that can cause issues with multi-threaded applications, however.
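As an illustration of why this is generally fine, here is a minimal sketch (the commands are placeholders) of several threads each launching their own child process with Popen:

import subprocess
import threading

def run_task(cmd):
    # The fork/exec pair inside Popen happens entirely within the calling
    # thread; the child immediately execs an unrelated program.
    proc = subprocess.Popen(cmd)
    proc.wait()

commands = [["echo", "task 1"], ["echo", "task 2"]]  # placeholder commands
threads = [threading.Thread(target=run_task, args=(cmd,)) for cmd in commands]
for t in threads:
    t.start()
for t in threads:
    t.join()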

Using subprocess module to work in parallel (Multi-process)

New to multiprocessing in python, consider that you have the following function:
def do_something_parallel(self):
    result_operation1 = doit.main(A, B)
    do_something_else(C)
Now the point is that I want doit.main to run in another process and be non-blocking, so that the code in do_something_else runs immediately after the first call has been launched in the other process.
How can I do it using python subprocess module?
Is there a difference between spawning a subprocess and creating a new process in some other way? Why would we need child processes of another process?
Note: I do not want to use a multithreaded approach here.
EDIT: I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
If you want to run a Python code in a separate process, you could use multiprocessing module:
import multiprocessing

if __name__ == "__main__":
    multiprocessing.Process(target=doit.main, args=[A, B]).start()
    do_something_else()  # this runs immediately without waiting for main() to return
I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
No. You can use both subprocess and multiprocessing in the same function (moreover, multiprocessing may use subprocess to start its worker processes internally).
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
You don't need multiprocessing to run an external command without blocking (obviously, in its own process); subprocess.Popen() is enough:
import subprocess
p = subprocess.Popen(['command', 'arg 1', 'arg 2'])
do_something_else() # this runs immediately without waiting for command to exit
p.wait() # this waits for the command to finish
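Putting the two together, a minimal sketch of the scenario from the question (py_task and myprog.exe are placeholder names) could look like this:

import multiprocessing
import subprocess

def py_task(a, b):
    # stand-in for the Python function you want in its own process
    print(a + b)

if __name__ == "__main__":
    # Run the Python function in a separate process...
    worker = multiprocessing.Process(target=py_task, args=(1, 2))
    worker.start()
    # ...and launch the external executable in parallel.
    exe = subprocess.Popen(["myprog.exe"])
    worker.join()
    exe.wait()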
Subprocess.Popen is definitely what you want if the "worker" process is an executable. Threading is what you need when you need things to happen asynchronously, and multiprocessing is what you need if you want to take advantage of multiple cores for the improved performance (although you will likely find yourself also using threads at the same time as they handle asynchronous output of multiple parallel processes).
The main limitation of multiprocessing is passing information. When a new process is spawned, an entire separate instance of the Python interpreter is started with its own independent memory allocation. The result is that variables changed by one process won't be reflected in other processes. For that you need shared memory objects (also provided by the multiprocessing module). One implementation I have done was a parent process that started several worker processes and passed each of them both an input queue and an output queue. The function given to the child processes was a loop that did some calculations on the inputs pulled from the input queue and then pushed the results to the output queue. I then designated a special input that the child would recognize to end the loop and terminate the process (a minimal sketch of that pattern is shown below).
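A minimal sketch of that queue-based worker pattern (the squaring is just a placeholder calculation, and None serves as the sentinel):

import multiprocessing

def worker(in_queue, out_queue):
    # Pull inputs until the sentinel (None) arrives, then exit the loop.
    for item in iter(in_queue.get, None):
        out_queue.put(item * item)  # placeholder calculation

if __name__ == "__main__":
    in_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(in_queue, out_queue))
               for _ in range(4)]
    for w in workers:
        w.start()
    for n in range(10):
        in_queue.put(n)
    for _ in workers:
        in_queue.put(None)  # one sentinel per worker so every loop terminates
    results = [out_queue.get() for _ in range(10)]
    for w in workers:
        w.join()
    print(results)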
On your edit - Popen will start the other process in parallel, as will multiprocessing. If you need the child process to communicate with the executable, be sure to pass the file stream handles to the child process somehow.

Use python subprocess module like a command line simulator

I am writing a test framework in Python for a command line application. The application will create directories, call other shell scripts in the current directory, and write output to stdout.
I am trying to treat {Python-SubProcess, CommandLine} combo as equivalent to {Selenium, Browser}. The first component plays something on the second and checks if the output is expected. I am facing the following problems
The Popen construct takes a command and returns after that command is completed. What I want is a live handle to the process so I can run further commands and verifications, and finally close the shell once done.
I am okay with writing some infrastructure code for achieving this, since we have a lot of command line applications that need testing like this.
Here is a sample code that I am running
p = subprocess.Popen("/bin/bash", cwd = test_dir)
p.communicate(input = "hostname") --> I expect the hostname to be printed out
p.communicate(input = "time") --> I expect current time to be printed out
but the process hangs, or maybe I am doing something wrong. Also, how do I "grab" the output of that subprocess so I can assert that something exists?
subprocess.Popen allows you to continue execution after starting a process. Popen objects expose wait(), poll(), and many other methods for communicating with a child process while it is running. Isn't that what you need?
See Popen constructor and Popen objects description for details.
Here is a small example that runs Bash on Unix systems and executes a command:
from subprocess import Popen, PIPE

# Start a shell and feed it a command via stdin.
p = Popen(['/bin/sh'], stdout=PIPE, stderr=PIPE, stdin=PIPE)
sout, serr = p.communicate(b'ls\n')
print('OUT:')
print(sout.decode())
print('ERR:')
print(serr.decode())
UPD: communicate() waits for process termination. If you do not need that, you may use the appropriate pipes directly, though that usually gives you rather ugly code.
UPD2: You updated the question. Yes, you cannot call communicate twice for a single process. You may either give all commands you need to execute in a single call to communicate and check the whole output, or work with pipes (Popen.stdin, Popen.stdout, Popen.stderr). If possible, I strongly recommend the first solution (using communicate).
Otherwise you will have to write a command to the process's input and wait some time for the desired output. What you need is a non-blocking read, to avoid hanging when there is nothing to read. Here is a recipe for how to emulate non-blocking mode on pipes using threads; a minimal sketch of that approach is shown below. The code is ugly and strangely complicated for such a trivial purpose, but that's how it's done.
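A minimal sketch of that approach (a background thread drains the pipe into a queue, so the main thread can poll with a timeout; the shell and command are just examples):

import queue
import subprocess
import threading

def enqueue_output(pipe, q):
    # Runs in a background thread: it blocks on readline, so the main thread
    # never has to block when there is nothing to read.
    for line in iter(pipe.readline, b''):
        q.put(line)
    pipe.close()

p = subprocess.Popen(['/bin/sh'], stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
q = queue.Queue()
threading.Thread(target=enqueue_output, args=(p.stdout, q), daemon=True).start()

p.stdin.write(b'hostname\n')
p.stdin.flush()

try:
    line = q.get(timeout=1)  # wait up to a second for output, then give up
    print(line.decode().strip())
except queue.Empty:
    print('no output yet')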
Another option could be using p.stdout.fileno() for select.select() call, but that won't work on Windows (on Windows select operates only on objects originating from WinSock). You may consider it if you are not on Windows.
Instead of using plain subprocess, you might find the Python sh library very useful:
http://amoffat.github.com/sh/
Here is an example of how to build an asynchronous interaction loop with sh:
http://amoffat.github.com/sh/tutorials/2-interacting_with_processes.html
Another (old) library for solving this problem is pexpect:
http://www.noah.org/wiki/pexpect
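For comparison, a minimal pexpect sketch that drives an interactive shell (the prompt pattern and the commands are assumptions for this example):

import pexpect  # third-party: pip install pexpect

child = pexpect.spawn('/bin/bash')
child.expect(r'\$')           # wait for the shell prompt (assumed to end in $)
child.sendline('hostname')
child.expect(r'\$')           # wait for the next prompt
print(child.before.decode())  # everything printed before the matched prompt
child.sendline('exit')
child.close()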

Replace current process with invocation of subprocess?

In Python, is there a way to invoke a new process, hand it the same context, such as the standard I/O streams, close the current process, and give control to the invoked process? This would effectively 'replace' the process.
I have a program whose behavior I want to repeat. However, it uses a third-party library, and it seems that the only way that I can truly kill threads invoked by that library is to exit() my python process.
Plus, it seems like it could help manage memory.
You may be interested in os.execv() and friends:
These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller. Errors will be reported as OSError exceptions.
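A minimal sketch of replacing the current process (the script path is purely illustrative):

import os
import sys

# Replace the current Python process with a new invocation of the interpreter.
# Nothing after execv() runs; the new program inherits stdin/stdout/stderr and
# keeps the same process id.
os.execv(sys.executable, [sys.executable, 'restart.py'])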

How to find out if a program crashed with subprocess?

My application creates subprocesses. Usually, these processes run and terminate without any problems. However, sometimes, they crash.
I am currently using the python subprocess module to create these subprocesses. I check if a subprocess crashed by invoking the Popen.poll() method. Unfortunately, since my debugger is activated at the time of a crash, polling doesn't return the expected output.
I'd like to be able to see the debugging window (not terminate it) and still be able to detect in the Python code whether the process has crashed.
Is there a way to do this?
When your debugger opens, the process isn't finished yet - and subprocess only knows if a process is running or finished. So no, there is not a way to do this via subprocess.
I found a workaround for this problem. I used the solution given in another question Can the "Application Error" dialog box be disabled?
Items of consideration:
subprocess.check_output() for your child processes' return codes
psutil for process and child analysis (and much more)
the threading library, to monitor these child states in your script once you've decided how you want to handle the crashing, if desired
import psutil

myprocess = psutil.Process(process_id)  # you can find your process id in various ways of your choosing
for child in myprocess.children():
    print("Status of child process is: {0}".format(child.status()))
You can also use the threading library to load your subprocess into a separate thread, and then perform the above psutil analyses concurrently with your other process.
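A minimal sketch of that idea (some_command is a placeholder for the executable under test):

import subprocess
import threading
import time

import psutil  # third-party: pip install psutil

def monitor(proc):
    # Poll the child's status via psutil until the subprocess exits.
    ps_proc = psutil.Process(proc.pid)
    while proc.poll() is None:
        print("child status:", ps_proc.status())
        time.sleep(1)
    print("child exited with return code", proc.returncode)

proc = subprocess.Popen(['some_command'])
threading.Thread(target=monitor, args=(proc,), daemon=True).start()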
If you find more, let me know, it's no coincidence I've found this post.
