I am looking for a way to do multiprocessing (specifically, using Process and Queue) but with an external program instead of a Python class/module. Is this possible? I don't want to use the subprocess module because it doesn't provide message passing the way multiprocessing does, or at least I wasn't able to use it the same way as multiprocessing: running a process, sending it a task every now and then, and reading the results back while keeping the process alive.
Any hint is greatly appreciated.
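For what it's worth, here is a minimal sketch of the kind of long-lived, message-passing setup described above, using subprocess.Popen with pipes. The line-based protocol and the program name worker_binary are assumptions made purely for illustration:

import subprocess

# Start the external program once and keep it alive (assumed: it reads one task per
# line on stdin and writes one result per line on stdout).
proc = subprocess.Popen(
    ['./worker_binary'],          # placeholder name for the external program
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,      # text mode
    bufsize=1,                    # line buffered, so tasks/results are flushed promptly
)

def send_task(task):
    proc.stdin.write(task + '\n')            # send a task whenever one comes up
    proc.stdin.flush()
    return proc.stdout.readline().strip()    # read the matching result; the process stays alive

print(send_task('task 1'))
print(send_task('task 2'))
proc.stdin.close()
proc.wait()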
I'm writing a program (Python 2.7) that uses both multiprocessing and multithreading; the multiprocessing is done using the Celery library.
I have a function that has to be parallelized using multithreading, so I've implemented a shared "input" queue that stores the arguments for the thread pool (using the Python multiprocessing.Manager Queue), and for each process I've also made a "response" queue; the threads store the computation results in the specific "response" queue belonging to the process the job came from. The problem is that storing the results in the response queues causes a memory leak: even though the results are popped off the queues almost immediately, the memory used by the interpreter keeps rising (I used the memory-profiler library to discover this), so I suspect that this open issue might be the cause:
https://bugs.python.org/issue33081
My question is what alternatives I could use to replace these Python multiprocessing.Manager queues. I've considered using Pathos multiprocess.Manager queues and pipes (pipes suit me well because there is one publisher and one consumer), but are there any other options I could try without refactoring the code?
Thank you!
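For reference, a minimal sketch of the pipe-based alternative mentioned above, assuming one publisher and one consumer per pipe; the producer/consumer logic is made up for illustration:

from multiprocessing import Process, Pipe

def consumer(conn):
    # Consumer end: receive results until the producer sends a sentinel.
    while True:
        result = conn.recv()
        if result is None:            # sentinel: the producer is done
            break
        print('got result: %r' % (result,))
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=consumer, args=(child_conn,))
    p.start()
    for i in range(3):
        parent_conn.send(i * i)       # producer end: publish results
    parent_conn.send(None)            # tell the consumer to stop
    p.join()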
Based on my understanding, threads cannot be executed in parallel (they are scheduled based on availability, effectively at random), and that's the reason Eventlet is being used.
If Eventlets are more for parallelism, why can't we just use the multiprocessing module of Python?
I thought of executing multiple processes and using the join() method to check that all the processes are complete.
Can someone explain if my understanding is correct?
Based on my understanding, threads cannot be executed in parallel (they are scheduled based on availability, effectively at random)
Correct
and that's the reason Eventlet is being used.
Not so correct. The Eventlet library is used to simplify non-blocking IO programming. It does not actually add parallelism. Thread execution is still limited to one thread at a time due to the GIL. But it is used because it greatly simplifies the process of launching, scheduling, and managing IO-bound threads, particularly ones that do not need to interact with each other.
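For example, a rough sketch of the IO-bound pattern Eventlet is aimed at; the fetch helper and URL are made up:

import eventlet
eventlet.monkey_patch()                    # make blocking socket IO cooperative
from urllib.request import urlopen

def fetch(url):
    return url, len(urlopen(url).read())   # IO-bound work: the green thread yields while waiting

pool = eventlet.GreenPool(100)             # up to 100 concurrent green threads, still one OS thread
for url, size in pool.imap(fetch, ['http://example.com'] * 5):
    print(url, size)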
If Eventlets are more for parallelism
As I just mentioned, this is not what they exist for.
why can't we just use the multiprocessing module of Python? I thought of executing multiple processes and using the join() method to check that all the processes are complete.
You certainly can! And you will get actual parallel execution with this approach. But you may not get the same speedup. The multiprocessing library is better suited for CPU-bound parallel tasks, because those are the ones that need more frequent access to the interpreter. You may actually see an increase in execution time when using multiprocessing with IO-bound tasks because of the overhead of multiple process execution and management.
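For comparison, a minimal sketch of the Process/join approach from the question; the work function is hypothetical:

from multiprocessing import Process

def work(n):
    print(sum(i * i for i in range(n)))    # a CPU-bound task benefits most from this

if __name__ == '__main__':
    procs = [Process(target=work, args=(10 ** 6,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()                           # wait until all the processes are complete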
As is the case with most optimization and execution time questions, trying both and profiling is the surefire way to guarantee you're using the "best" option for your application. Though you may find that if you write the code to utilize Eventlets first, then try to modify it to use regular threads or multiprocessing, you'll have to write more boilerplate code just to manage the threads or processes, and the value of Eventlets should become more obvious.
I am new to multiprocessing in Python. Consider that you have the following function:
def do_something_parallel(self):
    result_operation1 = doit.main(A, B)
    do_something_else(C)
Now, the point is that I want doit.main to run in another process and to be non-blocking, so the code in do_something_else will run immediately after the first call has been launched in another process.
How can I do it using the Python subprocess module?
Is there a difference between using subprocess and simply creating a new process alongside the current one? Why would we need child processes of another process?
Note: I do not want to use a multithreaded approach here.
EDIT: I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
If you want to run Python code in a separate process, you could use the multiprocessing module:
import multiprocessing

if __name__ == "__main__":
    multiprocessing.Process(target=doit.main, args=[A, B]).start()
    do_something_else()  # this runs immediately without waiting for main() to return
I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
No. You can use both subprocess and multiprocessing in the same function (moreover, multiprocessing may use subprocess to start its worker processes internally).
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
You don't need multiprocessing to run an external command without blocking (it runs in its own process anyway); subprocess.Popen() is enough:
import subprocess
p = subprocess.Popen(['command', 'arg 1', 'arg 2'])
do_something_else() # this runs immediately without waiting for command to exit
p.wait() # this waits for the command to finish
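Putting the two together for the use case described (an exe file plus a Python function, each in its own process); the executable name is a placeholder, and doit.main, A, B, and do_something_else come from the question:

import multiprocessing
import subprocess

if __name__ == "__main__":
    exe = subprocess.Popen(['some_program.exe', 'arg 1'])             # the external exe, own process
    worker = multiprocessing.Process(target=doit.main, args=[A, B])   # the Python function, own process
    worker.start()
    do_something_else()   # runs while both of the above are still going
    worker.join()         # wait for the Python function to finish
    exe.wait()            # wait for the exe to exit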
subprocess.Popen is definitely what you want if the "worker" process is an executable. Threading is what you need when you need things to happen asynchronously, and multiprocessing is what you need if you want to take advantage of multiple cores for improved performance (although you will likely find yourself also using threads at the same time, to handle the asynchronous output of multiple parallel processes).
The main limitation of multiprocessing is passing information. When a new process is spawned, an entire separate instance of the Python interpreter is started with its own independent memory allocation. As a result, variables changed by one process won't be changed for other processes. For this functionality you need shared memory objects (also provided by the multiprocessing module). One implementation I have done was a parent process that started several worker processes and passed each of them both an input queue and an output queue. The function given to the child processes was a loop that does some calculations on the inputs pulled from the input queue and then puts the results on the output queue. I then designated a special input that the child would recognize as a signal to end the loop and terminate the process, roughly as sketched below.
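A minimal sketch of that queue-based worker pattern; the squaring calculation and worker count are stand-ins:

from multiprocessing import Process, Queue

SENTINEL = 'STOP'                            # the special input the workers recognize

def worker(in_q, out_q):
    for item in iter(in_q.get, SENTINEL):    # loop until the sentinel arrives
        out_q.put(item * item)               # do some calculation on the input
    # reaching here ends the loop and lets the process terminate

if __name__ == '__main__':
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=worker, args=(in_q, out_q)) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(20):
        in_q.put(i)                          # feed the inputs
    for w in workers:
        in_q.put(SENTINEL)                   # one sentinel per worker
    results = [out_q.get() for _ in range(20)]
    for w in workers:
        w.join()
    print(sorted(results))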
On your edit - Popen will start the other process in parallel, as will multiprocessing. If you need the child process to communicate with the executable, be sure to pass the file stream handles to the child process somehow.
Currently I'm trying to convert my little Python script to support multiple threads/cores. I've been reading about the multiprocessing module for several days now, and I've also been trying to get it to suit my needs for some time, but I still don't have a clue why it won't work.
This is the working code, and this is my approach to implementing the pool workers. As there are no locks in place and I didn't want to make it too complicated at first, I have already disabled logging to file.
Still it doesn't work. It doesn't even output any kind of error message. After running it, it just displays the welcome message and then keeps running, but without producing any of the desired output, which would be two lines per converted file (before and after converting).
All your workers do is wait for the started subprocesses to finish. They don't have any real work to do, as that is performed by the external subprocesses, so they will be idle all the time.
Using multiprocessing for what you do really is overkill; it's much more appropriate to use threads for that (see the sketch below).
If you want to learn how to do multiprocessing, try something which involves inter-process communication, synchronisation, pipes, ...
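As a concrete sketch of the thread-based alternative, with a made-up converter command and file list:

import subprocess
from multiprocessing.pool import ThreadPool   # a pool of threads, not processes

def convert(path):
    print('converting ' + path)                       # "before" line
    subprocess.call(['/path/to/converter', path])     # the thread simply waits on the external program
    print('finished ' + path)                         # "after" line

files = ['a.wav', 'b.wav', 'c.wav']
pool = ThreadPool(4)                                  # four conversions at a time
pool.map(convert, files)
pool.close()
pool.join()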
But to also address your question:
Have a look at what arguments subprocess.call takes. You call it with a single space-separated command string; if you want that to work, you have to pass shell=True, otherwise the whole string is interpreted as the executable's name.
The preferred way to call a program using subprocess is to specify the program and its arguments as a list:
subprocess.Popen(['/path/to/program', 'arg1', 'arg2'], *otherarguments)
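To make the difference between the two call styles explicit (the command is made up):

import subprocess

subprocess.call('convert input.png output.png', shell=True)   # whole string handed to the shell
subprocess.call(['convert', 'input.png', 'output.png'])       # preferred: program and args as a list
# Passing the space-separated string WITHOUT shell=True would look for an executable
# literally named 'convert input.png output.png' and fail.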
My application creates subprocesses. Usually, these processes run and terminate without any problems. However, sometimes they crash.
I am currently using the Python subprocess module to create these subprocesses. I check whether a subprocess crashed by invoking the Popen.poll() method. Unfortunately, since my debugger is activated at the time of a crash, polling doesn't return the expected output.
I'd like to be able to see the debugging window (not terminate it) and still be able to detect in the Python code whether a process has crashed.
Is there a way to do this?
When your debugger opens, the process isn't finished yet, and subprocess only knows whether a process is running or finished. So no, there is no way to do this via subprocess.
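For reference, the usual poll()-based check looks roughly like this; the point above is that the first branch keeps being taken for as long as the crash dialog or debugger keeps the process alive:

rc = p.poll()                  # p is the subprocess.Popen object for the child
if rc is None:
    print('still running - or stuck showing a crash dialog / debugger')
elif rc == 0:
    print('finished normally')
else:
    print('exited with error code %d' % rc)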
I found a workaround for this problem. I used the solution given in another question Can the "Application Error" dialog box be disabled?
Items of consideration:
subprocess.check_output() for your child processes' return codes
psutil for process & child analysis (and much more)
the threading library, to monitor these child states in your script as well, once you've decided how you want to handle the crashing, if desired
import psutil

myprocess = psutil.Process(process_id)  # you can find your process id in various ways of your choosing
for child in myprocess.children():
    print("Status of child process is: {0}".format(child.status()))
You can also use the threading library to load your subprocess into a separate thread, and then perform the above psutil analyses concurrently with your other process.
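A rough sketch of that combination; the polling interval and the zombie-status check are placeholder choices:

import threading
import time

import psutil

def monitor_children(process_id, interval=1.0):
    # process_id is the same id used in the psutil example above
    parent = psutil.Process(process_id)
    while parent.is_running():
        for child in parent.children():
            if child.status() == psutil.STATUS_ZOMBIE:   # one possible sign of a dead child
                print('child %d looks dead' % child.pid)
        time.sleep(interval)

threading.Thread(target=monitor_children, args=(process_id,), daemon=True).start()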
If you find more, let me know, it's no coincidence I've found this post.