I am playing around with a library for my beginner students, and I'm using the multiprocessing module in Python. I ran into this problem: how can I import and use a module that uses multiprocessing without causing an infinite loop of process creation on Windows?
As an example, suppose I have a module mylibrary.py:
# mylibrary.py
from multiprocessing import Process

class MyProcess(Process):
    def run(self):
        print "Hello from the new process"

def foo():
    p = MyProcess()
    p.start()
And a main program that calls this library:
# main.py
import mylibrary
mylibrary.foo()
If I run main.py on Windows, it tries to import main.py into the new process, meaning the code is executed again, which results in an infinite loop of process generation. I can fix it like so:
import mylibrary

if __name__ == "__main__":
    mylibrary.foo()
But, this is pretty confusing for beginners, and moreover it seems like it shouldn't be necessary. The new process is being created in mylibrary, so why doesn't the new process just import mylibrary? Is there a way to work around this issue without having to change main.py?
I am using Python 2.7, by the way.
Windows doesn't have fork, so there's no way to make a new process just like the existing one. So the child process has to run your code again, but now you need a way to distinguish between the parent process and the child process, and __main__ is it.
This is covered in the docs here: http://docs.python.org/2/library/multiprocessing.html#windows
I don't know of another way to structure the code to avoid the fork bomb effect.
Related
import subprocess
subprocess.Popen('python', 'second_script.py')
Does this open the second script and make them run concurrently? Also, will it close the second script if I stop the main one? If not, how can I do that?
You should do something like this:
#! /usr/bin/env python3
import subprocess
pid = subprocess.Popen(['second-script']).pid
print(f'pid={pid}')
...
# do whatever you have to do here
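As for the second part of the question: no, the child is not closed automatically; it keeps running even after the main script stops. One way to tie them together (a minimal sketch, reusing the hypothetical second-script name from above) is to register a terminate call for when the main script exits normally:

#! /usr/bin/env python3
import atexit
import subprocess

proc = subprocess.Popen(['second-script'])   # runs concurrently with this script
atexit.register(proc.terminate)              # stop the child when this script exits normally

# do whatever you have to do here

Note that atexit hooks only run on a normal interpreter shutdown, not if the main script is killed outright.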
I don't clearly understand what you need.
If you want to use some functions that are coded in 'pythonScriptA.py' in another script 'pythonScriptB.py', you can import the first script:
from pythonScriptA import *
# Use the functions in the ScriptB
If you want the two scripts to run concurrently/in parallel, you should look into the multiprocessing library, or into any other library that allows Python code to be run in multiple processes or threads.
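For the concurrent case, a minimal sketch of the multiprocessing approach (do_work and pythonScriptA are placeholders for whatever your scripts actually define):

from multiprocessing import Process
import pythonScriptA   # hypothetical module defining do_work()

if __name__ == "__main__":
    # run the other script's work in a separate process, alongside this one
    p = Process(target=pythonScriptA.do_work)
    p.start()
    # ... this script keeps doing its own work here ...
    p.join()   # wait for the other process to finish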
I'm pretty new to Python, this question probably shows that. I'm working on multiprocessing part of my script, couldn't find a definitive answer to my problem.
I'm struggling with one thing. When using multiprocessing, part of the code has to be guarded with if __name__ == "__main__". I get that, and my pool is working great. But I would love to import that whole script (making it one big function that returns a value would be best). And here is the problem. First, how can I import something if part of it will only run when launched from the main/source file because of that guard? Second, if I manage to work it out and the whole script ends up in one big function, and pickle can't handle that, will using "multiprocessing on dill" or "pathos" fix it?
Thanks!
You are probably confused about the concept. The if __name__ == "__main__" guard in Python exists exactly so that all Python files can be importable.
Without the guard, a file, once imported, would behave the same as if it were the "root" program - and it would require a lot of boilerplate and inter-process communication (like writing a "PID" file at a fixed filesystem location) to coordinate imports of the same code, including for multiprocessing.
Just leave under the guard whatever code needs to run for the root process. Everything else you move into functions that you can call from the importing code.
If you ran "all" of the script, even the part setting up the multiprocessing workers would run, and any simple job would create more workers exponentially until all machine resources were exhausted (i.e. it would crash hard and fast, potentially leaving the machine in an unresponsive state).
So, this is a good pattern - the "dothejob" function can call all the other functions you need, so you just need to import and call it, either from a master process, or from any other project importing your file as a Python module.
import multiprocessing

...

def dothejob():
    ...

def start():
    # code to set up and start multiprocessing workers,
    # like:
    worker1 = multiprocessing.Process(target=dothejob)
    ...
    worker1.start()
    ...
    worker1.join()

if __name__ == "__main__":
    start()
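For example, from another file the usage could look like this (a minimal sketch; yourmodule is just a stand-in for whatever name the file above was saved under):

import yourmodule    # importing does NOT start any workers, because of the guard

result = yourmodule.dothejob()   # run the job directly in this process
# or, if you do want the worker processes:
yourmodule.start()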
Pretty much what the headline says. I want my Python script to exit and, from within that script, somehow schedule a new script to run immediately after the first one has finished.
Sounds simple enough, but I can't seem to find a way -- let alone a clean way -- of doing that.
You are able to execute functions right before the interpreter exits with the 'atexit' module. You are even able to pass args and kwargs to the function.
https://docs.python.org/3/library/atexit.html
import atexit
import os

def spawn_new():
    os.system("python3 -c 'print(\"Hello!\")'")

print("Main interpreter.")
atexit.register(spawn_new)
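Since atexit.register also forwards positional and keyword arguments to the registered function, the exit hook can be parametrized; a small variation of the above (next_script.py is a hypothetical file name):

import atexit
import os

def spawn_new(script):
    # runs right before the interpreter exits; 'script' is whatever was registered
    os.system("python3 " + script)

atexit.register(spawn_new, "next_script.py")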
I have a script task.py that I am trying to invoke. It seems there are two ways to do that. One is to use the subprocess API while the other is to use Python's import mechanism.
task.py
def call_task():
    print("task in progress...")
    return "something"

print("calling task..")
out = call_task()
print("output of the executed task::", out)
Now, we have two approaches to invoke the above task.py Python script.
Approach 1
import task as task
print("invoke call-task")
out = task.call_task()
print("output::", out)
Approach 2
import subprocess, shlex
from subprocess import PIPE

proc = subprocess.Popen(shlex.split("python task.py"), stdout=PIPE)
out = proc.communicate()
print("output::", out)
Although both approaches work, which approach is more pythonic?
Running a separate Python process from Python is frequently an antipattern. There are situations where you specifically want two Python instances (for example, if the module you want to use requires its own signal handling, etc.), but in the absence of factors which force the other choice, import is generally vastly preferable in terms of usability (you get to call the functions inside the package in an order different from its main flow, and you have more fine-grained control over the internals) and performance (starting a separate process is almost always a bad idea if you can avoid it).
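In this case that mostly means putting task.py's top-level "main flow" behind the same kind of guard, so that importing only defines call_task while running the file directly still behaves as before; a minimal sketch:

# task.py
def call_task():
    print("task in progress...")
    return "something"

if __name__ == "__main__":
    # only runs when task.py is executed directly, not when it is imported
    print("calling task..")
    out = call_task()
    print("output of the executed task::", out)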
While "The subprocess module allows you to spawn new processes" which executes the code within your task.py,
importing will result in the original process executing your code.
Other than that, it should be identical.
You can read more about it in the Python Subprocess Docs
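If you want to see that difference, printing the process id on both sides makes it visible; a small sketch (assuming the task.py from the question is importable):

import os
import subprocess

import task                                   # import: task.py's top-level code runs here, in this pid
print("importer pid::", os.getpid())

# subprocess: a separate interpreter, with its own pid, runs the code
proc = subprocess.Popen(["python", "-c", "import os; print('child pid::', os.getpid())"])
proc.wait()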
From what I've seen, it's rather unusual to execute Python code using an extra subprocess.
It may have benefits performance-wise, but the more Pythonic way would be importing, I guess.
I have a wxPython GUI, and would like to use multiprocessing to create a separate process which uses PyAudio. That is, I want to use PyAudio, wxPython, and the multiprocessing module, but although I can use any two of these, I can't use all three together. Specifically, if from one file I import wx, and create a multiprocessing.Process which opens PyAudio, PyAudio won't open. Here's an example:
file: A.py
import wx
import time

use_multiprocessing = True
if use_multiprocessing:
    from multiprocessing import Process as X
else:
    from threading import Thread as X

import B

if __name__ == "__main__":
    p = X(target=B.worker)
    p.start()
    time.sleep(5.)
    p.join()
file: B.py
import pyaudio

def worker():
    print "11"
    feed = pyaudio.PyAudio()
    print "22"
    feed.terminate()
In all my tests I see 11 printed, but the problem is that I don't see 22 for the program as shown.
If I only comment out import wx, I see 22 and pyaudio loads.
If I only set use_multiprocessing=False so I use threading instead, I see 22 and pyaudio loads.
If I do something else in worker, it will run (only pyaudio doesn't run)
I've tried this with Python 2.6 and 2.7; PyAudio 0.2.4, 0.2.7, and 0.2.8; and wx 3.0.0.0 and 2.8.12.1; and I'm using OSX 10.9.4
There are two reasons this can happen, but they look pretty much the same.
Either way, the root problem is that multiprocessing is just forking a child. This could be either causing CoreFoundation to get confused about its runloop*, or causing some internal objects inside wx to get confused about its threads.**
But you don't care why your child process is deadlocking; you want to know how to fix it.
The simple solution is to, instead of trying to fork and then clean up all the stuff that shouldn't be copied, spawn a brand-new Python process and then copy over all the stuff that should.
As of Python 3.4, there are actually two variations on this. See Contexts and start methods for details, and issue #8713 for the background.
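For readers on 3.4 or later, that fix looks roughly like this (a sketch only; the "spawn" start method starts a fresh interpreter instead of forking the GUI process, which is exactly what the PyAudio child needs):

import multiprocessing

import B   # the pyaudio worker module from the question
           # (B.worker's print statements would need Python 3 syntax, of course)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")   # fresh process, no fork of the wx state
    p = multiprocessing.Process(target=B.worker)
    p.start()
    p.join()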
But you're on 2.6, so that doesn't help you. So, what can you do?
The easiest answer is to switch from multiprocessing to the third-party library billiard. billiard is a fork of Python 2.7's multiprocessing, which adds many of the features and bug fixes from both Python 3.x and Celery.
I believe new versions have the exact same fix as Python 3.4, but I'm not positive (sorry, I don't have it installed, and can't find the docs online…).
But I'm sure that it has a similar but different solution, inherited from Celery: call billiard.forking_enable(False) before calling anything else on the library. (Or, from outside the program, set the environment variable MULTIPROCESSING_FORKING_DISABLE=1.)
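In other words, something along these lines (a sketch only; I'm assuming billiard keeps multiprocessing's Process API, and forking_enable is the Celery-inherited switch described above):

import billiard
billiard.forking_enable(False)   # must happen before anything else touches the library

import B   # the pyaudio worker module

if __name__ == "__main__":
    p = billiard.Process(target=B.worker)
    p.start()
    p.join()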
* Usually, CF can detect the problem and call __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__, which logs an error message and fails. But sometimes it can't, and will end up waiting forever for an event that nobody can send. Google that string for more information.
** See #5527 for details on the equivalent issue with threaded Tkinter, and the underlying problem. This one affects all BSD-like *nixes, not just OS X.
If you can't solve the problem by fixing or working around multiprocessing, there's another option. If you can spin off the child process before you create your main runloop or create any threads, you can prevent the child process from getting confused. This doesn't always work, but it often does, so it may be worth trying.
That's easy to do with Tkinter or PySide or another library that doesn't actually do anything until you call a function like mainloop or construct an App instance.
But with wx, I think it does some of the setup before you even touch anything beyond the import. So, you may have to do something a little hacky and move the import wx after the p.start().
In your real app, you probably aren't going to want to start doing audio until some trigger from the GUI. This means you'll need to create some kind of sync object, like an Event. So, you create the Event, then start the child process. The child initializes the audio, and then waits on the Event. And then, where you'd like to launch the child from the GUI, you instead just signal the Event.
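Putting those two ideas together, A.py could be restructured roughly like this (a sketch under the assumptions above; audio_child is a made-up name, and the key point is that the child is started, and the audio initialized, before wx is imported or any runloop exists):

# file: A.py (restructured sketch)
from multiprocessing import Process, Event

def audio_child(start_audio):
    import pyaudio
    feed = pyaudio.PyAudio()   # audio is initialized in the child, before any GUI exists
    start_audio.wait()         # block until the GUI tells us to go
    # ... the actual audio work would go here ...
    feed.terminate()

if __name__ == "__main__":
    start_audio = Event()
    p = Process(target=audio_child, args=(start_audio,))
    p.start()                  # fork happens before wx does any of its setup

    import wx                  # the hacky part: import wx only after p.start()
    app = wx.App(False)
    # ... build the GUI; when the user triggers audio, signal the child:
    start_audio.set()
    app.MainLoop()
    p.join()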