Python doctest hangs using ProcessPoolExecutor

This code runs fine under regular CPython 3.5:
import concurrent.futures

def job(text):
    print(text)

with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")
But if you run it as python -m doctest myfile.py, it hangs. Changing submit(job, ...) to submit(print, ...) stops the hang, as does using ThreadPoolExecutor instead of ProcessPoolExecutor.
Why does it hang when run under doctest?

I think the issue is caused by your with statement. When you have the following:

with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")

it forces the pool to be started and shut down then and there: leaving the with block calls shutdown(wait=True), which blocks until the submitted job has finished. When you run this as the main process, the worker gets a chance to execute the job. But when the file is imported as a module (which is what doctest does), the worker never gets to run the job, so the shutdown waits forever and you have a deadlock.
So the workaround you can use is the following:
import concurrent.futures

def job(text):
    print(text)

pool = concurrent.futures.ProcessPoolExecutor(1)
pool.submit(job, "hello")

if __name__ == "__main__":
    pool.shutdown(True)
This prevents the deadlock and lets you run doctest as well as import the module if you want.

The problem is that importing a module acquires a lock (which lock it is depends on your Python version); see the docs for imp.lock_held.
Locks are shared over multiprocessing, so your deadlock occurs because your main process, while it is importing your module, loads and waits for a subprocess which attempts to import your module but can't acquire the lock to do so, because the module is currently being imported by your main process.
In step form:
Main process acquires the lock to import myfile.py
Main process starts importing myfile.py
Main process starts and blocks on the subprocess
Subprocess tries to acquire the lock to import myfile.py (it has to import myfile.py because that is where your job() function is defined, which is why it didn't deadlock for print())
=> Deadlock.
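What this analysis implies is that the fix is to keep process creation out of the module body, so that importing myfile.py (whether by doctest or by the subprocess when it unpickles job) never happens while the executor is waiting mid-import. A minimal sketch of that arrangement (the main() wrapper is just for illustration):

import concurrent.futures

def job(text):
    print(text)

def main():
    with concurrent.futures.ProcessPoolExecutor(1) as pool:
        pool.submit(job, "hello")

if __name__ == "__main__":
    main()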

doctest imports your module in order to process it. Try adding this to prevent execution on import:
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(1) as pool:
        pool.submit(job, "hello")

This should actually be a comment, but it's too long to be one.
Your code fails if it's imported as a module too, with the same error as under doctest. I get _pickle.PicklingError: Can't pickle <function job at 0x7f28cb0d2378>: import of module 'a' failed (I named the file a.py).
Your lack of if __name__ == "__main__": violates the programming guidelines for multiprocessing:
https://docs.python.org/3.6/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
I guess that the child processes will also try to import the module, which then tries to start another child process (because the pool submission runs unconditionally at import time). But I'm not 100% sure about this.
I'm also not sure why the error you get is can't pickle <function>.
The issue here seems to be that you want the module to auto-start a process on import. I'm not sure if that is possible.

Related

Threading using less RAM than Popen, why?

I have a question about the RAM usage of my script. The setup is simple: a start-up script opens four Python scripts, each of which loops forever.
I tested two approaches, one that just calls Popen for each script and one that uses threading, and found a huge difference between them.
Since they both do the exact same thing, why does the threading version use so much less RAM than the one that opens the scripts with Popen? What are the advantages and disadvantages of threading vs. Popen? The scripts are below.
test2.py:
import time

def testing():
    while True:
        test = "Helllo world"
        print(test)
        time.sleep(1)

if __name__ == '__main__':
    testing()
threading.py:
import threading
from test2 import testing
threading.Thread(target=testing()).start()
threading.Thread(target=testing()).start()
threading.Thread(target=testing()).start()
Popen:
from subprocess import Popen

for _ in range(4):
    Popen(f"py test2.py", shell=True).communicate()
The Popen version creates whole new processes (four of them, one per loop iteration), each of which runs its own, distinct Python interpreter.
The threading version runs its threads within the current process, sharing its memory space and the single, original Python interpreter.
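A rough way to see the difference is to compare resident set sizes directly. This is only a sketch, assuming the third-party psutil package is installed and a python executable is on the PATH; exact numbers vary by platform:

import subprocess
import threading
import time

import psutil  # third-party: pip install psutil

def rss_mb(pid):
    # resident set size of a process, in MiB
    return psutil.Process(pid).memory_info().rss / (1024.0 * 1024.0)

if __name__ == "__main__":
    me = psutil.Process().pid
    print("parent alone: %.1f MiB" % rss_mb(me))

    # threads live inside this process, so its RSS barely moves
    for _ in range(3):
        threading.Thread(target=time.sleep, args=(5,)).start()
    print("parent with 3 threads: %.1f MiB" % rss_mb(me))

    # each Popen child is a whole new interpreter with its own RSS
    children = [subprocess.Popen(["python", "-c", "import time; time.sleep(5)"])
                for _ in range(3)]
    time.sleep(1)  # give the child interpreters a moment to start
    for child in children:
        print("child %d: %.1f MiB" % (child.pid, rss_mb(child.pid)))
    for child in children:
        child.wait()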

Running a function as a background thread in Python and exiting before it completes

I'm executing a function as a thread in Python. Currently, the program waits for the function to finish and only then terminates.
My goal is to start the background thread and then close the program that called it.
How can we do that? In the code below, the thread takes 30 minutes to execute. I want the main program to stop right after starting the thread and let the thread keep running in the background.
import threading

thread = threading.Thread(target=function_that_runs_for_30_min)
thread.start()
print "Thread Started"
quit()
You cannot do that directly. A thread is just a part of a process. Once the process exits, all the threads are gone. You need to create a background process to achieve that.
You cannot use the multiprocessing module either, because it is a package that supports spawning processes using an API similar to the threading module (emphasis mine). As such it has no provision to let a process keep running after the end of the calling one.
The only way I can imagine is to use the subprocess module to re-run the script with a specific parameter. For a simple use case, adding a parameter is enough; for more complex command-line parameters, the argparse module should be used. Example code:
import subprocess
import sys
# only to wait some time...
import time

def f(name):
    "Function that could run in background for a long time (30')"
    time.sleep(5)
    print 'hello', name

if __name__ == '__main__':
    if (len(sys.argv) > 1) and (sys.argv[1] == 'SUB'):
        # Should be an internal execution: start the lengthy function
        f('bar')
    else:
        # normal execution: start a subprocess with same script to launch the function
        p = subprocess.Popen("%s %s SUB" % (sys.executable, sys.argv[0]))
        # other processing...
        print 'END of normal process'
Execution:
C:\>python foo.py
END of normal process
C:\>
and five seconds later:
hello bar

Why does the billiard multiprocessing module require the "if __name__=='__main__'" line?

If I have the following code:
def f():
    print 'ok!'
    import sys
    sys.exit()

if __name__=='__main__':
    import billiard
    billiard.forking_enable(0)
    p = billiard.Process(target=f)
    p.start()
    while p.is_alive():
        pass
The script behaves as expected, printing "ok!" and ending. But if I omit the if __name__=='__main__': line and de-indent the following lines, my machine (OS X) goes crazy, continually spawning tons of Python processes until I killall Python. Any idea what's going on here?
(To those marking this as a duplicate, note that while the other question asks the purpose of if __name__=='__main__' generally, I'm specifically asking why failure to use it here causes dramatically unexpected behaviour)
You're disabling fork support with the line:
billiard.forking_enable(0)
That means that the library will need to spawn (instead of fork) your child process, and have it re-import the __main__ module to run f, just like Windows does. Without the if __name__ ... guard, re-importing the __main__ module in the children will also mean re-running your code that creates the billiard.Process, which creates an infinite loop.
If you leave fork enabled, the re-import in the child process isn't necessary, so everything works fine with or without the if __name__ ... guard.
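The same behaviour is easy to reproduce with the standard library's multiprocessing module, where set_start_method('spawn') plays roughly the role of billiard.forking_enable(0). A minimal Python 3 sketch; without the guard, the spawned child would re-run the module body and spawn yet another child:

import multiprocessing
import sys

def f():
    print('ok!')
    sys.exit()

if __name__ == '__main__':
    # force spawn so the child re-imports __main__, like forking_enable(0)
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=f)
    p.start()
    p.join()  # cleaner than busy-waiting on p.is_alive()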

Python Process which is joined will not call atexit

I thought Python Processes call their atexit functions when they terminate. Note that I'm using Python 2.7. Here is a simple example:
from __future__ import print_function
import atexit
from multiprocessing import Process

def test():
    atexit.register(lambda: print("atexit function ran"))

process = Process(target=test)
process.start()
process.join()
I'd expect this to print "atexit function ran" but it does not.
Note that this question:
Python process won't call atexit
is similar, but it involves Processes that are terminated with a signal, and the answer involves intercepting that signal. The Processes in this question are exiting gracefully, so (as far as I can tell anyway) that question & answer do not apply (unless these Processes are exiting due to a signal somehow?).
I did some research by looking at how this is implemented in CPython. This assumes you are running on Unix; if you are running on Windows, the following might not be valid, because the implementation of processes in multiprocessing differs there.
It turns out that os._exit() is always called at the end of the process. That, together with the following note from the documentation for atexit, should explain why your lambda isn't running.
Note: The functions registered via this module are not called when the
program is killed by a signal not handled by Python, when a Python
fatal internal error is detected, or when os._exit() is called.
Here's an excerpt from the Popen class for CPython 2.7, used for forking processes. Note that the last statement of the forked process is a call to os._exit().
# Lib/multiprocessing/forking.py
class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(code)
In Python 3.4 the os._exit() call is still there if you are using the fork start method, which is the default. But it seems you can change that; see Contexts and start methods for more information. I haven't tried it, but perhaps using the spawn start method would work? It isn't available in Python 2.7 though.
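If all you need is for some cleanup to run when the child finishes its work, one workaround is to call it yourself before the target returns, rather than relying on atexit (which the final os._exit() will skip). A sketch:

from __future__ import print_function
from multiprocessing import Process

def cleanup():
    print("cleanup ran")

def test():
    try:
        pass  # the child's real work would go here
    finally:
        cleanup()  # runs even though atexit handlers registered here would not

if __name__ == '__main__':
    process = Process(target=test)
    process.start()
    process.join()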

Python subprocess.call thread hang, subprocess.popen no hang

I am trying to automate the installation of a specific program using Sikuli and scripts on Windows 7. I needed to start the program installer and then use Sikuli to step through the rest of the installation. I did this using Python 2.7.
This code works as expected by creating a thread, calling the subprocess, and then continuing the main process:
import subprocess
from threading import Thread

class Installer(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        subprocess.Popen(["msiexec", "/i", "c:\path\to\installer.msi"], shell=True)

i = Installer()
i.run()
print "Will show up while installer is running."
print "Other things happen"
i.join()
This code does not operate as desired. It will start the installer but then hang:
import subprocess
from threading import Thread

class Installer(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        subprocess.call("msiexec /i c:\path\to\installer.msi")

i = Installer()
i.run()
print "Will not show up while installer is running."
print "Other things happen"
i.join()
I understand that subprocess.call will wait for the process to terminate. Why does that prevent the main thread from continuing on? Shouldn't the main thread continue execution immediately after the subprocess call?
Why is there such a difference in behaviour?
I have only just recently started using threads.
You're calling i.run(), but what you should be calling is i.start(). start() invokes run() in a separate thread, but calling run() directly will execute it in the main thread.
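For completeness, here is a sketch of the second snippet with that one change applied; with start(), the msiexec call runs in the worker thread and the prints appear immediately (the path is the same placeholder as in the question, written as a raw string):

import subprocess
from threading import Thread

class Installer(Thread):
    def run(self):
        subprocess.call(r"msiexec /i c:\path\to\installer.msi")

i = Installer()
i.start()   # run() now executes in a separate thread instead of the main one
print "Will show up while installer is running."
print "Other things happen"
i.join()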
First, you need to add the command-line parameters to your install command to make it a silent install:
http://msdn.microsoft.com/en-us/library/aa372024%28v=vs.85%29.aspx
The subprocess is probably hung waiting for an install process that will never end because it is waiting for user input.
Second, if that doesn't work, you should be using Popen and communicate():
How to use subprocess popen Python
Third, if that still doesn't work, your installer is hanging somewhere and you should debug the underlying process from there.
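Putting the first two suggestions together, here is a sketch that runs the MSI silently and waits for it via communicate(); it assumes the package supports the standard Windows Installer /qn (no UI) switch and uses the question's placeholder path:

import subprocess

# /i = install, /qn = quiet with no UI (standard msiexec switches)
proc = subprocess.Popen(
    ["msiexec", "/i", r"c:\path\to\installer.msi", "/qn"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
print "msiexec exited with", proc.returncode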
