Cannot use ThreadPoolExecutor within threading.Thread (Python 3.9) - python

Due to production needs, we updated our project to Python 3.9. But the project stopped working because of RuntimeError: can't register atexit after shutdown, which did not occur with Python 3.7. Our project has many threads, and each thread might spawn sub-threads. We used threading.Thread for the higher levels and concurrent.futures.ThreadPoolExecutor at the bottom level. For example, the following code works on 3.7 but not on 3.9:
from threading import Thread
import concurrent.futures

def func1():
    print("func1 start")

def func2():
    print("func2 start")

def func3():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print("func3 start")
        future1 = executor.submit(func1)
        future2 = executor.submit(func2)
        concurrent.futures.wait([future1, future2])
        print("func3 end")

thread1 = Thread(target=func1)
thread3 = Thread(target=func3)
thread1.start()
thread3.start()
thread1.join()
thread3.join()
with the following error in 3.9:
func1 start
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\my_project\lib\threading.py", line 973, in _bootstrap_inner
self.run()
File "C:\my_project\lib\threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "C:\my_projectr\tests\thread_test.py", line 13, in func2
with concurrent.futures.ThreadPoolExecutor() as executor:
File "C:\my_project\lib\concurrent\futures\__init__.py", line 49, in __getattr__
from .thread import ThreadPoolExecutor as te
File "C:\my_project\lib\concurrent\futures\thread.py", line 37, in <module>
threading._register_atexit(_python_exit)
File "C:\my_project\lib\threading.py", line 1407, in _register_atexit
raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
After some experimenting, I realized that in Python 3.9 a ThreadPoolExecutor cannot be used inside a Thread, while a Thread can still be used inside a ThreadPoolExecutor.
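For illustration, the reverse nesting mentioned above — starting a plain Thread from inside a ThreadPoolExecutor worker — looks like this (a minimal sketch of my own, not from the original post):

import concurrent.futures
from threading import Thread

def inner():
    print("inner thread")

def worker():
    # Starting a threading.Thread from inside an executor worker still works.
    t = Thread(target=inner)
    t.start()
    t.join()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.submit(worker).result()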
My questions are:
Is this behaviour (change) intended? Why?
What would be a proper way to use multi-level threading in Python 3.9?

Related

Why can't I use multiprocessing.Queue with ProcessPoolExecutor?

When I run the below code:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

q = Queue()

def my_task(x, queue):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i, q) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
I get this error:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/nn.py", line 14, in <module>
print(task.result())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
What is the purpose of multiprocessing.Queue if I cannot use it for multiprocessing? How can I make this work? In my real code, I need every worker to frequently update a queue with the task status so that another thread can read from that queue to feed a progress bar.
Short Explanation
Why can't you pass a multiprocessing.Queue as a worker function argument? The short answer is that submitted tasks are placed on an internal input queue from which the pool processes fetch the next task to perform. Task arguments must be serializable with pickle, and a multiprocessing.Queue is not in general serializable. It is serializable only for the special case of passing it to a child process as a multiprocessing.Process argument: the arguments are stored as an attribute of the instance when it is created, and when start is called the instance's state is serialized into the new address space before the run method is called there. Why this serialization works for that case but not the general case is unclear to me; I would have to spend a lot of time looking at the interpreter source to come up with a definitive answer.
See what happens when I try to put a queue instance onto a queue:
>>> from multiprocessing import Queue
>>> q1 = Queue()
>>> q2 = Queue()
>>> q1.put(q2)
>>> Traceback (most recent call last):
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 239, in _feed
obj = _ForkingPickler.dumps(obj)
File "C:\Program Files\Python38\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>> import pickle
>>> b = pickle.dumps(q2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>>
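By contrast, handing the same queue to a multiprocessing.Process as a constructor argument is allowed, as described above. A minimal sketch of my own (not from the original answer):

from multiprocessing import Process, Queue

def worker(q):
    # The queue arrives via the special-cased Process-argument serialization.
    q.put("hello from the child process")

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))  # passing the queue here is fine
    p.start()
    print(q.get())
    p.join()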
How to Pass the Queue via Inheritance
First of all, your code will run more slowly with multiprocessing than if you had simply called my_task in a loop, because multiprocessing introduces additional overhead (starting processes and moving data across address spaces); what you gain from running my_task in parallel has to more than offset that overhead. In your case it doesn't, because my_task is not CPU-intensive enough to justify multiprocessing.
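A quick way to check that claim for your own workload — a rough sketch of my own, not from the answer — is to time the serial loop against the pool:

import time
from concurrent.futures import ProcessPoolExecutor

def my_task(x):
    return x

if __name__ == '__main__':
    t0 = time.perf_counter()
    serial = [my_task(i) for i in range(10)]
    t1 = time.perf_counter()

    with ProcessPoolExecutor() as executor:
        parallel = list(executor.map(my_task, range(10)))
    t2 = time.perf_counter()

    # For a trivial my_task the pool is far slower than the plain loop.
    print('serial: %.6fs  pool: %.6fs' % (t1 - t0, t2 - t1))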
That said, when you want your pool processes to use a multiprocessing.Queue instance, it cannot be passed as an argument to a worker function (unlike the case where you are explicitly using multiprocessing.Process instances instead of a pool). Instead, you must initialize a global variable in each pool process with the queue instance.
If you are running on a platform that uses fork to create new processes, you can simply create the queue as a global and it will be inherited by each pool process:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

queue = Queue()

def my_task(x):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
    # This queue must be read before the pool terminates:
    for _ in range(10):
        print(queue.get())
Prints:
1
0
2
3
6
5
4
7
8
9
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
If you need portability with platforms that do not use the fork method to create processes, such as Windows (which uses the spawn method), then you cannot allocate the queue as a global, since each pool process would create its own queue instance. Instead, the main process must create the queue and then initialize each pool process's global queue variable using the initializer and initargs arguments:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    queue.put("Task Complete")
    return x

# Windows compatibility
if __name__ == '__main__':
    q = Queue()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        for task in as_completed(tasks):
            print(task.result())
        # This queue must be read before the pool terminates:
        for _ in range(10):
            print(q.get())
If you want to advance a progress bar as each task completes (you haven't precisely stated how the bar is to advance; see my comment to your question), then the following shows that a queue is not necessary for that. But if each submitted task consisted of N parts (for a total of 10 * N parts, since there are 10 tasks) and you would like to see a single progress bar advance as each part completes, then a queue is probably the most straightforward way of signaling a part completion back to the main process, as sketched after the code below.
from concurrent.futures import ProcessPoolExecutor, as_completed
from tqdm import tqdm

def my_task(x):
    return x

# Windows compatibility
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        with tqdm(total=10) as bar:
            tasks = [executor.submit(my_task, i) for i in range(10)]
            for _ in as_completed(tasks):
                bar.update()
        # To get the results in task submission order:
        results = [task.result() for task in tasks]
        print(results)
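For the per-part case just described, here is a rough sketch of my own (not from the original answer), assuming each task consists of N parts and reports each completed part through the shared queue set up with initializer/initargs as above:

from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue
from tqdm import tqdm

N = 4  # hypothetical number of parts per task

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    for _ in range(N):
        # ... do one part of the work ...
        queue.put("part done")  # signal one completed part
    return x

if __name__ == '__main__':
    q = Queue()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        with tqdm(total=10 * N) as bar:
            for _ in range(10 * N):
                q.get()          # one message per completed part
                bar.update()
        results = [task.result() for task in tasks]
    print(results)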

KeyboardInterrupt with Python multiprocessing.Pool

I want to write a service that launches multiple workers that run forever and then quit when the main process is Ctrl+C'd. However, I do not understand how to handle Ctrl+C correctly.
I have the following test code:
import os
import multiprocessing as mp

def g():
    print(os.getpid())
    while True:
        pass

def main():
    with mp.Pool(1) as pool:
        try:
            s = pool.starmap(g, [[]] * 1)
        except KeyboardInterrupt:
            print('Done')

if __name__ == "__main__":
    print(os.getpid())
    main()
When I Ctrl+C it, I expect the process(es) running g to just receive SIGTERM and silently terminate; however, I get something like this instead:
Process ForkPoolWorker-1:
Done
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "test.py", line 8, in g
pass
KeyboardInterrupt
This obviously means that the parent and child processes both raise KeyboardInterrupt from Ctrl+C, which is further suggested by tests with kill -2. Why does this happen, and how do I deal with it to achieve what I want?
The signal that triggers KeyboardInterrupt is delivered to the whole pool. The child worker processes treat it the same as the parent, raising KeyboardInterrupt.
The easiest solution here is to:
1. Disable SIGINT handling in each worker on creation
2. Ensure the parent terminates the workers when it catches KeyboardInterrupt
You can do this easily by passing an initializer function that the Pool runs in each worker before the worker begins doing work:
import signal
import multiprocessing as mp

# Use initializer to ignore SIGINT in child processes
with mp.Pool(1, initializer=signal.signal, initargs=(signal.SIGINT, signal.SIG_IGN)) as pool:
    try:
        s = pool.starmap(g, [[]] * 1)
    except KeyboardInterrupt:
        print('Done')
The initializer replaces the default SIGINT handler with one that ignores SIGINT in the children, leaving it up to the parent process to kill them. The with statement in the parent handles this automatically (exiting the with block implicitly calls pool.terminate()), so all you're responsible for is catching the KeyboardInterrupt in the parent and converting the ugly traceback into a simple message.
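Putting that together with the g from the question, a complete version might look like this (my own reconstruction of the approach, not the answerer's exact code):

import os
import signal
import multiprocessing as mp

def g():
    print(os.getpid())
    while True:
        pass

def main():
    # Workers start with SIGINT ignored; only the parent sees KeyboardInterrupt.
    with mp.Pool(1, initializer=signal.signal,
                 initargs=(signal.SIGINT, signal.SIG_IGN)) as pool:
        try:
            pool.starmap(g, [[]] * 1)
        except KeyboardInterrupt:
            # Leaving the with block calls pool.terminate(), killing the workers.
            print('Done')

if __name__ == "__main__":
    print(os.getpid())
    main()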

asyncio + multiprocessing + unix

I have a pet project with the following logic:
import asyncio, multiprocessing

async def sub_main():
    print('Hello from subprocess')

def sub_loop():
    asyncio.get_event_loop().run_until_complete(sub_main())

def start():
    multiprocessing.Process(target=sub_loop).start()

start()
If you run it, you'll see:
Hello from subprocess
That is good. But what I have to do is make start() a coroutine instead:
async def start():
    multiprocessing.Process(target=sub_loop).start()
To run it, I have to do something like this:
asyncio.get_event_loop().run_until_complete(start())
Here is the issue: when the subprocess is created, the whole Python environment gets cloned, so the event loop is already running there:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "test.py", line 7, in sub_loop
asyncio.get_event_loop().run_until_complete(sub_main())
File "/usr/lib/python3.5/asyncio/base_events.py", line 361, in run_until_complete
self.run_forever()
File "/usr/lib/python3.5/asyncio/base_events.py", line 326, in run_forever
raise RuntimeError('Event loop is running.')
RuntimeError: Event loop is running.
I tried to destroy the loop on the subprocess side with no luck, but I think the correct way is to prevent it from being shared with the subprocess in the first place. Is that possible somehow?
UPDATE:
Here is the full failing code:
import asyncio, multiprocessing
import asyncio.unix_events

async def sub_main():
    print('Hello from subprocess')

def sub_loop():
    asyncio.get_event_loop().run_until_complete(sub_main())

async def start():
    multiprocessing.Process(target=sub_loop).start()

asyncio.get_event_loop().run_until_complete(start())
First, you should consider using loop.run_in_executor with a ProcessPoolExecutor if you plan to run Python subprocesses from within the loop. As for your problem, you can use the event loop policy functions to set a new loop in the child:
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def sub_main():
    print('Hello from subprocess')

def sub_loop():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(sub_main())

async def start(executor):
    await asyncio.get_event_loop().run_in_executor(executor, sub_loop)

if __name__ == '__main__':
    executor = ProcessPoolExecutor()
    asyncio.get_event_loop().run_until_complete(start(executor))
You should always add a check for how you're running the code (the if __name__ == '__main__': part). Your subprocess is running everything in the module a second time, which is what gives you grief (couldn't resist).
import asyncio, multiprocessing

async def sub_main():
    print('Hello from subprocess')

def sub_loop():
    asyncio.get_event_loop().run_until_complete(sub_main())

async def start():
    multiprocessing.Process(target=sub_loop).start()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(start())

How does a python process exit gracefully after receiving SIGTERM while waiting on a semaphore?

I have a Python process which spawns 5 other Python processes using the multiprocessing module. Let's call the parent process P0 and the others P1-P5. The requirement is, if we send a SIGTERM to P0, it should shut down P1 to P5 first and then exit itself.
The catch is that P1 and P5 are waiting on semaphores. So when I send SIGTERM to these processes, they invoke the signal handler and exit. But since they are waiting on a semaphore, they throw an exception. Is there any way to catch that exception before exiting, so that P0 to P5 can make a graceful exit?
Traceback:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
Process Process-2:
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
Process Process-5:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/opt/fireeye/scripts/mip/StaticAnalysisRunner.py", line 45, in run
qsem.acquire()
You can install a signal handler that raises an exception, which is then caught in the subprocess to handle the exit gracefully.
Here is an example of a script that waits on a semaphore in a subprocess and terminates gracefully when sent a SIGTERM.
#!/usr/bin/env python
import signal
import time
import multiprocessing

class GracefulExit(Exception):
    pass

def signal_handler(signum, frame):
    raise GracefulExit()

def subprocess_function():
    try:
        sem = multiprocessing.Semaphore()
        print "Acquiring semaphore"
        sem.acquire()
        print "Semaphore acquired"
        print "Blocking on semaphore - waiting for SIGTERM"
        sem.acquire()
    except GracefulExit:
        print "Subprocess exiting gracefully"

if __name__ == "__main__":
    # Use signal handler to throw exception which can be caught to allow
    # graceful exit.
    signal.signal(signal.SIGTERM, signal_handler)

    # Start a subprocess and wait for it to terminate.
    p = multiprocessing.Process(target=subprocess_function)
    p.start()
    print "Subprocess pid: %d" % p.pid
    p.join()
An example run of this script is as follows:
$ ./test.py
Subprocess pid: 7546
Acquiring semaphore
Semaphore acquired
Blocking on semaphore - waiting for SIGTERM
----> Use another shell to kill -TERM 7546
Subprocess exiting gracefully
There is no traceback from the subprocess, and the flow shows that the subprocess exits gracefully. This is because the SIGTERM is caught by the subprocess's signal handler, which raises a normal Python exception that can be handled inside the process.
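The example above covers the child side. For the parent (P0) described in the question, a rough sketch of my own, assuming the child Process objects are kept in a list, might forward the shutdown like this:

import signal

def make_parent_sigterm_handler(children):
    # Hypothetical helper: on SIGTERM, shut down P1-P5 first, then exit P0.
    def handler(signum, frame):
        for child in children:
            child.terminate()  # each child handles SIGTERM and exits gracefully
        for child in children:
            child.join()
        raise SystemExit(0)
    return handler

# In P0, after starting the children:
# signal.signal(signal.SIGTERM, make_parent_sigterm_handler(children))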

Race condition using multiprocessing and threading together

I wrote the following sample program. It creates 8 threads and spawns a process in each one:
import threading
from multiprocessing import Process

def fast_function():
    pass

def thread_function():
    process_number = 1
    print 'start %s processes' % process_number
    for i in range(process_number):
        p = Process(target=fast_function, args=())
        p.start()
        p.join()

def main():
    threads_number = 8
    print 'start %s threads' % threads_number
    threads = [threading.Thread(target=thread_function, args=())
               for i in range(threads_number)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
It crashes with several exceptions like this:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "./repeat_multiprocessing_bug.py", line 15, in thread_function
p.start()
File "/usr/lib/python2.6/multiprocessing/process.py", line 99, in start
_cleanup()
File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
if p._popen.poll() is not None:
File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
Python version 2.6.5. Can somebody explain what I'm doing wrong?
You're probably trying to run it from the interactive interpreter. Try writing your code to a file and running it as a Python script; it works on my machine...
See the explanation and examples at the Python multiprocessing docs.
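For example, a trimmed-down version of the question's code saved as a script (a sketch of my own; the file name is hypothetical, echoing the traceback) would be:

# repeat_multiprocessing_bug.py -- hypothetical file name, matching the traceback
import threading
from multiprocessing import Process

def fast_function():
    pass

def thread_function():
    p = Process(target=fast_function, args=())
    p.start()
    p.join()

def main():
    threads = [threading.Thread(target=thread_function) for _ in range(8)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

if __name__ == '__main__':
    main()  # run with: python repeat_multiprocessing_bug.py

Running it this way, rather than pasting it into the interactive interpreter, is what the answer above recommends.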
The multiprocessing module has a thread-safety issue in 2.6.5. Your best bet is updating to a newer Python, or adding this patch to 2.6.5: http://hg.python.org/cpython/rev/41aef062d529/
The bug is described in more detail in the following links:
http://bugs.python.org/issue11891
http://bugs.python.org/issue1731717
