Python Multiprocessing processes not closing with join()

I am running a relatively large Python program which involves a handful of processes running at the same time, using the multiprocessing library. I am experiencing an issue where one of my processes will not quit, so when I try to exit the program (with CTRL+C) it just hangs forever (the only way to close it is to force-close the python.exe process from Task Manager). For every other process that I have, calling process.join(timeout=1) closes the process. However, this one specific process never closes (I was only able to identify this after putting a print statement after every .join() and seeing that there is only one process that never reaches its print statement).
Does anyone know why this might be happening, and how I can get this process to close? I saw somewhere else that this might be due to the process having a non-empty Queue, but this specific process only has one mp.Queue, which I am clearing right before I close it:
class MyClass():
    def __init__(self):
        self.queue = mp.Queue()
        self.bad_process = mp.Process(target=some_func)
        self.bad_process.start()
        ...

    def close(self):
        # Clear queue before closing
        while not self.queue.empty():
            self.queue.get()
        print("This line prints")
        self.bad_process.join(timeout=1)
        print("Never reaches here, always hanging")

Related

Python 3 - How to terminate a thread instantly?

In my code (a complex GUI application with Tkinter) I have a thread defined in a custom object (a progress bar). It runs a function with a while loop like this:
def Start(self):
    while self.is_active == True:
        # do it..
        time.sleep(1)
        # do it..
        time.sleep(1)

def Stop(self):
    self.is_active = False
It can terminate only when another piece of code, placed in another thread, changes the attribute self.is_active using the method self.Stop(). I have the same situation in another custom object (a counter), and both of them have to work together while the other thread (the main one) works.
The code works, but I realized that the two threads associated with the progress bar and the counter don't terminate instantly as I wanted, because before terminating they need to wait for the end of their functions, and these are slow because of the time.sleep(1) instructions. From the user's point of view, it means seeing the main thread finish while the progress bar and the counter terminate LATE, and I don't like it.
To be honest, I don't know how to solve this issue. Is there a way to force a thread to terminate instantly without waiting for the end of the function?
First off, to be clear, hard-killing a thread is a terrible idea in any language, and Python doesn't support it; if nothing else, the risk of that thread holding a lock which is never unlocked, causing any thread that tries to acquire it to deadlock, is a fatal flaw.
If you don't care about the thread at all, you can create it with the daemon=True argument, and it will die if all non-daemon threads in the process have exited. But if the thread really should die with proper cleanup (e.g. it might have with statements or the like that manage cleanup of resources outside the process, that won't be cleaned up on process termination), that's not a real solution.
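For illustration only (not from the original answer), a daemon thread is just a normal thread created with daemon=True; some_worker below is a placeholder standing in for the progress-bar work:

import threading
import time

def some_worker():  # placeholder loop standing in for the progress-bar work
    while True:
        time.sleep(1)

t = threading.Thread(target=some_worker, daemon=True)  # abandoned abruptly at interpreter exit
t.start()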
That said, you can avoid waiting a second or more by switching from using a plain bool and time.sleep to using an Event and using the .wait method on it. This will allow the "sleeps" to be interrupted immediately, at the small expense of requiring you to reverse your condition (because Event.wait only blocks while it's false/unset, so you need the flag to be based on when you should stop, not when you are currently active):
class Spam:
    def __init__(self):
        self.should_stop = threading.Event()  # Create an unset event on init

    def Start(self):
        while not self.should_stop.is_set():
            # do it..
            if self.should_stop.wait(1):
                break
            # do it..
            if self.should_stop.wait(1):
                break

    def Stop(self):
        self.should_stop.set()
On modern Python (3.1 and higher) the wait method returns True if the event was set (on beginning the wait or because it got set while waiting), and False otherwise, so whenever wait returns True, that means you were told to stop and you can immediately break out of the loop. You also get notified almost immediately, instead of waiting up to one second before you can check the flag.
This won't cause the real "do it.." code to exit immediately, but from what you said, it sounds like that part of the code isn't all that long, so waiting for it to complete isn't a big hassle.
If you really want to preserve the is_active attribute for testing whether it's still active, you can define it as a property that reverses the meaning of the Event, e.g.:
@property
def is_active(self):
    return not self.should_stop.is_set()
The safest way to do it without risking a segmentation fault is to return from the function.
def Start(self):
    while self.is_active == True:
        # do it..
        if not self.is_active: return
        time.sleep(1)
        if not self.is_active: return
        # do it..
        if not self.is_active: return
        time.sleep(1)

def Stop(self):
    self.is_active = False
Python threads need to free their associated resources, and while "killing" the thread is possible using some C tricks, you will be risking a segmentation fault or a memory leak.
Here is a cleaner way to do it.
class MyError(Exception):
    pass

def Start(self):
    try:
        while self.is_active == True:
            # do it..
            self.check_termination()
            time.sleep(1)
            self.check_termination()
            # do it..
            self.check_termination()
            time.sleep(1)
    except MyError:
        return

def check_termination(self):
    if not self.is_active:
        raise MyError
You can call self.check_termination() from inside any function to terminate this loop, not just from inside Start directly.
Edit: ShadowRanger's solution handles the "interruptible wait" better; I am just keeping this answer for implementing a kill switch for the thread that can be checked from anywhere inside the thread.

Python multiprocessing - main process won't continue when spawned process terminated

I want to run a function in a new Python process, do some work, report progress to the main process using a queue, wait in the main process for the spawned process to terminate, and then continue execution of the main process.
I got the following code, which runs the function foo in a new process and returns progress using a queue:
import multiprocessing as mp
import time

def foo(queue):
    for i in range(10):
        queue.put(i)
        time.sleep(1)

if __name__ == '__main__':
    mp.set_start_method('spawn')
    queue = mp.Queue()
    p = mp.Process(target=foo, args=(queue,))
    p.start()
    while p.is_alive():
        print("ALIVE")
        print(queue.get())
        time.sleep(0.01)
    print("Process finished")
The output is:
ALIVE
0
ALIVE
1
ALIVE
2
ALIVE
3
ALIVE
4
ALIVE
5
ALIVE
6
ALIVE
7
ALIVE
8
ALIVE
9
ALIVE
At some point neither "ALIVE" nor "Process finished" is printed. How can I continue execution when the spawned process stops running?
Edit:
The problem was that I didn't know that queue.get() blocks until an item is put into the queue if the queue is empty. I fixed it by changing
while p.is_alive():
    print(queue.get())
    time.sleep(0.01)

to

while p.is_alive():
    if not queue.empty():
        print(queue.get())
    time.sleep(0.01)
Your code has a race condition. After the last number is put into the queue, the child process sleeps one more time before it exits. That gives the parent process enough time to fetch that item, sleep for a shorter time, and then conclude that the child is still alive before waiting for an 11th item that never comes.
Note that you get more ALIVE reports in your output than you do numbers. That tells you where the parent process is deadlocked.
There are a few possible ways you could fix the issue. You could change the foo function to sleep first, and put the item into the queue afterwards. That would make it so that it could quit running immediately after sending the 9 to its parent, which would probably allow it to avoid the race condition (since the parent does sleep for a short time after receiving each item). There would still be a small possibility of the race happening if things behaved very strangely, but it's quite unlikely.
A better approach might be to prevent the possibility of the race from occurring at all. For example, you might change the queue.get call to have a timeout set, so that it will give up (with a queue.Empty exception) if there's nothing to retrieve for too long. You could catch that exception immediately, or even use it as a planned method of breaking out of the loop rather than testing if the child is still alive or not, and catching it at a higher level.
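As a rough sketch of that timeout approach (my own adaptation of the question's __main__ block, not code from the answer), the loop could look like this:

from queue import Empty  # mp.Queue.get raises queue.Empty on timeout

while True:
    try:
        item = queue.get(timeout=2)  # comfortably longer than the child's 1-second sleep
    except Empty:
        break                        # nothing arrived for too long: assume the child is done
    print(item)
p.join()
print("Process finished")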
A final option might be to send a special sentinel value from the child to the parent in the queue to signal when there will be no further values coming. For instance, you might send None as the last value, just before the foo function ends. The parent code could check for that specific value and break out of its loop, rather than treating it like a normal value (and e.g. printing it). This sort of positive signal that the child code is done might be better than the negative signal of a timeout, since it's less likely that something going wrong (e.g. the child crashing) will be misinterpreted as the expected shutdown.
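A minimal sketch of that sentinel approach, applied to the question's example (the choice of None as the sentinel is mine, not part of the original answer):

import multiprocessing as mp
import time

def foo(queue):
    for i in range(10):
        queue.put(i)
        time.sleep(1)
    queue.put(None)  # sentinel: no more values will follow

if __name__ == '__main__':
    mp.set_start_method('spawn')
    queue = mp.Queue()
    p = mp.Process(target=foo, args=(queue,))
    p.start()
    while True:
        item = queue.get()   # blocks until the child sends something
        if item is None:     # sentinel received: the child is finished
            break
        print(item)
    p.join()
    print("Process finished")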

sys.exit() not working, no other threads and no try: blocks capturing SystemExit?

I can't get my Python app to exit. After a call to sys.exit(), python.exe stays running and I have to kill it with task manager.
I've spent the past 4 hours looking into this, and I'm stumped.
This is Python 3.4.4 on Windows 10 x86.
First, I do have a multithreaded application. However I have verified that all threads are exiting with only the main thread running before I call sys.exit(). (I did this by calling threading.enumerate() in a while loop and waiting until there's only the main thread remaining, printing the list of running threads and watching it get smaller on each loop until only the main thread remains.)
Also, I've confirmed that I don't have anything wrapped in a try: block that would be swallowing the SystemExit exception. If I print sys.exc_info() I get (None, None, None), and if I call raise then it also confirms there are no exceptions pending.
What's interesting is that I've narrowed this down to the offending thread by commenting out different parts of my app to disable each thread one-by-one. (I have 4 threads total, each doing different things.)
If I comment out the thread in question, I can quit my app no problem. But again, even when I have that thread running, that thread does successfully exit, there's just something in there that's preventing the main Python exe from exiting.
I've tried setting the daemon flag, but that doesn't do anything either way. The offending thread's purpose is to wait at a PriorityQueue() with a 1 second timeout, and then when that times out it checks a threading.Event() flag to exit itself gracefully. Again, that works fine. I can see in my while() loop while the program is exiting that that thread is running, then stops.
The only other information is that this application is launched via a console_scripts entry point. I've looked at the script file that setuptools creates and see that it just wraps the call to my entry point in a sys.exit(), but even after hacking that file, I just cannot get this thing to exit.
I've tried calling sys.exit, raising SystemExit, and simply returning to let the console_script call sys.exit. None of those work.
I've also tried more brute force efforts, like os._exit(), but that also doesn't work.
What's really weird is that if I create a recursive loop (a simple one-line method that just calls itself), and I put that in my stop method before I set the threading Event which stops the threads, then Python will exit as it should. (I did that by mistake at first and was dumbfounded that it works.) But if I move that loop call down a few lines to just before I call sys.exit, then the recursive loop doesn't kill python.exe. So even though my problem thread exits properly, something about it trying to exit is causing Python.exe to hang.
So, my question, does anyone have any other ideas or things to try about why Python won't exit? Specifically why my problem thread stops and only the main thread remains, yet sys.exit or os._exit() do nothing? I'm completely stumped.
My app consumes about 90MB of memory, and in task manager I can see the GC doing its job: after my app is "hung" following the sys.exit() call, I see the memory usage drop from 90MB to 0.1MB over the course of about 30 seconds. But even after leaving it, python.exe doesn't stop.
Update: Here's some code that demonstrates what things look like:
From the module and function that's registered as the console_script:
def run_from_command_line(args=None):
    path = os.path.abspath(os.path.curdir)
    CommandLineUtility(path).execute()
From the CommandLineUtility() which starts my app. This is the last line:
def __init__(...):
    ...  # skipping a bunch of setup stuff
    MpfMc(options=vars(args), config=mpf_config,
          machine_path=machine_path).run()
    # this is not a threading run, just the name of the method for my app
From MpfMc():
def __init__(...):
    ...
    self.thread_stopper = threading.Event()
    ...
    self.asset_manager = AssetManager(self)
From AssetManager():
self.loader_thread = AssetLoader(loader_queue=self.loader_queue,
                                 loaded_queue=self.loaded_queue,
                                 exception_queue=self.machine.crash_queue,
                                 thread_stopper=self.machine.thread_stopper)
self.loader_thread.daemon = True
self.loader_thread.start()
From AssetLoader:
def run(self):
    """Run loop for the loader thread."""
    while True:
        try:
            asset = self.loader_queue.get(block=True, timeout=1)
        except Empty:
            asset = None
        if self.thread_stopper.is_set():
            return
        if asset:
            if not asset.loaded:
                with asset.lock:
                    asset.do_load()
            self.loaded_queue.put(asset)
From the MpfMc.stop() method that stops the app:
def stop(self):
    self.log.info("Stopping ...")
    self.thread_stopper.set()
    while [x for x in self.threads if x.is_alive()]:
        # self.threads is a list of threads I created, not the main thread.
        print("Waiting for threads to stop")
        print([x for x in self.threads if x.is_alive()])
        print(threading.enumerate())
        time.sleep(0.5)
    for thread in self.threads:
        # verify none of the sub threads are alive
        print("THREAD", thread, thread.is_alive())
    sys.exit()  # here's where I also tried raise SystemExit, os._exit(), etc.
Thanks!
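(Aside, not part of the original post: one generic way to see what is still keeping the interpreter alive at the point of the hang is the faulthandler module, available since Python 3.3. Dumping every thread's stack just before the exit attempt shows exactly which frames are still running:)

import faulthandler
import sys

# ... at the very end of stop(), just before the exit attempt:
faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
sys.exit()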

Function within Worker/Child instance does not return, freezes program

I am using the multiprocessing module in python. Here is a sample of the code I am using:
import multiprocessing as mp

def function(fun_var1, fun_var2):
    b = fun_var1 + fun_var2
    # and more computationally intensive stuff happens here
    return b
    # my program freezes after the return command

class Worker(mp.Process):
    def __init__(self, queue_obj, func_var1, func_var2):
        mp.Process.__init__(self)
        self.queue_obj = queue_obj
        self.func_var1 = func_var1
        self.func_var2 = func_var2

    def run(self):
        self.var = function(self.func_var1, self.func_var2)
        self.queue_obj.put(self.var)

if __name__ == '__main__':
    mp.freeze_support()
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        queue_list.append(mp.Queue())
        processes.append(Worker(queue_list[i], var1, var2))  # var1, var2 are defined earlier in the real program
        processes[i].start()
    for i in range(2):
        processes[i].join()
        result.append(queue_list[i].get())
During runtime of the program two instances of the Worker class are generated which work simultaneously. One instance finishes after about 2 minutes and the other takes about 7 minutes. The first instance returns its results fine. However, the second instance freezes the program when the function() that is called within the run() method returns its value. No error is thrown, the program just does not continue to execute. The console also indicates that it is busy but does not display the >>> prompt.

I am completely clueless why this behavior occurs. The same code works fine for slightly different inputs in the two Worker instances. The only difference I can make out is that the workloads are more equal when it executes correctly. Could the time difference cause trouble? Does anyone have experience with this kind of behavior?

Also note that if I run a serial setup of the program in which function() is just called twice by the main program, the code executes flawlessly. Could there be some timeout involved in the Worker instance that makes it impossible for function() to return its value to the Worker instance? The return value of function() is actually a list that is fairly small. It contains about 100 float values.
Any suggestions are welcomed!
This is a bit of an educated guess without actually seeing what's going on in worker, but is it possible that your child has put items into the Queue that haven't been consumed? The documentation has a warning about this:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue.
See Programming guidelines.
It might be worth trying to create your Queue objects using mp.Manager().Queue() and seeing if the issue goes away.
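For example, a sketch of that swap applied to the question's __main__ block (Worker, var1, and var2 are the placeholders from the question, not defined here):

import multiprocessing as mp

if __name__ == '__main__':
    mp.freeze_support()
    manager = mp.Manager()  # manager-backed queues don't block child exit on unflushed items
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        queue_list.append(manager.Queue())  # instead of mp.Queue()
        processes.append(Worker(queue_list[i], var1, var2))
        processes[i].start()
    for i in range(2):
        processes[i].join()
        result.append(queue_list[i].get())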

What could make a connection.send() block ? (from conn1, conn2 = multiprocessing.Pipe() )

I am debugging an application that gathers information from 2 sensors: a webcam and a microphone.
The general architecture is quite simple :
the main process sends messages (start, stop, get_data) via pipes to the child processes (one for each).
child processes gather the data and send it to the main process
Child & main processes are in infinite loops to process commands (the main process from the user, the child process from the main process).
It works overall, but I have trouble stopping the child processes.
I have added logging to the code and it seems that 2 things happen:
The 'stop' message is sent but doesn't get through the pipe.
The child process continues to send data and the conn.send(data) blocks.
The behavior is clearly linked to the state of the connection, as child processes that send nothing back don't have this behavior. Still, I don't see how to debug/modify the current architecture, which seems reasonable.
So, what causes this blocking behavior and how can I avoid it?
This is the code which is executed on each iteration of the infinite loop in the child process:
def do(self):
    while self.cnx.poll():
        msg = self.cnx.recv()
        self.queue.append(msg)
    #==
    if not self.queue:
        func_name = 'default_action'
        self.queue.append([func_name, ])
    #==
    msg = self.queue.pop()
    func_name, args = msg[0], msg[1:]
    #==
    res = self.target.__getattribute__(func_name)(*args)
    #==
    running = func_name != 'stop'
    #==
    if res and self.send:
        assert running
        self.output_queue.append(res[0])
    if self.output_queue and running:
        self.cnx.send(self.output_queue.popleft())
    #==
    return running
Update: it seems that the Pipe cannot be written to simultaneously from both ends. It works if I change the last few lines of the above code to:
if self.output_queue and running:
    if not self.cnx.poll():
        self.cnx.send(self.output_queue.popleft())
The question stays open though, as Pipes are documented as full duplex by default and this behavior is not documented at all. I must have misunderstood something. Please, enlighten me!
Update 2: just to be clear, no connection is closed in this situation. To describe the sequence of events:
the main process sends a message ("stop") (it empties the connection before sending the message)
the main process enters an (infinite) loop that stops when the child process has terminated.
meanwhile, the child process is blocked in the send and never gets the message.
A full duplex multiprocessing.Pipe is implemented as socketpair(). Calling .send can block for all the normal reasons when talking to a socket. Based on your description I think it's likely that the reader of your Pipe has quit reading and data has built up in the buffers in the kernel to the point where your .send blocks.
If you explicitly .close the receiving side you'll probably get some kind of error (although possibly SIGPIPE as well, not sure) when you try to .send. If your receiving connection was going out of scope this would probably happen automatically. You may be able to fix the problem by just being more careful not to store references (direct or indirect) to the receiving side so it gets deallocated when that thread goes away.
Trivial demo of blocking .send:
import multiprocessing

a, b = multiprocessing.Pipe()
while True:
    print("send!")
    a.send("hello world")
Now note that after a while it quits printing "send!"
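To illustrate the earlier point about letting the receiving end get closed rather than keeping references to it, here is a small sketch (my own construction, not from the answer); with the spawn start method the child only ever holds its own end, so closing the parent's end turns the blocked send into an error instead of a hang:

import multiprocessing as mp

def sensor(conn):
    try:
        while True:
            conn.send("frame")  # fails once the receiving end is fully closed,
                                # instead of blocking forever
    except OSError:             # BrokenPipeError / ConnectionResetError
        pass

if __name__ == '__main__':
    mp.set_start_method('spawn')  # with fork, the child would also inherit parent_conn
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=sensor, args=(child_conn,))
    p.start()
    child_conn.close()            # the parent only needs its own end
    print(parent_conn.recv())     # consume some data...
    parent_conn.close()           # ...then drop our end; the child's send now errors out
    p.join()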
