I am trying to write a Python multi-threaded script that does the following two things in different threads:
Parent: Start Child Thread, Do some simple task, Stop Child Thread
Child: Do some long running task.
Below is a simple way to do it. And it works for me:
from multiprocessing import Process
import time

def child_func():
    while not stop_thread:
        time.sleep(1)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    stop_thread = False
    child_thread.start()
    time.sleep(3)
    stop_thread = True
    child_thread.join()
But a complication arises because in actuality, instead of the while-loop in child_func(), I need to run a single long-running process that doesn't stop unless it is killed by Ctrl-C. So I cannot periodically check the value of stop_thread in there. So how can I tell my child process to end when I want it to?
I believe the answer has to do with using signals, but I haven't seen a good example of how to use them in this exact situation. Can someone please help by modifying my code above to use signals to communicate between the child and the parent, so that the child terminates when the user hits Ctrl-C?
There is no need to use the signal module here unless you want to do cleanup on your child process. It is possible to stop any child process using the terminate() method (which has the same effect as sending SIGTERM).
from multiprocessing import Process
import time

def child_func():
    time.sleep(1000)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    child_thread.start()
    time.sleep(3)
    child_thread.terminate()
    child_thread.join()
The docs are here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.terminate
Related
I have some code that needs to run against several other systems that may hang or have problems not under my control. I would like to use python's multiprocessing to spawn child processes to run independent of the main program and then when they hang or have problems terminate them, but I am not sure of the best way to go about this.
When terminate is called it does kill the child process, but the child then becomes a defunct zombie that is not released until the process object is gone. The example code below, where the loop never ends, works to kill the process and allow a respawn when called again, but it does not seem like a good way of going about this (i.e. it would be better to create the multiprocessing.Process() in __init__()).
Anyone have a suggestion?
import multiprocessing
import time

class Process(object):
    def __init__(self):
        self.thing = Thing()
        self.running_flag = multiprocessing.Value("i", 1)

    def run(self):
        self.process = multiprocessing.Process(target=self.thing.worker, args=(self.running_flag,))
        self.process.start()
        print self.process.pid

    def pause_resume(self):
        self.running_flag.value = not self.running_flag.value

    def terminate(self):
        self.process.terminate()

class Thing(object):
    def __init__(self):
        self.count = 1

    def worker(self, running_flag):
        while True:
            if running_flag.value:
                self.do_work()

    def do_work(self):
        print "working {0} ...".format(self.count)
        self.count += 1
        time.sleep(1)
You might run the child processes as daemons in the background.
process.daemon = True
Any errors and hangs (or an infinite loop) in a daemon process will not affect the main process, and it will only be terminated once the main process exits.
This will work for simple problems, until you run into a lot of child daemon processes that keep consuming memory from the parent process without any explicit control.
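For instance, a minimal sketch of the daemon approach (the worker function and the sleep durations are only illustrative):

import multiprocessing
import time

def worker():
    # stand-in for a task that may hang indefinitely
    time.sleep(1000)

if __name__ == '__main__':
    process = multiprocessing.Process(target=worker)
    process.daemon = True   # daemonic children are terminated when the parent exits
    process.start()
    time.sleep(3)
    # no join/terminate needed here: the daemon child is killed when the main process exits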
A better way is to set up a Queue so that all the child processes communicate with the parent process; that lets us join them and clean up nicely. Here is some simple code that checks whether a child process is hanging (simulated by time.sleep(1000)) and sends a message to the queue so the main process can take action on it:
import multiprocessing as mp
import time
import queue

running_flag = mp.Value("i", 1)

def worker(running_flag, q):
    count = 1
    while True:
        if running_flag.value:
            print(f"working {count} ...")
            count += 1
            q.put(count)
            time.sleep(1)
            if count > 3:
                # Simulate hanging with sleep
                print("hanging...")
                time.sleep(1000)

def watchdog(q):
    """
    Check the queue for updates and send a signal to it
    when the child process hasn't sent anything for too long.
    """
    while True:
        try:
            msg = q.get(timeout=10.0)
        except queue.Empty as e:
            print("[WATCHDOG]: Maybe WORKER is slacking")
            q.put("KILL WORKER")

def main():
    """The main process"""
    q = mp.Queue()

    workr = mp.Process(target=worker, args=(running_flag, q))
    wdog = mp.Process(target=watchdog, args=(q,))

    # run the watchdog as daemon so it terminates with the main process
    wdog.daemon = True

    workr.start()
    print("[MAIN]: starting process P1")
    wdog.start()

    # Poll the queue
    while True:
        msg = q.get()
        if msg == "KILL WORKER":
            print("[MAIN]: Terminating slacking WORKER")
            workr.terminate()
            time.sleep(0.1)
            if not workr.is_alive():
                print("[MAIN]: WORKER is a goner")
                workr.join(timeout=1.0)
                print("[MAIN]: Joined WORKER successfully!")
                q.close()
                break  # watchdog process daemon gets terminated

if __name__ == '__main__':
    main()
Without terminating worker, the attempt to join() it to the main process would block forever, since worker never finishes.
The way Python multiprocessing handles processes is a bit confusing.
From the multiprocessing guidelines:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
To avoid a process becoming a zombie, you need to call its join() method once you kill it.
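For example, a minimal sketch (the hang function is just a placeholder for a long-running task):

import multiprocessing
import time

def hang():
    time.sleep(1000)

if __name__ == '__main__':
    p = multiprocessing.Process(target=hang)
    p.start()
    p.terminate()   # sends SIGTERM to the child on Unix
    p.join()        # reaps the child so it does not linger as a zombie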
If you want a simpler way to deal with the hanging calls in your system you can take a look at pebble.
I am new to Python multithreading and am trying to understand the basic difference between joining multiple worker threads and calling abort on them after I am done processing with them. Can somebody please explain with an example?
.join() and setting an abort flag are two different steps in cleanly shutting down a thread.
join() just waits for a thread that is going to terminate anyway to be finished. Thus:
import threading
import time

def thread_main():
    time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
This is a reasonable program. The join just waits until the thread is finished. It doesn't do anything to make that happen, but the thread will terminate anyway, because it is just a 10 second sleep.
In contrast
import threading
import time

def thread_main():
    while True:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
is not a good idea, because join will still wait for the thread to terminate on its own. But the thread will never do that, because it loops forever. Thus the whole program can't terminate.
That's the point where you want some kind of signaling to the thread, so that it stops itself:
import threading
import time

stop_thread = False

def thread_main():
    while not stop_thread:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
stop_thread = True
t.join()
Here stop_thread takes the role of your __abort flag and signals the thread to stop after it has finished its latest piece of work (the sleep(10) in this case).
Thus this program again is reasonable and terminates when asked to.
Another popular way to signal a thread to stop, when the thread uses a consumer pattern (i.e. gets its work from a queue), is to post a special 'terminate now' work item as an alternative to setting a flag variable:
def thread_main():
    while True:
        (quit, data) = work_queue.get()
        if quit:
            break
        do_work(data)
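The producer side of that pattern might look like this (a sketch; do_work and the payload string are placeholders, and the (quit, data) tuple format follows the snippet above):

import queue
import threading

work_queue = queue.Queue()

def do_work(data):
    print("processing", data)

def thread_main():
    while True:
        (quit, data) = work_queue.get()
        if quit:
            break
        do_work(data)

t = threading.Thread(target=thread_main)
t.start()
work_queue.put((False, "some payload"))   # a normal work item
work_queue.put((True, None))              # the 'terminate now' sentinel
t.join()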
I have a function that I'm calling every 5 seconds, like so:
def check_buzz(super_buzz_words):
    print 'Checking buzz'
    t = Timer(5.0, check_buzz, args=(super_buzz_words,))
    t.dameon = True
    t.start()
    buzz_word = get_buzz_word()
    if buzz_word is not 'fail':
        super_buzz_words.put(buzz_word)

main()
check_buzz()
I'm exiting the script by either catching a KeyboardInterrupt or by catching a SystemExit and calling this:
sys.exit('\nShutting Down\n')
I'm also restarting the program every so often by calling:
execv(sys.executable, [sys.executable] + sys.argv)
My question is, how do I get that timer thread to shut off? If I keyboard interrupt, the timer keeps going.
I think you just spelled daemon wrong, it should have been:
t.daemon = True
Then sys.exit() should work
Expanding on the answer from notorious.no, and the comment asking:
How can I call t.cancel() if I have no access to t outside the function?
Give the Timer thread a distinct name when you first create it:
from threading import Timer

def check_buzz(super_buzz_words):
    print 'Checking buzz'
    t = Timer(5.0, check_buzz, args=(super_buzz_words,))
    t.daemon = True
    t.name = "check_buzz_daemon"
    t.start()
Although the local variable t soon goes out of scope, the Timer thread that t pointed to still exists and still retains the name assigned to it.
Your atexit-registered method can then identify this thread by its name and cancel it:
import threading
from atexit import register

def all_done():
    for thr in threading.enumerate():
        if thr.name == "check_buzz_daemon":
            if thr.is_alive():
                thr.cancel()
                thr.join()

register(all_done)
Calling join() after calling cancel() is based on a StackOverflow answer by Cédric Julien.
HOWEVER, your thread is set to be a Daemon. According to this StackOverflow post, daemon threads do not need to be explicitly terminated.
from atexit import register

def all_done():
    if t.is_alive():
        # do something that will close your thread gracefully
        pass

register(all_done)
Basically, when your code is about to exit, it will fire one last function, and this is where you check whether your thread is still running. If it is, do something that will either cancel the transaction or otherwise exit gracefully. In general, it's best to let threads finish by themselves, but if the thread is not doing anything important (please note the emphasis) then you can just do t.cancel(). Design your code so that threads will finish on their own if possible.
Another way would be to use a Queue() to send and receive info from a thread, using .put() outside the thread and .get() inside the thread.
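A rough sketch of that idea (the commands queue and the "stop" message are illustrative names):

import queue
import threading
import time

commands = queue.Queue()

def worker():
    while True:
        try:
            cmd = commands.get(timeout=1.0)   # .get() inside the thread
        except queue.Empty:
            continue                          # nothing received, keep working
        if cmd == "stop":
            break

t = threading.Thread(target=worker)
t.start()
time.sleep(3)
commands.put("stop")   # .put() from outside the thread
t.join()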
What you can also do is create a txt file and make the program write to it when you exit, and put an if statement in the thread function to check it after each iteration (this is not a really good solution, but it also works).
I would have put a code example, but I am writing from mobile, sorry.
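Roughly, the idea could look like this (a sketch; the stop.txt flag-file name is arbitrary):

import os
import threading
import time

FLAG_FILE = "stop.txt"   # arbitrary name

def worker():
    while not os.path.exists(FLAG_FILE):   # check the flag after each iteration
        time.sleep(1)

t = threading.Thread(target=worker)
t.start()
time.sleep(3)
open(FLAG_FILE, "w").close()   # create the file to tell the thread to stop
t.join()
os.remove(FLAG_FILE)           # clean up so the next run starts fresh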
How can I terminate a running process, started using concurrent.futures? As I understand, the cancel() method is there to remove a process from the queue if it is not running. But what about killing a running process? For example, if I have a long running process, and I want to stop it when I press a Cancel button in a GUI.
In this case it would probably be better to use a multiprocessing.Process for a long running task.
Create a multiprocessing.Event before starting the new process. Have the child process check the status of this Event regularly, and make it exit when Event.is_set() returns True.
In your GUI code, have the callback associated with the button call set() on the Event.
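A minimal sketch of that approach (long_task is a stand-in for the real work, and the GUI callback is only hinted at by a comment):

import multiprocessing
import time

def long_task(stop_event):
    # stand-in for the long-running work, checking the Event regularly
    while not stop_event.is_set():
        time.sleep(0.5)

if __name__ == '__main__':
    stop_event = multiprocessing.Event()
    p = multiprocessing.Process(target=long_task, args=(stop_event,))
    p.start()
    # in a real GUI, the Cancel button's callback would call stop_event.set()
    time.sleep(3)
    stop_event.set()
    p.join()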
You may want to look at my answer to a related StackOverflow question here.
In short, there does not appear to be a simple way to cancel a running process inside a concurrent.futures.ProcessPoolExecutor. But you can accomplish it in a hacky way by killing the child processes manually.
You can use the _processes attribute of the executor.
For example script.py:
import signal
import time
from concurrent.futures import ProcessPoolExecutor

def sleep_square(x):
    def sigterm_handler(signum, frame):
        raise SystemExit(signum)
    signal.signal(signal.SIGTERM, sigterm_handler)
    try:
        time.sleep(x)
    except SystemExit:
        return -1
    return x * x

with ProcessPoolExecutor(max_workers=2) as ex:
    results = ex.map(sleep_square, [1, 5])
    time.sleep(3)
    for pid, proc in ex._processes.items():
        proc.terminate()
    print(list(results))
In this case, we send SIGTERM to all worker processes after 3 seconds.
$ python3 script.py
[1, -1]
I'm trying to run the following code (it is simplified a bit):
def RunTests(self):
    from threading import Thread
    import signal

    global keep_running
    keep_running = True
    signal.signal(signal.SIGINT, stop_running)
    for i in range(0, NumThreads):
        thread = Thread(target=foo)
        self._threads.append(thread)
        thread.start()

    # wait for all threads to finish
    for t in self._threads:
        t.join()

def stop_running(signl, frme):
    global keep_testing
    keep_testing = False
    print "Interrupted by the Master. Good by!"
    return 0

def foo(self):
    global keep_testing
    while keep_testing:
        DO_SOME_WORK();
I expect that when the user presses Ctrl+C the program will print the goodbye message and exit. However, it doesn't work. Where is the problem?
Thanks
Unlike regular processes, Python doesn't appear to handle signals in a truly asynchronous manner. The 'join()' call is somehow blocking the main thread in a manner that prevents it from responding to the signal. I'm a bit surprised by this since I don't see anything in the documentation indicating that this can/should happen. The solution, however, is simple. In your main thread, add the following loop prior to calling 'join()' on the threads:
while keep_testing:
    signal.pause()
From the threading docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
You could try setting thread.daemon = True before calling start() and see if that solves your problem.
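For example, a sketch (foo here is just a stand-in for the question's worker function):

import threading
import time

def foo():
    while True:
        time.sleep(1)   # stand-in for DO_SOME_WORK()

thread = threading.Thread(target=foo)
thread.daemon = True    # must be set before start(); daemon threads don't block interpreter exit
thread.start()
time.sleep(3)
# when the main thread returns, the interpreter exits and the daemon thread is killed with it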