I'm having trouble coming up with a piece of code that:
1. spawns multiple processes, and
2. for any individual process, kills it if it is still alive after 5 seconds.
I know how to handle 1) and 2) individually, but I don't know how to combine them. Any suggestions would be helpful. Thanks!
For 1), I know how to write a simple multi-process program with a return dictionary, from here:
import multiprocessing

def worker(procnum, return_dict):
    '''worker function'''
    print(str(procnum) + ' represent!')
    return_dict[procnum] = procnum

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict.values())
For 2), my program hangs on some data because one function, an external C++ extension for Python, never returns. Since there are one million data points to handle, I need a time-out killer that kills this function when it runs too long and moves on to the next iteration. Currently I set a wait time of 5 seconds before killing the process. I know how to write the code from here:
import multiprocessing
import time

def bar():
    for i in range(100):
        print("Tick")
        time.sleep(1)

if __name__ == '__main__':
    # Start bar as a process
    p = multiprocessing.Process(target=bar)
    p.start()
    # Wait for 10 seconds or until the process finishes
    p.join(10)
    # If the process is still active
    if p.is_alive():
        print("running... let's kill it...")
        # Terminate the process, then join it to reap it
        p.terminate()
        p.join()
But, as I mentioned, I am not sure how to combine these two pieces of code, mainly because I don't know where to put the p.join() and the if p.is_alive() check. And what's the use of p.join() since we already have p.start()?
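To be concrete, here is my rough, untested attempt at combining them, using a shared deadline so the 5-second waits don't stack up per process (the sleep in the worker just simulates slow tasks; I'm not sure this is the right approach):

import multiprocessing
import time

def worker(procnum, return_dict):
    '''worker function'''
    time.sleep(procnum * 2)  # procnum 3 and 4 deliberately run past the 5-second budget
    return_dict[procnum] = procnum

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i, return_dict))
        jobs.append(p)
        p.start()
    deadline = time.time() + 5  # one shared 5-second budget for all processes
    for proc in jobs:
        proc.join(max(0, deadline - time.time()))  # wait only for the time remaining
        if proc.is_alive():  # still running after the deadline
            proc.terminate()
            proc.join()  # reap it so it doesn't linger as a zombie
    print(return_dict.values())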
Thanks!
I am trying to restart a Python process using the multiprocessing module, but "AssertionError: cannot start a process twice" appears.
My questions:
1. How can I restart the process?
2. Once it's terminated, why does it go into zombie mode?
3. How can I remove the zombie process?
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

p1 = Process(target=worker, name="worker")
p1.start()
#p1.join()
time.sleep(3)
p1.terminate()
print("after termination")
time.sleep(3)
p1.start()
Actually, I am trying to create a process-monitor function to watch the memory and CPU usage of all processes. If a process reaches a certain level, I want to restart it in real time.
How can I restart the process?
You cannot restart a terminated process. You need to instantiate a new process.
Once it's terminated, why does it go into zombie mode?
Because on Unix-y systems the parent process needs to read the exit code before the kernel clears the corresponding entry from the process table.
How can I remove the zombie process?
You have multiple options. I'm citing the docs here:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
Actually, I am trying to create a process-monitor function to watch the memory and CPU usage of all processes.
You should take a look at the psutil module for that.
In case you just want to suspend (not kill) processes if memory consumption gets too high, you might be able to draw some inspiration from my answer here.
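For illustration, here is a minimal sketch of such a monitor (the 100 MB limit, the 1-second polling interval, and the helper names are made up for the example; psutil must be installed):

import time
import psutil
from multiprocessing import Process

def worker():
    while True:
        time.sleep(10)

def start_worker():
    p = Process(target=worker, name="worker")
    p.start()
    return p

if __name__ == '__main__':
    mem_limit = 100 * 1024 * 1024  # arbitrary 100 MB limit
    p = start_worker()
    while True:
        if psutil.Process(p.pid).memory_info().rss > mem_limit:
            p.terminate()
            p.join()  # reap the old process so it doesn't become a zombie
            p = start_worker()  # "restarting" means creating a fresh Process
        time.sleep(1)  # poll once per second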
I hope this will help you:
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

def proc_start():
    p_to_start = Process(target=worker, name="worker")
    p_to_start.start()
    return p_to_start

def proc_stop(p_to_stop):
    p_to_stop.terminate()
    print("after termination")

p = proc_start()
time.sleep(3)
proc_stop(p)
time.sleep(3)
p = proc_start()
print("started again")
time.sleep(3)
proc_stop(p)
terminate() will not allow the process to be restarted, but kill() can be used and then the process can be restarted. It works:
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

p1 = Process(target=worker, name="worker")
p1.start()
#p1.join()
time.sleep(3)
p1.kill()  # Process.kill() requires Python 3.7+
print("after kill")
time.sleep(3)
p1.start()
When using multiprocessing in Python, I usually see examples where the join() function is called in a separate loop from the one in which each process was actually created.
For example, this:
processes = []
for i in range(10):
    p = Process(target=my_func)
    processes.append(p)
    p.start()

for p in processes:
    p.join()
is more common than this:
processes = []
for i in range(10):
    p = Process(target=my_func)
    processes.append(p)
    p.start()
    p.join()
But from my understanding of join(), it just tells the script not to exit until that process has finished. Therefore, it shouldn't matter when join() is called. So why is it usually called in a separate loop?
join() is a blocking operation.
In the first example you start 10 processes and then wait for all of them to finish; all processes run at the same time.
In the second example you start one process at a time and wait for it to finish before starting the next one, so only one process is running at any moment.
First example:
import time
from multiprocessing import Process

def wait():
    time.sleep(1)

if __name__ == '__main__':
    processes = []
    # You start 10 processes
    for i in range(10):
        p = Process(target=wait)
        processes.append(p)
        p.start()
    # About one second after starting, all processes can already be
    # finished, so each join() returns almost immediately
    for p in processes:
        p.join()
The execution time of the whole script can be near one second.
Second example:
for i in range(10):
    p = Process(target=wait)  # here you start one process...
    processes.append(p)
    p.start()
    p.join()  # ...and wait about one second for it to finish before starting the next
The execution time of the whole script can be near 10 seconds!
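If you want to verify the difference yourself, here is a small timing sketch (process startup overhead will add a little on top of the sleeps):

import time
from multiprocessing import Process

def wait():
    time.sleep(1)

def run(join_inside):
    procs = []
    t0 = time.perf_counter()
    for _ in range(10):
        p = Process(target=wait)
        procs.append(p)
        p.start()
        if join_inside:
            p.join()  # sequential: wait before starting the next one
    for p in procs:
        p.join()  # returns immediately for already-finished processes
    return time.perf_counter() - t0

if __name__ == '__main__':
    print("join in a separate loop: %.1f s" % run(False))  # roughly 1 s
    print("join inside the loop:    %.1f s" % run(True))   # roughly 10 s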
I have some code that needs to run against several other systems that may hang or have problems not under my control. I would like to use python's multiprocessing to spawn child processes to run independent of the main program and then when they hang or have problems terminate them, but I am not sure of the best way to go about this.
When terminate is called it does kill the child process, but then the child becomes a defunct zombie that is not released until the process object is gone. The example code below, where the loop never ends, works to kill it and allows a respawn when called again, but it does not seem like a good way of going about this (i.e., multiprocessing.Process() would be better in the __init__()).
Anyone have a suggestion?
import multiprocessing
import time

class Process(object):
    def __init__(self):
        self.thing = Thing()
        self.running_flag = multiprocessing.Value("i", 1)

    def run(self):
        self.process = multiprocessing.Process(target=self.thing.worker, args=(self.running_flag,))
        self.process.start()
        print(self.process.pid)

    def pause_resume(self):
        self.running_flag.value = not self.running_flag.value

    def terminate(self):
        self.process.terminate()

class Thing(object):
    def __init__(self):
        self.count = 1

    def worker(self, running_flag):
        while True:
            if running_flag.value:
                self.do_work()

    def do_work(self):
        print("working {0} ...".format(self.count))
        self.count += 1
        time.sleep(1)
You might run the child processes as daemons in the background.
process.daemon = True
Any errors and hangs (or an infinite loop) in a daemon process will not affect the main process, and the daemon will only be terminated once the main process exits.
This works for simple problems, until you run into many child daemon processes that keep consuming memory from the parent process without any explicit control.
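A minimal illustration of the daemon behavior (the long sleep stands in for a hung call):

import multiprocessing as mp
import time

def hang():
    time.sleep(1000)  # stands in for a call that never returns

if __name__ == '__main__':
    p = mp.Process(target=hang)
    p.daemon = True  # daemon children are terminated when the main process exits
    p.start()
    time.sleep(1)
    print("main exiting; the hung daemon child is killed automatically")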
The best way is to set up a Queue so that all the child processes can communicate with the parent process, which lets us join them and clean up nicely. Here is some simple code that checks whether a child process is hanging (aka time.sleep(1000)) and sends a message to the queue for the main process to take action on:
import multiprocessing as mp
import time
import queue

running_flag = mp.Value("i", 1)

def worker(running_flag, q):
    count = 1
    while True:
        if running_flag.value:
            print(f"working {count} ...")
            count += 1
            q.put(count)
            time.sleep(1)
            if count > 3:
                # Simulate hanging with sleep
                print("hanging...")
                time.sleep(1000)

def watchdog(q):
    """
    This checks the queue for updates and sends a signal to it
    when the child process isn't sending anything for too long
    """
    while True:
        try:
            msg = q.get(timeout=10.0)
        except queue.Empty as e:
            print("[WATCHDOG]: Maybe WORKER is slacking")
            q.put("KILL WORKER")

def main():
    """The main process"""
    q = mp.Queue()
    workr = mp.Process(target=worker, args=(running_flag, q))
    wdog = mp.Process(target=watchdog, args=(q,))
    # run the watchdog as daemon so it terminates with the main process
    wdog.daemon = True
    workr.start()
    print("[MAIN]: starting process P1")
    wdog.start()
    # Poll the queue
    while True:
        msg = q.get()
        if msg == "KILL WORKER":
            print("[MAIN]: Terminating slacking WORKER")
            workr.terminate()
            time.sleep(0.1)
            if not workr.is_alive():
                print("[MAIN]: WORKER is a goner")
                workr.join(timeout=1.0)
                print("[MAIN]: Joined WORKER successfully!")
                q.close()
                break  # watchdog process daemon gets terminated

if __name__ == '__main__':
    main()
Without terminating the worker, an attempt to join() it to the main process would have blocked forever, since the worker never finishes.
The way Python multiprocessing handles processes is a bit confusing.
From the multiprocessing guidelines:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
In order to prevent a process from becoming a zombie, you need to call its join() method once you kill it.
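In other words, a minimal sketch:

import multiprocessing as mp
import time

def hang():
    time.sleep(1000)  # stands in for a call that never returns

if __name__ == '__main__':
    p = mp.Process(target=hang)
    p.start()
    p.terminate()  # kill the child...
    p.join()  # ...then join it so the parent reads the exit code and no zombie is left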
If you want a simpler way to deal with the hanging calls in your system, you can take a look at pebble.
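For example, something along these lines, based on pebble's documented ProcessPool (treat this as a sketch; details may vary between pebble versions):

import time
from concurrent.futures import TimeoutError
from pebble import ProcessPool

def unreliable_call(x):
    time.sleep(1000)  # stands in for an external call that hangs
    return x

if __name__ == '__main__':
    with ProcessPool() as pool:
        future = pool.schedule(unreliable_call, args=(42,), timeout=5)
        try:
            print(future.result())  # raises TimeoutError after 5 seconds
        except TimeoutError:
            # pebble terminates the hung worker for you
            print("call timed out and was terminated")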
I'm new to threads in Python. I have a question: suppose I start 3 threads like below, each taking care of a different task:
def start(taskName, delay):
    # do something with each taskName
    pass

# Create three threads as follows
try:
    thread.start_new_thread(start, ("task1", 12))
    thread.start_new_thread(start, ("task2", 10))
    thread.start_new_thread(start, ("task3", 15))
except:
    print "Error: unable to start thread"
Suppose that each "start" takes around 10-15 seconds to finish, depending on the taskName. My question is: if task 1 finishes in 12 seconds, task 2 in 10 seconds, and task 3 in 15 seconds, will task 2 finish, close, and leave tasks 1 and 3 to run until they finish, or will task 2 force tasks 1 and 3 to close once it is done?
Are there any arguments we can pass to the start_new_thread method in order to achieve either of the two behaviors mentioned above?
1. The first to finish forces the rest to close.
2. Each one finishes individually.
Thank you
As Max Noel already mentioned, it is advised to use the Thread class instead of using start_new_thread.
Now, as for your two questions:
1. First to finish forces the rest to close
You will need two important things: a shared queue that the threads can put their IDs in once they are done, and a shared Event that signals all threads to stop working when it is triggered. The main thread will wait for the first thread to put something in the queue and will then trigger the event to stop all threads.
import threading
import random
import time
import Queue

def work(worker_queue, id, stop_event):
    while not stop_event.is_set():
        print "This is worker", id
        # do stuff
        time.sleep(random.random() * 5)
        # put worker ID in queue
        if not stop_event.is_set():
            worker_queue.put(id)
            break

# queue for workers
worker_queue = Queue.Queue()

# indicator for other threads to stop
stop_event = threading.Event()

# run workers
threads = []
threads.append(threading.Thread(target=work, args=(worker_queue, 0, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 1, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 2, stop_event)))
for thread in threads:
    thread.start()

# this will block until the first element is in the queue
first_finished = worker_queue.get()
print first_finished, 'was first!'

# signal the rest to stop working
stop_event.set()
2. Each one finishes individually
Now this is much easier. Just call the join method on all Thread objects. This will wait for each thread to finish.
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
By the way, the above code is for Python 2.7. Let me know if you need it for Python 3.
First off, don't use start_new_thread, it's a low-level primitive. Use the Thread class in the threading module instead.
Once you have that, Thread instances have a .join() method, which you can call from another thread (your program's main thread) to wait for them to terminate.
t1 = Thread(target=my_func)
t1.start()
# Waits for t1 to finish.
t1.join()
All threads will terminate when the process terminates.
Thus, if your main program ends after the try..except, then all three threads may get terminated prematurely. For example:
import thread
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    for i in range(n):
        logger.info('{}'.format(i))
        time.sleep(0.1)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    try:
        thread.start_new_thread(start, ("task1", 10))
        thread.start_new_thread(start, ("task2", 5))
        thread.start_new_thread(start, ("task3", 8))
    except Exception as err:
        logger.exception(err)
may print something like
[14:15:16 Dummy-3] 0
[14:15:16 Dummy-1] 0
In contrast, if you place
time.sleep(5)
at the end of the script, then you see the full expected output from all three
threads.
Note also that the thread module is a low-level module; unless you have a
particular reason for using it, most often people use the threading module which
implements more useful features for dealing with threads, such as a join
method which blocks until the thread has finished. See below for an example.
The docs state:
When the function returns, the thread silently exits.
When the function terminates with an unhandled exception, a stack trace is
printed and then the thread exits (but other threads continue to run).
Thus, by default, when one thread finishes, the others continue to run.
The example above also demonstrates this.
To make all the threads exit when one function finishes is more difficult.
One thread cannot kill another thread cleanly (e.g., without killing the entire process).
Using threading, you could arrange for the threads to set a variable
(e.g. flag) to True when finished, and have each thread check the state of
flag periodically and quit if it is True. But note that the other threads will
not necessarily terminate immediately; they will only terminate when they next
check the state of flag. If a thread is blocked, waiting for I/O for instance,
then it may not check the flag for a considerable amount of time (if ever!).
However, if the thread spends most of its time in a quick loop, you could check the state of flag once per iteration:
import threading
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    global flag
    for i in range(n):
        if flag:
            break
        logger.info('{}'.format(i))
        time.sleep(0.1)
    else:
        # get here if loop finishes without breaking
        logger.info('FINISHED')
        flag = True

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    threads = list()
    flag = False
    try:
        threads.append(threading.Thread(target=start, args=("task1", 10)))
        threads.append(threading.Thread(target=start, args=("task2", 5)))
        threads.append(threading.Thread(target=start, args=("task3", 8)))
    except Exception as err:
        logger.exception(err)
    for t in threads:
        t.start()
    for t in threads:
        # make the main process wait until all threads have finished
        t.join()