I am trying to restart a Python process using the multiprocessing module, but I get "AssertionError: cannot start a process twice".
My questions:
How can I restart the process?
Once it's terminated, why does it go into zombie mode?
How can I remove the zombie process?
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

p1 = Process(target=worker, name="worker")
p1.start()
#p1.join()
time.sleep(3)
p1.terminate()
print("after Termination ")
time.sleep(3)
p1.start()  # raises "AssertionError: cannot start a process twice"
Actually, I am trying to create a process monitor function that watches the memory and CPU usage of all processes. If usage reaches a certain level, I want to restart the process in real time.
How can I restart the process?
You cannot restart a terminated process. You need to instantiate a new process.
Once it's terminated, why does it go into zombie mode?
Because on Unix-y systems the parent process needs to read the exit-code before the kernel clears the corresponding entry from the process table.
How can I remove the zombie process?
You have multiple options. I'm citing the docs here:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
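The two points above (a terminated Process object cannot be started again, and joining it is what clears the zombie) can be combined in a small helper. This is only a sketch: `restart` is a hypothetical name, and `time.sleep` stands in for the real worker function.

```python
import time
from multiprocessing import Process


def restart(old, target, args=()):
    """Stop `old`, reap it, and return a freshly started replacement."""
    old.terminate()   # sends SIGTERM on Unix
    old.join()        # reads the exit code, so no zombie is left behind
    # A Process object can only be started once, so build a new one.
    new = Process(target=target, args=args, name=old.name)
    new.start()
    return new


if __name__ == "__main__":
    p1 = Process(target=time.sleep, args=(60,), name="worker")
    p1.start()
    p2 = restart(p1, time.sleep, (60,))
    print(p1.is_alive(), p2.is_alive())
    p2.terminate()
    p2.join()
```

The old object is kept only long enough to be joined; all further work goes through the object returned by `restart`.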
Actually I am trying to create a process monitor function to watch the memory and CPU usage of all processes.
You should take a look at the psutil module for that.
In case you just want to suspend (not kill) processes if memory consumption gets too high, you might be able to draw some inspiration from my answer here.
I hope this helps:
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

def proc_start():
    p_to_start = Process(target=worker, name="worker")
    p_to_start.start()
    return p_to_start

def proc_stop(p_to_stop):
    p_to_stop.terminate()
    print("after Termination ")

p = proc_start()
time.sleep(3)
proc_stop(p)
time.sleep(3)
p = proc_start()
print("start again")
time.sleep(3)
proc_stop(p)
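It has been suggested to use kill() instead of terminate(). Note, however, that kill() (available since Python 3.7) behaves like terminate() in this respect: it sends SIGKILL instead of SIGTERM on Unix, but the same Process object still cannot be started a second time, so the final p1.start() in the code below raises the same "AssertionError: cannot start a process twice". A new Process object is needed in either case.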
import time
from multiprocessing import Process

def worker():
    while True:
        print("Inside the worker")
        time.sleep(10)

p1 = Process(target=worker, name="worker")
p1.start()
#p1.join()
time.sleep(3)
p1.kill()
print("after kill")
time.sleep(3)
p1.start()
Related
I am trying to learn multiprocessing and created an example, but it is behaving unexpectedly.
The parent process runs and then creates a child process, but control does not go back to the parent until the child is done.
code:
from multiprocessing import Process
import time

def f():
    newTime = time.time() + 7
    while(time.time() < newTime):
        print("inside child process")
        time.sleep(int(5))

if __name__ == '__main__':
    bln = True
    while(True):
        newTime = time.time() + 4
        while(time.time() < newTime):
            print("printing fillers")
            if(bln):
                p = Process(target=f)
                p.start()
                p.join()
                bln = False
result
"inside child process"
(wait for 5 sec)
"inside child process"
"printing fillers"
"printing fillers"
[...]
If I remove 'p.join()' then it will work. But from my understanding, p.join() is to tell the program to wait for this thread/process to finish before ending the program.
Can someone tell me why this is happening?
But from my understanding, p.join() is to tell the program to wait for
this thread/process to finish before ending the program.
Nope, it blocks the main thread right then and there until the thread / process finishes. By doing that right after you start the process, you don't let the loop continue until each process completes.
It would be better to collect all the Process objects you create into a list, so they can be accessed after the loop creating them. Then in a new loop, wait for them to finish only after they are all created and started.
#for example
processes = []
for i in whatever:
    p = Process(target=foo)
    p.start()
    processes.append(p)
for p in processes:
    p.join()
If you want to be able to do things in the meantime (while waiting for join), it is most common to use yet another thread or process. You can also choose to only wait a short time on join by giving it a timeout value. Note that join() does not raise an exception when the timeout expires; it simply returns None, so check is_alive() (or exitcode) afterwards to find out whether the process has finished, and if it has not, go do something else before trying to join again.
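As a rough illustration of waiting with a timeout: `join(timeout)` returns None whether or not the child has finished, so the sketch below checks `is_alive()` after each attempt. The helper name `wait_or_give_up` and its parameters are made up for this example, and `time.sleep` stands in for real work.

```python
import time
from multiprocessing import Process


def wait_or_give_up(proc, timeout, max_tries):
    """Poll `proc` with join(timeout); return True once it has finished."""
    for _ in range(max_tries):
        proc.join(timeout)        # returns None whether or not it timed out
        if not proc.is_alive():   # so the state has to be checked explicitly
            return True
        # The process is still running: do something else here, then retry.
    return False


if __name__ == "__main__":
    p = Process(target=time.sleep, args=(1,))
    p.start()
    print(wait_or_give_up(p, timeout=0.2, max_tries=50))
```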
p.join() isn't for ending the program, it's for waiting for a subprocess to finish. If you need to end the program, use something like sys.exit(0) or raise SystemExit('your reason here')
I've been trying to write an interactive wrapper (for use in ipython) for a library that controls some hardware. Some calls are heavy on the IO so it makes sense to carry out the tasks in parallel. Using a ThreadPool (almost) works nicely:
from multiprocessing.pool import ThreadPool

class hardware():
    def __init__(self, IPaddress):
        connect_to_hardware(IPaddress)
    def some_long_task_to_hardware(self, wtime):
        wait(wtime)
        result = 'blah'
        return result

pool = ThreadPool(processes=4)
threads = []
h = [hardware(IP1), hardware(IP2), hardware(IP3), hardware(IP4)]
for tt in range(4):
    task = pool.apply_async(h[tt].some_long_task_to_hardware, (1000,))
    threads.append(task)
alive = [True]*4
try:
    while any(alive):
        for tt in range(4): alive[tt] = not threads[tt].ready()
        do_other_stuff_for_a_bit()
except:
    #some command I cannot find that will stop the threads...
    raise
for tt in range(4): print(threads[tt].get())
The problem comes if the user wants to stop the process or there is an IO error in do_other_stuff_for_a_bit(). Pressing Ctrl+C stops the main process but the worker threads carry on running until their current task is complete.
Is there some way to stop these threads without having to rewrite the library or have the user exit python? pool.terminate() and pool.join() that I have seen used in other examples do not seem to do the job.
The actual routine (instead of the simplified version above) uses logging and although all the worker threads are shut down at some point, I can see the processes that they started running carry on until complete (and being hardware I can see their effect by looking across the room).
This is in python 2.7.
UPDATE:
The solution seems to be to switch to using multiprocessing.Process instead of a thread pool. The test code I tried is to run foo_pulse:
class foo(object):
    def foo_pulse(self,nPulse,name): #just one method of *many*
        print('starting pulse for '+name)
        result=[]
        for ii in range(nPulse):
            print('on for '+name)
            time.sleep(2)
            print('off for '+name)
            time.sleep(2)
            result.append(ii)
        return result,name
If you try running this using ThreadPool then ctrl-C does not stop foo_pulse from running (even though it does kill the threads right away, the print statements keep on coming):
from multiprocessing.pool import ThreadPool
import time

def test(nPulse):
    a=foo()
    pool=ThreadPool(processes=4)
    threads=[]
    for rn in range(4) :
        r=pool.apply_async(a.foo_pulse,(nPulse,'loop '+str(rn)))
        threads.append(r)
    alive=[True]*4
    try:
        while any(alive) : #wait until all threads complete
            for rn in range(4):
                alive[rn] = not threads[rn].ready()
            time.sleep(1)
    except : #stop threads if user presses ctrl-c
        print('trying to stop threads')
        pool.terminate()
        print('stopped threads') # this line prints but output from foo_pulse carried on.
        raise
    else :
        for t in threads : print(t.get())
However a version using multiprocessing.Process works as expected:
import multiprocessing as mp
import time

def test_pro(nPulse):
    pros=[]
    ans=[]
    a=foo()
    for rn in range(4) :
        q=mp.Queue()
        ans.append(q)
        r=mp.Process(target=wrapper,args=(a,"foo_pulse",q),kwargs={'args':(nPulse,'loop '+str(rn))})
        r.start()
        pros.append(r)
    try:
        for p in pros : p.join()
        print('all done')
    except : #stop threads if user stops findRes
        print('trying to stop threads')
        for p in pros : p.terminate()
        print('stopped threads')
    else :
        print('output here')
        for q in ans :
            print(q.get())
    print('exit time')
Where I have defined a wrapper for the library foo (so that it did not need to be re-written). If the return value is not needed then neither is this wrapper:
def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))
From the documentation I see no reason why a pool would not work (other than a bug).
This is a very interesting use of parallelism.
However, if you are using multiprocessing, the goal is to have many processes running in parallel, as opposed to one process running many threads.
Consider these few changes to implement it using multiprocessing:
You have these functions that will run in parallel:
import time
import multiprocessing as mp

def some_long_task_from_library(wtime):
    time.sleep(wtime)

class MyException(Exception): pass

def do_other_stuff_for_a_bit():
    time.sleep(5)
    raise MyException("Something Happened...")
Let's create and start the processes, say 4:
procs = []  # this is not a Pool, it is just a way to handle the
            # processes instead of calling them p1, p2, p3, p4...
for _ in range(4):
    p = mp.Process(target=some_long_task_from_library, args=(1000,))
    p.start()
    procs.append(p)
mp.active_children()  # returns the list of live children; as a side effect it joins any that have already finished
The processes are running in parallel, presumably each on a separate CPU core, but that is up to the OS to decide. You can check in your system monitor.
In the meantime you run a process that will break, and you want to stop the running processes rather than leave them orphaned:
try:
    do_other_stuff_for_a_bit()
except MyException as exc:
    print(exc)
    print("Now stopping all processes...")
    for p in procs:
        p.terminate()
print("The rest of the process will continue")
If it doesn't make sense to continue with the main process when one or all of the subprocesses have terminated, you should handle the exit of the main program.
Hope it helps, and you can adapt bits of this for your library.
To answer the question of why Pool did not work: as quoted in the documentation, __main__ needs to be importable by the child processes, and due to the nature of this project interactive Python is being used.
At the same time it was not clear why ThreadPool would work, although the clue is right there in the name. ThreadPool creates its pool of workers using multiprocessing.dummy, which, as noted here, is just a wrapper around the threading module. Pool uses multiprocessing.Process. This can be seen by this test:
p=ThreadPool(processes=3)
p._pool[0]
<DummyProcess(Thread23, started daemon 12345)> #no terminate() method
p=Pool(processes=3)
p._pool[0]
<Process(PoolWorker-1, started daemon)> #has handy terminate() method if needed
As threads do not have a terminate method the worker threads carry on running until they have completed their current task. Killing threads is messy (which is why I tried to use the multiprocessing module) but solutions are here.
The one warning about the solution using the above:
def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))
is that changes to attributes inside the instance of the object are not passed back up to the main program. As an example, the class foo above can also have methods such as:
def addIP(self, newIP):
    self.hardwareIP = newIP
A call to r=mp.Process(target=a.addIP,args=('127.0.0.1',)) followed by r.start() does not update a, because the method runs on a copy of a in the child process.
The only way round this for a complex object seems to be shared memory using a custom manager, which can give access to both the methods and attributes of object a. For a very large complex object based on a library this may be best done using dir(foo) to populate the manager. If I can figure out how I'll update this answer with an example (for my future self as much as others).
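For what it is worth, here is a minimal sketch of the custom-manager idea using multiprocessing.managers.BaseManager. The names Hardware, HardwareManager, set_ip and getIP are hypothetical and not part of the library discussed above. The object lives in the manager's server process and the other processes talk to it through a proxy. By default a proxy exposes public methods only, not attributes, which is why the sketch adds a getter.

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager


class Hardware(object):
    """Hypothetical stand-in for the library object `a`."""

    def __init__(self):
        self.hardwareIP = None

    def addIP(self, newIP):
        self.hardwareIP = newIP

    def getIP(self):
        return self.hardwareIP


class HardwareManager(BaseManager):
    pass


# Make manager.Hardware() create the object inside the manager process.
HardwareManager.register("Hardware", Hardware)


def set_ip(proxy, ip):
    proxy.addIP(ip)   # forwarded to the single shared instance


if __name__ == "__main__":
    manager = HardwareManager()
    manager.start()
    a = manager.Hardware()   # `a` is a proxy, not a local Hardware object
    r = Process(target=set_ip, args=(a, "127.0.0.1"))
    r.start()
    r.join()
    print(a.getIP())         # the change made in the child is visible here
    manager.shutdown()
```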
If for some reason using threads is preferable, we can use this.
We can send some signal to the threads we want to terminate. The simplest signal is a global variable:
import time
from multiprocessing.pool import ThreadPool

_FINISH = False

def hang():
    while True:
        if _FINISH:
            break
        print('hanging..')
        time.sleep(10)

def main():
    global _FINISH
    pool = ThreadPool(processes=1)
    pool.apply_async(hang)
    time.sleep(10)
    _FINISH = True
    pool.terminate()
    pool.join()
    print('main process exiting..')

if __name__ == '__main__':
    main()
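A variation on the same idea, sketched with threading.Event instead of a module-level flag. Because the worker waits on the event rather than calling time.sleep, it wakes up as soon as the flag is set. The `ticks` list is only there so the example has something to return.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def hang(stop, ticks):
    """Loop until `stop` is set; `ticks` records how often the loop ran."""
    while not stop.is_set():
        ticks.append(time.time())
        stop.wait(0.05)   # sleeps, but returns early once stop is set
    return len(ticks)


def main():
    stop = threading.Event()
    ticks = []
    pool = ThreadPool(processes=1)
    result = pool.apply_async(hang, (stop, ticks))
    time.sleep(0.2)
    stop.set()            # ask the worker to finish
    count = result.get(timeout=5)
    pool.close()
    pool.join()
    return count


if __name__ == "__main__":
    print("worker looped", main(), "times")
```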
I am trying to write a Python multi-threaded script that does the following two things in different threads:
Parent: Start Child Thread, Do some simple task, Stop Child Thread
Child: Do some long running task.
Below is a simple way to do it. And it works for me:
from multiprocessing import Process
import time

def child_func():
    while not stop_thread:
        time.sleep(1)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    stop_thread = False
    child_thread.start()
    time.sleep(3)
    stop_thread = True
    child_thread.join()
But a complication arises because in actuality, instead of the while-loop in child_func(), I need to run a single long-running process that doesn't stop unless it is killed by Ctrl-C. So I cannot periodically check the value of stop_thread in there. So how can I tell my child process to end when I want it to?
I believe the answer has to do with using signals. But I haven't seen a good example of how to use them in this exact situation. Can someone please help by modifying my code above to use signals to communicate between the Child and the Parent thread. And making the child-thread terminate iff the user hits Ctrl-C.
There is no need to use the signal module here unless you want to do cleanup in your child process. It is possible to stop any child process using the terminate method (which has the same effect as SIGTERM):
from multiprocessing import Process
import time

def child_func():
    time.sleep(1000)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    child_thread.start()
    time.sleep(3)
    child_thread.terminate()
    child_thread.join()
The docs are here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.terminate
I have some code that needs to run against several other systems that may hang or have problems not under my control. I would like to use python's multiprocessing to spawn child processes to run independent of the main program and then when they hang or have problems terminate them, but I am not sure of the best way to go about this.
When terminate is called it does kill the child process, but then it becomes a defunct zombie that is not released until the process object is gone. The example code below where the loop never ends works to kill it and allow a respawn when called again, but does not seem like a good way of going about this (ie multiprocessing.Process() would be better in the __init__()).
Anyone have a suggestion?
import multiprocessing
import time

class Process(object):
    def __init__(self):
        self.thing = Thing()
        self.running_flag = multiprocessing.Value("i", 1)
    def run(self):
        self.process = multiprocessing.Process(target=self.thing.worker, args=(self.running_flag,))
        self.process.start()
        print(self.process.pid)
    def pause_resume(self):
        self.running_flag.value = not self.running_flag.value
    def terminate(self):
        self.process.terminate()

class Thing(object):
    def __init__(self):
        self.count = 1
    def worker(self,running_flag):
        while True:
            if running_flag.value:
                self.do_work()
    def do_work(self):
        print("working {0} ...".format(self.count))
        self.count += 1
        time.sleep(1)
You might run the child processes as daemons in the background.
process.daemon = True
Any errors and hangs (or an infinite loop) in a daemon process will not affect the main process, and it will only be terminated once the main process exits.
This will work for simple problems until you run into a lot of child daemon processes, which keep consuming memory without any explicit control from the parent.
The best way is to set up a Queue so that all the child processes can communicate with the parent process, which can then join them and clean up nicely. Here is some simple code that checks whether a child process is hanging (simulated here with time.sleep(1000)) and sends a message to the queue for the main process to act on:
import multiprocessing as mp
import time
import queue

running_flag = mp.Value("i", 1)

def worker(running_flag, q):
    count = 1
    while True:
        if running_flag.value:
            print(f"working {count} ...")
            count += 1
            q.put(count)
            time.sleep(1)
            if count > 3:
                # Simulate hanging with sleep
                print("hanging...")
                time.sleep(1000)

def watchdog(q):
    """
    This check the queue for updates and send a signal to it
    when the child process isn't sending anything for too long
    """
    while True:
        try:
            msg = q.get(timeout=10.0)
        except queue.Empty as e:
            print("[WATCHDOG]: Maybe WORKER is slacking")
            q.put("KILL WORKER")

def main():
    """The main process"""
    q = mp.Queue()
    workr = mp.Process(target=worker, args=(running_flag, q))
    wdog = mp.Process(target=watchdog, args=(q,))
    # run the watchdog as daemon so it terminates with the main process
    wdog.daemon = True
    workr.start()
    print("[MAIN]: starting process P1")
    wdog.start()
    # Poll the queue
    while True:
        msg = q.get()
        if msg == "KILL WORKER":
            print("[MAIN]: Terminating slacking WORKER")
            workr.terminate()
            time.sleep(0.1)
            if not workr.is_alive():
                print("[MAIN]: WORKER is a goner")
                workr.join(timeout=1.0)
                print("[MAIN]: Joined WORKER successfully!")
                q.close()
                break # watchdog process daemon gets terminated

if __name__ == '__main__':
    main()
Without terminating the worker, an attempt to join() it from the main process would block forever, since the worker never finishes.
The way Python multiprocessing handles processes is a bit confusing.
From the multiprocessing guidelines:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
To keep a process from becoming a zombie, you need to call its join() method once you kill it.
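One possible way to package this, as a sketch only: the helper below (the name `stop_process` and the `grace` parameter are made up here) terminates the child, gives it a grace period, escalates to kill() if it is still alive, and joins it in every case so that no zombie is left. Process.kill() requires Python 3.7 or later.

```python
import time
from multiprocessing import Process


def stop_process(proc, grace=2.0):
    """Terminate `proc`, escalate to kill() if needed, and always join it."""
    if proc.is_alive():
        proc.terminate()   # SIGTERM on Unix
        proc.join(grace)   # wait up to `grace` seconds and reap if it exited
    if proc.is_alive():
        proc.kill()        # SIGKILL on Unix; needs Python 3.7+
        proc.join()
    return proc.exitcode


if __name__ == "__main__":
    p = Process(target=time.sleep, args=(1000,))
    p.start()
    print("exit code:", stop_process(p))
```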
If you want a simpler way to deal with the hanging calls in your system you can take a look at pebble.
I have a script that does a bunch of things and I want to spawn a thread that monitors the cpu and memory usage of what's happening.
The monitoring portion is:
import psutil
import time
import datetime

def MonitorProcess():
    procname = "firefox"
    while True:
        output_sys = open("/tmp/sysstats_counter.log", 'a')
        for proc in psutil.process_iter():
            if proc.name == procname:
                p = proc
                p.cmdline
                proc_rss, proc_vms = p.get_memory_info()
                proc_cpu = p.get_cpu_percent(1)
                scol1 = str(proc_rss / 1024)
                scol2 = str(proc_cpu)
                now = str(datetime.datetime.now())
                output_sys.write(scol1)
                output_sys.write(", ")
                output_sys.write(scol2)
                output_sys.write(", ")
                output_sys.write(now)
                output_sys.write("\n")
        output_sys.close( )
        time.sleep(1)
I'm sure there's a better way to do the monitoring but I don't care at this point.
The main script calls:
RunTasks() # which runs the foreground tasks
MonitorProcess() # which is intended to monitor the tasks' CPU and memory usage over time
I want to run both functions simultaneously. To do this I assume that I have to use the threading library. Is the approach then to do something like:
thread = threading.Thread(target=MonitorProcess())
thread.start
Or am I way off?
Also when the RunTasks() function finishes how do I get MonitorProcess() to automatically stop? I assume I could test for the process to be present and if it's not kill the function???
It sounds like you want a daemon thread. From the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
In your code:
thread = threading.Thread(target=MonitorProcess)
thread.daemon = True
thread.start()
The program will exit when main exits, even if the daemon thread is still active. You will want to run your foreground tasks after you set up and start your monitoring thread.
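If the monitor should also stop cleanly as soon as the foreground work is done (rather than only at interpreter exit), a threading.Event can be combined with the daemon flag. This is a sketch under assumptions: `monitor` and `run_with_monitor` are hypothetical names, and the timestamp sample is a stand-in for the psutil readings in the question.

```python
import threading
import time


def monitor(samples, stop, interval=0.05):
    """Record a sample every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        samples.append(time.time())   # stand-in for the psutil readings
        stop.wait(interval)           # returns early once stop is set


def run_with_monitor(task):
    """Run `task` in the foreground while `monitor` samples in the background."""
    samples = []
    stop = threading.Event()
    thread = threading.Thread(target=monitor, args=(samples, stop))
    thread.daemon = True   # never keeps the interpreter alive on its own
    thread.start()
    try:
        result = task()
    finally:
        stop.set()         # tell the monitor to finish
        thread.join()
    return result, samples


if __name__ == "__main__":
    result, samples = run_with_monitor(lambda: time.sleep(0.3))
    print(len(samples), "samples collected")
```

Note that the target is passed as `monitor`, not `monitor()`, so the function runs in the new thread instead of being called immediately.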