I wrote the following code, but I don't understand how it works very well:
import os
import time

NUM = 8

def timec():
    x = 1000000
    while x > 0:
        x -= 1

pid_children = []
start_time = time.time()

for i in range(NUM):
    pid = os.fork()

if pid == 0:
    timec()
    os._exit(0)
else:
    pid_children.append(pid)

for j in pid_children:
    os.waitpid(j, 0)

print(time.time() - start_time)
I cannot understand where the child process starts or where it will finish.
And another question is will the waitpid() method wait for the child process to finish its work, or will it just return as soon as it is called?
When os.fork() is called, the program splits into two completely separate programs. In the child, os.fork() returns 0. In the parent, os.fork() returns the process id of the child.
The key distinction about os.fork() is that it does not create a new thread that shares the memory of the original thread; instead it creates an entirely new process. The new process has a copy of its parent's memory. Updates in the parent are not reflected in the child, and updates in the child are not reflected in the parent! They each have their own state.
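A minimal sketch of that copy-on-fork behaviour (the counter variable is just for illustration):

import os

counter = 0
pid = os.fork()

if pid == 0:
    # child: works on its own copy of counter
    counter += 100
    print("child sees", counter)    # prints 100
    os._exit(0)
else:
    os.waitpid(pid, 0)              # wait for the child to exit
    print("parent sees", counter)   # still prints 0: the child's update is not shared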
Given that context, here are the answers to your specific questions:
Where do the child processes start?
pid = os.fork()
Each child process starts right here, at the point where os.fork() returns (returning 0 in the child). Note that this will generate far more than NUM processes: because only the fork is inside the for loop, after the first iteration you have 2 processes executing the loop, each of which forks again, yielding 4 total processes after the second iteration, and so on. In total 256 (2^8) processes will be created!
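You can see this doubling with a tiny experiment (a hypothetical snippet, not part of your program):

import os

for i in range(3):
    os.fork()

# every fork doubles the number of processes, so after three iterations
# there are 2**3 = 8 processes and this line is printed 8 times
print("hello from", os.getpid())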
Where do the child processes end?
Some will exit at:
os._exit(0)
Others will exit at the end of the file. That's because pid is overwritten on each subsequent iteration of the loop, so those processes never see pid == 0, never run timec(), and simply fall through to the end of the script (many of them without ever being waited on).
pid_children will only ever contain a single pid. That's because the entire state of the program is copied by each fork, so every process has its own copy of the list and appends at most one element to it.
What does waitpid do?
os.waitpid(pid, 0) will block until the process with the given pid has completed, so yes, it waits for the child to finish its work rather than returning immediately.
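A small sketch contrasting that blocking wait with a non-blocking one (os.WNOHANG is the flag that makes waitpid return immediately if the child is still running):

import os
import time

pid = os.fork()
if pid == 0:
    time.sleep(2)                    # child: pretend to do some work
    os._exit(0)

# non-blocking: returns (0, 0) right away because the child is still running
print(os.waitpid(pid, os.WNOHANG))

# blocking: does not return until the child has exited, about 2 seconds later
print(os.waitpid(pid, 0))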
os.fork() documentation
os.waitpid() documentation
import os
import traceback
import multiprocessing
import multiprocessing as mp
import psutil

def kill_proc_tree(pid):
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    for child in children:
        child.kill()

while True:
    pid = os.getpid()
    try:
        pool = mp.Pool(processes=1, maxtasksperchild=1)
        result = pool.apply_async(my_func, args=())  # my_func is defined elsewhere
        result.get(timeout=60)
        pool.close()
    except multiprocessing.context.TimeoutError:
        traceback.print_exc()
        kill_proc_tree(pid)
I am using the multiprocessing library and am trying to spawn a new process every time my_func finishes running, throws an exception, or has run longer than 60 seconds (result.get(timeout=60) should throw an exception). Since I want to keep the while loop running but also avoid having zombie processes, I need to keep the parent process running while killing all child processes whenever an exception is thrown in the parent or the child, or the child finishes, before spawning a new process. The kill_proc_tree function that I found online was supposed to tackle the issue, and it seemed to work at first (my_func opens a new window when a process begins and closes the window when the process supposedly ends), but then I realized that in my Task Manager the Python script is still taking up memory, and after enough multiprocessing.context.TimeoutError errors (they are thrown by the parent process) my memory becomes full.
So what should I do to solve this problem? Any help would be greatly appreciated!
The solution should be as simple as calling the terminate method on the pool for all exceptions, not just for a TimeoutError, since result.get(timeout=60) can throw an arbitrary exception if your my_func completes with an exception before the 60 seconds are up.
Note that according to the documentation the terminate method "stops the worker processes immediately without completing outstanding work" and will be implicitly called when the context manager for the pool is exited, as in the following example:
import multiprocessing

while True:
    try:
        with multiprocessing.Pool(processes=1, maxtasksperchild=1) as pool:
            result = pool.apply_async(my_func, args=())
            result.get(timeout=60)
    except Exception:
        pass
Specifying the maxtasksperchild=1 parameter to the Pool constructor seems somewhat superfluous since you are never submitting more than one task to the pool anyway.
I want to run a function in Python in a new process, do some work, report progress back to the main process using a queue, wait in the main process for the spawned process to terminate, and then continue execution of the main process.
I got the following code, which runs the function foo in a new process and returns progress using a queue:
import multiprocessing as mp
import time

def foo(queue):
    for i in range(10):
        queue.put(i)
        time.sleep(1)

if __name__ == '__main__':
    mp.set_start_method('spawn')
    queue = mp.Queue()
    p = mp.Process(target=foo, args=(queue,))
    p.start()
    while p.is_alive():
        print("ALIVE")
        print(queue.get())
        time.sleep(0.01)
    print("Process finished")
The output is:
ALIVE
0
ALIVE
1
ALIVE
2
ALIVE
3
ALIVE
4
ALIVE
5
ALIVE
6
ALIVE
7
ALIVE
8
ALIVE
9
ALIVE
At some point neither "ALIVE" nor "Process finished" is printed. How can I continue execution when the spawned process stops running?
Edit:
The problem was that I didn't know that queue.get() blocks until an item is put into the queue if the queue is empty. I fixed it by changing
while p.is_alive():
    print(queue.get())
    time.sleep(0.01)
to
while p.is_alive():
    if not queue.empty():
        print(queue.get())
    time.sleep(0.01)
Your code has a race condition. After the last number is put into the queue, the child process sleeps one more time before it exits. That gives the parent process enough time to fetch that item, sleep for a shorter time, and then conclude that the child is still alive before waiting for an 11th item that never comes.
Note that you get more ALIVE reports in your output than you do numbers. That tells you where the parent process is deadlocked.
There are a few possible ways you could fix the issue. You could change the foo function to sleep first, and put the item into the queue afterwards. That would make it so that it could quit running immediately after sending the 9 to its parent, which would probably allow it to avoid the race condition (since the parent does sleep for a short time after receiving each item). There would still be a small possibility of the race happening if things behaved very strangely, but it's quite unlikely.
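For instance, that reordering is a one-line change to foo (a sketch):

def foo(queue):
    for i in range(10):
        time.sleep(1)    # sleep first...
        queue.put(i)     # ...so the last value is sent immediately before foo returns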
A better approach might be to prevent the possibility of the race from occurring at all. For example, you might change the queue.get call to have a timeout set, so that it will give up (with a queue.Empty exception) if there's nothing to retrieve for too long. You could catch that exception immediately, or even use it as a planned method of breaking out of the loop rather than testing if the child is still alive or not, and catching it at a higher level.
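A sketch of that timeout-based loop, assuming the same Process p and Queue as in your script, with the mp.Queue bound to q here so the name doesn't shadow the stdlib queue module (which provides the Empty exception); the 1-second timeout is an arbitrary choice:

import queue   # only needed for the queue.Empty exception

while True:
    try:
        item = q.get(timeout=1.0)   # give up after a second instead of blocking forever
    except queue.Empty:
        if not p.is_alive():        # nothing arrived and the child has exited: we're done
            break
        continue                    # child is still running, just slow; keep waiting
    print(item)
print("Process finished")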
A final option might be to send a special sentinel value from the child to the parent in the queue to signal when there will be no further values coming. For instance, you might send None as the last value, just before the foo function ends. The parent code could check for that specific value and break out of its loop rather than treating it like a normal value (and e.g. printing it). This sort of positive signal that the child code is done might be better than the negative signal of a timeout, since it's less likely that something going wrong (e.g. the child crashing) will be misinterpreted as the expected shutdown.
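A self-contained sketch of the sentinel approach, using None as the end-of-stream marker (based on the code from the question):

import multiprocessing as mp
import time

def foo(q):
    for i in range(10):
        q.put(i)
        time.sleep(1)
    q.put(None)                  # sentinel: tells the parent no more values are coming

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    while True:
        item = q.get()           # a blocking get is fine now; the sentinel always arrives
        if item is None:
            break
        print(item)
    p.join()
    print("Process finished")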
I have a Python program where the main process creates a child process. There is a shared queue between the two processes. The child process writes some data to this shared queue, and the main process join()s on the child process.
If the data in the queue is not removed with get(), the child process does not terminate and the main process is blocked at join(). Why is this so?
Following is the code that I used :
from multiprocessing import Process, Queue
from time import *

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])
    print(q.qsize())
    #q.get()
    print(q.qsize())

q = Queue()
print(q.qsize())
p = Process(target=f, args=(q,))
p.start()
sleep(1)
#print(q.get())
print('bef join')
p.join()
print('aft join')
At present the q.get() calls are commented out, so the output is:
0
1
1
bef join
and then the code is blocked.
But if I uncomment one of the q.get() invocations, then the code runs completely with the following output :
0
1
0
bef join
aft join
Well, if you look at the multiprocessing documentation, the programming guidelines explicitly warn about joining processes that use queues: a process that has put items on a Queue will wait before terminating until all of the buffered items have been fed by the queue's "feeder" thread to the underlying pipe. So as long as you don't empty the Queue, the child process cannot finish, and join() blocks your program.
To me, you need to learn about the philosophy of multiprocessing. You have several tasks that don't depend on each other to run, and your program is currently too slow for you, so you need to use multiprocessing!
But don't forget there will (trust me) come a time when you need to wait until some parallel computations are all done, because you need all of their results for your next task. And that's where, in your case, join() comes in. You are basically saying: I was doing things asynchronously, but now my next task needs to be synced with the different items I computed before. Let's wait here until they are all ready.
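In your example that means draining the queue in the parent before joining; a sketch of the tail of your script, reusing f and q as defined above:

p = Process(target=f, args=(q,))
p.start()
sleep(1)
print(q.get())    # empty the queue (and collect the child's data) before joining
print('bef join')
p.join()          # the child's feeder thread can now flush everything, so join returns
print('aft join')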
I'm having problems killing the child processes created by a fork inside a thread that was itself spawned:
import os
import sys
import threading

def updateProxies():
    quota = 25
    children = []
    sons = 0
    for i in range(50):
        pid = os.fork()
        if pid:
            children.append(pid)
            sons += 1
            if sons >= quota:
                os.wait()
                sons -= 1
        else:
            # {CHILD CODE EXECUTION}: database calls and network requests
            sys.exit()
    for x in children:
        os.waitpid(x, 0)

_td = threading.Thread(target=updateProxies, args=())
_td.start()
When I run the code above, the parent of the children blocks at the os.waitpid(x, 0) line and never resumes. And yes, I tracked all the children until they died at their respective sys.exit() calls, but waitpid never gets informed about their death and my parent process never resumes!
When doing ps -ef, the child processes show up as (defunct); aren't they dying?
IMPORTANT: when I execute the function from the main thread, everything goes fine. How do I deal with this?
FOUND THE ANSWER:
Had to exit the forked processes with:
os._exit(0)
not with
sys.exit()
(os._exit() terminates the child immediately, while sys.exit() only raises SystemExit and goes through the interpreter's normal shutdown; the os documentation notes that os._exit() should normally only be used in the child process after a fork().)
When using python-daemon, I'm creating subprocesses like so:
import multiprocessing

class Worker(multiprocessing.Process):
    def __init__(self, queue):
        self.queue = queue  # we wait for things from this in Worker.run()
    ...

q = multiprocessing.Queue()

with daemon.DaemonContext():
    for i in xrange(3):
        Worker(q)

    while True:  # let the Workers do their thing
        q.put(_something_we_wait_for())
When I kill the parent daemonic process (i.e. not a Worker) with a Ctrl-C or SIGTERM, etc., the children don't die. How does one kill the kids?
My first thought is to use atexit to kill all the workers, likeso:
with daemon.DaemonContext():
workers = list()
for i in xrange(3):
workers.append(Worker(q))
#atexit.register
def kill_the_children():
for w in workers:
w.terminate()
while True: # let the Workers do their thing
q.put(_something_we_wait_for())
However, the children of daemons are tricky things to handle, and I'd be obliged for thoughts and input on how this ought to be done.
Thank you.
Your options are a bit limited. If doing self.daemon = True in the constructor for the Worker class does not solve your problem, and trying to catch signals in the parent (i.e. SIGTERM, SIGINT) doesn't work, you may have to try the opposite solution: instead of having the parent kill the children, you can have the children commit suicide when the parent dies.
The first step is to give the Worker constructor the PID of the parent process (you can get it with os.getpid()). Then, instead of just doing self.queue.get() in the worker loop, do something like this:
waiting = True
while waiting:
    # see if Parent is at home
    if os.getppid() != self.parentPID:
        # woe is me! My Parent has died!
        sys.exit()  # or whatever you want to do to quit the Worker process
    try:
        # I picked the timeout randomly; use what works
        data = self.queue.get(timeout=0.1)
        waiting = False
    except queue.Empty:
        continue  # try again
# now do stuff with data
The solution above checks to see if the parent PID is different than what it originally was (that is, if the child process was adopted by init or launchd because the parent died) - see reference. However, if that doesn't work for some reason you can replace it with the following function (adapted from here):
def parentIsAlive(self):
    try:
        # try to call Parent
        os.kill(self.parentPID, 0)
    except OSError:
        # *beeep* oh no! The phone's disconnected!
        return False
    else:
        # *ring* Hi mom!
        return True
Now, when the Parent dies (for whatever reason), the child Workers will spontaneously drop like flies - just as you wanted, you daemon! :-D
You should store the parent pid when the child is first created (let's say in self.myppid); when self.myppid is different from os.getppid(), it means that the parent has died.
To avoid checking whether the parent has changed over and over again, you can use PR_SET_PDEATHSIG, which is described in the signals documentation.
5.8 The Linux "parent death" signal
For each process there is a variable pdeath_signal, that is
initialized to 0 after fork() or clone(). It gives the signal that the
process should get when its parent dies.
In this case, since you want your process to die, you can just set it to SIGHUP, like this:
prctl(PR_SET_PDEATHSIG, SIGHUP);
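The standard library has no prctl wrapper, but on Linux you can make the same call from Python through ctypes. A minimal sketch, assuming Linux and that you call it in the child process (PR_SET_PDEATHSIG is the constant 1 from <sys/prctl.h>):

import ctypes
import signal

PR_SET_PDEATHSIG = 1  # value from <sys/prctl.h> on Linux

def set_pdeathsig(sig=signal.SIGHUP):
    # ask the kernel to send `sig` to this process when its parent dies
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.prctl(PR_SET_PDEATHSIG, sig)

Calling set_pdeathsig() at the top of the Worker's run() method would apply it to each child process.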
Atexit won't do the trick -- it only gets run on successful non-signal termination -- see the note near the top of the docs. You need to set up signal handling via one of two means.
The easier-sounding option: set the daemon flag on your worker processes, per http://docs.python.org/library/multiprocessing.html#process-and-exceptions
Somewhat harder-sounding option: PEP-3143 seems to imply there is a built-in way to hook program cleanup needs in python-daemon.
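For reference, a minimal sketch of the easier option, assuming the Worker class and queue q from the question and that the workers are started explicitly:

with daemon.DaemonContext():
    workers = []
    for i in range(3):
        w = Worker(q)
        w.daemon = True   # daemonic children are terminated when the parent process exits
        workers.append(w)
        w.start()

    while True:  # let the Workers do their thing
        q.put(_something_we_wait_for())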