This question may be slightly academic, but it is still puzzling.
Assume that on a Windows (8+) machine you have a process (in this case a service), say proc_0. I can send a request to it to run something specific. This is done by proc_0 starting (say) proc_a. proc_a then spawns proc_b, which in turn may spawn proc_c, which again may spawn proc_d.
We'll end up in this process tree:
proc_0
|_ proc_a
   |_ proc_b
      |_ proc_c
         |_ proc_d
Let's say I can only influence what happens in proc_b.
OK, what we want is this: if proc_a dies, every descendant of proc_a should die too.
The problem arises if proc_0 kills proc_a. On Windows this orphans proc_b, and it (and its children) stays alive.
Now, proc_b, the one and only process I can decide what to do in, is actually watching what happens, i.e. if proc_a dies it will kill its children. Fine. proc_b may know about proc_c's child proc_d, so it kills that, and then kills proc_c.
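For concreteness, a minimal sketch of such a watchdog inside proc_b (using the third-party psutil package; the polling interval and cleanup timeout are arbitrary) could look like this:

import time
import psutil

def watch_parent_and_cleanup(poll_interval=1.0):
    # Runs inside proc_b: wait for the parent (proc_a) to disappear,
    # then kill every descendant we can still see, deepest first.
    me = psutil.Process()
    parent = me.parent()
    while parent is not None and parent.is_running():
        time.sleep(poll_interval)
    children = me.children(recursive=True)
    for child in reversed(children):
        try:
            child.kill()
        except psutil.NoSuchProcess:
            pass
    psutil.wait_procs(children, timeout=5)

This covers the normal case, but as the next paragraph shows, it is still racy.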
But here is where it gets theoretical: just before proc_c is actually killed by proc_b, proc_c spawns another child, proc_z. proc_b now has no idea of proc_z, which will be orphaned when proc_c eventually dies (remember, I cannot influence how proc_c behaves, or whether it gets killed the hard way for some other reason). Now the processes look like this:
proc_0
proc_b (orphaned, but I know about it)
proc_z (orphaned, but I don't know about it)
proc_b I can terminate myself, but is there any way to detect that proc_z was actually started by a process I knew about and just killed (i.e., that it should die too)?
OK, I can find the orphaned proc_z, but how can I decide whether it is OK to kill? proc_z will have no living parent process, but it might still carry an old parent PID. Even if that PID matches the one I knew from proc_c, I have no way to tell whether proc_z was actually started by proc_c or by a completely different process that also died, where the OS just happened to reuse the PID of my dead proc_c.
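A heuristic that narrows the PID-reuse window, without closing it completely, is to record a known child's PID and creation time while it is still alive, plus the time at which it was killed; a later candidate whose parent PID matches and whose own creation time falls inside that lifetime was almost certainly spawned by the process I knew. A sketch using the third-party psutil package (the function and parameter names are made up for the example):

import psutil

def snapshot(proc):
    # Remember enough about a known child (e.g. proc_c) to recognise its children later.
    return proc.pid, proc.create_time()

def probably_child_of(candidate, parent_pid, parent_create_time, parent_death_time):
    # ppid() alone is ambiguous because PIDs get reused; comparing creation
    # times restricts the match to processes created during the parent's lifetime.
    if candidate.ppid() != parent_pid:
        return False
    created = candidate.create_time()
    return parent_create_time <= created <= parent_death_time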
To relate this to reality: this is what I see on a Windows Jenkins slave, which is what caused me to insert proc_b in the first place, because cancelling a job leaves subprocesses running. There may be a Jenkins update that addresses this, but the "theoretical" problem - deciding whether or not to kill such an orphan - is still in play, I think.
Related
I have a Python server that eventually needs a background process to perform an action.
It creates a child process that should be able to outlive its parent. But it shouldn't create such a child process if one is already running (which can happen if a previous parent process created it).
I can think of a couple of different approaches to solve this problem:
Check all current running processes before creating the new one: Cross-platform way to get PIDs by process name in python
Write a file when the child process starts, delete it when it's done. Check the file before creating a child process.
But neither of them seems to fit my needs perfectly. Solution (1) doesn't work well if the child process is a fork of its parent. Solution (2) is ugly and looks prone to failure.
It would be great if I could assign a fixed PID or name at process creation, so I could always look the process up in the system in a fixed way and be certain whether it is running or not. But I haven't found a way to do this.
"It creates a child process that should be able to last more than its parent." Don't.
Have a longer-lived service process create the child for you. Talk to this service over a Unix domain socket; that socket can also be used to pass file descriptors to the child. The service can also trivially ensure that it only ever has a single child.
This is the pattern that can be used to eliminate the need for children that outlive their parents.
Using command names makes it trivial to mount a DoS by just creating a do-nothing process with the same name. Using PID files is ambiguous due to PID reuse. Only a supervisor that waits on its children can properly restart them when they exit, or even be sure that they are running.
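As an illustration only (the socket path, worker command, and tiny text protocol are made up for the example), a single-child supervisor listening on a Unix domain socket could be sketched like this:

import os
import socket
import subprocess

SOCKET_PATH = "/tmp/worker_supervisor.sock"   # hypothetical path
WORKER_CMD = ["python", "worker.py"]          # hypothetical worker command

def serve():
    # Remove a stale socket file from a previous run, if any.
    try:
        os.unlink(SOCKET_PATH)
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen(1)
    worker = None  # the single child this supervisor manages
    while True:
        conn, _ = server.accept()
        with conn:
            request = conn.recv(1024).decode().strip()
            if request == "start":
                if worker is not None and worker.poll() is not None:
                    worker = None  # reap a child that already exited
                if worker is None:
                    worker = subprocess.Popen(WORKER_CMD)
                    conn.sendall(b"started\n")
                else:
                    conn.sendall(b"already running\n")
            elif request == "status":
                running = worker is not None and worker.poll() is None
                conn.sendall(b"running\n" if running else b"not running\n")

if __name__ == "__main__":
    serve()

The short-lived clients then connect to SOCKET_PATH and send "start" instead of spawning the worker themselves, so there is exactly one place that owns and waits on the child.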
I was looking around on GitHub when I stumbled across this method called daemonize() in a reverse shell example. source
What I don't quite understand is what it does in this context. Wouldn't running the script from the command line like this: python example.py & achieve the same thing?
daemonize() method source:
import os
import sys

def daemonize():
    pid = os.fork()
    if pid > 0:
        sys.exit(0)  # Exit first parent
    pid = os.fork()
    if pid > 0:
        sys.exit(0)  # Exit second parent
A background process - running python2.7 <file>.py with the shell's & operator - is not the same thing as a true daemon process.
A true daemon process:
Runs in the background. This also happens if you use &, and is where the similarity ends.
Is not in the same process group as the terminal, so when the terminal closes, the daemon does not die with it. This does not happen with & - the process stays in the same process group; it is simply moved to the background.
Properly closes all inherited file descriptors (including input, output, etc.) so that nothing ties it back to the parent. Again, this does not happen with & - it will still write to the terminal.
Ideally should only be killable by SIGKILL, not SIGHUP. Running with & leaves your process killable by SIGHUP.
All of this, however, is pedantry. Few tasks really require you to go to the extreme that these properties require - a background task spawned in a new terminal using screen can usually do the same job, though less efficiently, and you may as well call that a daemon in that it is a long-running background task. The only real difference between that and a true daemon is that the latter simply tries to avoid all avenues of potential death.
The code you saw simply forks the current process. Essentially, it clones the current process, lets the parent exit, and 'acts in the background' by simply being a separate process that does not block the current execution - a bit of an ugly hack, if you ask me, but it works.
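For comparison, a more complete double-fork daemonization sketch (POSIX only; the chdir and umask values are conventional choices, not requirements) that also detaches from the controlling terminal and redirects the standard descriptors looks roughly like this:

import os
import sys

def daemonize():
    if os.fork() > 0:
        sys.exit(0)       # exit the first parent
    os.setsid()           # become session leader, detach from the controlling terminal
    if os.fork() > 0:
        sys.exit(0)       # exit the second parent; the daemon can never reacquire a terminal
    os.chdir("/")         # don't keep any directory pinned
    os.umask(0)
    # Redirect stdin/stdout/stderr to /dev/null so nothing ties the daemon to the terminal.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)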
Have a look at Orphan Processes and Daemon Process. A process without a parent becomes a child of init (pid 1).
When it comes time to shut down a group of processes, say all the children of a bash instance, the OS will send a SIGHUP to the children of that bash. An orphan - whether forced, as in this case, or the result of some accident - won't get that treatment and will stay around longer.
I want to create some worker processes, and if they crash due to an exception, I would like them to respawn. Aside from the is_alive method in the multiprocessing module, I can't seem to find a way to do this.
That would require me to iterate over all the running processes (after a sleep) and check if they are alive. This is essentially a busy loop; I was wondering if there is a better solution that will wake up my program in the event that any one of my worker processes has crashed. Once it wakes up, I would like to log the exception that crashed my program and launch another process.
Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often.
The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python
You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.
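To make the polling approach concrete, here is a minimal sketch (do_work is a placeholder for the real worker body; the worker count and polling interval are arbitrary) that wraps each worker in try/except, reports the traceback over a queue, and respawns dead workers from the main process:

import logging
import queue
import time
import traceback
from multiprocessing import Process, Queue

def worker(errors, worker_id):
    try:
        do_work(worker_id)  # placeholder for the real work
    except Exception:
        errors.put((worker_id, traceback.format_exc()))
        raise

def supervise(num_workers=4, poll_interval=1.0):
    errors = Queue()
    procs = {i: Process(target=worker, args=(errors, i)) for i in range(num_workers)}
    for p in procs.values():
        p.start()
    while True:
        # Log any tracebacks the workers reported before dying.
        try:
            while True:
                worker_id, tb = errors.get_nowait()
                logging.error("worker %s crashed:\n%s", worker_id, tb)
        except queue.Empty:
            pass
        # Respawn anything that is no longer alive.
        for i, p in list(procs.items()):
            if not p.is_alive():
                p.join()
                procs[i] = Process(target=worker, args=(errors, i))
                procs[i].start()
        time.sleep(poll_interval)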
If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.
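On a unix-like system, the signal-based variant could look like this minimal sketch (the respawn logic would go where the print is):

import os
import signal

def reap_children(signum, frame):
    # Reap every child that has already exited; respawn logic could go here.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # the remaining children are still running
        print("child %d exited with status %d" % (pid, status))

signal.signal(signal.SIGCHLD, reap_children)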
How do I know if my child process has hung while operating?
Well, how do you tell the difference between a stuck process and a process that takes longer than usual to complete? The short answer is: No, you can't detect if your child process is stuck.
I would say that to be able to detect this you need some kind of continuous communication with the process (e.g. look at log files, IPC or similar). Based on this communication you might be able to tell when and if a process is stuck.
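One simple form of such communication is a heartbeat over an IPC queue. This is only a sketch (do_one_unit_of_work and the 30-second timeout are placeholders), and it can only detect "no progress reported in time", not why:

import queue
import time
from multiprocessing import Process, Queue

def child(heartbeat):
    while True:
        heartbeat.put(time.time())  # report that we are still making progress
        do_one_unit_of_work()       # placeholder for the real work step

def monitor(timeout=30.0):
    heartbeat = Queue()
    p = Process(target=child, args=(heartbeat,))
    p.start()
    while p.is_alive():
        try:
            heartbeat.get(timeout=timeout)
        except queue.Empty:
            # No heartbeat within `timeout` seconds: treat the child as hung.
            p.terminate()
            p.join()
            break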
I guess you are asking how to find out whether the child process has hung while operating. You can't tell easily. A process could be performing a long-running operation; the context is important for deciding whether a process is hung.
If you expect a process to respond to user input and it is not responsive for a long period, then we consider it hung. The process is still running, but it is probably waiting for something that will never happen. "Hung process" is a human way of saying that a program has reached a dead end and will be of no further use.
You could have a program calculating prime numbers one after another; it could run for eons and could not be called a hung process.
I have a problem with creating a parallel program using multiprocessing. AFAIK, when I start a new process using this module I should call os.wait() or childProcess.join() to get its exit status. But placing these calls in my program can end up blocking the main process if something happens to the child process (and the child process hangs).
The problem is that if I don't do that, the child processes become zombies (listed as something like "python <defunct>" in the top listing).
Is there any way to avoid waiting for the child processes to end, while also avoiding zombie processes and/or not bothering the main process so much about its child processes?
Though ars' answer should solve your immediate issues, you might consider looking at celery: http://ask.github.com/celery/index.html. It's a relatively developer-friendly approach to accomplishing these goals and more.
You may have to provide more information or actual code to figure this out. Have you been through the documentation, in particular the sections labeled "Warning"? For example, you may be facing something like this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
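To illustrate the warning, a minimal sketch of the safe ordering: drain the queue before joining the child, because the child cannot exit while it still has buffered items waiting to be flushed to the pipe (the item count here is simply chosen to be large enough to fill the buffer):

from multiprocessing import Process, Queue

def producer(q, n=100000):
    for i in range(n):
        q.put(i)  # enough items to fill the underlying pipe buffer

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    # Consume everything *before* joining; joining first can deadlock,
    # because the child blocks until its buffered items are flushed.
    results = [q.get() for _ in range(100000)]
    p.join()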