I have a Python server that eventually needs a background process to perform an action.
It creates a child process that should be able to outlive its parent. But it shouldn't create such a child process if one is already running (which can happen if a previous parent process created it).
I can think of a couple of different approaches to solve this problem:
1. Check all currently running processes before creating the new one: Cross-platform way to get PIDs by process name in python
2. Write a file when the child process starts, and delete it when it's done. Check for the file before creating a child process.
But neither of them seems to fit my needs perfectly. Solution (1) doesn't work well if the child process is a fork of its parent. Solution (2) is ugly; it looks prone to failure.
It would be great if I could assign a fixed PID or name at process creation, so I could always look the process up in the system in a fixed way and be certain whether it is running or not. But I haven't found a way to do this.
"It creates a child process that should be able to last more than its parent." Don't.
Have a longer-lived service process create the child for you. Talk to this service over a Unix domain socket. It can then be used to pass file descriptors to the child. The service can also trivially ensure that it only ever has a single child.
This is the pattern that can be used to eliminate the need for children that outlive their parents.
Using command names makes it trivial to mount a DoS by just creating a process with the same name that does nothing. Using PID files is ambiguous due to PID reuse. Only a supervisor that waits on its children can properly restart them when they exit, or ensure that they are actually running.
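A minimal sketch of such a service, assuming a Unix domain socket at a made-up path and a made-up worker script:

import os
import socket
import subprocess
import sys

SOCK_PATH = "/tmp/job_service.sock"   # hypothetical socket path
WORKER = "worker.py"                  # hypothetical child script

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)              # remove a stale socket from a previous run

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(1)

child = None
while True:
    conn, _ = server.accept()
    with conn:
        if conn.recv(64).strip() == b"start":
            # poll() reaps the child if it already exited and
            # returns None while it is still running
            if child is not None and child.poll() is None:
                conn.sendall(b"already running\n")
            else:
                child = subprocess.Popen([sys.executable, WORKER])
                conn.sendall(b"started\n")

Because the service itself spawns and waits on the child, it always knows whether one is running; no PID files or name scans are needed.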
Related
I have a Python Django manage command that should be called upon receiving an input file, but this command is not safe for parallel calls. So an input file should be processed only when no other file is being processed.
One solution that I have is to use a lock file. Basically, create a lock file at the start of the process and delete it at the end.
I'm worried that if the process crashes the lock file won't be deleted and consequently none of the other files would be processed until we manually remove that lock file.
The solution doesn't need to be specific to Django or even Python, but what is the best practice for enforcing that only one instance of this process is running?
As KlausD mentions in his comment, the canonical (and language-agnostic) solution is to use a lock file containing the PID of the running process, so the code responsible for lock acquisition can check whether that process is still running.
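A sketch of that pattern (the lock file path is illustrative); os.open() with O_EXCL makes the creation atomic, and signal 0 checks for existence without actually signalling:

import os

LOCKFILE = "/tmp/import_job.pid"      # illustrative path

def acquire_lock():
    try:
        # O_EXCL: creation fails atomically if the file already exists
        fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(LOCKFILE) as f:
            content = f.read().strip()
        try:
            os.kill(int(content), 0)   # signal 0: existence check only
            return False               # the holder is still alive
        except (ValueError, ProcessLookupError):
            os.unlink(LOCKFILE)        # stale lock from a crashed run
            return acquire_lock()      # small race here; fine for one host
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True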
An alternative solution, if you use redis in your project, is to store the lock in redis with a TTL that's a bit longer than the worst-case runtime of the task. This makes sure the lock will be freed no matter what, and also makes it easy to share the lock between multiple servers if needed.
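A sketch of the redis variant; the key name, TTL and run_import() are placeholders:

import os
import redis

r = redis.Redis()
LOCK_KEY = "input-file-lock"           # hypothetical key name

def process_if_free(ttl=3600):
    # nx=True: set only if the key doesn't exist, so there is one winner;
    # ex=ttl: redis deletes the key itself after ttl seconds, so a
    # crashed worker can never hold the lock forever
    if not r.set(LOCK_KEY, os.getpid(), nx=True, ex=ttl):
        return False                   # another instance is processing
    try:
        run_import()                   # hypothetical: the actual work
    finally:
        r.delete(LOCK_KEY)             # release early on completion
    return True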
EDIT:
is it possible that the process crashes and another process picks up the same pid?
Yes, of course, and it's even rather likely (and that's an understatement) on a server running for months without a reboot, and even more so if the server runs a lot of short-lived processes. You will not only have to check whether there's a running process matching this PID, but also inspect the process stats (start time, command line, parent, etc.) and decide how likely it is that it's the same process rather than a new one.
Note that this is nothing new - most process monitoring tools face the same problem, so you may want to check how they solved it (gunicorn might be a good starting point here).
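For illustration, such a check with psutil could look like this, assuming the lock file stores the creation time and command line alongside the PID:

import psutil

def same_process(pid, recorded_create_time, recorded_cmdline):
    # a PID alone is ambiguous; compare it with the stats recorded
    # by the original lock holder
    try:
        p = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return False
    return (abs(p.create_time() - recorded_create_time) < 1.0
            and p.cmdline() == recorded_cmdline)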
I am writing a web application in Python (Django) that will execute tasks/processes on the side, typically network scans. I would like the user to be able to terminate a scan and view its status or results in real time.
I thought one of the best ways to do this would be to have a job manager daemon that is a stand-alone process, which:
Accepts new jobs via a TCP connection.
Accepts user-commands, typically to terminate or restart a process.
Reports on the status of a job.
I am struggling with the structure of this code. I am thinking that a TCP port on the daemon process will accept new jobs. It will then call os.fork(), and that child will call os.fork() again. The second child will perform an os.execv() for nmap. The first child will monitor the second (how?) and, when it completes, report back to the master daemon that it has ended. The first child must also be able to terminate the second.
How does that sound? Has a structure like this already been built? I would hate to reinvent the wheel.
Finally, how would the first child know that the second child, the one running os.execv(), has terminated, or whether it's still running? I would hate to continuously poll a list of processes.
And as I've said, this must be done in Python.
I opted for a fork-based approach. This approach is "wrong", but it works and fulfills my needs.
https://gist.github.com/FarhansCode/a0f27469142b6afaa6c2
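For reference, a minimal sketch of the fork/exec/wait cycle described in the question (the nmap arguments are illustrative); installing a handler for signal.SIGCHLD is the usual way to avoid the polling loop entirely:

import os
import time

pid = os.fork()
if pid == 0:
    # child: replace this process image with the scanner
    os.execvp("nmap", ["nmap", "-sT", "127.0.0.1"])

# parent: WNOHANG polls without blocking; waitpid() also reaps the
# exited child, so it never lingers as a zombie
while True:
    done, status = os.waitpid(pid, os.WNOHANG)
    if done:
        print("scan ended with status", status)
        break
    time.sleep(0.5)       # to kill early: os.kill(pid, signal.SIGTERM)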
How do I correctly fork a child process in Twisted that does not use anything from Twisted (but uses data from the parent process), e.g. to process a “snapshot” of some data from the parent process and write it to a file, without blocking?
It seems if I do anything like clean shutdown in the child process after os.fork(), it closes some of the sockets / descriptors in the parent process; the only way to avoid that that I see is to do os.kill(os.getpid(), signal.SIGKILL), which does seem like a bad idea (though not directly problematic).
(Additionally, if a dict is changed in the parent process, can it change in the child process too? A quick test shows that it doesn't. OS/kernels are Debian stable / sid.)
IReactorProcess.spawnProcess (usually available as from twisted.internet import reactor; reactor.spawnProcess) can spawn a process running any available executable on your system. The subprocess does not need to use Twisted, or, indeed, even be in Python.
Do not call os.fork yourself. As you've discovered, it has lots of very peculiar interactions with process state, that spawnProcess will manage for you.
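A minimal sketch of that pattern; writer.py is a made-up stand-alone script that reads stdin and writes it to a file, needing nothing from Twisted:

from twisted.internet import reactor
from twisted.internet.protocol import ProcessProtocol

class SnapshotWriter(ProcessProtocol):
    def __init__(self, snapshot):
        self.snapshot = snapshot

    def connectionMade(self):
        # hand the parent's data to the child over stdin, then close it
        self.transport.write(self.snapshot)
        self.transport.closeStdin()

    def processEnded(self, reason):
        print("child is done:", reason.value)
        reactor.stop()

reactor.spawnProcess(SnapshotWriter(b"snapshot bytes"),
                     "python", ["python", "writer.py"])
reactor.run()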
Among the problems with os.fork are:
Forking copies your current process state, but doesn't copy the state of threads. This means that any thread in the middle of modifying some global state will leave things half-broken, possibly holding some locks which will never be released. Don't run any threads in your application? Have you audited every library you use, every one of its dependencies, to ensure that none of them have ever or will ever use a background thread for anything?
You might think you're only touching certain areas of your application memory, but thanks to Python's reference counting, any object which you even peripherally look at (or is present on the stack) may have reference counts being incremented or decremented. Incrementing or decrementing a refcount is a write operation, which means that whole page (not just that one object) gets copied back into your process. So forked processes in Python tend to accumulate a much larger copied set than, say, forked C programs.
Many libraries, famously all of the libraries that make up the systems on macOS and iOS, cannot handle fork() correctly and will simply crash your program if you attempt to use them after fork but before exec.
There's a flag for telling file descriptors to close on exec - but no such flag to have them close on fork. So any files (including log files, and again, any background temp files opened by libraries you might not even be aware of) can get silently corrupted or truncated if you don't manage access to them carefully.
I have a problem with creating a parallel program using multiprocessing. AFAIK, when I start a new process using this module, I should call os.wait() or childProcess.join() to get its exit status. But placing those calls in my program can end up blocking the main process if something happens to the child process and it hangs.
The problem is that if I don't do that, the child processes become zombies (listed as something like "python <defunct>" in the top listing).
Is there any way to avoid waiting for child processes to end, avoid creating zombie processes, and/or keep the main process from having to care so much about its children?
Though ars' answer should solve your immediate issues, you might consider looking at celery: http://ask.github.com/celery/index.html. It's a relatively developer-friendly approach to accomplishing these goals and more.
You may have to provide more information or actual code to figure this out. Have you been through the documentation, in particular the sections labeled "Warning"? For example, you may be facing something like this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
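If the immediate concern is just reaping finished children without a blocking join(), one documented side effect worth knowing is that multiprocessing.active_children() join()s every child that has already exited:

import multiprocessing
import time

def work():
    time.sleep(1)          # stand-in for the real task

if __name__ == "__main__":
    for _ in range(4):
        multiprocessing.Process(target=work).start()

    # active_children() reaps exited children as a side effect and
    # returns immediately, so no call here ever blocks
    while multiprocessing.active_children():
        time.sleep(0.5)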
Is there a way for a child process in Python to detect if the parent process has died?
If your Python process is running under Linux, and the prctl() system call is exposed, you can use the answer here.
This can cause a signal to be sent to the child when the parent process dies.
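A minimal ctypes sketch of that approach (Linux only); the child would run this right after the fork:

import ctypes
import os
import signal

# PR_SET_PDEATHSIG (option 1) asks the kernel to send this process
# the given signal when its parent dies
libc = ctypes.CDLL("libc.so.6", use_errno=True)
PR_SET_PDEATHSIG = 1
libc.prctl(PR_SET_PDEATHSIG, signal.SIGTERM)

# race: if the parent died before the call above, no signal will
# come, so check once explicitly (orphans are reparented to PID 1)
if os.getppid() == 1:
    raise SystemExit("parent already gone")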
Assuming the parent is alive when you start, you can check whether it is still alive in a busy loop using psutil:
import psutil, os, time

ppid = os.getppid()               # remember the parent's PID at startup
me = psutil.Process(os.getpid())
while True:
    parent = me.parent()
    # on Linux an orphan is reparented (usually to init, PID 1),
    # so compare against the PID recorded at startup as well
    if parent is not None and parent.pid == ppid:
        # still alive
        time.sleep(0.1)
        continue
    else:
        print("my parent is gone")
        break
Not very nice but...
The only reliable way I know of is to create a pipe specifically for this purpose. The child has to repeatedly attempt to read from the pipe, preferably in a non-blocking fashion or using select. It will see end-of-file once the write end is closed, which the kernel does automatically when the parent dies.
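A small sketch of the pipe trick; the sleep stands in for the parent's real work:

import os
import select
import time

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # child: close the write end, leaving the parent as the only writer
    os.close(w)
    # select() marks the read end readable once every write end is
    # closed, which the kernel does automatically when the parent exits
    select.select([r], [], [])
    if os.read(r, 1) == b"":       # EOF: the parent is gone
        print("parent died")
    os._exit(0)
else:
    os.close(r)
    # the parent never writes to w; it just holds it open so that its
    # exit, even by crash, closes the pipe and wakes the child
    time.sleep(3)                  # stand-in for the parent's real work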
You might get away with reading your parent process' ID very early in your process and then checking it periodically, but of course that is prone to race conditions: the parent that spawned you might have died immediately, even before your process got to execute its first instruction.
Unless you have a way of verifying if a given PID refers to the "expected" parent, I think it's hard to do reliably.