Note: this question is not the same as Python Subprocess.Popen from a thread, because that question didn't ask for an explanation of why it is OK.
If I understand correctly, subprocess.Popen() creates a new process by forking the current process and then calling execv to run the new program.
However, if the current process is multithreaded and we call subprocess.Popen() in one of the threads, won't it duplicate all the threads in the current process (because it calls the fork() syscall)? If so, even though these duplicated threads are wiped out by the execv syscall, there is a time gap between fork and exec in which the duplicated threads could do a bunch of nasty stuff.
A case in point is gtest_parallel.py, where the program creates a bunch of threads in execute_tasks(), and in each thread task_manager.run_task(task) calls task.run(), which calls subprocess.Popen() to run a task. Is that OK?
The question applies to other fork-in-thread programs, not just Python.
Forking only results in the calling thread being active in the fork, not all threads. Most of the pitfalls of forking a multi-threaded program relate to mutexes held by other threads that will never be released in the child. When you're using Popen, you're going to launch some unrelated process once you execv, so that's not really a concern. There is a warning in the Popen docs about being careful with multiple threads and the preexec_fn parameter, which runs before the execv call happens:
Warning: The preexec_fn parameter is not safe to use in the presence of threads in your application. The child process could deadlock before exec is called. If you must use it, keep it trivial! Minimize the number of libraries you call into.
I'm not aware of any other pitfalls to watch out for with Popen, at least in recent versions of Python. Python 2.7's subprocess module does seem to have flaws that can cause issues with multi-threaded applications, however.
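A quick way to see the "only the calling thread survives" behavior for yourself is a minimal sketch like the following (not from the original answer; it assumes a Unix system where os.fork() is available):

import os
import threading
import time

def background():
    while True:
        time.sleep(1)

t = threading.Thread(target=background)
t.daemon = True
t.start()

pid = os.fork()
if pid == 0:
    # Child: only the thread that called fork() exists here.
    print("child sees %d thread(s)" % threading.active_count())   # typically 1
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print("parent sees %d thread(s)" % threading.active_count())  # typically 2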
Related
I'm using OpenERP, a Python-based ERP, which uses different threads (one thread per client, etc.). I would like to use multiprocessing.Process() to fork() and call a long-running method.
My question is: what will happen to the parent's threads? Will they be copied and continue to run? Will the child process call accept() on the server socket?
Thanks for your answers,
Forking does not copy threads; only the calling one survives in the child. So be very careful with forking a multithreaded application, as it can cause unpredictable side effects (e.g. if the fork happened while some other thread was executing inside a mutex-protected critical section); something really can be broken in your forked process unless you know the code you're forking very well.
Everything said above is true, but there is a workaround (at least on Linux) called pthread_atfork(), which lets you register callbacks that run around a fork (so you can, for example, recreate needed threads or reset locks). However, it applies to C applications and is not exposed to Python ones.
For further information you can refer to:
the Python issue tracker entry on this problem: http://bugs.python.org/issue6923
implementations of similar ideas around the web, for example: http://code.google.com/p/python-atfork/
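For newer code, a hedged sketch of the same idea at the Python level: os.register_at_fork() (added in Python 3.7, so not available to the original poster) lets you take a lock before fork and release it on both sides, which is the classic atfork pattern:

import os
import threading

shared_lock = threading.Lock()   # a lock that forked children also rely on

def before_fork():
    shared_lock.acquire()        # ensure no other thread holds it mid-fork

def after_fork_parent():
    shared_lock.release()

def after_fork_child():
    shared_lock.release()        # child starts with the lock in a known state

os.register_at_fork(before=before_fork,
                    after_in_parent=after_fork_parent,
                    after_in_child=after_fork_child)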
I start a thread using the following code.
t = thread.start_new_thread(myfunction, ())
How can I kill the thread t from another thread. So basically speaking in terms of code, I want to be able to do something like this.
t.kill()
Note that I'm using Python 2.4.
In Python, you simply cannot kill a Thread.
If you do NOT really need to have a Thread (!), then instead of using the threading package (http://docs.python.org/2/library/threading.html) you can use the multiprocessing package (http://docs.python.org/2/library/multiprocessing.html). There, to kill a process, you can simply call:
yourProcess.terminate() # kill the process!
Python will kill your process (on Unix through the SIGTERM signal, on Windows through the TerminateProcess() call). Be careful when using it together with a Queue or a Pipe, as it may corrupt the data in the Queue/Pipe.
Note that multiprocessing.Event and multiprocessing.Semaphore work in exactly the same way as threading.Event and threading.Semaphore, respectively; in fact, the former are clones of the latter.
If you REALLY need to use a Thread, there is no way to kill it directly. What you can do, however, is use a "daemon thread". In fact, in Python, a Thread can be flagged as a daemon:
yourThread.daemon = True # set the Thread as a "daemon thread"
The main program will exit when no non-daemon threads are left alive. In other words, when your main thread (which is, of course, a non-daemon thread) finishes its operations, the program will exit even if some daemon threads are still working.
Note that it is necessary to set a Thread as a daemon before the start() method is called!
Of course you can, and should, use the daemon flag with multiprocessing as well. Here, when the main process exits, it attempts to terminate all of its daemonic child processes.
Finally, please note that sys.exit() and os.kill() are not viable choices here.
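To make the two options concrete, here is a minimal sketch (worker is a hypothetical function; note that multiprocessing itself requires Python 2.6+, and spawn-based platforms also want the usual if __name__ == '__main__' guard used below):

import multiprocessing
import threading
import time

def worker():
    while True:
        time.sleep(1)

if __name__ == '__main__':
    # Option 1: a process really can be killed.
    p = multiprocessing.Process(target=worker)
    p.start()
    p.terminate()      # SIGTERM on Unix, TerminateProcess() on Windows
    p.join()

    # Option 2: a thread cannot be killed, but a daemon thread will not keep
    # the program alive once the main (non-daemon) thread finishes.
    t = threading.Thread(target=worker)
    t.daemon = True    # must be set before start()
    t.start()
    # When the main thread ends, the daemon thread is abandoned at interpreter exit.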
If your thread is busy executing Python code, you have a bigger problem than the inability to kill it. The GIL will prevent any other thread from even running whatever instructions you would use to do the killing. (After a bit of research, I've learned that the interpreter periodically releases the GIL, so the preceding statement is bogus. The remaining comment stands, however.)
Your thread must be written in a cooperative manner. That is, it must periodically check in with a signalling object such as a semaphore, which the main thread can use to instruct the worker thread to voluntarily exit.
while not sema.acquire(False):
    # Do a small portion of work…
or:
for item in work:
    # Keep working…
    # Somewhere deep in the bowels…
    if sema.acquire(False):
        thread.exit()
You can't kill a thread from another thread. You need to signal to the other thread that it should end. And by "signal" I don't mean use the signal function, I mean that you have to arrange for some communication between the threads.
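For example, here is a minimal sketch of such an arrangement using threading.Event (assuming the modern threading API rather than the old thread module the question uses):

import threading
import time

stop_requested = threading.Event()

def worker():
    while not stop_requested.is_set():
        time.sleep(0.1)           # do one small unit of work per iteration
    print("worker exiting cleanly")

t = threading.Thread(target=worker)
t.start()

time.sleep(1)
stop_requested.set()              # ask the worker to finish
t.join()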
Having worked out, painfully, that there is a race hazard in a multi-threaded program between opening a file and setting its 'close on exec' bit in one thread, and calling subprocess.Popen in another thread (which can result in unexpected handles being passed to the second child), it seems to me I need to protect this access with a lock (I know closing all the handles is possible from subprocess.Popen, but that might be overkill).
Is that going to be safe? The subprocess is going to exec a shell immediately but I'm not sure how python threading locks behave in that sort of situation.
PS: I know Linux has a 'close on exec' flag for open, but I'm not running on Linux, and in any case the Python tempfile module (or at least the 2.6 one) doesn't use that facility.
Ideally of course, python would deal with that nastiness itself, but I can't find anything suggesting it might.
It sounds quite safe. If you do
with my_exec_lock:
    open_file()
    set_coe()
in one thread and
with my_exec_lock:
    popen()
in the other, you should be safe.
But be aware that this way, the first thread might be blocked until popen() is finished.
Maybe one of the other Threading mechanisms could be more appropriate.
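A fuller sketch of that locking pattern, filling in the answer's placeholders (my_exec_lock, open_file, set_coe); the fcntl part assumes a POSIX system, which the original poster may not have:

import fcntl
import subprocess
import threading

my_exec_lock = threading.Lock()

def open_with_cloexec(path):
    with my_exec_lock:
        f = open(path)
        flags = fcntl.fcntl(f.fileno(), fcntl.F_GETFD)
        fcntl.fcntl(f.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return f

def run_child(cmd):
    with my_exec_lock:
        return subprocess.Popen(cmd)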
I want to create some worker processes and, if they crash due to an exception, I would like them to respawn. Aside from the is_alive method in the multiprocessing module, I can't seem to find a way to do this.
This would require me to iterate over all the running processes (after a sleep) and check whether they are alive. This is essentially a busy loop; I was wondering if there is a better solution that will wake up my program in the event that any one of my worker processes has crashed. Once it wakes up, I would like to log the exception that crashed my worker and launch another process.
Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often.
The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python
You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.
If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.
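A minimal polling sketch along those lines (worker_main is a hypothetical worker function and the poll interval is arbitrary; the if __name__ == '__main__' guard is needed on spawn-based platforms):

import multiprocessing
import time

def worker_main():
    pass                            # real work goes here

def spawn():
    p = multiprocessing.Process(target=worker_main)
    p.start()
    return p

if __name__ == '__main__':
    workers = [spawn() for _ in range(4)]
    while True:
        time.sleep(5)               # low-overhead periodic check
        for i, p in enumerate(workers):
            if not p.is_alive():
                print("worker %d exited with code %s; respawning" % (i, p.exitcode))
                workers[i] = spawn()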
I'm using the Jython 2.5.1 implementation of Python to write a script that repeatedly invokes another process via subprocess.Popen and uses PIPE to pipe stdout and stderr to the parent process and stdin to the child process. After several hundred loop iterations, I seem to run out of file descriptors.
The Python subprocess documentation mentions very little about freeing file descriptors, other than the close_fds option, which isn't described very clearly (Why should there be any file descriptors besides 0, 1 and 2 open in the first place?). I'm assuming that in CPython, reference counting takes care of the resource freeing issue. What's the proper way to make sure all descriptors get freed when one is done with a Popen object in Jython?
Edit: Just in case it makes a difference, this is a multithreaded program, so there are several Popen processes running simultaneously.
This only answers part of your question, but my understanding is that, when you spawn a new process, it normally inherits all the handles of the parent process. That includes such things as open files and sockets that you're listening on.
On UNIX, that's a side-effect of using 'fork', which duplicates the current process and all of its handles before loading the new executable. On Windows it's more explicit, but Python does it anyway, to try to match the behavior across platforms as much as possible.
The close_fds option, when True, closes all these inherited handles after spawning the subprocess, so the new executable starts with a clean slate. But if your subprocesses are run one at a time, and terminating when they're done, then this shouldn't be the problem.
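Since Jython does not use reference counting, the safest approach is to clean up each Popen explicitly rather than relying on garbage collection. A hedged sketch of that (close_fds=True assumes a POSIX-style platform; on Python 2 it cannot be combined with redirected handles on Windows):

import subprocess

def run_once(cmd, input_data):
    p = subprocess.Popen(cmd,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         close_fds=True)       # don't pass unrelated descriptors along
    out, err = p.communicate(input_data)       # reads the output and closes the pipes
    return p.returncode, out, err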