How do I know if my child process has hung while operating?
Well, how do you tell the difference between a stuck process and a process that just takes longer than usual to complete? The short answer is: you can't reliably detect that your child process is stuck.
I would say that to be able to detect this you need some kind of continuous communication with the process (e.g. look at log files, IPC or similar). Based on this communication you might be able to tell when and if a process is stuck.
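As a minimal sketch of that idea (assuming you can modify the child to report progress; worker and heartbeat_q are made-up names for illustration), the parent can treat a long silence on a heartbeat queue as "probably hung":

import time
from multiprocessing import Process, Queue
from queue import Empty

def worker(heartbeat_q):
    # stands in for a real long-running job that reports progress periodically
    for chunk in range(100):
        time.sleep(0.5)            # pretend to process one chunk of work
        heartbeat_q.put(chunk)     # heartbeat: "still making progress"

if __name__ == '__main__':
    heartbeat_q = Queue()
    p = Process(target=worker, args=(heartbeat_q,))
    p.start()
    while p.is_alive():
        try:
            heartbeat_q.get(timeout=10)   # expect a heartbeat at least every 10 seconds
        except Empty:
            print('no progress for 10 seconds - treating the child as hung')
            p.terminate()
            break
    p.join()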
I guess you are asking how to find out whether the child process has hung while operating. You can't tell easily. A process could simply be doing a long-running operation; context is important for deciding whether a process is hung.
If you expect a process to respond to user input and it is unresponsive for a long period, then we consider it hung. The process is still running, probably waiting for something that will never happen. "Hung process" is a human way of saying that a program has reached a dead end and will be of no more use.
You could have a program calculating prime numbers one after another; it can run for eons and still not be called a hung process.
Related
I am facing a pretty odd issue. I have multiprocess Python code that processes some data in parallel. I split the data into 8 parts and work on each split individually using a Process class, then do a join on each Process.
I just noticed that when I process a large amount of data, one of the processes... disappears. It doesn't error out or raise an exception; it just goes missing. What is even more interesting is that the join() on that process seems to complete successfully, even though I know for a fact it did not finish.
tn1_processes = []
for i in range(8):
    tn1_processes.append(
        MyCustomProcess(logger=self.logger, i=i,
                        shared_queue=shared_queue))
    tn1_processes[-1].start()
for tn1_processor in tn1_processes:
    tn1_processor.join()
print('Done')
What do I know for sure:
All the Processes start, process data, and get about halfway; I know this because I have logs showing all the Processes doing their work.
Then Process 1 disappears from the logs towards the end of its job, while all the other ones keep working fine and complete. My code then moves on, thinking all the Processes are complete after the joins (I demonstrate this with a print), yet I know for a fact that one of the processes did not complete. It did not error out, and for some strange reason it passed the join()?
The only thing I can think of is that the Process runs out of memory, but I would expect it to error out or throw an exception if that happened. It has actually happened to me before with the same code, and I saw the exception in my logs and the code was able to handle it and see that the Process failed. But this, with no error or anything, is strange.
Can anyone shed some light?
Using Python 3.4.
If I remember correctly, when a process terminates abruptly it doesn't raise an error in the parent; you need a separate queue for storing thrown exceptions and handle them elsewhere.
When a process ends, however, an exit code is available: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.exitcode
A rudimentary check would be to make sure all of them exited safely: an exit code of 0 means success, a negative value means the process was killed by a signal, and None means it is still running.
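For the code in the question, a hedged sketch of that check might look like this (assuming tn1_processes is the list of started processes from the question, inspected after the joins):

for i, p in enumerate(tn1_processes):
    if p.exitcode is None:
        print('process %d is still running' % i)
    elif p.exitcode == 0:
        print('process %d finished normally' % i)
    elif p.exitcode < 0:
        # a negative exit code means the process was killed by a signal,
        # e.g. -9 (SIGKILL) is typical when the kernel's OOM killer steps in
        print('process %d was killed by signal %d' % (i, -p.exitcode))
    else:
        print('process %d exited with error code %d' % (i, p.exitcode))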
The issue was that Python was running out of memory. The only way I knew this was that I monitored the machine's memory usage while the code was running; it needed more space than was available, so one of the processes was simply killed with no errors or exceptions. #j4hangir's answer about how to avoid this is good: I need to check the exit code. I haven't tested this yet, but I will and then update.
I'm writing a program in which I want to evaluate a piece of code asynchronously. I want it to be isolated from the main thread so that it can raise an error, enter an infinite loop, or just about anything else without disrupting the main program. I was hoping to use threading.Thread, but this has a major problem; I can't figure out how to stop it. I have tried Thread._stop(), but that frequently doesn't work. I end up with a thread that I can't control hogging both interpreter time and CPU power. The code in the thread doesn't open any files or do anything else that would cause problems if I hard-killed it.
Python's multiprocessing.Process.terminate() does this really well; unfortunately, initiating a process on Windows takes nearly a second, which is long enough to cause annoying delays in my GUI.
Does anyone know either a: how to kill a Python thread (I don't think I care how dirty the exit is), or b: how to speed up starting a process?
A third possibility would be a third-party library that provides an alternative method for asynchronous execution, but I've never heard of any such thing.
In my case, the best way to do this seems to be to maintain a running worker process, and send the code to it on an as-needed basis. If the process acts up, I kill it and then start a new one immediately to avoid any delay the next time.
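A rough sketch of that pattern (WorkerHandle and worker_loop are my own illustrative names, not from any library): the worker process is started once up front, so submitting new code to it is fast, and a hung worker is terminated and replaced immediately.

from multiprocessing import Process, Queue
from queue import Empty

def worker_loop(task_q, result_q):
    # long-lived worker: evaluate whatever expression the main program sends
    while True:
        code = task_q.get()
        try:
            result_q.put(('ok', eval(code)))
        except Exception as exc:
            result_q.put(('error', repr(exc)))

class WorkerHandle(object):
    def __init__(self):
        self._spawn()

    def _spawn(self):
        self.task_q, self.result_q = Queue(), Queue()
        self.proc = Process(target=worker_loop,
                            args=(self.task_q, self.result_q))
        self.proc.daemon = True     # worker dies with the main program
        self.proc.start()           # pay the slow Windows startup cost once, up front

    def run(self, code, timeout=5):
        self.task_q.put(code)
        try:
            return self.result_q.get(timeout=timeout)
        except Empty:
            # hung (or just too slow): kill it and respawn now,
            # so the next call doesn't pay the startup delay
            self.proc.terminate()
            self._spawn()
            return ('timeout', None)

With this, something like w.run('sum(range(10))') returns quickly, while an expression that never finishes trips the timeout and the worker is quietly replaced behind the scenes.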
I am using requests to pull some files. I have noticed that the program seems to hang after some large number of iterations that varies from 5K to 20K. I can tell it is hanging because the folder where the results are stored has not changed in several hours. I have been trying to interrupt the process (I am using IDLE) by hitting CTRL + C to no avail. I would like to interrupt instead of killing the process because restart is easier. I have finally had to kill the process. I restart and it runs fine again until I have the same symptoms. I would like to figure out how to diagnose the problem but since I am having to kill everything I have no idea where to start.
Is there an alternate way to view what is going on or to more robustly interrupt the process?
I have been assuming that if I can interrupt without killing I can look at globals and or do some other mucking around to figure out where my code is hanging.
In case it's not too late: I've just faced the same problems and have some tips.
First thing: in Python, most waiting APIs are not interruptible (e.g. Thread.join(), Lock.acquire(), ...).
Have a look at these pages for more information:
http://snakesthatbite.blogspot.fr/2010/09/cpython-threading-interrupting.html
http://docs.python.org/2/library/thread.html
So if a thread is waiting on such a call, it cannot be stopped.
There is another thing to know: if a normal (non-daemon) thread is running (or hung), the main program will not exit until all threads have stopped or the process is killed.
To avoid that, you can make the thread a daemon thread: set Thread.daemon = True before calling Thread.start(). See the small illustration below.
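A tiny sketch of the difference (the stuck() function just simulates a hung thread):

import threading, time

def stuck():
    while True:        # stands in for a thread hung on an uninterruptible wait
        time.sleep(1)

t = threading.Thread(target=stuck)
t.daemon = True        # must be set before start()
t.start()
# The interpreter can now exit even though t never finishes;
# without daemon=True, this program would hang forever on shutdown.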
Second thing: to find where your program is hung, you can launch it under a debugger, but I prefer logging, because logs are always there in case it's too late to debug.
Try logging before and after each waiting call to see how long your threads have been hung. For high-quality logs, use Python's logging module configured with a file handler, an HTML handler, or even better a syslog handler.
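For example (assuming t is the thread doing the real work), you can bound each wait and log around it, so the log file shows exactly where things stall:

import logging

logging.basicConfig(filename='workers.log', level=logging.INFO,
                    format='%(asctime)s %(threadName)s %(message)s')

logging.info('before join on worker thread')
t.join(timeout=60)                 # bounded wait instead of blocking forever
if t.is_alive():
    logging.warning('worker still running after 60 seconds - probably hung')
else:
    logging.info('worker joined cleanly')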
I want to create some worker processes and if they crash due to an exception, I would like them to respawn. Aside from the is_alive method in the multiprocessing module, I can't seem to find a way to do this.
This would require me to iterate over all the running processes (after a sleep) and check whether they are alive. This is essentially a busy loop; I was wondering if there is a better solution that will wake up my program in the event that any one of my worker processes has crashed. Once it wakes up, I would like to log the exception that crashed the worker and launch another process.
Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often.
The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python
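A hedged sketch of that polling loop (do_work is a stand-in for your real worker function):

import time
from multiprocessing import Process

def make_worker(i):
    p = Process(target=do_work, args=(i,))   # do_work is your actual worker function
    p.start()
    return p

workers = dict((i, make_worker(i)) for i in range(4))

while workers:
    time.sleep(5)                             # cheap check; no need to poll more often
    for i, p in list(workers.items()):
        if not p.is_alive():
            p.join()
            if p.exitcode != 0:               # crashed: log it and respawn
                print('worker %d died with exit code %s, restarting' % (i, p.exitcode))
                workers[i] = make_worker(i)
            else:
                del workers[i]                # finished normally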
You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.
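Something along these lines (supervised_worker and do_work are illustrative names, not a library API):

import traceback
from multiprocessing import Process, Pipe

def supervised_worker(conn):
    try:
        do_work()                              # your real worker body
    except Exception:
        conn.send(traceback.format_exc())      # report the failure before dying
        raise

parent_conn, child_conn = Pipe()
p = Process(target=supervised_worker, args=(child_conn,))
p.start()
p.join()
if parent_conn.poll():
    print('worker crashed:')
    print(parent_conn.recv())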
If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.
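On Linux/macOS a sketch might look like the following; note that I'm using plain os.fork() children here rather than multiprocessing, because reaping children yourself inside a SIGCHLD handler can interfere with multiprocessing's own bookkeeping:

import os, signal, time

def on_sigchld(signum, frame):
    # reap every finished child without blocking
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return
        if pid == 0:
            return
        if os.WIFSIGNALED(status):
            print('child %d killed by signal %d' % (pid, os.WTERMSIG(status)))
        else:
            print('child %d exited with code %d' % (pid, os.WEXITSTATUS(status)))

signal.signal(signal.SIGCHLD, on_sigchld)     # Unix only

if os.fork() == 0:      # child: pretend to crash
    time.sleep(1)
    os._exit(3)

signal.pause()          # parent sleeps until the SIGCHLD arrives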
I have a problem creating a parallel program using multiprocessing. AFAIK, when I start a new process using this module I should call os.wait() or childProcess.join() to get its exit status. But placing those calls in my program can end up blocking the main process if something happens to a child process (i.e. the child process hangs).
The problem is that if I don't do that, the child processes become zombies (and are listed as something like "python <defunct>" in the top listing).
Is there any way to avoid waiting for child processes to end, avoid creating zombie processes, and/or not bother the main process so much about its child processes?
Though ars' answer should solve your immediate issues, you might consider looking at celery: http://ask.github.com/celery/index.html. It's a relatively developer-friendly approach to accomplishing these goals and more.
You may have to provide more information or actual code to figure this out. Have you been through the documentation, in particular the sections labeled "Warning"? For example, you may be facing something like this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
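To make that warning concrete, here is a small sketch of the safe ordering: drain the queue before joining the producer, because the child cannot exit until its buffered items have been flushed to the pipe.

from multiprocessing import Process, Queue

def producer(q):
    for i in range(100000):
        q.put(i)                 # enough items to fill the underlying pipe buffer

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()

    results = []
    while len(results) < 100000:
        results.append(q.get())  # consume everything first...

    p.join()                     # ...then join; joining first could deadlock
    print('received %d items' % len(results))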