So I wrote a little multithreaded SMTP program. The problem is every time I run it, it freezes the computer shortly after. The script appears to still work, as my network card is still lighting up and the emails are received, but in some cases it will lock up completely and stop sending the emails.
Here's a link to my two script files. The first is the one used to launch the program:
readFile.py
newEmail.py
First, you're using popen, which creates subprocesses, i.e. processes, not threads. I'll assume this is what you meant.
My guess would be that the program gets stuck in a loop where it spawns processes continuously, which the OS will probably dislike. (That kind of thing is known as a fork bomb, which is a good way to freeze Linux unless a process limit has been set with ulimit.) I couldn't find the bug, but if I were you, I'd log a message each time I spawn or kill a subprocess, and if everything looks normal, watch the system closely (ps or top on Unix systems) to see whether the processes are really being killed.
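A minimal sketch of that logging idea, assuming the children are started with subprocess.Popen rather than the popen call in the original scripts. The helper name run_logged and the child command are made up for illustration.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("spawner")


def run_logged(cmd, timeout=30):
    """Start cmd, log its PID, and log again once the child has been reaped."""
    proc = subprocess.Popen(cmd)
    log.info("spawned pid=%d cmd=%r", proc.pid, cmd)
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        returncode = proc.wait()  # reap the child so no zombie is left behind
        log.warning("killed pid=%d after %ss timeout", proc.pid, timeout)
    log.info("reaped pid=%d returncode=%s", proc.pid, returncode)
    return returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the real mail-sending script.
    run_logged([sys.executable, "-c", "print('sending mail')"])
```

If the log shows many more "spawned" lines than "reaped" lines, the processes are piling up.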
I am running some Python code using a SLURM script on a remote server accessed through SSH. At some point, license-related issues on the SLURM platform may occur, raising errors in Python and ending the subprocess. I want to use try-except to make the Python subprocess wait until the issue is fixed, after which it can continue from where it stopped.
What are some smart implementations for that?
My most obvious solution is to keep Python inside a loop when the error occurs and have it read a file every X seconds. When I finally fix the error and want it to continue from where it stopped, I would write something to the file and break the loop. I wonder if there is a smarter way to provide input to the Python subprocess while it is running through the SLURM script.
One idea might be to add a signal handler for signal USR1 to your Python script.
In the signal handler function, you can set a global variable or send a message or set a threading.Event that the main process is waiting on.
Then you can signal the process with:
kill -USR1 <PID>
or with the Python os.kill() equivalent.
Though I do have to agree there is something to be said for the simplicity of your process doing:
touch /tmp/blocked.$$
and your program waiting in a loop with a 1s sleep for that file to be removed. This way you can tell which process id is blocked.
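The marker-file variant could look roughly like this in Python. The function name and the directory parameter are assumptions added for illustration; the file name follows the blocked.<PID> pattern above.

```python
import os
import time


def block_until_marker_removed(directory="/tmp", poll_seconds=1.0):
    """Create <directory>/blocked.<pid> and wait until someone deletes it."""
    marker = os.path.join(directory, "blocked.%d" % os.getpid())
    open(marker, "w").close()  # same effect as: touch /tmp/blocked.$$
    while os.path.exists(marker):
        time.sleep(poll_seconds)
    return marker
```

Once the problem is fixed, removing the file (for example with rm /tmp/blocked.<PID>) lets the program continue.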
I've built a notification system with a Raspberry Pi which checks a database every two minutes, and if any new entries are found, it sends out emails. I have it working two ways:
A Python script starts at boot and runs forever. It has a timer built into the loop. Every two minutes, the DB is checked and emails are sent.
A Python script is set to check the DB and send emails. A cron job is set to run this script every two minutes.
Which would be the better choice, and why?
Your first option, even if you use a sleep, implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting). This strategy uses more CPU/memory than your second option (the cron approach), because your process keeps its memory footprint even while it is actually doing nothing. In the cron approach, on the other hand, your process only exists while it is doing useful work.
Just imagine applying this approach to many programs running on your machine: a lot of memory would be consumed by processes in waiting states, and it would also have an impact (memory/CPU usage) on the scheduling algorithm of your OS, since it would have more processes in its queue to manage.
Therefore, I would absolutely recommend the cron/scheduling approach. In any case, your cron daemon will be running in the background whether or not you add the entry to the crontab, so why not add it?
Last but not least, imagine your busy-waiting process is killed for any reason: with the first option you will need to restart it manually, and you might lose a couple of monitoring entries.
Hope it helps you.
I have a unittest that does a bunch of stuff in several different threads. When I stop everything in the tearDown method, somehow something is still running. And by running I mean sleeping. I ran the top command on the python process (Ubuntu 12.04), which told me that the process was sleeping.
Now I have tried using pdb to figure out what is going on, e.g. by putting set_trace() at the end of tearDown. But that tells me nothing. I suspect this is because some other thread has started sleeping earlier and is therefore not accessed anymore at this point.
Is there any tool or method I can use to track down the cause of my non-stopping process?
EDIT
Using ps -Tp <#Process> -o wchan I now know that 4 threads are still running, three of which are waiting on futex_wait_queue_me and one on unix_stream_data_wait. Since I had a subprocess previously, which I killed with os.kill(pid, signal.SIGKILL), I suspect that the Pipe connection is somehow still waiting for that process. Perhaps the futexes (fast userspace mutexes) are waiting for that as well.
Is there any way I could further reduce the search space?
If you are working under Linux then you should be able to use 'ps -eLf' to get a list of all active processes and threads. Assuming you have given your threads good names at creation, it should be easy to see what is still running.
I believe under Windows you can get a tool to do something similar - see http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
N.B. I have not used the Windows tool myself.
Also, from within Python you can use the psutil package (https://pypi.python.org/pypi/psutil/) to get similar information.
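If installing psutil is not an option, a similar listing can be produced from inside the process with the standard library alone. This is a different technique from ps or psutil, shown only as a sketch: it lists every live Python thread by name together with the stack it is currently sitting in. The function name dump_threads is made up.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return one report per live thread: name, daemon flag and current stack."""
    frames = sys._current_frames()  # maps thread ident -> current frame
    reports = []
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        stack = "".join(traceback.format_stack(frame)) if frame is not None else ""
        reports.append("%s (daemon=%s)\n%s" % (thread.name, thread.daemon, stack))
    return reports


if __name__ == "__main__":
    for report in dump_threads():
        print(report)
```

Calling it at the end of tearDown should show which named threads are still alive and on which line each one is waiting.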
I am using requests to pull some files. I have noticed that the program seems to hang after some large number of iterations that varies from 5K to 20K. I can tell it is hanging because the folder where the results are stored has not changed in several hours. I have been trying to interrupt the process (I am using IDLE) by hitting CTRL + C to no avail. I would like to interrupt instead of killing the process because restart is easier. I have finally had to kill the process. I restart and it runs fine again until I have the same symptoms. I would like to figure out how to diagnose the problem but since I am having to kill everything I have no idea where to start.
Is there an alternate way to view what is going on or to more robustly interrupt the process?
I have been assuming that if I can interrupt without killing, I can look at globals and/or do some other mucking around to figure out where my code is hanging.
In case it's not too late: I've just faced the same problems and have some tips.
First thing: in Python, most waiting APIs are not interruptible (e.g. Thread.join(), Lock.acquire()...).
Have a look at these pages for more information:
http://snakesthatbite.blogspot.fr/2010/09/cpython-threading-interrupting.html
http://docs.python.org/2/library/thread.html
So if a thread is waiting on such a call, it cannot be stopped.
There is another thing to know: if a normal (non-daemon) thread is still running (or hung), the main program will stay alive indefinitely until all threads have stopped or the process is killed.
To avoid that, you can make the thread a daemon thread: set Thread.daemon=True before calling Thread.start().
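A small sketch of the daemon setting; the worker function and the thread name are made up for illustration.

```python
import threading
import time


def worker():
    """Stand-in for a thread that never returns on its own."""
    while True:
        time.sleep(1)


t = threading.Thread(target=worker, name="background-worker")
t.daemon = True  # must be set before start(); setting it afterwards raises RuntimeError
t.start()

# The interpreter can now exit even though worker() is still looping.
```

Note that a daemon thread is stopped abruptly at interpreter exit, so it should not hold resources that need a clean shutdown.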
Second thing: to find where your program is hung, you can launch it with a debugger, but I prefer logging because the logs are always there in case it's too late to debug.
Try logging before and after each waiting call to see how long your threads have been hung. To get high-quality logs, use Python's logging module configured with a file handler, an HTTP handler or, even better, a syslog handler.
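One way to log around waiting calls is a small context manager. This is a sketch: the name logged_wait is made up, and the example logs to the console, where a real setup would attach a file or syslog handler as suggested above.

```python
import logging
import threading
import time
from contextlib import contextmanager

log = logging.getLogger("waits")


@contextmanager
def logged_wait(what):
    """Log when a blocking call starts and how long it took to return."""
    start = time.monotonic()
    log.info("waiting on %s", what)
    try:
        yield
    finally:
        log.info("done waiting on %s after %.3fs", what, time.monotonic() - start)


if __name__ == "__main__":
    # Pass filename="..." to basicConfig to log to a file instead of the console.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(threadName)s %(message)s")
    sleeper = threading.Thread(target=time.sleep, args=(0.1,), name="sleeper")
    sleeper.start()
    with logged_wait("sleeper.join()"):
        sleeper.join()
```

A "waiting on" line with no matching "done waiting" line points at the call where the program is stuck.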
I'm designing a long running process, triggered by a Django management command, that needs to run on a fairly frequent basis. This process is supposed to run every 5 min via a cron job, but I want to prevent it from running a second instance of the process in the rare case that the first takes longer than 5 min.
I've thought about creating a touch file that gets created when the management process starts and is removed when the process ends. A second management command process would then check to make sure the touch file didn't exist before running. But that seems like a problem if a process dies abruptly without properly removing the touch file. It seems like there's got to be a better way to do that check.
Does anyone know any good tools or patterns to help solve this type of issue?
For this reason I prefer to have a long-running process that gets its work off of a shared queue. By long-running I mean that its lifetime is longer than a single unit of work. The process is then controlled by some daemon service such as supervisord, which can take over restarting the process when it crashes. This delegates the work appropriately to something that knows how to manage process lifecycles and frees you from having to worry about the nitty-gritty of POSIX processes in the scope of your script.
If you have a queue, you also have the luxury of being able to spin up multiple processes that can each take jobs off of the queue and process them, but that sounds like it's out of scope of your problem.
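The shape of such a worker can be sketched as follows. This uses an in-process queue.Queue purely as a stand-in for a real shared queue, and the names worker, STOP and handle are made up for illustration. Restarting on a crash is left to the supervising service and is not shown.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down


def worker(jobs, handle):
    """Take jobs off the queue one at a time until the STOP sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is STOP:
                return
            handle(job)
        finally:
            jobs.task_done()


if __name__ == "__main__":
    jobs = queue.Queue()
    done = []
    t = threading.Thread(target=worker, args=(jobs, done.append))
    t.start()
    for job in ("batch-1", "batch-2"):
        jobs.put(job)
    jobs.put(STOP)
    t.join()
    print(done)
```

Because one worker handles one job at a time, a slow job simply delays the next one, and two runs never overlap.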