How to determine what a Python daemon is doing in the background?

I've got a Python process that is running as a daemon, using daemon runner, in the background. Its purpose is to query for some information from another computer on the network every 15 minutes, do some processing, and then send it somewhere else for logging. However, every so often, the processing bit takes much longer and the CPU usage for the process spikes for an extended period of time. Is there any way to figure out what might be happening during that time? I do have the daemon source.

The best thing to do is instrument the daemon with logging statements (using either the logging module or print statements with timestamps), and redirect the output to a log file. Then you can watch the logfile (perhaps using multitail) and note the output when you see the CPU spike.
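As a rough illustration of that kind of instrumentation (the log file name and the process_data/run_cycle names below are made up for the example, not taken from the actual daemon), timestamped log lines around the suspect step are usually enough to show which part of the cycle is eating the CPU:

import logging
import time

logging.basicConfig(
    filename='daemon-debug.log',    # illustrative path; point it wherever you keep logs
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)

def process_data(batch):
    # Stand-in for the daemon's real processing step.
    time.sleep(2)

def run_cycle(batch):
    logging.info('processing started (%d items)', len(batch))
    start = time.time()
    process_data(batch)
    logging.info('processing finished in %.1f s', time.time() - start)

run_cycle(list(range(100)))

You can then tail (or multitail) daemon-debug.log and match the slow cycles against the CPU spikes.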

Related

Which is better to send notifications, a python loop/timer, or as a cron job?

I've built a notification system with a Raspberry Pi that checks a database every two minutes and, if any new entries are found, sends out emails. I have it working two ways:
1. A Python script starts at boot and runs forever, with a timer built into its loop. Every two minutes the DB is checked and emails are sent.
2. A Python script checks the DB and sends emails once; a cron job runs this script every two minutes.
Which would be the better choice, and why?
Your first option, even if you use a sleep, implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting). This strategy uses more CPU/memory than your second option (the cron approach), because your process keeps its full footprint in memory even while it is doing nothing. With the cron approach, the process only exists while it is doing useful work.
Imagine implementing this kind of approach for many programs running on your machine: a lot of memory would be consumed by processes sitting in waiting states, and it would also have an impact (memory/CPU usage) on your OS's scheduling, since there would be more processes in the queue to manage.
Therefore, I would definitely recommend the cron/scheduling approach. In any case, your cron daemon will be running in the background whether or not you add the entry to the crontab, so why not add it?
Last but not least, imagine your busy-waiting process is killed for some reason: with the first option you would have to restart it manually, and you might lose a couple of monitoring runs in the meantime.
Hope it helps.
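To make the comparison concrete, here is a minimal sketch of the cron-friendly variant; the file name, function name and crontab paths are invented for the example, and the placeholder body stands in for the real DB/email code from the question:

#!/usr/bin/env python
"""check_notifications.py -- do one DB check, send any emails, then exit.

Cron owns the schedule, e.g. with a crontab entry along the lines of:
    */2 * * * * /usr/bin/python /home/pi/check_notifications.py
"""

def check_and_notify():
    # Placeholder: query the database for new entries and send one email
    # per entry. The real logic from the question goes here.
    return 0    # number of notifications sent

if __name__ == '__main__':
    check_and_notify()

Because the script exits after every run, nothing stays resident between checks, which is exactly the advantage described above.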

Costs of multiprocessing in python

In python, what is the cost of creating another process - is it sufficiently high that it's not worth it as a way of handling events?
Context of question: I'm using radio modules to transmit data from sensors to a raspberry pi. I have a python script running on the pi, catching the data and handling it - putting it in a MySQL database and occasionally triggering other things.
My dilemma is that if I handle everything in a single script there's a risk that some data packet might be ignored, because it's taking too long to run the processing. I could avoid this by spawning a separate process to handle the event and then exit, but if the cost of creating a process is high it might be better to focus on writing more efficient code rather than spawning a new process.
Thoughts people?
Edit to add:
Sensors are pushing data, at intervals of 8 seconds and up.
No buffering is easily available.
If processing takes longer than the time until the next reading, that reading would be ignored and lost. (The transmission system guarantees delivery; I need to guarantee the Pi is in a position to receive it.)
I think you're trying to address two problems at the same time, and it is getting confusing.
Polling frequency: how fast do you need to poll data so that you don't risk losing some?
Concurrency and I/O locking: what happens if processing takes longer than the polling interval?
The first problem depends entirely on your underlying architecture: are your sensors pushing data to your Raspberry Pi, or is the Pi polling them? Is any buffering involved? What happens if your polling frequency is faster than the rate at which data arrives?
My recommendation is to follow the KISS principle and basically write two tools: one that is entirely in charge of storing data as fast as you need, and another that takes care of doing something with that data (a rough sketch of this split follows the list below).
For example, the storing could be done by a memcached instance, or even a simple shell pipe if you're at the prototyping stage. The second utility, which manipulates the data, then does not have to worry about polling frequency, I/O errors (what if the SQL database returns an error?), and so on.
As a bonus, decoupling data retrieval from data manipulation allows you to:
Test more easily (you can store some data as a sample, then replay it through the manipulation routine to validate behaviour)
Isolate problems more easily
Scale much faster (you could have as many "manipulators" as you need)
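As a very small sketch of that decoupling in plain Python (using multiprocessing.Queue as the shared buffer instead of memcached, and with the radio input faked by a short loop), the receiver only enqueues packets while the manipulator takes as long as it needs:

import multiprocessing
import time

def receiver(q):
    # Fast path: just capture packets and enqueue them, nothing else.
    for i in range(10):                         # stand-in for the radio read loop
        q.put({'sensor': 1, 'value': i, 'ts': time.time()})
        time.sleep(0.1)
    q.put(None)                                 # sentinel: no more data

def manipulator(q):
    # Slow path: database writes, triggers, etc. can take their time here.
    while True:
        packet = q.get()
        if packet is None:
            break
        print('storing', packet)                # replace with the MySQL insert, etc.

if __name__ == '__main__':
    q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=manipulator, args=(q,))
    worker.start()
    receiver(q)
    worker.join()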
The cost of spawning new threads depends on what you do with them.
In terms of memory, make sure your threads aren't each loading everything themselves; threading shares memory across the whole application, so variables keep their scope.
In terms of processing, be sure you don't overload your system.
I'm doing something quite similar at work: I scan a folder (where files are constantly being dropped) and do some work on every file.
I use my main thread to initialize the application and spawn the child threads.
One child thread is used for logging.
The other children do the actual work.
My main loop looks like this:
import os
import threading
import time

# spawn the logging thread here
while True:
    for stuff in os.walk('/gw'):
        while threading.active_count() > 200:
            time.sleep(0.1)
        # spawn a new worker thread here, passing it the file path
    time.sleep(1)
This basically means that my application won't use more than 201 threads (200 + main thread).
From there it was just a matter of playing with the application, using htop to monitor its resource consumption, and limiting the app to a sensible maximum number of threads.

How to identify the cause in Python of code that is not interruptible with CTRL + C

I am using requests to pull some files. I have noticed that the program seems to hang after some large number of iterations that varies from 5K to 20K. I can tell it is hanging because the folder where the results are stored has not changed in several hours. I have been trying to interrupt the process (I am using IDLE) by hitting CTRL + C to no avail. I would like to interrupt instead of killing the process because restart is easier. I have finally had to kill the process. I restart and it runs fine again until I have the same symptoms. I would like to figure out how to diagnose the problem but since I am having to kill everything I have no idea where to start.
Is there an alternate way to view what is going on or to more robustly interrupt the process?
I have been assuming that if I can interrupt without killing, I can look at globals and/or do some other mucking around to figure out where my code is hanging.
In case it's not too late: I've just faced the same problems and have some tips.
First thing: in Python, most waiting APIs are not interruptible (e.g. Thread.join(), Lock.acquire(), ...).
Have a look at these pages for more information:
http://snakesthatbite.blogspot.fr/2010/09/cpython-threading-interrupting.html
http://docs.python.org/2/library/thread.html
So if a thread is waiting on such a call, it cannot be stopped.
Another thing to know: if a normal (non-daemon) thread is still running (or hung), the main program will not exit until all such threads have stopped or the process is killed.
To avoid that, you can make a thread a daemon thread by setting Thread.daemon = True before calling Thread.start().
Second thing: to find where your program is hung, you can launch it under a debugger, but I prefer logging, because logs are still there even when it's too late to debug.
Try logging before and after each waiting call to see how long your threads have been stuck. For high-quality logs, use the python logging module configured with a file handler, an HTML handler, or, even better, a syslog handler.
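Here is a small sketch of both tips together; the thread name and the sleep (standing in for the blocking requests call) are invented for the example:

import logging
import threading
import time

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(threadName)s %(message)s')

def worker():
    logging.info('starting blocking work')
    time.sleep(5)                   # stand-in for the requests call that may hang
    logging.info('finished blocking work')

t = threading.Thread(target=worker, name='fetcher')
t.daemon = True                     # a hung worker no longer keeps the process alive
t.start()

logging.info('waiting for fetcher')
t.join(timeout=10)                  # bounded wait, so the main thread is never stuck forever
logging.info('fetcher still alive: %s', t.is_alive())

If the last 'starting' line in the log has no matching 'finished' line, you know which call is stuck.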

Preventing management commands from running more than one at a time

I'm designing a long running process, triggered by a Django management command, that needs to run on a fairly frequent basis. This process is supposed to run every 5 min via a cron job, but I want to prevent it from running a second instance of the process in the rare case that the first takes longer than 5 min.
I've thought about creating a touch file that gets created when the management process starts and is removed when the process ends. A second management command process would then check to make sure the touch file didn't exist before running. But that seems like a problem if a process dies abruptly without properly removing the touch file. It seems like there's got to be a better way to do that check.
Does anyone know any good tools or patterns to help solve this type of issue?
For this reason I prefer to have a long-running process that gets its work off a shared queue. By long-running I mean a process whose lifetime is longer than a single unit of work. The process is then controlled by a daemon service such as supervisord, which takes over responsibility for restarting it when it crashes. This delegates the work to something that knows how to manage process lifecycles and frees you from having to worry about the nitty-gritty of POSIX processes in the scope of your script.
If you have a queue, you also have the luxury of being able to spin up multiple processes that can each take jobs off of the queue and process them, but that sounds like it's out of scope of your problem.
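A toy, in-process sketch of that shape is below (Python 3; in a real deployment the queue would be an external broker and the worker process would be started and restarted by supervisord rather than by a thread):

import queue
import threading
import time

jobs = queue.Queue()

def worker_loop():
    # One long-lived worker handling one job at a time, so runs can never overlap.
    while True:
        job = jobs.get()            # blocks until work arrives
        time.sleep(1)               # stand-in for the real management task
        print('finished job', job)
        jobs.task_done()

threading.Thread(target=worker_loop, daemon=True).start()

for n in range(3):                  # stand-in for whatever schedules the work
    jobs.put(n)
jobs.join()                         # wait for every queued job to be processed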

Python: Why does my SMTP script freeze my computer?

So I wrote a little multithreaded SMTP program. The problem is that every time I run it, it freezes the computer shortly afterwards. The script appears to still be working, as my network card keeps lighting up and the emails are received, but in some cases it locks up completely and stops sending the emails.
Here's a link to my two script files. The first is the one used to launch the program:
readFile.py
newEmail.py
First, you're using popen, which creates subprocesses, i.e. processes, not threads. I'll assume that is what you meant.
My guess would be that the program gets stuck in a loop where it spawns processes continuously, which the OS will probably dislike. (That kind of thing is known as a fork bomb, and it is a good way to freeze Linux unless a process limit has been set with ulimit.) I couldn't find the bug, but if I were you I'd log a message each time a subprocess is spawned or killed and, if everything looks normal there, watch the system closely (ps or top on Unix systems) to see whether the processes are really being killed.
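As a sketch of that kind of bookkeeping (Unix-only here, since it uses sleep as a stand-in for whatever command the script actually spawns):

import logging
import subprocess

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(message)s')

procs = []
for n in range(3):                          # stand-in for the email-sending loop
    p = subprocess.Popen(['sleep', '1'])    # stand-in command, not the real script
    procs.append(p)
    logging.info('spawned pid %d (%d spawned so far)', p.pid, len(procs))

for p in procs:
    p.wait()
    logging.info('pid %d exited with code %d', p.pid, p.returncode)

If the number of live PIDs you see in the log (or in ps) only ever grows, you have found your fork bomb.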
