Background info:
I have built a program, main.py, that performs its work statelessly (meaning if something fails, the work gets requeued / handled via the database etc. separately)...
Problem:
I am trying to figure out how to monitor this main.py program from my Linux instance.
main.py runs and spawns multiple worker threads, but I want to monitor it and ensure the system is still healthy enough to perform work...
How can I do this at the single-script level and determine that it is no longer working?
My Ideas:
Send a periodic status notification to syslog stating the program is healthy, and trigger an alert when those messages are no longer received (if that is possible)?
Maybe Linux provides a way to monitor that the process is healthy?
The last one is creating a TCP/socket-level query mechanism to check its status...
I hope I'm describing the problem right... I'm trying to ensure the program is not only running, but is actually processing and able to process.
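One way to make the last idea concrete is sketched below: main.py keeps a "last completed work" timestamp that the worker threads update, and a tiny loopback TCP endpoint reports how stale it is, so an external check can alert when the process is running but no longer processing. The port number, function names, and the mark_progress() hook are illustrative assumptions, not part of main.py.

# Inside main.py: workers call mark_progress() after each completed unit of
# work; a background thread serves the current staleness over loopback TCP.
import socket
import threading
import time

_last_work = time.time()
_lock = threading.Lock()

def mark_progress():
    # Workers call this whenever they finish a unit of work.
    global _last_work
    with _lock:
        _last_work = time.time()

def health_server(host="127.0.0.1", port=9100):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))      # loopback only
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            with _lock:
                age = time.time() - _last_work
            conn.sendall(f"last_work_age_seconds={age:.0f}\n".encode())

# At startup: threading.Thread(target=health_server, daemon=True).start()

An external monitor (a cron job, a Nagios/Zabbix-style check, or even nc 127.0.0.1 9100 in a shell script) can then alert when the reported age exceeds a threshold, or when the connection is refused because the process is gone.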
Related
I work with python in Houdini, but bear with me as my question might not be only restricted to the Houdini environment.
Basically, Houdini comes with a "Hython" process that is supposed to be a Python shell
(see the end of this doc : https://www.sidefx.com/docs/houdini/hom/commandline.html)
At some point, Houdini can start multiple of these "hython.exe" processes without the shell interface; they are automatically managed in the background by a server that dispatches instructions to them.
There is no explicit way to directly communicate with them using the API, but I want to be able to do exactly this.
Is there a way to send commands to a running process (that is supposed to support them)?
The very-long-shot part of my question: I'm trying to communicate with the hython processes because the server never tells them to flush their cache when a full set of computations has finished, leading to inflated memory consumption after a while and the need to stop and restart hythons. SideFX has already admitted there is nothing in the API to ask for a cache flush (see here: https://www.sidefx.com/forum/topic/81712/).
I tried to embed some naive "clear output" steps in the node processes I send to the hythons, to no avail.
"Unload" tag on nodes does nothing.
Node deletion doesn't understand context and will delete my project nodes before they are sent to a hython.
I am creating a test automation setup which uses an application without any interface. However, the application calls a batch script when it changes modes, so I am able to catch the mode transitions.
What I want is for the batch script to pass an input to my Python script (I have a state machine running in Python) at runtime, so that I can monitor the state of the application with Python instead of the batch file.
I am using a state machine similar to the one by Karn Saheb:
https://dev.to/karn/building-a-simple-state-machine-in-python
However, instead of changing states statically like:
device.on_event('event')
I want the python script to do something similar to:
while(True):
    device.on_event(input())  # where the input is passed from the batch script:

REM state.bat
set CurrentState=%1
"magic code to pass CurrentState to python input()" %CurrentState%
I see that one solution would be to start the Python script from the batch file every time it is called with the "event", and then save the current event in another file upon termination of the Python script... But I want to avoid such handling and instead evaluate this at runtime.
Thank you in advance!
A reasonably portable way of doing this without ugly polling on temporary files is to use a socket: have the main process listen and have the batch file(s) start a small program that connects to the server and writes a message.
There are security considerations here: you can start by listening only to the loopback interface, with further authentication if the local machine should not be trusted.
If you have more than one of these processes, or if you need to handle the child dying before it issues its next report, you’ll have to use threads or something like select to unify the news from different input channels (e.g., waiting on the child to exit vs. waiting on news from the next batch file).
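A minimal sketch of that idea, assuming the state machine object is called device and picking an arbitrary loopback port (both are assumptions, not taken from the question):

# Inside the long-lived Python process: listen on loopback and feed each
# received event into the state machine.
import socket

def listen_for_events(device, host="127.0.0.1", port=5005):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))          # loopback only
    srv.listen()
    while True:
        conn, _ = srv.accept()      # one short-lived connection per mode change
        with conn:
            event = conn.recv(1024).decode().strip()
            if event:
                device.on_event(event)

# send_event.py: the small program the batch file starts, e.g.
#   REM state.bat
#   python send_event.py %1
import socket
import sys

if __name__ == "__main__":
    with socket.create_connection(("127.0.0.1", 5005)) as conn:
        conn.sendall(sys.argv[1].encode())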
I have a long-running twisted server.
In a large system test, at one particular point several minutes in, when some clients enter a particular state and a particular outside event happens, this server spends several minutes at 100% CPU and does its work very slowly. I'd like to know what it is doing.
How do you get a profile for a particular span of time in a long-running server?
I could easily send the server start and stop messages via HTTP if there were a way to enable or inject the profiler at runtime.
Given the choice, I'd like stack-based/call-graph profiling but even leaf sampling might give insight.
The yappi profiler can be started and stopped at runtime.
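A rough sketch of what that could look like, assuming you expose two HTTP-triggered hooks in the server (the function names and output path here are made up):

import yappi

def start_profiling():
    yappi.set_clock_type("wall")        # or "cpu", depending on what you want to measure
    yappi.start()

def stop_profiling(path="profile.pstat"):
    yappi.stop()
    stats = yappi.get_func_stats()
    stats.print_all()                   # quick call summary to stdout
    stats.save(path, type="pstat")      # loadable later with pstats / snakeviz
    yappi.clear_stats()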
There are two interesting tools that came up that try to solve this specific problem: cases where you haven't instrumented your code for profiling in advance but need to profile production code in a pinch.
pyflame attaches to an existing process using the ptrace(2) syscall and creates "flame graphs" of the process. It's written in C++.
py-spy works by reading the process memory instead and figuring out the Python call stack. It provides a flame graph as well as a "top-like" interface showing which functions are taking the most time. It's written in Rust.
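For example (using py-spy's documented command line, and assuming you have the necessary ptrace permissions), py-spy top --pid <PID> gives a live top-like view of a running server, and py-spy record -o profile.svg --pid <PID> samples it for a while and writes a flame graph, all without restarting or modifying the process.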
Not a very Pythonic answer, but maybe stracing the process gives some insight (assuming you are on Linux or similar).
Using strictly Python: for such things I trace all calls, store the results in a ring buffer, and use a signal (which you could perhaps trigger via your HTTP message) to dump that ring buffer. Of course, tracing slows everything down, but in your scenario you could also switch the tracing on via an HTTP message, so it is only enabled while your trouble is active.
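A minimal sketch of that ring-buffer idea (the buffer size, the choice of SIGUSR1, and the enable/disable hooks are arbitrary assumptions):

import collections
import signal
import sys
import threading
import time

TRACE_BUFFER = collections.deque(maxlen=10000)   # keeps only the most recent calls

def tracer(frame, event, arg):
    if event == "call":
        code = frame.f_code
        TRACE_BUFFER.append((time.time(), code.co_filename, frame.f_lineno, code.co_name))
    return None   # no per-line tracing, function calls only

def enable_tracing():            # could be triggered by your HTTP message instead
    sys.settrace(tracer)         # current thread
    threading.settrace(tracer)   # threads started from now on

def disable_tracing():
    sys.settrace(None)
    threading.settrace(None)

def dump_buffer(signum, frame):  # POSIX-only: kill -USR1 <pid> dumps the buffer
    for ts, filename, lineno, func in list(TRACE_BUFFER):
        print(f"{ts:.3f} {filename}:{lineno} {func}", file=sys.stderr)

signal.signal(signal.SIGUSR1, dump_buffer)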
Pyliveupdate is a tool designed for exactly this purpose: profiling long-running programs without restarting them. It lets you dynamically select specific functions to profile, or stop profiling them, without instrumenting your code ahead of time; it instruments the code dynamically to do the profiling.
Pyliveupdate has three key features:
Profile the call time of specific Python functions (selected by function or module name).
Add or remove profiling without restarting the program.
Show profiling results as a call summary and flame graphs.
Check out a demo here: https://asciinema.org/a/304465.
I am working on a Django web application.
A function 'xyz' (it updates a variable) needs to be called every 2 minutes.
I want one HTTP request to start the daemon, which keeps calling xyz every 2 minutes until I send another HTTP request to stop it.
Appreciate your ideas.
Thanks
Vishal Rana
There are a number of ways to achieve this. Assuming the server resources allow it, I would write a Python script, outside your Django directory (though importing the necessary pieces), that calls function xyz but only runs if /var/run/django-stuff/my-daemon.run exists. Get cron to run this script every two minutes.
Then, for your Django side, the start view creates the above-mentioned file if it doesn't already exist and the stop view removes it.
As I say, there are other ways to achieve this. You could have a Python script looping and sleeping for roughly 2 minutes... etc. In either case, you're up against the fact that two Python scripts running in two different invocations of CPython (no idea if this is the case with mod_wsgi) cannot communicate with each other through ordinary variables, so IPC between Python scripts is not simple: you need some sort of formal IPC (semaphores, files, etc.) rather than just shared variables (which won't work).
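A minimal sketch of that arrangement, with assumed paths and an assumed location for xyz (neither comes from the question):

# run_xyz.py: run by cron every two minutes, e.g.
#   */2 * * * * /usr/bin/python /path/to/run_xyz.py
import os

FLAG = "/var/run/django-stuff/my-daemon.run"

if os.path.exists(FLAG):
    # If xyz needs Django models/settings, set DJANGO_SETTINGS_MODULE and call
    # django.setup() before importing it.
    from myapp.tasks import xyz   # hypothetical location of xyz
    xyz()

# views.py: the start/stop views only create or remove the flag file
from pathlib import Path
from django.http import HttpResponse

FLAG_FILE = Path("/var/run/django-stuff/my-daemon.run")

def start(request):
    FLAG_FILE.parent.mkdir(parents=True, exist_ok=True)
    FLAG_FILE.touch()
    return HttpResponse("started")

def stop(request):
    FLAG_FILE.unlink(missing_ok=True)   # Python 3.8+
    return HttpResponse("stopped")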
Probably a little hacky, but you could try this:
Set up a crontab entry that runs a script every two minutes. This script checks for some sort of flag (file existence, contents of a file, etc.) on disk to decide whether to run a given Python module. The problem with this is that it could take up to 1:59 to run the function the first time after it is started.
I think if you started a daemon in the view function, it would keep the httpd worker process (and the connection) alive unless you figure out how to close the connection without terminating the Django view function. This could be very bad if you want to do this in parallel for different users. Also, to kill the function this way you would have to somehow know which python and/or httpd process to kill later, so you don't kill all of them.
The real way to do it would be to write an actual daemon in whatever language and just make system calls to "/etc/init.d/daemon_name start" and "... stop" in the Django views. For this, you need to make sure your web server user has permission to execute the daemon.
If the easy solutions (loop in a script, crontab signaled by a temp file) are too fragile for your intended usage, you could use Twisted facilities for process handling and scheduling and networking. Your Django app (using a Twisted client) would simply communicate via TCP (locally) with the Twisted server.
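A small sketch of that shape, with an assumed control port, command names, and a stand-in xyz (none of which come from the question):

# twisted_daemon.py: runs xyz every 2 minutes, controlled over a local TCP port
from twisted.internet import reactor, task
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineOnlyReceiver

def xyz():
    print("updating the variable...")    # stand-in for the real work

loop = task.LoopingCall(xyz)

class Control(LineOnlyReceiver):
    delimiter = b"\n"

    def lineReceived(self, line):
        cmd = line.strip().lower()
        if cmd == b"start" and not loop.running:
            loop.start(120)               # every 2 minutes
        elif cmd == b"stop" and loop.running:
            loop.stop()
        self.sendLine(b"ok")

reactor.listenTCP(9999, Factory.forProtocol(Control), interface="127.0.0.1")
reactor.run()

The Django start/stop views would then just open a socket to 127.0.0.1:9999 and send "start" or "stop".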
I'd like to prevent multiple instances of the same long-running python command-line script from running at the same time, and I'd like the new instance to be able to send data to the original instance before the new instance commits suicide. How can I do this in a cross-platform way?
Specifically, I'd like to enable the following behavior:
"foo.py" is launched from the command line, and it will stay running for a long time-- days or weeks until the machine is rebooted or the parent process kills it.
every few minutes the same script is launched again, but with different command-line parameters
when launched, the script should see if any other instances are running.
if other instances are running, then instance #2 should send its command-line parameters to instance #1, and then instance #2 should exit.
instance #1, if it receives command-line parameters from another script, should spin up a new thread and (using the command-line parameters sent in the step above) start performing the work that instance #2 was going to perform.
So I'm looking for two things: how can a python program know another instance of itself is running, and then how can one python command-line program communicate with another?
Making this more complicated, the same script needs to run on both Windows and Linux, so ideally the solution would use only the Python standard library and not any OS-specific calls. Although if I need to have a Windows codepath and an *nix codepath (and a big if statement in my code to choose one or the other), that's OK if a "same code" solution isn't possible.
I realize I could probably work out a file-based approach (e.g. instance #1 watches a directory for changes and each instance drops a file into that directory when it wants to do work) but I'm a little concerned about cleaning up those files after a non-graceful machine shutdown. I'd ideally be able to use an in-memory solution. But again I'm flexible, if a persistent-file-based approach is the only way to do it, I'm open to that option.
More details: I'm trying to do this because our servers are using a monitoring tool which supports running python scripts to collect monitoring data (e.g. results of a database query or web service call) which the monitoring tool then indexes for later use. Some of these scripts are very expensive to start up but cheap to run after startup (e.g. making a DB connection vs. running a query). So we've chosen to keep them running in an infinite loop until the parent process kills them.
This works great, but on larger servers 100 instances of the same script may be running, even if they're only gathering data every 20 minutes each. This wreaks havoc with RAM, DB connection limits, etc. We want to switch from 100 processes with 1 thread to one process with 100 threads, each executing the work that, previously, one script was doing.
But changing how the scripts are invoked by the monitoring tool is not possible. We need to keep the invocation the same (launch a process with different command-line parameters) but change the scripts to recognize that another one is active, and have the "new" script send its work instructions (from the command-line params) over to the "old" script.
BTW, this is not something I want to do on a one-script basis. Instead, I want to package this behavior into a library which many script authors can leverage-- my goal is to enable script authors to write simple, single-threaded scripts which are unaware of multi-instance issues, and to handle the multi-threading and single-instancing under the covers.
The Alex Martelli approach of setting up a communication channel is the appropriate one. I would use multiprocessing.connection.Listener to create a listener of your choice. Documentation at:
http://docs.python.org/library/multiprocessing.html#multiprocessing-listeners-clients
Rather than using AF_INET (sockets) you may elect to use AF_UNIX for Linux and AF_PIPE for Windows. Hopefully a small "if" wouldn't hurt.
Edit: I guess an example wouldn't hurt. It is a basic one, though.
#!/usr/bin/env python
from multiprocessing.connection import Listener, Client
import socket

def myloop(address):
    try:
        # First instance: bind the address and serve incoming messages.
        listener = Listener(*address)
        conn = listener.accept()
        serve(conn)
    except socket.error:
        # The address is already taken, so another instance is running:
        # connect to it as a client, hand over a message, and exit.
        conn = Client(*address)
        conn.send('this is a client')
        conn.send('close')

def serve(conn):
    while True:
        msg = conn.recv()
        if msg.upper() == 'CLOSE':
            break
        print(msg)
    conn.close()

if __name__ == '__main__':
    address = ('/tmp/testipc', 'AF_UNIX')
    myloop(address)
This works on OS X, so it needs testing with both Linux and (after substituting the right address) Windows. A lot of caveats exist from a security point of view, the main one being that conn.recv unpickles its data, so you are almost always better off with recv_bytes.
The general approach is to have the script, on startup, set up a communication channel in a way that's guaranteed to be exclusive (other attempts to set up the same channel fail in a predictable way) so that further instances of the script can detect the first one's running and talk to it.
Your requirements for cross-platform functionality strongly point towards using a socket as the communication channel in question: you can designate a "well known port" that's reserved for your script, say 12345, and open a socket on that port listening to localhost only (127.0.0.1). If the attempt to open that socket fails, because the port in question is "taken", then you can connect to that port number instead, and that will let you communicate with the existing script.
If you're not familiar with socket programming, there's a good HOWTO doc here. You can also look at the relevant chapter in Python in a Nutshell (I'm biased about that one, of course;-).
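A bare-bones sketch of that scheme; the port number, the message format, and handle_work() are placeholders for your own code:

import socket
import sys
import threading

PORT = 12345   # the "well known port" reserved for this script

def handle_work(args):
    print("working on:", args)          # stand-in for the script's real work

def main(argv):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        srv.bind(("127.0.0.1", PORT))   # first instance: claim the port
    except OSError:
        # Port already taken: hand our parameters to the running instance and exit.
        with socket.create_connection(("127.0.0.1", PORT)) as conn:
            conn.sendall(" ".join(argv[1:]).encode())
        return
    srv.listen()
    threading.Thread(target=handle_work, args=(argv[1:],), daemon=True).start()
    while True:                         # each later invocation becomes a new worker thread
        conn, _ = srv.accept()
        with conn:
            args = conn.recv(4096).decode().split()
        threading.Thread(target=handle_work, args=(args,), daemon=True).start()

if __name__ == "__main__":
    main(sys.argv)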
Perhaps try using sockets for communication?
Sounds like your best bet is sticking with a pid file, but have it contain not only the process id but also the port number that the prior instance is listening on. So, when starting up, check for the pid file and, if it is present, see whether a process with that id is running; if so, send your data to it and quit, otherwise overwrite the pid file with the current process's info.
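A small sketch of that, with one deviation: rather than checking whether the recorded pid is alive (which is awkward to do portably across Windows and Linux), it simply tries to connect to the recorded port and treats a failed connection as a stale pid file. The file path and message format are assumptions:

import json
import os
import socket
import sys

PIDFILE = "/tmp/foo.pid"   # pick a per-platform location in real code

def connect_to_existing():
    # Returns a socket to the prior instance, or None if there isn't one.
    try:
        with open(PIDFILE) as f:
            info = json.load(f)          # {"pid": ..., "port": ...}
        return socket.create_connection(("127.0.0.1", info["port"]), timeout=2)
    except (OSError, ValueError, KeyError):
        return None                      # missing, unreadable, or stale pid file

def become_primary():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))           # let the OS pick a free port
    srv.listen()
    with open(PIDFILE, "w") as f:
        json.dump({"pid": os.getpid(), "port": srv.getsockname()[1]}, f)
    return srv

if __name__ == "__main__":
    conn = connect_to_existing()
    if conn:
        conn.sendall(" ".join(sys.argv[1:]).encode())   # hand over our parameters
        conn.close()
    else:
        srv = become_primary()
        # ... accept() loop dispatching each message to a worker thread, as above ...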