custom nagios script with nrpe resulting in non-zero exit status 1 - python

I am trying to run a python script using NRPE to monitor RabbitMQ. Inside the script is a command 'sudo rabbitmqctl list_queues' which gives me a message count on each queue. However, this results in Nagios giving this message:
CRITICAL - Command '['sudo', 'rabbitmqctl', 'list_queues']' returned non-zero exit status 1
I thought this might be a permissions issue, so I proceeded in the following manner.
/etc/group:
ec2-user:x:500:
rabbitmq:x:498:nrpe,nagios,ec2-user
nagios:x:497:
nrpe:x:496:
rpc:x:32:
/etc/sudoers:
%rabbitmq ALL=NOPASSWD: /usr/sbin/rabbitmqctl
nagios configuration:
command[check_rabbitmq_queuecount_prod]=/usr/bin/python27 /etc/nagios/check_rabbitmq_prod -a queues_count -C 3000 -W 1500
check_rabbitmq_prod:
#!/usr/bin/env python
from optparse import OptionParser
import shlex
import subprocess
import sys


class RabbitCmdWrapper(object):
    """So basically this just runs rabbitmqctl commands and returns parsed output.
    Typically this means you need root privs for this to work.
    Made this its own class so it could be used in other monitoring tools
    if desired."""

    @classmethod
    def list_queues(cls):
        args = shlex.split('sudo rabbitmqctl list_queues')
        cmd_result = subprocess.check_output(args).strip()
        results = cls._parse_list_results(cmd_result)
        return results

    @classmethod
    def _parse_list_results(cls, result_string):
        results = result_string.strip().split('\n')
        # remove text fluff
        results.remove(results[-1])
        results.remove(results[0])
        return_data = []
        for row in results:
            return_data.append(row.split('\t'))
        return return_data


def check_queues_count(critical=1000, warning=1000):
    """
    A blanket check to make sure all queues are within count parameters.
    TODO: Possibly break this out so test can be done on individual queues.
    """
    try:
        critical_q = []
        warning_q = []
        ok_q = []
        results = RabbitCmdWrapper.list_queues()
        for queue in results:
            if queue[0] == 'SFS_Production_Queue':
                count = int(queue[1])
                if count >= critical:
                    critical_q.append("%s: %s" % (queue[0], count))
                elif count >= warning:
                    warning_q.append("%s: %s" % (queue[0], count))
                else:
                    ok_q.append("%s: %s" % (queue[0], count))
        if critical_q:
            print "CRITICAL - %s" % ", ".join(critical_q)
            sys.exit(2)
        elif warning_q:
            print "WARNING - %s" % ", ".join(warning_q)
            sys.exit(1)
        else:
            print "OK - %s" % ", ".join(ok_q)
            sys.exit(0)
    except Exception, err:
        print "CRITICAL - %s" % err
        sys.exit(2)


USAGE = """Usage: ./check_rabbitmq -a [action] -C [critical] -W [warning]
Actions:
  - queues_count
    checks the count in each of the queues in rabbitmq's list_queues"""

if __name__ == "__main__":
    parser = OptionParser(USAGE)
    parser.add_option("-a", "--action", dest="action",
                      help="Action to Check")
    parser.add_option("-C", "--critical", dest="critical",
                      type="int", help="Critical Threshold")
    parser.add_option("-W", "--warning", dest="warning",
                      type="int", help="Warning Threshold")
    (options, args) = parser.parse_args()

    if options.action == "queues_count":
        check_queues_count(options.critical, options.warning)
    else:
        print "Invalid action: %s" % options.action
        print USAGE
At this point I'm not sure what is preventing the script from running. It runs fine via the command-line. Any help is appreciated.

The "non-zero exit code" error is often associated with requiretty being applied to all users by default in your sudoers file.
Disabling "requiretty" in your sudoers file for the user that runs the check is safe, and may potentially fix the issue.
E.g. (assuming nagios/nrpe are the users)
# /etc/sudoers
Defaults:nagios !requiretty
Defaults:nrpe !requiretty

I guess what @EE1213 mentions is right. If you have permission to view /var/log/secure, the log probably contains error messages regarding sudoers, like:
"sorry, you must have a tty to run sudo"

Related

How to check if a string is a valid shell command using Python?

I am making a program that adds additional functionality to the standard command shell in Windows. For instance, typing google followed by keywords will open a new tab with Google search for those keywords, etc. Whenever the input doesn't refer to a custom function I've created, it gets processed as a shell command using subprocess.call(rawCommand, shell=True).
Since I'd like to anticipate when my input isn't a valid command and return something like f"Invalid command: {rawCommand}", how should I go about doing that?
So far I've tried subprocess.call(rawCommand), which prints the standard output and returns the exit code. So that looks like this:
>>> from subprocess import call
>>> a, b = call("echo hello!", shell=1), call("xyz arg1 arg2", shell=1)
hello!
'xyz' is not recognized as an internal or external command,
operable program or batch file.
>>> a
0
>>> b
1
I'd like to simply receive that exit code. Any ideas on how I can do this?
Should you one day want to deal with encoding errors, get back the result of the command you're running, have a timeout, or decide which exit codes other than 0 may not trigger errors (I'm looking at you, Java runtime!), here's a complete function that does that job:
import os
from logging import getLogger
import subprocess

logger = getLogger()


def command_runner(command, valid_exit_codes=None, timeout=300, shell=False, encoding='utf-8',
                   windows_no_window=False, **kwargs):
    """
    Whenever we can, we need to avoid shell=True in order to preserve better security.
    Runs a system command, returns exit code and stdout/stderr output, and logs output on error.
    valid_exit_codes is a list of codes that don't trigger an error.
    windows_no_window will hide the command window (works with Microsoft Windows only).
    Accepts subprocess.check_output arguments.
    """
    # Set default values for kwargs
    errors = kwargs.pop('errors', 'backslashreplace')  # Don't let encoding issues make you mad
    universal_newlines = kwargs.pop('universal_newlines', False)
    creationflags = kwargs.pop('creationflags', 0)
    if windows_no_window:
        creationflags = creationflags | subprocess.CREATE_NO_WINDOW

    try:
        # universal_newlines=True makes the netstat command fail under Windows
        # timeout does not work under Python 2.7 with subprocess32 < 3.5
        # decoder may be unicode_escape for dos commands or utf-8 for powershell
        output = subprocess.check_output(command, stderr=subprocess.STDOUT, shell=shell,
                                         timeout=timeout, universal_newlines=universal_newlines,
                                         encoding=encoding, errors=errors,
                                         creationflags=creationflags, **kwargs)
    except subprocess.CalledProcessError as exc:
        exit_code = exc.returncode
        try:
            output = exc.output
        except Exception:
            output = "command_runner: Could not obtain output from command."
        if exit_code in (valid_exit_codes if valid_exit_codes is not None else [0]):
            logger.debug('Command [%s] returned with exit code [%s]. Command output was:' % (command, exit_code))
            if isinstance(output, str):
                logger.debug(output)
            return exc.returncode, output
        else:
            logger.error('Command [%s] failed with exit code [%s]. Command output was:' %
                         (command, exc.returncode))
            logger.error(output)
            return exc.returncode, output
    # OSError if not a valid executable
    except (OSError, IOError) as exc:
        logger.error('Command [%s] failed because of OS [%s].' % (command, exc))
        return None, exc
    except subprocess.TimeoutExpired:
        logger.error('Timeout [%s seconds] expired for command [%s] execution.' % (timeout, command))
        return None, 'Timeout of %s seconds expired.' % timeout
    except Exception as exc:
        logger.error('Command [%s] failed for unknown reasons [%s].' % (command, exc))
        logger.debug('Error:', exc_info=True)
        return None, exc
    else:
        logger.debug('Command [%s] returned with exit code [0]. Command output was:' % command)
        if output:
            logger.debug(output)
        return 0, output
Usage:
exit_code, output = command_runner('whoami', shell=True)
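As a further, purely illustrative example, exit codes other than 0 can be whitelisted through valid_exit_codes (the command below is a placeholder, not a recommendation):
# Illustrative only: accept exit code 1 as "not an error" for this hypothetical tool
exit_code, output = command_runner(['some_tool', '--check'], valid_exit_codes=[0, 1], timeout=60)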
Some shells have a syntax-checking mode (e.g., bash -n), but that’s the only form of error that’s separable from “try to execute the commands and see what happens”. Defining a larger class of “immediate” errors is a fraught proposition: if echo hello; ./foo is invalid because foo can’t be found as a command, what about false && ./foo, which will never try to run it, or cp /bin/ls foo; ./foo, which may succeed (or might fail to copy)? What about eval $(configure_shell); foo which might or might not manipulate PATH so as to find foo? What about foo || install_foo, where the failure might be anticipated?
As such, anticipating failure is not possible in any meaningful sense: your only real option is to capture the command’s output/error (as mentioned in the comments) and report them in some useful way.
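A minimal sketch of that capture-and-report approach, using subprocess.run on Python 3 (the command string is just an example):
import subprocess

raw_command = "xyz arg1 arg2"   # example input
result = subprocess.run(raw_command, shell=True, capture_output=True, text=True)
if result.returncode != 0:
    print(f"Invalid command: {raw_command}")
    print(result.stderr.strip())   # whatever the shell reported, e.g. "'xyz' is not recognized..."
else:
    print(result.stdout, end="")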

adding an IF statement for exit code 0 in python using system module

I'm trying to develop a script that interacts with salesforcedx and Bamboo. I want to write a simple python script that runs each CLI command and checks the exit code after each call. For example:
import os
path = "/var/Atlassian/bamboo/bamboo-home/xml-data/build-dir/SAL-SC-JOB1"
auth = "sfdx force:auth:jwt:grant --clientid clientidexample --jwtkeyfile /root/server.key --username username#example.org --setalias Alias --setdefaultdevhubusername; echo $?"
os.chdir(path)
os.system(auth)
I get a result like this:
Successfully authorized username@example.org with org ID 234582038957
0   << the exit code (0 here, but it could be 1 or 100)
I want to be able to run an IF statement (if possible) to stop the script if any exit code other than 0 pops up. Keep in mind that my script will be making several calls using Salesforce CLI commands, which should hopefully all result in 0; however, just in case one of the many calls fails, I need some means of stopping the script. Any advice or help is greatly appreciated!
import subprocess
import sys

path = "/var/Atlassian/bamboo/bamboo-home/xml-data/build-dir/SAL-SC-JOB1"
users = {
    'username@example.org': 'Alias',
    'other@example.org': 'Other Alias',
}

for username, alias in users.iteritems():
    auth = ['sfdx', 'force:auth:jwt:grant',
            '--clientid', 'clientidexample',
            '--jwtkeyfile', '/root/server.key',
            '--username', username,
            '--setalias', alias,
            '--setdefaultdevhubusername']
    status = subprocess.call(auth, cwd=path)
    if status != 0:
        print("Argument list %r failed with exit status %r" % (auth, status))
        sys.exit(status)
...will automatically stop on any nonzero exit code. If you didn't want to do the comparison yourself, you could use subprocess.check_call() and rely on a CalledProcessError being thrown.
Community Wiki because this is duplicative of many, many questions on the subject already.
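For completeness, a minimal sketch of the check_call variant mentioned above, reusing the same hypothetical argument list:
import subprocess

try:
    subprocess.check_call(auth, cwd=path)   # raises CalledProcessError on any nonzero exit status
except subprocess.CalledProcessError as exc:
    raise SystemExit("Argument list %r failed with exit status %r" % (exc.cmd, exc.returncode))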
This is my final code based on advice from here and some other articles.
#!/usr/bin/python3
import subprocess
import os
import sys

path = "/var/Atlassian/bamboo/bamboo-home/xml-data/build-dir/SAL-SC-JOB1"
sfdx = 'sfdx'
auth_key = os.environ['AUTH_KEY']  # environment variable

def auth():
    username = "username@example.org"
    alias = "somealias"
    key = "server.key"
    command = "force:auth:jwt:grant"
    auth = [sfdx, command,
            '--clientid', auth_key,
            '--jwtkeyfile', key,
            '--username', username,
            '--setalias', alias,
            '--setdefaultdevhubusername']
    status = subprocess.call(auth, cwd=path)
    if status != 0:
        raise ValueError("Argument list %r failed with exit status %r" % (auth, status))
    elif status == 0:
        print("auth passed with exit code %r" % (status))

auth()

Multiple SSH Connections in a Python 2.7 script- Multiprocessing Vs Threading

I have a script that gets a list of nodes as an argument (could be 10 or even 50), and connects to each by SSH to run a service restart command.
At the moment, I'm using multiprocessing in order to parallelize the script (getting the batch size as an argument as well); however, I've heard that the threading module could help me perform my tasks in a quicker and easier-to-manage way (I'm using try..except KeyboardInterrupt with sys.exit() and pool.terminate(), but it won't stop the entire script because it's a different process).
Since I understand that multithreading is more lightweight and easier to manage for my case, I am trying to convert my script to use threading instead of multiprocessing, but it doesn't work properly.
The current code in multiprocessing (works):
def restart_service(node, initd_tup):
    """
    Get a node name as an argument, connect to it via SSH and run the service restart command..
    """
    command = 'service {0} restart'.format(initd_tup[node])
    logger.info('[{0}] Connecting to {0} in order to restart {1} service...'.format(node, initd_tup[node]))
    try:
        ssh.connect(node)
        stdin, stdout, stderr = ssh.exec_command(command)
        result = stdout.read()
        if not result:
            result_err = stderr.read()
            print '{0}{1}[{2}] ERROR: {3}{4}'.format(Color.BOLD, Color.RED, node, result_err, Color.END)
            logger.error('[{0}] Result of command {1} output: {2}'.format(node, command, result_err))
        else:
            print '{0}{1}{2}[{3}]{4}\n{5}'.format(Color.BOLD, Color.UNDERLINE, Color.GREEN, node, Color.END, result)
            logger.info('[{0}] Result of command {1} output: {2}'.format(node, command, result.replace("\n", "... ")))
        ssh.close()
    except paramiko.AuthenticationException:
        print "{0}{1}ERROR! SSH failed with Authentication Error. Make sure you run the script as root and try again..{2}".format(Color.BOLD, Color.RED, Color.END)
        logger.error('SSH Authentication failed, thrown error message to the user to make sure script is run with root permissions')
        pool.terminate()
    except socket.error as error:
        print("[{0}]{1}{2} ERROR! SSH failed with error: {3}{4}\n".format(node, Color.RED, Color.BOLD, error, Color.END))
        logger.error("[{0}] SSH failed with error: {1}".format(node, error))
    except KeyboardInterrupt:
        pool.terminate()
        general_utils.terminate(logger)


def convert_to_tuple(a_b):
    """Convert 'f([1,2])' to 'f(1,2)' call."""
    return restart_service(*a_b)


def iterate_nodes_and_call_exec_func(nodes_list):
    """
    Iterate over the list of nodes to process,
    create a list of nodes that shouldn't exceed the batch size provided (or 1 if not provided).
    Then using the multiprocessing module, call the restart_service func on x nodes in parallel (where x is the batch size).
    If batch_sleep arg was provided, call the sleep func and provide the batch_sleep argument between each batch.
    """
    global pool
    general_utils.banner('Initiating service restart')
    pool = multiprocessing.Pool(10)
    manager = multiprocessing.Manager()
    work = manager.dict()
    for line in nodes_list:
        work[line] = general_utils.get_initd(logger, args, line)
        if len(work) >= int(args.batch):
            pool.map(convert_to_tuple, itertools.izip(work.keys(), itertools.repeat(work)))
            work = {}
            if int(args.batch_sleep) > 0:
                logger.info('*** Sleeping for %d seconds before moving on to next batch ***', int(args.batch_sleep))
                general_utils.sleep_func(int(args.batch_sleep))
    if len(work) > 0:
        try:
            pool.map(convert_to_tuple, itertools.izip(work.keys(), itertools.repeat(work)))
        except KeyboardInterrupt:
            pool.terminate()
            general_utils.terminate(logger)
And here's what I've tried to do with threading, which doesn't work (when I assign a batch size larger than 1, the script simply gets stuck and I have to kill it forcefully).
def parse_args():
    """Define the argument parser, and the arguments to accept.."""
    global args, parser
    parser = MyParser(description=__doc__)
    parser.add_argument('-H', '--host', help='List of hosts to process, separated by "," and NO SPACES!')
    parser.add_argument('--batch', help='Do requests in batches', default=1)
    args = parser.parse_args()
    # If no arguments were passed, print the help file and exit with ERROR..
    if len(sys.argv) == 1:
        parser.print_help()
        print '\n\nERROR: No arguments passed!\n'
        sys.exit(3)


def do_work(node):
    logger.info('[{0}]'.format(node))
    try:
        ssh.connect(node)
        stdin, stdout, stderr = ssh.exec_command('hostname ; date')
        print stdout.read()
        ssh.close()
    except:
        print 'ERROR!'
        sys.exit(2)


def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()


def iterate():
    for item in args.host.split(","):
        q.put(item)
    for i in range(int(args.batch)):
        t = Thread(target=worker)
        t.daemon = True
        t.start()
    q.join()


def main():
    parse_args()
    try:
        iterate()
    except KeyboardInterrupt:
        exit(1)
In the script log I see a WARNING generated by Paramiko as below:
2016-01-04 22:51:37,613 WARNING: Oops, unhandled type 3
I tried to Google this "unhandled type 3" error, but didn't find anything related to my issue; the results talk about two-factor authentication or connecting with both a password and an SSH key at the same time, whereas I'm only loading the host keys without providing any password to the SSH client.
I would appreciate any help on this matter..
Managed to solve my problem using the parallel-ssh module.
Here's the code, fixed with my desired actions:
def iterate_nodes_and_call_exec_func(nodes):
    """
    Get a dict as an argument, containing linux services (initd) as the keys,
    and a list of nodes on which the linux service needs to be checked.
    Iterate over the list of nodes to process,
    create a list of nodes that shouldn't exceed the batch size provided (or 1 if not provided).
    Then using the parallel-ssh module, call the restart_service func on x nodes in parallel (where x is the batch size)
    and provide the linux service (initd) to process.
    If batch_sleep arg was provided, call the sleep func and provide the batch_sleep argument between each batch.
    """
    for initd in nodes.keys():
        work = dict()
        work[initd] = []
        count = 0
        for node in nodes[initd]:
            count += 1
            work[initd].append(node)
            if len(work[initd]) == args.batch:
                restart_service(work[initd], initd)
                work[initd] = []
                if args.batch_sleep > 0 and count < len(nodes[initd]):
                    logger.info('*** Sleeping for %d seconds before moving on to next batch ***', args.batch_sleep)
                    general_utils.sleep_func(int(args.batch_sleep))
        if len(work[initd]) > 0:
            restart_service(work[initd], initd)


def restart_service(nodes, initd):
    """
    Get a list of nodes and a linux service as arguments,
    then connect by the parallel-ssh module to the nodes and run the service restart command..
    """
    command = 'service {0} restart'.format(initd)
    logger.info('Connecting to {0} to restart the {1} service...'.format(nodes, initd))
    try:
        client = pssh.ParallelSSHClient(nodes, pool_size=args.batch, timeout=10, num_retries=1)
        output = client.run_command(command, sudo=True)
        for node in output:
            for line in output[node]['stdout']:
                if client.get_exit_code(output[node]) == 0:
                    print '[{0}]{1}{2} {3}{4}'.format(node, Color.BOLD, Color.GREEN, line, Color.END)
                else:
                    print '[{0}]{1}{2} ERROR! {3}{4}'.format(node, Color.BOLD, Color.RED, line, Color.END)
                    logger.error('[{0}] Result of command {1} output: {2}'.format(node, command, line))
    except pssh.AuthenticationException:
        print "{0}{1}ERROR! SSH failed with Authentication Error. Make sure you run the script as root and try again..{2}".format(Color.BOLD, Color.RED, Color.END)
        logger.error('SSH Authentication failed, thrown error message to the user to make sure script is run with root permissions')
        sys.exit(2)
    except pssh.ConnectionErrorException as error:
        print("[{0}]{1}{2} ERROR! SSH failed with error: {3}{4}\n".format(error[1], Color.RED, Color.BOLD, error[3], Color.END))
        logger.error("[{0}] SSH Failed with error: {1}".format(error[1], error[3]))
        restart_service(nodes[nodes.index(error[1])+1:], initd)
    except KeyboardInterrupt:
        general_utils.terminate(logger)


def generate_nodes_by_initd_dict(nodes_list):
    """
    Get a list of nodes as an argument.
    Then by calling the get_initd function for each of the nodes,
    build a dict based on linux services (initd) as keys and a list of nodes on which the initd
    needs to be processed as values. Then call the iterate_nodes_and_call_exec_func and provide the generated dict
    as its argument.
    """
    nodes = {}
    for node in nodes_list:
        initd = general_utils.get_initd(logger, args, node)
        if initd in nodes.keys():
            nodes[initd].append(node)
        else:
            nodes[initd] = [node, ]
    return iterate_nodes_and_call_exec_func(nodes)


def main():
    parse_args()
    try:
        general_utils.init_script('Service Restart', logger, log)
        log_args(logger, args)
        generate_nodes_by_initd_dict(general_utils.generate_nodes_list(args, logger, ['service', 'datacenter', 'lob']))
    except KeyboardInterrupt:
        general_utils.terminate(logger)
    finally:
        general_utils.wrap_up(logger)


if __name__ == '__main__':
    main()
In addition to using the pssh module, after a more thorough troubleshooting effort I was able to fix the original code that I posted in the question using the native threading module, by creating a new paramiko client for every thread rather than using the same client for all threads.
So basically (only updating the do_work function from the original question), here's the change:
def do_work(node):
    logger.info('[{0}]'.format(node))
    try:
        ssh = paramiko.SSHClient()
        ssh.connect(node)
        stdin, stdout, stderr = ssh.exec_command('hostname ; date')
        print stdout.read()
        ssh.close()
    except:
        print 'ERROR!'
        sys.exit(2)
When done this way, the native Threading module works perfectly!
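For reference, here is a rough sketch of the same one-client-per-thread idea using concurrent.futures (standard library on Python 3, or the futures backport on 2.7); the host list and worker count are placeholders, not the original script:
from concurrent.futures import ThreadPoolExecutor
import paramiko

def run_on_node(node):
    # each worker builds its own client, since sharing one SSHClient across threads was the problem
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    try:
        ssh.connect(node)
        stdin, stdout, stderr = ssh.exec_command('hostname ; date')
        return node, stdout.read()
    finally:
        ssh.close()

nodes = ['node1', 'node2']                       # placeholder host list
with ThreadPoolExecutor(max_workers=2) as pool:  # max_workers plays the role of the batch size
    for node, output in pool.map(run_on_node, nodes):
        print(node, output)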

Git diff is complaining, "external diff died, stopping at ... " with my python diff program

Here's the beginning part of my diff.
#!/usr/bin/env python
import fileinput
import difflib
import subprocess
import sys

# for debugging
def info(type, value, info):
    import traceback
    traceback.print_exception(type, value, info)
    print
    pdb.pm()

sys.excepthook = info
import pdb
# end debugging

if len(sys.argv) == 8:
    # assume this was passed to git; we can of course do
    # some parsing to check if we got valid git style args
    args = [sys.argv[2], sys.argv[5]]
elif len(sys.argv) == 3:
    args = sys.argv[1:]
else:
    exit("Not a valid number of args (2 or 7) to this diff program")

print "Files: " + ' '.join(args)

for filename in args:
    filetype = subprocess.check_output(['file', filename])
    if filetype.find('text') == -1:
        args.insert(0, 'diff')
        print "A binary file was found: " + filename + ", deferring to diff"
        exit(subprocess.call(args))
When a binary (or otherwise non-text) file is encountered, it attempts to fork diff to determine whether the binary files differ or not. The goal is for this python diff program to be used as an external differ for git.
But I get this ghastly "external diff died, stopping at <file>" message once it hits the binary file.
How is git evaluating my program? How does it know it died? Isn't return value supposed to indicate the differing condition?
There's no exit function in your code. How about replacing exit with sys.exit?
#!/usr/bin/env python
import subprocess
import sys

if len(sys.argv) == 8:
    # assume this was passed to git; we can of course do
    # some parsing to check if we got valid git style args
    args = [sys.argv[2], sys.argv[5]]
elif len(sys.argv) == 3:
    args = sys.argv[1:]
else:
    print "Not a valid number of args (2 or 7) to this diff program"
    sys.exit(1)

print "Files: ", args

for filename in args:
    filetype = subprocess.check_output(['file', filename])
    if filetype.find('text') == -1:
        args.insert(0, 'diff')
        print "A binary file was found: " + filename + ", deferring to diff"
        #sys.stdout.flush()
        subprocess.call(args)
        sys.exit(0)
EDIT: git depends on the external diff's exit status, and diff exits with 0 only if there is no difference, so I changed the code not to use diff's exit status.
PS: Without sys.stdout.flush(), the diff output comes before the print output.
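In other words, the wrapper itself should exit 0 unless something genuinely went wrong. GNU diff's convention is 0 for identical, 1 for different and 2 for trouble, so one hedged way to delegate while keeping git happy is:
import subprocess
import sys

# args is the ['diff', file_a, file_b] list built above
status = subprocess.call(args)
if status > 1:        # 2 means diff itself hit an error (e.g. unreadable file)
    sys.exit(status)  # let git see a genuine failure
sys.exit(0)           # 0 (same) and 1 (different) both count as success for an external diff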

Check to see if python script is running

I have a python daemon running as a part of my web app. How can I quickly check (using python) if my daemon is running and, if not, launch it?
I want to do it that way to fix any crashes of the daemon, and so the script does not have to be run manually, it will automatically run as soon as it is called and then stay running.
How can I check (using python) if my script is running?
A technique that is handy on a Linux system is using domain sockets:
import socket
import sys
import time


def get_lock(process_name):
    # Without holding a reference to our socket somewhere it gets garbage
    # collected when the function exits
    get_lock._lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # The null byte (\0) means the socket is created
        # in the abstract namespace instead of being created
        # on the file system itself.
        # Works only in Linux
        get_lock._lock_socket.bind('\0' + process_name)
        print 'I got the lock'
    except socket.error:
        print 'lock exists'
        sys.exit()


get_lock('running_test')
while True:
    time.sleep(3)
It is atomic and avoids the problem of having lock files lying around if your process gets sent a SIGKILL
You can read in the documentation for socket.close that sockets are automatically closed when garbage collected.
Drop a pidfile somewhere (e.g. /tmp). Then you can check to see if the process is running by checking to see if the PID in the file exists. Don't forget to delete the file when you shut down cleanly, and check for it when you start up.
#!/usr/bin/env python
import os
import sys

pid = str(os.getpid())
pidfile = "/tmp/mydaemon.pid"

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()

file(pidfile, 'w').write(pid)
try:
    pass  # Do some actual work here
finally:
    os.unlink(pidfile)
Then you can check to see if the process is running by checking to see if the contents of /tmp/mydaemon.pid are an existing process. Monit (mentioned above) can do this for you, or you can write a simple shell script to check it for you using the return code from ps.
ps up `cat /tmp/mydaemon.pid ` >/dev/null && echo "Running" || echo "Not running"
For extra credit, you can use the atexit module to ensure that your program cleans up its pidfile under any circumstances (when killed, exceptions raised, etc.).
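A minimal sketch of that atexit idea, reusing the pidfile path from above (note it will not run if the process is killed with SIGKILL):
import atexit
import os

def remove_pidfile(path):
    # best-effort cleanup; the file may already be gone
    try:
        os.unlink(path)
    except OSError:
        pass

atexit.register(remove_pidfile, "/tmp/mydaemon.pid")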
The pid library can do exactly this.
from pid import PidFile
with PidFile():
    do_something()
It will also automatically handle the case where the pidfile exists but the process is not running.
My solution is to check for the process and its command-line arguments.
Tested on Windows and Ubuntu Linux.
import psutil
import os


def is_running(script):
    for q in psutil.process_iter():
        if q.name().startswith('python'):
            if len(q.cmdline()) > 1 and script in q.cmdline()[1] and q.pid != os.getpid():
                print("'{}' Process is already running".format(script))
                return True
    return False


if not is_running("test.py"):
    n = input("What is Your Name? ")
    print("Hello " + n)
Of course the example from Dan will not work as it should.
Indeed, if the script crashes, raises an exception, or does not clean up its pid file, the script will be run multiple times.
I suggest the following, based on another website:
This checks whether a lock file already exists:
#!/usr/bin/env python
import os
import sys

if os.access(os.path.expanduser("~/.lockfile.vestibular.lock"), os.F_OK):
    # if the lockfile is already there then check the PID number
    # in the lock file
    pidfile = open(os.path.expanduser("~/.lockfile.vestibular.lock"), "r")
    pidfile.seek(0)
    old_pid = pidfile.readline()
    # Now we check the PID from lock file matches to the current
    # process PID
    if os.path.exists("/proc/%s" % old_pid):
        print "You already have an instance of the program running"
        print "It is running as process %s," % old_pid
        sys.exit(1)
    else:
        print "File is there but the program is not running"
        print "Removing lock file for the: %s as it can be there because of the program last time it was run" % old_pid
        os.remove(os.path.expanduser("~/.lockfile.vestibular.lock"))
This is the part of the code where we put the PID into the lock file:
pidfile = open(os.path.expanduser("~/.lockfile.vestibular.lock"), "w")
pidfile.write("%s" % os.getpid())
pidfile.close()
This code checks the value of the PID against the running processes, avoiding double execution.
I hope it helps.
There are very good packages for restarting processes on UNIX. One that has a great tutorial about building and configuring it is monit. With some tweaking you can have rock-solid, proven technology keeping your daemon up.
Came across this old question looking for a solution myself.
Use psutil:
import psutil
import sys
from subprocess import Popen

for process in psutil.process_iter():
    if process.cmdline() == ['python', 'your_script.py']:
        sys.exit('Process found: exiting.')

print('Process not found: starting it.')
Popen(['python', 'your_script.py'])
There are a myriad of options. One method is using system calls or python libraries that perform such calls for you. The other is simply to spawn out a process like:
ps ax | grep processName
and parse the output. Many people choose this approach, and it isn't necessarily a bad one in my view.
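A rough sketch of that spawn-and-parse approach (the process name is a placeholder, and the current PID is skipped so a script does not match itself):
import os
import subprocess

def is_process_running(name):
    my_pid = os.getpid()
    # 'ps -e -o pid=,command=' prints "PID COMMAND" pairs without headers
    output = subprocess.check_output(['ps', '-e', '-o', 'pid=,command=']).decode()
    for line in output.splitlines():
        pid, _, command = line.strip().partition(' ')
        if name in command and int(pid) != my_pid:
            return True
    return False

print(is_process_running('processName'))   # placeholder name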
I'm a big fan of Supervisor for managing daemons. It's written in Python, so there are plenty of examples of how to interact with or extend it from Python. For your purposes the XML-RPC process control API should work nicely.
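As a hedged illustration, assuming supervisord is running with its default inet_http_server on port 9001 and manages a program named 'mydaemon' (both are assumptions), the XML-RPC API can be queried roughly like this:
import xmlrpclib   # xmlrpc.client on Python 3

server = xmlrpclib.ServerProxy('http://localhost:9001/RPC2')
info = server.supervisor.getProcessInfo('mydaemon')   # 'mydaemon' is a placeholder program name
if info['statename'] != 'RUNNING':
    server.supervisor.startProcess('mydaemon')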
Try this other version
import os
import sys

def checkPidRunning(pid):
    '''Check for the existence of a unix pid.
    '''
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    else:
        return True

# Entry point
if __name__ == '__main__':
    pid = str(os.getpid())
    pidfile = os.path.join("/", "tmp", __program__ + ".pid")

    if os.path.isfile(pidfile) and checkPidRunning(int(file(pidfile, 'r').readlines()[0])):
        print "%s already exists, exiting" % pidfile
        sys.exit()
    else:
        file(pidfile, 'w').write(pid)

    # Do some actual work here
    main()

    os.unlink(pidfile)
Rather than developing your own PID file solution (which has more subtleties and corner cases than you might think), have a look at supervisord -- this is a process control system that makes it easy to wrap job control and daemon behaviors around an existing Python script.
The other answers are great for things like cron jobs, but if you're running a daemon you should monitor it with something like daemontools.
ps ax | grep processName
Note that if you debug your script in PyCharm, this check will always match, because the process command line looks like:
pydevd.py --multiproc --client 127.0.0.1 --port 33882 --file processName
try this:
#!/usr/bin/env python
import os, sys, atexit

try:
    # Set PID file
    def set_pid_file():
        pid = str(os.getpid())
        f = open('myCode.pid', 'w')
        f.write(pid)
        f.close()

    def goodby():
        pid = str('myCode.pid')
        os.remove(pid)

    atexit.register(goodby)
    set_pid_file()
    # Place your code here
except KeyboardInterrupt:
    sys.exit(0)
Here is more useful code (which also checks that it is exactly python executing the script):
#! /usr/bin/env python
import os
from sys import exit


def checkPidRunning(pid):
    global script_name
    if pid < 1:
        print "Incorrect pid number!"
        exit()
    try:
        os.kill(pid, 0)
    except OSError:
        print "Abnormal termination of previous process."
        return False
    else:
        ps_command = "ps -o command= %s | grep -Eq 'python .*/%s'" % (pid, script_name)
        process_exist = os.system(ps_command)
        if process_exist == 0:
            return True
        else:
            print "Process with pid %s is not a Python process. Continue..." % pid
            return False


if __name__ == '__main__':
    script_name = os.path.basename(__file__)
    pid = str(os.getpid())
    pidfile = os.path.join("/", "tmp/", script_name + ".pid")
    if os.path.isfile(pidfile):
        print "Warning! Pid file %s existing. Checking for process..." % pidfile
        r_pid = int(file(pidfile, 'r').readlines()[0])
        if checkPidRunning(r_pid):
            print "Python process with pid = %s is already running. Exit!" % r_pid
            exit()
        else:
            file(pidfile, 'w').write(pid)
    else:
        file(pidfile, 'w').write(pid)

    # main program
    # ....
    # ....

    os.unlink(pidfile)
Here is the key line:
ps_command = "ps -o command= %s | grep -Eq 'python .*/%s'" % (pid, script_name)
It returns 0 if grep is successful, i.e. if a "python" process is currently running with the name of your script as a parameter.
A simple example, if you are only checking whether a process name exists or not:
import os


def pname_exists(inp):
    os.system('ps -ef > /tmp/psef')
    lines = open('/tmp/psef', 'r').read().split('\n')
    res = [i for i in lines if inp in i]
    return True if res else False
Result:
In [21]: pname_exists('syslog')
Out[21]: True
In [22]: pname_exists('syslog_')
Out[22]: False
I was looking for an answer to this and, in my case, a very easy and very good solution came to mind (since I guess a false positive isn't possible here - how can the timestamp in the TXT file be updated if the program doesn't do it?):
--> just keep writing the current timestamp to a TXT file at some time interval, depending on your needs (here each half hour was perfect).
If, when you check, the timestamp in the TXT file is outdated relative to the current time, then there was a problem with the program and it should be restarted, or whatever you prefer to do.
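A small sketch of that heartbeat idea; the file path and interval are arbitrary placeholders:
import os
import time

HEARTBEAT_FILE = "/tmp/mydaemon.heartbeat"   # placeholder path
MAX_AGE_SECONDS = 30 * 60                    # "each half hour", as described above

def beat():
    # the monitored program calls this periodically
    with open(HEARTBEAT_FILE, "w") as f:
        f.write(str(time.time()))

def looks_alive():
    # the checker compares the file's age against the allowed interval
    try:
        return time.time() - os.path.getmtime(HEARTBEAT_FILE) < MAX_AGE_SECONDS
    except OSError:
        return False   # no heartbeat file yet: treat as not running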
A portable solution that relies on multiprocessing.shared_memory:
import atexit
from multiprocessing import shared_memory

_ensure_single_process_store = {}


def ensure_single_process(name: str):
    if name in _ensure_single_process_store:
        return
    try:
        shm = shared_memory.SharedMemory(name='ensure_single_process__' + name,
                                         create=True,
                                         size=1)
    except FileExistsError:
        print(f"{name} is already running!")
        raise
    _ensure_single_process_store[name] = shm
    atexit.register(shm.unlink)
Usually you wouldn't have to use atexit, but sometimes it helps to clean up upon abnormal exit.
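A hypothetical usage sketch: call it once at startup with any stable name.
if __name__ == '__main__':
    ensure_single_process('my_daemon')   # raises FileExistsError if another instance holds the segment
    # ... the rest of the program goes here ...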
Consider the following example to solve your problem:
#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys, time, signal


def termination_handler(signum, frame):
    global running
    global pidfile
    print 'You have requested to terminate the application...'
    sys.stdout.flush()
    running = 0
    os.unlink(pidfile)


running = 1
signal.signal(signal.SIGINT, termination_handler)
pid = str(os.getpid())
pidfile = '/tmp/' + os.path.basename(__file__).split('.')[0] + '.pid'

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()
else:
    file(pidfile, 'w').write(pid)

# Do some actual work here
while running:
    time.sleep(10)
I suggest this script because it can be executed one time only.
Using bash to look for a process with the current script's name. No extra file.
import commands
import os
import time
import sys


def stop_if_already_running():
    script_name = os.path.basename(__file__)
    l = commands.getstatusoutput("ps aux | grep -e '%s' | grep -v grep | awk '{print $2}'| awk '{print $2}'" % script_name)
    if l[1]:
        sys.exit(0)
To test, add
stop_if_already_running()
print "running normally"
while True:
    time.sleep(3)
This is what I use in Linux to avoid starting a script if already running:
import os
import sys

script_name = os.path.basename(__file__)
pidfile = os.path.join("/tmp", os.path.splitext(script_name)[0]) + ".pid"


def create_pidfile():
    if os.path.exists(pidfile):
        with open(pidfile, "r") as _file:
            last_pid = int(_file.read())

        # Checking if process is still running
        last_process_cmdline = "/proc/%d/cmdline" % last_pid
        if os.path.exists(last_process_cmdline):
            with open(last_process_cmdline, "r") as _file:
                cmdline = _file.read()
            if script_name in cmdline:
                raise Exception("Script already running...")

    with open(pidfile, "w") as _file:
        pid = str(os.getpid())
        _file.write(pid)


def main():
    """Your application logic goes here"""


if __name__ == "__main__":
    create_pidfile()
    main()
This approach works well without any dependency on an external module.
