I need to detect when a program crashes or is not running using python and restart it. I need a method that doesn't necessarily rely on the python module being the parent process.
I'm considering implementing a while loop that essentially does
ps -ef | grep process name
and when the process isn't found it starts another. Perhaps this isn't the most efficient method. I'm new to python so possibly there is a python module that does this already.
Why implement it yourself? An existing utility like daemon or Debian's start-stop-daemon is more likely to get the other difficult stuff right about running long-living server processes.
Anyway, when you start the service, put its pid in /var/run/<name>.pid and then make your ps command just look for that process ID, and check that it is the right process. On Linux you can simply look at /proc/<pid>/exe to check that it points to the right executable.
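A minimal sketch of that pidfile check, assuming a hypothetical service /usr/bin/myservice that writes its PID to /var/run/myservice.pid:
import os
import subprocess

PIDFILE = "/var/run/myservice.pid"
EXPECTED_EXE = "/usr/bin/myservice"

def is_running():
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        # /proc/<pid>/exe is a symlink to the running executable (Linux only)
        return os.readlink("/proc/%d/exe" % pid) == EXPECTED_EXE
    except (IOError, OSError, ValueError):
        return False

if not is_running():
    subprocess.Popen([EXPECTED_EXE])  # restart it however you normally start it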
Please don't reinvent init. Your OS has capabilities to do this that require nearly no system resources and will definitely do it better and more reliably than anything you can reproduce.
Classic Linux has /etc/inittab
Ubuntu has /etc/event.d (upstart)
OS X has launchd
Solaris has smf
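For classic init, for example, a respawn entry in /etc/inittab looks roughly like this (the id, runlevels, and path are hypothetical, and the program must stay in the foreground):
ms:2345:respawn:/usr/local/bin/myservice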
The following code checks a given process at a given interval and restarts it when it has finished.
# Restarts a given process if it is finished.
# Compatible with Python 2.5, tested on Windows XP.
import threading
import time
import subprocess

class ProcessChecker(threading.Thread):
    def __init__(self, process_path, check_interval):
        threading.Thread.__init__(self)
        self.process_path = process_path
        self.check_interval = check_interval

    def run(self):
        while True:
            time.sleep(self.check_interval)
            if self.is_ok():
                self.make_sure_process_is_running()

    def is_ok(self):
        ok = True
        # do the database locks, client data corruption check here,
        # and return True/False
        return ok

    def make_sure_process_is_running(self):
        # This call is blocking; it will wait for the
        # other sub process to be finished.
        retval = subprocess.call(self.process_path)

def main():
    process_path = "notepad.exe"
    check_interval = 1  # in seconds
    pm = ProcessChecker(process_path, check_interval)
    pm.start()
    print "Checker started..."

if __name__ == "__main__":
    main()
maybe you need http://supervisord.org
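For example, a minimal supervisord program section along these lines (program name and command are placeholders) starts the process and restarts it whenever it dies:
[program:myprogram]
command=/usr/bin/python /opt/myprogram/main.py
autostart=true
autorestart=true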
I haven't tried it myself, but there is a Python System Information module that can be used to find processes and get information about them. AFAIR there is a ProcessTable class that can be used to inspect the running processes, but it doesn't seem to be very well documented...
I'd go the command-line route (it's just easier, IMHO); as long as you only check every second or two, the resource usage should be infinitesimal compared to the available processing power on any system less than 10 years old.
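For what it's worth, a minimal sketch of that command-line approach (the process name and start command are hypothetical; pgrep exits non-zero when nothing matches):
import os
import subprocess
import time

PROCESS_NAME = "myserver"
START_CMD = ["/usr/local/bin/myserver"]

devnull = open(os.devnull, "w")
while True:
    if subprocess.call(["pgrep", "-x", PROCESS_NAME], stdout=devnull) != 0:
        subprocess.Popen(START_CMD)  # not running any more, so start it again
    time.sleep(2)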
I have two scripts, a.py and b.py, that send data to each other over localhost (MQTT), and both depend on a configuration file conf.json. I usually execute them in two different terminals,
a.py in one terminal
b.py in another
and everything is OK. I am now trying to create another script, c.py, which should do the following:
for parameter in parameters
update config.json
execute a.py and b.py "in two different terminals"
close a.py, b.py and start again with the new parameters
Now, I am very much a noob at this, so I tried to use Thread from threading:
from threading import Thread

for parameter in parameters:
    # update config.json

    class exp(Thread):
        def __init__(self, name):
            Thread.__init__(self)
            self.name = name

        def run(self):
            if self.name == 0:
                a.runs()
            else:
                b.runs()

    thread1 = exp(0)
    thread1.start()
    thread2 = exp(1)
    thread2.start()
The a.py and b.py scripts both end with:
def runs():
    pass  # whatever runs() does

if __name__ == '__main__':
    runs()
It runs without errors, but it does not work. I am quite sure there should be a nice and standard solution to this problem. Any ideas? Thanks!
So I eventually found this (dirty) solution... any advice for improvements?
import subprocess

# initialize a throwaway process just so the first terminate() call
# in the for loop has something to act on
p = subprocess.Popen(['python', 'random.py'])

for parameter in parameters:
    subprocess.Popen.terminate(p)
    # random code
    p = subprocess.Popen(['python', 'a.py'])
    p = subprocess.call(['python', 'b.py'])
    # here I would like to do subprocess.Popen.terminate(p)... but it does not
    # work, so I put the terminate call at the start of the for loop instead
I do not totally understand what I wrote but it works fine. Thanks everybody for previous tips, and I hope for further explanations.
You probably want multiprocessing, not the threading library (look up multiprocessing.Process). Another fairly equivalent option is to use subprocess.run to launch the two scripts via the shell.
Regarding threads: keep in mind they are limited by the Global Interpreter Lock in CPython, which is the prevalent Python implementation and the one you are probably using.
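A rough sketch of that subprocess route (the parameter list, the shape of conf.json, and the run time per experiment are placeholders; Popen is used so a.py and b.py can run side by side, like two separate terminals):
import json
import subprocess
import time

parameters = [1, 2, 3]  # placeholder

for parameter in parameters:
    with open("conf.json", "w") as f:
        json.dump({"parameter": parameter}, f)  # update the shared config
    a = subprocess.Popen(["python", "a.py"])    # two independent processes,
    b = subprocess.Popen(["python", "b.py"])    # like two separate terminals
    time.sleep(60)                              # let them run for a while
    for p in (a, b):
        p.terminate()                           # then stop both and start over
        p.wait()                                # with the next parameter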
You can use Qt threading; Qt has a very powerful library for exactly this purpose.
I have a website (Wordpress site on Ubuntu OS and Apache Server) with special math calculators, many of which utilize python3 scripts to do the main calculations. The flow of data on these calculators is as such:
1.) User inputs numbers into html form, then hits submit button.
2.) A PHP function is called; it assigns the HTML user inputs to variables and does exec() on the applicable python3 file with those variables (the user inputs are filtered and escapeshellarg is used, so all good here).
3.) PHP function returns result of python3 script which is displayed via shortcode on the calculator web page.
The issue I am having is that occasionally the symbolic and numeric computations within my Python scripts hang indefinitely. As that python3 process keeps running, it starts to use massive CPU and memory resources (a big problem during peak traffic hours).
My question is this: is there some way to make a script or program on my server's backend that will kill a process instance of python3 if it has exceeded an arbitrary runtime and CPU usage level? I would like to restrict it only to instances of python3 so that it can't kill something like mysqld. Also, I am OK if it only uses runtime as a kill condition. None of my python scripts should run longer than ~10 seconds under normal circumstances and CPU usage will not be an issue if they don't run longer than 10 seconds.
You can create another Python script to serve as a health checker on your server, based on the psutil and os modules.
The following code could serve as a base for your specific needs. What it basically does is look up the PIDs of the Python scripts named in the script_name_list variable and kill them after checking whether your server's CPU usage is above some threshold or the available memory is below some threshold.
#!/usr/bin/env python3
import psutil
import os
import signal

CPU_LIMIT = 80      # Change me
AV_MEM = 500.0      # Change me
script_name_list = ['script1']  # Put in the names of the scripts

def find_other_scripts_pid(script_list):
    pid_list = []
    for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
        # Skip the PID of the process running this script, then check whether
        # the command line matches one of the script names we want to kill
        if proc.info['pid'] != os.getpid() and proc.info['name'] in ['python', 'python3']:
            for element in proc.info['cmdline']:
                for script_name in script_list:
                    if script_name in element:
                        pid_list.append(proc.info['pid'])
    return pid_list

def kill_process(pid):
    if psutil.pid_exists(pid):
        os.kill(pid, signal.SIGKILL)
    return None

def check_cpu():
    return psutil.cpu_percent(interval=1)

def check_available_memory():
    mem = psutil.virtual_memory()
    return mem.available / (2 ** 20)

def main():
    cpu_usage = check_cpu()
    av_memory_mb = check_available_memory()
    if cpu_usage > CPU_LIMIT or av_memory_mb < AV_MEM:
        pid_list = find_other_scripts_pid(script_name_list)
        for pid in pid_list:
            kill_process(pid)

if __name__ == "__main__":
    main()
You can afterwards run this script periodically on your server by using a crontab, as explained in this post shared within the community.
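For instance, a crontab entry along these lines (interpreter and script paths are placeholders) would run the checker every minute:
* * * * * /usr/bin/python3 /home/user/health_checker.py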
I have a multi-process web server whose processes never end, and I would like to check my code coverage on the whole project in a live environment (not only from tests).
The problem is that, since the processes never end, I don't have a good place to put the cov.start(), cov.stop(), and cov.save() hooks.
Therefore, I thought about spawning a thread that, in an infinite loop, saves and combines the coverage data and then sleeps for some time. However, this approach doesn't work; the coverage report seems to be empty, except for the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code,
or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os

class CoverageThread(threading.Thread):
    _kill_now = False
    _sleep_time = 2

    @classmethod
    def exit_gracefully(cls):
        cls._kill_now = True

    def sleep_some_time(self):
        time.sleep(CoverageThread._sleep_time)

    def run(self):
        while True:
            cov.start()
            self.sleep_some_time()
            cov.stop()
            if os.path.exists('.coverage'):
                cov.combine()
            cov.save()
            if self._kill_now:
                break
        cov.stop()
        if os.path.exists('.coverage'):
            cov.combine()
        cov.save()
        cov.html_report(directory="coverage_report_data.html")
        print "End of the program. I was killed gracefully :)"
Apparently, it is not possible to control coverage very well with multiple threads.
Once different threads are started, stopping the Coverage object will stop all coverage, and start will only restart it in the "starting" thread.
So your code basically stops the coverage after 2 seconds for every thread other than the CoverageThread.
I played a bit with the API, and it is possible to access the measurements without stopping the Coverage object.
So you could launch a thread that saves the coverage data periodically, using the API.
A first implementation would be something like this:
import os
import threading
from time import sleep

from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file

cov = Coverage(config_file=True)
cov.start()

def get_data_dict(d):
    """Return a dict like d, but with keys modified by `abs_file`,
    removing the copied elements from d.
    """
    res = {}
    keys = list(d.keys())
    for k in keys:
        a = {}
        lines = list(d[k].keys())
        for l in lines:
            v = d[k].pop(l)
            a[l] = v
        res[abs_file(k)] = a
    return res

class CoverageLoggerThread(threading.Thread):
    _kill_now = False
    _delay = 2

    def __init__(self, main=True):
        self.main = main
        self._data = CoverageData()
        self._fname = cov.config.data_file
        self._suffix = None
        self._data_files = CoverageDataFiles(basename=self._fname,
                                             warn=cov._warn)
        self._pid = os.getpid()
        super(CoverageLoggerThread, self).__init__()

    def shutdown(self):
        self._kill_now = True

    def combine(self):
        aliases = None
        if cov.config.paths:
            from coverage.aliases import PathAliases
            aliases = PathAliases()
            for paths in cov.config.paths.values():
                result = paths[0]
                for pattern in paths[1:]:
                    aliases.add(pattern, result)
        self._data_files.combine_parallel_data(self._data, aliases=aliases)

    def export(self, new=True):
        cov_report = cov
        if new:
            cov_report = Coverage(config_file=True)
            cov_report.load()
        self.combine()
        self._data_files.write(self._data)
        cov_report.data.update(self._data)
        cov_report.html_report(directory="coverage_report_data.html")
        cov_report.report(show_missing=True)

    def _collect_and_export(self):
        new_data = get_data_dict(cov.collector.data)
        if cov.collector.branch:
            self._data.add_arcs(new_data)
        else:
            self._data.add_lines(new_data)
        self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
        self._data_files.write(self._data, self._suffix)
        if self.main:
            self.export()

    def run(self):
        while True:
            sleep(CoverageLoggerThread._delay)
            if self._kill_now:
                break
            self._collect_and_export()

        cov.stop()
        if not self.main:
            self._collect_and_export()
            return

        self.export(new=False)
        print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grabs the info collected by the collector without stopping it.
The get_data_dict function takes the dictionary in Coverage.collector and pops the available data, which should be safe enough that you don't lose any measurements.
The report files get updated every _delay seconds.
But if you have multiple processes running, you need some extra effort to make sure every process runs the CoverageLoggerThread. This is the patch_multiprocessing function, monkey-patched from the coverage monkey patch...
The code is in the GIST. It basically replaces the original Process with a custom process, which starts the CoverageLoggerThread just before running the run method and joins the thread at the end of the process.
The script main.py permits launching different tests with threads and processes.
There are two or three drawbacks to this code that you need to be careful of:
It is a bad idea to use the combine function concurrently, as it performs concurrent read/write/delete access to the .coverage.* files. This means that the export function is not super safe. It should be all right, as the data is replicated multiple times, but I would do some testing before using it in production.
Once the data has been exported, it stays in memory. So if the code base is huge, it could eat some resources. It is possible to dump all the data and reload it, but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload the previous state of the coverage for this process.
The custom process will wait for _delay before finishing, as we join the CoverageThreadLogger at the end of the process. So if you have a lot of quick processes, you want to increase the granularity of the sleep to be able to detect the end of the process more quickly. It just needs a custom sleep loop that breaks on _kill_now.
Let me know if this helps you in some way, or if it is possible to improve this gist.
EDIT:
It seems you do not need to monkey-patch the multiprocessing module to start a logger automatically. Using a .pth file in your Python install, you can use an environment variable to start your logger automatically in new processes:
# Content of coverage.pth in your site-packages folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
    import atexit
    from coverage_logger import CoverageLoggerThread
    thread_cov = CoverageLoggerThread(main=False)
    thread_cov.start()
    def close_cov():
        thread_cov.shutdown()
        thread_cov.join()
    atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.py
Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.
You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
Another way to go would be to trace the program using lptrace. Even though it only prints calls, it can be useful.
The context for this is much, much too big for an SO question, so the code below is an extremely simplified demonstration of the actual implementation.
Generally, I've written an extensive module for academic contexts that launches a subprocess at runtime to be used for event scheduling. When a script or program using this module closes on pre-El Capitan machines, my efforts to join the child process fail, as do my last-ditch efforts to just kill the process; OS X gives a "Python unexpectedly quit" error and the orphaned process persists. I am very much a noob at multiprocessing, without a CS background; diagnosing this is beyond me.
If I am just too ignorant, I'm more than willing to go RTFM; specific directions welcome.
I'm pretty sure this example is coherent and representative, but know that the actual project works flawlessly on El Capitan, works during runtime on everything else, but consistently crashes as described when quitting. I've tested it with absurd time-out values (30 sec+); always the same result.
One last note: I started this with python's default multiprocessing libraries, then switched to billiard as a dev friend suggested it might run smoother. To date, I've not experienced any difference.
UPDATE:
Had omitted the function that gives the @threaded decorator purpose; it is now present in the code.
Generally, we have:
shared_queue = billiard.Queue() # or multiprocessing, have used both
class MainInstanceParent(object):
def __init__(self):
# ..typically init stuff..
self.event_ob = EventClass(self) # gets a reference to parent
def quit():
try:
self.event_ob.send("kkbai")
started = time.time()
while time.time - started < 1: # or whatever
self.event_ob.recieve()
if self.event_ob.event_p.is_alive():
raise RuntimeError("Little bugger still kickin'")
except RuntimeError:
os.kill(self.event_on.event_p.pid, SIGKILL)
class EventClass(object):
def __init__(self, parent):
# moar init stuff
self.parent = parent
self.pipe, child = Pipe()
self.event_p = __event_process(child)
def receive():
self.pipe.poll()
t = self.pipe.recv()
if isinstance(t, Exception):
raise t
return t
def send(deets):
self.pipe.send(deets)
def threaded(func):
def threaded_func(*args, **kwargs):
p = billiard.Process(target=func, args=args, kwargs=kwargs)
p.start()
return p
return threaded_func
#threaded
def __event_process(pipe):
while True:
if pipe.poll():
inc = pipe.recv()
# do stuff conditionally on what comes through
if inc == "kkbai":
return
if inc == "meets complex condition to pass here":
shared_queue.put("stuff inferred from inc")
Before exiting the main program, call multiprocessing.active_children() to see how many child processes are still running. This will also join the processes that have already quit.
If you need to signal the children that it's time to quit, create a multiprocessing.Event before starting the child processes and give it a meaningful name like children_exit. The child processes should regularly call children_exit.is_set() to see whether it is time for them to quit; in the main program you call children_exit.set() to signal the child processes.
Update:
Have a good look through the Programming guidelines in the multiprocessing documentation.
It is best to provide the abovementioned Event object as an argument to the target of the Process initializer, for the reasons mentioned in those guidelines; a minimal sketch follows below.
If your code also needs to run on MS Windows, you have to jump through some extra hoops, since that OS doesn't do fork().
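A minimal sketch of that Event pattern, with the event passed to the worker target as an argument (all names here are made up):
import multiprocessing
import time

def worker(children_exit):
    # child checks the event regularly and quits when it is set
    while not children_exit.is_set():
        time.sleep(0.1)  # ... do a unit of real work here ...

if __name__ == "__main__":
    children_exit = multiprocessing.Event()
    procs = [multiprocessing.Process(target=worker, args=(children_exit,))
             for _ in range(2)]
    for p in procs:
        p.start()
    time.sleep(1)              # main program does its thing
    children_exit.set()        # tell the children it is time to quit
    for p in procs:
        p.join()
    print(multiprocessing.active_children())  # should now be an empty list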
Update 2:
On your PyEval_SaveThread error: could you modify your question to show the complete trace, or alternatively post it somewhere?
Since multiprocessing uses threads internally, this is probably the culprit, unless you are also using threads somewhere else.
If you also use threads, note that GUI toolkits in general, and tkinter in particular, are not thread-safe; Tkinter calls should therefore only be made from one thread!
How much work would it be to port your code to Python 3? If it is a bug in Python 2.7, it might be already fixed in the current (as of now) Python 3.5.1.
I have about 4000 (1-50MB) files to sort.
I was thinking of having Python call the Linux sort command, and since this might be somewhat I/O bound, I would use the threading library.
So here's what I have, but when I run it and watch the system monitor I don't see 25 sort tasks pop up; it seems to be running one at a time. What am I doing wrong?
...
print "starting sort"

def sort_unique(file_path):
    """Run linux sort -ug on a file"""
    out = commands.getoutput('sort -ug -o "%s" "%s"' % (file_path, file_path))
    assert not out

pool = ThreadPool(25)

for fn in os.listdir(target_dir):
    fp = os.path.join(target_dir, fn)
    pool.add_task(sort_unique, fp)

pool.wait_completion()
Here's where ThreadPool comes from, perhaps that is broken?
You're doing everything correctly.
There is something called the GIL in Python,
the Global Interpreter Lock, which eventually causes Python to execute only one thread at a time.
Choose subprocesses instead :), Python is not multithreaded.
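For instance, one rough way to fan the external sorts out over processes instead of threads (the directory path is a placeholder, and this is a sketch rather than a drop-in replacement for the ThreadPool code above):
import os
import subprocess
from multiprocessing import Pool

target_dir = "/path/to/files"  # placeholder

def sort_unique(file_path):
    # same idea as before: sort the file in place with GNU sort
    subprocess.check_call(["sort", "-ug", "-o", file_path, file_path])

if __name__ == "__main__":
    files = [os.path.join(target_dir, fn) for fn in os.listdir(target_dir)]
    pool = Pool(processes=25)
    pool.map(sort_unique, files)
    pool.close()
    pool.join()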
Actually this does seem to be working. I spoke too soon. I'm not sure if you guys want to delete this or what? Sorry about that.
Normally people do this by spawning multiple processes. The multiprocessing module makes this easy to do.
On the other hand, Python is pretty good at sorting, so why not just read the file into a list of strings with file.readlines() and then sort it in Python? You would have to write a key function to use with list.sort() to get the -g behaviour, and you would also have to remove duplicates, i.e. the -u option. The easiest way (and a fast way) to remove duplicates is to do list(set(UNsortedfile)) before you do the sort.
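A minimal sketch of that pure-Python route, assuming each line starts with a numeric field (the key function and the in-place rewrite are illustrative only):
def sort_unique_py(file_path):
    with open(file_path) as f:
        lines = set(f.readlines())               # -u: drop duplicates
    def numeric_key(line):
        try:
            return (0, float(line.split()[0]))   # -g: general numeric compare
        except (ValueError, IndexError):
            return (1, 0.0)                      # non-numeric lines sort last
    with open(file_path, "w") as f:
        f.writelines(sorted(lines, key=numeric_key))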