How would I unit test the following?
def sigterm_handler(signum, frame):
    pid = os.getpid()  # type: int
    sys.exit(0)

signal.signal(signal.SIGTERM, sigterm_handler)
Should I mock and ensure mock is called?
I would write a test that runs your code in a subprocess and checks whether it terminated successfully.
For example, let's say your question code lives in a module called signals.py. You can write a test wrapper module that looks like this:
test_signals_wrapper.py
from time import sleep
from sys import exit
# import this last to ensure it overrides any prior settings
import signals
while True:
    sleep(1)

exit(1)  # just in case the loop ends for other reasons
Now you can write a unit test that looks like this:
test_signals.py
from subprocess import run, TimeoutExpired
from sys import executable
def test_sigterm_handler():
    try:
        status = run([executable, '-m', 'test_signals_wrapper'], timeout=30)
    except TimeoutExpired:
        assert False, 'Did not trigger assertion in 30 seconds'
    assert status.returncode == 0, f'Wrong return code: {status.returncode}'
This requires a bit of extra infrastructure for your test, but it solves all the problems with testing this code. By running in a subprocess, you can freely execute sys.exit and get the return value. By having a wrapper script, you can control how the code is loaded and run. You don't need to mock anything, just make sure that your packages are set up correctly, and that your test runner doesn't attempt to pick up the wrapper script as a test.
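If you would rather have the test deliver the signal itself instead of relying on anything outside the test, a minimal variant of the same idea (assuming the same signals.py / test_signals_wrapper.py layout, POSIX only) could look like this:

from signal import SIGTERM
from subprocess import Popen
from sys import executable
from time import sleep

def test_sigterm_handler_sends_signal():
    proc = Popen([executable, '-m', 'test_signals_wrapper'])
    sleep(1)  # crude: give the wrapper time to import signals and install the handler
    proc.send_signal(SIGTERM)
    assert proc.wait(timeout=30) == 0, f'Wrong return code: {proc.returncode}'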
The code lines you have shown are not suited to unit testing, but should rather be integration tested. The reason is that your code consists only of interactions with other components (in this case the signal, sys and os modules).
Therefore, the bugs you can expect to encounter lie in the interactions with these other components: are you calling the right functions in the right components, with the right argument values, in the right order, and are the results/reactions what you expect them to be?
None of these questions can be answered by a unit test, which is meant to find bugs in the isolated unit: if you mock the signal, sys and/or os dependencies, you will write your mocks to reflect your (potentially wrong) understanding of these components. The unit tests will therefore succeed, although the code may fail in the integrated system. If your intent is for the code to work on different systems, you might even encounter the situation where the code works in one integration (maybe Linux) but fails in another (maybe Windows).
Therefore, for code like yours, unit testing, and thus mocking for unit testing, does not have much value.
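To make that concrete, here is a sketch of the kind of mock-based unit test the question hints at (again assuming the code lives in signals.py). It only verifies that signal.signal was called with your handler; it passes even if the handler itself is broken, which is exactly the limitation described above:

import importlib
import signal
from unittest import mock

def test_handler_is_registered():
    with mock.patch('signal.signal') as fake_signal:
        import signals
        importlib.reload(signals)  # re-run the module-level registration under the mock
        fake_signal.assert_called_with(signal.SIGTERM, signals.sigterm_handler)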
Monkey patch the handler and send the signal when testing?
import os
import signal
import sys

# your handler
def sigterm_handler(signum, frame):
    print("Handled")
    pid = os.getpid()  # type: int  FIXME: what's this for?
    sys.exit(0)

signal.signal(signal.SIGTERM, sigterm_handler)

# Mock out the existing sigterm_handler
_handled = False

def mocked_sigterm_handler(signum, frame):
    global _handled  # without this, the assignment below would only create a local
    print("Mocked")
    _handled = True

# register the mocked handler
signal.signal(signal.SIGTERM, mocked_sigterm_handler)

# test sending the signal
os.kill(os.getpid(), signal.SIGTERM)
print(f"done ({_handled})")

# reset your handler?
signal.signal(signal.SIGTERM, sigterm_handler)
If you want to test the handler itself, you'll probably have to put some kind of code like this in the handler, which is not beautiful:
if _unittesting_sigterm_handler:
    _handled = True
else:
    sys.exit(0)
and then you can just call the handler directly (or pass the test flag in the call).
_unittesting_sigterm_handler = True
sigterm_handler(0, None)
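As an alternative to a test flag, a hedged sketch (assuming pytest and that the handler lives in signals.py) is to call the handler directly and assert on the SystemExit it raises:

import signal
import pytest
from signals import sigterm_handler

def test_sigterm_handler_exits_cleanly():
    with pytest.raises(SystemExit) as excinfo:
        sigterm_handler(signal.SIGTERM, None)  # the frame argument is unused by the handler
    assert excinfo.value.code == 0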
Related
I've got a Python script that sometimes displays images to the user. The images can, at times, be quite large, and they are reused often. Displaying them is not critical, but displaying the message associated with them is. I've got a function that downloads the image needed and saves it locally. Right now it's run inline with the code that displays a message to the user, but that can sometimes take over 10 seconds for non-local images. Is there a way I could call this function when it's needed, but run it in the background while the code continues to execute? I would just use a default image until the correct one becomes available.
Do something like this:
def function_that_downloads(my_args):
    # do some long download here
    ...
then inline, do something like this:
import threading
def my_inline_function(some_args):
    # do some stuff
    download_thread = threading.Thread(target=function_that_downloads,
                                       name="Downloader",
                                       args=(some_args,))
    download_thread.start()
    # continue doing stuff
You may want to check whether the thread has finished before going on to other things by calling download_thread.is_alive() (spelled isAlive() in Python 2; the camelCase alias was removed in Python 3.9).
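For example, something along these lines (show_default_image and show_downloaded_image are hypothetical helpers standing in for your display code):

download_thread.join(timeout=5)      # wait up to 5 seconds for the download
if download_thread.is_alive():
    show_default_image()             # still downloading: fall back to the placeholder
else:
    show_downloaded_image()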
Typically the way to do this would be to use a thread pool and queue the downloads, which would issue a signal, a.k.a. an event, when each task has finished processing. You can do this within the scope of the threading module Python provides.
To perform said actions, I would use event objects and the Queue module.
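As a rough sketch of that idea using the standard library's concurrent.futures instead of hand-rolled events and queues (download_image, the URL and the paths below are placeholders, not part of the original question):

from concurrent.futures import ThreadPoolExecutor

def download_image(url, path):
    # placeholder for the real download; returns the local path when done
    ...
    return path

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(download_image, 'http://example.com/big.png', 'big.png')
future.add_done_callback(lambda f: print('download finished'))  # the "event" on completion

print('showing the message with a default image')   # the main flow continues immediately
local_path = future.result(timeout=30)               # block later, only once the image is needed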
However, a quick and dirty demonstration of what you can do using a simple threading.Thread implementation can be seen below:
import os
import threading
import time
import urllib2

class ImageDownloader(threading.Thread):
    def __init__(self, function_that_downloads):
        threading.Thread.__init__(self)
        self.runnable = function_that_downloads
        self.daemon = True

    def run(self):
        self.runnable()

def downloads():
    with open('somefile.html', 'w+') as f:
        try:
            f.write(urllib2.urlopen('http://google.com').read())
        except urllib2.HTTPError:
            f.write('sorry no dice')

print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
while not os.path.exists('somefile.html'):
    print 'i am executing but the thread has started to download'
    time.sleep(1)
print 'look ma, thread is not alive: ', thread.is_alive()
It would probably make sense to not poll like I'm doing above. In which case, I would change the code to this:
import os
import threading
import time
import urllib2

class ImageDownloader(threading.Thread):
    def __init__(self, function_that_downloads):
        threading.Thread.__init__(self)
        self.runnable = function_that_downloads

    def run(self):
        self.runnable()

def downloads():
    with open('somefile.html', 'w+') as f:
        try:
            f.write(urllib2.urlopen('http://google.com').read())
        except urllib2.HTTPError:
            f.write('sorry no dice')

print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
# show message
thread.join()
# display image
Notice that there's no daemon flag set here.
I prefer to use gevent for this sort of thing:
import gevent
from gevent import monkey; monkey.patch_all()
greenlet = gevent.spawn(function_to_download_image)
display_message()
# ... perhaps interaction with the user here
# this will wait for the operation to complete (optional)
greenlet.join()
# alternatively if the image display is no longer important, this will abort it:
#greenlet.kill()
Everything runs in one thread, but whenever a kernel operation would block, gevent switches context to other running "greenlets". Worries about locking, etc. are much reduced, as only one thing runs at a time, yet the image will continue to download whenever a blocking operation executes in the "main" context.
Depending on how much, and what kind of, work you want to do in the background, this can be either better or worse than threading-based solutions; it is certainly much more scalable (i.e. you can do many more things in the background), but that might not be a concern in the current situation.
import os
import threading

import keyboard  # third-party "keyboard" package; provides read_key()

def killme():
    if keyboard.read_key() == "q":
        print("Bye ..........")
        os._exit(0)

threading.Thread(target=killme, name="killer").start()
If you want to add more keys, add more defs and repeat the threading.Thread(target=killme, name="killer").start() line for each of them. It looks bad, but it works much better than more complex code.
I have built a tool using Django to automate script execution. The tool works fine, but sometimes the scripts take too long to execute. I want to limit the time for which my tool can execute each script.
I have found two approaches and implemented both, but I am not sure which is the right one to use.
1.) Using the signal module
2.) Using multiprocessing
Here is the sample code for both approaches
1.) Using the signal module
import signal
from contextlib import contextmanager
class TimeoutException(Exception): pass

@contextmanager
def time_limit(seconds):
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")
    signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
try:
    with time_limit(10):
        long_function_call()
except TimeoutException as e:
    print("Timed out!")
2.) Using multiprocessing
from multiprocessing import Process
from time import sleep
def f(time):
    sleep(time)

def run_with_limited_time(func, args, kwargs, time):
    p = Process(target=func, args=args, kwargs=kwargs)
    p.start()
    p.join(time)
    if p.is_alive():
        p.terminate()
        return False
    return True

if __name__ == '__main__':
    print(run_with_limited_time(f, (1.5, ), {}, 2.5))  # True
    print(run_with_limited_time(f, (3.5, ), {}, 2.5))  # False
The problem I am facing with the signal module is that signals only work in the main thread.
I want to know which is the better approach and why. Also, is there any way to alter that behaviour of the signal module?
The signal-based approach comes with several corner cases and limitations: it is not portable, signals can be handled only in the main thread, and if your application is busy in a low-level loop (because it's calling some C API, for example) it will become unresponsive.
I would recommend the multiprocessing-based approach, as it overcomes all the above limitations and has one major benefit: it protects your service from crashes, timeouts and instabilities deriving from the logic you run in your functions.
There are a few libraries built to help with that; pebble and billiard are the ones that come to mind.
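For completeness, here is a sketch of what the pebble route might look like, based on its documented ProcessPool.schedule(..., timeout=...) API; treat the exact signature as an assumption and check pebble's documentation before relying on it:

from concurrent.futures import TimeoutError
from time import sleep

from pebble import ProcessPool

def f(seconds):
    sleep(seconds)
    return seconds

if __name__ == '__main__':
    with ProcessPool() as pool:
        future = pool.schedule(f, args=(3.5,), timeout=2.5)
        try:
            print(future.result())
        except TimeoutError:
            print('Timed out!')  # pebble also terminates the worker process for us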
Code:
# callee.py
import signal
import sys
import time
def int_handler(*args):
    for i in range(10):
        print('INTERRUPT', args)
    sys.exit()

if __name__ == '__main__':
    signal.signal(signal.SIGINT, int_handler)
    signal.signal(signal.SIGTERM, int_handler)
    while 1:
        time.sleep(1)
# caller.py
import subprocess
import sys
def wait_and_communicate(p):
    out, err = p.communicate(timeout=1)
    print('========out==========')
    print(out.decode() if out else '')
    print('========err==========')
    print(err.decode() if err else '')
    print('=====================')

if __name__ == '__main__':
    p = subprocess.Popen(
        ['/usr/local/bin/python3', 'callee.py'],
        stdout=sys.stdout,
        stderr=subprocess.PIPE,
    )
    while 1:
        try:
            wait_and_communicate(p)
        except KeyboardInterrupt:
            p.terminate()
            wait_and_communicate(p)
            break
        except subprocess.TimeoutExpired:
            continue
Simply execute caller.py and then press Ctrl+C; the program will randomly raise RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>. From the documentation I learn that signal handlers are called asynchronously, and in this case two signals, SIGINT (the Ctrl+C action) and SIGTERM (p.terminate()), are sent nearly at the same time, causing a race condition.
However, from this post I learn that the signal module doesn't execute the signal handler inside the low-level (C) handler. Instead, it sets a flag, and the interpreter checks the flag between bytecode instructions and then invokes the Python signal handler. In other words, while signal handlers may mess up the control flow in the main thread, a bytecode instruction is always atomic.
This seems to contradict the result of my example program. As far as I can tell, print and the underlying _io.BufferedWriter are both implemented in pure C, so calling the print function should consume only one bytecode instruction (CALL_FUNCTION). I am confused: within one uninterrupted instruction on one thread, how can a function be reentrant?
I'm using Python 3.6.2.
Signals are processed between opcodes (see eval_frame_handle_pending() in Python's opcode evaluation loop), but not only there. print is a perfect example: it is implemented on top of _io_BufferedWriter_write_impl(), which has a structure like

ENTER_BUFFERED()      => locks the buffer
PyErr_CheckSignals()  => invokes any pending Python signal handler
LEAVE_BUFFERED()      => unlocks the buffer

By calling PyErr_CheckSignals(), it invokes the signal handler, which in this case contains another print. That second print runs ENTER_BUFFERED() again; because the buffer is already locked by the outer print, the reentrant exception is thrown, as the snippet below shows.
// snippet of ENTER_BUFFERED
static int
_enter_buffered_busy(buffered *self)
{
    int relax_locking;
    PyLockStatus st;
    if (self->owner == PyThread_get_thread_ident()) {
        PyErr_Format(PyExc_RuntimeError,
                     "reentrant call inside %R", self);
        return 0;
    }
    /* ... */
}

#define ENTER_BUFFERED(self) \
    ( (PyThread_acquire_lock(self->lock, 0) ? \
       1 : _enter_buffered_busy(self)) \
      && (self->owner = PyThread_get_thread_ident(), 1) )
P.S. On reentrant functions, from Advanced Programming in the UNIX Environment:
The Single UNIX Specification specifies the functions that are guaranteed to be safe to call from within a signal handler. These functions are reentrant and are called async-signal safe. Most of the functions that are not reentrant are so because:
they are known to use static data structures,
they call malloc or free, or
they are part of the standard I/O library. Most implementations of the standard I/O library use global data structures in a nonreentrant way. print in Python belongs to this category.
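A common way to sidestep this whole class of problems in Python code is to keep the handler trivial and do the real work (including any printing) in the main loop. A sketch, with a hypothetical do_one_unit_of_work():

import signal

shutdown_requested = False

def request_shutdown(signum, frame):
    # no print, no buffered I/O here: just record that the signal arrived
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGINT, request_shutdown)
signal.signal(signal.SIGTERM, request_shutdown)

while not shutdown_requested:
    do_one_unit_of_work()        # hypothetical; the loop notices the flag between iterations

print('shutting down cleanly')   # safe here: we are no longer inside a handler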
You might prefer to inhibit delivery of SIGINT to the child so there's no race, perhaps by putting it in a different process group, or by having it ignore the signal. Then only SIGTERM from the parent would matter.
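Two hedged sketches of those options against the caller.py/callee.py above (the process-group variant is POSIX-only):

import signal
import subprocess
import sys

# Option 1, in caller.py: start the child in its own session/process group so the
# terminal's Ctrl+C (which signals the foreground process group) never reaches it.
p = subprocess.Popen(
    ['/usr/local/bin/python3', 'callee.py'],
    stdout=sys.stdout,
    stderr=subprocess.PIPE,
    start_new_session=True,
)

# Option 2, in callee.py: have the child ignore SIGINT and react only to SIGTERM,
# instead of registering int_handler for both signals.
signal.signal(signal.SIGINT, signal.SIG_IGN)
signal.signal(signal.SIGTERM, int_handler)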
To reveal where it was interrupted, use this:

import dis

sig_num, frame = args
print(dis.dis(frame.f_code.co_code))
print(frame.f_lasti)

The bytecode offsets in the left margin of the disassembly can be matched against f_lasti, the offset of the last instruction executed.
Other items of interest include
frame.f_lineno,
frame.f_code.co_filename, and
frame.f_code.co_names.
This issue becomes moot in Python 3.7.3, which no longer exhibits the symptom.
I have a multi-process web server whose processes never end, and I would like to check the code coverage of the whole project in a live environment (not only from tests).
The problem is that, since the processes never end, I don't have a good place to put the cov.start(), cov.stop(), cov.save() hooks.
Therefore, I thought about spawning a thread that, in an infinite loop, saves and combines the coverage data and then sleeps for some time. However, this approach doesn't work; the coverage report seems to be empty, except for the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code, or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os
class CoverageThread(threading.Thread):
    _kill_now = False
    _sleep_time = 2

    @classmethod
    def exit_gracefully(cls):
        cls._kill_now = True

    def sleep_some_time(self):
        time.sleep(CoverageThread._sleep_time)

    def run(self):
        while True:
            cov.start()
            self.sleep_some_time()
            cov.stop()
            if os.path.exists('.coverage'):
                cov.combine()
            cov.save()
            if self._kill_now:
                break
        cov.stop()
        if os.path.exists('.coverage'):
            cov.combine()
        cov.save()
        cov.html_report(directory="coverage_report_data.html")
        print("End of the program. I was killed gracefully :)")
Apparently, it is not possible to control coverage very well with multiple threads.
Once different threads are started, stopping the Coverage object stops all coverage, and start only restarts it in the "starting" thread. So your code basically stops the coverage after 2 seconds for every thread other than the CoverageThread.
I played a bit with the API, and it is possible to access the measurements without stopping the Coverage object. So you could launch a thread that saves the coverage data periodically, using the API.
A first implementation would be something like this:
import os
import threading
from time import sleep

from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file

cov = Coverage(config_file=True)
cov.start()

def get_data_dict(d):
    """Return a dict like d, but with keys modified by `abs_file` and
    remove the copied elements from d.
    """
    res = {}
    keys = list(d.keys())
    for k in keys:
        a = {}
        lines = list(d[k].keys())
        for l in lines:
            v = d[k].pop(l)
            a[l] = v
        res[abs_file(k)] = a
    return res

class CoverageLoggerThread(threading.Thread):
    _kill_now = False
    _delay = 2

    def __init__(self, main=True):
        self.main = main
        self._data = CoverageData()
        self._fname = cov.config.data_file
        self._suffix = None
        self._data_files = CoverageDataFiles(basename=self._fname,
                                             warn=cov._warn)
        self._pid = os.getpid()
        super(CoverageLoggerThread, self).__init__()

    def shutdown(self):
        self._kill_now = True

    def combine(self):
        aliases = None
        if cov.config.paths:
            from coverage.aliases import PathAliases
            aliases = PathAliases()
            for paths in cov.config.paths.values():
                result = paths[0]
                for pattern in paths[1:]:
                    aliases.add(pattern, result)
        self._data_files.combine_parallel_data(self._data, aliases=aliases)

    def export(self, new=True):
        cov_report = cov
        if new:
            cov_report = Coverage(config_file=True)
            cov_report.load()
        self.combine()
        self._data_files.write(self._data)
        cov_report.data.update(self._data)
        cov_report.html_report(directory="coverage_report_data.html")
        cov_report.report(show_missing=True)

    def _collect_and_export(self):
        new_data = get_data_dict(cov.collector.data)
        if cov.collector.branch:
            self._data.add_arcs(new_data)
        else:
            self._data.add_lines(new_data)
        self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
        self._data_files.write(self._data, self._suffix)
        if self.main:
            self.export()

    def run(self):
        while True:
            sleep(CoverageLoggerThread._delay)
            if self._kill_now:
                break
            self._collect_and_export()

        cov.stop()
        if not self.main:
            self._collect_and_export()
            return

        self.export(new=False)
        print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grabs the info collected by the collector without stopping it. The get_data_dict function takes the dictionary in Coverage.collector and pops the available data. This should be safe enough that you don't lose any measurement.
The report files get updated every _delay seconds.
But if you have multiple processes running, you need extra effort to make sure all the processes run the CoverageLoggerThread. This is the patch_multiprocessing function, monkey-patched from the coverage monkey patch...
The code is in the GIST. It basically replaces the original Process with a custom process, which starts the CoverageLoggerThread just before running the run method and joins the thread at the end of the process. The script main.py lets you launch different tests with threads and processes.
There are two or three drawbacks to this code that you need to be careful of:
It is a bad idea to use the combine function concurrently, as it performs concurrent read/write/delete access to the .coverage.* files. This means that the export function is not super safe. It should be alright, as the data is replicated multiple times, but I would do some testing before using it in production.
Once the data has been exported, it stays in memory. So if the code base is huge, it could eat some resources. It is possible to dump all the data and reload it, but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload the previous state of the coverage for this process (see the sketch after this list).
The custom process will wait for _delay before finishing, as we join the CoverageThreadLogger at the end of the process, so if you have a lot of quick processes, you want to increase the granularity of the sleep to be able to detect the end of the process more quickly. It just needs a custom sleep loop that breaks on _kill_now.
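The sketch referred to above, for the "reload instead of keeping in memory" variant, using the same coverage 4.x CoverageData API as the code above (fname would be the per-process data file):

import os
from coverage.data import CoverageData

def fresh_data(fname):
    data = CoverageData()
    if os.path.exists(fname):
        data.read_file(fname)   # reload the previous coverage state for this process
    return data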
Let me know if this helps you in some way, or if it is possible to improve this gist.
EDIT:
It seems you do not need to monkey-patch the multiprocessing module to start a logger automatically. Using a .pth file in your Python install, you can use an environment variable to start your logger automatically in new processes:
# Content of coverage.pth in your site-packages folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
    import atexit
    from coverage_logger import CoverageLoggerThread
    thread_cov = CoverageLoggerThread(main=False)
    thread_cov.start()
    def close_cov():
        thread_cov.shutdown()
        thread_cov.join()
    atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.py
Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.
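For instance, one hedged way to give a live process "a way to end" without touching normal operation (POSIX-only; the choice of SIGUSR1 is arbitrary):

import signal
import sys

import coverage

cov = coverage.Coverage()
cov.start()

def finish_coverage(signum, frame):
    cov.stop()
    cov.save()          # flush the data so a report can be produced afterwards
    sys.exit(0)

signal.signal(signal.SIGUSR1, finish_coverage)  # kill -USR1 <pid> ends the process cleanly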
You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
Another way to go would be to trace the program using lptrace; even though it only prints calls, it can be useful.
We were hit by this bug:
http://bugs.python.org/issue1856 (daemon threads segfault during interpreter shutdown)
Now I am searching for a way to code around this bug.
At the moment the code looks like this:
while True:
    do_something()
    time.sleep(interval)
Is there a way to check whether the interpreter is still usable before do_something()?
Or is it better not to call mythread.setDaemon(True) and instead check whether the main thread has exited?
Answer to my own question:
I use this pattern now: don't setDaemon(True), don't use sleep(); use parent_thread.join() instead:
while True:
    parent_thread.join(interval)
    if not parent_thread.is_alive():
        break
    do_something()
Related: http://docs.python.org/2/library/threading.html#threading.Thread.join
This is code from the threading.py module:
import sys as _sys

class Thread(_Verbose):
    def _bootstrap_inner(self):
        # some code

        # If sys.stderr is no more (most likely from interpreter
        # shutdown) use self._stderr. Otherwise still use sys (as in
        # _sys) in case sys.stderr was redefined since the creation of
        # self.
        if _sys:
            _sys.stderr.write("Exception in thread %s:\n%s\n" %
                              (self.name, _format_exc()))
        else:
            # some code
might be helpful. The error you see comes from the else branch. So in your case:
import sys as _sys

while True:
    if not _sys:
        break  # or return, or die, or whatever
    do_something()
    time.sleep(interval)
I'm not sure whether it works, though (note that interpreter shutdown may happen inside do_something, so you should probably wrap everything in try/except).
Daemon threads are not necessarily bad; they can definitely speed up the development process. You just have to be careful with them.