python: running coverage on a never-ending process

I have a multi-process web server whose processes never end, and I would like to check my code coverage on the whole project in a live environment (not only from tests).
The problem is that, since the processes never end, I don't have a good place to put the cov.start(), cov.stop(), and cov.save() hooks.
Therefore, I thought about spawning a thread that, in an infinite loop, saves and combines the coverage data and then sleeps for some time. However, this approach doesn't work: the coverage report is empty, except for the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code,
or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os

class CoverageThread(threading.Thread):
    _kill_now = False
    _sleep_time = 2

    @classmethod
    def exit_gracefully(cls):
        cls._kill_now = True

    def sleep_some_time(self):
        time.sleep(CoverageThread._sleep_time)

    def run(self):
        while True:
            cov.start()
            self.sleep_some_time()
            cov.stop()
            if os.path.exists('.coverage'):
                cov.combine()
            cov.save()
            if self._kill_now:
                break
        cov.stop()
        if os.path.exists('.coverage'):
            cov.combine()
        cov.save()
        cov.html_report(directory="coverage_report_data.html")
        print("End of the program. I was killed gracefully :)")

Apparently, it is not possible to control coverage very well with multiple threads.
Once different threads are started, stopping the Coverage object stops all coverage, and start only restarts it in the "starting" thread.
So your code basically stops the coverage after 2 seconds for every thread other than the CoverageThread.
I played a bit with the API, and it is possible to access the measurements without stopping the Coverage object.
So you could launch a thread that saves the coverage data periodically, using the API.
A first implementation looks like this:
import os
import threading
from time import sleep

from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file

cov = Coverage(config_file=True)
cov.start()


def get_data_dict(d):
    """Return a dict like d, but with keys modified by `abs_file` and
    remove the copied elements from d.
    """
    res = {}
    keys = list(d.keys())
    for k in keys:
        a = {}
        lines = list(d[k].keys())
        for l in lines:
            v = d[k].pop(l)
            a[l] = v
        res[abs_file(k)] = a
    return res


class CoverageLoggerThread(threading.Thread):
    _kill_now = False
    _delay = 2

    def __init__(self, main=True):
        self.main = main
        self._data = CoverageData()
        self._fname = cov.config.data_file
        self._suffix = None
        self._data_files = CoverageDataFiles(basename=self._fname,
                                             warn=cov._warn)
        self._pid = os.getpid()
        super(CoverageLoggerThread, self).__init__()

    def shutdown(self):
        self._kill_now = True

    def combine(self):
        aliases = None
        if cov.config.paths:
            from coverage.aliases import PathAliases
            aliases = PathAliases()
            for paths in cov.config.paths.values():
                result = paths[0]
                for pattern in paths[1:]:
                    aliases.add(pattern, result)
        self._data_files.combine_parallel_data(self._data, aliases=aliases)

    def export(self, new=True):
        cov_report = cov
        if new:
            cov_report = Coverage(config_file=True)
            cov_report.load()
        self.combine()
        self._data_files.write(self._data)
        cov_report.data.update(self._data)
        cov_report.html_report(directory="coverage_report_data.html")
        cov_report.report(show_missing=True)

    def _collect_and_export(self):
        # Grab the data collected so far without stopping the collector.
        new_data = get_data_dict(cov.collector.data)
        if cov.collector.branch:
            self._data.add_arcs(new_data)
        else:
            self._data.add_lines(new_data)
        self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
        self._data_files.write(self._data, self._suffix)

        if self.main:
            self.export()

    def run(self):
        while True:
            sleep(CoverageLoggerThread._delay)
            if self._kill_now:
                break
            self._collect_and_export()

        cov.stop()

        if not self.main:
            self._collect_and_export()
            return

        self.export(new=False)
        print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grabs the info collected by the collector without stopping it.
The get_data_dict function takes the dictionary in Coverage.collector and pops the available data. This should be safe enough so you don't lose any measurement.
The report files get updated every _delay seconds.
But if you have multiple processes running, you need to add some extra effort to make sure all the processes run the CoverageLoggerThread. This is the patch_multiprocessing function, monkey-patched from the coverage monkey patch.
The code is in the GIST. It basically replaces the original Process with a custom process, which starts the CoverageLoggerThread just before running the run method and joins the thread at the end of the process. A rough sketch of the idea is shown below.
The script main.py permits launching different tests with threads and processes.
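For reference, here is a minimal sketch of what that patch does (an illustration only; the module name coverage_logger is an assumption, and the real, more robust implementation lives in the GIST):
import multiprocessing

_OriginalProcess = multiprocessing.Process

class _CoverageProcess(_OriginalProcess):
    """Process that runs a CoverageLoggerThread next to the target code."""
    def run(self):
        from coverage_logger import CoverageLoggerThread  # assumed module name
        cov_thread = CoverageLoggerThread(main=False)
        cov_thread.start()
        try:
            super(_CoverageProcess, self).run()
        finally:
            cov_thread.shutdown()  # ask the logger to flush its data and exit
            cov_thread.join()

def patch_multiprocessing():
    """Make new multiprocessing.Process instances use the coverage-aware subclass."""
    multiprocessing.Process = _CoverageProcess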
There are two or three drawbacks to this code that you need to be careful of:
It is a bad idea to use the combine function concurrently, as it performs concurrent read/write/delete access to the .coverage.* files. This means that the export function is not entirely safe. It should be all right, as the data is replicated multiple times, but I would do some testing before using it in production.
Once the data has been exported, it stays in memory. So if the code base is huge, it could eat some resources. It is possible to dump all the data and reload it, but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload the previous state of the coverage for this process.
The custom process will wait for _delay before finishing, as we join the CoverageLoggerThread at the end of the process. So if you have a lot of quick processes, you want to increase the granularity of the sleep to be able to detect the end of the process more quickly. It just needs a custom sleep loop that breaks on _kill_now, as in the sketch below.
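A minimal sketch of such a sleep loop (my own suggestion, not part of the GIST):
def _interruptible_sleep(self, total, step=0.1):
    """Sleep for `total` seconds, waking every `step` seconds to check _kill_now."""
    slept = 0.0
    while slept < total and not self._kill_now:
        sleep(step)
        slept += step
run() would then call self._interruptible_sleep(CoverageLoggerThread._delay) instead of sleep(CoverageLoggerThread._delay).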
Let me know if this helps you in some way, or if it is possible to improve this GIST.
EDIT:
It seems you do not need to monkey-patch the multiprocessing module to start a logger automatically. Using a .pth file in your Python install, you can use an environment variable to start your logger automatically on new processes:
# Content of coverage.pth in your site-packages folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
    import atexit
    from coverage_logger import CoverageLoggerThread
    thread_cov = CoverageLoggerThread(main=False)
    thread_cov.start()
    def close_cov():
        thread_cov.shutdown()
        thread_cov.join()
    atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.py

Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.
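For example, a minimal sketch of that idea (my own illustration, not the original poster's code): install a signal handler that stops coverage, saves, and exits, so a live worker can be told to end whenever you want a report.
import signal
import sys
import coverage

cov = coverage.Coverage()
cov.start()

def _stop_and_save(signum, frame):
    cov.stop()
    cov.save()
    cov.html_report(directory="coverage_report_data.html")
    sys.exit(0)

signal.signal(signal.SIGTERM, _stop_and_save)  # `kill <pid>` now ends the worker and writes the report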

You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one:
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
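With pyrasite installed, you would inject each payload into the running process by PID, e.g. pyrasite <pid> start.py to begin collecting and later pyrasite <pid> stop.py to write the report (this assumes pyrasite's usual pyrasite <pid> payload.py invocation; check the docs for your version).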
Another way to go would be to trace the program using lptrace; even though it only prints calls, it can be useful.

Related

can I run a piece of code continuously and the main code at the same time in python? [duplicate]

I've got a Python script that sometimes displays images to the user. The images can, at times, be quite large, and they are reused often. Displaying them is not critical, but displaying the message associated with them is. I've got a function that downloads the image needed and saves it locally. Right now it's run inline with the code that displays a message to the user, but that can sometimes take over 10 seconds for non-local images. Is there a way I could call this function when it's needed, but run it in the background while the code continues to execute? I would just use a default image until the correct one becomes available.
Do something like this:
def function_that_downloads(my_args):
    # do some long download here
    pass
then inline, do something like this:
import threading

def my_inline_function(some_args):
    # do some stuff
    download_thread = threading.Thread(target=function_that_downloads, name="Downloader", args=some_args)
    download_thread.start()
    # continue doing stuff
You may want to check whether the thread has finished before going on to other things by calling download_thread.is_alive()
Typically the way to do this would be to use a thread pool and queue the downloads, issuing a signal (a.k.a. an event) when each task has finished processing. You can do this within the scope of the threading module Python provides.
To perform said actions, I would use event objects and the Queue module.
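A minimal sketch of that event/queue approach (my own illustration in Python 3; the URL, file name, and display steps are placeholders, not from the original answer):
import threading
import queue

done = threading.Event()
jobs = queue.Queue()

def downloader():
    while True:
        url, path = jobs.get()   # block until a download job arrives
        # ... download url to path here ...
        jobs.task_done()
        done.set()               # signal that an image is ready

threading.Thread(target=downloader, daemon=True).start()
jobs.put(("http://example.com/img.png", "img.png"))

# Main code: show the message (and a default image) right away,
# then swap in the real image if it arrived within 10 seconds.
if done.wait(timeout=10):
    pass  # e.g. display "img.png" here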
However, a quick and dirty demonstration of what you can do using a simple threading.Thread implementation can be seen below:
import os
import threading
import time
import urllib2

class ImageDownloader(threading.Thread):
    def __init__(self, function_that_downloads):
        threading.Thread.__init__(self)
        self.runnable = function_that_downloads
        self.daemon = True

    def run(self):
        self.runnable()

def downloads():
    with open('somefile.html', 'w+') as f:
        try:
            f.write(urllib2.urlopen('http://google.com').read())
        except urllib2.HTTPError:
            f.write('sorry no dice')

print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
while not os.path.exists('somefile.html'):
    print 'i am executing but the thread has started to download'
    time.sleep(1)
print 'look ma, thread is not alive: ', thread.is_alive()
It would probably make sense to not poll like I'm doing above. In which case, I would change the code to this:
import os
import threading
import time
import urllib2

class ImageDownloader(threading.Thread):
    def __init__(self, function_that_downloads):
        threading.Thread.__init__(self)
        self.runnable = function_that_downloads

    def run(self):
        self.runnable()

def downloads():
    with open('somefile.html', 'w+') as f:
        try:
            f.write(urllib2.urlopen('http://google.com').read())
        except urllib2.HTTPError:
            f.write('sorry no dice')

print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
# show message
thread.join()
# display image
Notice that there's no daemon flag set here.
I prefer to use gevent for this sort of thing:
import gevent
from gevent import monkey; monkey.patch_all()
greenlet = gevent.spawn( function_to_download_image )
display_message()
# ... perhaps interaction with the user here
# this will wait for the operation to complete (optional)
greenlet.join()
# alternatively if the image display is no longer important, this will abort it:
#greenlet.kill()
Everything runs in one thread, but whenever a kernel operation blocks, gevent switches contexts when there are other "greenlets" running. Worries about locking, etc. are much reduced, as there is only one thing running at a time, yet the image will continue to download whenever a blocking operation executes in the "main" context.
Depending on how much, and what kind of thing, you want to do in the background, this can be either better or worse than threading-based solutions; certainly, it is much more scalable (i.e. you can do many more things in the background), but that might not be of concern in the current situation.
import threading
import os
import keyboard  # third-party 'keyboard' package

def killme():
    if keyboard.read_key() == "q":  # blocks until a key is pressed
        print("Bye ..........")
        os._exit(0)

threading.Thread(target=killme, name="killer").start()
If you want to handle more keys, add more such functions and corresponding threading.Thread(target=killme, name="killer").start() lines. It looks crude, but it works much better than more complex code.

Cannot kill a loading animation when using multiprocessing

I'm trying to use multiprocessing to run multiple scripts. At the start, I launch a loading animation, however I am unable to ever kill it. Below is an example...
Animation: foo.py
import sys
import time
import itertools

# Simple loading animation that runs infinitely.
for c in itertools.cycle(['|', '/', '-', '\\']):
    sys.stdout.write('\r' + c)
    sys.stdout.flush()
    time.sleep(0.1)
Useful script: bar.py
from time import sleep
# Stand-in for a script that does something useful.
sleep(5)
Attempt to run them both:
import multiprocessing
from multiprocessing import Process
import subprocess

pjt_dir = "/home/solebay/path/to/project"  # Setup paths..
foo_path = pjt_dir + "/foo.py"             # ..
bar_path = pjt_dir + "/bar.py"             # ..

def run_script(path):                      # Simple function that..
    """Launches python scripts."""         # ..allows me to set a..
    subprocess.run(["python", path])       # ..script as a process.

foo_p = Process(target=run_script, args=(foo_path,))  # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,))  # ..
foo_p.start()  # start loading animation
bar_p.start()  # start 'useful' script
bar_p.join()   # Wait for useful script to finish executing
foo_p.kill()   # Kill loading animation
I get no error messages, and (my_venv) solebay@computer:~$ comes up in my terminal, but the loading animation persists (clipping over my name and environment). How can I kill it?
I've run into a similar situation before, where I couldn't terminate the program using Ctrl+C. The issue is (more or less) solved by using daemonic processes/threads (see the multiprocessing docs). To do this, you simply change
foo_p = Process(target=run_script, args=(foo_path,))
to
foo_p = Process(target=run_script, args=(foo_path,), daemon=True)
and similarly for any other child processes that you would like to create.
That being said, I am not exactly sure whether this is the correct way to remedy the issue of not being able to terminate the multiprocessing program, or whether it is just an artifact that happens to help. I would suggest this thread, which goes into the discussion of daemon threads in more detail. But essentially, from my understanding, daemon processes and threads are terminated automatically whenever their parent process is terminated, regardless of whether they are finished or not. Meanwhile, if a process is not daemonic, then you somehow need to wait for the child processes to finish before you are able to fully terminate the program.
You are creating too many processes. These two lines:
foo_p = Process(target=run_script, args=(foo_path,)) # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,)) # ..
create two new processes. Let's call them "A" and "B". Each process consists of this function:
def run_script(path):                 # Simple function that..
    """Launches python scripts."""    # ..allows me to set a..
    subprocess.run(["python", path])  # ..script as a process.
which then creates another subprocess. Let's call those two processes "C" and "D". In all you have created 4 extra processes, instead of just the 2 that you need. It is actually process "C" that's producing the output on the terminal. This line:
bar_p.join()
waits for "B" to terminate, which implies that "D" has terminated. But this line:
foo_p.kill()
kills process "A" but orphans process "C". So the output to the terminal continues forever.
This is well documented - see the description of multiprocessing.terminate, which says:
"Note that descendant processes of the process will not be terminated – they will simply become orphaned."
The following program works as you intended, exiting gracefully from the second process after the first one has finished. (I renamed "foo.py" to useless.py and "bar.py" to useful.py, and made small changes so I could run it on my computer.)
import subprocess
import os

def run_script(name):
    s = os.path.join(r"c:\pyproj310\so", name)
    return subprocess.Popen(["py", s])

if __name__ == "__main__":
    useless_p = run_script("useless.py")
    useful_p = run_script("useful.py")
    useful_p.wait()    # Wait for useful script to finish executing
    useless_p.kill()   # Kill loading animation
You can't use subprocess.run() to launch the new processes, since that function blocks the main script until the process completes, so I used Popen instead. I also placed the running code under an if __name__ == "__main__" guard, which is good practice (and maybe necessary on Windows).

Reducing cpu usage in python multiprocessing without sacrificing responsiveness

I have a multiprocessing program in Python which spawns several sub-processes and manages them (restarting them if the children identify problems, etc.). Each subprocess is unique and its setup depends on a configuration file. The general structure of the master program is:
import multiprocessing

def main():
    messageQueue = multiprocessing.Queue()
    errorQueue = multiprocessing.Queue()
    childProcesses = {}
    for required_children in configuration:
        childProcesses[required_children] = MultiprocessChild(errorQueue, messageQueue, *args, **kwargs)
    for child_process in childProcesses:
        childProcesses[child_process].start()
    while True:
        # This is to check if the configuration file for processes has changed, e.g. check every 5 minutes.
        if local_uptime > configuration_check_timer:
            reload_configuration()
            killChildProcessIfConfigurationChanged()
            relaunchChildProcessIfConfigurationChanged()
        # We want to relaunch error processes immediately (so: while statement).
        # Errors are not always crashes. Sometimes other system parameters change that require a relaunch with different, ChildProcess-specific configurations.
        while not errorQueue.empty():
            _error_, _childprocess_ = errorQueue.get()
            killChildProcess(_childprocess_)
            relaunchChildProcess(_childprocess_)
            print(_error_)
        # Messages are allowed to lag if a configuration_timer is going to trigger or errorQueue gets something (so: if statement).
        if not messageQueue.empty():
            print(messageQueue.get())
Is there a way to prevent the contents of the infinite while True loop from taking up 100% CPU? If I add a sleep at the end of the loop (e.g. sleep for 10 s), then errors will take 10 s to correct, and messages will take 10 s to flush.
If, on the other hand, there were a way to time.sleep() for the duration of the configuration_check_timer while still running code when messageQueue or errorQueue get something inside them, that would be nice.
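One way to express the behaviour described above (a sketch of my own, not from the original post; the helper names are placeholders): block on the error queue with a timeout instead of spinning, so errors are handled immediately while the loop still wakes up periodically for the other work.
import queue  # multiprocessing.Queue.get raises queue.Empty on timeout

while True:
    try:
        _error_, _childprocess_ = errorQueue.get(timeout=configuration_check_timer)
        killChildProcess(_childprocess_)
        relaunchChildProcess(_childprocess_)
        print(_error_)
    except queue.Empty:
        pass  # no error within the timeout: fall through to the periodic work
    reload_configuration_if_needed()  # placeholder for the configuration check
    drain_message_queue()             # placeholder for flushing messageQueue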

Python 'print' in a c++ based threading model

I am designing a Python app that calls a C++ DLL; I have posted the interaction between my DLL and Python 3.4 here. But now I need to do some streaming processing involving a thread-based model, and my callback function seems to queue up all the prints: only when the streaming has ended is all the info printed.
from ctypes import cast, c_char_p  # assumed: these ctypes helpers are imported in the full script

def callbackU(OutList, ConList, nB):
    for i in range(nB):
        out_list_item = cast(OutList[i], c_char_p).value
        print("{}\t{}".format(ConList[i], out_list_item))
    return 0
I have tried the following approaches, but they all seem to behave in the same way:
from threading import Lock

print_lock = Lock()

def save_print(*args, **kwargs):
    with print_lock:
        print(*args, **kwargs)

def callbackU(OutList, ConList, nB):
    for i in range(nB):
        out_list_item = cast(OutList[i], c_char_p).value
        save_print(out_list_item)
    return 0
and:
import sys

def callbackU(OutList, ConList, nB):
    for i in range(nB):
        a = cast(OutList[i], c_char_p).value
        sys.stdout.write(a)
        sys.stdout.flush()
    return 0
I would like my callback to print its message when it is called, not when the whole process ends.
I found what the problem was: I am using a thread-based process that needs to stay alive for an indefinite time before being ended. In C++ I was using getchar() to wait until the process had to be ended; when I pressed the Enter key, the process jumped to the releasing part. I also tried using 0.5-second sleep()s in a while loop until a definite time had passed, to test whether that could help, but it didn't. Both methods behaved the same way in my Python application: the values that I needed to receive as a stream were put in a queue first, and only once the process ended were those values printed.
The solution was to make two functions: the first one initializes the thread-based model, and the second one ends the process. By doing so I didn't need a getchar() or a sleep(). This works pretty well for me. Thanks for your attention!
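For illustration only, a rough sketch of that two-function layout on the Python side (the DLL name, exported function names, and callback signature are placeholders, not the poster's actual API):
from ctypes import CDLL, CFUNCTYPE, c_int

lib = CDLL("./streaming.dll")           # placeholder DLL name

CALLBACK = CFUNCTYPE(c_int, c_int)      # simplified placeholder signature

@CALLBACK
def on_data(value):
    print("got", value)                 # prints as soon as the DLL calls back
    return 0

lib.start_streaming(on_data)            # first function: starts the thread-based model
# ... the stream runs and callbacks print in real time ...
lib.stop_streaming()                    # second function: releases the C++ side, no getchar() needed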

