Python Code Coverage and Multiprocessing

I use coveralls in combination with coverage.py to track python code coverage of my testing scripts. I use the following commands:
coverage run --parallel-mode --source=mysource --omit=*/stuff/idont/need.py ./mysource/tests/run_all_tests.py
coverage combine
coveralls --verbose
This works quite nicely with the exception of multiprocessing. Code executed by worker pools or child processes is not tracked.
Is there a possibility to also track multiprocessing code? Any particular option I am missing? Maybe adding wrappers to the multiprocessing library to start coverage every time a new process is spawned?
EDIT:
I (and jonrsharpe, also :-) found a monkey-patch for multiprocessing.
However, this does not work for me: my Travis CI build is killed almost immediately after it starts. I checked the problem on my local machine, and apparently adding the patch to multiprocessing blows up my memory usage. Tests that normally take much less than 1 GB of memory need more than 16 GB with this fix.
EDIT2:
The monkey-patch does work after a small modification: removing the config-file parsing (config_file=os.environ['COVERAGE_PROCESS_START']) did the trick and solved the bloated-memory issue. The corresponding line simply becomes:
cov = coverage(data_suffix=True)
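For reference, a minimal sketch of what the modified monkey-patch might look like; the wrapped attribute (Process._bootstrap) and the overall structure are assumptions based on the commonly circulated version of the patch, so adapt it to the variant you are using (newer coverage releases spell the class coverage.Coverage):

import multiprocessing
import coverage

_original_bootstrap = multiprocessing.Process._bootstrap

def _patched_bootstrap(self, *args, **kwargs):
    # No config_file=os.environ['COVERAGE_PROCESS_START'] here; re-parsing the
    # config in every child is what caused the memory blow-up described above.
    cov = coverage.coverage(data_suffix=True)
    cov.start()
    try:
        return _original_bootstrap(self, *args, **kwargs)
    finally:
        cov.stop()
        cov.save()

multiprocessing.Process._bootstrap = _patched_bootstrap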

Coverage 4.0 includes a command-line option --concurrency=multiprocessing to deal with this. You must use coverage combine afterward. For instance, if your tests are in regression_tests.py, then you would simply do this at the command line:
coverage run --concurrency=multiprocessing regression_tests.py
coverage combine
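To make the effect concrete, here is a minimal, hypothetical regression_tests.py whose child-process lines only show up in the combined report when the flag is used (the function and file names are placeholders):

import multiprocessing

def child():
    # Without --concurrency=multiprocessing, this line is missing from the report.
    print("running in the child process")

if __name__ == "__main__":
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()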

I spent some time trying to make coverage work with multiprocessing.Pool, but it never worked.
I finally made a fix that makes it work; I'd be happy if someone pointed out anything I am doing wrong:
https://gist.github.com/andreycizov/ee59806a3ac6955c127e511c5e84d2b6

One possible cause of missing coverage data from forked processes, even with --concurrency=multiprocessing, is the way multiprocessing.Pool is shut down. For example, the with statement leads to a terminate() call (see Pool.__exit__). As a consequence, pool workers have no time to save their coverage data. I had to use a close(), timed join() (in a thread), terminate() sequence instead of with to get the coverage results saved.
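A minimal sketch of that shutdown sequence, assuming a plain map() workload; the thread-based join() timeout mirrors what is described above, and the numbers are arbitrary:

import multiprocessing
import threading

def work(x):
    return x * x

def shutdown(pool, timeout=30):
    pool.close()                               # let workers finish and save their coverage data
    joiner = threading.Thread(target=pool.join)
    joiner.start()
    joiner.join(timeout)                       # bound the wait in case a worker hangs
    pool.terminate()                           # fallback; by now the workers have normally exited

if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    try:
        print(sum(pool.map(work, range(100))))
    finally:
        shutdown(pool)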

Related

subprocess.run() completes after first step of nextflow pipeline

I wrote a Nextflow workflow that performs five steps. It is meant to be used by me and my colleagues, but not everyone is skilled with Nextflow, so I decided to write a small wrapper in Python 3 that can run it for them.
The wrapper is very simple: it reads the options with argparse and builds a command that is run with subprocess.run(). The issue with the wrapper is that, once the first step of the pipeline completes, subprocess.run() thinks the process is over.
I tried using shell=True, and I tried using subprocess.Popen() with a while loop waiting for an output file, but neither solves it.
How can I either tell subprocess.run() to wait until the very end, or tell nextflow run not to emit an exit code until the last step? Is there a way, or am I better off giving my colleagues a Nextflow tutorial instead?
EDIT:
The reason I prefer the wrapper is that Nextflow creates lots of temporary files which one has to know how to clean up. The wrapper does it for them, saving disk space and my time.
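For context, a stripped-down, hypothetical version of such a wrapper (the option names and the workflow path are placeholders, not the actual script):

import argparse
import subprocess

def main():
    parser = argparse.ArgumentParser(description="Run the Nextflow workflow")
    parser.add_argument("--input", required=True)
    parser.add_argument("--outdir", default="results")
    args = parser.parse_args()

    cmd = ["nextflow", "run", "main.nf",
           "--input", args.input, "--outdir", args.outdir]
    # subprocess.run() blocks until the nextflow process itself exits, so an early
    # return means nextflow exited early, not that the wrapper stopped waiting.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    main()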
The first part of your question is a bit tricky to answer without the details, but we know subprocess.run() should wait for the command specified to complete. If your nextflow command is actually exiting before all of your tasks/steps have completed, then there could be a problem with the workflow or with the version of Nextflow itself. Since this occurs after the first process completes, I would suspect the former. My guess is that there might be some plumbing issue somewhere. For example, if your second task/step definition is conditional in any way then this could allow an early exit from your workflow.
I would avoid the wrapper here. Running Nextflow pipelines should be easy, and the documentation that accompanies your workflow should be sufficient to get it up and running quickly. If you need to set multiple params on the command line, you could include one or more configuration profiles to make it easy for your colleagues to get started running it. The section on pipeline sharing is also worth reading if you haven't seen it already.
If the workflow does create lots of temporary files, just ensure these are all written to the working directory. Then, upon successful completion, all you need to clean up should be a simple rm -rf ./work. I tend to avoid automating destructive commands like this to avoid accidental deletes. A line in your workflow's documentation saying that the working directory can be removed (following successful completion of the pipeline) should be sufficient in my opinion; just leave it up to the users to clean up after themselves.
EDIT: You may also be interested in this project: https://github.com/goodwright/nextflow.py

Starting process in Google Colab with Prefix "!" vs. "subprocess.Popen(..)"

I've been using Google Colab for a few weeks now and I've been wondering what the difference is between the two following commands (for example):
!ffmpeg ...
subprocess.Popen(['ffmpeg', ...
I was wondering because I ran into some issues when I started either of the commands above and then tried to stop execution midway. Both of them cancel on KeyboardInterrupt, but I noticed that afterwards the runtime needs a factory reset because it somehow gets stuck. Checking ps aux in the Linux console listed a process [ffmpeg] <defunct> which apparently was still running, or at least blocking some resources.
I then did some research and came across some similar posts asking how to terminate a subprocess correctly (1, 2, 3). Based on those posts I came to the conclusion that the subprocess.Popen(..) variant obviously provides more flexibility when it comes to handling the subprocess: defining different stdout handling or reacting to different return codes, etc. But I'm still unsure what the first command above, using the ! prefix, actually does under the hood.
Using the first command is much easier and requires far less code to start the process. And assuming I don't need a lot of logic to handle the process flow, it would be a nice way to execute something like ffmpeg, if only I were able to terminate it as expected. Even following the answers from the other posts, using the second command never got me to a point where I could fully terminate the process once started (even when using shell=False, process.kill(), process.wait(), etc.). This got me frustrated, because restarting and re-initializing the Colab instance itself can take several minutes every time.
So, finally, I'd like to understand in more general terms what the difference is and was hoping that someone could enlighten me. Thanks!
! commands are executed by the notebook (or more specifically by the IPython interpreter) and are not valid Python commands. If the code you are writing needs to work outside of the notebook environment, you cannot use ! commands.
As you correctly note, you are unable to interact with the subprocess you launch via !, so it's also less flexible than an explicit subprocess call, though similar in this regard to subprocess.call.
As the documentation mentions, you should generally avoid bare subprocess.Popen unless you specifically need the detailed flexibility it offers, at the price of having to duplicate the higher-level functionality which subprocess.run et al. already implement. The code to run a command and wait for it to finish is simply
subprocess.check_call(['ffmpeg', ... ])
with variations for capturing its output (check_output) and the more modern run which can easily replace all three of the legacy high-level calls, albeit with some added verbosity.
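To tie this back to the <defunct> process from the question: whichever API you use, a killed child still has to be reaped by waiting on it. A hedged sketch along those lines, with placeholder ffmpeg arguments:

import subprocess

proc = subprocess.Popen(["ffmpeg", "-i", "input.mp4", "output.avi"])
try:
    proc.wait()
except KeyboardInterrupt:
    proc.terminate()              # ask ffmpeg to exit (SIGTERM)
    try:
        proc.wait(timeout=10)     # reap it so it does not linger as <defunct>
    except subprocess.TimeoutExpired:
        proc.kill()               # force-kill if it ignores the signal
        proc.wait()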

I need to execute test cases in robot file in parallel

As there is a huge number of test cases, it takes a lot of time to complete the entire execution. I am not using Selenium. Is there any way to achieve parallel execution in Robot Framework or Python? Any example would be of great help.
Pabot is likely what you're looking for, though you should know that it will not magically make your tests thread-safe. In other words, Pabot can only help you with the execution part; your test cases will need to be designed with parallelization in mind. For example, test cases that make changes to a database or edit a global file may not be parallelization-friendly and will need to be redesigned.
PabotLib can help you design thread-safe test cases when needed.
See pabot here : https://github.com/mkorpela/pabot
First install pabot:
pip3 install -U robotframework-pabot
Then you can run the tests under the case directory in parallel as simply as:
pabot --testlevelsplit case

Timeout on tests with nosetests

I'm setting up my nosetests environment but can't seem to get the timeout to work properly. I would like to have an x second (say 2) timeout on each test discovered by nose.
I tried the following:
nosetests --processes=-1 --process-timeout=2
This works just fine but I noticed the following:
Parallel testing takes longer for a few simple testcases
Nose does not report back when a test has timed out (and thus failed)
Does anyone know how I can get such a timeout to work? I would prefer it to work without parallel testing but this would not be an issue as long as I get the feedback that a test has timed out.
I do not know if this will make your life easier, but there is similar functionality in nose.tools that will fail a test on timeout, and you do not need parallel testing for it:
from time import sleep
from nose.tools import timed

@timed(2)
def test_a():
    sleep(3)
You can probably auto-decorate all your tests in a module using a script or plugin, if manually adding the attribute is an issue, but I personally prefer clarity over magic.
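If you do want the magic, a hedged sketch of auto-decorating every test_* function in a module could look like this (apply_timeout is a hypothetical helper, not part of nose):

import sys
from time import sleep
from nose.tools import timed

def apply_timeout(module_name, seconds=2):
    # Wrap every module-level test_* callable with nose.tools.timed.
    module = sys.modules[module_name]
    for name, obj in list(vars(module).items()):
        if name.startswith("test_") and callable(obj):
            setattr(module, name, timed(seconds)(obj))

def test_a():
    sleep(1)

apply_timeout(__name__)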
Looking through the Lib/site-packages/nose/plugins/multiprocess.py source, it looks like the process-timeout option you are using is somewhat specific to managing "hanging" subprocesses that may be preventing a test from completing.

How to get around memory allocation error in python nosetest?

I have a Python script that allocates a huge amount of memory and eventually runs out of it. Is there any way nosetests can handle this gracefully?
Unfortunately, the only way to survive such a thing would be to have your test fixture run that particular test in a subprocess using subprocess.Popen(), capturing its output and exit code so that you can see the nonzero exit code and the "out of memory" traceback that result. Note that sys.executable is the full path to the current Python executable, if that helps you build a Popen() command line to run Python on the little test script that runs out of memory.
Once a process is out of memory, there is typically no way to recover, because nearly anything it might try to do (formatting a string to print out, for example) takes even more memory, which is, by definition, now exhausted. :)
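A hedged sketch of that fixture approach; test_bigalloc.py stands in for a hypothetical script that performs the oversized allocation:

import subprocess
import sys

def test_huge_allocation_reports_memory_error():
    # Run the memory-hungry code in its own interpreter so only that process dies.
    proc = subprocess.Popen(
        [sys.executable, "test_bigalloc.py"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    assert proc.returncode != 0
    assert b"MemoryError" in err  # present when Python raises MemoryError rather than being OOM-killed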
