We run behave BDD tests in our pipeline. The tests run in a Docker container as part of the Jenkins pipeline. Currently it takes ~10 minutes to run all the tests. We are adding a lot of tests, and in a few months it might go up to 30 minutes. It outputs a lot of information. I believe that if I reduce the amount of information it outputs, I can get the tests to run faster. Is there a way to control the amount of information behave outputs? I want to print the information only if something fails.
I took a look at behave-parallel. It looks like it is Python 2.7 only. We are on Python 3.
I was looking at various options behave provides.
behave -verbose=false folderName (I assumed that it would not output all the steps)
behave --logging-level=ERROR TQXYQ (I assumed it would print only if there is an error)
behave --logging-filter="Test Step" TQXYQ (I assumed it would print only the tests that have "Test Step" in them)
None of the above worked.
The current output looks like this:
Scenario Outline: IsError is populated correctly based on Test Id -- #1.7 # TestName/Test.feature:187
Given the test file folder is set to /TestName/steps/ # common/common_steps.py:22 0.000s
And Service is running # common/common_steps.py:10 0.000s
Given request used is current.json # common/common_steps.py:26 0.000s
And request is modified to set X to q of type str # common/common_steps.py:111 0.000s
And request is modified to set Y to null of type str # common/common_steps.py:111 0.000s
And request is modified to set Z to USD of type str # common/common_steps.py:111 0.000s
When make a modification request # common/common_steps.py:37 0.203s
Then it returns 200 status code # common/common_steps.py:47 0.000s
And transformed result has IsError with 0 of type int # common/common_steps.py:92 0.000s
And transformed result has ErrorMessages contain [] # common/common_steps.py:52 0.000s
I want to print all of this only if there is an error. If everything is passing, I don't want to display this information.
I think the default log level INFO will not impact the performance of your tests.
I am also using a Docker container to run the regression suite, and it takes about 2 hours to run 2300 test scenarios. It took nearly a day before, and here is what I did:
1. Run the whole test suite in parallel.
This is the most important change and the main reason the execution time dropped.
We spent a lot of effort turning the regression suite into something parallelizable:
- Make tests atomic, autonomous and independent so that you can run all of them in parallel effectively.
- Create a parallel runner to run tests in multiple processes. I am using the multiprocessing and subprocess libraries to do this (see the sketch after this list).
I would not recommend behave-parallel because it is no longer actively supported.
You can refer to this link :
http://blog.crevise.com/2018/02/executing-parallel-tests-using-behave.html?m=1
- Use Docker Swarm to add more nodes to the Selenium Grid.
You can scale up by adding more nodes; the maximum number of nodes depends on the number of CPUs. The usual best practice is one node per CPU.
I have 4 PCs, each with 4 cores, so I can scale up to 1 hub and 15 nodes.
2. Optimize synchronization in your framework.
- Remove time.sleep() calls.
- Remove implicit waits; use explicit waits instead.
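For reference, here is a minimal sketch of such a parallel runner, assuming one behave process per feature file; the features folder layout and the worker count are placeholders you would adapt to your project:
# parallel_runner.py -- minimal sketch: one behave process per feature file
import glob
import subprocess
from multiprocessing import Pool

def run_feature(feature_path):
    # each worker shells out to behave for a single feature file and captures its output
    result = subprocess.run(['behave', feature_path],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return feature_path, result.returncode, result.stdout.decode('utf-8')

if __name__ == '__main__':
    features = glob.glob('features/**/*.feature', recursive=True)
    with Pool(4) as pool:  # tune the worker count to the number of CPUs
        for feature, returncode, output in pool.map(run_feature, features):
            if returncode != 0:
                # print the captured behave output only when a feature fails
                print(output)
            print('{} -> {}'.format(feature, 'FAILED' if returncode else 'PASSED'))
As a side effect, this also gives you what the question asks for: the full step output is shown only for failing features.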
Hope it helps.
Well, I have solved this in a fairly traditional way, but I'm not sure how effective it is. I just started this yesterday and am now trying to build reports out of it. The approach is below; suggestions welcome.
This handles parallel execution at the example level as well.
parallel_behave.py
Run command (it mimics all the params of the behave command):
py parallel_behave.py -t -d -f ......
import json
import os
import re
import subprocess
from datetime import datetime
from multiprocessing import Pool

initial_command = 'behave -d -t <tags>'
'''
the above command does a dry run and returns the eligible cases as JSON.
It may not be the right approach, but it works well for me.
'''
r = subprocess.Popen(initial_command.split(' '), stdout=subprocess.PIPE, bufsize=0)
finalsclist = []
_tmpstr = ''
for out in r.stdout:
    out = out.decode('utf-8')
    # collect only the JSON portion of the dry-run output
    if out.startswith('['):
        _tmpstr += out
    if out.startswith('{'):
        _tmpstr += out
    if out.startswith(']'):
        _tmpstr += out
        break
scenarionamedt = json.loads(_tmpstr)
for sc in scenarionamedt:
    for s in sc['elements']:
        finalsclist.append(s['name'])
Now finalsclist contains the scenario names.
ts = int(datetime.now().timestamp())

def foo(scenarioname):
    # each scenario gets its own output file so the parallel runs do not overwrite each other
    safe_name = re.sub(r'\W+', '_', scenarioname)
    cmd = "behave -n '{}' -o ./report/output_{}_{}.json".format(scenarioname, safe_name, ts)
    subprocess.call(cmd, shell=True)

os.makedirs('./report', exist_ok=True)
pool = Pool(4)  # derive the worker count from the number of CPUs; on Windows wrap this in `if __name__ == '__main__':`
pool.map(foo, finalsclist)
This will create that many individual behave processes and generate the JSON output under the report folder.
*** There was a reference implementation at https://github.com/hugeinc/behave-parallel, but it works at the feature level. I just extended it to scenarios and examples. ***
Related
I am a beginner in the field of Python parallel computation. I want to parallelize one part of my code, which consists of for-loops. By default, my code runs a time loop over 3 years. On each day, my code calls a bash script run_offline.sh and runs it 8 times. Each time, the bash script is given different input data indexed by the loop id. Here is the main part of my Python code, demo.py:
import os
import numpy as np
from dateutil.rrule import rrule, HOURLY, DAILY
import datetime
import subprocess
...
start_date = datetime.datetime(2017, 1, 1, 10, 0, 0)
end_date = datetime.datetime(2019, 12, 31, 10, 0, 0)
...
loop_id = 0
for date_assim in rrule(freq=HOURLY,
                        dtstart=start_date,
                        interval=time_delta,
                        until=end_date):
    RESDIR = './results/'
    TYP = 'experiment_1'
    END_ID = 8
    YYYYMMDDHH = date_assim.strftime('%Y%m%d%H')
    p1 = subprocess.Popen(['./run_offline.sh', str(END_ID), str(loop_id), str(YYYYMMDDHH), RESDIR, TYP])
    p1.wait()
    #%%
    # p1 creates 9 files from RESULTS_0.nc to RESULTS_8.nc
    # details please see the bash script attached below
    # following are codes computing based on RESULTS_${CID}.nc, CID from 0 to 8.
    # In total 9 files are generated by p1 and used later.
    loop_id += 1
And ./run_offline.sh runs an atmospheric model, offline.exe, 9 times, as follows:
#!/bin/bash
# Usage: ./run_offline.sh END_ID loop_id YYYYMMDDHH RESDIR TYP
END_ID=${1:-1}
loop_id=${2:-1}
YYYYMMDDHH=${3:-1}
RESDIR=${4:-1}
TYP=${5:-1}
END_ID=`echo $((END_ID))`
loop_id=`echo $((loop_id))`
CID=0
ln -sf PREP_0.nc PREP.nc # one of the input files required; must be named PREP.nc
while [ $CID -le $END_ID ]; do
    cp -f ./OPTIONS.nam_${CID} ./OPTIONS.nam # one of the input files required by offline.exe
    # each ./OPTIONS.nam_${CID} holds a different index of a perturbation.
    # Say ./OPTIONS.nam_1 tells offline.exe to perturb the first variable in the atmospheric model,
    # ./OPTIONS.nam_2 perturbs the second variable...
    ./offline.exe
    cp RESULTS1.nc RESULTS_${CID}.OUT.nc # for the next part of the python code in demo.py
    mv RESULTS2.nc $RESDIR/$TYP/RESULTS2_${YYYYMMDDHH}.nc # store this file in my results dir
    CID=$((CID+1))
done
Now I found that the loop over offline.exe is super time-consuming: it takes around 10-20 s each time I call run_offline.sh (running ./offline.exe 9 times costs 10-20 s). In total it costs roughly 15 s * 365 * 3 = 4.5 hours on average if I want to run my scripts for 3 years. So can I parallelize the loop over offline.exe, say by assigning the runs for different CIDs to different cores/subprocesses on the server? Note that the two input files OPTIONS.nam and PREP.nc must have exactly those names each time we run offline.exe, which means we cannot use OPTIONS.nam_x for loop x. So can I use dask or numba to help with this parallelization? Thanks!
If I understand your problem correctly, you run a bash script ~1000 times, each run of which calls a black-box executable 8-9 times, and this executable is the main bottleneck.
So can I parallelize the loop of offline.exe?
This is hard to say because the executable is a black box. You need to check the input/output/temporary data required by the program. For example, if the program stores temporary files somewhere on the storage device, then calling it in parallel will result in a race condition. Besides, you can only call it in parallel if the computational parts are fully independent. A dataflow analysis is very useful for knowing whether you can parallelize an application (especially when it is composed of multiple programs).
Additionally, you need to check whether the program is already parallel. Running multiple parallel programs at the same time generally results in much slower execution due to the large number of threads to schedule, poor cache usage, bad synchronization patterns, etc.
In your case, I think the best option would be to parallelize the program run in the loop (i.e. offline.exe) itself. Otherwise, if the program is sequential and can be parallelised (see above), you can run multiple processes using & in bash and wait for them at the end of the loop. Alternatively, you can use GNU parallel.
But one should note that the two input files OPTIONS.nam and PREP.nc must have exactly those names each time we run offline.exe
This can be solved by calling the N programs from N distinct working directories in parallel. This is actually safer if the program creates temporary files in its working directory. You will need to move/copy the input files before the parallel execution and the result files after it.
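For illustration, here is a minimal Python sketch of that idea (not part of the original answer), assuming offline.exe, PREP_0.nc and the OPTIONS.nam_${CID} files sit in the current directory and the individual CID runs are independent:
import os
import shutil
import subprocess

def start_run(cid):
    # each CID gets its own working directory, so the fixed names OPTIONS.nam / PREP.nc cannot clash
    workdir = 'run_{}'.format(cid)
    os.makedirs(workdir, exist_ok=True)
    shutil.copy('OPTIONS.nam_{}'.format(cid), os.path.join(workdir, 'OPTIONS.nam'))
    shutil.copy('PREP_0.nc', os.path.join(workdir, 'PREP.nc'))
    # launch offline.exe inside its own directory and return the process handle
    return subprocess.Popen([os.path.abspath('offline.exe')], cwd=workdir)

# start all runs in parallel, then wait for all of them (the Python equivalent of `&` + `wait` in bash)
procs = [start_run(cid) for cid in range(9)]  # CID 0..8, as in the question
for p in procs:
    p.wait()
# afterwards, copy each run_<CID>/RESULTS1.nc and RESULTS2.nc back to wherever demo.py expects them
The bash answer below does essentially the same thing with GNU parallel.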
If the files OPTIONS.nam and/or PREP.nc are modified by the program, then the computation is completely sequential and cannot be parallelized (I assume the computation of each day depends on the previous one, as this is a very common pattern in scientific simulations).
So can I use dask or numba to help with this parallelization?
No. Dask and Numba are not meant to be used in this context. They are designed to operate on Numpy arrays in Python code. The part you want to parallelize is in bash, and the parallelized program is apparently not even written in Python.
If you cannot make offline.exe use names other than OPTIONS.nam and RESULTS1.nc, you will need to make sure that the parallel instances do not overwrite each other.
One way to do this is to make a dir for each run:
#!/bin/bash
# Usage: ./run_offline.sh END_ID loop_id YYYYMMDDHH RESDIR TYP
END_ID=${1:-1}
loop_id=${2:-1}
YYYYMMDDHH=${3:-1}
RESDIR=${4:-1}
TYP=${5:-1}
END_ID=`echo $((END_ID))`
loop_id=`echo $((loop_id))`
doit() {
    mkdir -p $1
    cd $1
    ln -sf ../PREP_$1.nc PREP.nc
    cp -f ../OPTIONS.nam_$1 ./OPTIONS.nam # one of the input files required by offline.exe
    ../offline.exe
    cp RESULTS1.nc ../RESULTS_$1.OUT.nc # for the next part of the python code in demo.py
    mv RESULTS2.nc $RESDIR/$TYP/RESULTS2_${YYYYMMDDHH}.nc # store this file in my results dir
}
export -f doit
export RESDIR YYYYMMDDHH TYP
seq 0 $END_ID | parallel doit
I might be guilty of using pytest in a way I'm not supposed to, but let's say I want to generate and display some short message on successful test completion. Like I'm developing a compression algorithm, and instead of "PASSED" I'd like to see "SQUEEZED BY 65.43%" or something like that. Is this even possible? Where should I start with the customization, or maybe there's a plugin I might use?
I've stumbled upon pytest-custom-report, but it provides only static messages, that are set up before tests run. That's not what I need.
I might be guilty of using pytest in a way I'm not supposed to
Not at all - this is exactly the kind of use case the pytest plugin system is supposed to solve.
To answer your actual question: it's not clear where the percentage value comes from. Assuming it is returned by a function squeeze(), I would first store the percentage in the test, for example using the record_property fixture:
from mylib import squeeze
def test_spam(record_property):
    value = squeeze()
    record_property('x', value)
    ...
To display the stored percentage value, add a custom pytest_report_teststatus hookimpl in a conftest.py in your project or tests root directory:
# conftest.py
def pytest_report_teststatus(report, config):
    if report.when == 'call' and report.passed:
        percentage = dict(report.user_properties).get('x', float("nan"))
        short_outcome = f'{percentage * 100}%'
        long_outcome = f'SQUEEZED BY {percentage * 100}%'
        return report.outcome, short_outcome, long_outcome
Now running test_spam in default output mode yields
test_spam.py 10.0% [100%]
Running in verbose mode yields
test_spam.py::test_spam SQUEEZED BY 10.0% [100%]
Once an MLflow run is finished, external scripts can access its parameters and metrics using the Python mlflow client and the mlflow.get_run(run_id) method, but the Run object returned by get_run seems to be read-only.
Specifically, .log_param, .log_metric, and .log_artifact cannot be used on the object returned by get_run, raising errors like this:
AttributeError: 'Run' object has no attribute 'log_param'
If we call any of the .log_* methods on mlflow itself instead, it logs them to a new run with an auto-generated run ID in the Default experiment.
Example:
final_model_mlflow_run = mlflow.get_run(final_model_mlflow_run_id)
with mlflow.ActiveRun(run=final_model_mlflow_run) as myrun:
    # this read operation uses correct run
    run_id = myrun.info.run_id
    print(run_id)
    # this write operation writes to a new run
    # (with auto-generated random run ID)
    # in the "Default" experiment (with exp. ID of 0)
    mlflow.log_param("test3", "This is a test")
Note that the above problem exists regardless of the run status (.info.status can be either "FINISHED" or "RUNNING"; it makes no difference).
I wonder if this read-only behavior is by design (given that immutable modeling runs improve experiment reproducibility)? I can appreciate that, but it also goes against code modularity if everything has to be done within a single monolith like the with mlflow.start_run() context...
As was pointed out to me by Hans Bambel, and as documented here, mlflow.start_run (in contrast to mlflow.ActiveRun) accepts the run_id parameter of an existing run.
Here's an example tested to work in v1.13 through v1.19; as you can see, one can even overwrite an existing metric to correct a mistake:
with mlflow.start_run(run_id=final_model_mlflow_run_id):
    # print(mlflow.active_run().info)
    mlflow.log_param("start_run_test", "This is a test")
    mlflow.log_metric("start_run_test", 1.23)
    mlflow.log_metric("start_run_test", 1.33)
    mlflow.log_artifact("/home/jovyan/_tmp/formula-features-20201103.json", "start_run_test")
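If re-entering a run context feels too monolithic, another option is the MlflowClient tracking API, which logs to an existing run by its id; this is just a sketch, not part of the original answer:
from mlflow.tracking import MlflowClient

client = MlflowClient()  # uses the configured tracking URI
# write directly to the existing run, no active-run context needed
client.log_param(final_model_mlflow_run_id, "client_test", "This is a test")
client.log_metric(final_model_mlflow_run_id, "client_test_metric", 1.23)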
I'm not sure if it's allowed to ask for help (if not, I don't mind not getting an answer until the competition period is over).
I was solving the interactive problem (Dat Bae) on CodeJam. Locally, I can run the judge (testing_tool.py) and my program (<name>.py) separately and copy-paste the I/O manually. However, I assume I need to find a way to do this automatically.
Edit: To make it clear, I want every output of the x file to become input to the y file and vice versa.
Some details:
I've used sys.stdout.write / sys.stdin.readline instead of print / input throughout my program.
I tried running interactive_runner.py, but I can't figure out how to use it.
I tried running it on their server, with my program in the first tab and the judge file in the second. It always throws a TLE error.
I can't find a tutorial for this either; any help will be appreciated! :/
The usage is documented in comments inside the scripts:
interactive_runner.py
# Run this as:
# python interactive_runner.py <cmd_line_judge> -- <cmd_line_solution>
#
# For example:
# python interactive_runner.py python judge.py 0 -- ./my_binary
#
# This will run the first test set of a python judge called "judge.py" that
# receives the test set number (starting from 0) via command line parameter
# with a solution compiled into a binary called "my_binary".
testing_tool.py
# Usage: `testing_tool.py test_number`, where the argument test_number
# is 0 for Test Set 1 or 1 for Test Set 2.
So use them like this:
python interactive_runner.py python testing_tool.py 0 -- ./dat_bae.py
I have a Python unittest like the one below, and I want to run this whole test N number of times.
class Test(TestCase):
    def test_0(self):
        .........
        .........
        .........

Test.Run(name=__name__)
Any Suggestions?
You can use parameterized tests. There are different modules to do that. I use nose to run my unittests (it is more powerful than the default unittest module), and there's a package called nose-parameterized that allows you to write a factory test and run it a number of times with different values for the variables you want (see the sketch below).
If you don't want to use nose, there are several other options for running parameterized tests.
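As a rough sketch of what that looks like, using the parameterized package (the renamed successor of nose-parameterized); the test body here is just a placeholder:
from unittest import TestCase
from parameterized import parameterized

class Test(TestCase):
    # expand generates one test method per parameter tuple
    @parameterized.expand([(i,) for i in range(100)])
    def test_0(self, i):
        self.assertTrue(i >= 0)  # placeholder assertion; put your real checks here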
Alternatively, you can execute any number of test conditions in a single test (as soon as one fails, the test will report an error). In your particular case, maybe this makes more sense than parameterized tests, because in reality it's only one test; it just needs a large number of runs of the function to reach some level of confidence that it's working properly. So you can do:
import random
from unittest import TestCase

class Test(TestCase):
    def test_myfunc(self):
        for _ in range(100):
            value = random.random()
            # placeholder check from this example; replace with a real assertion about your function
            self.assertEqual(value, value + 2)

Test.Run(name=__name__)
Why? Because the test_0 method contains a random option, so each time it runs it selects a random set of configurations and tests against those configurations. So I am not testing the same thing multiple times.
Randomness in a test makes it non-reproducible. One day you might get 1 failure out of 100, and when you run it again, it’s already gone.
Use a modern testing tool to parametrize your test with a sequential number, then use random.seed to have a random but reproducible test case for each number in a sequence.
portusato suggests nose, but pytest is a more modern and popular tool:
import random, pytest

@pytest.mark.parametrize('i', range(100))
def test_random(i):
    orig_state = random.getstate()
    try:
        random.seed(i)
        data = generate_random_data()
        assert my_algorithm(data) == works
    finally:
        random.setstate(orig_state)
pytest.mark.parametrize “explodes” your single test_random into 100 individual tests — test_random[0] through test_random[99]:
$ pytest -q test.py
....................................................................................................
100 passed in 0.14 seconds
Each of these tests generates different, random, but reproducible input data to your algorithm. If test_random[56] fails, it will fail every time, so you will then be able to debug it.
If you don't want your test to stop after the first failure, you can use subTest.
class Test(TestCase):
    def test_0(self):
        for i in [1, 2, 3]:
            with self.subTest(i=i):
                self.assertEqual(squared(i), i**2)
Docs