I need to execute test cases in a robot file in parallel - python

As there are a huge number of test cases, it takes a lot of time to complete the entire execution. I am not using Selenium. Is there any way to achieve parallel execution in Robot Framework or Python? Any example would be of great help.

Pabot is likely what you're looking for, though you should know that it will not magically make your tests thread-safe. In other words, Pabot can only help you with the execution part; your test cases still need to be designed with parallelization in mind. For example, test cases that make changes to a database or edit a shared file may not be parallelization-friendly and will need to be redesigned.
PabotLib can help you design thread-safe test cases when needed.

See Pabot here: https://github.com/mkorpela/pabot
First install pabot:
pip3 install -U robotframework-pabot
Then you can run the tests under the case directory in parallel as simply as:
pabot --testlevelsplit case
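If you also want to control how many parallel executors are started, pabot has a --processes option (see the pabot README for details), e.g.:
pabot --processes 4 --testlevelsplit case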

Related

Run Unittests in Python with highlighting of Keywords in Terminal (determined during the run)

I'm administering quite a large number of Python unit tests, and since the stack traces are sometimes weirdly long, I had a thought about optimizing the output so that I can spot the file I'm interested in faster.
Currently I am running tests with ack:
python3 unittests.py | ack --passthru 'GraphTests.py|Versioning/Testing/Tests.py'
This works as I intended, but as the number of tests keeps growing and I want to keep things dynamic, I would like to read the classes from whatever I've set in the test suite.
suite.addTest(loader.loadTestsFromModule(UC3))
What would be the best way to accomplish this?
My first thought was to split the unit tests into two files, one loader and one caller. The loader adds all the unit tests as before and executes caller.py through an ack call that includes the list of files.
Is that a reasonable approach? Are there better ways? I don't think it is possible to fill in the ack patterns after I have executed the line.
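To make that concrete, this is roughly what I mean by reading the classes back out of the suite (an untested sketch; collect_test_files is just an illustrative name):
import inspect
import unittest

def collect_test_files(suite):
    """Recursively walk a unittest.TestSuite and collect the source files of its test cases."""
    files = set()
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            files |= collect_test_files(item)
        else:
            files.add(inspect.getfile(type(item)))
    return files

# which could then feed the ack pattern, e.g.:
# pattern = '|'.join(sorted(collect_test_files(suite)))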
There's also the idea of just piping the results into a file that I read afterwards, but from my experience so far I am not sure whether that will work as planned rather than just causing extra trouble. (I use an experimental unittest framework that adds colouring during the execution of unit tests using format characters like '\033[91m' - see: https://stackoverflow.com/questions/15580303/python-output-complex-line-with-floats-colored-by-value)
Is there an option I don't see?
Edit (1): I also forgot to add: getting into the debugger is a different experience. With ack it doesn't seem to work properly any more.

What is the state of the art way to handle what makefiles do for python data analysis?

I have a program that is a DAG which processes and cleans certain files, combines them, and then does additional calculations. I want a way to run the whole analysis pipeline, and to re-run it if anything changes, but without having to re-process every single component.
I read about Makefiles and thought they sound like the perfect solution. I am also aware that make is probably outdated and that better alternatives probably exist, but I generally only find large lists of workflow scheduler tools that are not quite suited to this purpose, as far as I can tell (e.g., Airflow, Luigi, Nextflow, Dagobah, etc.).
It seems like many of these are overkill with schedulers, GUIs, etc. which I don't really need. I just want one file that does the following:
makes it obvious what all of the python scripts are that need to run
shows file dependencies so that a full re-run will only redo parts where something has been changed upstream
has the potential for some parallelization (not very necessary)
doesn't have too much boilerplate
Makefile example:
.PHONY : dats
dats : isles.dat abyss.dat

isles.dat : books/isles.txt
	python countwords.py books/isles.txt isles.dat

abyss.dat : books/abyss.txt
	python countwords.py books/abyss.txt abyss.dat

.PHONY : clean
clean :
	rm -f *.dat
Is this the best procedure to run something like this in python or is there a better way?
DVC (Data Version Control) includes a modern re-implementation and extension of make that is particularly suited to data-science pipelines.
Handling pipelines in DVC has important benefits over make for many scenarios, such as relying on file checksums rather than modification times. On the other hand, make is simpler in some respects and has a powerful macro mechanism. Still, some elements of makefile syntax are quite subtle (e.g., multiple outputs, intermediate files), and make generally doesn't support whitespace in filenames.
Is this the best procedure to run something like this in python or is there a better way?
"Best" is surely in the eye of the beholder. However, if the make-based approach presented in the question is satisfactorily representative of the problem then it is a good way. make implementations are very widely available, and their behavior is well understood and generally well-suited to problems such as the one presented.
There are other build tools that compete with make, some written in Python, and there are undoubtedly some more esoteric software frameworks that could be applied to the task. Nevertheless, if you want to focus on doing the work instead of on building the framework to do the work, then I don't see any reason to look past the make-based solution you already have.
The way you present the question, I would say snakemake is the way to go. Having said that, GNU make may be old, but it is not going to disappear any time soon and it has been tried and tested to death.
I don't speak make, but I think your example Makefile in snakemake would be something like this:
rule all:
    input:
        ['isles.dat', 'abyss.dat'],

rule make_isles:
    input:
        'books/isles.txt',
    output:
        'isles.dat',
    shell:
        r"""
        python countwords.py {input} {output}
        """

rule make_abyss:
    input:
        'books/abyss.txt',
    output:
        'abyss.dat',
    shell:
        r"""
        python countwords.py {input} {output}
        """
Save this in a file called Snakefile and execute it as:
snakemake # vanilla execution
snakemake -p -n # Print shell commands (-p). Dry-run mode (-n)
snakemake --delete-all-output # Same-ish as .PHONY clean
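If you want the (optional) parallelization mentioned in the question, snakemake can run independent rules concurrently; something along the lines of:
snakemake --cores 4 # run up to 4 jobs in parallel (recent snakemake versions require --cores)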
snakemake is popular in bioinformatics, but it is pretty general-purpose.

What is the correct workflow for parallelization: running on a cluster or multiprocessing?

I want to call a function similar to parallelize.map(function, args) that returns a list of results, with the user blind to the actual process. One of the functions I want to parallelize calls a subprocess to another Unix program that benefits from multiple cores.
I first tried ipython-cluster-helper. This works well with my setup, but I ran into problems installing it on several other machines. I also have to ask for the names of clusters during setup. I haven't seen other programs start jobs on clusters for you, so I don't know whether that is accepted practice.
joblib seems to be the standard for parallelization, but it can only use one cluster or computer at a time. This works as well, but is significantly slower because it is not using the cluster.
Also, the server I am running this code on complains if a program has been running too long, to make sure people use the cluster. Would I have to write another script to run this program only on our cluster if I used joblib?
For now, I have added special parameters in setup.py to add cluster names and install ipython-cluster-helper if necessary. When map is called, it first checks whether ipython-cluster-helper and the cluster names are available; if so, it uses them, otherwise it falls back to joblib.
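Roughly, the wrapper looks like this (a simplified sketch; the cluster branch is only hinted at in the comment):
from joblib import Parallel, delayed

def parallel_map(function, args):
    """Map function over args, hiding the execution backend from the caller."""
    # If ipython-cluster-helper and a cluster name are available, dispatch there instead
    # (roughly: with cluster_view(...) as view: return view.map(function, args)).
    # Otherwise fall back to all local cores via joblib:
    return Parallel(n_jobs=-1)(delayed(function)(a) for a in args)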
What are other ways of achieving this? I'm looking for a standard approach that will work on most machines, with or without a cluster, so I can release the code and make it easy to use.
Thanks.

Python Code Coverage and Multiprocessing

I use coveralls in combination with coverage.py to track python code coverage of my testing scripts. I use the following commands:
coverage run --parallel-mode --source=mysource --omit=*/stuff/idont/need.py ./mysource/tests/run_all_tests.py
coverage combine
coveralls --verbose
This works quite nicely with the exception of multiprocessing. Code executed by worker pools or child processes is not tracked.
Is there a way to also track multiprocessing code? Is there a particular option I am missing? Maybe adding wrappers to the multiprocessing library to start coverage every time a new process is spawned?
EDIT:
I (and jonrsharpe, also :-) found a monkey-patch for multiprocessing.
However, this does not work for me: my Travis-CI build is killed almost right after the start. I checked the problem on my local machine, and apparently adding the patch to multiprocessing blows up my memory usage. Tests that normally take much less than 1 GB of memory need more than 16 GB with this fix.
EDIT2:
The monkey-patch does work after a small modification: removing the config-file parsing (config_file=os.environ['COVERAGE_PROCESS_START']) did the trick. This solved the issue of the bloated memory. Accordingly, the corresponding line simply becomes:
cov = coverage(data_suffix=True)
Coverage 4.0 includes a command-line option --concurrency=multiprocessing to deal with this. You must use coverage combine afterward. For instance, if your tests are in regression_tests.py, then you would simply do this at the command line:
coverage run --concurrency=multiprocessing regression_tests.py
coverage combine
I had spent some time trying to make sure coverage works with multiprocessing.Pool, but it never worked.
I have finally made a fix that makes it work - I would be happy if someone pointed out whether I am doing something wrong.
https://gist.github.com/andreycizov/ee59806a3ac6955c127e511c5e84d2b6
One of the possible causes of missing coverage data from forked processes, even with concurrency=multiprocessing, is the way multiprocessing.Pool is shut down. For example, using a with statement leads to a terminate() call (see Pool.__exit__). As a consequence, the pool workers have no time to save their coverage data. I had to use a close(), timed join() (in a thread), terminate sequence instead of with to get the coverage results saved.
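For illustration, a simplified version of that shutdown sequence (the timed join in a thread and the final terminate() are left out here; the point is that close()/join() give the workers time to write their coverage files, while the with block's terminate() kills them first):
from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(4)
    try:
        results = pool.map(work, range(10))
    finally:
        pool.close()   # stop accepting new work
        pool.join()    # wait for workers to exit cleanly so their coverage data is saved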

How would I discover the memory used by an application through a python script?

Recently I've found myself testing an application in Froglogic's Squish, using Python to create test scripts. Just the other day, the question of how much memory the program is using came up, and I found myself unable to answer it.
It seems reasonable to assume that there's a way to query the OS (Windows 7) API for the information, but I have no idea where to begin. Does anyone know how I'd go about this?
This answer has some code (for Windows and Unix):
Total memory used by Python process?
On Windows you check Win32_PerfRawData_PerfProc_Process, and on Linux it's /proc/<pid>/status (or ps).
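For example, on Linux a minimal sketch of reading the resident set size out of /proc could look like this (Linux only; VmRSS is reported in kB):
import os

def rss_kb(pid=None):
    """Return the VmRSS of a process in kB by parsing /proc/<pid>/status (Linux only)."""
    pid = pid or os.getpid()
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return None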
Remember that Squish allows remote testing of the application. A system parameter queried via Python directly will only apply to the case of local testing.
An approach that works in either case is to call the currentApplicationContext() function, which will give you a handle to the Application Under Test. It has a usedMemory property you can query. I don't recall exactly which process property is being queried, but it should provide a rough indication.
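In a Squish test script that would be something along these lines (a sketch only; check the Squish documentation for the exact property name):
# inside a Squish Python test script
ctx = currentApplicationContext()   # handle to the Application Under Test
test.log('AUT memory used: %s' % ctx.usedMemory)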
From the command line: tasklist /FO LIST and parse the results?
Sorry, I don't know a Pythonic way. =P
