I am looking for a way to add a new "mode" to the PySys BaseRunner.
In particular I would like to add a validate mode that just re-runs the validation portion. This is useful when you are writing your testcase and tuning the validation conditions to fit the current output, without having to re-run the complete testcase.
What is the best way to do this without having to change the original class?
This requires support from the framework unfortunately. The issue is that the BaseRunner class always automatically purges the output directory, and there is no hook in the framework that lets you avoid this. You can, however, manually move the output subdirectory you want to re-run the validation over to, say, 'repeat' (at the same directory level), and then use:
import os

from pysys.constants import *
from pysys.basetest import BaseTest

class PySysTest(BaseTest):
    def execute(self):
        if self.mode == 'repeat': pass  # skip execution when re-running validation

    def validate(self):
        if self.mode == 'repeat':
            # point validation at the preserved output from the earlier run
            self.output = os.path.join(self.descriptor.output, 'repeat')
where I have omitted the implementations of execute and validate. You would need to add the mode into the descriptor for the test:
<classification>
  <groups>
    <group></group>
  </groups>
  <modes>
    <mode>repeat</mode>
  </modes>
</classification>
and run using "pysys.py run -mrepeat". This would help with debugging if your execute takes a long time, but it is probably not what you want out-of-the-box, i.e. a top-level option to the runner to just perform validation over a previously run test. I'll add a feature request for this.
Since the original discussion, a --validateOnly command line option was added to PySys (in v1.1.1) which does pretty much what you suggest - it skips the execute method and just runs validate.
This assumes you aren't running with --purge (which I imagine is a safe assumption for this use case), and that you don't have validation commands that try to read zero-byte files from the output directory (which always get deleted even if --purge is not specified). Provided those conditions are met, your (non-empty) output files will still be there after the first run of the test, and you can re-run just the validation using the --validateOnly option.
To get this feature you can install the latest PySys version (1.4.0) - see https://pypi.org/project/PySys/
A little bit of an ugly question, but I didn't find existing SO posts which cover it.
Right now I need to use an existing Python tool, timesearch, available on GitHub.
This is a rather big piece of code with a lot of dependencies which I don't want to mess with. In a nutshell one can run its module by passing the command line arguments, for example:
timesearch.py timesearch -r "subreddit1" -l "1466812800" -up "1498348800"
Now, I need to run this tool a bunch of times in a for loop, passing different argument values each time. The tool also prints some output to the command line when you run it, and I would like to capture and print that from my Python script as well. Finally, I need to ensure that the current execution of the timesearch tool has completed before I move on in my loop and run it again.
One side note here - I do need to ensure that timesearch is executed using the same environment I use to run my main script with the for loop.
I am trying to understand what is the best way to do it.
If I just go for this it doesn't work:
import os
#for loop will go here
os.system('python timesearch.py timesearch -r "ethereum" -l "1466812800" -up "1498348800"')
It fails for several reasons - it doesn't use the environment my looping script runs in, and it doesn't capture the printed output of timesearch.
Any advice on how to achieve it?
Just to highlight - I can't just go and pull the function I need out of timesearch, since it calls __init__ to set up some things based on the arguments you pass.
I wouldn't call a Python script with os.system. There is basically one function you need to use: main(sys.argv[1:])
https://github.com/voussoir/timesearch/blob/master/timesearch/__init__.py#L435.
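As a rough sketch (assuming the timesearch package is importable from your script and exposes main(argv) as linked above), you could call it directly in your loop:
import timesearch

# Hypothetical loop values; substitute whatever arguments you need per iteration.
for subreddit in ["ethereum", "bitcoin"]:
    argv = ["timesearch", "-r", subreddit, "-l", "1466812800", "-up", "1498348800"]
    # Runs in the same interpreter (and therefore the same environment) as your
    # script, prints its output to your console, and returns only when finished.
    timesearch.main(argv)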
Earlier I was using Python's unittest in my project, and with it came unittest.TextTestRunner and unittest.defaultTestLoader.loadTestsFromTestCase. I used them for the following reasons:
Control the execution of the unit tests using a wrapper function which calls the unittest's run method; I did not want the command line approach.
Read the unittest output from the result object and upload the results to a bug tracking system, which allows us to generate some complex reports on code stability.
Recently a decision was made to switch to py.test. How can I do the above using py.test? I don't want to parse any CLI/HTML output to get the results from py.test. I also don't want to write too much code in my unit test files to do this.
Can someone help me with this ?
You can use one of pytest's hooks to intercept the test result reporting:
conftest.py:
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_logreport(report):
    yield
    # Define when you want to report:
    # when=setup/call/teardown,
    # fields: .failed/.passed/.skipped
    if report.when == 'call' and report.failed:
        # Add to the database or an issue tracker or wherever you want.
        print(report.longreprtext)
        print(report.sections)
        print(report.capstdout)
        print(report.capstderr)
Similarly, you can intercept one of these hooks to inject your code at the needed stage (in some cases, with a try/except around the yield):
pytest_runtest_protocol(item, nextitem)
pytest_runtest_setup(item)
pytest_runtest_call(item)
pytest_runtest_teardown(item, nextitem)
pytest_runtest_makereport(item, call)
pytest_runtest_logreport(report)
Read more: Writing pytest plugins
All of this can be easily done either with a tiny plugin made as a simple installable library, or as a pseudo-plugin conftest.py which just lies around in one of the directories with the tests.
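If you do go the installable-plugin route, the packaging mostly amounts to registering your module under the pytest11 entry point; here is a minimal sketch (the package and module names are hypothetical):
# setup.py
from setuptools import setup

setup(
    name="pytest-issue-reporter",           # hypothetical distribution name
    py_modules=["pytest_issue_reporter"],    # the module containing the hook above
    entry_points={"pytest11": ["issue_reporter = pytest_issue_reporter"]},
)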
pytest also lets you launch test runs from Python code instead of using the command line, via pytest.main(); you just pass the same arguments to the function call that you would put on the command line.
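A minimal sketch (the test path and options are just examples):
import pytest

# Equivalent to running "pytest -q tests/test_example.py" from the shell;
# the return value is pytest's exit code.
exit_code = pytest.main(["-q", "tests/test_example.py"])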
Pytest will also create resultlog format files, but that feature is deprecated. The documentation suggests using the pytest-tap plugin, which produces files in the Test Anything Protocol format.
I am writing a watchman command with watchman-make and I'm at a loss when trying to access exactly what was changed in the directory. I want to run my upload.py script, and inside the script I would like to access the filenames of newly created files in /var/spool/cups-pdf/ANONYMOUS.
So far I have:
$ watchman-make -p '/var/spool/cups-pdf/ANONYMOUS' --run 'python /home/pi/upload.py'
I'd like to pass another argument to upload.py containing the exact filepath of the newly created file, so that I can send the new file over to my database from within upload.py.
I've been looking at the docs of watchman and the closest thing I can think to use is a trigger object. Please help!
Solution with watchman-wait:
Assuming a project layout like this:
/posts/_SUBDIR_WITH_POST_NAME_/index.md
/Scripts/convert.sh
And the shell script like this:
#!/bin/bash
# File: convert.sh
SrcDirPath=$(cd "$(dirname "$0")/../"; pwd)
cd "$SrcDirPath"
echo "Converting: $SrcDirPath/$1"
Then we can launch watchman-wait like this:
watchman-wait . --max-events 0 -p 'posts/**/*.md' | while read -r line; do ./Scripts/convert.sh "$line"; done
When we change the file /posts/_SUBDIR_WITH_POST_NAME_/index.md, the output will look like this:
...
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
...
watchman-make is intended to be used together with tools that will perform a follow-up query of their own to discover what they want to do as a next step. For example, running the make tool will cause make to stat the various deps to bring things up to date.
That means that your upload.py script needs to know how to do this for itself if you want to use it with watchman.
You have a couple of options, depending on how sophisticated you want things to be:
Use pywatchman to issue an ad-hoc query
If you want to be able to run upload.py whenever you want and have it figure out the right thing (just like make would do), then you can have it ask watchman directly. You can have upload.py use pywatchman (the python watchman client) to do this. pywatchman will get installed if the watchman configure script thinks you have a working python installation. You can also pip install pywatchman. Once you have it available and on your PYTHONPATH:
import os
import pywatchman

client = pywatchman.client()
client.query('watch-project', os.getcwd())
result = client.query('query', os.getcwd(), {
    "since": "n:pi_upload",
    "fields": ["name"]})
print(result["files"])
This snippet uses the since generator with a named cursor to discover the list of files that changed since the last query was issued using that same named cursor. Watchman will remember the associated clock value for you, so you don't need to complicate your script with state tracking. We're using the name pi_upload for the cursor; the name needs to be unique among the watchman clients that might use named cursors, so naming it after your tool is a good idea to avoid potential conflict.
This is probably the most direct way to extract the information you need without requiring that you make more invasive changes to your upload script.
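To tie this back to the question, upload.py could then walk over those names and push each file to your database (upload_file below is a hypothetical placeholder for your real upload logic):
for name in result["files"]:
    path = os.path.join(os.getcwd(), name)   # file names come back relative to the query root
    upload_file(path)                        # hypothetical: send the new/changed file to the database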
Use pywatchman to initiate a long running subscription
This approach will transform your upload.py script so that it knows how to directly subscribe to watchman, so instead of using watchman-make you'd just directly run upload.py and it would keep running and performing the uploads. This is a bit more invasive and is a bit too much code to try and paste in here. If you're interested in this approach then I'd suggest that you take the code behind watchman-wait as a starting point. You can find it here:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait
The key piece of this that you might want to modify is this line:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait#L169
which is where it receives the list of files.
Why not triggers?
You could use triggers for this, but we're steering folks away from triggers because they are hard to manage. A trigger will run in the background and have its output go to the watchman log file. It can be difficult to tell if it is running, or to stop it running.
By contrast, the watchman-wait approach described below is closer to the unix model and allows your script to consume the list of changed files on stdin.
Speaking of unix, what about watchman-wait?
We also have a command that emits the list of changed files as they change. You could potentially stream the output from watchman-wait in your upload.py. This would make it have some similarities with the subscription approach but do so without directly using the pywatchman client.
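For instance (a minimal sketch; upload_to_database is a hypothetical stand-in for your real upload code), upload.py could read the filenames that watchman-wait prints, one per line:
import sys

def upload_to_database(path):
    # hypothetical: replace with your real database upload
    print("uploading", path)

# e.g. watchman-wait /var/spool/cups-pdf/ANONYMOUS --max-events 0 | python upload.py
for line in sys.stdin:
    path = line.strip()
    if path:
        upload_to_database(path)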
I am writing a Snakefile for a snakemake workflow. As part of my workflow I need to check whether a set of records in a database has changed, and if they have re-download them.
My thought was to write a rule that checks the database timestamp and writes it to an output file, and to use that timestamp file as an input to my download rule. The problem is that once the timestamp file is written, the timestamp rule will never run again, and hence the timestamp will never be updated.
Is there a way to make this rule run every time? (I know I can force it from the shell, but I would like to specify it in the Snakefile.) Or is there a better way to handle this?
Any code you add to a Snakefile outside of a rule or function definition will be run at startup just like a regular Python script, so you don't need an external shell script. You can implement the logic you want in Python right in the Snakefile, making use of the shell() function if you need it.
One caveat would be that if you tried to run your workflow on a cluster, the code would be run each time for each cluster job submitted. A crude but effective way to avoid this is to guard it with a check like this:
import os, sys

if '--nolock' not in sys.argv:
    if check_database_for_updates():   # your own check against the database
        os.utime('touch.file')         # touch.file must already exist; this bumps its mtime
Then set touch.file as a proxy input to your rule that reads from the database. Does that make sense?
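To make the proxy-input idea concrete, the downstream rule would just list touch.file as an ordinary input (the rule name and shell command below are hypothetical):
rule download_records:
    input: 'touch.file'
    output: 'records.json'
    shell: 'my_download_tool --out {output}'   # re-runs whenever touch.file is newer than the output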
Since v3.6.0, the onstart handler allows you to always execute something before the workflow starts.
Snakemake 3.6.0 adds an onstart handler that will be executed before the workflow starts. Note that dry-runs do not trigger any of the handlers.
It's unfortunate that onstart doesn't get triggered during dry-runs.
On a similar note, the onsuccess and onerror handlers can be used to trigger something to be executed depending on the workflow's success or failure, respectively.
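A minimal sketch of what these handlers look like in a Snakefile (the messages are placeholders):
onstart:
    print("Workflow starting: checking the database for updated records...")

onsuccess:
    print("Workflow finished successfully")

onerror:
    print("Workflow failed; check the log for details")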
I have tests which have a huge variance in their runtime. Most will take much less than a second, some maybe a few seconds, some of them could take up to minutes.
Can I somehow specify that in my Nosetests?
In the end, I want to be able to run only the subset of my tests that take, e.g., less than 1 second, based on my specified runtime estimates.
Have a look at this write-up about the attribute plugin for nose tests, where you can manually tag tests as @attr('slow') and @attr('fast'). You can then run nosetests -a '!slow' to run only your quick tests.
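For example (the test names and bodies are hypothetical):
from nose.plugins.attrib import attr

@attr('fast')
def test_quick_path():
    assert 1 + 1 == 2

@attr('slow')
def test_full_reindex():
    # long-running scenario goes here
    pass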
It would be great if you could do it automatically, but I'm afraid you would have to write additional code to do it on the fly. If you are into rapid development, I would run nose with xunit XML output enabled (which records the runtime of each test). Your test module can then dynamically read in the XML output file from previous runs and set attributes on tests accordingly to filter out the quick ones. This way you don't have to do it manually, albeit with more work (and you have to run all tests at least once).
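As a rough sketch of that 'read the previous xunit output' idea (nosetests.xml is nose's default --with-xunit output file; the 1-second threshold is arbitrary):
import xml.etree.ElementTree as ET

# Collect the tests that took less than 1 second in the previous run.
tree = ET.parse('nosetests.xml')
fast_tests = {
    "%s.%s" % (case.get('classname'), case.get('name'))
    for case in tree.iter('testcase')
    if float(case.get('time', '0')) < 1.0
}
print(fast_tests)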