I'm trying to understand why, how, and if to unit test methods that seem to return nothing. I've read in a couple other threads that:
The point of a unit test is to test something that the function does. If it's not returning a value, then what is it actually doing?
unittest for none type in python?
In my example, I am using the XMLSigner and XMLVerifier from the SignXML library.
def verify_xml(signed_xml: str, cert_file: str) -> None:
    with open(cert_file, 'rb') as file:
        cert = file.read()
    with open(signed_xml, 'rb') as input_file:
        input_data = input_file.read()
    XMLVerifier().verify(input_data, x509_cert=cert)
I started looking up the documentation I found for SignXML. I read that verify():
class signxml.XMLVerifier
    Create a new XML Signature Verifier object, which can be used to hold
    configuration information and verify multiple pieces of data.

verify(data, require_x509=True, x509_cert=None, cert_subject_name=None,
       ca_pem_file=None, ca_path=None, hmac_key=None, validate_schema=True,
       parser=None, uri_resolver=None, id_attribute=None, expect_references=1)
    Verify the XML signature supplied in the data and return the XML node
    signed by the signature, or raise an exception if the signature is not
    valid. By default, this requires the signature to be generated using a
    valid X.509 certificate.
This is my first time working with this and I'm confused even more now. So this apparently does return something.
What I've attempted
For another method which ends up calling verify_xml I've used @patch and just checked that the method I patched was called with the correct arguments. This also seems like it's not the way to do it, but I didn't know how else to test it.
It feels weird doing something similar with the verify_xml method and just checking that it has been called once.
I've also tried self.assertIsNone..., and that passes, but that seems weird to me and not like the way one does this.
Could someone help me understand why, how, and whether to unit test methods that seem to return nothing?
Thanks
The way to test verify_xml() is to test the exception raised by XMLVerifier().verify() when the input parameters are not valid.
There are a few types of exceptions you can test:
import unittest

from signxml import (XMLSigner, XMLVerifier, InvalidInput, InvalidSignature,
                     InvalidCertificate, InvalidDigest)

class TestVerifyXML(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.signed_xml = from_magic()
        cls.cert_file = from_magic2()
        cls.cert_file_bad = from_magic_bad()

    def test_verify_xml(self):
        # no exception with correct xml
        verify_xml(self.signed_xml, self.cert_file)
        with self.assertRaises(InvalidSignature):
            verify_xml(self.signed_xml, self.cert_file_bad)
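More generally, a function that returns None is tested through its observable effects: the exception it raises, the state it changes, or the calls it makes. Here is a minimal stdlib-only sketch of both patterns; `save_config` and its behaviour are invented purely for illustration:

```python
import unittest
from unittest.mock import patch


def save_config(data: dict, path: str) -> None:
    """Hypothetical None-returning function: validates, then writes."""
    if not data:
        raise ValueError("empty config")
    with open(path, "w") as f:
        f.write(repr(data))


class TestSaveConfig(unittest.TestCase):
    def test_rejects_empty_config(self):
        # Effect 1: the exception raised on bad input.
        with self.assertRaises(ValueError):
            save_config({}, "unused.txt")

    @patch("builtins.open")
    def test_writes_to_the_given_path(self, mock_open):
        # Effect 2: the side effect (here, the file write) actually happens.
        save_config({"a": 1}, "config.txt")
        mock_open.assert_called_once_with("config.txt", "w")
```

The same idea applies to verify_xml: the "no exception on good input, the right exception on bad input" pair is the observable behaviour worth pinning down.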
Related
I've started using Mypy on my code, and I've got the following snippet here:
class BPDataStore():
    def __init__(self, path: pathlib.Path):
        self.path: pathlib.Path = path
        if self.path.exists():
            with self.path.open("r") as F:
                self._bpdata: list[dict] = json.loads(F.read())
        else:
            self._bpdata: list[dict] = {}
Running mypy . from the project root gets me the following errors (there are others, but they're in other files and I want to focus on this one right now)
bp_tracker\back\data_store.py:13: error: Attribute "_bpdata" already defined on line 11
Why doesn't mypy allow "redeclaration" like this? It works in production just fine, and doesn't seem very unreasonable to me. And assuming that there's no way around this, what's the correct way to write this code so that it passes mypy's check?
For now I've just removed the annotation from the 2nd _bpdata attribute to get it to pass.
Currently using:
Python 3.10.4
Mypy 0.982
The Python interpreter does not care about annotations either way (so long as they are syntactically correct), which is why this "works" as you said.
But defining a variable more than once is generally not type safe. In this case it technically doesn't matter, because the definitions are identical, but I think mypy just simply disallows re-defining.
If you want to be explicit, you can either define it on the class in advance or inside that method beforehand. In both cases you just don't assign a value to it:
class BPDataStore:
    _bpdata: list[dict]  # here
    ...

    def __init__(self, path: pathlib.Path):
        self._bpdata: list[dict]  # or here
        ...
But the problem seems to be that your type annotation doesn't match what you assign it. {} is an empty dict and not a list.
Assuming that was a typo and you are certain that you would always get a list from that json.loads (i.e. the top-level is a JSON array), you could just assign an empty list first and then potentially overwrite it with what you load from the file.
Also, I would suggest including the type arguments for generic types like dict. Here is how I would do it:
import json
from pathlib import Path
from typing import Any

class BPDataStore:
    def __init__(self, path: Path) -> None:
        self.path = path
        self._bpdata: list[dict[str, Any]] = []
        try:
            with self.path.open("r") as f:
                self._bpdata = json.loads(f.read())
        except Exception as e:
            pass  # handle exception `e`...

Notice also that I don't explicitly annotate self.path because it would be inferred by any type checker based on that first assignment via the typed argument path. But at that point it is just a matter of preference.
EDIT: Thanks to @SUTerliakov for pointing out that you should indeed wrap your file opening in a try-block, instead of checking for existence. I edited my code example accordingly. If you are only worried the file may not exist, you should catch FileNotFoundError.
I'm trying to use the ast module in Python to parse input code, but am struggling with a lot of the syntax of how to do so. For instance, I have the following code as a testing environment:
import ast

class NodeVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        for each in node.args:
            print(ast.literal_eval(each))
        self.generic_visit(node)

line = "circuit = QubitCircuit(3, True)"
tree = ast.parse(line)
print("VISITOR")
visitor = NodeVisitor()
visitor.visit(tree)
Output:
VISITOR
3
True
In this instance, and please correct me if I'm wrong, visit_Call will be used if it's a function call? So I can get each argument; however, there's no guarantee it will work like this, as different kinds of arguments can be provided. I understand that node.args is providing my arguments, but I'm not sure how to do things with them.
I guess what I'm asking is how do I check what the arguments are and do different things with them? I'd like to check, perhaps, that the first argument is an Int, and if so, run processInt(parameter) as an example.
The value each in your loop in the method will be assigned to the AST node for each of the arguments in each function call you visit. There are lots of different types of AST nodes, so by checking which kind you have, you may be able to learn things about the argument being passed in.
Note however that the AST is about syntax, not values. So if the function call was foo(bar), it's just going to tell you that the argument is a variable named bar, not what the value of that variable is (which it does not know). If the function call was foo(bar(baz)), it's going to show you that the argument is another function call. If you only need to handle calls with literals as their arguments, then you're probably going to be OK; you'll just look for instances of ast.Num and similar.
If you want to check if the first argument is a number and process it if it is, you can do something like:
def visit_Call(self, node):
    first_arg = node.args[0]
    if isinstance(first_arg, ast.Num):
        processInt(first_arg.n)
    else:
        pass  # Do you want to do something on a bad argument? Raise an exception maybe?
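Note that on Python 3.8+ literals are parsed as ast.Constant nodes (ast.Num is deprecated), so a forward-compatible check inspects the node's value instead. A runnable sketch along those lines, with processInt as a stand-in for the question's hypothetical function:

```python
import ast


def processInt(value):
    # Stand-in for the question's processInt.
    print(f"processing int: {value}")


class CallVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        if node.args:
            first_arg = node.args[0]
            # On 3.8+, literals are ast.Constant; check the Python type of
            # the literal's value. (Beware: bool is a subclass of int, so
            # True would also pass an isinstance(..., int) check.)
            if isinstance(first_arg, ast.Constant) and isinstance(first_arg.value, int):
                processInt(first_arg.value)
        self.generic_visit(node)


tree = ast.parse("circuit = QubitCircuit(3, True)")
CallVisitor().visit(tree)  # prints "processing int: 3"
```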
I have a function with several points of failure:
def setup_foo(creds):
    """
    Creates a foo instance with which we can leverage the Foo virtualization
    platform.

    :param creds: A dictionary containing the authorization url, username,
                  password, and version associated with the Foo cluster.
    :type creds: dict
    """
    try:
        foo = Foo(version=creds['VERSION'],
                  username=creds['USERNAME'],
                  password=creds['PASSWORD'],
                  auth_url=creds['AUTH_URL'])
        foo.authenticate()
        return foo
    except (OSError, NotFound, ClientException) as e:
        raise UnreachableEndpoint("Couldn't find auth_url {0}".format(creds['AUTH_URL']))
    except Unauthorized as e:
        raise UnauthorizedUser("Wrong username or password.")
    except UnsupportedVersion as e:
        raise Unsupported("We only support Foo API with major version 2")
and I'd like to test that all the relevant exceptions are caught (albeit not handled well currently).
I have an initial test case that passes:
def test_setup_foo_failing_auth_url_endpoint_does_not_exist(self):
    dummy_creds = {
        'AUTH_URL': 'http://bogus.example.com/v2.0',
        'USERNAME': '',  # intentionally blank.
        'PASSWORD': '',  # intentionally blank.
        'VERSION': 2
    }
    with self.assertRaises(UnreachableEndpoint):
        foo = osu.setup_foo(dummy_creds)
but how can I make my test framework believe that the AUTH_URL is actually a valid/reachable URL?
I've created a mock class for Foo:
class MockFoo(Foo):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
and my thought is mock the call to setup_foo and remove the side effect of raising an UnreachableEndpoint exception. I know how to add side-effects to a Mock with unittest.mock, but how can I remove them?
Assuming your exceptions are being raised from foo.authenticate(), what you want to realize here is that it does not necessarily matter whether the data is in fact really valid in your tests. What you are trying to say really is this:
When this external method raises with something, my code should behave accordingly based on that something.
So, with that in mind, what you want to do is have different test methods where you pass what should be valid data, and have your code react accordingly. The data itself does not matter, but it provides a documented way of showing how the code should behave with data that is passed in that way.
Ultimately, you should not care how the nova client handles the data you give it (nova client is tested, and you should not care about it). What you care about is what it spits back at you and how you want to handle it, regardless of what you gave it.
In other words, for the sake of your tests, you can actually pass a dummy url as:
"this_is_a_dummy_url_that_works"
For the sake of your tests, you can let that pass, because in your mock, you will raise accordingly.
For example. What you should be doing here is actually mocking out Client from novaclient. With that mock in hand, you can now manipulate whatever call within novaclient so you can properly test your code.
This actually brings us to the root of your problem. Your first exception is catching the following:
except (OSError, NotFound, ClientException)
The problem here is that you are now catching ClientException. Almost every exception in novaclient inherits from ClientException, so no matter what you try to test beyond that exception line, you will never reach those exceptions. You have two options here: catch ClientException and just raise a custom exception, or remove ClientException and be more explicit (like you already are).
So, let us go with removing ClientException and set up our example accordingly.
So, in your real code, you should be now setting your first exception line as:
except (OSError, NotFound) as e:
Furthermore, the next problem you have is that you are not mocking properly. You are supposed to mock with respect to where you are testing. So, if your setup_nova method is in a module called your_nova_module. It is with respect to that, that you are supposed to mock. The example below illustrates all this.
@patch("your_nova_module.Client", return_value=Mock())
def test_setup_nova_failing_unauthorized_user(self, mock_client):
    dummy_creds = {
        'AUTH_URL': 'this_url_is_valid',
        'USERNAME': 'my_bad_user. this should fail',
        'PASSWORD': 'bad_pass_but_it_does_not_matter_what_this_is',
        'VERSION': '2.1',
        'PROJECT_ID': 'does_not_matter'
    }
    mock_nova_client = mock_client.return_value
    mock_nova_client.authenticate.side_effect = Unauthorized(401)
    with self.assertRaises(UnauthorizedUser):
        setup_nova(dummy_creds)
So, the main idea with the example above, is that it does not matter what data you are passing. What really matters is that you are wanting to know how your code will react when an external method raises.
So, our goal here is to actually raise something that will get your second exception handler to be tested: Unauthorized
This code was tested against the code you posted in your question. The only modifications were made were with module names to reflect my environment.
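On the narrower question of how to remove a side effect from a Mock: per unittest.mock, assigning side_effect = None clears it, after which calls simply return the mock's return_value again. A stdlib-only sketch (Unreachable is a stand-in for the question's UnreachableEndpoint):

```python
from unittest.mock import Mock


class Unreachable(Exception):
    """Stand-in for the question's UnreachableEndpoint."""


client = Mock()
client.authenticate.side_effect = Unreachable("boom")

try:
    client.authenticate()
except Unreachable:
    print("raised as configured")

# Setting side_effect back to None removes it; subsequent calls
# just return the mock's return_value instead of raising.
client.authenticate.side_effect = None
client.authenticate()  # no exception now
print("side effect removed")
```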
If you wish to mock out http servers from bogus urls, I suggest you check out HTTPretty. It mocks out urls at a socket level so it can trick most Python HTTP libraries that it's a valid url.
I suggest the following setup for your unittest:
import unittest

import httpretty
import requests

class FooTest(unittest.TestCase):
    def setUp(self):
        httpretty.register_uri(httpretty.GET, "http://bogus.example.com/v2.0",
                               body='[{"response": "Valid"}]',
                               content_type="application/json")

    @httpretty.activate
    def test_test_case(self):
        resp = requests.get("http://bogus.example.com/v2.0")
        self.assertEqual(resp.status_code, 200)

Note that the mock will only apply to code paths decorated with the httpretty.activate decorator, so it won't leak to other places in your code that you don't want to mock. Hope that makes sense.
I am new to unit testing. I want to write unit tests for a web scraper that I wrote. My scraper collects data from a website stored on local disk, where inputting different dates gives different results.
I have the following function in script.
get_date [returns date mentioned on web page]
get_product_and_cost [returns product mentioned and their cost]
I am not sure what to test in these functions. So far I have written this:
class SimplisticTest(unittest.TestCase):
    def setUp(self):
        data = read_file("path to file")
        self.soup = BeautifulSoup(data, 'html5lib')

    def test_date(self):
        self.assertIsInstance(get_date(self.soup), str)

    def test_date_length(self):
        self.assertEqual(len(get_date(self.soup)), 10)

if __name__ == '__main__':
    unittest.main()
Usually, it is good to test a known output from a known input. In your case you test for the return type, but it would be even better to test if the returned object corresponds to what you would expect from the input, and that's where static test data (a test web page in your case) becomes useful. You can also test for exceptions with self.assertRaises(ExceptionType, method, args). Refer to https://docs.python.org/3.4/library/unittest.html if you haven't already.
Basically you want to test at least one explicit case (like the test page), the exceptions that can be raised like bad argument type (TypeError or ValueError) or a possible None return type depending on your function. Make sure not to test only for the type of the return or the amount of the return but explicitly for the data, such that if a change is made that breaks the feature, it is found (whereas a change could still return 10 elements, yet the elements might contain invalid data). I'd also suggest to have one test method for each method: get_date would have its test method test_get_date.
Keep in mind that what you want to find is if the method does its job, so test for extreme cases (big input data, as much as it can support or at least the method defines it can) and try to create them such that if the method outputs differently from what is expected based on its definition (documentation), the test fails and breaking changes are caught early on.
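To make the "test explicit data, not just types and lengths" point concrete, here is a stdlib-only sketch; the toy get_date, the regex it uses, and the HTML snippet are all invented for the example (the real function would parse the asker's BeautifulSoup tree):

```python
import re
import unittest

# A tiny static test page standing in for the scraper's saved HTML file.
TEST_PAGE = '<html><body><span id="date">2021-03-15</span></body></html>'


def get_date(html: str) -> str:
    """Toy stand-in for the scraper's get_date: extract the date text."""
    match = re.search(r'<span id="date">(\d{4}-\d{2}-\d{2})</span>', html)
    if match is None:
        raise ValueError("no date found on page")
    return match.group(1)


class TestGetDate(unittest.TestCase):
    def test_date_exact_value(self):
        # Assert the exact expected value, not merely the type or length.
        self.assertEqual(get_date(TEST_PAGE), "2021-03-15")

    def test_missing_date_raises(self):
        # Assert the failure mode on malformed input.
        with self.assertRaises(ValueError):
            get_date("<html></html>")
```

A length-10 check would still pass if the function started returning "0000-00-00"; the exact-value assertion would not, which is the point.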
I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, but at the same time if I were to use the if statements the script would have to check through all of them until it hits the correct one. What I am really wondering, which one performs better or is generally better practice to use?
Update: @Brian - Thanks for the great reply. I have a question: what if any of the extract methods require more than one object, e.g.
def handle_extractTag(self, dom, anotherObject):
    # Do something
    ...
How would you make the appropriate changes to the handle method to implemented this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type. Eg
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        ...

    def handle_extractMetaTags(self, dom):
        # do something
        ...

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
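Putting the pieces together, here is a self-contained sketch of the generic dispatcher; the handler bodies and the extra anotherObject argument are made up to mirror the question's hypothetical:

```python
class MyHandler:
    def handle_extractTitle(self, dom):
        return f"title of {dom}"

    def handle_extractTag(self, dom, anotherObject):
        # Extra positional arguments arrive via *args unchanged.
        return f"tag of {dom} with {anotherObject}"

    def handle(self, type, *args, **kwargs):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(*args, **kwargs)


handler = MyHandler()
print(handler.handle('extractTitle', 'dom1'))         # one argument
print(handler.handle('extractTag', 'dom1', 'extra'))  # two arguments
```

Because handle just forwards *args and **kwargs, adding a handler with a new signature needs no change to the dispatch code.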
With your code as written, your functions all get called.
handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}

handlers[type](dom)
Would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless ones discarded. What is usually done is more something like:

switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}
switch_dict[type](dom)

And that way is faster and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, the if-statements have to be evaluated one at a time. Dictionaries tend to be quicker.
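A rough way to check this claim on your own handlers is timeit; this sketch compares an if-chain against a dict lookup using stand-in extract functions (absolute numbers vary by machine, so only the relative ordering matters):

```python
import timeit


def extractTitle(dom):
    return "title"


def extractMetaTags(dom):
    return "meta"


handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags,
}


def dispatch_if(type, dom):
    # Each branch is evaluated in turn until one matches.
    if type == 'extractTitle':
        return extractTitle(dom)
    if type == 'extractMetaTags':
        return extractMetaTags(dom)


def dispatch_dict(type, dom):
    # A single hash lookup, independent of the number of handlers.
    return handlers[type](dom)


# Both dispatchers agree on the result...
assert dispatch_if('extractMetaTags', None) == dispatch_dict('extractMetaTags', None)

# ...so the choice is about clarity first, speed second.
print("if-chain:", timeit.timeit(lambda: dispatch_if('extractMetaTags', None), number=100_000))
print("dict:    ", timeit.timeit(lambda: dispatch_dict('extractMetaTags', None), number=100_000))
```

With only two handlers the gap is tiny; the dict's advantage grows as handlers are added, since the if-chain cost is linear in the number of branches.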
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle(object):
    def process(self, dom):
        return something

class ExtractMetaTags(object):
    def process(self, dom):
        return something

Instead of setting type = "extractTitle", you'd do this:

type = ExtractTitle()  # or ExtractMetaTags() or ExtractWhatever()
type.process(dom)
Then, you wouldn't be building this particular dictionary or if-statement.