Best-practice: automated web API testing - python

I've written a program in Python that works with two distinct APIs to get data from two different services (CKAN and MediaWiki).
In particular, there is a class Resource, which requests data from the above-mentioned services and processes it.
At some point I came to the conclusion that my app needs tests.
The problem is that none of the examples I've found on the web and in books deal with such cases.
For example, inside the Resource class I've got a method:
def load_from_ckan(self):
    """
    Get the resource
    specified by self.id
    from config.ckan_api_url
    """
    data = json.dumps({'id': self.id})
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    url = config.ckan_api_url + '/action/resource_show'
    r = requests.post(url, timeout=config.ckan_request_timeout, data=data, headers=headers)
    assert r.ok, r
    resource = json.loads(r.content)
    resource = resource["result"]
    for key in resource:
        setattr(self, key, resource[key])
The load_from_ckan method gets the data about a resource from the CKAN API and assigns it to the object. It is simple, but...
My question is: how do I test methods like this? Or: what should I test here?
I thought about pickling (saving) the results to disk. Then I could load them in the test and compare them with an object initialized by load_from_ckan(). But CKAN is a community-driven platform, so the live data can change and such tests would behave unpredictably.
If there are any books on the philosophy of automated testing (what to test, what not to test, how to make tests meaningful, etc.), please give me a link.

With any testing, the key question really is - what could go wrong?
In your case, it looks like the three risks are:
The web API in question could stop working. But you check for this already, with assert r.ok.
You, or someone else, could make a mistaken change to the code in future (e.g. mistyping a variable) which breaks it.
The API could change, so that it no longer returns the fields or the format you need.
It feels like you could write a fairly simple test for the latter two, depending on what data from this API you actually rely on: for example, if you're expecting the JSON to have a field called "temperature" which is a floating-point Celsius number, you could write a test which calls your function, then checks that self.temperature is an instance of 'float' and is within a sensible range of values (-30 to 50?). That should leave you confident that both the API and your function are working as designed.
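As a rough illustration of that kind of check, here is a minimal sketch. The temperature field and the -30 to 50 range come from the example above; StubResource and check_temperature are hypothetical names, and the stub only stands in for the real Resource so that the snippet runs without network access.

```python
class StubResource:
    """Hypothetical stand-in for Resource; the real class would call the API."""

    def load_from_ckan(self):
        # Pretend the API returned a "temperature" field.
        self.temperature = 21.5


def check_temperature(resource, low=-30.0, high=50.0):
    """Load the resource, then verify the type and range of one field."""
    resource.load_from_ckan()
    assert isinstance(resource.temperature, float), "temperature is not a float"
    assert low <= resource.temperature <= high, "temperature out of range"
    return resource.temperature


value = check_temperature(StubResource())
```

In a real test you would pass an actual Resource instance, so that a failure points either at a change in the API or at a mistake in load_from_ckan.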

Typically, if you want to test against an external service like this, you will need a mock/dummy object to fake the API of the external service. It must be configurable at run time, via the method's arguments, the class's constructor, or some other kind of indirection. Another, more complex option would be to monkey-patch globals during testing, like "import requests; requests.post = fake_post", but that can create more problems.
So for example your method could take an argument like so:
def load_from_ckan(self, post=requests.post):
    # ...
    r = post(url, timeout=config.ckan_request_timeout, data=data,
             headers=headers)
    # ...
Then during testing you would write your own post function that returns the JSON results you'd expect to see coming back from CKAN. For example:
def mock_post(url, timeout=30, data='', headers=None):
    # ... probably check input arguments
    class DummyResponse:
        pass
    r = DummyResponse()
    r.ok = True
    r.content = json.dumps({'result': {'attr1': 1, 'attr2': 2}})
    return r
Constructing the result in your test gives you a lot more flexibility than pickling results and returning them, because you can fabricate error conditions or focus on specific formats your code might not expect but that you know could exist.
Overall, you can see how complicated this could become, so I would only start adding this sort of testing if you are experiencing repeated errors or other difficulties. Otherwise it is just more code you have to maintain.
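To make the idea concrete, here is a self-contained sketch of the injected-post approach covering both a normal response and a fabricated error. The Resource class below is a trimmed-down stand-in for the one in the question, make_post is a hypothetical helper, and the URL is a placeholder; post is passed explicitly so the snippet needs neither requests nor network access.

```python
import json


class DummyResponse:
    """Minimal fake exposing only the attributes load_from_ckan reads."""

    def __init__(self, ok, payload):
        self.ok = ok
        self.content = json.dumps(payload)


def make_post(ok=True, payload=None):
    """Build a fake post() that records its calls and returns a canned response."""
    calls = []

    def fake_post(url, timeout=30, data='', headers=None):
        calls.append({'url': url, 'data': data})
        return DummyResponse(ok, payload if payload is not None else {})

    fake_post.calls = calls
    return fake_post


class Resource:
    """Trimmed-down stand-in for the class in the question."""

    def __init__(self, id):
        self.id = id

    def load_from_ckan(self, post):
        data = json.dumps({'id': self.id})
        # Placeholder URL; the real code builds it from config.ckan_api_url.
        r = post('http://ckan.invalid/api/action/resource_show', timeout=5, data=data)
        assert r.ok, r
        for key, value in json.loads(r.content)['result'].items():
            setattr(self, key, value)


# Normal response: attributes from "result" end up on the object.
good_post = make_post(payload={'result': {'attr1': 1, 'attr2': 2}})
res = Resource('abc')
res.load_from_ckan(post=good_post)

# Fabricated error condition: the fake reports a failed request.
bad_post = make_post(ok=False)
try:
    Resource('abc').load_from_ckan(post=bad_post)
    failed = False
except AssertionError:
    failed = True
```

Because the fake records its calls, the test can also check what was sent (here, the JSON body containing the id), not only what came back.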

At this point, you can test that the response from CKAN is properly parsed. So you can pull the JSON from CKAN and ensure that it's returning data with the attributes you're interested in.

Related

Get the current response headers set in a Tornado request handler

The Tornado RequestHandler class has add_header(), clear_header(), and set_header() methods. Is there a way to just see the headers that are currently set?
My use case is that I am writing some utility methods to automatically set response headers under certain conditions. But I want to add some error checking in order to not add duplicates of a header that I do not want to have duplicated.
I want to write some code that is more or less like this:
class MyHandler(tornado.web.RequestHandler):
    def ensure_json_header(self):
        if not self.has_header_with_key('Content-Type'):
            self.set_header('Content-Type', 'application/json')
    def finish_json(self, data):
        self.ensure_json_header()
        return self.finish(json.dumps(data))
But of course there is no has_header_with_key() method in Tornado. How can I accomplish this?
EDIT: this turned out to be an X-Y question. The real answer was to just use set_header instead of add_header. I am leaving this up for anyone else who might come along with a similar question.
There's no documented api for listing the headers present in a response.
But there is a self._headers private attribute (an instance of tornado.httputil.HTTPHeaders) which is basically a dict of all headers in the response. You can do this to check a header:
if 'Content-Type' in self._headers:
    # do something
As an addendum, if you want to access all headers of a request, you can do self.request.headers.
Edit: I've opened an issue about this on github after seeing your question; let's see what happens.
Tornado will always have the Content-Type header set as it is in the default headers (https://www.tornadoweb.org/en/stable/_modules/tornado/web.html#RequestHandler.clear). So if you want to ensure you have a specific content type set, just call set_header.
If you want to check that the response does not have a header set in your code, you'll have to first reset the default header, which you can do by implementing set_default_headers and calling clear_header("Content-Type") there.
But you could also achieve the same by setting a property on your handler (say override_content_type), setting that in code, and then doing an unconditional set_header before rendering the result.

caching.memoize & response_filter for internal server errors

I am using flask_caching to cache responses of my flask API. I am using the decorator on my routes like this
import random
class Status(Resource):
    @cache.memoize(timeout=60) # cache for a minute
    def post(self):
        return random.randint(0, 5)
which will return the same random number for a minute. However, what if the random function (read: "any functionality inside the route") breaks, and the route returns a 500 internal server error? As far as I know, flask_caching would be caching this, and return the bad response for all further calls within a minute, which is not what I want.
I read into this and found the response_filter parameter, which can be added to the decorator easily, seemingly specifically to prevent this from happening ("Useful to prevent caching of code 500 responses.", from the docs:
https://flask-caching.readthedocs.io/en/latest/api.html?highlight=response_filter#flask_caching.Cache.memoize)
@cache.memoize(timeout=60, response_filter=callable_check_for_500(???))
However, I am unable to find an example of this use case. It says "If the callable returns False, the content will not be cached." - how do I implement this callable to check if the status code is 500? Any links or ideas appreciated
I figured out "a solution", but I'm not entirely happy with it.
Basically, the check_500() function receives the argument resp, but it is not the full response object and unfortunately lacks the status_code attribute I expected.
The status code itself is in the data, so I just look at the last entry of the response. In my case the returned JSON is at [0] and the status code at [-1].
The implementation is currently as follows:
@cache.memoize(timeout=60, response_filter=check_500) # cache for a minute
with the callable check_500 function defined above it:
def check_500(resp):
    if resp[-1] == 500:
        return False
    else:
        return True
This works pretty much like above_c_level suggested in the comment, so thank you very much, however I would advise to look at the last index of the response instead of checking if 500 is in the response data at all. It still seems a bit wonky, if anyone has a more elaborate idea, feel free to post another answer.
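A slightly more defensive variant might look like the sketch below. It assumes, as observed above, that the filter receives the route's return value rather than a full response object, and that an error shows up as a tuple with the status code in second position (Flask's (body, status) and (body, status, headers) conventions). The name not_server_error is hypothetical.

```python
def not_server_error(resp):
    """Return True when the route's return value is safe to cache."""
    # A tuple return value may carry the status code as its second element.
    if isinstance(resp, tuple) and len(resp) >= 2 and isinstance(resp[1], int):
        return resp[1] < 500
    # Plain return values (no explicit status) are treated as cacheable.
    return True
```

It would be passed the same way as check_500, i.e. response_filter=not_server_error. Unlike indexing with [-1], it does not fail on return values that are not sequences, and it also skips other 5xx codes.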

How to get a request that is not the last in HTTPretty?

Using the HTTPretty library for Python, I can create mock HTTP responses for my unit tests. When the code I am testing runs, instead of my request reaching the third party, the request is intercepted and my code receives the response I configured.
I then use last_request() and can check the url my code requested, any parameters, etc.
What I would like is to know how can I access not just the last request but also any other requests my code sent before the last one.
This seems to be possible. In the documentation it uses a list called latest_requests. For example here
But that doesn't seem to work for me. I get: AttributeError: module 'httpretty' has no attribute 'latest_requests'
Here is some code that illustrates what I am trying to do and where I get AttributeError
import httpretty
import requests
httpretty.enable()
httpretty.register_uri(
    method=httpretty.GET,
    uri='http://www.firsturl.com',
    status=200,
    body='First Body'
)
httpretty.enable()
httpretty.register_uri(
    method=httpretty.GET,
    uri='http://www.secondurl.com',
    status=200,
    body='secondBody'
)
firstresponse = requests.get('http://www.firsturl.com')
secondresponse = requests.get('http://www.secondurl.com')
print(httpretty.latest_requests[-1].url)
# clean up
httpretty.disable()
httpretty.reset()
Thanks!!
Unfortunately, after reading the docs and attempting to get your code working, I can only describe the documentation as blatantly incorrect. There appear to be three separate pull requests from several years ago that claim to make httpretty.latest_requests a real attribute, but none of them has been merged, for whatever reason.
With all of that said, I managed to get the list of all previous requests by calling
httpretty.HTTPretty.latest_requests
This returns a list of HTTPrettyRequest objects. Seeing as httpretty.last_request() returns an HTTPrettyRequest object, that attribute is probably what you're looking for.
Unfortunately, .url is not defined on that class (but it is defined on the blank request object which doesn't make any sense). If you want to check that the request URL is what you're expecting, you pretty much have to try reconstructing it yourself:
req = httpretty.HTTPretty.latest_requests[-1]
url = req.headers.get('Host', '') + req.path
If you're passing anything in the query string, you'll have to reconstruct that from req.querystring although that's not ordered so you probably don't want to turn that into a string for matching purposes. Also, if all of your requests are going to the same domain, you can leave off the host part and just compare req.path.
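One way to sidestep the ordering problem is to compare a normalized tuple instead of a URL string. The sketch below is only an illustration: request_key is a hypothetical helper, and the SimpleNamespace object is a stand-in that mimics the headers and path attributes used above, so the snippet runs without HTTPretty installed. It splits any query string off the path itself, so it does not matter whether the recorded path includes one.

```python
from types import SimpleNamespace
from urllib.parse import parse_qsl, urlsplit


def request_key(req):
    """Reduce a recorded request to (host, path, sorted query pairs)."""
    parts = urlsplit(req.path)
    # Sorting makes the comparison independent of parameter order.
    query = tuple(sorted(parse_qsl(parts.query)))
    return req.headers.get('Host', ''), parts.path, query


# Stand-in for a recorded request object (an assumption for illustration).
fake_req = SimpleNamespace(
    headers={'Host': 'www.firsturl.com'},
    path='/search?b=2&a=1',
)
key = request_key(fake_req)
```

In a test you would apply the helper to each entry of httpretty.HTTPretty.latest_requests and compare the resulting tuples with the expected ones.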

Mocking requests.post and requests.json decoder python

I'm creating a test suite for my module that uses the requests library quite a bit. However, I'm trying to mock several different return values for a specific request, and I'm having trouble doing so. Here is my code snippet that doesn't work:
class MyTests(unittest.TestCase):
    @patch('mypackage.mymodule.requests.post')
    def test_change_nested_dict_function(self, mock_post):
        mock_post.return_value.status_code = 200
        mock_post.return_value.json = nested_dictionary
        modified_dict = mymodule.change_nested_dict()
        self.assertEqual(modified_dict['key1']['key2'][0]['key3'], 'replaced_value')
The function I am attempting to mock:
import requests
def change_nested_dict():
    uri = 'http://this_is_the_endpoint/I/am/hitting'
    payload = {'param1': 'foo', 'param2': 'bar'}
    r = requests.post(uri, params=payload)
    # This function checks to make sure the response is giving the
    # correct status code, hence why I need to mock the status code above
    raise_error_if_bad_status_code(r)
    dict_to_be_changed = r.json()
    def _internal_fxn_to_change_nested_value(dict):
        ''' This goes through the dict and finds the correct key to change the value.
        This is the actual function I am trying to test above'''
        return changed_dict
    modified_dict = _internal_fxn_to_change_nested_value(dict_to_be_changed)
    return modified_dict
I know a simple way of doing this would be to not have a nested function, but I am only showing you part of the entire function's code. Trust me, the nested function is necessary and I really do not want to change that part of it.
My issue is, I don't understand how to mock requests.post and then set the return value for both the status code and the internal json decoder. I also can't seem to find a way around this issue since I can't seem to patch the internal function either, which also would solve this problem. Does anyone have any suggestions/ideas? Thanks a bunch.
I bumped here and although I agree that possibly using special purpose libraries is a better solution, I ended up doing the following
from mock import patch, Mock
@patch('requests.post')
def test_something_awesome(mocked_post):
    mocked_post.return_value = Mock(status_code=201, json=lambda: {"data": {"id": "test"}})
This worked for me for both getting the status_code and the json() at the receiver end while doing the unit-test.
Wrote it here thinking that someone may find it helpful.
When you patch requests.post, every attribute of the mock is itself a new MagicMock that needs to be configured. Your code calls r.json(), so it is the return_value of json that has to be set; status_code is read as a plain attribute, so it is assigned directly, i.e.:
mock_post.return_value.status_code = 200
mock_post.return_value.json.return_value = nested_dictionary
You can see this by looking at the type of everything:
print(type(mock_post))
print(type(mock_post.json))
In both cases the type is <class 'unittest.mock.MagicMock'>
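The difference between an attribute that is read and a method that is called can be seen in a small standalone sketch. fetch_payload is a hypothetical caller written for this illustration, the URL is a placeholder, and the mock is passed in directly instead of being installed with patch, so the snippet runs with the standard library only.

```python
from unittest.mock import MagicMock


def fetch_payload(post):
    """Hypothetical caller: posts, checks the status, decodes the JSON."""
    r = post("http://example.invalid/endpoint", params={"param1": "foo"})
    if r.status_code != 200:
        raise RuntimeError("unexpected status %r" % (r.status_code,))
    return r.json()


mock_post = MagicMock()
# status_code is read as a plain attribute, so assign the value directly.
mock_post.return_value.status_code = 200
# json is called as a method, so configure what the call returns.
mock_post.return_value.json.return_value = {"key1": {"key2": [{"key3": "value"}]}}

result = fetch_payload(mock_post)
mock_post.assert_called_once()
```

If json were assigned a dictionary directly, r.json() would try to call a dict and raise a TypeError.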
Probably it is better for you to look at some specialized libraries for requests testing:
responses
requests-mock
requests-testing
They provide a clean way to mock responses in unit tests.
An alternate approach is to just create an actual Response object and then do a configure_mock() on the original mock.
from requests import Response
class MyTests(unittest.TestCase):
    @patch('mypackage.mymodule.requests.post')
    def test_change_nested_dict_function(self, mock_post):
        resp = Response()
        resp.status_code = 200
        # json must be callable, because the code under test calls r.json()
        resp.json = lambda: nested_dictionary
        mock_post.configure_mock(return_value=resp)
        ...

How do I test an API Client with Python?

I'm working on a client library for a popular API. Currently, all of my unit tests of said client are making actual API calls against a test account.
Here's an example:
def test_get_foo_settings(self):
    client = MyCustomClient(token, account)
    results = client.get_foo_settings()
    assert_is(type(results), list)
I'd like to stop making actual API calls against my test account.
How should I tackle this? Should I be using Mock to mock the calls to the client and response?
Also, I'm confused on the philosophy of what to test with this client library. I'm not interested in testing the actual API, but when there are different factors involved like the method being invoked, the permutations of possible return results, etc - I'm not sure what I should test and/or when it is safe to make assumptions (such as a mocked response).
Any direction and/or samples of how to use Mock in my type of scenario would be appreciated.
I would personally do it by first creating a single interface or function call which your library uses to actually contact the service, then write a custom mock for that during tests.
For example, if the service uses HTTP and you're using Requests to contact the service:
class MyClient(…):
    def do_stuff(self):
        result = requests.get(self.service_url + "/stuff")
        return result.json()
I would first write a small wrapper around requests:
class MyClient(…):
    def _do_get(self, suffix):
        return requests.get(self.service_url + "/" + suffix).json()
    def do_stuff(self):
        return self._do_get("stuff")
Then, for tests, I would mock out the relevant functions:
class MyClientWithMocks(MyClient):
    def _do_get(self, suffix):
        self.request_log.append(suffix)
        return self.send_result
And use it in tests like this:
def test_stuff(self):
    client = MyClientWithMocks(send_result="bar")
    assert_equal(client.do_stuff(), "bar")
    assert_contains(client.request_log, "stuff")
Additionally, it would likely be advantageous to write your tests so that you can run them both against your mock and against the real service, so that if things start failing, you can quickly figure out whose fault it is.
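Put together, the pieces above could look roughly like the following runnable sketch. The constructors are assumptions, since the original snippets elide them; urllib stands in for Requests so the example depends only on the standard library, and the default service URL is a placeholder that is never contacted.

```python
import json
from urllib.request import urlopen


class MyClient:
    def __init__(self, service_url):
        self.service_url = service_url

    def _do_get(self, suffix):
        # Single choke point for network access.
        with urlopen(self.service_url + "/" + suffix) as resp:
            return json.load(resp)

    def do_stuff(self):
        return self._do_get("stuff")


class MyClientWithMocks(MyClient):
    def __init__(self, send_result, service_url="http://service.invalid"):
        super().__init__(service_url)
        self.send_result = send_result  # canned value returned by every call
        self.request_log = []           # suffixes the client asked for

    def _do_get(self, suffix):
        # Record the request instead of touching the network.
        self.request_log.append(suffix)
        return self.send_result


client = MyClientWithMocks(send_result="bar")
result = client.do_stuff()
```

Since only _do_get is overridden, the same test body can be pointed at a real MyClient instance when you want to run it against the live service.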
I'm using HTTmock and I'm pretty happy with it : https://github.com/patrys/httmock
