I would like the .get() method in requests to do extra operations besides the GET itself:
print out "hello world" (in reality this will be logging)
wait 5 seconds before issuing the actual GET (in reality this will be a more complex wait-and-retry operation)
Right now my simplistic solution is to use a function which actually calls requests.get():
def multiple_requests(self, url, retries=5, wait=5):
    """
    Retries an URL several times.

    :param url: the url to check
    :param retries: how many times to retry
    :param wait: number of seconds to wait between retries
    :return: the requests response, or None if failed
    """
    for _ in range(retries):
        try:
            r = requests.get(url)
        except Exception as e:
            self.log.error("cannot connect to {url}: {e}, retrying in {wait} seconds".format(url=url, e=e, wait=wait))
        else:
            if r.ok:
                return r
            else:
                self.log.error(
                    "error connecting to {url}, code {e}, retrying in {wait} seconds".format(
                        url=url, e=r.status_code, wait=wait
                    )
                )
        finally:
            time.sleep(wait)
    # give up after several tries
    self.log.error("cannot connect to {url} despite retries, giving up".format(url=url))
    return None
but I have a strong feeling that it would be possible to override the actual .get() method in requests.
I use object programming in a very basic way and that would be an opportunity to actually learn the override part. There are various tutorials on how to override and call the parent class methods (which is exactly what I want to do: be able to finally use the original .get() method)
I therefore tried a basic override:
import requests

class MyRequest(requests.Request):
    def get(self, url, **kwargs):
        print("hello world")
        # calling the parent .get() method to actually GET something
        super(MyRequest, self).get(url, **kwargs)

r = MyRequest.get('http://google.com')
This code fails with
Traceback (most recent call last):
File "C:/Users/yop/dev/infoscreen/testingrequestsclass.py", line 8, in <module>
r = MyRequest.get('http://google.com')
TypeError: get() missing 1 required positional argument: 'url'
To be honest, I am stuck here. The tutorials all start with a definition of the parent class, while here the parent class is hidden from me (though there is documentation).
requests.get is just a function, and you can override it. It is not a method on the requests.Request class:
import requests.api

def my_get(url, **kwargs):
    print('Hello World!')
    kwargs.setdefault('allow_redirects', True)
    return requests.api.request('get', url, **kwargs)

requests.api.get = my_get
This then uses a new session object to handle the request.
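A minimal usage sketch (hedged: requests re-exports get into its top-level namespace at import time, so requests.get may still point at the original function; rebinding that name as well covers both call styles):

import requests

requests.get = my_get  # also rebind the re-exported top-level name
response = requests.get('http://example.com')  # prints 'Hello World!' first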
Instead of replacing requests.get(), I'd provide a subclass of the requests.Session() object, overriding the Session.request() method, then use an instance of that session object:
from requests import Session

class MySession(Session):
    def request(self, method, url, **kwargs):
        print('Hello World!')
        return super().request(method, url, **kwargs)
then use that like this:
with MySession() as session:
    response = session.get(url)
The advantage here is that you can then also make use of the full feature set that session objects offer, and your additional code will also work for POST, PUT, DELETE, HEAD, etc. requests.
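As a further illustration, here is a hedged sketch of how the retry loop from the question could be folded into such a session subclass; the class name, log output and parameters are illustrative, not part of the original answer:

import time
from requests import Session

class RetryingSession(Session):
    def __init__(self, retries=5, wait=5):
        super().__init__()
        self.retries = retries
        self.wait = wait

    def request(self, method, url, **kwargs):
        # retry any verb, reusing the session's connection pool
        for _ in range(self.retries):
            try:
                response = super().request(method, url, **kwargs)
            except Exception as e:
                print('cannot connect to {}: {}, retrying in {} seconds'.format(url, e, self.wait))
            else:
                if response.ok:
                    return response
                print('error {} from {}, retrying in {} seconds'.format(response.status_code, url, self.wait))
            time.sleep(self.wait)
        return None  # give up after several tries

with RetryingSession() as session:
    response = session.get('http://google.com')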
Related
I have a very unreliable API that I request using Python. I have been thinking about using requests_cache and setting expire_after to be 999999999999 like I have seen other people do.
The only problem is that I do not know whether the data will be updated once the API works again, i.e. whether requests_cache will automatically refresh the entry and delete the old one.
I have tried reading the docs but I cannot really see this anywhere.
requests_cache will not update until the expire_after time has passed. In that case it will not detect that your API is back to a working state.
I note that the project has since added an option that I implemented in the past; you can now set the old_data_on_error option when configuring the cache; see the CachedSession documentation:
old_data_on_error – If True it will return expired cached response if update fails.
It would reuse existing cache data in case a backend update fails, rather than delete that data.
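A minimal sketch of using that option (hedged: assumes a requests_cache version that accepts old_data_on_error as a keyword; the cache name and URL are illustrative):

from requests_cache import CachedSession

session = CachedSession('cache_name', expire_after=180, old_data_on_error=True)
# if the backend fails once the entry has expired, the stale response is returned
response = session.get('http://example.com/unreliable-endpoint')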
In the past, I created my own requests_cache session setup (plus small patch) that would reuse cached values beyond expire_after if the backend gave a 500 error or timed out (using short timeouts) to deal with a problematic API layer, rather than rely on expire_after:
import logging
from datetime import (
    datetime,
    timedelta
)

from requests.exceptions import (
    ConnectionError,
    Timeout,
)
from requests_cache.core import (
    dispatch_hook,
    CachedSession,
)


log = logging.getLogger(__name__)
# Stop logging from complaining if no logging has been configured.
log.addHandler(logging.NullHandler())


class FallbackCachedSession(CachedSession):
    """Cached session that'll reuse expired cache data on timeouts

    This allows survival in case the backend is down, living off stale
    data until it comes back.

    """
    def send(self, request, **kwargs):
        # this *bypasses* CachedSession.send; we want to call the method
        # CachedSession.send() would have delegated to!
        session_send = super(CachedSession, self).send

        if (self._is_cache_disabled or
                request.method not in self._cache_allowable_methods):
            response = session_send(request, **kwargs)
            response.from_cache = False
            return response

        cache_key = self.cache.create_key(request)

        def send_request_and_cache_response(stale=None):
            try:
                response = session_send(request, **kwargs)
            except (Timeout, ConnectionError):
                if stale is None:
                    raise
                log.warning('No response received, reusing stale response for '
                            '%s', request.url)
                return stale

            if stale is not None and response.status_code == 500:
                log.warning('Response gave 500 error, reusing stale response '
                            'for %s', request.url)
                return stale

            if response.status_code in self._cache_allowable_codes:
                self.cache.save_response(cache_key, response)
            response.from_cache = False
            return response

        response, timestamp = self.cache.get_response_and_time(cache_key)
        if response is None:
            return send_request_and_cache_response()

        if self._cache_expire_after is not None:
            is_expired = datetime.utcnow() - timestamp > self._cache_expire_after
            if is_expired:
                self.cache.delete(cache_key)
                # try and get a fresh response, but if that fails reuse the
                # stale one
                return send_request_and_cache_response(stale=response)

        # dispatch hook here, because we've removed it before pickling
        response.from_cache = True
        response = dispatch_hook('response', request.hooks, response, **kwargs)
        return response


def basecache_delete(self, key):
    # We don't really delete; we instead set the timestamp to
    # datetime.min. This way we can re-use stale values if the backend
    # fails
    try:
        if key not in self.responses:
            key = self.keys_map[key]
        self.responses[key] = self.responses[key][0], datetime.min
    except KeyError:
        return

from requests_cache.backends.base import BaseCache
BaseCache.delete = basecache_delete
The above subclass of CachedSession bypasses the original send() method to instead go directly to the original requests.Session.send() method, returning an existing cached value even if the expiry time has passed but the backend has failed. Deletion is disabled in favour of setting the timestamp to datetime.min, so we can still reuse that old value if a new request fails.
Use the FallbackCachedSession instead of a regular CachedSession object.
If you wanted to use requests_cache.install_cache(), make sure to pass in FallbackCachedSession to that function in the session_factory keyword argument:
import requests_cache
requests_cache.install_cache(
    'cache_name', backend='some_backend', expire_after=180,
    session_factory=FallbackCachedSession)
My approach is a little more comprehensive than what requests_cache implemented some time after I hacked together the above; my version will fall back to a stale response even if you explicitly marked it as deleted before.
Try something like this:
class UnreliableAPIClient:
    def __init__(self):
        self.some_api_method_cached = {}  # we will store results here

    def some_api_method(self, param1, param2):
        params_hash = "{0}-{1}".format(param1, param2)  # need to identify input
        try:
            result = do_call_some_api_method_with_fail_probability(param1, param2)
            self.some_api_method_cached[params_hash] = result  # save result
        except:
            result = self.some_api_method_cached.get(params_hash)  # resort to cached result
            if result is None:
                raise  # reraise exception if nothing cached
        return result
Of course, you can make a simple decorator out of that; up to you: http://www.artima.com/weblogs/viewpost.jsp?thread=240808
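For illustration, a hedged sketch of such a decorator (the name fallback_to_cache is hypothetical):

import functools

def fallback_to_cache(func):
    cache = {}  # results stored per positional-argument tuple

    @functools.wraps(func)
    def wrapper(*args):
        key = repr(args)
        try:
            cache[key] = func(*args)
        except Exception:
            if key not in cache:
                raise  # nothing cached, re-raise
        return cache[key]
    return wrapper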
I'm not sure if I have used the correct terminology in the question.
Currently, I am trying to make a wrapper/interface around Google's Blogger API (Blog service).
[I know it has been done already, but I am using this as a project to learn OOP/python.]
I have made a method that gets a set of 25 posts from a blog:
def get_posts(self, **kwargs):
    """ Makes an API request. Returns list of posts. """
    api_url = '/blogs/{id}/posts'.format(id=self.id)
    return self._send_request(api_url, kwargs)

def _send_request(self, url, parameters={}):
    """ Sends an HTTP GET request to the Blogger API.
    Returns JSON decoded response as a dict. """
    url = '{base}{url}?'.format(base=self.base, url=url)
    # Requests formats the parameters into the URL for me
    try:
        r = requests.get(url, params=parameters)
    except:
        print "** Could not reach url:\n", url
        return
    api_response = r.text
    return self._jload(api_response)
The problem is, I have to specify the API key every time I call the get_posts function:
someblog = BloggerClient(url='http://someblog.blogger.com', key='0123')
someblog.get_posts(key=self.key)
Every API call requires that the key be sent as a parameter on the URL.
Then, what is the best way to do that?
I'm thinking one possible way (but probably not the best way?) is to add the key to the parameters dictionary in _send_request():
def _send_request(self, url, parameters=None):
    """ Sends an HTTP GET request to the Blogger API.
    Returns JSON decoded response. """
    # Copy the parameters to avoid the mutable-default-argument pitfall:
    parameters = dict(parameters or {})
    # Format the full API URL:
    url = '{base}{url}?'.format(base=self.base, url=url)
    # The api key will always be added:
    parameters['key'] = self.key
    try:
        r = requests.get(url, params=parameters)
    except:
        print "** Could not reach url:\n", url
        return
    api_response = r.text
    return self._jload(api_response)
I can't really get my head around what is the best way (or most pythonic way) to do it.
You could store it in a named constant.
If this code doesn't need to be secure, simply
API_KEY = '1ih3f2ihf2f'
If it's going to be hosted on a server somewhere or needs to be more secure, you could store the value in an environment variable.
In your terminal:
export API_KEY='1ih3f2ihf2f'
then in your python script:
import os
API_KEY = os.environ.get('API_KEY')
The problem is, I have to specify the API key every time I call the get_posts function:
If it really is just this one method, the obvious idea is to write a wrapper:
def get_posts(blog, *args, **kwargs):
    return blog.get_posts(*args, key=blog.key, **kwargs)
Or, better, wrap up the class to do it for you:
class KeyRememberingBloggerClient(BloggerClient):
    def __init__(self, *args, **kwargs):
        self.key = kwargs.pop('key')
        super(KeyRememberingBloggerClient, self).__init__(*args, **kwargs)

    def get_posts(self, *args, **kwargs):
        return super(KeyRememberingBloggerClient, self).get_posts(
            *args, key=self.key, **kwargs)
So now:
someblog = KeyRememberingBloggerClient(url='http://someblog.blogger.com', key='0123')
someblog.get_posts()
Yes, you can override or monkeypatch the _send_request method that all of the other methods use, but if there's only 1 or 2 methods that need to be fixed, why delve into the undocumented internals of the class, and fork the body of one of those methods just so you can change it in a way you clearly weren't expected to, instead of doing it cleanly?
Of course if there are 90 different methods scattered across 4 different classes, you might want to consider building these wrappers programmatically (and/or monkeypatching the classes)… or just patching the one private method, as you're doing. That seems reasonable.
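If you do decide to patch the one private method instead, here is a hedged sketch of that route (the subclass name is hypothetical, and it assumes the client keeps the key on self.key, as the question's constructor suggests):

class KeyedBloggerClient(BloggerClient):
    # Inject the key in one place instead of wrapping every public method.
    def _send_request(self, url, parameters=None):
        parameters = dict(parameters or {})
        parameters['key'] = self.key
        return super(KeyedBloggerClient, self)._send_request(url, parameters)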
So, I've just started using mock with a Django project. I'm trying to mock out part of a view which makes a request to a remote API to confirm a subscription request was genuine (a form of verification as per the spec I'm working to).
What I have resembles:
class SubscriptionView(View):
    def post(self, request, **kwargs):
        remote_url = request.POST.get('remote_url')
        if remote_url:
            response = requests.get(remote_url, params={'verify': 'hello'})
            if response.status_code != 200:
                return HttpResponse('Verification of request failed')
What I now want to do is use mock to mock out the requests.get call to change the response, but I can't work out how to do this with the patch decorator. I thought you'd do something like:
@patch(requests.get)
def test_response_verify(self):
    # make a call to the view using self.app.post (WebTest),
    # requests.get makes a suitable fake response from the mock object
How do I achieve this?
You're almost there. You're just calling it slightly incorrectly.
from mock import call, patch

@patch('my_app.views.requests')
def test_response_verify(self, mock_requests):
    # We set up the mock. This may look like magic, but it works: return_value
    # is a special attribute on a mock; it is what is returned when the mock is
    # called. So this says we want the return value of requests.get to have a
    # status_code attribute of 200.
    mock_requests.get.return_value.status_code = 200

    # Here we make the call to the view
    response = SubscriptionView().post(request, {'remote_url': 'some_url'})

    self.assertEqual(
        mock_requests.get.call_args_list,
        [call('some_url', params={'verify': 'hello'})]
    )
You can also test that the response is the correct type and has the right content.
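For example (a hedged sketch; it assumes the view returns a plain HttpResponse whose body is the failure message shown in the question):

from django.http import HttpResponse

self.assertIsInstance(response, HttpResponse)
self.assertEqual(response.content, b'Verification of request failed')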
It's all in the documentation:
patch(target, new=DEFAULT, spec=None, create=False, spec_set=None, autospec=None, new_callable=None, **kwargs)
target should be a string in the form ‘package.module.ClassName’.
from mock import patch

# or @patch('requests.get')
@patch.object(requests, 'get')
def test_response_verify(self):
    # make a call to the view using self.app.post (WebTest),
    # requests.get makes a suitable fake response from the mock object
I am implementing a SOAP web service using tornado (and the third party tornadows module). One of the operations in my service needs to call another so I have the chain:
External request in (via SOAPUI) to operation A
Internal request (via requests module) in to operation B
Internal response from operation B
External response from operation A
Because it is all running in one service, it is being blocked somewhere. I'm not familiar with tornado's async functionality.
There is only one request handling method (post) because everything comes in on the single URL, and then the specific operation (the method doing the processing) is called based on the SOAPAction request header value. I have decorated my post method with @tornado.web.asynchronous and called self.finish() at the end, but no dice.
Can tornado handle this scenario and if so how can I implement it?
EDIT (added code):
class SoapHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        """ Method post() to process SOAP request and response messages """
        try:
            self._request = self._parseSoap(self.request.body)
            soapaction = self.request.headers['SOAPAction'].replace('"', '')
            self.set_header('Content-Type', 'text/xml')
            for operations in dir(self):
                operation = getattr(self, operations)
                method = ''
                if callable(operation) and hasattr(operation, '_is_operation'):
                    num_methods = self._countOperations()
                    if hasattr(operation, '_operation') and soapaction.endswith(getattr(operation, '_operation')) and num_methods > 1:
                        method = getattr(operation, '_operation')
                        self._response = self._executeOperation(operation, method=method)
                        break
                    elif num_methods == 1:
                        self._response = self._executeOperation(operation, method='')
                        break
            soapmsg = self._response.getSoap().toprettyxml()
            self.write(soapmsg)
            self.finish()
        except Exception as detail:
            # traceback.print_exc(file=sys.stdout)
            wsdl_nameservice = self.request.uri.replace('/', '').replace('?wsdl', '').replace('?WSDL', '')
            fault = soapfault('Error in web service : {fault}'.format(fault=detail), wsdl_nameservice)
            self.write(fault.getSoap().toxml())
            self.finish()
This is the post method from the request handler. It's from the web services module I'm using (so not my code) but I added the async decorator and self.finish(). All it basically does is call the correct operation (as dictated in the SOAPAction of the request).
class CountryService(soaphandler.SoapHandler):
    @webservice(_params=GetCurrencyRequest, _returns=GetCurrencyResponse)
    def get_currency(self, input):
        result = db_query(input.country, 'currency')
        get_currency_response = GetCurrencyResponse()
        get_currency_response.currency = result
        headers = None
        return headers, get_currency_response

    @webservice(_params=GetTempRequest, _returns=GetTempResponse)
    def get_temp(self, input):
        get_temp_response = GetTempResponse()
        curr = self.make_curr_request(input.country)
        get_temp_response.temp = curr
        headers = None
        return headers, get_temp_response

    def make_curr_request(self, country):
        soap_request = """<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/' xmlns:coun='CountryService'>
            <soapenv:Header/>
            <soapenv:Body>
                <coun:GetCurrencyRequestget_currency>
                    <country>{0}</country>
                </coun:GetCurrencyRequestget_currency>
            </soapenv:Body>
        </soapenv:Envelope>""".format(country)
        headers = {'Content-Type': 'text/xml;charset=UTF-8', 'SOAPAction': '"http://localhost:8080/CountryService/get_currency"'}
        r = requests.post('http://localhost:8080/CountryService', data=soap_request, headers=headers)
        try:
            tree = etree.fromstring(r.content)
            currency = tree.xpath('//currency')
            message = currency[0].text
        except:
            message = "Failure"
        return message
These are two of the operations of the web service (get_currency & get_temp). So SOAPUI hits get_temp, which makes a SOAP request to get_currency (via make_curr_request and the requests module). Then the results should just chain back and be sent back to SOAPUI.
The actual operation of the service makes no sense (returning the currency when asked for the temperature), but I'm just trying to get the functionality working, and these are the operations I have.
I don't think that your SOAP module or requests is asynchronous.
I believe adding the @asynchronous decorator is only half the battle. Right now you aren't making any async requests inside of your function (every request is blocking, which ties up the server until your method finishes).
You can switch this up by using tornado's AsyncHTTPClient. This can be used pretty much as an exact replacement for requests. From the documentation example:
class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        http.fetch("http://friendfeed-api.com/v2/feed/bret",
                   callback=self.on_response)

    def on_response(self, response):
        if response.error: raise tornado.web.HTTPError(500)
        json = tornado.escape.json_decode(response.body)
        self.write("Fetched " + str(len(json["entries"])) + " entries "
                   "from the FriendFeed API")
        self.finish()
Their method is decorated with @asynchronous AND they are making async HTTP requests. This is where the flow gets a little strange. When you use the AsyncHTTPClient, it doesn't lock up the event loop (please take it easy if my terminology isn't correct yet; I just started using tornado this week). This allows the server to freely process incoming requests. When your async HTTP request is finished, the callback method will be executed, in this case on_response.
Here you can replace requests with the tornado async HTTP client relatively easily. For your SOAP service, though, things might be more complicated. You could make a local web service around your SOAP client and make async requests to it using the tornado async HTTP client?
This will create some complex callback logic, which can be fixed using the gen decorator.
This issue was fixed yesterday.
Pull request:
https://github.com/rancavil/tornado-webservices/pull/23
Example: here is a simple web service that takes no arguments and returns the version.
Notice you should:
Method declaration: decorate the method with @gen.coroutine
Returning results: use raise gen.Return(data)
Code:
from tornado import gen
from tornadows.soaphandler import SoapHandler
...

class Example(SoapHandler):
    @gen.coroutine
    @webservice(_params=None, _returns=Version)
    def Version(self):
        _version = Version()
        # async stuff here; let's suppose you ask another service or
        # resource for the version details.
        # ...
        # return the result.
        raise gen.Return(_version)
Cheers!
I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:
params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
# check headers, content-length, etc...
# parse the response XML with lxml...
My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).
Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.
And of course, relying on an external source for data in a unit test is a horrible idea.
So how do I write a unit test for this?
urllib2 has functions called build_opener() and install_opener(), which you can use to mock the behaviour of urlopen():
import urllib2
from StringIO import StringIO

def mock_response(req):
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)

my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)

response = urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg
It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to make the function/method which uses it able to accept this mock urlopen somehow, falling back to urllib2.urlopen otherwise.
This is a fair amount of work, but worthwhile. Remember that Python is very friendly to duck typing, so you just need to provide some semblance of the response object's properties to mock it.
For example:
class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}

    def read(self):
        return self.resp_data

    def getcode(self):
        return self.code

    # Define other members and properties you want

def mock_urlopen(request):
    return MockResponse(r'<xml document>')
Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.
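A hedged sketch of that idea under Python 2 (httplib.HTTPMessage subclasses mimetools.Message, which parses headers from a file-like object; double-check against your Python version):

import httplib
from StringIO import StringIO

# Build a case-insensitive header object from raw header text.
raw = StringIO('Content-Type: text/xml; charset=utf-8\r\n\r\n')
headers = httplib.HTTPMessage(raw)
print headers.getheader('content-type')  # case-insensitive lookup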
Build a separate class or module responsible for communicating with your external feeds.
Make this class able to be replaced by a test double. You're using Python, so you're pretty golden there; if you were using C#, I'd suggest either an interface or virtual methods.
In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
Aand... that's it.
You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.
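To make the abstraction concrete, a minimal sketch where every name is hypothetical:

from urllib2 import Request, urlopen

class FeedClient(object):
    """Production class: talks to the real feed with urllib2."""
    def fetch(self, url):
        return urlopen(Request(url))

class FakeFeedClient(object):
    """Test double: returns canned responses, or raises canned errors."""
    def __init__(self, canned):
        self.canned = canned

    def fetch(self, url):
        if isinstance(self.canned, Exception):
            raise self.canned
        return self.canned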
Edit:
Just a note on the difference between my answer and @Crast's answer. Both are essentially correct, but they involve different approaches. In @Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.
Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.
You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.
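For example, a hedged sketch with pymox (from memory of its API; verify names like StubOutWithMock and ReplayAll against the mox documentation):

import mox
import urllib2
from StringIO import StringIO

m = mox.Mox()
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg()).AndReturn(StringIO('<xml document/>'))
m.ReplayAll()

# ... exercise the code under test here ...

m.VerifyAll()
m.UnsetStubs()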
I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.
I can elaborate if you need more info.
Here's some code:
import threading, SocketServer, time

# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400)  # token receive
        senddata = file(self.server.datafile).read()  # read data from unit test file
        self.request.send(senddata)
        time.sleep(0.1)  # make sure it finishes receiving request before closing
        self.request.close()

def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    # pass the bound method itself (don't call it here) and start the thread,
    # so a single request is served in the background
    http_server_thread = threading.Thread(target=server.handle_request)
    http_server_thread.start()
To run your unit test, call serve_data(), then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
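For example, a hedged usage sketch (the data file name is illustrative; note the file must contain a complete raw HTTP response, status line and headers included, because the handler sends it verbatim):

import urllib2

serve_data('canned_response.txt')  # starts the one-shot server thread
response = urllib2.urlopen('http://localhost:12345/anythingyouwant')
print response.read()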
Why not just mock a website that returns the response you expect? Then start the server in a thread in setup and kill it in the teardown. I ended up doing this for testing code that would send email, by mocking an SMTP server, and it works great. Surely something even more trivial could be done for HTTP...
from smtpd import SMTPServer
from threading import Thread
from time import sleep
import asyncore

SMTP_PORT = 6544

class MockSMTPServer(SMTPServer):
    def __init__(self, localaddr, remoteaddr, cb=None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)

    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        self.close()

def start_smtp(cb, port=SMTP_PORT):
    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()
    return Thread(None, smtp_thread)

def test_stuff():
    # .......snip noise
    # use a mutable container so the callback can report back to this scope
    email_result = []

    def email_back(*args):
        email_result.append(args)

    t = start_smtp(email_back)
    t.start()
    sleep(1)
    res.form["email"] = self.admin_email
    res = res.form.submit()
    assert res.status_int == 302, "should've redirected"
    sleep(1)
    assert email_result, "didn't get an email"
Trying to improve a bit on @john-la-rooy's answer, I've made a small class allowing simple mocking for unit tests.
It should work with Python 2 and 3.
try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib
from io import BytesIO

class MockHTTPHandler(urllib.HTTPHandler):

    def mock_response(self, req):
        url = req.get_full_url()
        print("incoming request:", url)

        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}
            resp = urllib.addinfourl(BytesIO(resdata), headers, url, 200)
            resp.msg = "OK"
            return resp

        raise RuntimeError('Unhandled URL', url)

    http_open = mock_response

    @classmethod
    def install(cls):
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous

    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)
Used like this:
class TestOther(unittest.TestCase):

    def setUp(self):
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)