I'm writing a Python client for a RESTful API that requires a special header to be passed with each request. The header has the form X-Seq-Count: n, where n is the sequential number of the request: the first request made should carry the header X-Seq-Count: 1, the second one X-Seq-Count: 2, etc.
I'm using the requests library to handle the low-level HTTP calls. What would be the best approach to track the number of requests made and inject the custom header? What I came up with is subclassing requests.Session and overriding the Session.prepare_request method:
class CustomSession(requests.Session):
    def __init__(self):
        super().__init__()
        self.requests_count = 0

    def prepare_request(self, request):
        # increment requests counter
        self.requests_count += 1
        # update the header
        self.headers['X-Seq-Count'] = str(self.requests_count)
        return super().prepare_request(request)
However, I'm not very happy with subclassing Session. Is there a better way? I stumbled upon the event hooks in the docs, but I'm unsure how to use them - looking at the source code, it seems that hooks can only be applied to the response object, not the request object?
As an alternative, you can take advantage of the auth mechanism of requests: an auth callable receives the prepared Request object and can modify it before it is sent:
def auth_provider(req):
    global requests_count
    requests_count += 1
    # header values must be strings
    req.headers['X-Seq-Count'] = str(requests_count)
    print('requests_count:', requests_count)
    return req

requests_count = 0
s = requests.Session()
s.auth = auth_provider
s.get('https://www.example.com')
requests.get('https://www.example.com', auth=auth_provider)
output:
requests_count: 1
requests_count: 2
However, subclassing Session sounds okay to me.
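If the module-level counter feels fragile, the same idea can be wrapped in a small auth object that keeps the counter as instance state (a sketch using requests.auth.AuthBase; the class name is mine):
import requests
from requests.auth import AuthBase

class SeqCountAuth(AuthBase):
    """Stamp an incrementing X-Seq-Count header on every outgoing request."""
    def __init__(self):
        self.requests_count = 0

    def __call__(self, req):
        # called with the PreparedRequest just before it is sent
        self.requests_count += 1
        req.headers['X-Seq-Count'] = str(self.requests_count)
        return req

s = requests.Session()
s.auth = SeqCountAuth()
s.get('https://www.example.com')   # sends X-Seq-Count: 1
s.get('https://www.example.com')   # sends X-Seq-Count: 2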
I've been tasked with updating an existing Python module that talks to an external API in many different places using requests - there are at least 50 call sites that send requests to this API.
There was a change in the API that requires a header to be added to all requests. Before manually adding the header to all 50 call sites, I was wondering if it is possible to define some kind of "middleware" (as, for example, in Django) that could add the header to all requests at once.
Is something like this possible with requests?
You can monkey-patch the requests.request function with a wrapper that updates the dict passed via the headers argument with additional entries.
Below is an example that would force all requests to have a header of User-Agent with the value My Browser:
import requests
import inspect

def override_headers(func, global_headers):
    sig = inspect.signature(func)

    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # requests.request is declared as request(method, url, **kwargs),
        # so the caller's keyword arguments end up in bound.arguments['kwargs']
        extra = bound.arguments.setdefault('kwargs', {})
        extra['headers'] = {**(extra.get('headers') or {}), **global_headers}
        return func(*bound.args, **bound.kwargs)

    return wrapper

requests.request = override_headers(requests.request, {'User-Agent': 'My Browser'})
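A quick way to exercise the patch (example.com is just a placeholder). One caveat, stated as an assumption about how your module calls the API: helpers such as requests.get resolve the request function through requests.api, so if your 50 call sites use those helpers you may need to patch requests.api.request (or requests.Session.request) in the same way:
resp = requests.request('GET', 'https://www.example.com',
                        headers={'Accept': 'text/html'})
# the global header is merged with the per-call headers
print(resp.request.headers.get('User-Agent'))   # expected: 'My Browser'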
I'm using YouTube data API v3.
Is it possible to make a big BatchHttpRequest (e.g., see here) and also to use ETags for local caching at the httplib2 level (e.g., see here)?
ETags work fine for single queries, but I don't understand whether they are also useful for batch requests.
TL;DR:
BatchHttpRequest cannot be used with caching.
Here's the detail:
First, let's see how a BatchHttpRequest is initialized:
from apiclient.discovery import build
from apiclient.http import BatchHttpRequest

def list_animals(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

def list_farmers(request_id, response, exception):
    """Do something with the farmers list response."""
    pass

service = build('farm', 'v2')
batch = service.new_batch_http_request()
batch.add(service.animals().list(), callback=list_animals)
batch.add(service.farmers().list(), callback=list_farmers)
batch.execute(http=http)
Second, let's see how ETags are used:
import httplib2
from google.appengine.api import memcache

http = httplib2.Http(cache=memcache)
Now let's analyze:
Observe the last line of the BatchHttpRequest example: batch.execute(http=http). Checking the source code for execute, it calls _refresh_and_apply_credentials, passing along the http object we give it.
def _refresh_and_apply_credentials(self, request, http):
    """Refresh the credentials and apply to the request.

    Args:
      request: HttpRequest, the request.
      http: httplib2.Http, the global http object for the batch.
    """
    # For the credentials to refresh, but only once per refresh_token
    # If there is no http per the request then refresh the http passed in
    # via execute()
This means the execute call, which takes in http, can be passed the ETag-caching http object you created:
http = httplib2.Http(cache=memcache)
# This would mean we would get the ETags cached http
batch.execute(http=http)
Update 1:
You could also try with a custom cache object:
import httplib2
from googleapiclient.discovery_cache import DISCOVERY_DOC_MAX_AGE
from googleapiclient.discovery_cache.base import Cache
from googleapiclient.discovery_cache.file_cache import Cache as FileCache

custCache = FileCache(max_age=DISCOVERY_DOC_MAX_AGE)
http = httplib2.Http(cache=custCache)

# This would mean we would get the ETags cached http
batch.execute(http=http)
This is just a hunch, based on this comment in the httplib2 source:
"""If 'cache' is a string then it is used as a directory name for
a disk cache. Otherwise it must be an object that supports the
same interface as FileCache."""
Update 2 (conclusion):
After verifying the google-api-python-client source code again, I see that BatchHttpRequest is always sent as a 'POST' request with a content type of multipart/mixed;.. - source code.
This is a clue that BatchHttpRequest is meant for POSTing data that is then processed later on.
Now, keeping that in mind, observe that httplib2's request method calls _updateCache only when the following criteria are met:
the request method is in ["GET", "HEAD"], or response.status == 303, or it is a redirect request
ELSE -- response.status in [200, 203] and method in ["GET", "HEAD"]
OR -- response.status == 304 and method == "GET"
This means, BatchHttpRequest cannot be used with caching.
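So if ETag caching matters more than batching for your read-only queries, one workaround is to issue them individually over a cache-enabled http object. A rough sketch (the API key, video id and cache directory are placeholders):
import httplib2
from googleapiclient.discovery import build

# a string cache argument makes httplib2 use a simple on-disk FileCache
http = httplib2.Http(cache='.httplib2-cache')
service = build('youtube', 'v3', http=http, developerKey='YOUR_API_KEY')

# individual GET requests go through httplib2's cache, so a repeated,
# unchanged query can be answered with a 304 and the cached body;
# the multipart POST of a batch request never reaches _updateCache
response = service.videos().list(part='snippet', id='VIDEO_ID').execute(http=http)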
We have a custom module where we have redefined the open, seek, read and tell functions to read only a part of a file, according to the arguments.
However, this logic overrides the default tell, and python requests tries to calculate the content-length, which involves using tell(); that call redirects to our custom tell function, whose logic is buggy somewhere and returns a wrong value. I tried some changes, and it throws an error.
I found the following in models.py of requests:
def prepare_content_length(self, body):
    if hasattr(body, 'seek') and hasattr(body, 'tell'):
        body.seek(0, 2)
        self.headers['Content-Length'] = builtin_str(body.tell())
        body.seek(0, 0)
    elif body is not None:
        l = super_len(body)
        if l:
            self.headers['Content-Length'] = builtin_str(l)
    elif (self.method not in ('GET', 'HEAD')) and (self.headers.get('Content-Length') is None):
        self.headers['Content-Length'] = '0'
For now, I am not able to figure out where the bug is and am too stressed to investigate it further. Everything else works except the content-length calculation by python requests.
So, I have created my own function for computing the content-length, and I include the value in the request headers. But requests still prepares the content-length and throws an error.
How can I stop requests from preparing the content-length and make it use the value I specified?
Requests lets you modify a request before sending. See Prepared Requests.
For example:
from requests import Request, Session
s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()
# do something with prepped.headers
prepped.headers['Content-Length'] = your_custom_content_length_calculation()
resp = s.send(prepped, ...)
If your session has its own configuration (like cookie persistence or connection-pooling), then you should use s.prepare_request(req) instead of req.prepare().
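For instance, a minimal sketch of that session-aware variant (url, data, headers and your_custom_content_length_calculation are placeholders, as in the snippet above):
from requests import Request, Session

s = Session()
req = Request('POST', url, data=data, headers=headers)

# let the Session merge in its own cookies/headers first, then override the length
prepped = s.prepare_request(req)
prepped.headers['Content-Length'] = str(your_custom_content_length_calculation())
resp = s.send(prepped)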
I am using Python 2.6.5 and I am trying to capture the raw HTTP request sent via HTTP. This works fine, except when I add a proxy handler into the mix, so the situation is as follows:
HTTP and HTTPS requests work fine without the proxy handler: raw HTTP request captured
HTTP requests work fine with proxy handler: proxy ok, raw HTTP request captured
HTTPS requests fail with proxy handler: proxy ok but the raw HTTP request is not captured!
The following questions are close but do not solve my problem:
How do you get default headers in a urllib2 Request? <- My solution is heavily based on this
Python urllib2 > HTTP Proxy > HTTPS request
This sets the proxy for each request <- Did not work, and doing it once at the start via an opener is more elegant and efficient than setting the proxy for each request
This is what I am doing:
import httplib
import urllib2

class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # saving to a global variable for the Requester class to see
        httplib.HTTPConnection.send(self, s)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # saving to a global variable for the Requester class to see
        httplib.HTTPSConnection.send(self, s)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)
Requester class:
global RawRequest
ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'http://127.0.0.1:8080' }
# If ProxyConf = { 'http':'http://127.0.0.1:8080' }, then Raw HTTPS request captured BUT the proxy does not see the HTTPS request!
# Also tried with similar results: ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'https://127.0.0.1:8080' }
ProxyHandler = urllib2.ProxyHandler(ProxyConf)
urllib2.install_opener(urllib2.build_opener(ProxyHandler, MyHTTPHandler, MyHTTPSHandler))
urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
# This is the problem: global RawRequest NOT updated!?
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))
BUT, if I remove the ProxyHandler it works!:
global RawRequest
urllib2.install_opener(urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler))
urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))  # global RawRequest updated
How can I add the ProxyHandler into the mix while keeping access to the RawRequest?
Thank you in advance.
Answering my own question: it seems to be a bug in the underlying libraries; making RawRequest a list solves the problem. The raw HTTP request is the first item. The custom HTTPS connection class is called several times, the last of which is blank. The fact that the custom HTTP class is only called once suggests this is a bug in Python, but the list solution gets around it:
RawRequest = s
just needs to be changed to:
RawRequest.append(s)
with a previous initialisation of RawRequest = [] and retrieval of the raw request via RawRequest[0] (the first element of the list)
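Put together, the HTTPS side looks roughly like this (same Python 2 setup as in the question; only the changed parts are shown):
import httplib
import urllib2

RawRequest = []   # a list instead of a plain string

class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        RawRequest.append(s)           # keep every chunk that gets sent
        httplib.HTTPSConnection.send(self, s)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)

# ... install the opener with the ProxyHandler as before, make the request,
# then read the captured raw request from RawRequest[0]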
I'm writing a pythonic web API wrapper with a class like this:
import httplib2
import urllib

class apiWrapper:
    def __init__(self):
        self.http = httplib2.Http()

    def _http(self, url, method, params):
        '''
        I'm using this wrapper around the http object
        all the time inside the class.
        '''
        body = urllib.urlencode(params)
        # httplib2's signature is request(uri, method="GET", body=None, ...)
        response, content = self.http.request(url, method, body)
as you can see, I'm using the _http() method to simplify the interaction with the httplib2.Http() object. This method is called quite often inside the class, and I'm wondering what's the best way to interact with this object:
create the object in __init__ and then reuse it whenever the _http() method is called (as shown in the code above), or
create the httplib2.Http() object inside the method for every call of the _http() method (as shown in the code sample below):
import httplib2
import urllib

class apiWrapper:
    def _http(self, url, method, params):
        '''I'm using this wrapper around the http object
        all the time inside the class.'''
        # a new Http object is created for every call
        http = httplib2.Http()
        body = urllib.urlencode(params)
        response, content = http.request(url, method, body)
According to the docs, supplying 'connection': 'close' in your headers should close the connection after a response is received:
h = httplib2.Http()
headers = {'connection': 'close'}
resp, content = h.request(url, headers=headers)
You should keep the Http object if you want to reuse connections. It seems httplib2 is capable of reusing connections the way you use it in your first code sample, so this looks like a good approach.
At the same time, from a shallow inspection of the httplib2 code, it seems that httplib2 has no support for cleaning up unused connections, or even for noticing when a server has decided to close a connection it no longer wants. If that is indeed the case, it looks like a bug in httplib2 to me - so I would rather use the standard library (httplib) instead.
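For what it's worth, a minimal sketch of explicit connection reuse with the standard library, in case you go that route (Python 2 httplib; the host and paths are placeholders):
import httplib
import urllib

conn = httplib.HTTPConnection('api.example.com')

params = urllib.urlencode({'q': 'spam'})
headers = {'Content-Type': 'application/x-www-form-urlencoded'}

# the same connection object is reused for consecutive requests,
# assuming the server keeps the connection open
conn.request('POST', '/search', params, headers)
response = conn.getresponse()
body = response.read()   # the body must be fully read before reusing the connection

conn.request('GET', '/status')
print(conn.getresponse().status)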