Best practice when using the httplib2.Http() object - python

I'm writing a Pythonic web API wrapper with a class like this:
import httplib2
import urllib.parse

class apiWrapper:
    def __init__(self):
        self.http = httplib2.Http()

    def _http(self, url, method, data):
        '''
        I'm using this wrapper around the http object
        all the time inside the class
        '''
        params = urllib.parse.urlencode(data)
        # httplib2's request() takes (uri, method, body, ...) in that order
        response, content = self.http.request(url, method, params)
        return response, content
As you can see, I'm using the _http() method to simplify interaction with the httplib2.Http() object. This method is called quite often inside the class, and I'm wondering what the best way to interact with this object is:
create the object in the __init__ and then reuse it when the _http() method is called (as shown in the code above)
or create the httplib2.Http() object inside the method for every call of the _http() method (as shown in the code sample below)
import httplib2
import urllib.parse

class apiWrapper:
    def __init__(self):
        pass

    def _http(self, url, method, data):
        '''I'm using this wrapper around the http object
        all the time inside the class'''
        # A new Http object (and therefore a new connection) for every call
        http = httplib2.Http()
        params = urllib.parse.urlencode(data)
        response, content = http.request(url, method, params)
        return response, content

According to the docs, supplying 'connection': 'close' in your headers should close the connection after the response is received:
import httplib2
h = httplib2.Http()
headers = {'connection': 'close'}
resp, content = h.request(url, headers=headers)

You should keep the Http object if you reuse connections. It seems httplib2 is capable of reusing connections the way you use it in your first code sample, so this looks like a good approach.
At the same time, from a shallow inspection of the httplib2 code, it seems that httplib2 has no support for cleaning up unused connections, or even for noticing when a server has decided to close a connection it no longer wants. If that is indeed the case, it looks like a bug in httplib2 to me, so I would rather use the standard library (httplib, called http.client in Python 3) instead.
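If you take that route, a connection-reusing version of the wrapper built on http.client might look roughly like the sketch below. It assumes every endpoint lives on a single host; the ApiWrapper class and its host/port/path parameters are illustrative names, not part of the question. http.client reopens the socket on its own when a previous response announced that the server would close the connection, but it does not retry a request that fails because the server silently dropped an idle connection.

```python
import http.client
from urllib.parse import urlencode


class ApiWrapper:
    """Hypothetical stdlib-only variant of the question's wrapper.

    Assumes every endpoint lives on one host, so a single
    http.client.HTTPConnection can be shared by all calls.
    """

    def __init__(self, host, port=80, timeout=10):
        # Created once in __init__ and reused by every _http() call.
        self.conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def _http(self, path, method="GET", params=None):
        # Like the question's code, params are form-encoded into the body.
        body = urlencode(params) if params else None
        headers = {}
        if body is not None:
            headers["Content-Type"] = "application/x-www-form-urlencoded"
        self.conn.request(method, path, body=body, headers=headers)
        response = self.conn.getresponse()
        # Read the whole body: the connection cannot serve the next
        # response until this one has been consumed.
        content = response.read()
        return response.status, content

    def close(self):
        self.conn.close()
```

If the previous response on the connection was not read to the end, getresponse() raises http.client.ResponseNotReady, which is why the body is read immediately. If a server drops an idle keep-alive connection, the next call can raise http.client.RemoteDisconnected; this sketch does not retry, so a caller that needs that robustness would catch it, call close() and try once more.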

Related

Globally modify requests in a central place

I've been tasked with updating an existing Python module that uses requests to send requests to an external API from many different places. In total, this module contains at least 50 calls to that API.
There was a change in the API that requires a header to be added to all requests. Before manually adding the header to all 50 requests, I was wondering if it is possible to define some kind of "middleware" (as for example in Django) that could add the header for all requests at once.
Is something like this possible with requests?
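You can monkey-patch requests.Session.request, which requests.get(), requests.post() and the other module-level helpers all delegate to, with a wrapper that merges additional entries into the headers argument. (Patching requests.request itself is not enough, because requests.get() looks up request inside the requests.api module, and requests.request only accepts headers through **kwargs.)
Below is an example that forces every request to carry a User-Agent header with the value My Browser: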
import functools
import inspect

import requests

def override_headers(func, global_headers):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # 'headers' defaults to None, so start from an empty dict in that case
        headers = dict(bound.arguments['headers'] or {})
        headers.update(global_headers)
        bound.arguments['headers'] = headers
        return func(*bound.args, **bound.kwargs)

    return wrapper

requests.Session.request = override_headers(
    requests.Session.request, {'User-Agent': 'My Browser'})
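If the call sites can be edited (for example with a search-and-replace of requests. by session.), a shared requests.Session is a different technique that avoids monkey-patching altogether: headers set on session.headers are merged into every request the session sends, and a headers= argument passed to an individual call still wins. This is a sketch; the session and api_get names are hypothetical.

```python
import requests

# One shared session for the whole module (hypothetical name).
session = requests.Session()
# Headers stored on the session are merged into every request it sends;
# headers passed to an individual call override these defaults.
session.headers.update({"User-Agent": "My Browser"})


def api_get(url, **kwargs):
    # Hypothetical helper standing in for the module's existing call sites.
    return session.get(url, **kwargs)
```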

verify_oauth2_token uses object as function

I was implementing Google sign-in using the backend authentication guide here:
https://developers.google.com/identity/sign-in/android/backend-auth
It seems a bit outdated, and the strangest thing is this line:
idinfo = id_token.verify_oauth2_token(token, requests.Request(), CLIENT_ID)
and in the implementation you can see that, through nested function calls, the same request object ends up here:
def _fetch_certs(request, certs_url):
    """Fetches certificates.

    Google-style certificate endpoints return JSON in the format of
    ``{'key id': 'x509 certificate'}``.

    Args:
        request (google.auth.transport.Request): The object used to make
            HTTP requests.
        certs_url (str): The certificate endpoint URL.

    Returns:
        Mapping[str, str]: A mapping of public key ID to x.509 certificate
            data.
    """
    response = request(certs_url, method='GET')
request is an object (even the documentation says so), yet the code calls it like a function. The error I get is:
TypeError: 'Request' object is not callable
What should be changed there?
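Most likely you are passing a Request object from the wrong requests library: verify_oauth2_token expects google.auth.transport.requests.Request, not requests.Request from the standalone requests package.
If you need both libraries in the same module, import them under different names:
from google.auth.transport import requests as google_auth_request
from google.oauth2 import id_token
import requests  # the standalone HTTP library, still usable as before

req = google_auth_request.Request()
idinfo = id_token.verify_oauth2_token(token, req, CLIENT_ID)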
See: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.id_token.html
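The google-auth object can be called because google.auth.transport.requests.Request defines __call__, whereas requests.Request from the standalone requests package does not, which is exactly what the TypeError reports. The sketch below reproduces that pattern with a hypothetical FakeTransportRequest class; it performs no network I/O and only shows why request(certs_url, method='GET') works on such an object.

```python
class FakeTransportRequest:
    """Hypothetical stand-in for a transport adapter such as
    google.auth.transport.requests.Request.

    Instances are callable because the class defines __call__.
    """

    def __init__(self):
        self.calls = []

    def __call__(self, url, method="GET", body=None, headers=None, **kwargs):
        # A real adapter would perform the HTTP request here; this sketch
        # only records the call and returns a canned status code.
        self.calls.append((method, url))
        return 200


def fetch_certs(request, certs_url):
    # Same shape as the line in _fetch_certs: the object is called directly.
    return request(certs_url, method="GET")
```

callable(FakeTransportRequest()) is True, while callable(requests.Request()) is False, so passing the latter into code that calls it fails with "'Request' object is not callable".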

Google API client (Python): is it possible to use BatchHttpRequest with ETag caching

I'm using YouTube data API v3.
Is it possible to make a big BatchHttpRequest (e.g., see here) and also to use ETags for local caching at the httplib2 level (e.g., see here)?
ETags work fine for single queries, I don't understand if they are useful also for batch requests.
TL;DR:
BatchHttpRequest cannot be used with caching.
Here is why:
First, let's look at how a BatchHttpRequest is initialized:
import httplib2
from googleapiclient.discovery import build

def list_animals(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

def list_farmers(request_id, response, exception):
    """Do something with the farmers list response."""
    pass

# 'farm' v2 is the illustrative API name used in Google's documentation
service = build('farm', 'v2')
http = httplib2.Http()
batch = service.new_batch_http_request()
batch.add(service.animals().list(), callback=list_animals)
batch.add(service.farmers().list(), callback=list_farmers)
batch.execute(http=http)
Second, let's see how ETags are used:
import httplib2
from google.appengine.api import memcache
http = httplib2.Http(cache=memcache)
Now let's analyze:
Look at the last line of the BatchHttpRequest example, batch.execute(http=http). Checking the source code for execute, it calls _refresh_and_apply_credentials, which applies the http object we pass to it.
def _refresh_and_apply_credentials(self, request, http):
    """Refresh the credentials and apply to the request.

    Args:
        request: HttpRequest, the request.
        http: httplib2.Http, the global http object for the batch.
    """
    # For the credentials to refresh, but only once per refresh_token
    # If there is no http per the request then refresh the http passed in
    # via execute()
This means that execute(), which takes an http argument, can be passed the ETag-caching http object you would have created as:
http = httplib2.Http(cache=memcache)
# This would mean we would get the ETags cached http
batch.execute(http=http)
Update 1:
You could also try a custom cache object:
from googleapiclient.discovery_cache import DISCOVERY_DOC_MAX_AGE
from googleapiclient.discovery_cache.base import Cache
from googleapiclient.discovery_cache.file_cache import Cache as FileCache
custCache = FileCache(max_age=DISCOVERY_DOC_MAX_AGE)
http = httplib2.Http(cache=custCache)
# This would mean we would get the ETags cached http
batch.execute(http=http)
This is just a hunch, based on this comment in httplib2:
"""If 'cache' is a string then it is used as a directory name for
a disk cache. Otherwise it must be an object that supports the
same interface as FileCache."""
Update 2 (conclusion):
After re-checking the google-api-python-client source code, I see that BatchHttpRequest always sends a 'POST' request with a content type of multipart/mixed;.. (see the source code).
This suggests that BatchHttpRequest is meant for POSTing data that is then processed later.
With that in mind, look at when httplib2's request method calls _updateCache; it only does so when the following criteria are met:
The method is in ["GET", "HEAD"], or response.status == 303, or it is a redirect request
ELSE: response.status in [200, 203] and method in ["GET", "HEAD"]
OR: response.status == 304 and method == "GET"
This means that BatchHttpRequest cannot be used with caching.
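To make that reasoning concrete, the last two storage conditions listed above can be paraphrased as a small predicate. This is an illustrative paraphrase of the answer's summary (would_update_cache is a hypothetical name), not httplib2's actual _updateCache logic, and it leaves out the first, redirect-related condition.

```python
# Methods whose responses the answer says httplib2 stores.
CACHEABLE_METHODS = ("GET", "HEAD")


def would_update_cache(method, status):
    """Paraphrase of the last two criteria listed in the answer.

    Hypothetical helper for illustration; not httplib2's own code.
    """
    if status in (200, 203) and method in CACHEABLE_METHODS:
        return True
    if status == 304 and method == "GET":
        return True
    return False


# Per the answer, BatchHttpRequest always sends its batch as a POST.
BATCH_METHOD = "POST"
```

Because every batch goes out as a POST, the predicate is false for it whatever the status code, which is the answer's conclusion.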

Web proxy in python/django?

I need a proxy that acts as an intermediary to fetch images. For example, my server requests domain1.com/?url=domain2.com/image.png, and the domain1.com server responds with the data at domain2.com/image.png.
Essentially I want to pass to the proxy the URL I want fetched, and have the proxy server respond with that resource.
Any suggestions on where to start on this?
I need something very easy to use or implement as I'm very much a beginner at all of this.
Most solutions I have found in Python and/or Django have the proxy act as a "translator", i.e. domain1.com/image.png translates to domain2.com/image.png, which is obviously not the same.
I currently have the following code, but fetching images results in garbled data:
import httplib2
from django.conf.urls.defaults import *
from django.http import HttpResponse

def proxy(request, url):
    conn = httplib2.Http()
    if request.method == "GET":
        url = request.GET['url']
        resp, content = conn.request(url, request.method)
        return HttpResponse(content)
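Old question, but for future Googlers, I think this is what you want:
# proxies the google logo
import urllib.request
from django.http import HttpResponse

def test(request):
    url = "http://www.google.com/logos/classicplus.png"
    req = urllib.request.Request(url)
    response = urllib.request.urlopen(req)
    # Django 1.7+ takes content_type (the old mimetype argument was removed)
    return HttpResponse(response.read(), content_type="image/png")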
A very simple Django proxy view with requests and StreamingHttpResponse:
import requests
from django.http import StreamingHttpResponse

def my_proxy_view(request):
    url = request.GET['url']
    response = requests.get(url, stream=True)
    return StreamingHttpResponse(
        response.raw,
        content_type=response.headers.get('content-type'),
        status=response.status_code,
        reason=response.reason)
The advantage of this approach is that you don't need to load the complete file in memory before streaming the content to the client.
As you can see, it forwards some response headers. Depending on your needs, you may want to forward the request headers as well; for example:
response = requests.get(url, stream=True,
                        headers={'user-agent': request.headers.get('user-agent')})
If you need something more complete than my previous answer, you can use this class:
import requests
from django.http import StreamingHttpResponse

class ProxyHttpResponse(StreamingHttpResponse):
    def __init__(self, url, headers=None, **kwargs):
        upstream = requests.get(url, stream=True, headers=headers)
        kwargs.setdefault('content_type', upstream.headers.get('content-type'))
        kwargs.setdefault('status', upstream.status_code)
        kwargs.setdefault('reason', upstream.reason)
        super().__init__(upstream.raw, **kwargs)
        for name, value in upstream.headers.items():
            self[name] = value
You can use this class like so:
def my_proxy_view(request):
    url = request.GET['url']
    return ProxyHttpResponse(url, headers=request.headers)
The advantage of this version is that you can reuse it in multiple views. Also, it forwards all headers, and you can easily extend it to add or exclude some other headers.
If the file you're fetching and returning is an image, you'll need to set the content type of your HttpResponse object (for example content_type="image/png"; older Django versions called this argument mimetype).
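A framework-independent way to apply that advice is to keep the upstream body as raw bytes and carry the upstream Content-Type along with it, instead of hard-coding image/png. The sketch below uses only urllib.request; fetch_upstream is a hypothetical helper name.

```python
import urllib.request


def fetch_upstream(url, timeout=10):
    """Fetch url and return (body_bytes, content_type, status).

    Hypothetical helper: the bytes are passed through untouched and the
    upstream Content-Type is kept so the client can decode them correctly.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()  # raw bytes, never decoded to str
        content_type = resp.headers.get("Content-Type", "application/octet-stream")
        return body, content_type, resp.status
```

In a Django view, the result would then be returned as HttpResponse(body, content_type=content_type, status=status).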
Use mechanize: it allows you to choose a proxy and act like a browser, making it easy to change the user agent, go back and forth in the history, and handle authentication or cookies.

how to make a delete / put request in python

I can make GET or POST requests using urllib, but how do I make DELETE and PUT requests?
The requests library can handle POST, PUT, DELETE, and all other HTTP methods, and is significantly less scary than urllib, httplib and their variants.
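For comparison, a sketch of PUT and DELETE with requests; put_json and delete_resource are hypothetical helper names, and raise_for_status() turns 4xx/5xx responses into exceptions.

```python
import requests


def put_json(url, payload, timeout=10):
    # requests.put() sends a PUT; json= serialises payload and sets
    # the Content-Type header to application/json.
    resp = requests.put(url, json=payload, timeout=timeout)
    resp.raise_for_status()  # raise for 4xx/5xx instead of failing silently
    return resp


def delete_resource(url, timeout=10):
    # requests.delete() sends a DELETE with no body.
    resp = requests.delete(url, timeout=timeout)
    resp.raise_for_status()
    return resp
```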
You can override get_method with something like this:
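import urllib.request

def _make_request(url, data, method):
    request = urllib.request.Request(url, data=data)
    # Replace the default GET/POST choice with the requested method
    request.get_method = lambda: method
    return urllib.request.urlopen(request)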
Then you pass "DELETE" as method.
This answer covers the details.
PUT requests can be performed with httplib2:
http://code.google.com/p/httplib2
http://twistedmatrix.com/documents/current/web/howto/client.html
If you're looking to work with HTTP in twisted using the client side I'd suggest checking that out. It demonstrates how you can really easily make a request using the agent class.
As far as I know, urllib and urllib2 only support GET and POST requests. You should probably take a look at httplib or httplib2.
The method is set implicitly in the urlopen call: when you provide the data parameter, a POST is used.
urllib.request.urlopen(url, data=None[, timeout])
I don't think it's possible to use a DELETE HTTP method with urllib because of this line in the documentation:
Request.get_method()
Return a string indicating the HTTP request method. This is only meaningful for HTTP requests, and currently always returns 'GET' or 'POST'.
Consider using httplib, httplib2, or Twisted instead for better support of HTTP methods.
The default HTTP methods in urllib library are POST and GET:
def get_method(self):
    """Return a string indicating the HTTP request method."""
    default_method = "POST" if self.data is not None else "GET"
    return getattr(self, 'method', default_method)
But we can override this get_method() function to send a DELETE request:
import urllib.request
req = urllib.request.Request(new_url)
req.get_method = lambda: "DELETE"
response = urllib.request.urlopen(req)
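On Python 3.3 and later, overriding get_method is no longer necessary: urllib.request.Request accepts a method argument directly. A minimal sketch (send is a hypothetical helper name):

```python
import urllib.request


def send(url, method, data=None, headers=None):
    """Send a request with any HTTP method using only the standard library.

    Python 3.3+ lets urllib.request.Request take method= directly,
    so get_method does not need to be overridden.
    """
    req = urllib.request.Request(url, data=data, headers=headers or {}, method=method)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()
```

For example, send(url, "PUT", data=b"x=1") issues a PUT with a form-encoded body, and send(url, "DELETE") issues a bodiless DELETE.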
