How do you get default headers in a urllib2 Request? - python

I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests. I just create a dictionary of the headers I want to add, and pass it to the Request initializer.
However, other "standard" HTTP headers get added to the request as well as the custom ones I explicitly add. When I sniff the request using Wireshark, I see headers besides the ones I add myself. My question is: how do I get access to these headers? I want to log every request (including the full set of HTTP headers), and can't figure out how.
Any pointers?
In a nutshell: how do I get all the outgoing headers from an HTTP request created by urllib2?
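For reference, the pattern described above, creating a headers dict and handing it to the Request initializer, looks roughly like this (a minimal sketch with placeholder URL and header values):
import urllib2

headers = {'User-Agent': 'MyClient/1.0', 'X-Custom-Header': 'value'}
req = urllib2.Request('http://www.example.com/', headers=headers)
response = urllib2.urlopen(req)
print req.headers  # shows only the headers added explicitly above, not the ones added at send time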

If you want to see the literal HTTP request that is sent out, and therefore see every last header exactly as it is represented on the wire, then you can tell urllib2 to use your own version of an HTTPHandler that prints out (or saves, or whatever) the outgoing HTTP request.
import httplib, urllib2

class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        print s  # or save them, or whatever!
        httplib.HTTPConnection.send(self, s)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

opener = urllib2.build_opener(MyHTTPHandler)
response = opener.open('http://www.google.com/')
The result of running this code is:
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.google.com
Connection: close
User-Agent: Python-urllib/2.6

The urllib2 library uses OpenerDirector objects to handle the actual opening. Fortunately, the Python library provides sensible defaults so you don't have to build one yourself. It is, however, these OpenerDirector objects that are adding the extra headers.
To see what they are after the request has been sent (so that you can log it, for example):
req = urllib2.Request(url='http://google.com')
response = urllib2.urlopen(req)
print req.unredirected_hdrs
(produces {'Host': 'google.com', 'User-agent': 'Python-urllib/2.5'} etc)
The unredirected_hdrs is where the OpenerDirectors dump their extra headers. Simply looking at req.headers will show only your own headers - the library leaves those unmolested for you.
If you need to see the headers before you send the request, you'll need to subclass the OpenerDirector in order to intercept the transmission.
Hope that helps.
EDIT: I forgot to mention that, once the request has been sent, req.header_items() will give you a list of tuples of ALL the headers, with both your own and the ones added by the OpenerDirector. I should have mentioned this first since it's the most straightforward :-) Sorry.
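For example (a quick sketch of that; the exact header values depend on your Python version):
req = urllib2.Request(url='http://www.google.com')
response = urllib2.urlopen(req)
print req.header_items()
# e.g. [('Host', 'www.google.com'), ('User-agent', 'Python-urllib/2.5'), ('Connection', 'close'), ...]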
EDIT 2: After your question about an example for defining your own handler, here's the sample I came up with. The concern in any monkeying with the request chain is that we need to be sure that the handler is safe for multiple requests, which is why I'm uncomfortable just replacing the definition of putheader on the HTTPConnection class directly.
Sadly, because the internals of HTTPConnection and AbstractHTTPHandler are very much internal, we have to reproduce much of the code from the Python library to inject our custom behaviour. Assuming I've not goofed below and this works as well as it did in my 5 minutes of testing, please be careful to revisit this override if you update your Python version (i.e. from 2.5.x to 2.5.y, or from 2.5 to 2.6, etc.).
I should therefore mention that I am on Python 2.5.1. If you have 2.6 or, particularly, 3.0, you may need to adjust this accordingly.
Please let me know if this doesn't work. I'm having waaaayyyy too much fun with this question:
import urllib2
import httplib
import socket

class CustomHTTPConnection(httplib.HTTPConnection):
    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)
        self.stored_headers = []

    def putheader(self, header, value):
        self.stored_headers.append((header, value))
        httplib.HTTPConnection.putheader(self, header, value)

class HTTPCaptureHeaderHandler(urllib2.AbstractHTTPHandler):
    def http_open(self, req):
        return self.do_open(CustomHTTPConnection, req)

    http_request = urllib2.AbstractHTTPHandler.do_request_

    def do_open(self, http_class, req):
        # All code here lifted directly from the python library
        host = req.get_host()
        if not host:
            raise urllib2.URLError('no host given')

        h = http_class(host)  # will parse host:port
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.headers)
        headers.update(req.unredirected_hdrs)
        headers["Connection"] = "close"
        headers = dict(
            (name.title(), val) for name, val in headers.items())
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err:  # XXX what error?
            raise urllib2.URLError(err)
        r.recv = r.read
        fp = socket._fileobject(r, close=True)

        resp = urllib2.addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        # This is the line we're adding
        req.all_sent_headers = h.stored_headers
        return resp

my_handler = HTTPCaptureHeaderHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(my_handler)
req = urllib2.Request(url='http://www.google.com')
resp = opener.open(req)
print req.all_sent_headers
shows: [('Accept-Encoding', 'identity'), ('Host', 'www.google.com'), ('Connection', 'close'), ('User-Agent', 'Python-urllib/2.5')]

How about something like this:
import urllib2
import httplib

old_putheader = httplib.HTTPConnection.putheader
def putheader(self, header, value):
    print header, value
    old_putheader(self, header, value)
httplib.HTTPConnection.putheader = putheader

urllib2.urlopen('http://www.google.com')

A low-level solution:
import httplib

class HTTPConnection2(httplib.HTTPConnection):
    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)
        self._request_headers = []
        self._request_header = None

    def putheader(self, header, value):
        self._request_headers.append((header, value))
        httplib.HTTPConnection.putheader(self, header, value)

    def send(self, s):
        self._request_header = s
        httplib.HTTPConnection.send(self, s)

    def getresponse(self, *args, **kwargs):
        response = httplib.HTTPConnection.getresponse(self, *args, **kwargs)
        response.request_headers = self._request_headers
        response.request_header = self._request_header
        return response
Example:
conn = HTTPConnection2("www.python.org")
conn.request("GET", "/index.html", headers={
    "User-agent": "test",
    "Referer": "/",
})
response = conn.getresponse()
response.status, response.reason:
200 OK
response.request_headers:
[('Host', 'www.python.org'), ('Accept-Encoding', 'identity'), ('Referer', '/'), ('User-agent', 'test')]
response.request_header:
GET /index.html HTTP/1.1
Host: www.python.org
Accept-Encoding: identity
Referer: /
User-agent: test

Another solution, which uses the idea from "How do you get default headers in a urllib2 Request?" but doesn't copy code from the std-lib:
import httplib
import urllib2

class HTTPConnection2(httplib.HTTPConnection):
    """
    Like httplib.HTTPConnection but stores the request headers.
    Used in HTTPConnection3(), see below.
    """
    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)
        self.request_headers = []
        self.request_header = ""

    def putheader(self, header, value):
        self.request_headers.append((header, value))
        httplib.HTTPConnection.putheader(self, header, value)

    def send(self, s):
        self.request_header = s
        httplib.HTTPConnection.send(self, s)

class HTTPConnection3(object):
    """
    Wrapper around HTTPConnection2
    Used in HTTPHandler2(), see below.
    """
    def __call__(self, *args, **kwargs):
        """
        instance made in urllib2.HTTPHandler.do_open()
        """
        self._conn = HTTPConnection2(*args, **kwargs)
        self.request_headers = self._conn.request_headers
        self.request_header = self._conn.request_header
        return self

    def __getattribute__(self, name):
        """
        Redirect attribute access to the local HTTPConnection() instance.
        """
        if name == "_conn":
            return object.__getattribute__(self, name)
        else:
            return getattr(self._conn, name)

class HTTPHandler2(urllib2.HTTPHandler):
    """
    A HTTPHandler which stores the request headers.
    Uses HTTPConnection3, see above.

    >>> opener = urllib2.build_opener(HTTPHandler2)
    >>> opener.addheaders = [("User-agent", "Python test")]
    >>> response = opener.open('http://www.python.org/')

    Get the request headers as a list built with HTTPConnection.putheader():
    >>> response.request_headers
    [('Accept-Encoding', 'identity'), ('Host', 'www.python.org'), ('Connection', 'close'), ('User-Agent', 'Python test')]

    >>> response.request_header
    'GET / HTTP/1.1\\r\\nAccept-Encoding: identity\\r\\nHost: www.python.org\\r\\nConnection: close\\r\\nUser-Agent: Python test\\r\\n\\r\\n'
    """
    def http_open(self, req):
        conn_instance = HTTPConnection3()
        response = self.do_open(conn_instance, req)
        response.request_headers = conn_instance.request_headers
        response.request_header = conn_instance.request_header
        return response
EDIT: Updated the source.

See urllib2.py: do_request_ (line 1044 (1067)) and urllib2.py: do_open (line 1073).
At line 293, self.addheaders = [('User-agent', client_version)] (only 'User-agent' is added there by default).
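In other words (a quick illustration, not from the source above; the exact default value depends on your Python version):
import urllib2

opener = urllib2.build_opener()
print opener.addheaders   # e.g. [('User-agent', 'Python-urllib/2.5')]
# The other defaults never appear on the Request object itself:
# Connection: close is added in AbstractHTTPHandler.do_open(), while
# Host and Accept-Encoding are added even lower down, inside httplib.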

It sounds to me like you're looking for the headers of the response object, which include Connection: close, etc. These headers live in the object returned by urlopen. Getting at them is easy enough:
from urllib2 import urlopen
req = urlopen("http://www.google.com")
print req.headers.headers
req.headers is an instance of httplib.HTTPMessage.
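For instance (a small illustrative snippet; the exact headers depend on the server):
print req.headers.getheader('content-type')  # a single response header
print req.headers.dict                       # all response headers as a plain dict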

urllib2 sends the default HTTP headers (as specified by the W3C) alongside the ones you specify. You can use a tool like Wireshark if you would like to see them in their entirety.
Edit:
If you would like to log them, you can use WinPcap to capture packets sent by specific applications (in your case, Python). You can also specify the type of packets and many other details.
-John

Related

Create a HTTP object from a string in Python

I have a device which is sending the following HTTP message to my Raspberry Pi:
POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349
xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>
I cannot change anything on the device.
On the Raspberry Pi I'm running a script that listens on a socket and receives the message.
This works so far, and the received message is the one above.
Now I would like to create an HTTP object from this message and then comfortably extract the headers, content and so on.
Similar to the following example:
r = requests.get('https://www.google.com')
r.status_code
However, without "getting" a URL; I just want to read the string I already have.
Pseudo-example:
r = requests.read(hereComesTheString)
r.status_code
I hope the problem became understandable.
Would be glad to get some hints.
Thanks and best regards,
Christoph
You use the status_code property in your example, but what you are receiving is a request, not a response. However, you can still create a simple object for accessing the data in the request.
It is probably easiest to create your own custom class:
import mimetools
from StringIO import StringIO

request = """POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349
xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>"""

class Request:
    def __init__(self, request):
        stream = StringIO(request)
        request = stream.readline()
        words = request.split()
        [self.command, self.path, self.version] = words
        self.headers = mimetools.Message(stream, 0)
        self.content = stream.read()

    def __getitem__(self, key):
        return self.headers.get(key, '')

r = Request(request)
print(r.command)
print(r.path)
print(r.version)
for header in r.headers:
    print(header, r[header])
print(r.content)
This outputs:
POST
/sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData
HTTP/1.1
('host', 'www.automation.siemens.com')
('content-type', 'application/x-www-form-urlencoded')
('content-length', '349')
xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>
If you're using a plain socket server, then you need to implement enough of an HTTP server to split up the request and respond according to the protocol.
It's probably easier just to use an existing HTTP server and app server. Flask is ideal for this:
from flask import Flask
from flask import request

app = Flask(__name__)

@app.route("/sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData", methods=['POST'])
def dataCollector():
    data = request.form['xmlData']
    print(data)
    # parseData. Take a look at ElementTree
    return 'OK'  # a Flask view must return a response

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)
Thanks Alden. Below is your code with a few changes so it works with Python 3.
import email
from io import StringIO

request = """POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349
xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>"""

class Request:
    def __init__(self, request):
        stream = StringIO(request)
        request = stream.readline()
        words = request.split()
        [self.command, self.path, self.version] = words
        # email.message_from_string() replaces Python 2's mimetools.Message:
        # it parses the remaining headers and keeps the body as the payload
        self.headers = email.message_from_string(stream.read())
        self.content = self.headers.get_payload()

    def __getitem__(self, key):
        return self.headers.get(key, '')

r = Request(request)
print(r.command)
print(r.path)
print(r.version)
for header in r.headers:
    print(header, r[header])
print(r.content)

How to deal with 401 (unauthorised) in python requests

What I want to do is GET from a site and if that request returns a 401, then redo my authentication wiggle (which may be out of date) and try again. But I don't want to try a third time, since that would be my authentication wiggle having the wrong credentials. Does anyone have a nice way of doing this that doesn't involve properly ugly code, ideally in python requests library, but I don't mind changing.
It doesn't get any less ugly than this, I think:
import requests
from requests.auth import HTTPBasicAuth

response = requests.get('http://your_url')
if response.status_code == 401:
    response = requests.get('http://your_url', auth=HTTPBasicAuth('user', 'pass'))
if response.status_code != 200:
    pass  # definitely something's wrong
You could have wrapped this in a function and used a decorator to evaluate the response and retry the auth on 401. Then you only need to decorate any function that requires this re-auth logic....
Update:
As requested, a code example. I'm afraid this one is an old piece of code, Python 2 based, but you'll get the idea. This one will retry an HTTP call a number of times, as defined in settings.NUM_PLATFORM_RETRIES, and will call refresh_token on auth failures. You can adjust the use case and result to whatever you need.
You can then use this decorator around methods:
@retry_on_read_error
def some_func():
    do_something()

from functools import wraps
import json
import urllib2

# settings, AccessRefusedException and ApiRequestFailed are project-specific names
def retry_on_read_error(fn):
    """
    Retry Feed reads on failures
    If a token refresh is required it is performed before retry.
    This decorator relies on the model to have a refresh_token method defined, otherwise it will fail
    """
    @wraps(fn)
    def _wrapper(self, *args, **kwargs):
        for i in range(settings.NUM_PLATFORM_RETRIES):
            try:
                res = fn(self, *args, **kwargs)
                try:
                    _res = json.loads(res)
                except ValueError:
                    # not a json response (could be local file read or non json data)
                    return res
                if 'error' in _res and _res['error']['status'] in (401, 400):
                    raise AccessRefusedException(_res['error']['message'])
                return res
            except (urllib2.URLError, IOError, AccessRefusedException) as e:
                if isinstance(e, AccessRefusedException):
                    self.refresh_token()
                continue
        raise ApiRequestFailed(
            "Api failing, after %s retries: %s" % (settings.NUM_PLATFORM_RETRIES, e), args, kwargs
        )
    return _wrapper
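For the simpler requests-based case described at the top of this answer, a bare-bones sketch of the same decorate-and-retry idea might look like this (hypothetical names; the hard-coded credentials stand in for your own re-auth step):
import functools
import requests
from requests.auth import HTTPBasicAuth

def retry_on_401(fn):
    # Re-run the wrapped call exactly once with fresh auth if it came back 401.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        response = fn(*args, **kwargs)
        if response.status_code == 401:
            kwargs['auth'] = HTTPBasicAuth('user', 'pass')  # your "authentication wiggle" goes here
            response = fn(*args, **kwargs)
        return response
    return wrapper

@retry_on_401
def fetch(url, **kwargs):
    return requests.get(url, **kwargs)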
You can use something like this
# 401 retry strategy
import requests
from requests import Request, Session, RequestException

class PreparedRequest:
    """
    Class to make Http request with 401 retry
    """
    failedRequests = []
    defaultBaseUrl = "https://jsonplaceholder.typicode.com"
    MAX_RETRY_COUNT = 0

    def __init__(self, method, endpoint,
                 baseurl=defaultBaseUrl, headers=None, data=None, params=None):
        """
        Constructor for PreparedRequest class
        @param method: Http Request Method
        @param endpoint: endpoint of the request
        @param headers: headers of the request
        @param data: data of request
        @param params: params of the request
        """
        self.method = method
        self.url = baseurl + endpoint
        self.headers = headers
        self.data = data
        self.params = params
        self.response = None

    def send(self):
        """
        To send http request to the server
        @return: response of the request
        """
        req = Request(method=self.method, url=self.url, data=self.data,
                      headers=self.headers, params=self.params)
        session = Session()
        prepared = session.prepare_request(req)
        response = session.send(prepared)
        if response.status_code == 401:  # auth failed: queue for retry after a token refresh
            PreparedRequest.failedRequests.append(self)
            PreparedRequest.refresh_token()
        elif response.status_code == 502:
            raise Exception(response.raise_for_status())
        else:
            self.response = session.send(prepared)

    @staticmethod
    def refresh_token():
        if PreparedRequest.MAX_RETRY_COUNT > 3:
            return
        print("Refreshing the token")
        # Write your refresh token strategy here
        PreparedRequest.MAX_RETRY_COUNT += 1
        total_failed = len(PreparedRequest.failedRequests)
        for i in range(total_failed):
            item = PreparedRequest.failedRequests.pop()
            item.send()

r = PreparedRequest(method="GET", endpoint="/todos/")
r.send()
print(r.response.json())
You need to send the authentication parameters in the header of the request:
import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth("username", "password")
response = requests.get("http://serverIpOrName/html", auth=auth)
if response.status_code == 401:
    print("Authentication required")
if response.status_code == 200:
    print(response.content)

Set useragent on Client __init__ (Python suds)

I would like to know how to set the user agent on all SOAP requests made with suds in Python, including the WSDL GET.
Indeed, with the following code:
Client('http://...')
the WSDL is fetched with the default Python user agent, but the WSDL is only available on the server for a specific user agent.
Thank you
I don't know whether that's the easiest way to do it, but it is certainly possible using httplib2 (this trick also gives you keep-alive connections):
from suds.transport import Transport
import httplib2, StringIO

class Httplib2Response:
    pass

class Httplib2Transport(Transport):
    def __init__(self, **kwargs):
        Transport.__init__(self)
        self.http = httplib2.Http()

    def send(self, request):
        url = request.url
        message = request.message
        headers = request.headers
        headers['User-Agent'] = 'XYZ'
        response = Httplib2Response()
        response.headers, response.message = self.http.request(url,
            "PUT", body=message, headers=headers)
        return response

    def open(self, request):
        response = Httplib2Response()
        request.headers['User-Agent'] = 'XYZ'
        response.headers, response.message = self.http.request(request.url, "GET",
            body=request.message, headers=request.headers)
        return StringIO.StringIO(response.message)
And then you need to pass the transport class to the suds.client:
http = Httplib2Transport()
client = Client(url,transport=http)
You can override the u2opener method of the HttpTransport class to set your own addheaders attribute:
import urllib2
from suds.transport.http import HttpTransport

class HttpTransportCustomUserAgent(HttpTransport):
    def __init__(self, **kwargs):
        self.user_agent = kwargs.get('user_agent', 'Python-urllib/%s' % urllib2.__version__)
        if 'user_agent' in kwargs:
            del(kwargs['user_agent'])
        HttpTransport.__init__(self, **kwargs)

    def u2opener(self):
        """
        Create a urllib opener.
        @return: An opener.
        @rtype: I{OpenerDirector}
        """
        if self.urlopener is None:
            result = urllib2.build_opener(*self.u2handlers())
            result.addheaders = [('User-agent', self.user_agent)]
            return result
        else:
            return self.urlopener
Now you can use this new transporter class for suds.client:
http = HttpTransportCustomUserAgent(user_agent='My custom User Agent')
client = Client(url, transport=http)

Cookiejar use in an opener

I have the following code at the moment:
tw_jar = cookielib.CookieJar()
tw_jar.set_cookie(c1)
tw_jar.set_cookie(c2)
o = urllib2.build_opener( urllib2.HTTPCookieProcessor(tw_jar) )
urllib2.install_opener( o )
Later in my code I no longer want to use any of the cookies (including new cookies created in the meantime).
Can I do a simple tw_jar.clear() or do I need to build and install the opener again to get rid of all cookies used in the requests?
This is how HTTPCookieProcessor is defined in my Python installation:
class HTTPCookieProcessor(BaseHandler):
    def __init__(self, cookiejar=None):
        import cookielib
        if cookiejar is None:
            cookiejar = cookielib.CookieJar()
        self.cookiejar = cookiejar

    def http_request(self, request):
        self.cookiejar.add_cookie_header(request)
        return request

    def http_response(self, request, response):
        self.cookiejar.extract_cookies(response, request)
        return response

    https_request = http_request
    https_response = http_response
As only a reference is saved, you can just manipulate the original tw_jar instance and it will affect all future requests.
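So, for example (a quick sketch using the names from the question):
tw_jar.clear()   # empties the jar that the installed opener still references
response = o.open('http://www.example.com/')   # this request goes out without any cookies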
If you don't want any cookies, I'd recommend to create a new opener. However, if for some reason you want to keep the old one, removing the cookie processor from the list of handlers should work:
o.handlers = [h for h in o.handlers
              if not isinstance(h, urllib2.HTTPCookieProcessor)]

Python httplib2.Http not sending post parameters

I have been trying to make an API request to Twilio using the httplib2 Http class, and no matter how I try to set up the request, it doesn't send my POST data. I know this because I posted to a local URL and the POST arguments are empty. Here is my code:
from httplib2 import Http
from urllib import urlencode
from json import loads

_TWILIO_URL = 'https://api.twilio.com/2010-04-01/Accounts/%s/%s'

class Api(object):
    '''
    A Python interface to the Twilio API
    '''
    def __init__(self, AccountSid, AuthToken, From):
        self.AccountSid = AccountSid
        self.AuthToken = AuthToken
        self.From = From

    def _get_from(self, From):
        """Use the provided From number or the one defined on initialization"""
        if From:
            return From
        else:
            return self.From

    def call(self, To, Url, From=None):
        """Sends a request to Twilio having it call a number; the provided URL will indicate how to handle the call"""
        url = _TWILIO_URL % (self.AccountSid, 'Calls')
        data = dict(From=self._get_from(From), To=To, Url=Url)
        return self.request(url, body=urlencode(data))

    def request(self, url, method='POST', body=None, headers={'content-type': 'text/plain'}):
        """Send the actual request"""
        h = Http()
        h.add_credentials(self.AccountSid, self.AuthToken)
        resp, content = h.request(url, method=method, body=body, headers=headers)
        print content
        if resp['status'] == '200':
            return loads(content)
        else:
            raise TwilioError(resp['status'], content)  # TwilioError is a custom exception defined elsewhere

    def sms(self, To, Body, From=None):
        """Sends a request to Twilio having it send an SMS; the provided body is the message text"""
        url = _TWILIO_URL % (self.AccountSid, 'SMS/Messages')
        data = dict(From=self._get_from(From), To=To, Body=Body)
        return self.request(url, body=urlencode(data))
I can't find anything on Google about troubleshooting this.
Twilio mentions this requirement in their API docs concerning POST requests:
But be sure to set the HTTP Content-Type header to "application/x-www-form-urlencoded" for your requests if you are writing your own client.
It turns out that the 'content-type' has to be set to 'application/x-www-form-urlencoded'. If anyone knows why, please let me know.
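For what it's worth, a minimal sketch of that change applied to the request method from the question (only the default content-type value differs; everything else is as posted):
    def request(self, url, method='POST', body=None,
                headers={'content-type': 'application/x-www-form-urlencoded'}):
        """Send the actual request; the form-urlencoded content type tells the server to parse the body as POST parameters"""
        h = Http()
        h.add_credentials(self.AccountSid, self.AuthToken)
        resp, content = h.request(url, method=method, body=body, headers=headers)
        if resp['status'] == '200':
            return loads(content)
        raise TwilioError(resp['status'], content)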
