Validate SSL certificates with Python - python

I need to write a script that connects to a bunch of sites on our corporate intranet over HTTPS and verifies that their SSL certificates are valid; that they are not expired, that they are issued for the correct address, etc. We use our own internal corporate Certificate Authority for these sites, so we have the public key of the CA to verify the certificates against.
Python by default just accepts and uses SSL certificates when using HTTPS, so even if a certificate is invalid, Python libraries such as urllib2 and Twisted will just happily use the certificate.
How do I verify a certificate in Python?

I have added a distribution to the Python Package Index which makes the match_hostname() function from the Python 3.2 ssl package available on previous versions of Python.
http://pypi.python.org/pypi/backports.ssl_match_hostname/
You can install it with:
pip install backports.ssl_match_hostname
Or you can make it a dependency listed in your project's setup.py. Either way, it can be used like this:
from backports.ssl_match_hostname import match_hostname, CertificateError
...
sslsock = ssl.wrap_socket(sock, ssl_version=ssl.PROTOCOL_SSLv3,
cert_reqs=ssl.CERT_REQUIRED, ca_certs=...)
try:
match_hostname(sslsock.getpeercert(), hostname)
except CertificateError, ce:
...

You can use Twisted to verify certificates. The main API is CertificateOptions, which can be provided as the contextFactory argument to various functions such as listenSSL and startTLS.
Unfortunately, neither Python nor Twisted comes with a the pile of CA certificates required to actually do HTTPS validation, nor the HTTPS validation logic. Due to a limitation in PyOpenSSL, you can't do it completely correctly just yet, but thanks to the fact that almost all certificates include a subject commonName, you can get close enough.
Here is a naive sample implementation of a verifying Twisted HTTPS client which ignores wildcards and subjectAltName extensions, and uses the certificate-authority certificates present in the 'ca-certificates' package in most Ubuntu distributions. Try it with your favorite valid and invalid certificate sites :).
import os
import glob
from OpenSSL.SSL import Context, TLSv1_METHOD, VERIFY_PEER, VERIFY_FAIL_IF_NO_PEER_CERT, OP_NO_SSLv2
from OpenSSL.crypto import load_certificate, FILETYPE_PEM
from twisted.python.urlpath import URLPath
from twisted.internet.ssl import ContextFactory
from twisted.internet import reactor
from twisted.web.client import getPage
certificateAuthorityMap = {}
for certFileName in glob.glob("/etc/ssl/certs/*.pem"):
# There might be some dead symlinks in there, so let's make sure it's real.
if os.path.exists(certFileName):
data = open(certFileName).read()
x509 = load_certificate(FILETYPE_PEM, data)
digest = x509.digest('sha1')
# Now, de-duplicate in case the same cert has multiple names.
certificateAuthorityMap[digest] = x509
class HTTPSVerifyingContextFactory(ContextFactory):
def __init__(self, hostname):
self.hostname = hostname
isClient = True
def getContext(self):
ctx = Context(TLSv1_METHOD)
store = ctx.get_cert_store()
for value in certificateAuthorityMap.values():
store.add_cert(value)
ctx.set_verify(VERIFY_PEER | VERIFY_FAIL_IF_NO_PEER_CERT, self.verifyHostname)
ctx.set_options(OP_NO_SSLv2)
return ctx
def verifyHostname(self, connection, x509, errno, depth, preverifyOK):
if preverifyOK:
if self.hostname != x509.get_subject().commonName:
return False
return preverifyOK
def secureGet(url):
return getPage(url, HTTPSVerifyingContextFactory(URLPath.fromString(url).netloc))
def done(result):
print 'Done!', len(result)
secureGet("https://google.com/").addCallback(done)
reactor.run()

PycURL does this beautifully.
Below is a short example. It will throw a pycurl.error if something is fishy, where you get a tuple with error code and a human readable message.
import pycurl
curl = pycurl.Curl()
curl.setopt(pycurl.CAINFO, "myFineCA.crt")
curl.setopt(pycurl.SSL_VERIFYPEER, 1)
curl.setopt(pycurl.SSL_VERIFYHOST, 2)
curl.setopt(pycurl.URL, "https://internal.stuff/")
curl.perform()
You will probably want to configure more options, like where to store the results, etc. But no need to clutter the example with non-essentials.
Example of what exceptions might be raised:
(60, 'Peer certificate cannot be authenticated with known CA certificates')
(51, "common name 'CN=something.else.stuff,O=Example Corp,C=SE' does not match 'internal.stuff'")
Some links that I found useful are the libcurl-docs for setopt and getinfo.
http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
http://curl.haxx.se/libcurl/c/curl_easy_getinfo.html

From release version 2.7.9/3.4.3 on, Python by default attempts to perform certificate validation.
This has been proposed in PEP 467, which is worth a read: https://www.python.org/dev/peps/pep-0476/
The changes affect all relevant stdlib modules (urllib/urllib2, http, httplib).
Relevant documentation:
https://docs.python.org/2/library/httplib.html#httplib.HTTPSConnection
This class now performs all the necessary certificate and hostname checks by default. To revert to the previous, unverified, behavior ssl._create_unverified_context() can be passed to the context parameter.
https://docs.python.org/3/library/http.client.html#http.client.HTTPSConnection
Changed in version 3.4.3: This class now performs all the necessary certificate and hostname checks by default. To revert to the previous, unverified, behavior ssl._create_unverified_context() can be passed to the context parameter.
Note that the new built-in verification is based on the system-provided certificate database. Opposed to that, the requests package ships its own certificate bundle. Pros and cons of both approaches are discussed in the Trust database section of PEP 476.

Or simply make your life easier by using the requests library:
import requests
requests.get('https://somesite.com', cert='/path/server.crt', verify=True)
A few more words about its usage.

Here's an example script which demonstrates certificate validation:
import httplib
import re
import socket
import sys
import urllib2
import ssl
class InvalidCertificateException(httplib.HTTPException, urllib2.URLError):
def __init__(self, host, cert, reason):
httplib.HTTPException.__init__(self)
self.host = host
self.cert = cert
self.reason = reason
def __str__(self):
return ('Host %s returned an invalid certificate (%s) %s\n' %
(self.host, self.reason, self.cert))
class CertValidatingHTTPSConnection(httplib.HTTPConnection):
default_port = httplib.HTTPS_PORT
def __init__(self, host, port=None, key_file=None, cert_file=None,
ca_certs=None, strict=None, **kwargs):
httplib.HTTPConnection.__init__(self, host, port, strict, **kwargs)
self.key_file = key_file
self.cert_file = cert_file
self.ca_certs = ca_certs
if self.ca_certs:
self.cert_reqs = ssl.CERT_REQUIRED
else:
self.cert_reqs = ssl.CERT_NONE
def _GetValidHostsForCert(self, cert):
if 'subjectAltName' in cert:
return [x[1] for x in cert['subjectAltName']
if x[0].lower() == 'dns']
else:
return [x[0][1] for x in cert['subject']
if x[0][0].lower() == 'commonname']
def _ValidateCertificateHostname(self, cert, hostname):
hosts = self._GetValidHostsForCert(cert)
for host in hosts:
host_re = host.replace('.', '\.').replace('*', '[^.]*')
if re.search('^%s$' % (host_re,), hostname, re.I):
return True
return False
def connect(self):
sock = socket.create_connection((self.host, self.port))
self.sock = ssl.wrap_socket(sock, keyfile=self.key_file,
certfile=self.cert_file,
cert_reqs=self.cert_reqs,
ca_certs=self.ca_certs)
if self.cert_reqs & ssl.CERT_REQUIRED:
cert = self.sock.getpeercert()
hostname = self.host.split(':', 0)[0]
if not self._ValidateCertificateHostname(cert, hostname):
raise InvalidCertificateException(hostname, cert,
'hostname mismatch')
class VerifiedHTTPSHandler(urllib2.HTTPSHandler):
def __init__(self, **kwargs):
urllib2.AbstractHTTPHandler.__init__(self)
self._connection_args = kwargs
def https_open(self, req):
def http_class_wrapper(host, **kwargs):
full_kwargs = dict(self._connection_args)
full_kwargs.update(kwargs)
return CertValidatingHTTPSConnection(host, **full_kwargs)
try:
return self.do_open(http_class_wrapper, req)
except urllib2.URLError, e:
if type(e.reason) == ssl.SSLError and e.reason.args[0] == 1:
raise InvalidCertificateException(req.host, '',
e.reason.args[1])
raise
https_request = urllib2.HTTPSHandler.do_request_
if __name__ == "__main__":
if len(sys.argv) != 3:
print "usage: python %s CA_CERT URL" % sys.argv[0]
exit(2)
handler = VerifiedHTTPSHandler(ca_certs = sys.argv[1])
opener = urllib2.build_opener(handler)
print opener.open(sys.argv[2]).read()

M2Crypto can do the validation. You can also use M2Crypto with Twisted if you like. The Chandler desktop client uses Twisted for networking and M2Crypto for SSL, including certificate validation.
Based on Glyphs comment it seems like M2Crypto does better certificate verification by default than what you can do with pyOpenSSL currently, because M2Crypto checks subjectAltName field too.
I've also blogged on how to get the certificates Mozilla Firefox ships with in Python and usable with Python SSL solutions.

Jython DOES carry out certificate verification by default, so using standard library modules, e.g. httplib.HTTPSConnection, etc, with jython will verify certificates and give exceptions for failures, i.e. mismatched identities, expired certs, etc.
In fact, you have to do some extra work to get jython to behave like cpython, i.e. to get jython to NOT verify certs.
I have written a blog post on how to disable certificate checking on jython, because it can be useful in testing phases, etc.
Installing an all-trusting security provider on java and jython.
http://jython.xhaus.com/installing-an-all-trusting-security-provider-on-java-and-jython/

The following code allows you to benefit from all SSL validation checks (e.g. date validity, CA certificate chain ...) EXCEPT a pluggable verification step e.g. to verify the hostname or do other additional certificate verification steps.
from httplib import HTTPSConnection
import ssl
def create_custom_HTTPSConnection(host):
def verify_cert(cert, host):
# Write your code here
# You can certainly base yourself on ssl.match_hostname
# Raise ssl.CertificateError if verification fails
print 'Host:', host
print 'Peer cert:', cert
class CustomHTTPSConnection(HTTPSConnection, object):
def connect(self):
super(CustomHTTPSConnection, self).connect()
cert = self.sock.getpeercert()
verify_cert(cert, host)
context = ssl.create_default_context()
context.check_hostname = False
return CustomHTTPSConnection(host=host, context=context)
if __name__ == '__main__':
# try expired.badssl.com or self-signed.badssl.com !
conn = create_custom_HTTPSConnection('badssl.com')
conn.request('GET', '/')
conn.getresponse().read()

pyOpenSSL is an interface to the OpenSSL library. It should provide everything you need.

I was having the same problem but wanted to minimize 3rd party dependencies (because this one-off script was to be executed by many users). My solution was to wrap a curl call and make sure that the exit code was 0. Worked like a charm.

Related

Why it is possible to suppress InsecureRequestWarning from within the code with PyCharm and not from a shell

I am using kuberentes-python client to send queries to Kubernetes cluster.
I am using only a bearer token without certificate to send the API requests. Therefore, the connection cannot be verified - insecure connection and I am getting warnings:
/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py:857: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
I know how to suppress these messages and there is a way to do it.
I added in the code the following lines:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
It worked ! I don't get the warnings anymore in PyCharm.
But when I ran the python code from the shell python3 myapp.py, the warning were still exist.
I have a workaround for that: export PYTHONWARNINGS="ignore:Unverified HTTPS request" but it works only if you run it from the shell before running the script and I preferred it to be from the code.
My questions is why the warnings were suppressed with PyCharm but not from the shell ?
This is my code if you want to test it (just change the host and token):
from kubernetes import client
import os
from kubernetes.client import Configuration
SERVICE_TOKEN_FILENAME = "/home/ubuntu/root-token"
class InClusterConfigLoader(object):
def __init__(self, token_filename):
self._token_filename = token_filename
def load_and_set(self):
self._load_config()
self._set_config()
def _load_config(self):
self.host = "https://10.0.62.0:8443"
if not os.path.isfile(self._token_filename):
print("Service token file does not exists.")
with open(self._token_filename) as f:
self.token = f.read().rstrip('\n')
if not self.token:
print("Token file exists but empty.")
def _set_config(self):
configuration = Configuration()
configuration.host = self.host
configuration.ssl_ca_cert = None
configuration.verify_ssl = False
configuration.api_key['authorization'] = "bearer " + self.token
Configuration.set_default(configuration)
def load_incluster_config():
InClusterConfigLoader(token_filename=SERVICE_TOKEN_FILENAME).load_and_set()
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
load_incluster_config()
api = client.CoreV1Api()
service_accounts = api.list_service_account_for_all_namespaces()
print(len(service_accounts.items))

Use TLS and Python for authentication

I want to make a little update script for a software that runs on a Raspberry Pi and works like a local server. That should connect to a master server in the web to get software updates and also to verify the license of the software.
For that I set up two python scripts. I want these to connect via a TLS socket. Then the client checks the server certificate and the server checks if it's one of the authorized clients. I found a solution for this using twisted on this page.
Now there is a problem left. I want to know which client (depending on the certificate) is establishing the connection. Is there a way to do this in Python 3 with twisted?
I'm happy with every answer.
In a word: yes, this is quite possible, and all the necessary stuff is
ported to python 3 - I tested all the following under Python 3.4 on my Mac and it seems to
work fine.
The short answer is
"use twisted.internet.ssl.Certificate.peerFromTransport"
but given that a lot of set-up is required to get to the point where that is
possible, I've constructed a fully working example that you should be able to
try out and build upon.
For posterity, you'll first need to generate a few client certificates all
signed by the same CA. You've probably already done this, but so others can
understand the answer and try it out on their own (and so I could test my
answer myself ;-)), they'll need some code like this:
# newcert.py
from twisted.python.filepath import FilePath
from twisted.internet.ssl import PrivateCertificate, KeyPair, DN
def getCAPrivateCert():
privatePath = FilePath(b"ca-private-cert.pem")
if privatePath.exists():
return PrivateCertificate.loadPEM(privatePath.getContent())
else:
caKey = KeyPair.generate(size=4096)
caCert = caKey.selfSignedCert(1, CN="the-authority")
privatePath.setContent(caCert.dumpPEM())
return caCert
def clientCertFor(name):
signingCert = getCAPrivateCert()
clientKey = KeyPair.generate(size=4096)
csr = clientKey.requestObject(DN(CN=name), "sha1")
clientCert = signingCert.signRequestObject(
csr, serialNumber=1, digestAlgorithm="sha1")
return PrivateCertificate.fromCertificateAndKeyPair(clientCert, clientKey)
if __name__ == '__main__':
import sys
name = sys.argv[1]
pem = clientCertFor(name.encode("utf-8")).dumpPEM()
FilePath(name.encode("utf-8") + b".client.private.pem").setContent(pem)
With this program, you can create a few certificates like so:
$ python newcert.py a
$ python newcert.py b
Now you should have a few files you can use:
$ ls -1 *.pem
a.client.private.pem
b.client.private.pem
ca-private-cert.pem
Then you'll want a client which uses one of these certificates, and sends some
data:
# tlsclient.py
from twisted.python.filepath import FilePath
from twisted.internet.endpoints import SSL4ClientEndpoint
from twisted.internet.ssl import (
PrivateCertificate, Certificate, optionsForClientTLS)
from twisted.internet.defer import Deferred, inlineCallbacks
from twisted.internet.task import react
from twisted.internet.protocol import Protocol, Factory
class SendAnyData(Protocol):
def connectionMade(self):
self.deferred = Deferred()
self.transport.write(b"HELLO\r\n")
def connectionLost(self, reason):
self.deferred.callback(None)
#inlineCallbacks
def main(reactor, name):
pem = FilePath(name.encode("utf-8") + b".client.private.pem").getContent()
caPem = FilePath(b"ca-private-cert.pem").getContent()
clientEndpoint = SSL4ClientEndpoint(
reactor, u"localhost", 4321,
optionsForClientTLS(u"the-authority", Certificate.loadPEM(caPem),
PrivateCertificate.loadPEM(pem)),
)
proto = yield clientEndpoint.connect(Factory.forProtocol(SendAnyData))
yield proto.deferred
import sys
react(main, sys.argv[1:])
And finally, a server which can distinguish between them:
# whichclient.py
from twisted.python.filepath import FilePath
from twisted.internet.endpoints import SSL4ServerEndpoint
from twisted.internet.ssl import PrivateCertificate, Certificate
from twisted.internet.defer import Deferred
from twisted.internet.task import react
from twisted.internet.protocol import Protocol, Factory
class ReportWhichClient(Protocol):
def dataReceived(self, data):
peerCertificate = Certificate.peerFromTransport(self.transport)
print(peerCertificate.getSubject().commonName.decode('utf-8'))
self.transport.loseConnection()
def main(reactor):
pemBytes = FilePath(b"ca-private-cert.pem").getContent()
certificateAuthority = Certificate.loadPEM(pemBytes)
myCertificate = PrivateCertificate.loadPEM(pemBytes)
serverEndpoint = SSL4ServerEndpoint(
reactor, 4321, myCertificate.options(certificateAuthority)
)
serverEndpoint.listen(Factory.forProtocol(ReportWhichClient))
return Deferred()
react(main, [])
For simplicity's sake we'll just re-use the CA's own certificate for the
server, but in a more realistic scenario you'd obviously want a more
appropriate certificate.
You can now run whichclient.py in one window, then python tlsclient.py a;
python tlsclient.py b in another window, and see whichclient.py print out
a and then b respectively, identifying the clients by the commonName
field in their certificate's subject.
The one caveat here is that you might initially want to put that call to
Certificate.peerFromTransport into a connectionMade method; that won't
work.
Twisted does not presently have a callback for "TLS handshake complete";
hopefully it will eventually, but until it does, you have to wait until you've
received some authenticated data from the peer to be sure the handshake has
completed. For almost all applications, this is fine, since by the time you
have received instructions to do anything (download updates, in your case) the
peer must already have sent the certificate.

Python: Access ftp like browsers do, with proxy

I want to access a ftp server, anonymous, just for download. My company have a proxy, and ftp ports (21) are blocked. I can't access the ftp server directly.
What I whant to do is to write some code that behaves exactly the same way browsers do. The idea is that, if I can download the files with my browser, there is way to do it with code.
My code works when I try to access a web site outside the company, but is still not working for ftp servers.
proxy = urllib2.ProxyHandler({'https': 'proxy.mycompanhy.com:8080',
'http': 'proxy.mycompanhy.com:80',
'ftp': 'proxy.mycompanhy.com:21' })
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
urlAddress = 'https://python.org'
# urlAddress = 'ftp://ftp1.cptec.inpe.br'
conn = urllib2.urlopen(urlAddress)
return_str = conn.read()
print return_str
When I try to access python.org, it works fine. If I remove the install_opener part, it does not work anymore, proving that the proxy is required.
When I use the ftp url, it blocks (or timeout if I choose to use these parameters).
I understand that ftp and http are two very different protocols.
What I don't understand is the mechanism that browsers use to access these ftp servers.
I mean, I don't know if there is a layer on server side that interfaces between http and ftp, retriveing a html; or if browser, in some other maner, access the ftp and builds the page.
There also might be a confusion with the ftp domain (or the url) and the connection mode. It seems to me that when urllib2 reads the ftp://... it automatically uses the port 21.
I found a solution using wget. This package handles with proxies, but documentation was very ubscure. You need to setup an environment variable with proxy name.
import wget
import os
import errno
# setup proxy
os.environ["ftp_proxy"] = "proxy.mycompanhy.com"
os.environ["http_proxy"] = "proxy.mycompanhy.com"
os.environ["https_proxy"] = "proxy.mycompanhy.com"
src = "http://domain.gov/data/fileToDownload.txt"
out = "C:\\outFolder\\outFileName.txt" # out is optional
# create output folder if it doesn't exists
outFolder, _ = os.path.split( out )
try:
os.makedirs(outFolder)
except OSError as exc: # Python >2.5
if exc.errno == errno.EEXIST and os.path.isdir(outFolder):
pass
else: raise
# download
filename = wget.download(src, out)

Validating client certificates in PyOpenSSL

I'm writing an app that requires a cert to be installed in the client browser. I've found this in the PyOpenSSL docs for the "Context" object but I can't see anything about how the callback is supposed to validate the cert, only that it should, somehow.
set_verify(mode, callback)
Set the verification flags for this Context object to mode and
specify that callback should be used for verification callbacks.
mode should be one of VERIFY_NONE and VERIFY_PEER. If
VERIFY_PEER is used, mode can be OR:ed with
VERIFY_FAIL_IF_NO_PEER_CERT and VERIFY_CLIENT_ONCE to further
control the behaviour. callback should take five arguments: A
Connection object, an X509 object, and three integer variables,
which are in turn potential error number, error depth and return
code. callback should return true if verification passes and
false otherwise.
I'm telling the Context object where my (self signed) keys are (see below) so I guess I don't understand why that's not enough for the library to check if the cert presented by the client is a valid one. What should one do in this callback function?
class SecureAJAXServer(PlainAJAXServer):
def __init__(self, server_address, HandlerClass):
BaseServer.__init__(self, server_address, HandlerClass)
ctx = SSL.Context(SSL.SSLv23_METHOD)
ctx.use_privatekey_file ('keys/server.key')
ctx.use_certificate_file('keys/server.crt')
ctx.set_session_id("My_experimental_AJAX_Server")
ctx.set_verify( SSL.VERIFY_PEER | SSL.VERIFY_FAIL_IF_NO_PEER_CERT | SSL.VERIFY_CLIENT_ONCE, callback_func )
self.socket = SSL.Connection(ctx, socket.socket(self.address_family, self.socket_type))
self.server_bind()
self.server_activate()
Caveat: Coding for fun here, def not a pro so if my Q reveals my total lameness, naivety and/or fundamental lack of understanding when it comes to SSL please don't be too rough!
Thanks :)
Roger
In the OpenSSL documentation for set_verify(), the key that you care about is the return code:
callback should take five arguments: A Connection object, an X509
object, and three integer variables, which are in turn potential error
number, error depth and return code. callback should return true
if verification passes and false otherwise.
There is a a full working example that shows more or less what you want to do: When are client certificates verified?
Essentially you can ignore the first 4 arguments and just check the value of the return code in the fifth argument like so:
from OpenSSL.SSL import Context, Connection, SSLv23_METHOD
from OpenSSL.SSL import VERIFY_PEER, VERIFY_FAIL_IF_NO_PEER_CERT, VERIFY_CLIENT_ONCE
class SecureAJAXServer(BaseServer):
def verify_callback(connection, x509, errnum, errdepth, ok):
if not ok:
print "Bad Certs"
else:
print "Certs are fine"
return ok
def __init__(self, server_address, HandlerClass):
BaseServer.__init__(self, server_address, HandlerClass)
ctx = Context(SSLv23_METHOD)
ctx.use_privatekey_file ('keys/server.key')
ctx.use_certificate_file('keys/server.crt')
ctx.set_session_id("My_experimental_AJAX_Server")
ctx.set_verify( VERIFY_PEER | VERIFY_FAIL_IF_NO_PEER_CERT | VERIFY_CLIENT_ONCE, verify_callback )
self.socket = Connection(ctx, socket.socket(self.address_family, self.socket_type))
self.server_bind()
self.server_activate()
Note: I made one other change which is from OpenSSL.SSL import ... to simplify your code a bit while I was testing it so you don't have the SSL. prefix in front of every import symbol.

Tell urllib2 to use custom DNS

I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't change my /etc/resolv.conf, however.
One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom url opener. I'd prefer telling urlopen to use a custom nameserver though. Any suggestions?
Looks like name resolution is ultimately handled by socket.create_connection.
-> urllib2.urlopen
-> httplib.HTTPConnection
-> socket.create_connection
Though once the "Host:" header has been set, you can resolve the host and pass on the IP address through down to the opener.
I'd suggest that you subclass httplib.HTTPConnection, and wrap the connect method to modify self.host before passing it to socket.create_connection.
Then subclass HTTPHandler (and HTTPSHandler) to replace the http_open method with one that passes your HTTPConnection instead of httplib's own to do_open.
Like this:
import urllib2
import httplib
import socket
def MyResolver(host):
if host == 'news.bbc.co.uk':
return '66.102.9.104' # Google IP
else:
return host
class MyHTTPConnection(httplib.HTTPConnection):
def connect(self):
self.sock = socket.create_connection((MyResolver(self.host),self.port),self.timeout)
class MyHTTPSConnection(httplib.HTTPSConnection):
def connect(self):
sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
class MyHTTPHandler(urllib2.HTTPHandler):
def http_open(self,req):
return self.do_open(MyHTTPConnection,req)
class MyHTTPSHandler(urllib2.HTTPSHandler):
def https_open(self,req):
return self.do_open(MyHTTPSConnection,req)
opener = urllib2.build_opener(MyHTTPHandler,MyHTTPSHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('http://news.bbc.co.uk')
data = f.read()
from lxml import etree
doc = etree.HTML(data)
>>> print doc.xpath('//title/text()')
['Google']
Obviously there are certificate issues if you use the HTTPS, and you'll need to fill out MyResolver...
Another (dirty) way is monkey-patching socket.getaddrinfo.
For example this code adds a (unlimited) cache for dns lookups.
import socket
prv_getaddrinfo = socket.getaddrinfo
dns_cache = {} # or a weakref.WeakValueDictionary()
def new_getaddrinfo(*args):
try:
return dns_cache[args]
except KeyError:
res = prv_getaddrinfo(*args)
dns_cache[args] = res
return res
socket.getaddrinfo = new_getaddrinfo
You will need to implement your own dns lookup client (or using dnspython as you said). The name lookup procedure in glibc is pretty complex to ensure compatibility with other non-dns name systems. There's for example no way to specify a particular DNS server in the glibc library at all.

Categories

Resources