Python urllib2 with keep-alive

How can I make a "keep alive" HTTP request using Python's urllib2?

Use the urlgrabber library. It includes an HTTP handler for urllib2 that supports HTTP/1.1 and keep-alive:
>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>>
>>> fo = urllib2.urlopen('http://www.python.org')
Note: you should use urlgrabber version 3.9.0 or earlier; the keepalive module was removed in version 3.9.1.
There is also a port of the keepalive module to Python 3.

Try urllib3, which has the following features:
Re-use of the same socket connection for multiple requests, via HTTPConnectionPool and HTTPSConnectionPool, with optional client-side certificate verification (see the sketch after the lists below).
File posting (encode_multipart_formdata).
Built-in redirection and retries (optional).
Support for gzip and deflate decoding.
Thread-safe and sanity-safe.
A small and easy-to-understand codebase, perfect for extending and building upon. For a more comprehensive solution, have a look at Requests.
Or a much more comprehensive solution: Requests, which supports keep-alive from version 0.8.0 (by using urllib3 internally) and has the following features:
Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE requests.
Gevent support for asynchronous requests.
Sessions with cookie persistence.
Basic, Digest, and custom authentication support.
Automatic form-encoding of dictionaries.
A simple dictionary interface for request/response cookies.
Multipart file uploads.
Automatic decoding of Unicode, gzip, and deflate responses.
Full support for Unicode URLs and domain names.
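As mentioned in the first urllib3 feature above, here is a minimal sketch of socket re-use with HTTPConnectionPool (the target host is only an example):
import urllib3

# One pool per host; connections are kept alive and re-used
# across requests automatically.
pool = urllib3.HTTPConnectionPool('www.python.org', maxsize=1)

r1 = pool.request('GET', '/')        # opens the connection
r2 = pool.request('GET', '/about/')  # re-uses the same socket
print(r1.status, r2.status)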

Or check out httplib's HTTPConnection.
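A rough sketch of doing keep-alive by hand with HTTPConnection (this assumes the server doesn't send Connection: close, and error handling is omitted):
import httplib  # http.client on Python 3

conn = httplib.HTTPConnection('www.python.org')

# Both requests travel over the same TCP connection as long as
# the server keeps it open.
conn.request('GET', '/')
r1 = conn.getresponse()
body1 = r1.read()  # read fully before issuing the next request

conn.request('GET', '/about/')
r2 = conn.getresponse()
body2 = r2.read()

conn.close()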

Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 by the following change, made after urlgrabber was changed to depend on pycurl (which supports keep-alive):
http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94
However, you can still get the last revision of keepalive.py here:
http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob_plain;f=urlgrabber/keepalive.py;hb=a531cb19eb162ad7e0b62039d19259341f37f3a6

Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.
In keepalive.HTTPHandler.do_open(), remove this:
if r.status == 200 or not HANDLE_ERRORS:
    return r
and insert this:
if r.status == 200 or not HANDLE_ERRORS:
    # [speedplane] Must return an addinfourl object
    resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
    resp.code = r.status
    resp.msg = r.reason
    return resp

Please avoid collective pain and use Requests instead. It will do the right thing by default and use keep-alive if applicable.
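A minimal sketch (requests.Session pools connections, so repeated requests to the same host re-use one socket):
import requests

s = requests.Session()  # keep-alive and connection pooling by default

# The second call re-uses the connection opened by the first.
r1 = s.get('http://www.python.org')
r2 = s.get('http://www.python.org/about/')
print(r1.status_code, r2.status_code)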

Here's a somewhat similar urlopen() that does keep-alive, though it's not thread-safe.
try:
    from http.client import HTTPConnection, HTTPSConnection
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection
import select

# One cached connection per (scheme, host) pair.
connections = {}

def request(method, url, body=None, headers={}, **kwargs):
    # 'http://host/path' -> ('http:', '', 'host', 'path')
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # A socket that is readable before we send anything means the
    # server closed it (or sent stray data); discard and reconnect.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()

def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp

Related

Python requests print used cipher for HTTPS

I have developed a CherryPy REST service with SSL (TLSv1-TLSv1.2), disabling insecure ciphers and protocols.
Now I have another piece of code using Python requests to connect to this service. I have already written a TLS HTTPAdapter, and a request succeeds. I have only one problem:
I can see the negotiated cipher neither on the server side nor on the client side, so I do not actually know whether my security options took effect. I could not find out how to get at SSLSocket.cipher() from the built-in ssl module for either CherryPy or requests.
Is there a simple way to get this information?
Here is an example:
import ssl
import sys

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class Tlsv1_2HttpAdapter(HTTPAdapter):
    """"Transport adapter" that allows us to use TLSv1.2"""

    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(
            num_pools=connections, maxsize=maxsize,
            block=block, ssl_version=ssl.PROTOCOL_TLSv1_2)

con = "https://{}:{}".format(host, port)  # host and port of the service
tls = Tlsv1_2HttpAdapter()
try:
    s = requests.Session()
    s.mount(con, tls)
    r = s.get(con)
except requests.exceptions.SSLError as e:
    print(e, file=sys.stderr)
    sys.exit(1)
I want something like: print("Cipher used: {}".format(foo.cipher()))
Many thanks in advance for your help.
As a temporary solution for testing, the code below prints out the cipher suite (position 0) and protocol (position 1), like this:
('ECDHE-RSA-AES256-GCM-SHA384', 'TLSv1/SSLv3', 256)
Python 2.7 (tested):
from httplib import HTTPConnection

def request(self, method, url, body=None, headers={}):
    self._send_request(method, url, body, headers)
    print(self.sock.cipher())

HTTPConnection.request = request
Python 3 (tested on v3.8.9):
from http.client import HTTPConnection

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    self._send_request(method, url, body, headers, encode_chunked)
    print(self.sock.cipher())

HTTPConnection.request = request
This monkey-patches the request() method for the sole purpose of adding the print statement. You can replace the print call with a debug logger if you want more control over the output; a sketch follows below.
Import or paste this snippet at the beginning of your code so that it monkey-patches the method as early as possible.
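For instance, a rough sketch of the logger variant (the logger name is arbitrary, and a hasattr guard is added so plain HTTP connections don't break):
import logging
from http.client import HTTPConnection

log = logging.getLogger('tls.cipher')  # logger name is arbitrary

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    self._send_request(method, url, body, headers, encode_chunked)
    # cipher() only exists on TLS sockets (HTTPSConnection);
    # guard so plain HTTP connections don't raise AttributeError.
    if hasattr(self.sock, 'cipher'):
        log.debug('Cipher used: %s', self.sock.cipher())

HTTPConnection.request = request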

socket ResourceWarning using urllib in Python 3

I am using urllib.request.urlopen() to GET from a web service I'm trying to test.
This returns an HTTPResponse object, which I then read() to get the response body.
But I always see a ResourceWarning about an unclosed socket from socket.py.
Here's the relevant function:
import json
from urllib.request import Request, urlopen

def get_from_webservice(url):
    """ GET from the webservice """
    req = Request(url, method="GET", headers=HEADERS)  # HEADERS defined elsewhere
    with urlopen(req) as rsp:
        body = rsp.read().decode('utf-8')
        return json.loads(body)
Here's the warning as it appears in the program's output:
$ ./test/test_webservices.py
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socket.py:359: ResourceWarning: unclosed <socket.socket object, fd=5, family=30, type=1, proto=6>
self._sock = None
.s
----------------------------------------------------------------------
Ran 2 tests in 0.010s
OK (skipped=1)
If there's anything I can do to the HTTPResponse (or the Request?) to make it close its socket cleanly, I would really like to know: this code is for my unit tests, and I don't like ignoring warnings anywhere, but especially not there.
I don't know if this is the answer, but it is part of the way to an answer.
If I add the header "connection: close" to the response from my web services, the HTTPResponse object seems to clean itself up properly without a warning.
And in fact, the HTTP Spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) says:
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
So the problem was on the server end (i.e. my fault!). In the event that you don't have control over the headers coming from the server, I don't know what you can do.
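One client-side option worth trying in that case (my suggestion, not part of the original answer): send Connection: close in the request headers, which asks the server to close the connection after responding, so the response should clean up the same way. A sketch with a placeholder URL:
from urllib.request import Request, urlopen

url = "http://localhost:8000/endpoint"  # placeholder URL

# Ask the server to close the connection after responding;
# HTTP/1.1 servers are required to honour "Connection: close".
req = Request(url, method="GET", headers={"Connection": "close"})
with urlopen(req) as rsp:
    body = rsp.read().decode('utf-8')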
I had the same problem with urllib3, and I just added a context manager to close the connection automatically:
import urllib3

def get(addr, headers):
    """ This function closes the connection after the HTTP request. """
    with urllib3.PoolManager() as conn:
        res = conn.request('GET', addr, headers=headers)
        if res.status == 200:
            return res.data
        else:
            raise ConnectionError(res.reason)
Note that urllib3 is designed to keep a pool of connections and keep them alive for you. This can significantly speed up your application if it needs to make a series of requests, e.g. a few calls to a backend API.
Please read the urllib3 documentation on connection pools here: https://urllib3.readthedocs.io/en/1.5/pools.html
P.S. You could also use the requests library, which is not part of the Python standard library (as of 2019) but is very powerful and simple to use: http://docs.python-requests.org/en/master/
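If you do want the keep-alive benefit rather than closing each time, a minimal sketch of the intended pattern (one long-lived pool instead of one per request; the function mirrors the snippet above):
import urllib3

# Create the pool once, at module or application scope;
# connections to each host are then kept alive and re-used.
http = urllib3.PoolManager()

def get(addr, headers=None):
    res = http.request('GET', addr, headers=headers)
    if res.status == 200:
        return res.data
    raise ConnectionError(res.reason)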

PUT request from tornado.httpclient.AsyncHTTPClient

Is there any way to perform a PUT request with tornado.httpclient?
For example, is there a way to replace urllib with the Requests library?
Or maybe subclass my own client and inject into it the construction from this answer:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request('http://example.org', data='your_put_data')
request.add_header('Content-Type', 'your/contenttype')
request.get_method = lambda: 'PUT'
url = opener.open(request)
Any painless patches, hacks, or suggestions are welcome.
I want this construction to work properly:
response = yield gen.Task(http_client.fetch, opt.site_url + '/api/user/', method="PUT", body=urlencode(pdata))
For now, it's not sending the body.
Nope, Tornado doesn't use urllib (presumably because it blocks). The trick to using httpclient for anything more complicated than a basic GET is to create an HTTPRequest object yourself.
Untested, but this should work:
from tornado.httpclient import HTTPRequest
request = HTTPRequest(opt.site_url + '/api/user/', method="PUT", body=urlencode(pdata))
response = yield gen.Task(http_client.fetch, request)

HTTPConnection to make DELETE request: 505 response

Frustratingly, I need to develop something on Python 2.6.4 and send a DELETE request to a server that seems to only support HTTP/1.1. Here is my code:
import httplib

httpConnection = httplib.HTTPConnection("localhost:9080")
httpConnection.request('DELETE', remainderURL)
httpResponse = httpConnection.getresponse()
The response code I then get is 505 (HTTP version not supported).
I've tested sending a DELETE request via Firefox's RESTClient to the same URL, and that works.
I can't use urllib2 because it doesn't support the DELETE method. Is the HTTPConnection object HTTP/1.0 only? Or am I doing something wrong?
The HTTPConnection class uses HTTP/1.1 throughout, and the 505 seems to indicate it's the server that cannot handle HTTP/1.1 requests.
However, if you need to make DELETE requests, why not use the Requests package instead? A DELETE is as simple as:
import requests
requests.delete(url)
That won't magically solve your HTTP version mismatch, but you can enable verbose logging to figure out what is going on:
import sys
requests.delete(url, config=dict(verbose=sys.stderr))
You can use urllib2:
import urllib2

req = urllib2.Request(query_url)
req.get_method = lambda: 'DELETE'  # overrides the method to DELETE
url = urllib2.urlopen(req)
httplib uses HTTP/1.1 (see the HTTPConnection.putrequest() method documentation).
Check httpResponse.version to see what version the server is using.
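For example, a quick sketch using the question's own connection (version is 10 for HTTP/1.0 and 11 for HTTP/1.1):
import httplib

httpConnection = httplib.HTTPConnection("localhost:9080")
httpConnection.request('DELETE', remainderURL)  # remainderURL as in the question
httpResponse = httpConnection.getresponse()
# 10 means HTTP/1.0, 11 means HTTP/1.1
print httpResponse.status, httpResponse.version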

Broken POST in httplib

I'm having trouble with a POST using httplib. Here is the code:
import base64
import urllib
import httplib

head = {"Authorization": "Basic %s" % base64.encodestring("foo:bar")}
fields = {"token": "088cfe772ce0b7760186fe4762843a11"}
conn = httplib.HTTPSConnection("foundation.iplantc.org")
conn.set_debuglevel(2)
conn.request('POST', '/auth-v1/renew', urllib.urlencode(fields), head)
print conn.getresponse().read()
conn.close()
The POST that comes out is correct. I know because I started a telnet session and typed it in by hand, and that worked fine. Here it is:
'POST /auth-v1/renew HTTP/1.1\r\nHost: foundation.iplantc.org\r\nAccept-Encoding: identity\r\nContent-Length: 38\r\nAuthorization: Basic YXRlcnJlbDpvTnl12aesf==\n\r\n\r\ntoken=088cfe772ce0b7760186fe4762843a11'
But the response from the server is "token not found" when the Python script sends it. BTW, this does work fine with urllib3 (urllib2 shows the same error), which uses a multipart encoding, but I want to know what is going wrong with the above. I would rather not depend on yet another third-party package.
httplib doesn't automatically add a Content-Type header; you have to add it yourself.
(urllib2 automatically adds application/x-www-form-urlencoded as the Content-Type.)
But what's probably throwing the server off is the additional '\n' after your Authorization header, introduced by base64.encodestring. Better to use base64.urlsafe_b64encode (or plain base64.b64encode) instead, neither of which appends a newline.
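Putting both fixes together, a sketch of the corrected request (same endpoint and token as the question; b64encode omits the trailing newline, and the Content-Type header is added explicitly):
import base64
import urllib
import httplib

# b64encode, unlike encodestring, does not append '\n'
auth = base64.b64encode("foo:bar")
head = {"Authorization": "Basic %s" % auth,
        "Content-Type": "application/x-www-form-urlencoded"}
fields = {"token": "088cfe772ce0b7760186fe4762843a11"}

conn = httplib.HTTPSConnection("foundation.iplantc.org")
conn.request('POST', '/auth-v1/renew', urllib.urlencode(fields), head)
print conn.getresponse().read()
conn.close()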
