How can I use a SOCKS proxy in Tornado's AsyncHTTPClient?
As far as I can tell, only an HTTP proxy can be used without changing the library...
According to the documentation, proxy support is only available for the libcurl implementation of AsyncHTTPClient.
If you take a closer look at the HTTPRequest object you pass to the fetch() method, you'll notice an extra prepare_curl_callback argument; the callback it names can call setopt on the pycurl object before the request is sent.
Here's a little example of such a prepare_curl_callback function:
import pycurl
def prepare_curl_socks5(curl):
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)
And a full usage example:
import tornado
import tornado.ioloop
import tornado.gen
import tornado.httpclient
import pycurl
def prepare_curl_socks5(curl):
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

@tornado.gen.coroutine
def main():
    # set CurlAsyncHTTPClient as the default AsyncHTTPClient
    tornado.httpclient.AsyncHTTPClient.configure(
        "tornado.curl_httpclient.CurlAsyncHTTPClient")
    http_client = tornado.httpclient.AsyncHTTPClient()
    http_request = tornado.httpclient.HTTPRequest(
        "http://jsonip.com",
        prepare_curl_callback=prepare_curl_socks5,
        proxy_host="localhost",
        proxy_port=9050
    )
    response = yield http_client.fetch(http_request)
    print response.body

if __name__ == '__main__':
    tornado.ioloop.IOLoop.instance().run_sync(main)
The additional keyword argument prepare_curl_callback=prepare_curl_socks5 in the fetch() call does the magic, making cURL use a SOCKS5 proxy instead of the default HTTP proxy.
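As a side note: if you also want DNS resolution to happen on the proxy side (often desirable when the proxy is Tor listening on port 9050, as above), pycurl exposes a separate proxy type for that. A minimal sketch of an alternative callback, assuming your libcurl build supports it:

import pycurl

def prepare_curl_socks5_hostname(curl):
    # like PROXYTYPE_SOCKS5, but hostnames are resolved by the proxy,
    # so DNS lookups do not leak from the client machine
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5_HOSTNAME)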
I am currently using the connection pool provided by urllib3 in Python, like the following:
pool = urllib3.PoolManager(maxsize = 10)
resp = pool.request('GET', 'http://example.com')
content = resp.read()
resp.release_conn()
However, I don't know how to set a proxy while using this connection pool. I tried to set the proxy in the request call, like pool.request('GET', 'http://example.com', proxies={'http': '123.123.123.123:8888'}), but it didn't work.
Can someone tell me how to set the proxy while using a connection pool?
Thanks~
There is an example for how to use a proxy with urllib3 in the Advanced Usage section of the documentation. I adapted it to fit your example:
import urllib3
proxy = urllib3.ProxyManager('http://123.123.123.123:8888/', maxsize=10)
resp = proxy.request('GET', 'http://example.com/')
content = resp.read()
# You don't actually need to release_conn() if you're reading the full response.
# This will be a harmless no-op:
resp.release_conn()
The ProxyManager behaves the same way as a PoolManager would.
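If the proxy requires basic authentication, ProxyManager also accepts a proxy_headers argument; a sketch along the same lines (the user:pass credentials are placeholders):

import urllib3

# build a Proxy-Authorization header; 'user:pass' is a placeholder
auth_headers = urllib3.make_headers(proxy_basic_auth='user:pass')
proxy = urllib3.ProxyManager('http://123.123.123.123:8888/', maxsize=10,
                             proxy_headers=auth_headers)
resp = proxy.request('GET', 'http://example.com/')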
I have developed a CherryPy REST service with SSL (TLSv1 to TLSv1.2), disabling weak ciphers and insecure protocols.
Now I have another piece of code using Python requests to connect to this service. I have already written a TLS HTTPAdapter, and the request succeeds. I have only one problem:
I can see the negotiated cipher neither on the server side nor on the client side, so I do not really know whether my security options took effect. I could not find out how to get at SSLSocket.cipher() from the built-in ssl module for either CherryPy or requests.
Is there a simple way to get this information?
Here is an example:
import ssl
import sys

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class Tlsv1_2HttpAdapter(HTTPAdapter):
    """"Transport adapter" that allows us to use TLSv1.2"""

    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(
            num_pools=connections, maxsize=maxsize,
            block=block, ssl_version=ssl.PROTOCOL_TLSv1_2)

con = "https://{}:{}".format(host, port)  # host and port are defined elsewhere
tls = Tlsv1_2HttpAdapter()
try:
    s = requests.Session()
    s.mount(con, tls)
    r = s.get(con)
except requests.exceptions.SSLError as e:
    print(e, file=sys.stderr)
    sys.exit(1)
I want something like: print("Cipher used: {}".format(foo.cipher()))
Many thanks in advance for your help
As a temporary solution for testing, the code below prints the cipher suite (position 0) and the protocol (position 1) as a tuple like this:
('ECDHE-RSA-AES256-GCM-SHA384', 'TLSv1/SSLv3', 256)
Python 2.7 (tested):
from httplib import HTTPConnection

def request(self, method, url, body=None, headers={}):
    self._send_request(method, url, body, headers)
    print(self.sock.cipher())

HTTPConnection.request = request
Python 3 (tested on v3.8.9):
from http.client import HTTPConnection

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    self._send_request(method, url, body, headers, encode_chunked)
    print(self.sock.cipher())

HTTPConnection.request = request
This monkey-patches the request() method for the sole purpose of adding the print statement. You can replace the print call with a debug logger if you want more control over the output.
Import or paste this snippet at the beginning of your code so that it monkey-patches the method as early as possible.
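If you would rather not patch http.client at all, another route relies on requests/urllib3 internals (so it may break between versions): fetch with stream=True so the connection stays attached to the response, then read the cipher off the underlying SSL socket. A sketch, not a guaranteed API:

import requests

s = requests.Session()
# stream=True keeps the connection bound to the response object
r = s.get("https://example.com/", stream=True)
# r.raw is the urllib3 response; .connection.sock is the wrapped SSL socket
print("Cipher used: {}".format(r.raw.connection.sock.cipher()))
r.close()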
According to the http.server documentation BaseHTTPRequestHandler can handle POST requests.
class http.server.BaseHTTPRequestHandler(request, client_address, server)
This class is used to handle the HTTP requests that arrive at the server. By itself, it cannot respond to any actual HTTP requests; it must be subclassed to handle each request method (e.g. GET or POST). BaseHTTPRequestHandler provides a number of class and instance variables, and methods for use by subclasses.
However, down below it says:
do_POST()
This method serves the 'POST' request type, only allowed for CGI scripts. Error 501, "Can only POST to CGI scripts", is output when trying to POST to a non-CGI url.
What does this part of the documentation mean? Isn't that contradicting itself or am I misunderstanding something?
EDIT: To clarify, the following method I tried seems to work; I'd just like to know what the documentation of do_POST means.
from os import curdir
from os.path import join as pjoin

import requests
from http.server import BaseHTTPRequestHandler, HTTPServer

port = 18888

class StoreHandler(BaseHTTPRequestHandler):
    store_path = pjoin(curdir, 'store.json')

    def do_POST(self):
        if self.path == '/store.json':
            print("Got a connection from", self.client_address)
            length = self.headers['content-length']
            data = self.rfile.read(int(length))
            print(data)
            with open(self.store_path, 'w') as fh:
                fh.write(data.decode())
            self.send_response(200)
            self.end_headers()

server = HTTPServer(('localhost', port), StoreHandler)
server.serve_forever()
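For reference, a handler like this can be exercised from a separate process with a small client; the snippet below is purely illustrative:

# run in a separate shell while the server is running
import requests

r = requests.post('http://localhost:18888/store.json',
                  data='{"hello": "world"}')
print(r.status_code)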
CGIHTTPRequestHandler IS a subclass of SimpleHTTPRequestHandler, which is a subclass of BaseHTTPRequestHandler (I found this out by looking at the source code for SimpleHTTPServer.py and CGIHTTPServer.py). This part below:
do_POST() This method serves the 'POST' request type, only allowed for CGI scripts. Error 501, “Can only POST to CGI scripts”, is output when trying to POST to a non-CGI url.
refers to CGIHTTPRequestHandler, not BaseHTTPRequestHandler! See:
http.server.BaseHTTPRequestHandler
CGIHTTPRequestHandler
do_POST() as documented is a method of CGIHTTPRequestHandler. Its default behavior does not affect BaseHTTPRequestHandler in any way.
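If you want to see the 501 behaviour the documentation describes, run the CGI handler itself; POSTing to any path outside its cgi_directories triggers it. A minimal sketch:

from http.server import HTTPServer, CGIHTTPRequestHandler

# POSTing to e.g. http://localhost:18889/store.json now answers
# 501 "Can only POST to CGI scripts", because /store.json is not inside
# CGIHTTPRequestHandler.cgi_directories (by default '/cgi-bin' and '/htbin')
server = HTTPServer(('localhost', 18889), CGIHTTPRequestHandler)
server.serve_forever()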
I use getPage() to load pages:
d = getPage(url)
d.addCallback(parsePage,url)
d.addErrback(downloadError,url)
Now I need to download via an HTTP proxy. How can I call getPage() so that it uses an HTTP proxy?
Use twisted.web.client.ProxyAgent instead. getPage is Twisted's old, not-very-good HTTP client API. IAgent is the new, better HTTP client API. Apart from its other advantages, it also has more features than getPage, including support for HTTP proxies.
Here's an example:
from __future__ import print_function
from os import environ
from twisted.internet.task import react
from twisted.internet.endpoints import HostnameEndpoint
from twisted.web.client import ProxyAgent
def main(reactor, proxy_hostname):
    endpoint = HostnameEndpoint(reactor, proxy_hostname, 80)
    agent = ProxyAgent(endpoint)
    return agent.request(b"GET", b"http://google.com/").addCallback(print)

react(main, [environ["HTTP_PROXY"]])
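Note that this example prints the response object, not the body. If you want the body, twisted.web.client.readBody can collect it; a variation of the same main(), under the same assumptions as above:

from twisted.web.client import readBody

def main(reactor, proxy_hostname):
    endpoint = HostnameEndpoint(reactor, proxy_hostname, 80)
    agent = ProxyAgent(endpoint)
    d = agent.request(b"GET", b"http://google.com/")
    # readBody waits for the complete response body and fires with bytes
    d.addCallback(readBody)
    d.addCallback(print)
    return d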
I'm writing a pythonic web API wrapper with a class like this
import httplib2
import urllib
class apiWrapper:
    def __init__(self):
        self.http = httplib2.Http()

    def _http(self, url, method, dict):
        '''
        I'm using this wrapper around the http object
        all the time inside the class
        '''
        params = urllib.urlencode(dict)
        # httplib2's request() signature is (uri, method, body, ...)
        response, content = self.http.request(url, method, params)
As you can see, I'm using the _http() method to simplify interaction with the httplib2.Http() object. This method is called quite often inside the class, and I'm wondering what the best way to interact with this object is:
create the object in the __init__ and then reuse it when the _http() method is called (as shown in the code above)
or create the httplib2.Http() object inside the method for every call of the _http() method (as shown in the code sample below)
import httplib2
import urllib
class apiWrapper:
    def __init__(self):
        pass

    def _http(self, url, method, dict):
        '''I'm using this wrapper around the http object
        all the time inside the class'''
        http = httplib2.Http()
        params = urllib.urlencode(dict)
        response, content = http.request(url, method, params)
Supplying 'connection': 'close' in your headers should, according to the docs, close the connection after a response is received:
headers = {'connection': 'close'}
resp, content = h.request(url, headers=headers)  # h is your httplib2.Http() instance
You should keep the Http object if you want to reuse connections. It seems httplib2 is capable of reusing connections the way you use it in your first code sample, so this looks like a good approach.
At the same time, from a shallow inspection of the httplib2 code, it seems that httplib2 has no support for cleaning up unused connections, or even for noticing when a server has decided to close a connection it no longer wants. If that is indeed the case, it looks like a bug in httplib2 to me, so I would rather use the standard library (httplib) instead.
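If you do go the httplib route, the standard library makes the connection lifetime explicit: you create the HTTPConnection, reuse it for requests to the same host, and close it yourself. A rough sketch in the same Python 2 style as the question (the host, path, and parameters are placeholders):

import httplib
import urllib

conn = httplib.HTTPConnection("example.com")
params = urllib.urlencode({"q": "value"})
headers = {"Content-Type": "application/x-www-form-urlencoded"}

# the same connection object can serve several request/response cycles
conn.request("POST", "/endpoint", params, headers)
resp = conn.getresponse()
body = resp.read()  # the response must be fully read before the next request
print(resp.status)

conn.close()  # connection cleanup is explicit, unlike in httplib2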