I use getPage() to load the pages:
d = getPage(url)
d.addCallback(parsePage,url)
d.addErrback(downloadError,url)
Now I need to download via an HTTP proxy. How can I make getPage() use an HTTP proxy?
Use twisted.web.client.ProxyAgent instead. getPage is Twisted's old, not-very-good HTTP client API; IAgent is the newer, better one. Among its other advantages, it has more features than getPage, including support for HTTP proxies.
Here's an example:
from __future__ import print_function
from os import environ
from twisted.internet.task import react
from twisted.internet.endpoints import HostnameEndpoint
from twisted.web.client import ProxyAgent
def main(reactor, proxy_hostname):
    endpoint = HostnameEndpoint(reactor, proxy_hostname, 80)
    agent = ProxyAgent(endpoint)
    return agent.request(b"GET", b"http://google.com/").addCallback(print)

react(main, [environ["HTTP_PROXY"]])
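One caveat worth noting: HTTP_PROXY is commonly set to a full URL (e.g. http://proxy.example.com:3128) rather than a bare hostname, while HostnameEndpoint expects a hostname and a port separately. A small stdlib-only sketch of extracting them first (the proxy URL here is hypothetical):

```python
from urllib.parse import urlparse

# Hypothetical HTTP_PROXY value; many environments set a full URL like this.
proxy_url = "http://proxy.example.com:3128"
parsed = urlparse(proxy_url)
# Fall back to port 80 when the proxy URL omits an explicit port.
host, port = parsed.hostname, parsed.port or 80
print(host, port)  # proxy.example.com 3128
```

The extracted host and port can then be passed to HostnameEndpoint in place of the raw environment value.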
I am making an HTTP request from the frontend, and I can see the port number in the Host field of the request headers in dev tools (e.g. xyz.com:1234). But using Python's requests module, the host only shows xyz.com.
How can I get the port number?
You do not need to create and add a Host header yourself when making a request with the requests library, but you can provide one if you want: just pass the headers keyword argument, e.g. headers={'Host': 'xyz.com:1234'} for your example above.
Parsing a port number from a URL, a manual approach
Your question seems to be more about parsing a port number from a request URL, however, and for that an example should clear things up:
from typing import Optional
from urllib.parse import urlparse

import requests

def get_port(url: str) -> Optional[int]:
    schema_ports = {'http': 80, 'https': 443}
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    return schema_ports.get(parsed_url.scheme)

ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url),
)
print(ports)  # (8001, 80, 443)
In this example, three HTTP GET requests are made with the requests library. Although in this contrived example you already know the request URL, when working from a generic requests.models.Response object you can get it from the request.url attribute. In cases where no port is specified explicitly, you need to infer a reasonable default; the get_port definition above does this for two common schemes (HTTP and HTTPS).
Read about Python's standard library's urllib.parse module for more information.
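As a quick illustration of what urlparse gives you, assuming the xyz.com:1234 example from the question:

```python
from urllib.parse import urlparse

# An explicit port shows up in the parsed result; otherwise .port is None.
parsed = urlparse('http://xyz.com:1234/path?q=1')
print(parsed.scheme, parsed.hostname, parsed.port)  # http xyz.com 1234
print(urlparse('https://xyz.com/path').port)        # None
```

The None case is exactly why get_port above needs a fallback table of default ports.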
A more automated approach, leaning on the standard library
The manual approach above shows how to think about this problem generically, but it does not scale easily to the many other common schemes that exist (ssh, gopher, etc.).
On POSIX systems, the /etc/services file maintains mappings for common service schemes to ports/protocols and optional descriptions, e.g.
http 80/udp www www-http # World Wide Web HTTP
http 80/tcp www www-http # World Wide Web HTTP
The getservbyname function in Python's socket library taps into this mapping:
>>> import socket
>>> socket.getservbyname('https')
443
>>> socket.getservbyname('http')
80
With this, we can refine my first example to avoid manually specifying mappings for common schemes:
import socket
from typing import Optional
from urllib.parse import urlparse

import requests

def get_port(url: str) -> Optional[int]:
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    try:
        return socket.getservbyname(parsed_url.scheme)
    except OSError:
        return None

ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url),
)
print(ports)  # (8001, 80, 443)
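The same services database also supports the reverse direction via socket.getservbyport, which can be handy for debugging. A small sketch (results depend on the local system's database, so a missing entry is handled gracefully):

```python
import socket

# Reverse lookup: map a well-known port number back to a service name.
# The mapping comes from the system's services database (/etc/services
# on POSIX), so an unknown entry raises OSError.
for port in (80, 443):
    try:
        print(port, socket.getservbyport(port, 'tcp'))
    except OSError:
        print(port, "not in this system's services database")
```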
I am trying to create a Python SOAP client with zeep, but I can't figure out how to call the functions that are defined in the WSDL.
Here is my code:
from requests import Session
from requests.auth import HTTPBasicAuth
import zeep
from zeep.transports import Transport

session = Session()
session.auth = HTTPBasicAuth('admin', 'ip411')
transport_with_basic_auth = Transport(session=session)
client = zeep.Client(wsdl='http://10.8.20.27/pbx10_00.wsdl',
                     transport=transport_with_basic_auth)
client.service.Initialize('soap', 'test', True, True, True, True, True)
You can look at the WSDL here: www.innovaphone.com/wsdl/pbx10_00.wsdl
I have developed a CherryPy REST service with SSL (TLSv1-TLSv1.2), disabling insecure ciphers and protocols.
Now I have another piece of code using Python requests to connect to this service. I have already written a TLS HTTPAdapter, and the request succeeds. I have only one problem:
I see the chosen cipher neither on the server side nor on the client side, so in fact I do not really know whether my security options took effect. I could not find a way to get at SSLSocket.cipher() from the built-in ssl module as used by CherryPy or requests.
Is there a simple way to get this information?
Here is an example:
import ssl
import sys

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class Tlsv1_2HttpAdapter(HTTPAdapter):
    """"Transport adapter" that allows us to use TLSv1.2"""

    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(
            num_pools=connections, maxsize=maxsize,
            block=block, ssl_version=ssl.PROTOCOL_TLSv1_2)

con = "https://{}:{}".format(host, port)  # host and port are defined elsewhere
tls = Tlsv1_2HttpAdapter()
try:
    s = requests.Session()
    s.mount(con, tls)
    r = s.get(con)
except requests.exceptions.SSLError as e:
    print(e, file=sys.stderr)
    sys.exit(1)
I want something like: print("Cipher used: {}".format(foo.cipher()))
Many thanks in advance for your help.
As a temporary solution for testing, the code below prints out the cipher suite (position 0) and protocol (position 1) like that:
('ECDHE-RSA-AES256-GCM-SHA384', 'TLSv1/SSLv3', 256)
Python 2.7 (tested):
from httplib import HTTPConnection

def request(self, method, url, body=None, headers={}):
    self._send_request(method, url, body, headers)
    print(self.sock.cipher())

HTTPConnection.request = request
Python 3 (tested on v3.8.9 by comment below):
from http.client import HTTPConnection

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    self._send_request(method, url, body, headers, encode_chunked)
    print(self.sock.cipher())

HTTPConnection.request = request
This monkey-patches the request() method for the sole purpose of adding the print statement. You can replace the print call with a debug logger if you want more control over the output.
Import or paste this snippet at the beginning of your code so that it monkey-patches the method as early as possible.
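If you only want to verify which cipher suites your client-side context would offer (rather than the one actually negotiated), the ssl module can list them without any monkey patching. A sketch using SSLContext.get_ciphers(), available since Python 3.6:

```python
import ssl

# Build a client-side TLS context and inspect the cipher suites it will
# offer, in preference order. Each entry is a dict with keys such as
# 'name' and 'protocol', similar to the tuple SSLSocket.cipher() returns.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
for entry in context.get_ciphers()[:3]:
    print(entry['name'], entry['protocol'])
```

This only shows the offered suites; the monkey-patching approach above is still needed to see the one the server actually selected.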
According to the http.server documentation BaseHTTPRequestHandler can handle POST requests.
class http.server.BaseHTTPRequestHandler(request, client_address, server)
This class is used to handle the HTTP requests that arrive at the server. By itself, it cannot respond to any actual HTTP requests; it must be subclassed to handle each request method (e.g. GET or POST). BaseHTTPRequestHandler provides a number of class and instance variables, and methods for use by subclasses.
However, down below it says:
do_POST() This method serves the 'POST' request type, only allowed for
CGI scripts. Error 501, “Can only POST to CGI scripts”, is output when
trying to POST to a non-CGI url.
What does this part of the documentation mean? Isn't that contradicting itself or am I misunderstanding something?
EDIT: To clarify, the following method I tried seems to work, I'd just like to know what the documentation of do_POST means.
from os import curdir
from os.path import join as pjoin

from http.server import BaseHTTPRequestHandler, HTTPServer

port = 18888

class StoreHandler(BaseHTTPRequestHandler):
    store_path = pjoin(curdir, 'store.json')

    def do_POST(self):
        if self.path == '/store.json':
            print("Got a connection from", self.client_address)
            length = self.headers['content-length']
            data = self.rfile.read(int(length))
            print(data)
            with open(self.store_path, 'w') as fh:
                fh.write(data.decode())
            self.send_response(200)
            self.end_headers()

server = HTTPServer(('localhost', port), StoreHandler)
server.serve_forever()
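For completeness, a minimal sketch of exercising a do_POST handler end to end with only the standard library: it starts an echoing handler (a simplified stand-in for StoreHandler above, without the file write) in a background thread and POSTs JSON to it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers['Content-Length'])
        body = self.rfile.read(length)   # read the POSTed payload
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)           # echo it back to the client

    def log_message(self, *args):        # keep request logging quiet
        pass

# Port 0 lets the OS pick a free port; serve in a daemon thread.
server = HTTPServer(('localhost', 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = 'http://localhost:{}/store.json'.format(server.server_port)
payload = json.dumps({'hello': 'world'}).encode()
req = urllib.request.Request(url, data=payload, method='POST')
with urllib.request.urlopen(req) as resp:
    status, body = resp.status, resp.read().decode()
print(status, body)  # 200 {"hello": "world"}
server.shutdown()
```

This works with a plain BaseHTTPRequestHandler subclass, with no CGI machinery involved, which matches the explanation in the answer below.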
CGIHTTPRequestHandler IS a subclass of SimpleHTTPRequestHandler, which is a subclass of BaseHTTPRequestHandler (I found this out by looking at the source code for SimpleHTTPServer.py and CGIHTTPServer.py). This part below:
do_POST() This method serves the 'POST' request type, only allowed for CGI scripts. Error 501, “Can only POST to CGI scripts”, is output when trying to POST to a non-CGI url.
Refers to CGIHTTPRequestHandler, not BaseHTTPRequestHandler! See:
http.server.BaseHTTPRequestHandler
CGIHTTPRequestHandler
do_POST() as documented is a method of CGIHTTPRequestHandler. Its default behavior does not affect BaseHTTPRequestHandler in any way.
How can I use socks proxy in tornado AsyncHttpClient?
I found it is only possible to use an HTTP proxy without changing the lib...
According to the documentation, proxy support is only available for the libcurl implementation of AsyncHTTPClient.
If you take a deeper look at the HTTPRequest object you're passing to the fetch() method, you'll notice there's an extra prepare_curl_callback argument, which can call setopt on the PyCurl object before the request is sent.
Here's a little example of such prepare_curl_callback function:
import pycurl

def prepare_curl_socks5(curl):
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)
And a full usage example:
import tornado
import tornado.ioloop
import tornado.gen
import tornado.httpclient
import pycurl

def prepare_curl_socks5(curl):
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

@tornado.gen.coroutine
def main():
    # set CurlAsyncHTTPClient as the default AsyncHTTPClient
    tornado.httpclient.AsyncHTTPClient.configure(
        "tornado.curl_httpclient.CurlAsyncHTTPClient")
    http_client = tornado.httpclient.AsyncHTTPClient()
    http_request = tornado.httpclient.HTTPRequest(
        "http://jsonip.com",
        prepare_curl_callback=prepare_curl_socks5,
        proxy_host="localhost",
        proxy_port=9050
    )
    response = yield http_client.fetch(http_request)
    print(response.body)

if __name__ == '__main__':
    tornado.ioloop.IOLoop.instance().run_sync(main)
The additional keyword argument prepare_curl_callback=prepare_curl_socks5 on the HTTPRequest passed to fetch() does the magic, making cURL use a SOCKS5 proxy instead of the default HTTP proxy.