I want to download the TLS certificate chain for a given website.
I have working code using blocking sockets, provided here: Getting certificate chain with Python 3.3 SSL module.
from OpenSSL import SSL
import socket

def get_certificates(hostname, port):
    context = SSL.Context(method=SSL.TLSv1_METHOD)
    conn = SSL.Connection(context, socket=socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM))
    conn.settimeout(1)
    conn.connect((hostname, port))
    conn.setblocking(1)
    conn.set_tlsext_host_name(hostname.encode())  # SNI must be set before the handshake
    conn.do_handshake()
    chain = conn.get_peer_cert_chain()
    conn.close()
    return chain
def main():
    hostname = 'www.google.com'
    port = 443
    chain = get_certificates(hostname, port)
This code runs fine. I want to use async so that processing a large list of hostnames is more performant, but I didn't find a clear way to do it. What's the best way?
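Would something like running the blocking function in a thread pool from asyncio be the right direction? A sketch of what I mean (assuming Python 3.7+ and the get_certificates() above; the names are mine):

import asyncio
from concurrent.futures import ThreadPoolExecutor

async def get_chains(hostnames, port=443, max_workers=20):
    # run the blocking pyOpenSSL handshake in worker threads
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, get_certificates, host, port)
                 for host in hostnames]
        # return_exceptions=True: one unreachable host doesn't abort the batch
        return await asyncio.gather(*tasks, return_exceptions=True)

chains = asyncio.run(get_chains(['www.google.com', 'www.python.org']))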
I have been trying to make requests to a website using the requests library but using different network interfaces. Following are a list of answers that I have tried to use but did not work.
This answer describes how to achieve what I want, but it uses pycurl. I could use pycurl but I have learned about this monkey patching thing and want to give it a try.
This other answer seemed to work at first, since it does not raise any error. However, I monitored my network traffic using Wireshark and the packets were sent from my default interface. I tried to print messages inside the function set_src_addr defined by the author of the answer but the message did not show up. Therefore, I think it is patching a function that is never called. I get a HTTP 200 response, which should not occur since I have bound my socket to 127.0.0.1.
import socket

real_create_conn = socket.create_connection

def set_src_addr(*args):
    address, timeout = args[0], args[1]
    source_address = ('127.0.0.1', 0)
    return real_create_conn(address, timeout, source_address)

socket.create_connection = set_src_addr

import requests
r = requests.get('http://www.google.com')
r
<Response [200]>
I have also tried this answer. I can get two kinds of errors using this method:
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind(('127.0.0.1', 0))
    return sock

socket.socket = bound_socket

import requests
This does not allow me to create a socket and raises an error. I have also tried a variation of this answer, which looks like this:
import requests
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind(('192.168.0.10', 0))
    print(sock)
    return sock

socket.socket = bound_socket

r = requests.get('https://www.google.com')
This also does not work and raises an error.
I have the following problem: I want each process to send its requests through a specific network interface. Since threads share global memory (including libraries), I thought I should change my code to work with processes. Now I want to apply a monkey-patching solution somewhere, in such a way that each process can use a different interface for its communication. Am I missing something? Is this the best way to approach this problem?
Edit:
I would also like to know whether it is possible for different processes to have different versions of the same library. If libraries are shared, how can I have different versions of a library in Python (one for each process)?
This appears to work for python3:
In [1]: import urllib3

In [2]: real_create_conn = urllib3.util.connection.create_connection

In [3]: def set_src_addr(address, timeout, *args, **kw):
   ...:     source_address = ('127.0.0.1', 0)
   ...:     return real_create_conn(address, timeout=timeout, source_address=source_address)
   ...:
   ...: urllib3.util.connection.create_connection = set_src_addr
   ...:
   ...: import requests
   ...: r = requests.get('http://httpbin.org')
It fails with the following exception:
ConnectionError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError("<urllib3.connection.HTTPConnection object at 0x10c4b89b0>: Failed to establish a new connection: [Errno 49] Can't assign requested address",))
I will document the solution I have found and list some problems I had in the process.
salparadise had it right. It is very similar to the first answer I had found. I assume that the requests module imports urllib3, and that the latter keeps its own reference to the socket functions; therefore requests most likely never calls the socket library directly, but gets that functionality through the urllib3 module.
I had not noticed it at first, but the third snippet in my question was actually working. The reason I got a ConnectionError is that I was trying to use a macvlan virtual interface on top of a wireless physical interface (which, if I understood correctly, drops packets when the MAC addresses do not match). The following solution therefore does work:
import requests
from socket import socket as backup
import socket

def socket_custom_src_ip(src_ip):
    original_socket = backup

    def bound_socket(*args, **kwargs):
        sock = original_socket(*args, **kwargs)
        sock.bind((src_ip, 0))
        print(sock)
        return sock

    return bound_socket
In my problem, I need to change the source IP of the sockets several times. One problem I ran into is that if no backup of the original socket function is kept, patching it a second time raises RecursionError: maximum recursion depth exceeded, since by then socket.socket is no longer the original function. My solution above therefore keeps a copy of the original socket function and uses it as the base for every further binding to a different IP.
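For illustration, this is how the factory can be applied and re-applied (the addresses below are placeholders for the interfaces' IPs):

# bind all new sockets to the first source IP
socket.socket = socket_custom_src_ip('192.168.0.10')
r1 = requests.get('https://www.google.com')

# re-patch with another source IP; bound_socket closes over the
# original socket function, so this does not recurse
socket.socket = socket_custom_src_ip('192.168.0.11')
r2 = requests.get('https://www.google.com')

# restore the untouched original when done
socket.socket = backup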
Lastly, the following is a proof of concept of how to have multiple processes use different copies of the same library. With this idea, I can import and monkey-patch the socket module inside each process, giving each one its own monkey-patched version.
import importlib
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, module):
        super().__init__()
        self.module = module

    def run(self):
        i = importlib.import_module(f'{self.module}')
        print(f'{i}')

p1 = MyProcess('os')
p2 = MyProcess('sys')
p1.start()
<module 'os' from '/usr/lib/python3.7/os.py'>
p2.start()
<module 'sys' (built-in)>
This also works using the import statement and the global keyword to provide transparent access inside all functions, as follows:
import multiprocessing

def fun(self):
    global os  # must precede the import, or Python 3 raises a SyntaxError
    import os
    os.var = f'{repr(self)}'
    fun2()

def fun2():
    print(os.system(f'echo "{os.var}"'))

class MyProcess(multiprocessing.Process):
    def __init__(self):
        super().__init__()

    def run(self):
        if 'os' in dir():
            print('os already imported')
        fun(self)

p1 = MyProcess()
p2 = MyProcess()
p2.start()
<MyProcess(MyProcess-2, started)>
p1.start()
<MyProcess(MyProcess-1, started)>
I faced a similar issue where I wanted some localhost traffic to originate from an address other than 127.0.0.1 (I was testing an HTTPS connection over localhost).
This is how I did it using the core libraries ssl and http.client (cf. the docs), as it seemed cleaner than the solutions I found online using the requests library.
import http.client as http
import json
import ssl

dst = 'server.infsec.local'  # DNS record was added to the OS
src = ('127.0.0.2', 0)       # port 0 -> select any available port
request_data = {}            # placeholder for the JSON payload

context = ssl.SSLContext()
context.load_default_certs()  # loads the OS certificate context

request = http.HTTPSConnection(dst, 443, context=context,
                               source_address=src)
request.connect()
request.request("GET", '/', json.dumps(request_data))
response = request.getresponse()
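Reading the reply then works as usual:

print(response.status)
print(response.read())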
I have the following code:
class Server:
    def __init__(self, port, isWithThread):
        self.isWithThread = isWithThread
        self.port = port
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setblocking(0)
        log.info("Socket created...")

    def __enter__(self):
        self.sock.bind(('127.0.0.1', self.port))
        self.sock.listen(5)
        log.info("Listening on %s:%s", '127.0.0.1', self.port)
        return self

    def __exit__(self, type, value, traceback):
        self.sock.setblocking(1)
        self.sock.close()
        log.info("Socket closed.")
        log.info("Bye")

    def run(self):
        #client, addr = self.sock.accept()
        log.info('Selecting...')
        readers, writers, errors = select.select([self.sock], [], [], 10)
        log.debug(readers)
        if readers:
            client, addr = readers[0].accept()
            log.info('Client: %s', client.recv(2048).decode())
            client.sendall("Hippee!".encode())
            client.close()
            log.info("Disconnected from %s:%s", *addr)
What's interesting about this is that with the select.select() and setblocking(0) in place, the address ends up staying in use. If I remove the setblocking code and change the run function to:
def run(self):
    client, addr = self.sock.accept()
    log.info('Client: %s', client.recv(2048).decode())
    client.sendall("Hippee!".encode())
    client.close()
    log.info("Disconnected from %s:%s", *addr)
Then I can immediately re-run the server. With the select() call, I get the following error:
python3.3 test.py server
Socket created...
Traceback (most recent call last):
File "test.py", line 89, in <module>
with Server(12345, False) as s:
File "test.py", line 57, in __enter__
self.sock.bind(('127.0.0.1', self.port))
OSError: [Errno 98] Address already in use
So why does it appear that the select is keeping my socket open, and how do I ensure it closes?
Magic. Magic is the only reason to see any difference with or without select.select(). According to this page, the reason that a socket will stay in use even after calling .close() is that the TIME_WAIT has not yet expired.
The solution is to use .setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1).
I tried this but it didn't work, and so I asked this question. Later, I realized that hey, I knew that things like Flask or SimpleHTTPServer allow you to restart the server immediately. So I used the source, and examined the library code contained in socketserver.py. It was there that I discovered the use of .setsockopt(), but the call came before the call to .bind().
To explain setsockopt(), let's see what the docs say:
socket.setsockopt(level, optname, value)
Set the value of the given socket option (see the Unix manual page setsockopt(2)). The needed symbolic constants are defined in the socket module (SO_* etc.). The value can be an integer or a string representing a buffer. In the latter case it is up to the caller to ensure that the string contains the proper bits (see the optional built-in module struct for a way to encode C structures as strings)
level refers to the level of the TCP/IP stack you want to talk about. In this case we don't want the IP layer but the socket itself. The socket option is SO_REUSEADDR, and we're setting the flag (value=1). So somewhere down in the kernel or drivers, we're effectively saying, "SHHhhhhhh... It's OK. I don't care that you're in TIME_WAIT right now. I want to .bind() to you anyway."
So I changed up my code to have:
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', self.port))
And it works perfectly.
\o/
I'm writing an application that will make use of Python's HTTPServer and BaseHTTPRequestHandler. At some point I figured that, due to the sensitive nature of the data users might want to send, an implementation of SSL would be useful. The problem is that the application is planned to run on a non-standard port, and thus it would be useful if it could talk to both plaintext and SSL connections. I've found a way to make HTTPServer use SSL:
import BaseHTTPServer, SimpleHTTPServer
import ssl

httpd = BaseHTTPServer.HTTPServer(('localhost', 4443), SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd.socket = ssl.wrap_socket(httpd.socket, certfile='path/to/localhost.pem', server_side=True)
httpd.serve_forever()
Is there a way to create a socket class that would handle both SSL and plaintext connections? A neat way to detect SSL (i.e. some magic bytes)? The alternative would be to allocate two ports, but that's way less cool.
I've investigated the problem a little bit.
It's easy to make a socket behave like two different servers (depending on the type of data received). What's bad here is that Python's _ssl library reads directly from socket._socket, which is a native (C-level) object and therefore can't be hooked from Python in the normal way.
One way is to write a C module that will hook native python socket.
Another solution is to have one frontend and two backends (one HTTPS, one HTTP). The frontend listens on port 4443 and decides whether to route each connection to the HTTPS backend or to the HTTP backend. You can add the same handlers to both servers and they'll behave the same way. A remaining problem is that the backends don't know the client's IP, but there are workarounds (such as a dict {(frontend-to-backend source port number): client IP} that the frontend maintains and the backends consult).
Compared with the C solution, this second one looks quite dirty, but here it is.
import BaseHTTPServer, SimpleHTTPServer
import ssl
import socket
import select
import threading

FRONTEND_PORT = 4443
BACKEND_PORT_SSL = 44431
BACKEND_PORT_HTTP = 44432
HOST = 'localhost'

httpd_ssl = BaseHTTPServer.HTTPServer((HOST, BACKEND_PORT_SSL), SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd_ssl.socket = ssl.wrap_socket(httpd_ssl.socket, certfile='key.pem', server_side=True)
httpd_direct = BaseHTTPServer.HTTPServer((HOST, BACKEND_PORT_HTTP), SimpleHTTPServer.SimpleHTTPRequestHandler)

def serve_forever(http_server):
    http_server.serve_forever()

def categorize(sock, addr):
    # Peek at the first byte: 0x16 is the TLS handshake record type.
    data = sock.recv(1)
    if data == '\x16':
        port = BACKEND_PORT_SSL
    else:
        port = BACKEND_PORT_HTTP
    other_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    other_sock.connect((HOST, port))
    other_sock.send(data)
    inp = [sock, other_sock]
    select_timeout = 1.0
    try:
        while 1:
            r, w, x = select.select(inp, [], [], select_timeout)
            if not r:
                continue
            for s in r:
                o_s = inp[1] if inp[0] == s else inp[0]
                buf = s.recv(4096)
                if not buf:
                    raise socket.error
                o_s.send(buf)
    except socket.error:
        pass
    finally:
        for s in inp:
            s.close()

threading.Thread(target=serve_forever, args=(httpd_ssl,)).start()
threading.Thread(target=serve_forever, args=(httpd_direct,)).start()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((HOST, FRONTEND_PORT))
sock.listen(10)

while True:
    conn, addr = sock.accept()
    threading.Thread(target=categorize, args=(conn, addr)).start()
I solved the problem with a helper class that peeks into the received data upon accept(), and then returns either a wrapped or the naked socket:
class SmartServerSocket:
    def __init__(self, sock):
        self.sock = sock
        # delegate methods as needed
        _delegate_methods = ["fileno"]
        for method in _delegate_methods:
            setattr(self, method, getattr(sock, method))

    def accept(self):
        (conn, addr) = self.sock.accept()
        if conn.recv(1, socket.MSG_PEEK) == "\x16":
            return (ssl.wrap_socket(conn, certfile='path/to/localhost.pem', server_side=True), addr)
        else:
            return (conn, addr)

httpd = BaseHTTPServer.HTTPServer(('localhost', 4443), SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd.socket = SmartServerSocket(httpd.socket)
httpd.serve_forever()
If you like, you can give the server object to the SmartServerSocket constructor and have accept() set a special member variable there to the detected protocol, so you can examine it in the RequestHandler instance.
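For instance, a sketch of that variant (the server argument and detected_protocol attribute are names invented here, not part of the original):

class SmartServerSocket:
    def __init__(self, sock, server=None):
        self.sock = sock
        self.server = server  # optional back-reference to the HTTPServer
        for method in ["fileno"]:
            setattr(self, method, getattr(sock, method))

    def accept(self):
        (conn, addr) = self.sock.accept()
        is_tls = conn.recv(1, socket.MSG_PEEK) == "\x16"
        if self.server is not None:
            self.server.detected_protocol = "https" if is_tls else "http"
        if is_tls:
            conn = ssl.wrap_socket(conn, certfile='path/to/localhost.pem',
                                   server_side=True)
        return (conn, addr)

httpd = BaseHTTPServer.HTTPServer(('localhost', 4443), SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd.socket = SmartServerSocket(httpd.socket, server=httpd)
# a RequestHandler can now check self.server.detected_protocol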
Yes, there actually is.
But I don't know of any client or server that implements STARTTLS for HTTP. It's commonly used for IMAP and SMTP, but for HTTP there unfortunately don't seem to be any implementations; it's still common practice to serve HTTP on a different port than HTTPS.
How do I set the source IP/interface with Python and urllib2?
Unfortunately the stack of standard library modules in use (urllib2, httplib, socket) is somewhat badly designed for the purpose -- at the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which in turn gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call -- no place for you to insert the sock.bind just before sock.connect, which is what you need to set the source IP (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively-encapsulated way -- I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance" -- but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh).
When you're facing such problems you only have two not-so-good solutions: either copy, paste and edit the misdesigned code into which you need to place a "hook" that the original designer didn't cater for; or, "monkey-patch" that code. Neither is GOOD, but both can work, so at least let's be thankful that we have such options (by using an open-source and dynamic language). In this case, I think I'd go for monkey-patching (which is bad, but copy and paste coding is even worse) -- a code fragment such as:
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))  # sourceIP: the source IP you want, defined elsewhere
    return sock

socket.socket = bound_socket
Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this before using urllib2 normally, or (in more complex ways of course) run it at need just for those outgoing sockets you DO need to bind in a certain way (then each time restore socket.socket = true_socket to get out of the way for future sockets yet to be created). The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
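To give a taste of that second alternative, one way to orchestrate the save/restore is a context manager (a sketch, untested, and glossing over thread-safety):

import socket
import urllib2
from contextlib import contextmanager

@contextmanager
def source_ip_bound(source_ip):
    """Temporarily make every new socket bind to source_ip."""
    true_socket = socket.socket
    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((source_ip, 0))
        return sock
    socket.socket = bound_socket
    try:
        yield
    finally:
        socket.socket = true_socket  # get out of the way of future sockets

# every socket created inside the block is bound to the given source IP
with source_ip_bound('192.168.1.10'):
    urllib2.urlopen('http://example.com/').read()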
AKX's good answer is a variant on the "copy / paste / edit" alternative so I don't need to expand much on that -- note however that it doesn't exactly reproduce socket.create_connection in its connect method, see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.
This seems to work.
import urllib2, httplib, socket

class BindableHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.socket()
        self.sock.bind((self.source_ip, 0))
        if isinstance(self.timeout, float):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))

def BindableHTTPConnectionFactory(source_ip):
    def _get(host, port=None, strict=None, timeout=0):
        bhc = BindableHTTPConnection(host, port=port, strict=strict, timeout=timeout)
        bhc.source_ip = source_ip
        return bhc
    return _get

class BindableHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(BindableHTTPConnectionFactory('127.0.0.1'), req)

opener = urllib2.build_opener(BindableHTTPHandler)
opener.open("http://google.com/").read()  # Will fail, 127.0.0.1 can't reach google.com.
You'll need to figure out some way to parameterize "127.0.0.1" there, though.
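One way to parameterize it is to give the handler instance its own source IP (a sketch building on the factory above; the class name is new here):

class ParameterizedHTTPHandler(urllib2.HTTPHandler):
    def __init__(self, source_ip):
        urllib2.HTTPHandler.__init__(self)
        self.source_ip = source_ip

    def http_open(self, req):
        return self.do_open(BindableHTTPConnectionFactory(self.source_ip), req)

# build_opener accepts handler instances as well as classes
opener = urllib2.build_opener(ParameterizedHTTPHandler('192.168.1.10'))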
Here's a further refinement that makes use of HTTPConnection's source_address argument (introduced in Python 2.7):
import functools
import httplib
import urllib2

class BoundHTTPHandler(urllib2.HTTPHandler):
    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                                            source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)
This gives us a custom urllib2.HTTPHandler implementation that is source_address aware. We can add it to a new urllib2.OpenerDirector and install it as the default opener (for future urlopen() calls) with the following code:
handler = BoundHTTPHandler(source_address=("192.168.1.10", 0))
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
I thought I'd follow up with a slightly better version of the monkey patch. If you need to be able to set different port options on some of the sockets or are using something like SSL that subclasses socket, the following code works a bit better.
_ip_address = None

def bind_outgoing_sockets_to_ip(ip_address):
    """This binds all python sockets to the passed in ip address"""
    global _ip_address
    _ip_address = ip_address

import socket
from socket import socket as s

class bound_socket(s):
    def connect(self, *args, **kwargs):
        if self.family == socket.AF_INET:
            if self.getsockname()[0] == "0.0.0.0" and _ip_address:
                self.bind((_ip_address, 0))
        s.connect(self, *args, **kwargs)

socket.socket = bound_socket
You only have to bind the socket at connect() time if you also need to run something like a webserver in the same process, one that has to bind to a different IP address.
Reasoning that I should monkey-patch at the highest level available, here's an alternative to Alex's answer which patches httplib instead of socket, taking advantage of httplib.HTTPSConnection.__init__()'s source_address keyword argument (which is not exposed by urllib2, AFAICT). Tested and working on Python 2.7.2.
import httplib

HTTPSConnection_real = httplib.HTTPSConnection

class HTTPSConnection_monkey(HTTPSConnection_real):
    def __init__(*a, **kw):
        # SOURCE_IP is assumed to be defined elsewhere in the module
        HTTPSConnection_real.__init__(*a, source_address=(SOURCE_IP, 0), **kw)

httplib.HTTPSConnection = HTTPSConnection_monkey
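Once the patch is in place, plain urllib2 calls pick it up, since urllib2's HTTPSHandler looks up httplib.HTTPSConnection at call time. For instance (the URL is a placeholder):

import urllib2
urllib2.urlopen('https://example.com/').read()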
As of Python 2.7, httplib.HTTPConnection has a source_address parameter, allowing you to provide an (IP, port) pair to bind to.
See: http://docs.python.org/2/library/httplib.html#httplib.HTTPConnection
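For instance, a minimal sketch using that parameter directly (the addresses are placeholders):

import httplib

# bind the outgoing TCP connection to a specific local address;
# port 0 means "pick any free source port"
conn = httplib.HTTPConnection('example.com', 80,
                              source_address=('192.168.1.10', 0))
conn.request('GET', '/')
print(conn.getresponse().status)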