Source interface with Python and urllib2

How do I set the source IP/interface with Python and urllib2?

Unfortunately the stack of standard library modules in use (urllib2, httplib, socket) is somewhat badly designed for the purpose -- at the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which in turn gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call, where you could insert the sock.bind just before sock.connect that you need in order to set the source IP. (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively-encapsulated way -- I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance" -- but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh.)
When you're facing such problems you only have two not-so-good solutions: either copy, paste and edit the misdesigned code into which you need to place a "hook" that the original designer didn't cater for; or, "monkey-patch" that code. Neither is GOOD, but both can work, so at least let's be thankful that we have such options (by using an open-source and dynamic language). In this case, I think I'd go for monkey-patching (which is bad, but copy and paste coding is even worse) -- a code fragment such as:
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))
    return sock

socket.socket = bound_socket
Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this before using urllib2 normally, or (in more complex ways of course) run it at need just for those outgoing sockets you DO need to bind in a certain way (then each time restore socket.socket = true_socket to get out of the way for future sockets yet to be created). The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
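If you do need the second alternative, a minimal sketch of the patch/use/restore dance (using a hypothetical context-manager helper, and assuming a single-threaded program) could look like this:

import socket
import urllib2
from contextlib import contextmanager

true_socket = socket.socket

@contextmanager
def source_ip_bound(source_ip):
    # temporarily replace socket.socket with a bind-on-creation version
    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((source_ip, 0))
        return sock
    socket.socket = bound_socket
    try:
        yield
    finally:
        socket.socket = true_socket  # restore for future sockets

with source_ip_bound("192.168.1.10"):
    data = urllib2.urlopen("http://example.com/").read()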
AKX's good answer is a variant on the "copy / paste / edit" alternative so I don't need to expand much on that -- note however that it doesn't exactly reproduce socket.create_connection in its connect method; see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.

This seems to work.
import urllib2, httplib, socket

class BindableHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.socket()
        self.sock.bind((self.source_ip, 0))
        if isinstance(self.timeout, float):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))

def BindableHTTPConnectionFactory(source_ip):
    def _get(host, port=None, strict=None, timeout=0):
        bhc = BindableHTTPConnection(host, port=port, strict=strict, timeout=timeout)
        bhc.source_ip = source_ip
        return bhc
    return _get

class BindableHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(BindableHTTPConnectionFactory('127.0.0.1'), req)

opener = urllib2.build_opener(BindableHTTPHandler)
opener.open("http://google.com/").read()  # Will fail, 127.0.0.1 can't reach google.com.
You'll need to figure out some way to parameterize "127.0.0.1" there, though.

Here's a further refinement that makes use of HTTPConnection's source_address argument (introduced in Python 2.7):
import functools
import httplib
import urllib2

class BoundHTTPHandler(urllib2.HTTPHandler):
    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                                            source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)
This gives us a custom urllib2.HTTPHandler implementation that is source_address aware. We can add it to a new urllib2.OpenerDirector and install it as the default opener (for future urlopen() calls) with the following code:
handler = BoundHTTPHandler(source_address=("192.168.1.10", 0))
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
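With the handler installed, ordinary urlopen() calls now go out from the chosen source address; a quick check (URL only illustrative):

import urllib2
print urllib2.urlopen("http://example.com/").read()[:100]  # connects from 192.168.1.10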

I thought I'd follow up with a slightly better version of the monkey patch. If you need to be able to set different port options on some of the sockets or are using something like SSL that subclasses socket, the following code works a bit better.
import socket
from socket import socket as s

_ip_address = None

def bind_outgoing_sockets_to_ip(ip_address):
    """This binds all Python sockets to the passed-in IP address."""
    global _ip_address
    _ip_address = ip_address

class bound_socket(s):
    def connect(self, *args, **kwargs):
        if self.family == socket.AF_INET:
            if self.getsockname()[0] == "0.0.0.0" and _ip_address:
                self.bind((_ip_address, 0))
        s.connect(self, *args, **kwargs)

socket.socket = bound_socket
You only want to bind the socket at connect time (rather than at creation) if you need to run something like a web server in the same process that has to bind to a different IP address.
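For example (addresses purely illustrative), with the helper above installed, a listening socket that explicitly binds elsewhere is left alone, while outgoing client sockets pick up the configured source IP just before connecting:

bind_outgoing_sockets_to_ip("192.168.1.10")

# a server in the same process can still bind wherever it likes,
# because bound_socket only rebinds inside connect()
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 8080))
server.listen(5)

# an outgoing connection gets bound to 192.168.1.10 at connect time
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("example.com", 80))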

Reasoning that I should monkey-patch at the highest level available, here's an alternative to Alex's answer which patches httplib instead of socket, taking advantage of httplib.HTTPSConnection.__init__()'s source_address keyword argument (which is not exposed by urllib2, AFAICT). Tested and working on Python 2.7.2.
import httplib

HTTPSConnection_real = httplib.HTTPSConnection

class HTTPSConnection_monkey(HTTPSConnection_real):
    def __init__(*a, **kw):
        HTTPSConnection_real.__init__(*a, source_address=(SOURCE_IP, 0), **kw)

httplib.HTTPSConnection = HTTPSConnection_monkey

As of Python 2.7, httplib.HTTPConnection has a source_address argument, allowing you to provide an (IP, port) pair to bind to.
See: http://docs.python.org/2/library/httplib.html#httplib.HTTPConnection
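For example (host and local address purely illustrative), no patching is needed on 2.7+:

import httplib

# port 0 in source_address lets the OS pick an ephemeral local port
conn = httplib.HTTPConnection("example.com", 80, source_address=("192.168.1.10", 0))
conn.request("GET", "/")
print conn.getresponse().status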

Related

Monkey patching the socket library to use a specific network interface

I have been trying to make requests to a website using the requests library, but through different network interfaces. The following is a list of answers that I have tried to use, but they did not work.
This answer describes how to achieve what I want, but it uses pycurl. I could use pycurl but I have learned about this monkey patching thing and want to give it a try.
This other answer seemed to work at first, since it does not raise any error. However, I monitored my network traffic using Wireshark and the packets were sent from my default interface. I tried to print messages inside the function set_src_addr defined by the author of the answer, but the messages did not show up. Therefore, I think it is patching a function that is never called. I also get an HTTP 200 response, which should not happen since I have bound my socket to 127.0.0.1.
import socket

real_create_conn = socket.create_connection

def set_src_addr(*args):
    address, timeout = args[0], args[1]
    source_address = ('127.0.0.1', 0)
    return real_create_conn(address, timeout, source_address)

socket.create_connection = set_src_addr

import requests
r = requests.get('http://www.google.com')
r
<Response [200]>
I have also tried this answer. I can get two kinds of errors using this method:
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind(('127.0.0.1', 0))
    return sock

socket.socket = bound_socket

import requests
This does not allow me to create a socket and raises this error. I have also tried a variation of this answer, which looks like this:
import requests
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind(('192.168.0.10', 0))
    print(sock)
    return sock

socket.socket = bound_socket

r = requests.get('https://www.google.com')
This also does not work and raises this error.
I have the following problem: I want each process to send requests through a specific network interface. I thought that since threads share global memory (including libraries), I should change my code to work with processes. Now, I want to apply a monkey-patching solution somewhere, in a way that each process can use a different interface for communication. Am I missing something? Is this the best way to approach this problem?
Edit:
I also would like to know if it is possible for different processes to have different versions of the same library. If they are shared, how can I have different versions of a library in Python (one for each process)?
This appears to work for python3:
In [1]: import urllib3

In [2]: real_create_conn = urllib3.util.connection.create_connection

In [3]: def set_src_addr(address, timeout, *args, **kw):
   ...:     source_address = ('127.0.0.1', 0)
   ...:     return real_create_conn(address, timeout=timeout, source_address=source_address)
   ...:
   ...: urllib3.util.connection.create_connection = set_src_addr
   ...:
   ...: import requests
   ...: r = requests.get('http://httpbin.org')
It fails with the following exception:
ConnectionError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError("<urllib3.connection.HTTPConnection object at 0x10c4b89b0>: Failed to establish a new connection: [Errno 49] Can't assign requested address",))
I will document the solution I have found and list some problems I had in the process.
salparadise had it right. It is very similar to the first answer I had found. I am assuming that the requests module imports urllib3, and the latter has its own wrapper around the socket library. Therefore, it is very likely that the requests module never directly calls the socket library, but has that functionality provided by the urllib3 module.
I had not noticed it at first, but the third snippet in my question was actually working. The reason I got a ConnectionError was that I was trying to use a macvlan virtual interface on top of a wireless physical interface (which, if I understood correctly, drops packets when the MAC addresses do not match). Therefore, the following solution does work:
import requests
from socket import socket as backup
import socket

def socket_custom_src_ip(src_ip):
    original_socket = backup

    def bound_socket(*args, **kwargs):
        sock = original_socket(*args, **kwargs)
        sock.bind((src_ip, 0))
        print(sock)
        return sock

    return bound_socket
In my problem, I will need to change the IP address of a socket several times. One of the problems I had was that, if no backup of the original socket function is made, changing it several times causes a RecursionError: maximum recursion depth exceeded. This occurs because, on the second change, socket.socket is no longer the original function. Therefore, my solution above keeps a copy of the original socket function as a backup for further bindings to different IPs.
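For example (addresses purely illustrative), the backup makes it safe both to re-patch and to restore:

socket.socket = socket_custom_src_ip('192.168.0.10')
requests.get('https://www.google.com')

# re-patching still wraps the untouched original, so no recursion
socket.socket = socket_custom_src_ip('192.168.0.11')
requests.get('https://www.google.com')

socket.socket = backup  # restore the original when done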
Lastly, the following is a proof of concept of how to have multiple processes use different library instances. With this idea, I can import and monkey-patch the socket module inside each process, ending up with a different monkey-patched version of it per process.
import importlib
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, module):
        super().__init__()
        self.module = module

    def run(self):
        i = importlib.import_module(f'{self.module}')
        print(f'{i}')

p1 = MyProcess('os')
p2 = MyProcess('sys')

p1.start()
<module 'os' from '/usr/lib/python3.7/os.py'>
p2.start()
<module 'sys' (built-in)>
This also works using the import statement and the global keyword to provide transparent access inside all functions, as in the following:
import multiprocessing

def fun(self):
    import os
    global os
    os.var = f'{repr(self)}'
    fun2()

def fun2():
    print(os.system(f'echo "{os.var}"'))

class MyProcess(multiprocessing.Process):
    def __init__(self):
        super().__init__()

    def run(self):
        if 'os' in dir():
            print('os already imported')
        fun(self)

p1 = MyProcess()
p2 = MyProcess()

p2.start()
<MyProcess(MyProcess-2, started)>
p1.start()
<MyProcess(MyProcess-1, started)>
I faced a similar issue where I wanted some localhost traffic to originate from an address other than 127.0.0.1 (I was testing an https connection over localhost).
This is how I did it using the Python core libraries ssl and http.client (cf. docs), as it seemed cleaner than the solutions I found online using the requests library:
import json
import http.client as http
import ssl

dst = 'sever.infsec.local'  # dns record was added to OS
src = ('127.0.0.2', 0)      # 0 -> select available port

context = ssl.SSLContext()
context.load_default_certs()  # loads OS certificate context

request = http.HTTPSConnection(dst, 443, context=context,
                               source_address=src)
request.connect()
request.request("GET", '/', json.dumps(request_data))
response = request.getresponse()
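Reading the reply then works like any other http.client response (request_data above is whatever JSON payload you are sending):

print(response.status, response.reason)
print(response.read().decode())
request.close()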

Proper use of a client and Deferred with Twisted

I implemented a basic SOCKS4 client with socket, but my Twisted translation isn't coming along too well. Here's my current code:
import struct
import socket

from twisted.python.failure import Failure
from twisted.internet import reactor
from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol, ClientFactory

class Socks4Client(Protocol):

    VERSION = 4
    HOST = "0.0.0.0"
    PORT = 80
    REQUESTS = {
        "CONNECT": 1,
        "BIND": 2
    }
    RESPONSES = {
        90: "request granted",
        91: "request rejected or failed",
        92: "request rejected because SOCKS server cannot connect to identd on the client",
        93: "request rejected because the client program and identd report different user-ids"
    }

    def __init__(self):
        self.buffer = ""

    def connectionMade(self):
        self.connect(self.HOST, self.PORT)

    def dataReceived(self, data):
        self.buffer += data
        if len(self.buffer) == 8:
            self.validateResponse(self.buffer)

    def connect(self, host, port):
        data = struct.pack("!BBH", self.VERSION, self.REQUESTS["CONNECT"], port)
        data += socket.inet_aton(host)
        data += "\x00"
        self.transport.write(data)

    def validateResponse(self, data):
        version, result_code = struct.unpack("!BB", data[1:3])
        if version != 4:
            self.factory.protocolError(Exception("invalid version"))
        elif result_code == 90:
            self.factory.deferred.callback(self.RESPONSES[result_code])
        elif result_code in self.RESPONSES:
            self.factory.protocolError(Exception(self.RESPONSES[result_code]))
        else:
            self.factory.protocolError(Exception())
        self.transport.abortConnection()

class Socks4Factory(ClientFactory):

    protocol = Socks4Client

    def __init__(self, deferred):
        self.deferred = deferred

    def clientConnectionFailed(self, connector, reason):
        self.deferred.errback(reason)

    def clientConnectionLost(self, connector, reason):
        print "Connection lost:", reason

    def protocolError(self, reason):
        self.deferred.errback(reason)

def result(result):
    print "Success:", result

def error(reason):
    print "Error:", reason

if __name__ == "__main__":
    d = Deferred()
    d.addCallbacks(result, error)
    factory = Socks4Factory(d)
    reactor.connectTCP('127.0.0.1', 1080, factory)
    reactor.run()
I have a feeling that I'm abusing Deferred. Is this the right way to send results from my client?
I've read a few tutorials, looked at the documentation, and read through most of the protocols bundled with Twisted, but I still can't figure it out: what exactly is a ClientFactory for? Am I using it the right way?
clientConnectionLost gets triggered a lot. Sometimes I lose the connection and get a successful response. How is that so? What does this mean, and should I treat it as an error?
How do I make sure that my deferred calls only one callback/errback?
Any tips are appreciated.
I have a feeling that I'm abusing Deferred. Is this the right way to send results from my client?
It's not ideal, but it's not exactly wrong either. Generally, you should try to keep the code that instantiates a Deferred as close as possible to the code that calls Deferred.callback or Deferred.errback on that Deferred. In this case, those pieces of code are quite far apart - the former is in __main__ while the latter is in a class created by a class created by code in __main__. This is sort of like the law of Demeter - the more steps between these two things, the more tightly coupled, inflexible, and fragile the software.
Consider giving Socks4Client a method that creates and returns this Deferred instance. Then, try using an endpoint to setup the connection so you can more easily call this method:
from twisted.internet.endpoints import TCP4ClientEndpoint

endpoint = TCP4ClientEndpoint(reactor, "127.0.0.1", 1080)
d = endpoint.connect(factory)

def connected(protocol):
    return protocol.waitForWhatever()

d.addCallback(connected)
d.addCallbacks(result, error)
One thing to note here is that using an endpoint, the clientConnectionFailed and clientConnectionLost methods of your factory won't be called. The endpoint takes over the former responsibility (not the latter though).
I've read a few tutorials, looked at the documentation, and read through most of the protocols bundled with Twisted, but I still can't figure it out: what exactly is a ClientFactory for? Am I using it the right way?
It's for just what you're doing. :) It creates protocol instances to use with connections. A factory is required because you might create connections to many servers (or many connections to one server). However, a lot of people have trouble with ClientFactory so more recently introduced Twisted APIs don't rely on it. For example, you could also do your connection setup as:
from twisted.internet.endpoints import TCP4ClientEndpoint, connectProtocol

endpoint = TCP4ClientEndpoint(reactor, "127.0.0.1", 1080)
d = connectProtocol(endpoint, Socks4Client())
...
ClientFactory is now out of the picture.
clientConnectionLost gets triggered a lot. Sometimes I lose the connection and get a successful response. How is that so? What does this mean, and should I treat it as an error?
Every connection must eventually be lost. You have to decide on your own whether this is an error or not. If you have finished everything you wanted to do and you called loseConnection, it is probably not an error. Consider a connection to an HTTP server. If you have sent your request and received your response, then losing the connection is probably not a big deal. But if you have only received half the response, that's a problem.
How do I make sure that my deferred calls only one callback/errback?
If you structure your code as I described in response to your first question above, it becomes easier to do this. When the code that uses callback/errback on a Deferred is spread across large parts of your program, then it becomes harder to do this correctly.
It is just a matter of proper state tracking, though. Once you give a Deferred a result, you have to arrange to know that you shouldn't give it another one. A common idiom for this is to drop the reference to the Deferred. For example, if you are saving it as the value of an attribute on a protocol instance, then set that attribute to None when you have given the Deferred its result.
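For instance, a minimal sketch of that idiom on the protocol from the question (the method names here are only illustrative, not part of the original code):

from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol

class Socks4Client(Protocol):
    def __init__(self):
        self.deferred = Deferred()

    def resultReceived(self, result):
        # fire the Deferred exactly once, then drop the reference
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.callback(result)

    def errorReceived(self, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)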

How to achieve tcpflow functionality (follow tcp stream) purely within python

I am writing a tool in python (platform is linux), one of the tasks is to capture a live tcp stream and to
apply a function to each line. Currently I'm using
import subprocess

proc = subprocess.Popen(['sudo', 'tcpflow', '-C', '-i', interface, '-p', 'src', 'host', ip],
                        stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, ''):
    do_something(line)
This works quite well (with the appropriate entry in /etc/sudoers), but I would like to avoid calling an external program.
So far I have looked into the following possibilities:
flowgrep: a python tool which looks just like what I need, BUT: it uses pynids internally, which is 7 years old and seems pretty much abandoned. There is no pynids package for my gentoo system, and it ships with a patched version of libnids which I couldn't compile without further tweaking.
scapy: this is a packet manipulation program/library for python; I'm not sure if tcp stream reassembly is supported.
pypcap or pylibpcap as wrappers for libpcap. Again, libpcap is for packet capturing, where I need stream reassembly, which is not possible according to this question.
Before I dive deeper into any of these libraries I would like to know if maybe someone
has a working code snippet (this seems like a rather common problem). I'm also grateful if
someone can give advice about the right way to go.
Thanks
Jon Oberheide has led efforts to maintain pynids, which is fairly up to date at:
http://jon.oberheide.org/pynids/
So, this might permit you to further explore flowgrep. Pynids itself handles stream reconstruction rather elegantly. See http://monkey.org/~jose/presentations/pysniff04.d/ for some good examples.
Just as a follow-up: I abandoned the idea to monitor the stream on the tcp layer. Instead I wrote a proxy in python and let the connection I want to monitor (a http session) connect through this proxy. The result is more stable and does not need root privileges to run. This solution depends on pymiproxy.
This goes into a standalone program, e.g. helper_proxy.py
from multiprocessing.connection import Listener
import StringIO
from httplib import HTTPResponse
import threading
import time
from miproxy.proxy import RequestInterceptorPlugin, ResponseInterceptorPlugin, AsyncMitmProxy

class FakeSocket(StringIO.StringIO):
    def makefile(self, *args, **kw):
        return self

class Interceptor(RequestInterceptorPlugin, ResponseInterceptorPlugin):
    conn = None

    def do_request(self, data):
        # do whatever you need to send data here, I'm only interested in responses
        return data

    def do_response(self, data):
        if Interceptor.conn:  # if the listener is connected, send the response to it
            response = HTTPResponse(FakeSocket(data))
            response.begin()
            Interceptor.conn.send(response.read())
        return data

def main():
    proxy = AsyncMitmProxy()
    proxy.register_interceptor(Interceptor)
    ProxyThread = threading.Thread(target=proxy.serve_forever)
    ProxyThread.daemon = True
    ProxyThread.start()
    print "Proxy started."
    address = ('localhost', 6000)  # family is deduced to be 'AF_INET'
    listener = Listener(address, authkey='some_secret_password')
    while True:
        Interceptor.conn = listener.accept()
        print "Accepted Connection from", listener.last_accepted
        try:
            Interceptor.conn.recv()
        except:
            time.sleep(1)
        finally:
            Interceptor.conn.close()

if __name__ == '__main__':
    main()
Start with python helper_proxy.py. This will create a proxy listening for http connections on port 8080 and listening for another python program on port 6000. Once the other python program has connected on that port, the helper proxy will send all http replies to it. This way the helper proxy can continue to run, keeping up the http connection, and the listener can be restarted for debugging.
Here is how the listener works, e.g. listener.py:
from multiprocessing.connection import Client

def main():
    address = ('localhost', 6000)
    conn = Client(address, authkey='some_secret_password')
    while True:
        print conn.recv()

if __name__ == '__main__':
    main()
This will just print all the replies. Now point your browser to the proxy running on port 8080 and establish the http connection you want to monitor.

How can I get the IP address of the request in a registered function of a Python XML-RPC server

I'm writing a simple xmlrpc program in Python, something like the following:
import SimpleXMLRPCServer

def foo(data):
    # I want to get the calling client's IP address here... How can I?
    pass

server = SimpleXMLRPCServer.SimpleXMLRPCServer((host, port))
server.register_function(foo)
server.handle_request()
As can be seen above, I want to get the client's IP address in the registered function "foo". How can I?
You may do so by subclassing the server (and possibly the handler, too). E.g.:
class MyXMLRPCServer(SimpleXMLRPCServer.SimpleXMLRPCServer):
    def process_request(self, request, client_address):
        self.client_address = client_address
        return SimpleXMLRPCServer.SimpleXMLRPCServer.process_request(
            self, request, client_address)

server = MyXMLRPCServer((host, port))
Now server.client_address gives you the desired data. Note that this direct, short coding only works for the single-threaded case (which you're using anyway by choosing the simple server in your code) -- the need to work with the handler comes in if you want to go multi-threaded.
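For example, a minimal sketch putting the two pieces together (host and port purely illustrative); the registered function simply reads the attribute saved by process_request:

def foo(data):
    # client_address is the (host, port) tuple saved by process_request above
    client_ip, client_port = server.client_address
    return "request from %s:%s" % (client_ip, client_port)

server = MyXMLRPCServer(("localhost", 8000))
server.register_function(foo)
server.handle_request()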

How to bind an ip address to telnetlib in Python

The code below binds an ip address to urllib, urllib2, etc.
import socket

true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))
    return sock

socket.socket = bound_socket
Is it also able to bind an ip address to telnetlib?
telnetlib at least in recent Python releases uses socket.create_connection (see telnetlib's sources here), but that should also be caught by your monkeypatch (sources here -- you'll see it uses a bare identifier socket, but that's exactly in the module you're monkeypatching). Of course monkeypatching is always extremely fragile (the tiniest optimization in some future release, hoisting the global lookup of socket in create_connection, and you're toast...;-) so maybe you'll want to monkeypatch create_connection directly as a modestly-stronger approach.
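A minimal sketch of that modestly-stronger approach, assuming Python 2.7+ (where create_connection accepts a source_address parameter) and reusing the sourceIP variable from your snippet:

import socket
import telnetlib

real_create_connection = socket.create_connection

def bound_create_connection(address, *args, **kwargs):
    # inject our source address unless the caller already supplied one
    if len(args) < 2 and 'source_address' not in kwargs:
        kwargs['source_address'] = (sourceIP, 0)
    return real_create_connection(address, *args, **kwargs)

socket.create_connection = bound_create_connection

tn = telnetlib.Telnet("example.com")  # now connects from sourceIP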
