I'm fairly new to twisted, and trying to utilize twisted.web.proxy.ReverseProxyResource to create a reverse proxy. Ultimately I want clients to connect to it using SSL, then I'll validate the request, and pass it only to an SSL backend server.
I'm starting out with the below (very) basic code, but struggling to get it to connect to an SSL backend, and am finding the documentation lacking. Would anyone be able to give me some good pointers, or ideally some example code?
In the code below it obviously won't work because its expecting to hit a plain HTTP server, how would I 'ssl' this?
As always any help is very, very, much appreciated all.
Thanks
Alex
from twisted.internet import reactor
from twisted.web import proxy, server
from twisted.web.resource import Resource
class Simple(Resource):
isLeaf = False
def getChild(self, name, request):
print "getChild called with name:'%s'" % name
#host = request.getAllHeaders()['host']
host = "127.0.0.1" #yes there is an SSL host listening here
return proxy.ReverseProxyResource(host, 443, "/"+name)
simple = Simple()
site = server.Site(simple)
reactor.listenTCP(8000, site)
reactor.run()
ReverseProxyResource does not support TLS. When you write ReverseProxyResource(host, 443, "/"+name) you're creating a resource which will establish a normal TCP connection to host on port 443. The TCP connection attempt will succeed but the TLS handshake will definitely fail - because the client won't even attempt one.
This is a limitation of the current ReverseProxyResource: it doesn't support the feature you want. It's somewhat likely that this feature could be implemented fairly easily. Since ReverseProxyResource was implemented, Twisted has introduced the concept of "endpoints" which make it much easier to write code that is transport-agnostic.
ReverseProxyResource could be updated to work in terms of "endpoints" (preserving backwards compatibility with the current API, though, required by Twisted). This doesn't complicate the implementation much (it may actually simplify it) and would allow you to proxy over any kind of transport for which an endpoint implementation exists (there is one for TLS, there are also many more kinds).
Related
Neither poplib or imaplib seem to offer proxy support and I couldn't find much info about it despite my google-fu attempts.
I'm using python to fetch emails from various imap/pop enabled servers and need to be able to do it through proxies.
Ideally, I'd like to be able to do it in python directly but using a wrapper (external program/script, OSX based) to force all traffic to go through the proxy might be enough if I can't find anything better.
Could anyone give me a hand? I can't imagine I'm the only one who ever needed to fetch emails through a proxy in python...
** EDIT Title edit to remove HTTP, because I shouldn't type so fast when I'm tired, sorry for that guys **
The proxies I'm planning to use allow socks in addition to http.
Pop or Imap work through http wouldn't make much sense (stateful vs stateless) but my understanding is that socks would allow me to do what I want.
So far the only way to achieve what I want seems to be dirty hacking of imaplib... would rather avoid it if I can.
You don't need to dirtily hack imaplib. You could try using the SocksiPy package, which supports socks4, socks5 and http proxy (connect):
Something like this, obviously you'd want to handle the setproxy options better, via extra arguments to a custom __init__ method, etc.
from imaplib import IMAP4, IMAP4_SSL, IMAP4_PORT, IMAP4_SSL_PORT
from socks import sockssocket, PROXY_TYPE_SOCKS4, PROXY_TYPE_SOCKS5, PROXY_TYPE_HTTP
class SocksIMAP4(IMAP4):
def open(self,host,port=IMAP4_PORT):
self.host = host
self.port = port
self.sock = sockssocket()
self.sock.setproxy(PROXY_TYPE_SOCKS5,'socks.example.com')
self.sock.connect((host,port))
self.file = self.sock.makefile('rb')
You could do similar with IMAP4_SSL. Just take care to wrap it into an ssl socket
import ssl
class SocksIMAP4SSL(IMAP4_SSL):
def open(self, host, port=IMAP4_SSL_PORT):
self.host = host
self.port = port
#actual privoxy default setting, but as said, you may want to parameterize it
self.sock = create_connection((host, port), PROXY_TYPE_HTTP, "127.0.0.1", 8118)
self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
self.file = self.sslobj.makefile('rb')
Answer to my own question...
There's a quick and dirty way to force trafic from a python script to go through a proxy without hassle using Socksipy (thanks MattH for pointing me that way)
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4,proxy_ip,port,True)
socket.socket = socks.socksocket
That global socket override is obviously a bit brutal, but works as a quick fix till I find the time to properly subclass IMAP4 and IMAP4_SSL.
If I understand you correctly you're trying to put a square peg in a round hole.
An HTTP Proxy only knows how to "talk" HTTP so can't connect to a POP or IMAP server directly.
If you want to do this you'll need to implement your own server somewhere to talk to the mail servers. It would receive HTTP Requests and then make the appropriate calls to the Mail Server. E.g.:
How practical this would be I don't know since you'd have to convert a stateful protocol into a stateless one.
Neither poplib or imaplib seem to offer proxy support and I couldn't find much info about it despite my google-fu attempts.
I'm using python to fetch emails from various imap/pop enabled servers and need to be able to do it through proxies.
Ideally, I'd like to be able to do it in python directly but using a wrapper (external program/script, OSX based) to force all traffic to go through the proxy might be enough if I can't find anything better.
Could anyone give me a hand? I can't imagine I'm the only one who ever needed to fetch emails through a proxy in python...
** EDIT Title edit to remove HTTP, because I shouldn't type so fast when I'm tired, sorry for that guys **
The proxies I'm planning to use allow socks in addition to http.
Pop or Imap work through http wouldn't make much sense (stateful vs stateless) but my understanding is that socks would allow me to do what I want.
So far the only way to achieve what I want seems to be dirty hacking of imaplib... would rather avoid it if I can.
You don't need to dirtily hack imaplib. You could try using the SocksiPy package, which supports socks4, socks5 and http proxy (connect):
Something like this, obviously you'd want to handle the setproxy options better, via extra arguments to a custom __init__ method, etc.
from imaplib import IMAP4, IMAP4_SSL, IMAP4_PORT, IMAP4_SSL_PORT
from socks import sockssocket, PROXY_TYPE_SOCKS4, PROXY_TYPE_SOCKS5, PROXY_TYPE_HTTP
class SocksIMAP4(IMAP4):
def open(self,host,port=IMAP4_PORT):
self.host = host
self.port = port
self.sock = sockssocket()
self.sock.setproxy(PROXY_TYPE_SOCKS5,'socks.example.com')
self.sock.connect((host,port))
self.file = self.sock.makefile('rb')
You could do similar with IMAP4_SSL. Just take care to wrap it into an ssl socket
import ssl
class SocksIMAP4SSL(IMAP4_SSL):
def open(self, host, port=IMAP4_SSL_PORT):
self.host = host
self.port = port
#actual privoxy default setting, but as said, you may want to parameterize it
self.sock = create_connection((host, port), PROXY_TYPE_HTTP, "127.0.0.1", 8118)
self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
self.file = self.sslobj.makefile('rb')
Answer to my own question...
There's a quick and dirty way to force trafic from a python script to go through a proxy without hassle using Socksipy (thanks MattH for pointing me that way)
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4,proxy_ip,port,True)
socket.socket = socks.socksocket
That global socket override is obviously a bit brutal, but works as a quick fix till I find the time to properly subclass IMAP4 and IMAP4_SSL.
If I understand you correctly you're trying to put a square peg in a round hole.
An HTTP Proxy only knows how to "talk" HTTP so can't connect to a POP or IMAP server directly.
If you want to do this you'll need to implement your own server somewhere to talk to the mail servers. It would receive HTTP Requests and then make the appropriate calls to the Mail Server. E.g.:
How practical this would be I don't know since you'd have to convert a stateful protocol into a stateless one.
What is Bottle doing in its wsgiref server implementation that the built in Python WSGIref simple server is not? When I look at Bottle, for example, it adheres to the WSGI standard and the documentation states:
1.5.1 Server Options The built-in default server is based on wsgiref WSGIServer. This non-threading HTTP server is perfectly fine for
development and early production, but may become a performance
bottleneck when server load increases.
There are three ways to eliminate this bottleneck:
• Use a different server that is either multi-threaded or asynchronous.
• Start multiple server processes and spread the load with a load-balancer.
• Do both
[emphasis mine]
Yet, everything I have read says to not use the Python wsgrief server for anything production.
What does Bottle do with wsgrief that the built in Python wsgiref does not? I'm not really questioning the wisdom of using asynch servers or "bigger" more "scalable" WSGI servers. But, I'd like to know what Bottle is doing with the wsgiref server that makes it okay for "early Production," the regular library does not.
My application would serve less than 20 people hitting a PostgreSQL or MySQL database, CRUD operations. I guess you could ask a similar question with Flask.
For reference,
http://bottlepy.org/docs/dev/bottle-docs.pdf [pdf]
https://docs.python.org/2/library/wsgiref.html#module-wsgiref.simple_server
https://github.com/bottlepy/bottle/blob/master/bottle.py
This is Bottle's implementation, at least for opening the port:
class WSGIRefServer(ServerAdapter):
def run(self, app): # pragma: no cover
from wsgiref.simple_server import make_server
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer
import socket
class FixedHandler(WSGIRequestHandler):
def address_string(self): # Prevent reverse DNS lookups please.
return self.client_address[0]
def log_request(*args, **kw):
if not self.quiet:
return WSGIRequestHandler.log_request(*args, **kw)
handler_cls = self.options.get('handler_class', FixedHandler)
server_cls = self.options.get('server_class', WSGIServer)
if ':' in self.host: # Fix wsgiref for IPv6 addresses.
if getattr(server_cls, 'address_family') == socket.AF_INET:
class server_cls(server_cls):
address_family = socket.AF_INET6
self.srv = make_server(self.host, self.port, app, server_cls,
handler_cls)
self.port = self.srv.server_port # update port actual port (0 means random)
try:
self.srv.serve_forever()
except KeyboardInterrupt:
self.srv.server_close() # Prevent ResourceWarning: unclosed socket
raise
EDIT:
What is Bottle doing in its wsgiref server implementation that the built in Python WSGIref simple server is not?
What does Bottle do with wsgrief that the built in Python wsgiref does not?
Nothing (of substance).
Not sure I understand your question, but I'll take a stab at helping.
The reason for my confusion is: the code snippet you posted precisely answers [what I think is] your question. Bottle's WSGIRefServer class does nothing substantial except wrap wsgiref.simple_server. (I'm calling the logging and the IPv6 tweaks unsubstantial because they're not related to "production-readiness," which I gather is at the heart of your question.)
Is it possible that you misinterpreted the docs? I'm thinking perhaps yes, because you say:
I'd like to know what Bottle is doing with the wsgiref server that makes it okay for "early Production," the regular library does not.
but the Bottle docs are making the point that Bottle's WSGIRefServer should not be used to handle high throughput loads.
In other words, WSGIRefServer is the same as wsgiref, whereas I think you interpreted the docs as saying that the former is somehow improved over the latter. (It's not.)
Hope this helps!
Not sure if this is the right title for my problem, but here it goes:
I am currently implementing a Distributed Hash Table (DHT) with an API which can be contacted through TCP. It can serve multiple API calls like PUT, GET, Trace, while listening on multiple IP/Port combinations like this:
factory = protocol.ServerFactory()
factory.protocol = DHTServer
for ip in interfaces:
for port in ports:
reactor.listenTCP(int(port), factory, interface=ip)
print ("Listening to: "+ ip +" on Port: "+port)
reactor.run()
Now those "external" API calls are going to be executed by the underlying DHT implementation (Kademlia, Chord or Pastry). Those underlying DHT implementations are using different protocols to communicate with one another. Kademlia for example uses RPC through UDP.
The protocol for the TCP API (DHTServer in the Code above) has an internal DHT protocol like this:
self.protocol = Kademlia(8088, [("192.168.2.1", 8088)])
Now if a client makes two seperate API requests after one another i get this error message on the second request:
line 197, in _bindSocket
raise error.CannotListenError(self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on any:8088: [Errno 10
048] Normalerweise darf jede Socketadresse (Protokoll, Netzwerkadresse oder Ansc
hluss) nur jeweils einmal verwendet werden.
Which basically says that each socket address is only to be used once. I am not quite sure, but i guess it is because for each API request a new DHTServer protocol instance is created, which in turn also creates a new Kademlia instance and both are trying to listen on the same address. But why is this the case? Shouldn't the first DHTServer protocol instance be destroyed after the first request is served? What am i doing wrong? Is there a better way of doing this? I only recently started working with twisted, so please be patient.
Thanks a lot!
I don't know anything about twisted, but kademlia is a stateful network service, having to maintain its routing table and all that.
Consider sharing a single kademlia instance (and thus underlying UDP socket) across your requests.
My solution to this was to write my own Factory with the inner protocol already pre-defined. Thus i can access it from every instance and it stays the same.
What would be the best method to restrict access to my XMLRPC server by IP address? I see the class CGIScript in web/twcgi.py has a render method that is accessing the request... but I am not sure how to gain access to this request in my server. I saw an example where someone patched twcgi.py to set environment variables and then in the server access the environment variables... but I figure there has to be a better solution.
Thanks.
When a connection is established, a factory's buildProtocol is called to create a new protocol instance to handle that connection. buildProtocol is passed the address of the peer which established the connection and buildProtocol may return None to have the connection closed immediately.
So, for example, you can write a factory like this:
from twisted.internet.protocol import ServerFactory
class LocalOnlyFactory(ServerFactory):
def buildProtocol(self, addr):
if addr.host == "127.0.0.1":
return ServerFactory.buildProtocol(self, addr)
return None
And only local connections will be handled (but all connections will still be accepted initially since you must accept them to learn what the peer address is).
You can apply this to the factory you're using to serve XML-RPC resources. Just subclass that factory and add logic like this (or you can do a wrapper instead of a subclass).
iptables or some other platform firewall is also a good idea for some cases, though. With that approach, your process never even has to see the connection attempt.
Okay, another answer is to get the ip address from the transport, inside any protocol:
d = self.transport.getHost () ; print d.type, d.host, d.port
Then use the value to filter it in any way you want.
I'd use a firewall on windows, or iptables on linux.