Using twisted to selectively reverse proxy to different servers


I'm currently using Twisted (well, twistd actually) to serve content like this:
twistd -n -o web --path=./foo/
That's fine, but I want to send some requests to another server, like this.
When the client requests
localhost/something.html
I want the request to be handled by the twistd server.
But when the client requests
localhost/api/somedata
I want the request to be reverse proxied to another server.
So, in summary, if the URL contains the string "api", I want the request reverse proxied elsewhere.
I can see that Twisted has a built-in reverse proxy, but I don't know how to use it to filter requests so that some are sent to the alternative server and others are not.

ReverseProxyResource is a resource, so you can place it in a resource hierarchy:
from twisted.web.proxy import ReverseProxyResource
from twisted.web.resource import Resource
root = Resource()
root.putChild(b"something.html", SomethingHTML())  # SomethingHTML stands for your own resource; child names are bytes on Python 3
root.putChild(b"api", ReverseProxyResource(...))
This is just one example of how to arrange the resource hierarchy. You can combine ReverseProxyResource with other resources in any of the ways supported by IResource.

Related

Python requests being fingerprinted?

I'm hacking together an Amazon API, and when I use only Python requests without proxying, it prompts for a captcha. When I route the same Python requests traffic through Fiddler, it seems to pass without a problem. Is it possible that Amazon is fingerprinting Python requests, and Fiddler changes the fingerprint since it's a proxy?
I viewed the headers sent from Fiddler and from Python requests, and they are the same.
There are no extra proxying rules or filters set in Fiddler that would create a change.
To be clear, all the proxying mentioned is done locally, so it does not change the public IP address.
Thank you!

The reason is that websites fingerprint your requests using the TLS ClientHello message. Techniques such as JA3 generate a fingerprint for each connection, and sites can use it to deliberately block HTTP clients like requests or urllib. If you use a MITM proxy, the proxy creates its own TLS connection to the server, so the server only sees the proxy's fingerprint and does not block it.
If the server only blocks certain popular HTTP libraries, you can simply change the TLS version; your fingerprint will then differ from the library's default.
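As a rough sketch of "changing the TLS version", here is how it could look with only the standard library (ssl and urllib.request rather than requests, which is a substitution on my part). Pinning the context to TLS 1.2 changes what the client offers in its ClientHello, so the fingerprint should differ from the default one. Whether that is enough for a particular site is not something this guarantees. The helper names are made up for the example.

```python
import ssl
import urllib.request


def make_tls12_context():
    """Build a certificate-verifying TLS context that only offers TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch(url, timeout=10.0):
    """GET url through the pinned context and return the body as bytes."""
    ctx = make_tls12_context()
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()
```

Certificate verification stays on; only the offered protocol versions change.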
If the server only accepts requests that look like they come from popular real-world browsers, you will need a library that can simulate browser fingerprints. One of these is curl-impersonate, with its Python binding curl_cffi.
pip install curl_cffi
from curl_cffi import requests
# Notice the impersonate parameter
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome101")
print(r.json())
# output: {'ja3_hash': '53ff64ddf993ca882b70e1c82af5da49'
# the fingerprint should be the same as target browser

Pyramid subrequests

I need to call GET, POST, PUT, etc. requests to another URI because of search, but I cannot find a way to do that internally with pyramid. Is there any way to do it at the moment?

Simply use the existing Python libraries for calling other web servers.
On Python 2.x, use urllib2; on Python 3.x, use urllib.request instead. Alternatively, you could install requests.
Do note that calling external sites from your server while serving a request yourself could mean your visitors end up waiting for a third-party web server that has stopped responding. Make sure you set sensible timeouts.
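A minimal sketch of the timeout advice using urllib.request on Python 3. The function name and the 5-second default are assumptions for illustration; pick a value that suits how long your own visitors can reasonably wait.

```python
import socket
import urllib.error
import urllib.request


def call_upstream(url, timeout=5.0):
    """Return the response body as bytes, or None if the upstream fails or is too slow."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # Covers refused connections, DNS failures, HTTP error statuses
        # (HTTPError is a URLError) and timeouts while connecting or reading.
        return None
```

Returning None lets the view decide how to degrade (for example, render the page without search results) instead of hanging on the third-party server.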

Pyramid uses WebOb, which has had a client API since version 1.2:
from webob import Request
r = Request.blank("http://google.com")
response = r.send()
Generally, anything you want to override for the request can simply be passed in as a parameter:
from webob import Request
r = Request.blank("http://facebook.com", method="DELETE")
Another handy feature is that you can see the request as the HTTP that is passed over the wire:
print(r)
DELETE HTTP/1.0
Host: facebook.com:80

Also check the response status code: response.status_int
I use it, for example, to introspect my internal URIs and see whether a given relative URI is really served by the framework (for instance, to generate breadcrumbs and turn intermediate paths into links only if there are pages behind them).

Pylons 0.9.6 Get Current Server Name

In my Pylons config file, I have:
[server:main1]
port = 9090
...config here...
[server:main2]
port = 9091
...config here...
These are run using:
paster serve --server-name=main1 ...(more stuff)...
paster serve --server-name=main2 ...(more stuff)...
Now, using HAProxy and stunnel, I have all HTTP requests going to main1 and all HTTPS requests going to main2. I would like some of my controllers to react a little differently depending on whether they are requested over HTTP or HTTPS, but pylons.request.scheme always reports http, even when the request came in over HTTPS.
Since I know that main2 is always the one handling HTTPS requests, is there a way for the controller to determine which server name it was run under, or what its ID is?

I got around this by just changing the workflow to not have to react differently based on what protocol it's under. It doesn't look like there's a way to pass a unique arbitrary identifier to each separate process that it can read.

How to make very simple http proxy using werkzeug or other python requests framework?

Is it possible to set up a listener on, say, port 9090, add a header such as Host: test.host to each request coming in on 9090, and send it on to, say, port 8080?
Thanks
EDIT: I went with a reverse-proxy for now, applying the hostname:port to any request that comes in.

Twisted has an implementation of a reverse proxy that you could modify to suit your needs. You can look at the examples here. If you look at the source code of twisted.web.proxy, you can see that the 'Host:' header is set in ReverseProxyRequest.process, so you could subclass it and set your own header.

Unless you need to tailor the proxied request based on parameters that only your web application can know (for example, you need to authenticate the proxied request with your webapp's custom authentication system), you should use your web server's proxy capabilities.
Example with Apache:
Listen 0.0.0.0:9090
ProxyRequests off
<VirtualHost myhost:9090>
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
ProxyPassReverseCookieDomain localhost myhost
</VirtualHost>
If you have to proxy things in a Flask or Werkzeug application, you can use httplib (http.client on Python 3), creating requests based on the incoming request data and returning the response, either raw or modified (e.g. for link rewriting). It's doable; I have one such proxy in use where there was no good alternative. If you do that, I recommend against using regular expressions to rewrite HTML links. I used PyQuery instead, which makes it far easier to get right.
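For the in-application route, here is a minimal sketch using only the standard library (a plain WSGI app with http.client, rather than Flask or Werkzeug). The ports 9090 and 8080 and the Host value test.host come from the question; the function names, the localhost backend and the timeout are assumptions. It buffers whole bodies in memory and does no link rewriting, so treat it as a starting point rather than a finished proxy.

```python
import http.client
from urllib.parse import quote
from wsgiref.simple_server import make_server
from wsgiref.util import is_hop_by_hop


def make_proxy_app(backend_host, backend_port, host_header, timeout=10.0):
    """Return a WSGI app that forwards each request with a fixed Host header."""

    def app(environ, start_response):
        # PATH_INFO arrives percent-decoded (as latin-1), so re-encode it.
        path = quote(environ.get("PATH_INFO") or "/", encoding="latin-1")
        if environ.get("QUERY_STRING"):
            path += "?" + environ["QUERY_STRING"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length) if length else None

        # Copy the client's headers, replacing Host with the fixed value.
        headers = {"Host": host_header}
        for key, value in environ.items():
            if key.startswith("HTTP_") and key != "HTTP_HOST":
                name = key[5:].replace("_", "-").title()
                if not is_hop_by_hop(name):
                    headers[name] = value
        if environ.get("CONTENT_TYPE"):
            headers["Content-Type"] = environ["CONTENT_TYPE"]

        conn = http.client.HTTPConnection(backend_host, backend_port, timeout=timeout)
        try:
            conn.request(environ["REQUEST_METHOD"], path, body=body, headers=headers)
            resp = conn.getresponse()
            payload = resp.read()
            # Hop-by-hop headers describe one connection and must not be relayed.
            out = [(k, v) for k, v in resp.getheaders() if not is_hop_by_hop(k)]
            status = "%d %s" % (resp.status, resp.reason)
        finally:
            conn.close()
        start_response(status, out)
        return [payload]

    return app


def main():
    # Listen on 9090 and forward to 8080 with Host: test.host.
    app = make_proxy_app("localhost", 8080, "test.host")
    make_server("", 9090, app).serve_forever()
```

Calling main() starts the listener. The same app object can be mounted under any WSGI server if wsgiref's single-threaded development server is too limited.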

Routes in bottle which proxy to another server

I have a bottle.py application which has a number of routes already built. I would like to create a new get route which, when accessed, passes the request along to another HTTP server and relays the result back.
What is the simplest way to get that done?

In principle, all you need is to install the wsgiproxy module and do this:
import bottle
from wsgiproxy.app import WSGIProxyApp
root = bottle.Bottle()
proxy_app = WSGIProxyApp("http://localhost/")
root.mount(proxy_app,"/proxytest")
Running this app will then proxy all requests under /proxytest to the server running on localhost:80. In practice, I found this didn't work without taking extra steps to remove hop-by-hop headers. I took the code in this gist and stripped it down to make a simple app that successfully proxies the request.
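To illustrate the hop-by-hop step, here is a small sketch of the filtering itself, using wsgiref.util.is_hop_by_hop from the standard library. This is not the code from the gist; strip_hop_by_hop is a hypothetical helper name, and a proxy would apply it to the upstream response headers before passing them on.

```python
from wsgiref.util import is_hop_by_hop


def strip_hop_by_hop(headers):
    """Return the (name, value) pairs that are safe to relay through a proxy."""
    # Headers listed in Connection are hop-by-hop for this connection as well.
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            named.update(token.strip().lower() for token in value.split(","))
    return [
        (name, value)
        for name, value in headers
        if not is_hop_by_hop(name) and name.lower() not in named
    ]
```

For example, given Content-Type, Connection: close and Transfer-Encoding: chunked, only the Content-Type pair is kept.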
