Conditional upstream proxying with mitmproxy (PAC equivalent module/script) - python

I have a special proxy I need to use to access certain hosts (it turns all other traffic away), and a bunch of complex libraries and applications that can only take a single HTTP proxy configuration parameter for all their HTTP requests. Those requests are, of course, a mix of restricted/proxied traffic and traffic that this proxy refuses to handle.
I've found an example script showing how to change the upstream proxy host/address in upstream mode, but I couldn't find any indication in the public API that "breaking out" of upstream mode from a script is possible, i.e. having mitmproxy handle traffic directly instead of sending it upstream when certain conditions are met (mostly the request's target host).
What am I missing? Should I be trying to do this in "regular" mode?
I invoke PAC in the title because it has the DIRECT keyword, which lets the library/application send the request directly without going through a proxy.
thanks!

I've found evidence that this is in fact not possible and unlikely to be implemented: https://github.com/mitmproxy/mitmproxy/issues/2042#issuecomment-280857954. Although that issue and comment are quite old, there are some recent, related and unanswered questions such as "How can I switch mitmproxy mode based on attributes of the proxied request".
So instead, I'm pivoting to tinyproxy, which does seem to provide this exact functionality: https://github.com/tinyproxy/tinyproxy/blob/1.10.0/etc/tinyproxy.conf.in#L143 (see the config sketch below).
A shame, because the replay/monitoring/interactive editing features of mitmproxy would've been amazing to have.
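For reference, the relevant part of such a tinyproxy configuration could look roughly like this. The proxy address and domains are placeholders, and the upstream rule syntax changed between tinyproxy 1.8 and 1.10, so check the linked example config for your version:
# Hosts the special proxy will accept are routed through it (placeholders):
upstream http specialproxy.internal:3128 ".restricted.example.com"
upstream http specialproxy.internal:3128 "192.0.2.0/255.255.255.0"
# With no catch-all "upstream http host:port" line, every request that does not
# match a rule above is handled by tinyproxy directly, i.e. the PAC "DIRECT" behaviour.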

Related

Building an HTTP server

So I need to build an HTTP server that will contact a client and send it data like pictures or calculations, and create a page with those things. I guess you understood that I do not really know what I'm doing... :(
I know Python and the basics(+) of client-server programming, but I don't understand the HTTP protocol and didn't understand anything from what I read on the internet...
Can anyone explain to me how to work with this protocol? What is the form of HTTP packets?
Here is an example of one problem I don't understand: I have been asked to get a packet (which I did) and figure out what the request in it is, then send back the name of the file the client wants followed by the file itself. I printed the packet and didn't understand where the request is or what the client wants...
Thank you very very much!
Can anyone explain to me how to work with this protocol? What is the form of HTTP packets?
The specification might be helpful.
Concerning the web in general, you'll find a lot of the relevant specifications in the RFCs.
More on HTTP below.
(Since you seem to be new to programming, I figured I might want to tell you the following:)
Usually one doesn't directly interact with HTTP(S) packets. Instead you use a framework, such as Flask, Django, aiohttp and many more. The choice of framework depends on the use case. E.g.:
You need a database, authentication and any imaginable feature? Go with Django.
You just want to create a web application without a bloated framework? Go with Flask.
You need the bare minimum or want to act as a client? Go with aiohttp.
More frameworks are listed here.
The advantage of using such frameworks is that they usually include useful things that are battle-tested (i.e. usually free of bugs), while you don't have to figure out the peculiarities of certain protocols.
You just import the framework and write awesomeness! :)
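To give a feel for how little code a framework needs, here is a minimal Flask sketch (assuming Flask is installed, e.g. via pip install flask; the route and port are arbitrary):
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Flask does all the HTTP parsing and response formatting for us.
    return "Hello, world!"

if __name__ == "__main__":
    app.run(port=8080)  # serves http://127.0.0.1:8080/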
(Anyway, here is a very oversimplified overview, for completeness.)
So, HTTP is a text protocol over TCP, which basically means that you send text over a simple TCP socket. When you receive a request you have to "parse" it (i.e. comprehend its contents). Luckily for us, requests are standardized and all follow the same scheme.
The smallest request would look like this:
GET / HTTP/1.0
Host: www.server.com
The first line starts with a verb (also called the request method); in our example the verb is GET. The / denotes the path; think of file paths on your HDD. The last part of the first line, namely HTTP/1.0, tells the receiver which version of HTTP we are using. The classic versions are HTTP/1.0 and HTTP/1.1 (newer ones exist as well); however, I wouldn't bother with HTTP/1.1 yet and would stick with HTTP/1.0 if you're implementing the requests yourself.
Lastly, the Host: www.server.com line tells us which server we want to talk to, since multiple HTTP servers could be running under the same IP address. This is used to resolve the correct (sub)domain / virtual host.
If you send this request to an HTTP server, you're likely to receive a response like this:
HTTP/1.0 200 OK
Server: Apache/1.3.29 (Unix) PHP/4.3.4
Content-Length: 1337
Connection: close
Content-Type: text/html
<DATA>
This response contains the status in the first line, HTTP/1.0 200 OK. The number and the 'OK' represent a status code, telling us that everything is fine. There are many status codes, each with its own meaning and usage.
The lines following the first are so-called response headers. They provide additional useful information about the response. For instance, when we open a site like 'stackoverflow.com', the server transmits an HTML file to us for the browser to interpret. Before we can do that, we need to know the size of the HTML file.
Luckily the server tells us beforehand, with the Content-Length: 1337 line, that the file is 1337 bytes long. The file itself would be present where the <DATA> placeholder stands.
There are, yet again, many of these headers.
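To tie this back to the original question, a bare-bones sketch of a server built directly on a TCP socket might look like the following: it reads one request, pulls the path out of the request line and echoes it back in an HTTP/1.0 response (no error handling, no concurrency, and it doesn't actually serve files; the port is arbitrary):
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))
server.listen(1)

while True:
    client, addr = server.accept()
    request = client.recv(4096).decode("utf-8", errors="replace")
    # The first line looks like "GET /some/file.txt HTTP/1.0"
    method, path, version = request.split("\r\n")[0].split(" ")
    body = "You asked for: {}\n".format(path)
    response = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: {}\r\n"
        "Connection: close\r\n"
        "\r\n".format(len(body)) + body
    )
    client.sendall(response.encode("utf-8"))
    client.close()
Even this tiny example glosses over request bodies, persistent connections, chunked encoding and proper error handling.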
As you can see, there are many things to account for when working with HTTP, which is why it is not feasible, without a very good reason, to implement an HTTP client/server from scratch.
Instead it's preferred to use one of the frameworks (for Python) listed above.
As a last note:
In the process of trying to explain the concepts as simply as possible I probably left out or oversimplified some things. If you find any mistake, please let me know.

Python Uvicorn – obtain SSL certificate information

I have a gunicorn + uvicorn + fastApi stack.
(Basically, I am using https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi docker image).
I've already implemented SSL-based authentication by providing the appropriate gunicorn configuration options: certfile, keyfile, ca_certs, cert_reqs.
And it works fine: users have to provide a client SSL certificate in order to be able to make API calls.
What I need to do now is to obtain the client certificate data and pass it further (add it to the request headers) into my application, since it contains some client credentials.
For example, I've found a way to do it with a gunicorn worker by overriding gunicorn.workers.sync.SyncWorker: https://gist.github.com/jmvrbanac/089540b255d6b40ca555c8e7ee484c13.
But is there a way to do the same thing using UvicornWorker? I've tried looking through the UvicornWorker's source code, but didn't find a way to do it.
I went deeper into the Uvicorn source code, and as far as I understand, in order to access the client TLS certificate data, I need to do some tricks with python asyncio library (https://docs.python.org/3/library/asyncio-eventloop.html), possibly with Server (https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.Server) class and override some of the UvicornWorker's methods.
I am still not quite sure if it is possible to achieve the desired result though.
I ended up putting nginx (OpenResty) in front of my server and added a script to get the client certificate and put it into a header.
Here is a part of my nginx config:
set_by_lua_block $client_cert {
    local client_certificate = ngx.var.ssl_client_raw_cert
    if (client_certificate ~= nil) then
        client_certificate = string.gsub(client_certificate, "\n", "")
        ngx.req.set_header("X-CLIENT-ID", client_certificate)
    end
    return client_certificate
}
It is also possible to extract some specific field from a client certificate (like CN, serial number etc.) directly inside nginx configuration, but I decided to pass the whole certificate further.
My problem is solved, though not via gunicorn as I originally wanted, but this is the only good solution I've found so far.
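On the application side, picking up the forwarded certificate then comes down to reading that header. A minimal FastAPI sketch, assuming the X-CLIENT-ID header name from the nginx config above (how you parse the PEM data afterwards is up to you):
from fastapi import FastAPI, Header

app = FastAPI()

@app.get("/whoami")
async def whoami(x_client_id: str = Header(None)):
    # x_client_id holds the PEM client certificate forwarded by nginx
    # (newlines stripped by the lua block above); None if nothing was forwarded.
    if x_client_id is None:
        return {"detail": "no client certificate forwarded"}
    return {"client_cert": x_client_id[:40] + "..."}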

Python - How to detect whether coming connections using proxy or not

I am working on a simple program written in Python which sniffs incoming network packets. It then lets the user enable add-on modules like DoS detection or ping prevention. With the help of the sniffer, I can get incoming connections' IP address, MAC address, protocol flags and packet content. Now what I want to do is add a new module that detects whether the sender is using a proxy or not and act accordingly. I searched for methods that can be used with Python but could not find a useful one. How many ways are there to detect a proxy in Python?
My sniffer code part is something like that:
.....
sock = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)
while True:
    packet = sock.recvfrom(2048)
    ipheader = packet[0][14:34]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    sourceIP = socket.inet_ntoa(ip_hdr[3])
    tcpheader = packet[0][34:54]
    tcp_hdr = struct.unpack("!HH9ss6s", tcpheader)
    protoFlag = binascii.hexlify(tcp_hdr[3])
......
Firstly, you mean incoming packets.
Secondly:
From the server TCP's point of view, it is connected to the proxy, not the downstream client,
so your server can't tell from the packet alone that a proxy is involved.
However, if you are at the application level, e.g. behind an HTTP proxy, there might be an X-Forwarded-For header available which contains the original client IP. I say might because the proxy server decides whether or not to send this header to you. If you are expecting incoming HTTP connections to your server, you can take a look at Python's urllib2, although I'm not sure if you can access the X-Forwarded-For header using this library.
From the docs:
urllib2.urlopen(url[, data][, timeout])
...
This function returns a file-like object with two additional methods:
geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers, in the form of a mimetools.Message instance (see Quick Reference to HTTP Headers)
So using info() will retrieve the headers; hope you find what you're looking for in there.
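For completeness, a small sketch of the info() call the quoted docs describe; note that this reads the headers of the response you fetch as a client, which is all urllib2 gives you (the URL is a placeholder):
import urllib2  # Python 2 standard library; urllib.request in Python 3

response = urllib2.urlopen("http://www.example.com/")
headers = response.info()  # mimetools.Message holding the response headers
# getheader() returns None if the header is absent, e.g. if no proxy added it
print(headers.getheader("X-Forwarded-For"))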
There aren't many ways to do this, as proxies / VPNs look like real traffic. To add to what Mid said, you can look for headers and/or user agents to help you determine if the user is using a proxy or a VPN.
The only free solution I know of is getIPIntel, which uses block lists, machine learning, and statistics to determine whether the IP is a proxy/VPN or not.
There are other paid solutions like maxmind and blocked.
What you'll need to do is send API queries to these services and parse the results.
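As a rough sketch, such a query could look like the following with the requests library; the endpoint and parameters are taken from getIPIntel's public documentation, so verify them against the current docs before relying on this:
import requests

def proxy_probability(ip, contact_email):
    # getIPIntel returns a value between 0 and 1: the closer to 1, the more
    # likely the IP belongs to a proxy/VPN.
    resp = requests.get(
        "http://check.getipintel.net/check.php",
        params={"ip": ip, "contact": contact_email},
        timeout=5,
    )
    return float(resp.text)

# e.g. with the sourceIP extracted by the sniffer above:
# if proxy_probability(sourceIP, "you@example.com") > 0.95: ...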

Obtaining the original destination IP in CherryPy

I am running a captive portal on a cherrypy server and I have set up iptables rules that REDIRECT all http traffic from unregistered MAC addresses to the portal. After a user registers with me via the portal splash page, I add an iptables exception to let their traffic through.
Now what I want to do is redirect the user to the page they were originally going for - before they got sent to the portal. I know that iptables sets a field with the original destination information for all TCP packets, and I know there is a C function called getsockopt to read that field. However, I don't know how to access the socket associated with a request in cherrypy.
Can anybody help me out? :)
I'm not an expert in low-level networking and don't know how common open Wi-Fi authorisation implementations tag their clients. But what seems true to me is that in the OSI model, lower layers know nothing about upper layers. In other words, IP has no idea of HTTP terms, and of a page URL specifically.
Thus, having a socket reference at hand, which I believe is possible to retrieve by customising CherryPy, will give you the original IP address at most, not the URL. Also, mixing the networking (IP) and application (HTTP) layers, and generally managing one application entity in several places, will likely result in issues of all sorts, for instance when dealing with HTTP-speaking agents like forward and reverse proxies, which won't preserve the nuances of the lower layer.
Update
Okay, since you say you also have the request URL, here is how you can retrieve the raw socket. As you can see, it is deep under the hood and essentially an implementation detail that an end user shouldn't rely on. It is not part of the contract and it can change in any next version, so you have a good chance of shooting yourself in the foot.
#!/usr/bin/env python
import cherrypy

config = {
    'global' : {
        'server.socket_host' : '127.0.0.1',
        'server.socket_port' : 8080,
        'server.thread_pool' : 8
    },
}

class App:

    @cherrypy.expose
    def index(self):
        '''For caveats and details on the slippery slope, take a look at ws4py
        https://github.com/Lawouach/WebSocket-for-Python/blob/master/ws4py/server/cherrypyserver.py
        '''
        print(cherrypy.serving.request.rfile.rfile._sock)
        return 'Make sure you know what you are doing.'

if __name__ == '__main__':
    cherrypy.quickstart(App(), '/', config)
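Once you have that raw socket, reading the pre-REDIRECT destination is a getsockopt call with the SO_ORIGINAL_DST option. A hedged, Linux-only sketch (the option value 80 comes from linux/netfilter_ipv4.h and isn't exposed by Python's socket module; it assumes an IPv4 TCP socket):
import socket
import struct

SO_ORIGINAL_DST = 80  # from linux/netfilter_ipv4.h, not defined in Python's socket module

def original_destination(sock):
    # Returns the (ip, port) the client originally tried to reach before the
    # iptables REDIRECT rule rewrote the connection.
    raw = sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)  # struct sockaddr_in
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port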

Can't seem to get HTTPS and SOCKS proxies to work using python requests

So I'm looking at traffic using wireshark and comparing the output for a number of situations. I'm only looking at traffic between me and google.co.za.
Situation 1: Accessing google.co.za using no proxy
requests.get('www.google.co.za')
This returns a response with status=200 and wireshark displays info about traffic passing between my pc and google's servers. This is great so far.
Situation 2: Accessing google.co.za using valid http proxy
requests.get("http://google.co.za",proxies={'http':proxy})
This returns a response with status=200 and wireshark displays no data about traffic passing between my pc and google's servers. This is great and expected and stuff.
Situation 3: Accessing google.co.za using valid socks proxy
requests.get("http://google.co.za",proxies={'socks':proxy})
result as per situation 1. Hmmm
Situation 4: same deal with https
requests.get("http://google.co.za",proxies={'https':proxy})
same result as situation 1.
Question
So it looks like when I try to use HTTPS and SOCKS proxies, requests acts as though the proxies argument is empty. Now I need to pass traffic through all sorts of proxies and I don't want any silent failures.
My question is: Why is stuff failing silently and what can I do to fix it?
Requests simply does not yet support either SOCKS or HTTPS proxies.
They're working on it, though. See here: https://github.com/kennethreitz/requests/pull/1515
Support for HTTPS proxies has already been merged into the requests 2.0 branch, so if you like you can try that version; be wary though, as it is currently an unstable branch.
SOCKS proxy support, on the other hand, is still being worked on in the lower-level library, urllib3: https://github.com/shazow/urllib3/pull/68
Also, regardless of that, you are using the proxies argument incorrectly. It should be of the form {protocol_of_sites_you_visit: proxy}, so once support is complete, using a SOCKS5 proxy would actually be more along the lines of {"http": "socks5://127.0.0.1:9050"}.
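For reference, once SOCKS support landed (requests 2.10+ together with the PySocks extra, installed via pip install requests[socks]), correct usage looks roughly like this; the proxy address is a placeholder:
import requests

# Keys are the scheme of the URL being requested, values are the proxy to use for it.
proxies = {
    "http": "socks5://127.0.0.1:9050",
    "https": "socks5://127.0.0.1:9050",
}

resp = requests.get("http://google.co.za", proxies=proxies)
print(resp.status_code)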
