I am working on a simple program written in Python that sniffs incoming network packets and then lets the user apply add-on modules such as DoS detection or ping prevention. With the help of the sniffer, I can get an incoming connection's IP address, MAC address, protocol flags, and packet content. Now what I want to do is add a new module that detects whether the sender is using a proxy and acts accordingly. I searched for methods that can be used with Python but could not find a useful one. What ways are there to detect a proxy in Python?
My sniffer code looks something like this:
import binascii
import socket
import struct

.....
# Raw link-layer socket; socket.htons(0x0800) equals the bare 8 used originally and selects IPv4 frames
sock = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, socket.htons(0x0800))
while True:
    packet = sock.recvfrom(2048)
    ipheader = packet[0][14:34]                     # IPv4 header follows the 14-byte Ethernet header
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    sourceIP = socket.inet_ntoa(ip_hdr[3])          # source address (bytes 12-15 of the IP header)
    tcpheader = packet[0][34:54]
    tcp_hdr = struct.unpack("!HH9ss6s", tcpheader)
    protoFlag = binascii.hexlify(tcp_hdr[3])        # TCP flags byte
......
Firstly, you mean incoming packets.
Secondly, from the server TCP's point of view, it is connected to the proxy, not the downstream client, so your server cannot tell from the packet alone that a proxy is involved.
However, if you are working at the application level, as with an HTTP proxy, there might be an X-Forwarded-For header available containing the original client IP. I say might because the proxy server decides whether or not to send that header to you. If you are expecting incoming HTTP connections to your server, you can take a look at Python's urllib2, although I'm not sure whether you can access X-Forwarded-For through this library.
From the docs:
urllib2.urlopen(url[, data][, timeout])
...
This function returns a file-like object with two additional methods:
geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers, in the form of an mimetools.Message instance (see Quick Reference to HTTP Headers)
So using info() will retrieve the headers. Hope you find what you're looking for in there.
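Tying that back to your sniffer: if the traffic is plain HTTP, you could also scan the captured payload for that header yourself. A minimal sketch, not part of the original code, assuming the payload starts right after a 20-byte TCP header with no options:

def extract_forwarded_for(payload):
    # Return the X-Forwarded-For value from a raw HTTP request, if present.
    for line in payload.split('\r\n'):
        name, _, value = line.partition(':')
        if name.strip().lower() == 'x-forwarded-for':
            return value.strip()  # original client IP, if the proxy chose to send it
    return None

# e.g. extract_forwarded_for(packet[0][54:]) with the sniffer above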
There aren't many ways to do this, as proxies / VPNs look like real traffic. To add to what Mid said, you can look for headers and/or user agents to help you determine if the user is using a proxy or a VPN.
The only free solution I know of is getIPIntel, which uses block lists, machine learning, and statistics to determine whether an IP is a proxy/VPN.
There are other paid solutions like maxmind and blocked.
What you'll need to do is send API queries to these services and parse the results.
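For example, a hedged sketch of querying getIPIntel over HTTP (check their current documentation for the exact endpoint and parameters before relying on this):

import urllib2

def proxy_probability(ip, contact_email):
    # getIPIntel returns a value between 0 and 1; higher means more likely proxy/VPN
    url = ('http://check.getipintel.net/check.php?ip=%s&contact=%s'
           % (ip, contact_email))
    return float(urllib2.urlopen(url, timeout=10).read())

# e.g. flag the sender when proxy_probability(sourceIP, 'you@example.com') > 0.95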
Related
I am scraping some pages, and these pages check whether my IP is a VPN or proxy (a fake IP); if it is found to be fake, the site blocks my request. Is there a way to change my IP to a different real IP every x amount of time, without using a VPN or proxy and without restarting the router?
Note: I am using a Python script for this process
Your IP address is fixed by your internet service provider; if you reset your home router, you can sometimes get a different IP address, depending on various internal factors.
Some websites block by the User-Agent or IP geolocation of your request, or by rate limit. But if you are sure the block is by IP, the only way to swap your IP address is through VPN tunneling or a proxy service such as ProxyMesh.
You can obtain free proxy addresses from https://www.freeproxylists.net/ . Since these are free proxies, they may go down quickly, so you might sometimes need to rotate IPs with each request you make to your target address.
To set a proxy address, see this question: Proxies with Python 'Requests' module.
So the flow would be (a sketch follows the list):
Scrape the proxies from the address above.
Add the proxy header as described in the linked question.
Rotate the IP with each new request to the target.
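A rough sketch of that flow with the requests library (the proxy addresses and URLs are placeholders, and free proxies die often, so real code needs retries and error handling):

import requests

proxies = ['203.0.113.10:8080', '203.0.113.11:3128']  # scraped from the list above
urls = ['http://example.com/page1', 'http://example.com/page2']

for i, url in enumerate(urls):
    proxy = 'http://' + proxies[i % len(proxies)]  # rotate through the pool
    resp = requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)
    print(resp.status_code)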
There are other blocking factors besides your IP:
Browser agent (https://www.scrapehero.com/how-to-fake-and-rotate-user-agents-using-python-3/?sfw=pass1637120088).
Too rigorous scraping (try to randomize the timing between two requests, as sketched below).
Not following the robots.txt file (this sometimes can't be avoided).
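For instance, a minimal sketch of randomized delays combined with User-Agent rotation (the agent strings are placeholder examples):

import random
import time
import requests

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0',
]
urls = ['http://example.com/a', 'http://example.com/b']

for url in urls:
    requests.get(url, headers={'User-Agent': random.choice(user_agents)}, timeout=10)
    time.sleep(random.uniform(2, 8))  # irregular gap between requests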
I am running a captive portal on a cherrypy server and I have set up iptables rules that REDIRECT all http traffic from unregistered MAC addresses to the portal. After a user registers with me via the portal splash page, I add an iptables exception to let their traffic through.
Now what I want to do is redirect the user to the page they were originally going for - before they got sent to the portal. I know that iptables sets a field with the original destination information for all TCP packets, and I know there is a C function called getsockopt to read that field. However, I don't know how to access the socket associated with a request in cherrypy.
Can anybody help me out? :)
I'm not an expert in low-level networking and don't know how common open Wi-Fi authorisation implementations tag their clients. But what seems true to me is that in the OSI model, lower layers know nothing about upper layers. In other words, IP has no idea of HTTP terms, and of a page URL specifically.
This way, having a socket reference at hand, which I believe is possible to retrieve by customising CherryPy, will give you the original IP address at most, not the URL. Also, mixing the networking (IP) and application (HTTP) layers, and generally managing one application entity in several places, will likely result in issues of all sorts, for instance when dealing with HTTP-speaking agents such as forward and reverse proxies, which won't preserve the nuances of the lower layer.
Update
Okay, since you say you also have the request URL, here is how you can retrieve the raw socket. As you can see, it is deep under the hood and essentially an implementation detail that an end user shouldn't rely on. It is not part of the contract and can change in any next version, so you have a good chance of shooting yourself in the foot.
#!/usr/bin/env python

import cherrypy

config = {
    'global': {
        'server.socket_host': '127.0.0.1',
        'server.socket_port': 8080,
        'server.thread_pool': 8,
    },
}

class App:

    @cherrypy.expose
    def index(self):
        '''For caveats and details on the slippery slope, take a look at ws4py
        https://github.com/Lawouach/WebSocket-for-Python/blob/master/ws4py/server/cherrypyserver.py
        '''
        # The raw socket behind the current request -- an implementation detail
        print(cherrypy.serving.request.rfile.rfile._sock)
        return 'Make sure you know what you are doing.'

if __name__ == '__main__':
    cherrypy.quickstart(App(), '/', config)
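Once you have the raw socket, the original destination recorded by the iptables REDIRECT can be read with getsockopt and the Linux-only SO_ORIGINAL_DST option. A hedged, untested sketch; the constant 80 comes from linux/netfilter_ipv4.h:

import socket
import struct

SO_ORIGINAL_DST = 80  # from linux/netfilter_ipv4.h

def original_destination(sock):
    # Returns the (ip, port) the client dialed before the REDIRECT rule fired.
    dst = sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)  # struct sockaddr_in
    port, raw_ip = struct.unpack('!2xH4s8x', dst)
    return socket.inet_ntoa(raw_ip), port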
I have been trying to use SocksiPy (http://socksipy.sourceforge.net/), setting my sockets to SOCKS5 and pointing them at a local Tor service that I am running on my box.
I have the following:
import socks
import socket

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "localhost", 9050, True)
socket.socket = socks.socksocket
import urllib2
And I am doing something similar to:
workItem = "http://192.168.1.1/some/stuff" #obviously not the real url
req = urllib2.Request(workItem)
req.add_header('User-agent', 'Mozilla 5.10')
res = urllib2.urlopen(req, timeout=60)
And even using this I have been identified by the website. My understanding was that I would be coming out of a random endpoint every time, so it wouldn't be able to identify me; I can confirm, by hitting whatsmyip.org with this, that my endpoint is different every time. Are there some other steps I have to take to stay anonymous? I am using an IP address in the URL, so it shouldn't be doing any DNS resolution that might give it away.
There is no such User-Agent as 'Mozilla 5.10' in reality. If the server employs even the simplest fingerprinting based on the User-Agent, it will identify you by this uncommon setting.
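For example, you could copy a string from a real browser instead; the value below is only an illustrative Firefox-style example, not a recommendation:

import urllib2

req = urllib2.Request("http://192.168.1.1/some/stuff")  # the question's placeholder URL
req.add_header('User-agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) '
               'Gecko/20100101 Firefox/115.0')
res = urllib2.urlopen(req, timeout=60)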
And I don't think you understand Tor: it does not provide full anonymity. It helps by hiding your real IP address, but it does not help if you give your real name on a web site or use easily detectable features like an uncommon user agent.
You might have a look at the Design and Implementation Notes for the Tor Browser Bundle to see what kinds of additional steps they take to be less detectable and where they still see open problems. You might also read about device fingerprinting, which is used to identify seemingly anonymous peers.
I am using Flask and need to get the user's IP address. This is usually done through request.remote_addr, but since this app is hosted with a third party (and behind Cloudflare), it just returns localhost.
Flask suggests getting the X-Forwarded-Host header, but then immediately says it is a security risk. Is there a safe way to get the client's real IP?
The Problem
The issue here is not that the ProxyFix itself will cause the user to get access to your system, but rather the fact that the ProxyFix will take what was once mostly reliable information and replace it instead with potentially unreliable information.
For starters, when you don't use ProxyFix, the REMOTE_ADDR attribute is most likely retrieved from the source IP address in the TCP packets. While not impossible, the source IP address in TCP packets are tough to spoof. Therefore, if you need a reliable way to retrieve the user's IP address, REMOTE_ADDR is a good way to do it; in most cases, you can rely on it to provide you something that is accurate when you do request.remote_addr.
The problem is, of course, in a reverse-proxy situation the TCP connection is not coming from the end user; instead, the end user makes a TCP connection with the reverse proxy, and the reverse proxy then makes a second TCP connection with your web app. Therefore, the request.remote_addr in your app will have the IP address of the reverse proxy rather than the original user.
A Potential Solution
ProxyFix is supposed to solve this problem so that request.remote_addr has the user's IP address rather than the proxy's. It does this by looking at the typical HTTP header that reverse proxies (like Apache and Nginx) add to the request (X-Forwarded-For) and using the user's IP address it finds there. Note that Cloudflare uses a different HTTP header, so ProxyFix probably won't help you; you'll need to write your own implementation of this middleware to get request.remote_addr to use the original client's IP address. However, in the rest of this answer I will continue to refer to that fix as "ProxyFix".
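A hedged sketch of what such custom middleware could look like, trusting Cloudflare's documented CF-Connecting-IP header; as stressed below, this is only safe when the app is reachable exclusively through Cloudflare:

class CloudflareRemoteAddrFix(object):
    # WSGI middleware: rewrite REMOTE_ADDR from Cloudflare's CF-Connecting-IP header.
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        client_ip = environ.get('HTTP_CF_CONNECTING_IP')
        if client_ip:
            environ['REMOTE_ADDR'] = client_ip
        return self.app(environ, start_response)

# app.wsgi_app = CloudflareRemoteAddrFix(app.wsgi_app)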
This solution, however, is problematic. The problem is that while the TCP header is mostly reliable, the HTTP headers are not; if a user can bypass your reverse proxy and send data right to the server, they can put whatever they want in the HTTP header. For example, they can make the IP address in the HTTP header the IP address of someone else! If you use the IP address for authentication, the user can spoof that authentication mechanism. If you store the IP address in your database and then display it in your application to another user in HTML, the user could inject SQL or Javascript into the header, potentially causing SQL injection or XSS vulnerabilities.
So, to summarize; ProxyFix takes a known mostly-safe solution to retrieve the user's IP address from a TCP packet and switches it to using the not-very-safe-by-itself solution of parsing an easily-spoofed HTTP header.
Therefore, the recommendation to use ProxyFix ONLY in reverse proxy situations means just that: don't use it if you accept connections from places that are NOT the proxy. This often means having the reverse proxy (like Nginx or Apache) handle all your incoming traffic and keeping the application that actually uses ProxyFix safe behind a firewall.
You should also read this post, which explains how ProxyFix was broken in the past (it has since been fixed). It also explains how ProxyFix works and gives you ideas on how to set your num_proxies argument.
A Better Solution
Let's say your user is at point A, they send the request to Cloudflare (B) which eventually sends the request to your final application (point C). Cloudflare will send the IP address of A in the CF-Connecting-IP header.
As explained above, if the user finds the IP address of point C, they could send a specially crafted HTTP request directly to point C that includes any header info they want. ProxyFix will use its logic to determine the IP address from the HTTP header, which of course is problematic if you rely on that value for, well, mostly anything.
Therefore, you might want to look at using something like mod_cloudflare, which allows you to do these proxy fixes directly in the Apache mod, but only when the HTTP connection comes from Cloudflare IP addresses (as defined by the TCP IP source). You can also have it only accept connections from Cloudflare. See How do I restore original visitor IP to my server logs for more info on this and help doing this with other servers (like Nginx).
This should give you a start. However, keep in mind that you're still not "safe": you've only shut down one possible attack vector, and that attack vector assumed the attacker knew the IP address of your actual application. In that case, a malicious user could try a TCP attack with a spoofed Cloudflare IP address, although this would be extremely difficult. More likely, if they wanted to cause havoc, they would just DDoS your source server, since they've bypassed Cloudflare. So there are plenty more things to think about in securing your application. Hopefully this helps you understand how to make one part slightly safer.
I want to make a GET request to retrieve the contents of a web-page or a web service.
I want to send specific headers for this request AND
I want to set the IP address FROM WHICH this request will be sent.
(The server on which this code is running has multiple IP addresses available).
How can I achieve this with Python and its libraries?
I checked urllib2 and it won't set the source address (at least not on Python 2.7). The underlying library is httplib, which does have that feature, so you may have some luck using that directly.
From the httplib documentation:
class httplib.HTTPConnection(host[, port[, strict[, timeout[, source_address]]]])
The optional source_address parameter may be a tuple of a (host, port) to use as the source address the HTTP connection is made from.
You may even be able to convince urllib2 to use this feature by creating a custom HTTPHandler class. You will need to duplicate some code from urllib2.py, because AbstractHTTPHandler is using a simpler version of this call:
class AbstractHTTPHandler(BaseHandler):
    # ...
    def do_open(self, http_class, req):
        # ...
        h = http_class(host, timeout=req.timeout)  # will parse host:port
Where http_class is httplib.HTTPConnection for HTTP connections.
Probably this would work instead, if patching urllib2.py (or duplicating and renaming it) is an acceptable workaround:
h = http_class(host, timeout=req.timeout, source_address=(req.origin_req_host,0))
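Alternatively, a short sketch of using httplib directly, which avoids patching urllib2 altogether ('192.0.2.10' stands in for one of your server's addresses):

import httplib

conn = httplib.HTTPConnection('example.com', 80, timeout=30,
                              source_address=('192.0.2.10', 0))  # port 0 lets the OS pick
conn.request('GET', '/', headers={'User-Agent': 'my-client/1.0'})
resp = conn.getresponse()
print(resp.status)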
There are many options available to you for making HTTP requests. I don't even think there is a commonly agreed-upon "best". You could use any of these:
urllib2: http://docs.python.org/library/urllib2.html
requests: http://docs.python-requests.org/en/v0.10.4/index.html
mechanize: http://wwwsearch.sourceforge.net/mechanize/
This list is not exhaustive. Read the docs and take your pick. Some are lower level and some offer rich browser-like features. All of them let you set headers before making a request.