I have written a simple Python script that fetches my IP.
import urllib
import socket
import socks
# Route all new sockets through Tor's SOCKS5 proxy (Tor Browser listens on 9150)
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
# Monkey-patch the socket module so every connection uses the proxy
socket.socket = socks.socksocket
# Store the URL that we want
url = 'https://check.torproject.org/'
# Open the URL and store the response
response = urllib.urlopen(url)
# Read the response body
html = response.read()
# Print to console
print html
Nothing too complex. However, the problem starts when analyzing the response from check.torproject.org. The site always gives me an address that is different from the one shown by my currently running Tor Browser on the same page. The HTML response does say that I am being routed through the Tor network, but that it doesn't look like I'm coming from the 'standard' Tor Browser. The latter part I understand: though I did not include it in the code above, I was playing with User-Agent strings and other headers, so I will chalk it up to that being the primary cause. What I do not understand is where in the h-e-double hockey sticks the IP that was served as a response to the py script came from?
My next question, which builds on top of all this, is how do I connect my Python script to the Tor network correctly? After a little googling, I found that Tor will block traffic for everything other than the SOCKS protocol, and that an alternative is to use Privoxy in conjunction with Tor. My initial thought is to do some kind of routing that would result in the layering of software. In my mind, it would look like:
Python -> Privoxy -> Tor -> Destination
My end goal in all of this is to grab a .onion-based address and save/read it. However, I have put that to the side after all of these problems started occurring. A little info to help get better answers: I am using a Windows machine, though I have a Linux one if there is some functionality there that would help this process, and I am using Python 2.7, though, again, this can be easily changed.
I would like to ask that the steps to make all this happen be laid out - or at least some links/direction; I am by no means afraid to read a few good blogs/tutorials about the subject. However, I feel like this is really a couple of separate questions and would require quite a lengthy answer, so I would be more than happy to just know that I am on the right path before I rip more of my hair out :)
Your code is correct; however, your assumption that Tor will always give you the same IP address is not. Thanks to circuit isolation, a privacy feature of Tor that ensures isolation between the connections you open, you're routing the request through a different exit node than the Tor Browser is.
Reliably emulating the Tor Browser behavior is hard and I would recommend against it. Your method for connecting to the Tor network looks correct.
Tor will allow you to use any protocol you want, but yes you need to connect through the SOCKS protocol. That's fine though: almost all network protocols (http included) play nicely with SOCKS.
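As a concrete sketch of "HTTP plays nicely with SOCKS": the same request can be made without monkey-patching the socket module, for example with the requests library and its SOCKS support. This assumes the `requests[socks]` extra is installed and that Tor Browser's SOCKS port 9150 is listening (a standalone tor daemon uses 9050 instead). The `socks5h` scheme matters: unlike plain `socks5`, it resolves hostnames through the proxy, so DNS lookups and .onion addresses also go via Tor.

```python
def tor_proxies(port=9150):
    # 'socks5h' (as opposed to 'socks5') resolves hostnames through the
    # proxy, so DNS queries -- and .onion addresses -- also go via Tor.
    addr = "socks5h://127.0.0.1:{}".format(port)
    return {"http": addr, "https": addr}

def fetch_check_page(port=9150):
    # Needs a running Tor instance; Tor Browser listens on 9150,
    # a system tor service usually on 9050.
    import requests  # requires the requests[socks] extra
    return requests.get("https://check.torproject.org/",
                        proxies=tor_proxies(port)).text
```

Calling `fetch_check_page()` with Tor Browser running should return the check.torproject.org page as seen from an exit node.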
With the torpy library you can renew circuits as you wish.
>>> from torpy.http.requests import TorRequests
>>>
>>> def show_ip(resp):
...     for line in resp.text.splitlines():
...         if 'Your IP address appears to be' in line:
...             print(line)
...
>>> with TorRequests() as tor_requests:
...     print("build circuit")
...     with tor_requests.get_session() as sess:
...         show_ip(sess.get("https://check.torproject.org/"))
...         show_ip(sess.get("https://check.torproject.org/"))
...     print("renew circuit")
...     with tor_requests.get_session() as sess:
...         show_ip(sess.get("https://check.torproject.org/"))
...         show_ip(sess.get("https://check.torproject.org/"))
...
build circuit
<p>Your IP address appears to be: <strong>178.17.171.102</strong></p>
<p>Your IP address appears to be: <strong>178.17.171.102</strong></p>
renew circuit
<p>Your IP address appears to be: <strong>49.50.66.209</strong></p>
<p>Your IP address appears to be: <strong>49.50.66.209</strong></p>
I am scraping some pages, and these pages check whether my IP belongs to a VPN or proxy (a 'fake' IP). If it is detected as fake, the site blocks my request. Is there a way to change my IP every x minutes to a real IP, without using a VPN or proxy and without restarting the router?
Note: I am using a Python script for this process.
Your IP address is assigned by your internet service provider; if you reset your home router, you can sometimes get another IP address, depending on various internal factors.
Some websites block by the User-Agent, the geolocation of your request's IP, or a rate limit. But if you are sure it is by IP, then the only way to swap your IP address is through VPN tunneling or a proxy mesh.
You can obtain free proxy addresses from https://www.freeproxylists.net/ . Since these are free proxies, they may go down quickly, so you might sometimes need to rotate the IP with each request you make to your target address.
To set the proxy address, please follow this question on how to set a proxy: Proxies with Python 'Requests' module.
So the flow would be:
Scrape the proxies from the address above.
Then add the proxy header as described in the other question.
Rotate the IP with each new request to the target.
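The rotation step above can be sketched like this. The proxy addresses are placeholders; in practice you would fill the list from the scraped free-proxy pages and skip entries that stop responding:

```python
import itertools

def proxy_cycle(proxy_list):
    # Endlessly cycle through the scraped proxies, one per request.
    return itertools.cycle(proxy_list)

def requests_proxies(proxy):
    # The dict shape expected by requests' proxies= argument.
    return {"http": "http://" + proxy, "https": "http://" + proxy}

def fetch_with_rotation(urls, proxy_list):
    # Hypothetical driver: one (possibly dead) free proxy per request.
    import requests
    pool = proxy_cycle(proxy_list)
    results = []
    for url in urls:
        try:
            results.append(requests.get(url,
                                        proxies=requests_proxies(next(pool)),
                                        timeout=10))
        except requests.RequestException:
            continue  # dead proxy -- just move on to the next one
    return results
```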
There are other blocking factors besides your IP:
Browser agent (https://www.scrapehero.com/how-to-fake-and-rotate-user-agents-using-python-3/?sfw=pass1637120088).
Too rigorous scraping (try to randomize the timing between two requests).
Not following the robots.txt file (this sometimes can't be avoided).
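The randomized timing and rotating User-Agent ideas above can be sketched together. The User-Agent strings here are illustrative placeholders, not a vetted rotation list:

```python
import random
import time

USER_AGENTS = [
    # Illustrative examples; real rotation lists are much longer.
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def random_headers():
    # Pick a different User-Agent for each request.
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_sleep(lo=5.0, hi=25.0):
    # Randomize the gap between two requests instead of a fixed interval.
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay
```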
I'm having issues with connecting to the Internet using Python.
I am on a corporate network that uses a PAC file to set proxies. Now this would be fine if I could find and parse the PAC to get what I need but I cannot.
The oddity:
R can connect to the internet to download files through wininet and .External(C_download,...) so I know it is possible and when I do:
import ctypes
import ctypes.wintypes  # wintypes is a submodule and needs its own import
wininet = ctypes.windll.wininet
flags = ctypes.wintypes.DWORD()
connected = wininet.InternetGetConnectedState(ctypes.byref(flags), None)
print(connected, hex(flags.value))
I get 1 0x12, so I have a connection available, but once I try to use other functions from within wininet I'm constantly met with errors like:
AttributeError: function 'InternetCheckConnection' not found
and this goes for pretty much any other function of wininet, but this doesn't surprise me as the only named function in dir(wininet) is InternetGetConnectedState.
The wininet approach can clearly work, but I have no idea how to proceed with it [especially given that I only use Windows in work].
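One likely cause of that AttributeError: many Win32 API functions that take strings are exported only with an A (ANSI) or W (wide) suffix, and ctypes attribute lookup needs the exact exported name. InternetGetConnectedState happens to be exported unsuffixed, which would explain why it alone works. A sketch (Windows-only, hence the platform guard; the flag value is WinInet's documented FLAG_ICC_FORCE_CONNECTION):

```python
import ctypes
import sys

def winapi_name(base, wide=True):
    # Win32 functions taking strings are usually exported as NameA/NameW;
    # ctypes will not find the bare name for those.
    return base + ("W" if wide else "A")

if sys.platform == "win32":
    wininet = ctypes.windll.wininet
    check = getattr(wininet, winapi_name("InternetCheckConnection"))
    FLAG_ICC_FORCE_CONNECTION = 1
    print(check("https://example.org", FLAG_ICC_FORCE_CONNECTION, 0))
```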
"ok, so poor wording - let's just change that to: open a connection to a web page and obtain its content using python "
Sounds like you actually need BeautifulSoup and Requests. Here's a quick example of them being used to explore a webpage
First, I would strongly suggest to install the requests module. Doing HTTP without it on Python is pretty painful.
According to this answer you need to download wpad.dat from the host wpad. That is a text file that contains the proxy address.
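A PAC file is actually JavaScript, so a complete solution needs a PAC interpreter (the pypac package provides one), but for a simple corporate PAC a regex that pulls out the "PROXY host:port" directives is often enough. A rough sketch, with the wpad fetch kept in a separate hypothetical helper:

```python
import re

def pac_proxies(pac_text):
    # PAC files return directives like "PROXY proxy1:8080; DIRECT";
    # grab every host:port that follows the PROXY keyword.
    return re.findall(r"PROXY\s+([\w.\-]+:\d+)", pac_text)

def wpad_proxies():
    # Hypothetical fetch of the PAC file from the conventional wpad host.
    import requests
    return pac_proxies(requests.get("http://wpad/wpad.dat").text)
```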
Once you know the proxy settings, you can configure requests to use them:
import requests
proxies = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
I am using Tor servers to route the requests of my crawler, which is multithreaded but nonetheless very light on load, since I make each thread sleep for a random, normally distributed time with a mean of 20 seconds (approx. 3 requests a minute). I need to get the first Google search result for some 20,000-odd queries. My crawler is scripted in Python using urllib2 (SOCKS proxy) and mechanize (HTTP proxy).
# Snippet of code initializing the urllib2 build_opener
import cookielib
import urllib2
import socks
from sockshandler import SocksiPyHandler  # ships with the SocksiPy/PySocks project

host = socks_hostname
port = socks_port
socks_username = username
socks_password = password
cj = cookielib.CookieJar()
br = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS5, host, port,
                                          username=socks_username,
                                          password=socks_password),
                          urllib2.HTTPCookieProcessor(cj))
# Set a randomly generated User-Agent string.
br.addheaders = [('User-Agent', self.get_user_agent())]
return br
I just discovered that the Tor network isn't hiding my IP as far as Google is concerned. I wrote a small test script to check the IP address reported by Google and by http://whatismyip.net. While whatismyip.net sees an IP based in Canada, Google shows my real IP, which confuses me. I have made sure that I don't have any cookies that can be tracked.
What is even more puzzling is that when I use Tor in my Firefox, Google shows a random IP based in Canada as well. So it's only when I send automated requests that my real IP gets exposed. Can someone help me figure out what is causing this leak?
I understand crawling is a sensitive topic, but my crawling rate is actually slower than a human being's!
I am working on a simple program written in Python which sniffs coming network packets. It then lets the user enable added modules, like DoS detection or ping prevention. With the help of the sniffer, I can get an incoming connection's IP address, MAC address, protocol flag and packet content. Now what I want to do is add a new module that detects whether the sender is using a proxy, and act accordingly. I searched for methods usable from Python but could not find a useful one. How many ways are there to detect a proxy in Python?
My sniffer code part is something like that:
.....
sock = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)
while True:
    packet = sock.recvfrom(2048)
    ipheader = packet[0][14:34]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    sourceIP = socket.inet_ntoa(ip_hdr[3])
    tcpheader = packet[0][34:54]
    tcp_hdr = struct.unpack("!HH9ss6s", tcpheader)
    protoFlag = binascii.hexlify(tcp_hdr[3])
......
Firstly, you mean incoming packets.
Secondly:
From the server TCP's point of view, it is connected to the proxy, not the downstream client.
So your server can't identify from the packet that a proxy is involved.
However, if you are at the application level, like an HTTP proxy, there might be an X-Forwarded-For header available, which carries the original client IP. I say it might be there because the proxy server decides whether or not to send this header to you. If you are expecting incoming HTTP connections to your server, you can take a look at Python's urllib2, although I'm not sure if you can access X-Forwarded-For using that library.
From the docs:
urllib2.urlopen(url[, data][, timeout])
...
This function returns a file-like object with two additional methods:
geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers, in the form of an mimetools.Message instance (see Quick Reference to HTTP Headers)
So using info() will retrieve the headers. Hope you find what you're looking for in there.
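The header check described above can be sketched as follows. The header names are the commonly seen proxy-revealing ones; a proxy is free to strip or forge them, so their absence proves nothing:

```python
PROXY_HEADERS = ("X-Forwarded-For", "Via", "Forwarded", "X-Real-IP")

def proxy_hints(headers):
    # Return the proxy-revealing headers present, matched case-insensitively,
    # given any mapping of header name -> value.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {h: lowered[h.lower()] for h in PROXY_HEADERS if h.lower() in lowered}
```

For example, `proxy_hints({"X-Forwarded-For": "203.0.113.7", "Host": "example.org"})` returns only the X-Forwarded-For entry.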
There aren't many ways to do this, as proxies / VPNs look like real traffic. To add to what Mid said, you can look at headers and/or user agents to help determine whether the user is behind a proxy or a VPN.
The only free solution I know is getIPIntel, which uses block lists, machine learning, and statistics to determine whether the IP is a proxy / VPN or not.
There are other, paid solutions, like MaxMind and Blocked.
What you'll need to do is send API queries to these services and parse the results.
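Querying such a service boils down to one GET request and parsing a score. A sketch against getIPIntel's endpoint; the contact address is a placeholder you must replace, and the exact URL shape should be double-checked against their docs:

```python
from urllib.parse import urlencode

GETIPINTEL = "http://check.getipintel.net/check.php"

def build_query(ip, contact):
    # getIPIntel returns a probability (0..1) that the IP is a proxy/VPN;
    # the contact address is required by their terms of use.
    return GETIPINTEL + "?" + urlencode({"ip": ip, "contact": contact})

def lookup(ip, contact):
    # Hypothetical driver -- performs a live query, so not called here.
    import urllib.request
    return float(urllib.request.urlopen(build_query(ip, contact)).read())
```

A returned score close to 1 suggests a proxy/VPN; what threshold to act on is up to you.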
I have been trying to use SocksiPy (http://socksipy.sourceforge.net/) to set my sockets to SOCKS5 and route them through a local Tor service that I am running on my box.
I have the following:
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "localhost", 9050, True)
socket.socket = socks.socksocket
import urllib2
And I am doing something similar to:
workItem = "http://192.168.1.1/some/stuff" #obviously not the real url
req = urllib2.Request(workItem)
req.add_header('User-agent', 'Mozilla 5.10')
res = urllib2.urlopen(req, timeout=60)
And even using this I have been identified by the website. My understanding was that I would be coming out of a random exit point every time, so it wouldn't be able to identify me; and I can confirm, if I hit whatsmyip.org with this, that my end point is different every time. Are there some other steps I have to take to stay anonymous? I am using an IP address in the URL, so it shouldn't be doing any DNS resolution that might give it away.
There is no such User-Agent as 'Mozilla 5.10' in reality. If the server employs even the simplest fingerprinting based on the User-Agent, it will identify you by this uncommon setting.
And I don't think you understand Tor: it does not provide full anonymity. It only helps by hiding your real IP address. It does not help if you give your real name on a web site, or use easily detectable features like an uncommon user agent.
You might have a look at the Design and Implementation Notes for the Tor Browser Bundle to see what kind of additional steps they take to be less detectable, and where they still see open problems. You might also read about device fingerprinting, which is used to identify a seemingly anonymous peer.