I'm trying to rotate IPs using Tor, Privoxy and Stem, but I always end up with the same IP. I've tried several things (changing proxies, using requests sessions, and more) with no success.
This is my python code:
import requests
from stem import Signal
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    controller.authenticate('mykey')
    controller.signal(Signal.NEWNYM)

# proxies = {
#     "http": "http://127.0.0.1:8118"
# }
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'
}
r = requests.get("http://icanhazip.com", proxies=proxies, headers=headers, stream=False)
print(r.text)
The torrc file has this config:
ExitNodes {ar}
StrictNodes 1
ControlPort 9051
HashedControlPassword 16:BA2B8B2EAC4B391060A6FAA27FA922706F08D0BA0115D79840265D9DC3
The Privoxy config file has this line:
forward-socks5 / 127.0.0.1:9050 .
I've found the problem. The IP rotation was working fine; the issue was that I had restricted ExitNodes to {ar}, and there's only one exit node for Argentina, so it was always the same IP.
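For reference, once the ExitNodes restriction is removed (or broadened), a quick way to confirm rotation is to fetch the visible IP before and after sending NEWNYM. A minimal sketch, assuming Tor is listening on 9050/9051 with the same control password as above and that requests[socks] is installed:
import time

import requests
from stem import Signal
from stem.control import Controller

PROXIES = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

def current_ip():
    # Ask an external service which IP the Tor exit presents.
    return requests.get("http://icanhazip.com", proxies=PROXIES, timeout=10).text.strip()

print("before:", current_ip())

with Controller.from_port(port=9051) as controller:
    controller.authenticate('mykey')          # password matching HashedControlPassword in torrc
    controller.signal(Signal.NEWNYM)          # ask Tor for a new circuit
    time.sleep(controller.get_newnym_wait())  # NEWNYM is rate-limited; wait until it takes effect

print("after:", current_ip())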
I found the following method handier than the approach you tried above. Make sure to put the correct location of your tor.exe file in the torexe variable. Proof of concept:
import requests
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5h://localhost:9050'
    res = s.get("http://icanhazip.com")
    print(res.text)

torexe.close()
I am currently building a proxy rotator in Python. Everything is running fine so far, except that despite the proxies, the tracker pages return my own IP.
I have already read through dozens of posts in this forum. It often says "something is wrong with the proxy in this case".
I have a long list of proxies (about 600) which I test with my method, and when I scraped them I made sure they were marked either "elite" or "anonymous" before putting them on this list.
So can it be that the majority of free proxies are "junk" when it comes to anonymity, or am I fundamentally doing something wrong?
And is there basically a way to find out how a proxy is configured with regard to anonymity?
Python 3.10.
import time

import requests

headers = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}

proxi = {"http": ""}

proxy_ping_ready = ["173.219.112.85:8080",
                    "43.132.148.107:2080",
                    "216.176.187.99:8886",
                    "193.108.21.234:1234",
                    "151.80.120.192:3128",
                    "139.255.10.234:8080",
                    "120.24.33.141:8000",
                    "12.88.29.66:9080",
                    "47.241.66.249:1081",
                    "51.79.205.165:8080",
                    "63.250.53.181:3128",
                    "160.3.168.70:8080"]

ipTracker = ["wtfismyip.com/text", "api.ip.sb/ip", "ipecho.net/plain", "ifconfig.co/ip"]

for element in proxy_ping_ready:
    for choice in ipTracker:
        try:
            proxi["http"] = "http://" + element
            ips = requests.get(f'https://{choice}', proxies=proxi, timeout=1, headers=headers).text
            print(f'My IP address is: {ips}', choice)
        except Exception as e:
            print("Error:", e)
            time.sleep(3)
Output (example):
My IP address is: 89.13.9.135
api.ip.sb/ip
My IP address is: 89.13.9.135
wtfismyip.com/text
My IP address is: 89.13.9.135
ifconfig.co/ip
(Every time my own address).
You only set your proxy for http traffic; you need to include a key for https traffic as well.
proxi["http"] = "http://" + element
proxi["https"] = "http://" + element # or "https://" + element, depends on the proxy
As James mentioned, you should also set an https proxy:
proxi["https"] = "http://" + element
If you are getting "max retries with url" it most probably means that the proxy is not working or is too slow and overloaded, so you might increase your timeout.
You can verify whether your proxy is working by setting it as an environment variable. I took one from your list:
import os
os.environ["http_proxy"] = "173.219.112.85:8080"
os.environ["https_proxy"] = "173.219.112.85:8080"
and then run your code without proxy settings by changing your request to
ips = requests.get('https://wtfismyip.com/text', headers=headers).text
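Putting both answers together, the asker's loop could be adjusted as in the sketch below (shortened copies of the original lists; it assumes the listed proxies also tunnel https via CONNECT, which many free proxies don't, and which may well be the anonymity problem):
import time

import requests

headers = {"User-agent": "Mozilla/5.0"}
proxy_ping_ready = ["173.219.112.85:8080", "43.132.148.107:2080"]  # shortened copy of the asker's list
ipTracker = ["wtfismyip.com/text", "api.ip.sb/ip"]

for element in proxy_ping_ready:
    proxi = {
        "http": "http://" + element,   # proxy for plain-http requests
        "https": "http://" + element,  # proxy for https requests as well
    }
    for choice in ipTracker:
        try:
            # longer timeout: free proxies are often slow or overloaded
            ips = requests.get(f'https://{choice}', proxies=proxi, timeout=10, headers=headers).text
            print(f'My IP address is: {ips.strip()}', choice)
        except Exception as e:
            print("Error:", e)
            time.sleep(3)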
https://www.sahibinden.com/en
If you open it in an incognito window and check the headers in Fiddler, these are the two main headers you get:
When I click the last one and check the request headers, this is what I get:
I want to get these headers in Python. Is there any way I can get them using Selenium? I'm a bit clueless here.
You can use Selenium Wire. It is a Selenium extension which has been developed for this exact purpose.
https://pypi.org/project/selenium-wire/
An example after pip install:
## Import webdriver from Selenium Wire instead of Selenium
from seleniumwire import webdriver

## Get the URL
driver = webdriver.Chrome("my/path/to/driver")
driver.get("https://my.test.url.com")

## Print request headers
for request in driver.requests:
    print(request.url)               # <--------------- Request url
    print(request.headers)           # <----------- Request headers
    print(request.response.headers)  # <-- Response headers
You can run a JS command like this:
var req = new XMLHttpRequest()
req.open('GET', document.location, false)
req.send(null)
return req.getAllResponseHeaders()
In Python:
driver.get("https://t.me/codeksiyon")
headers = driver.execute_script("var req = new XMLHttpRequest();req.open('GET', document.location, false);req.send(null);return req.getAllResponseHeaders()")
# type(headers) == str
headers = headers.splitlines()
The bottom line is: no, you can't retrieve the request headers using Selenium.
Details
Adding WebDriver methods to read the HTTP status code and headers of an HTTP response has been a long-standing demand from Selenium users. Implementing this feature through Selenium has been discussed at length in WebDriver lacks HTTP response header and status code methods.
However, Jason Leyba (Selenium contributor) stated plainly in his comment:
We will not be adding this feature to the WebDriver API as it falls outside of our current scope (emulating user actions).
Ashley Leyba further added that attempting to make WebDriver the ideal web testing tool would hurt its overall quality, since driver.get(url) blocks until the browser has loaded the page and returns the response for the final loaded page. So in the case of a login redirect, status codes and headers will always end up as a 200 instead of the 302 you're looking for.
Finally, Simon M Stewart (WebDriver creator) in his comment concluded that:
This feature isn't going to happen. The recommended approach is to either extend the HtmlUnitDriver to access the information you require or to make use of an external proxy that exposes this information such as the BrowserMob Proxy
It's not possible to get headers using Selenium (see the discussion above for further information).
However, you might use other libraries such as requests to fetch the page and read its headers (and BeautifulSoup to parse the HTML).
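For example, a minimal sketch with requests (note this shows the headers of this request, not of the browser session Selenium is driving):
import requests

resp = requests.get(
    "https://www.sahibinden.com/en",
    headers={"User-Agent": "Mozilla/5.0"},  # some sites block the default python-requests agent
)

print(resp.request.headers)  # the request headers that were actually sent
print(resp.headers)          # the response headers returned by the server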
Maybe you can use BrowserMob Proxy for this. Here is an example:
import settings  # local module providing BROWSERMOB_PATH and CHROME_PATH
from browsermobproxy import Server
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

config = settings.Config

server = Server(config.BROWSERMOB_PATH)
server.start()
proxy = server.create_proxy()

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy.proxy)
chrome_options.add_argument('--headless')

capabilities = DesiredCapabilities.CHROME.copy()
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True

driver = webdriver.Chrome(options=chrome_options,
                          desired_capabilities=capabilities,
                          executable_path=config.CHROME_PATH)

proxy.new_har("sahibinden", options={'captureHeaders': True})
driver.get("https://www.sahibinden.com/en")

entries = proxy.har['log']["entries"]
for entry in entries:
    if 'request' in entry.keys():
        print(entry['request']['url'])
        print(entry['request']['headers'])
        print('\n')

proxy.close()
driver.quit()
Another option is to make a synchronous HEAD request in JavaScript and parse the header string into a dict:
js_headers = '''
const _xhr = new XMLHttpRequest();
_xhr.open("HEAD", document.location, false);
_xhr.send(null);
const _headers = {};
_xhr.getAllResponseHeaders().trim().split(/[\\r\\n]+/).map((value) => value.split(/: /)).forEach((keyValue) => {
_headers[keyValue[0].trim()] = keyValue[1].trim();
});
return _headers;
'''
page_headers = driver.execute_script(js_headers)
type(page_headers) # -> dict
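For completeness, a minimal usage sketch (it assumes chromedriver is on PATH and js_headers is the script above; note these are the response headers of a fresh HEAD request, not the headers the browser originally sent):
from selenium import webdriver

driver = webdriver.Chrome()                        # assumes chromedriver is on PATH
driver.get("https://www.sahibinden.com/en")

page_headers = driver.execute_script(js_headers)   # js_headers as defined above
for name, value in page_headers.items():
    print(name, ":", value)

driver.quit()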
You can use https://pypi.org/project/selenium-wire/, a drop-in replacement for webdriver that adds request/response inspection and manipulation, even for https, by using its own local SSL certificate.
from seleniumwire import webdriver
d = webdriver.Chrome() # make sure chrome/chromedriver is in path
d.get('https://en.wikipedia.org')
vars(d.requests[-1].headers)
This lists the headers of the last entry in the requests list:
{'policy': Compat32(), '_headers': [('content-length', '1361'),
('content-type', 'application/json'), ('sec-fetch-site', 'none'),
('sec-fetch-mode', 'no-cors'), ('sec-fetch-dest', 'empty'),
('user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'),
('accept-encoding', 'gzip, deflate, br')],
'_unixfrom': None, '_payload': None, '_charset': None,
'preamble': None, 'epilogue': None, 'defects': [], '_default_type': 'text/plain'}
I've recently been experimenting with using proxies with Python requests and cannot seem to get them to work. Although the requests go through with the proxies, my testing has led me to believe the proxies aren't being applied to the request. Even with obviously bad proxies, my requests still go through, which makes me think the proxy is not being used at all. To demonstrate this, I made a simple script (the working proxy has been edited for this post):
import requests

proxy1 = {"http": "http://this:should#not:work"}
proxy2 = {"http": "http://this:proxy#is.working.com:33128"}

r1 = requests.get("https://google.com", proxies=proxy2)
print(r1.status_code)
# prints 200 as expected

r2 = requests.get("https://google.com", proxies=proxy1)
print(r2.status_code)
# prints 200 which is weird since I was expecting the request to not go through
Does anyone know why this is happening and if the requests actually are being used with the proxies?
In both examples you define a proxy only for http:
proxy1 = {"http": "http://this:should#not:work"}
proxy2 = {"http": "http://this:proxy#is.working.com:33128"}
but you use a URL with https:
https://google.com
so requests doesn't use a proxy at all.
You have to define a proxy for https:
proxy1 = {"https": "http://this:should#not:work"}
proxy2 = {"https": "http://this:proxy#is.working.com:33128"}
Doc: requests: proxy
EDIT:
Using https://httpbin.org/get you can test GET requests; it sends back all your headers and your IP.
I took the proxy from one of the pages with free proxies, so it may not work after some time.
import requests

proxy = {"https": "http://104.244.75.26:8080"}

r = requests.get("https://httpbin.org/get", proxies=proxy)
print(r.status_code)
print(r.text)
The result shows the IP of the proxy:
200
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "104.244.75.26, 104.244.75.26",
  "url": "https://httpbin.org/get"
}
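To see the difference directly, a small sketch (the proxy address below is made up and unreachable): with only an "http" key the https request bypasses the proxy and still succeeds, while with an "https" key the same dead proxy makes the request fail.
import requests

bad_proxy = "http://10.255.255.1:3128"  # made-up, unreachable proxy

# Only "http" is set, so the https:// URL ignores the proxy and goes out directly.
r = requests.get("https://httpbin.org/get", proxies={"http": bad_proxy}, timeout=10)
print(r.status_code)  # 200, served without the proxy

# With "https" set, requests really tries the proxy and the call fails.
try:
    requests.get("https://httpbin.org/get", proxies={"https": bad_proxy}, timeout=5)
except requests.exceptions.RequestException as e:
    print("proxy was actually used and failed:", type(e).__name__)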
I want to fetch an IPv6 page with urllib.
It works with the square-bracket IPv6 notation, but I have no clue how to (easily) convince Python to make an IPv6 request when I give it the FQDN.
For example, the IP below belongs to: https://www.dslreports.com/whatismyip
from sys import version_info

PY3K = version_info >= (3, 0)

if PY3K:
    import urllib.request as urllib
else:
    import urllib2 as urllib

url = None

opener = urllib.build_opener()
opener.addheaders = [('User-agent',
                      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36")]
url = opener.open("http://[2607:fad0:3706:1::1000]/whatismyip", timeout=3)
content = url.read()
I finally solved my issue. Not in the most elegant way, but it works for me.
After reading:
Force requests to use IPv4 / IPv6
and
Python urllib2 force IPv4
I decided to do a DNS lookup and just send a Host header with the FQDN to grab the content. (Host headers are needed for vhosts.)
Here is the ugly snippet:
# Ugly hack to get either an IPv4 or IPv6 response from the server
# (snippet from a larger script: it assumes `import socket`, `urlparse`,
#  and the urllib/urllib2 import from the question above)
parsed_uri = urlparse(server)
fqdn = "{uri.netloc}".format(uri=parsed_uri)
scheme = "{uri.scheme}".format(uri=parsed_uri)
path = "{uri.path}".format(uri=parsed_uri)

try:
    # ip_kind() is a helper defined elsewhere; it raises ValueError
    # if fqdn (with the brackets stripped) is not an IP literal
    ipVersion = ip_kind(fqdn[1:-1])
    ip = fqdn
except ValueError:
    addrs = socket.getaddrinfo(fqdn, 80)
    if haveIPv6:  # flag set elsewhere
        ipv6_addrs = [addr[4][0] for addr in addrs if addr[0] == socket.AF_INET6]
        ip = "[" + ipv6_addrs[0] + "]"
    else:
        ipv4_addrs = [addr[4][0] for addr in addrs if addr[0] == socket.AF_INET]
        ip = ipv4_addrs[0]

server = "{}://{}{}".format(scheme, ip, path)

url = urllib.Request(server, None, {'User-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'})
# Next line adds the host header
url.host = fqdn
content = urllib.urlopen(url).read()
This is far from ideal and it could be much cleaner but it works for me.
It is implemented here: https://github.com/SteveClement/ipgetter/tree/IPv6
It simply goes through a list of servers that return your border gateway IP, now over IPv6 too.
[update: this line about Python 2 / Python 3 is no longer valid since the question has been updated]
First, you seem to use Python 2. This is important because the urllib module has been split into parts and renamed in Python 3.
Secondly, your code snippet seems incorrect: build_opener is not a function available with urllib. It is available with urllib2.
So, I assume that your code is in fact the following one:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent',
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36")]
url = opener.open("http://www.dslreports.com/whatismyip", timeout=3)
If your DNS resolver correctly handles IPv6 resource records, if your operating system has a dual IPv4/IPv6 stack or an IPv6-only stack, and if you have a working IPv6 network path to dslreports.com, this Python program will use IPv6 to connect to www.dslreports.com. So there is no need to convince Python to make an IPv6 request.
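As a quick sanity check (a sketch, not part of the answer above), you can ask the resolver whether the host has any IPv6 (AAAA) records at all; if it does not, no client-side setting will make the request go over IPv6:
import socket

infos = socket.getaddrinfo("www.dslreports.com", 443, proto=socket.IPPROTO_TCP)
ipv6 = sorted({ai[4][0] for ai in infos if ai[0] == socket.AF_INET6})
ipv4 = sorted({ai[4][0] for ai in infos if ai[0] == socket.AF_INET})
print("IPv6 addresses:", ipv6 or "none")
print("IPv4 addresses:", ipv4 or "none")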
I am trying to learn Python, but I have no knowledge of HTTP. I read some posts here about how to use requests to log in to a web site, but it doesn't work. My simple code is here (not the real number and password):
#!/usr/bin/env python3
import requests

login_data = {'txtDID': '111111111',
              'txtPswd': 'mypassword'}

with requests.Session() as c:
    c.post('http://phone.ipkall.com/login.asp', data=login_data)
    r = c.get('http://phone.ipkall.com/update.asp')
    print(r.text)
    print("Done")
But I can't get my personal information, which should be shown after login. Can anyone give me a hint, or point me in a direction? I have no idea what's going wrong.
Servers don't like bots (scripts), for security reasons, so your script has to behave like a human using a real browser. First use get() to obtain the session cookies, and set the User-agent header to a real one. Use http://httpbin.org/headers to see what user-agent your browser sends.
Always check the results r.status_code and r.url.
So you can start with this (I don't have an account on this server, so I can't test it):
#!/usr/bin/env python3
import requests
s = requests.Session()
s.headers.update({
'User-agent': "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0",
})
# --------
# to get cookies, session ID, etc.
r = s.get('http://phone.ipkall.com/login.asp')
print( r.status_code, r.url )
# --------
login_data = {
'txtDID': '111111111',
'txtPswd': 'mypassword',
'submit1': 'Submit'
}
r = s.post('http://phone.ipkall.com/process.asp?action=verify', data=login_data)
print( r.status_code, r.url )
# --------
BTW: if the page uses JavaScript, you have a problem, because requests can't run the JavaScript on the page.