I want to fetch a page over IPv6 with urllib.
It works with square-bracket IPv6 notation, but I have no clue how to (easily) convince Python to make an IPv6 request when I give it the FQDN.
For example, the literal IP below belongs to https://www.dslreports.com/whatismyip:
from sys import version_info

PY3K = version_info >= (3, 0)

if PY3K:
    import urllib.request as urllib
else:
    import urllib2 as urllib

url = None
opener = urllib.build_opener()
opener.addheaders = [('User-agent',
                      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36")]
url = opener.open("http://[2607:fad0:3706:1::1000]/whatismyip", timeout=3)
content = url.read()
I finally solved my issue. Not in the most elegant way, but it works for me.
After reading:
Force requests to use IPv4 / IPv6
and
Python urllib2 force IPv4
I decided to do a DNS lookup myself and just send a Host header with the FQDN to grab the content. (Host headers are needed for vhosts.)
Here is the ugly snippet:
import socket
import urllib2 as urllib          # urllib.request on Python 3
from urlparse import urlparse     # urllib.parse on Python 3

# Ugly hack to get either an IPv4 or IPv6 response from the server.
# 'server' holds the URL to fetch, e.g. "http://www.dslreports.com/whatismyip".
# ip_kind() and haveIPv6 are helpers defined elsewhere in the project (not shown here);
# presumably ip_kind() raises ValueError when its argument is not a literal IP address,
# and haveIPv6 is True when the local host has IPv6 connectivity.
parsed_uri = urlparse(server)
fqdn = "{uri.netloc}".format(uri=parsed_uri)
scheme = "{uri.scheme}".format(uri=parsed_uri)
path = "{uri.path}".format(uri=parsed_uri)
try:
    # Already a literal address such as "[2607:fad0:3706:1::1000]"? Use it as-is.
    ipVersion = ip_kind(fqdn[1:-1])
    ip = fqdn
except ValueError:
    # Otherwise resolve the FQDN and pick an address family explicitly.
    addrs = socket.getaddrinfo(fqdn, 80)
    if haveIPv6:
        ipv6_addrs = [addr[4][0] for addr in addrs if addr[0] == socket.AF_INET6]
        ip = "[" + ipv6_addrs[0] + "]"
    else:
        ipv4_addrs = [addr[4][0] for addr in addrs if addr[0] == socket.AF_INET]
        ip = ipv4_addrs[0]

server = "{}://{}{}".format(scheme, ip, path)
url = urllib.Request(server, None, {'User-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'})
# The next line adds the Host header so virtual hosts still work
url.host = fqdn
content = urllib.urlopen(url).read()
This is far from ideal and it could be much cleaner but it works for me.
It is implemented here: https://github.com/SteveClement/ipgetter/tree/IPv6
This simply goes through a list of servers that return your border-gateway IP, now over IPv6 too.
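Usage would be roughly the following; this is only a sketch, assuming the IPv6 branch keeps the upstream ipgetter.myip() entry point:

import ipgetter

# Ask the external servers for our public address; with the IPv6 branch this
# can come back as an IPv6 address when the host has IPv6 connectivity.
my_ip = ipgetter.myip()
print(my_ip)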
[Update: the paragraph below about Python 2 vs. Python 3 is no longer valid since the question has been updated.]
First, you seem to be using Python 2. This matters because the urllib module was split up and renamed in Python 3.
Secondly, your code snippet seems incorrect: build_opener is not available in urllib; it lives in urllib2.
So, I assume that your code is in fact the following one:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent',
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36")]
url = opener.open("http://www.dslreports.com/whatismyip", timeout=3)
If your DNS resolver correctly handles IPv6 resource records, your operating system has a dual IPv4/IPv6 stack (or an IPv6-only stack), and you have a working IPv6 network path to dslreports.com, this program will use IPv6 to connect to www.dslreports.com. So there is no need to convince Python to make an IPv6 request.
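That said, if you want to force IPv6 even when a hostname also has A records, one common trick is to wrap socket.getaddrinfo so it only returns AF_INET6 results. A minimal sketch for Python 3, assuming an IPv6-capable network (this is not part of urllib's public API, just a process-wide override):

import socket
import urllib.request

_orig_getaddrinfo = socket.getaddrinfo

def getaddrinfo_ipv6_only(host, port, family=0, *args, **kwargs):
    # Ignore the requested family and resolve AAAA records only.
    return _orig_getaddrinfo(host, port, socket.AF_INET6, *args, **kwargs)

socket.getaddrinfo = getaddrinfo_ipv6_only

content = urllib.request.urlopen("http://www.dslreports.com/whatismyip", timeout=3).read()
print(content[:200])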
Related
I am currently building a proxy rotator in Python. Everything runs fine so far, except that despite the proxies, the tracker pages return my own IP.
I have already read through dozens of posts on this forum; they often say "something is wrong with the proxy in this case".
I have a long list of proxies (about 600) which I test with my method, and when I scraped them I made sure they were marked either "elite" or "anonymous" before putting them on this list.
So can it be that the majority of free proxies are "junk" when it comes to anonymity, or am I fundamentally doing something wrong?
And is there a way to find out how a proxy is configured with regard to anonymity?
Python 3.10.
import time

import requests

headers = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
proxi = {"http": ""}

# Proxies that already passed a ping check
proxy_ping_ready = ["173.219.112.85:8080",
                    "43.132.148.107:2080",
                    "216.176.187.99:8886",
                    "193.108.21.234:1234",
                    "151.80.120.192:3128",
                    "139.255.10.234:8080",
                    "120.24.33.141:8000",
                    "12.88.29.66:9080",
                    "47.241.66.249:1081",
                    "51.79.205.165:8080",
                    "63.250.53.181:3128",
                    "160.3.168.70:8080"]

ipTracker = ["wtfismyip.com/text", "api.ip.sb/ip", "ipecho.net/plain", "ifconfig.co/ip"]

for element in proxy_ping_ready:
    for choice in ipTracker:
        try:
            proxi["http"] = "http://" + element
            ips = requests.get(f'https://{choice}', proxies=proxi, timeout=1, headers=headers).text
            print(f'My IP address is: {ips}', choice)
        except Exception as e:
            print("Error:", e)
        time.sleep(3)
Output (example):
My IP address is: 89.13.9.135
api.ip.sb/ip
My IP address is: 89.13.9.135
wtfismyip.com/text
My IP address is: 89.13.9.135
ifconfig.co/ip
(Every time it is my own address.)
You only set your proxy for HTTP traffic; you need to include a key for HTTPS traffic as well.
proxi["http"] = "http://" + element
proxi["https"] = "http://" + element # or "https://" + element, depends on the proxy
As James mentioned, you should also set an https proxy:
proxi["https"] = "http://" + element
If you are getting "max retries exceeded with url" errors, it most probably means the proxy is not working or is too slow and overloaded, so you might want to increase your timeout.
You can verify whether your proxy is working by setting it as an environment variable. I took one from your list:
import os
os.environ["http_proxy"] = "173.219.112.85:8080"
os.environ["https_proxy"] = "173.219.112.85:8080"
and then run your code without proxy settings by changing your request to
ips = requests.get('https://wtfismyip.com/text', headers=headers).text
I'm trying to rotate IPs using Tor, Privoxy and Stem, but I always end up getting the same IP. I've tried several things (changing proxies, using request sessions, and a lot more) but with no success.
This is my Python code:
import requests
from stem import Signal
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    controller.authenticate('mykey')
    controller.signal(Signal.NEWNYM)

#proxies = {
#    "http": "http://127.0.0.1:8118"
#}

proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'
}

r = requests.get("http://icanhazip.com", proxies=proxies, headers=headers, stream=False)
print(r.text)
My torrc file has this config:
ExitNodes {ar}
StrictNodes 1
ControlPort 9051
HashedControlPassword 16:BA2B8B2EAC4B391060A6FAA27FA922706F08D0BA0115D79840265D9DC3
The Privoxy config file has this line:
forward-socks5 / 127.0.0.1:9050 .
I've found the problem. The IP routing was working fine; the problem was that I had restricted ExitNodes to {ar}, and there is only one exit node for Argentina, so it was always the same IP.
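A sketch of the torrc change that fixes it (the country codes here are just examples; any setting that leaves Tor more than one usable exit will do):
# torrc: either drop the restriction entirely, or allow several countries
# so that NEWNYM can actually land on a different exit.
ExitNodes {ar},{br},{cl}
StrictNodes 0
ControlPort 9051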
I found the following method much handier and more useful than the way you tried above. Make sure to put the right location of your tor.exe file in the torexe variable. Proof of concept:
import requests
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5h://localhost:9050'
    res = s.get("http://icanhazip.com")
    print(res.text)

torexe.close()
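If you also want the IP to actually change between requests, you can combine this with the NEWNYM signal from the question. A rough sketch, assuming the ControlPort and password from the torrc above:

import time

import requests
from stem import Signal
from stem.control import Controller

proxies = {'http': 'socks5h://127.0.0.1:9050',
           'https': 'socks5h://127.0.0.1:9050'}

def new_identity():
    # Ask Tor for a fresh circuit (Tor rate-limits NEWNYM, hence the sleep).
    with Controller.from_port(port=9051) as controller:
        controller.authenticate('mykey')
        controller.signal(Signal.NEWNYM)
    time.sleep(10)

with requests.Session() as s:
    s.proxies.update(proxies)
    for _ in range(3):
        print(s.get("http://icanhazip.com").text.strip())
        new_identity()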
I was trying to convert a very simple Python 2 function to Python 3; it scrapes a web page and returns a list of proxies that I could use with a Twitter robot:
#!/usr/bin/env python
#python25 on windows7
#####################################
# GPL v2
# Author: Arjun Sreedharan
# Email: arjun024#gmail.com
#####################################
import urllib2
import re
import os
import time
import random

def main():
    request = urllib2.Request("http://www.ip-adress.com/proxy_list/")
    # request.add_header("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5")
    # Without a Referer header, ip-adress.com gives 403 Forbidden
    request.add_header("Referer", "https://www.google.co.in/")
    f = urllib2.urlopen(request)
    #outfile = open('outfile.htm','w')
    str1 = f.read()
    #outfile.write(str1)
    # Normally DOT matches any character EXCEPT newline; re.DOTALL makes dot
    # include newline as well.
    pattern = re.compile('.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)
    matched = re.search(pattern, str1)
    print(matched.group(1))
    """
    ip = matched.group(1)
    os.system('echo "http_proxy=http://'+ip+'" > ~/.wgetrc')
    if random.randint(1,2)==1:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5" http://funnytweets.in -O /dev/null')
    else:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13" http://funnytweets.in -O /dev/null')
    """

if __name__ == '__main__':
    while True:
        main()
        time.sleep(2)
OK, I already know that urllib2 is different in Python 3, but I could not make it work :( Can anyone help? :) Thanks!
In Python 3, Request and urlopen are located in the urllib.request module, so you have to change your imports accordingly.
from urllib.request import Request, urlopen
You could make your code compatible with both Python 2 and Python 3 by catching the ImportError raised when importing from urllib2 fails.
try:
    from urllib2 import Request, urlopen
except ImportError:
    from urllib.request import Request, urlopen
Also keep in mind that URLError and HTTPError are located in urllib.error, if you need them.
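With those imports in place, a minimal sketch of the scraping function ported to Python 3 could look like this (note that read() returns bytes in Python 3, so it has to be decoded before applying the regex):

import re
import time
from urllib.request import Request, urlopen

def main():
    request = Request("http://www.ip-adress.com/proxy_list/")
    # Without a Referer header, ip-adress.com gives 403 Forbidden
    request.add_header("Referer", "https://www.google.co.in/")
    html = urlopen(request).read().decode("utf-8", errors="replace")
    pattern = re.compile('.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)
    matched = re.search(pattern, html)
    if matched:
        print(matched.group(1))

if __name__ == '__main__':
    while True:
        main()
        time.sleep(2)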
I am trying to make a request through a SOCKS5 proxy server over HTTPS, but it fails or returns an empty string. I am using the PySocks library.
Here is my example
WEB_SITE_PROXY_CHECK_URL = "whatismyipaddress.com/"
REQUEST_SCHEMA = "https://"
host_url = REQUEST_SCHEMA + WEB_SITE_PROXY_CHECK_URL
socket.connect((host_url, 443))
request = "GET / HTTP/1.1\nHost: " + host_url + "\nUser-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11\n\n"
socket.send(request)
response = socket.recv(4096)
print response
But it doesn't work; it prints an empty response.
Is there any way to make HTTPS request through the socks5 proxy in Python ?
Thanks
As of requests version 2.10.0, released on 2016-04-29, requests supports SOCKS.
It requires PySocks, which can be installed with pip install pysocks.
import requests

host_url = 'https://example.com'

# Fill in your own proxies' details
proxies = {'http': 'socks5://user:pass@host:port',
           'https': 'socks5://user:pass@host:port'}

# Define headers if you will
headers = {}

response = requests.get(host_url, headers=headers, proxies=proxies)
Beware: when using a SOCKS proxy, requests' SOCKS support will make HTTP requests with the full URL (e.g., GET example.com HTTP/1.1 rather than GET / HTTP/1.1), and this behavior may cause problems.
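Also worth knowing: the socks5h:// scheme (note the trailing h) makes the proxy resolve hostnames instead of your local machine, which is often what you want. A hedged variant of the snippet above, keeping the same placeholder credentials:

import requests

proxies = {'http': 'socks5h://user:pass@host:port',
           'https': 'socks5h://user:pass@host:port'}
response = requests.get('https://example.com', proxies=proxies)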
I'm trying to make urllib requests to http://google.com in Python 3 (I rewrote it in 2.7 using urllib2 as well, same issue). Below is some of my code:
import urllib.request
from urllib.request import urlopen
import http.cookiejar

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.91 Safari/537.36')]

def makeRequest():
    search = 'http://google.com'
    print('About to search...')
    response = opener.open(search).read()
    print('Done')

makeRequest()
When I run this code, it runs in about 14 seconds:
real 0m14.386s
user 0m0.087s
sys 0m0.027s
This seems to be the case with any Google site (Gmail, Google Play, etc.). When I change the search variable to a different site, such as Stackoverflow or Twitter, it runs in well under half a second:
real 0m0.277s
user 0m0.085s
sys 0m0.017s
Does anyone know what could be causing the slow response from Google?
First, you can use ping or traceroute to google.com and other sites to compare the time delay and see whether it is a DNS issue.
Second, you can use Wireshark to sniff the packets and see if something is wrong with the communication.
I think it may be a DNS issue, but I can't be sure.
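To separate DNS time from transfer time directly in Python, a small diagnostic sketch along these lines can help (the hostnames are just the ones mentioned in the question):

import socket
import time
import urllib.request

for host in ("google.com", "stackoverflow.com"):
    start = time.time()
    socket.getaddrinfo(host, 80)                      # DNS resolution only
    print(host, "DNS lookup:", round(time.time() - start, 3), "s")

    start = time.time()
    urllib.request.urlopen("http://" + host, timeout=30).read()
    print(host, "full request:", round(time.time() - start, 3), "s")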