Using multiple proxies to open a link in urllib2 - python

What I am trying to do is read a line (an IP address) from a file, open a website through that address as a proxy, and then repeat with all the addresses in the file. Instead, I get an error. I am new to Python, so maybe it's a simple mistake. Thanks in advance!
CODE:
>>> import urllib2
>>> f = open("proxy.txt", "r")  # file containing list of IP addresses
>>> address = f.readline().strip()  # strip the trailing \n
>>> while address:
	proxy = urllib2.ProxyHandler({'http': address})
	opener = urllib2.build_opener(proxy)
	urllib2.install_opener(opener)
	urllib2.urlopen('http://www.google.com')
	address = f.readline().strip()
ERROR:
Traceback (most recent call last):
File "<pyshell#15>", line 5, in <module>
urllib2.urlopen('http://www.google.com')
File "D:\Programming\Python\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "D:\Programming\Python\lib\urllib2.py", line 394, in open
response = self._open(req, data)
File "D:\Programming\Python\lib\urllib2.py", line 412, in _open
'_open', req)
File "D:\Programming\Python\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "D:\Programming\Python\lib\urllib2.py", line 1199, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "D:\Programming\Python\lib\urllib2.py", line 1174, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

It means that the proxy is unavailable.
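If you just want your loop to skip dead proxies instead of stopping at the first failure, a minimal sketch (keeping your proxy.txt format and google.com as the test URL; the 5-second timeout is my choice) is to wrap the request in try/except:
import urllib2

with open("proxy.txt") as f:
    for line in f:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        opener = urllib2.build_opener(urllib2.ProxyHandler({'http': address}))
        try:
            # any URLError here (timeout, refused, ...) means this proxy failed
            opener.open('http://www.google.com', timeout=5).close()
        except urllib2.URLError as e:
            print 'dead proxy %s: %s' % (address, e)
        else:
            print 'alive: %s' % address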
Here's a proxy checker that checks several proxies simultaneously:
#!/usr/bin/env python
import fileinput  # accept proxies from files or stdin

try:
    from gevent.pool import Pool  # $ pip install gevent
    import gevent.monkey; gevent.monkey.patch_all()  # patch stdlib
except ImportError:  # fallback on using threads
    from multiprocessing.dummy import Pool

try:
    from urllib2 import ProxyHandler, build_opener
except ImportError:  # Python 3
    from urllib.request import ProxyHandler, build_opener

def is_proxy_alive(proxy, timeout=5):
    opener = build_opener(ProxyHandler({'http': proxy}))  # test redir. and such
    try:  # send request, read response headers, close connection
        opener.open("http://example.com", timeout=timeout).close()
    except EnvironmentError:
        return None
    else:
        return proxy

candidate_proxies = (line.strip() for line in fileinput.input())
pool = Pool(20)  # use 20 concurrent connections
for proxy in pool.imap_unordered(is_proxy_alive, candidate_proxies):
    if proxy is not None:
        print(proxy)
Usage:
$ python alive-proxies.py proxy.txt
$ echo user:password@ip:port | python alive-proxies.py
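The fallback works because multiprocessing.dummy provides a thread-based Pool with the same interface as gevent.pool.Pool used here (imap_unordered included), so the rest of the script runs unchanged whether gevent is installed or not; gevent just makes the 20 concurrent connections cheaper.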

Related

Python urllib2 setting timeout [duplicate]

This question already has answers here:
setting the timeout on a urllib2.request() call
(3 answers)
Closed 7 years ago.
I'm trying to fetch a url by making a POST request using Python's urllib2 module. I'm constructing the request in the following way.
handler = urllib2.HTTPHandler()
opener = urllib2.build_opener(handler)
url = 'xyz...'
request = urllib2.Request(url,data='{}')
request.add_header('Content-Type','application/json')
request.get_method = lambda: 'POST'
try:
    connection = opener.open(request)
except urllib2.HTTPError as e:
    connection = e
except urllib2.URLError as e:
    print 'TIMEOUT: ' + e.reason
I want to set a timeout for the open request somewhere. Per the docs https://docs.python.org/3.1/library/urllib.request.html
the build_opener() call should return an OpenerDirector instance, which should have a timeout parameter. But I can't seem to get it to work. Also, the reason I'm constructing a Request is that I need to specify an empty body data='{}' in the request, and I couldn't get that going with urlopen either. Any help appreciated.
You can pass timeout as a parameter to the open method call of the opener.
Normal functioning, using a lambda to ensure the request is a POST rather than a GET (since there is no body):
>>> import urllib2
>>> handler = urllib2.HTTPHandler()
>>> opener = urllib2.build_opener(handler)
>>> request = urllib2.Request('http://httpbin.org/post')
>>> request.get_method = lambda: 'POST'
>>> opener.open(request)
<addinfourl at 4363264800 whose fp = <socket._fileobject object at 0x101b654d0>>
Simply add a timeout:
>>> opener.open(request, timeout=0.01)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error timed out>
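For the other half of the question: urlopen itself also accepts a timeout, and passing a data argument already makes the request a POST, so the get_method override isn't needed once there is a body. A sketch, reusing httpbin.org from above:
import urllib2

request = urllib2.Request('http://httpbin.org/post', data='{}')  # data makes it a POST
request.add_header('Content-Type', 'application/json')
connection = urllib2.urlopen(request, timeout=10)  # timeout works here too
print connection.read()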

How to send POST data from Python to PHP scripts with basic HTTP authentication?

I try to send POST data from a Python program to a PHP file that uses basic HTTP authentication. I run this code:
import urllib.parse
from urllib.request import urlopen
path="https://username:password@url_to_my_file.php"
path=path.encode('utf8')
mydata=urllib.parse.urlencode({"Hello":"There"})
mydata=mydata.encode('utf8')
req=urlopen(path,mydata)
req.add_header("Content-type","application/x-www-form-urlencoded")
page=urllib.urlopen(req).read()
I got this error:
req.data=data
AttributeError: 'bytes' object has no attribute 'data'
How can I fix this bug?
UPDATE:
Following the solution below, I changed my code this way:
from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener, Request
import urllib
url="https://www.my_website.com/file.php"
path="http://my_username:my_password@https://www.my_website.com/file.php"
mydata=urllib.parse.urlencode({"Hello":"Test"})
pwmgr = HTTPPasswordMgrWithDefaultRealm()
pwmgr.add_password(None, url, 'my_username', 'my_password')
authhandler = HTTPBasicAuthHandler(pwmgr)
opener = build_opener(authhandler)
req = Request(path, mydata)
req.add_header("Content-type","application/x-www-form-urlencoded")
page = opener.open(req).read()
I got these errors:
Traceback (most recent call last):
File "/usr/local/python3.1.3/lib/python3.1/http/client.py", line 673, in _set_hostport
port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "a1.py", line 17, in <module>
page = opener.open(req).read()
File "/usr/local/python3.1.3/lib/python3.1/urllib/request.py", line 350, in open
response = self._open(req, data)
File "/usr/local/python3.1.3/lib/python3.1/urllib/request.py", line 368, in _open
'_open', req)
File "/usr/local/python3.1.3/lib/python3.1/urllib/request.py", line 328, in _call_chain
result = func(*args)
File "/usr/local/python3.1.3/lib/python3.1/urllib/request.py", line 1112, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/python3.1.3/lib/python3.1/urllib/request.py", line 1065, in do_open
h = http_class(host, timeout=req.timeout) # will parse host:port
File "/usr/local/python3.1.3/lib/python3.1/http/client.py", line 655, in __init__
self._set_hostport(host, port)
File "/usr/local/python3.1.3/lib/python3.1/http/client.py", line 675, in _set_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: ''
You opened the URL twice. First with:
req=urlopen(path,mydata)
Then again with:
page=urllib.urlopen(req).read()
If you wanted to create a separate Request object, do so:
from urllib.request import urlopen, Request
req = Request(path, mydata)
req.add_header("Content-type","application/x-www-form-urlencoded")
page = urlopen(req).read()
Note that you should not encode the URL; it should be a str value (in Python 3, urllib.request expects the URL as text, while the POST body is what must be bytes).
urllib.request will also not parse authentication information from the URL; you'll need to provide that separately by using a password manager:
from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener, Request

url = "https://url_to_my_file.php"
pwmgr = HTTPPasswordMgrWithDefaultRealm()
pwmgr.add_password(None, url, 'username', 'password')
authhandler = HTTPBasicAuthHandler(pwmgr)
opener = build_opener(authhandler)

req = Request(url, mydata)
req.add_header("Content-type", "application/x-www-form-urlencoded")
page = opener.open(req).read()
This is also what the http.client.InvalidURL: nonnumeric port traceback in your update is telling you: path still embeds the credentials and a second scheme, so http.client tries to parse 'https:' as a host:port pair and finds an empty port. Pass the plain url to Request and let the password manager supply the credentials.

Python and proxy - urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

I tried to google and search for similar question on stackOverflow, but still can't solve my problem.
I need my python script to perform http connections via proxy.
Below is my test script:
import urllib2, urllib
proxy = urllib2.ProxyHandler({'http': 'http://255.255.255.255:3128'})
opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://www.whatismyip.com/')
return_str = conn.read()
webpage = open('webpage.html', 'w')
webpage.write(return_str)
webpage.close()
This script works absolutely fine on my local computer (Windows 7, Python 2.7.3), but when I try to run it on the server, it gives me the following error:
Traceback (most recent call last):
File "proxy_auth.py", line 18, in <module>
conn = urllib2.urlopen('http://www.whatismyip.com/')
File "/home/myusername/python/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
I also tried the requests library and got the same error.
# testing request library
r = requests.get('http://www.whatismyip.com/', proxies={'http':'http://255.255.255.255:3128'})
If I don't set proxy, then the program works fine.
# this works fine
conn = urllib2.urlopen('http://www.whatismyip.com/')
I think the problem is that on my shared hosting account it is not possible to set an environment variable for proxy ... or something like that.
Are there any workarounds or alternative approaches that would let me set proxies for http connections? How should I modify my test script?
The problem was closed ports.
I had to buy a dedicated IP before tech support could open the ports I needed.
Now my script works fine.
Conclusion: on shared hosting, most ports are probably closed, and you will have to contact tech support to open them.
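A quick way to check the same thing on your own host, before involving urllib2, is to try the TCP connection directly (a sketch, using the question's placeholder proxy address):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)  # don't wait for the full system timeout
try:
    s.connect(('255.255.255.255', 3128))  # placeholder proxy ip:port from the question
except socket.error as e:
    print 'cannot reach proxy port:', e  # a closed or filtered port shows up here
else:
    print 'proxy port is open'
finally:
    s.close()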

Can't get Custom DNS server working in Python

I'm having real trouble getting Python to use a custom DNS server.
I have followed this: Tell urllib2 to use custom DNS
If I don't specify a self.host and self.port, it will go through without blocking.
Here is the code:
import urllib2
import httplib
import socket

class MyHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        if self.host == 'www.porn.com':
            self.host = '208.67.222.123'  # OpenDNS FamilyShield
            self.port = 53
        self.sock = socket.create_connection((self.host, self.port))

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.porn.com/videos/anime-toon.html')
data = f.read()
print data
I keep getting a "raise BadStatusLine(line)" error
Error log:
Traceback (most recent call last):
File "K:\Desktop\rte\dns2.py", line 16, in <module>
f = urllib2.urlopen ('http://www.porn.com/videos/anime-toon.html')
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 394, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 412, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "K:\Desktop\rte\dns2.py", line 12, in http_open
return self.do_open (MyHTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1170, in do_open
r = h.getresponse(buffering=True)
File "C:\Python27\lib\httplib.py", line 1027, in getresponse
response.begin()
File "C:\Python27\lib\httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "C:\Python27\lib\httplib.py", line 371, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
EDIT: Going on isedev's response that I was going about it the wrong way.
urllib2 doesn't seem to pick up the changes to the nameservers:
import dns.resolver
import urllib2
resolver = dns.resolver.Resolver()
resolver.nameservers = ['208.67.222.123']
answer = resolver.query('www.porn.com','A')
web_url = 'http://www.porn.com/videos/anime-toon.html'
req1 = urllib2.Request(web_url)
req1.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response1 = urllib2.urlopen(req1)
html=response1.read()
print html
I think you've misunderstood what's being done in the "Custom DNS" answer you refer to. The example given in that solution is not in fact setting up a custom DNS server - the MyResolver class is given as example only and performs a hard-coded name-to-IP for 'news.bbc.co.uk'.
So what your code is actually doing is redirecting an HTTP request to 'www.porn.com' (port 80) to the OpenDNS Family Shield DNS server (on port 53)... which will obviously lead to the error you're getting.
So what you need to do is replace:
if self.host == 'www.porn.com':
    self.host = '208.67.222.123'  # OpenDNS FamilyShield
    self.port = 53
with code that actually resolves 'www.porn.com' against the chosen DNS server directly (using dnspython for instance).
Assuming you've got the dnspython package installed, you could do something like:
import urllib2
import httplib
import socket
import dns.resolver

class MyHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        if self.host == 'www.porn.com':
            resolver = dns.resolver.Resolver()
            resolver.nameservers = ['208.67.222.123']
            answer = resolver.query(self.host, 'A')
            self.host = answer.rrset.items[0].address
        self.sock = socket.create_connection((self.host, self.port))

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.porn.com/videos/anime-toon.html')
data = f.read()
print data
This code returns '404 - not found' and network trace shows HTTP request to 'hit-adult.opendns.com', which is what 'www.porn.com' resolves to when using the '208.67.222.123' nameserver:
dig @208.67.222.123 www.porn.com A
;; ANSWER SECTION:
www.porn.com. 0 IN A 67.215.65.130
nslookup 67.215.65.130
130.65.215.67.in-addr.arpa name = hit-adult.opendns.com.
The above is an example only. Real code would require error checking, etc...
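For instance, a sketch of what a minimally error-checked lookup might look like, assuming dnspython's dns.exception.DNSException as the catch-all and the same rrset access as above:
import dns.exception
import dns.resolver

def resolve_with(nameserver, hostname, timeout=5):
    """Resolve hostname via one specific DNS server; return None on failure."""
    resolver = dns.resolver.Resolver()
    resolver.nameservers = [nameserver]
    resolver.lifetime = timeout  # overall time budget for the query
    try:
        answer = resolver.query(hostname, 'A')
    except dns.exception.DNSException:  # NXDOMAIN, timeout, no answer, ...
        return None
    return answer.rrset.items[0].address

print resolve_with('208.67.222.123', 'www.porn.com')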

In Python 3.2, I can open and read an HTTPS web page with http.client, but urllib.request is failing to open the same page

I want to open and read https://yande.re/ with urllib.request, but I'm getting an SSL error. I can open and read the page just fine using http.client with this code:
import http.client
conn = http.client.HTTPSConnection('www.yande.re')
conn.request('GET', 'https://yande.re/')
resp = conn.getresponse()
data = resp.read()
However, the following code using urllib.request fails:
import urllib.request
opener = urllib.request.build_opener()
resp = opener.open('https://yande.re/')
data = resp.read()
It gives me the following error: ssl.SSLError: [Errno 1] _ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list. Why can I open the page with HTTPSConnection but not opener.open?
Edit: Here's my OpenSSL version and the traceback from trying to open https://yande.re/
>>> import ssl; ssl.OPENSSL_VERSION
'OpenSSL 1.0.0a 1 Jun 2010'
>>> import urllib.request
>>> urllib.request.urlopen('https://yande.re/')
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
urllib.request.urlopen('https://yande.re/')
File "C:\Python32\lib\urllib\request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "C:\Python32\lib\urllib\request.py", line 369, in open
response = self._open(req, data)
File "C:\Python32\lib\urllib\request.py", line 387, in _open
'_open', req)
File "C:\Python32\lib\urllib\request.py", line 347, in _call_chain
result = func(*args)
File "C:\Python32\lib\urllib\request.py", line 1171, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Python32\lib\urllib\request.py", line 1138, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 1] _ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list>
>>>
What a coincidence! I'm having the same problem as you are, with an added complication: I'm behind a proxy. I found this bug report regarding https-not-working-with-urllib. Luckily, they posted a workaround.
import urllib.request
import ssl

##uncomment this code if you're behind a proxy
##https port is 443 but it doesn't work for me, used port 80 instead
##proxy_auth = '{0}://{1}:{2}@{3}'.format('https', 'username', 'password',
##                                        'proxy:80')
##proxies = { 'https' : proxy_auth }
##proxy = urllib.request.ProxyHandler(proxies)
##proxy_auth_handler = urllib.request.HTTPBasicAuthHandler()
##opener = urllib.request.build_opener(proxy, proxy_auth_handler,
##                                     https_sslv3_handler)

https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)

resp = opener.open('https://yande.re/')
data = resp.read().decode('utf-8')
print(data)
Btw, thanks for showing how to use http.client. I didn't know that there's another library that can be used to connect to the internet. ;)
This is due to a bug in the early 1.x OpenSSL implementation of elliptic curve cryptography. Take a closer look at the relevant part of the exception:
_ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
This is an error from the underlying OpenSSL library code which is a result of mishandling the EC point format TLS extension. One workaround is to use the SSLv3 instead of SSLv23 method, the other workaround is to use a cipher suite specification which disables all ECC cipher suites (I had good results with ALL:-ECDH, use openssl ciphers for testing). The fix is to update OpenSSL.
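The cipher-suite workaround can be wired into urllib.request with the same HTTPSHandler mechanism shown above; a sketch, assuming your OpenSSL build accepts the ALL:-ECDH cipher string:
import ssl
import urllib.request

# Build a context that excludes the ECC suites triggering the OpenSSL bug.
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.set_ciphers('ALL:-ECDH')  # the cipher string reported to work above

opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=ctx))
resp = opener.open('https://yande.re/')
print(resp.read()[:200])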
The problem is due to the hostnames that you're giving in the two examples:
import http.client
conn = http.client.HTTPSConnection('www.yande.re')
conn.request('GET', 'https://yande.re/')
and...
import urllib.request
urllib.request.urlopen('https://yande.re/')
Note that in the first example, you're asking the client to make a connection to the host www.yande.re, while in the second example urllib will first parse the URL 'https://yande.re/' and then make the request against the host yande.re.
Although www.yande.re and yande.re may resolve to the same IP address, from the perspective of the web server these are different virtual hosts. My guess is that you had an SNI configuration problem on your web server's side. Seeing as that the original question was posted on May 21, and the current cert at yande.re starts May 28, I'm thinking that you already fixed this problem?
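To check what each hostname form actually resolves to before suspecting the server configuration, a quick sketch:
import socket

# www.yande.re and yande.re may or may not point at the same address
print(socket.gethostbyname('www.yande.re'))
print(socket.gethostbyname('yande.re'))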
Try this:
import connection  # imports connection
import url

url = 'http://www.google.com/'
webpage = url.open(url)
try:
    connection.receive(webpage)
except:
    webpage = url.text('This webpage is not available!')
    connection.receive(webpage)
