urllib.urlopen isn't working. Is there a workaround?

I'm getting a getaddrinfo error, and after doing some sleuthing it looks like my corporate intranet might not be allowing the connection (I'm assuming for security reasons, although it's strange that IE works but Python isn't allowed to open a URL). Is there a safe way to get around this?
Here's the exact error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
b = urllib.urlopen('http://www.google.com')
File "C:\Python26\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python26\lib\urllib.py", line 203, in open
return getattr(self, name)(url)
File "C:\Python26\lib\urllib.py", line 342, in open_http
h.endheaders()
File "C:\Python26\lib\httplib.py", line 868, in endheaders
self._send_output()
File "C:\Python26\lib\httplib.py", line 740, in _send_output
self.send(msg)
File "C:\Python26\lib\httplib.py", line 699, in send
self.connect()
File "C:\Python26\lib\httplib.py", line 683, in connect
self.timeout)
File "C:\Python26\lib\socket.py", line 498, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11001] getaddrinfo failed
More info: I also get this error with urllib2.urlopen

You probably need to fill in proxy information.
import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://yourcorporateproxy:12345/'})
proxy_auth_handler = urllib2.HTTPBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.stackoverflow.com')
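For reference, on Python 3 the same setup moves to urllib.request; a sketch with the same placeholder proxy URL, realm, and credentials as above:

```python
import urllib.request

# Placeholder proxy details, as above -- substitute your corporate values.
proxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://yourcorporateproxy:12345/'})
proxy_auth_handler = urllib.request.HTTPBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# opener.open('http://www.stackoverflow.com')  # performs the actual request
```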

Check you are using the correct proxy.
You can get the proxy information by using urllib.getproxies (note: getproxies does not work with dynamic proxy configuration, like when using PAC).
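A quick way to inspect what Python itself detects (shown with Python 3's urllib.request, where getproxies now lives):

```python
import urllib.request

# getproxies() reads proxy settings from the environment (or the Windows
# registry); it returns an empty dict when no static proxy is configured,
# which is what happens with PAC-based dynamic configurations.
proxies = urllib.request.getproxies()
print(proxies)
```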
Update: Given the information that the proxy list is empty, I would suggest using a URL opener with the proxy name and information set explicitly.
Some good information about how to use proxies with URL openers:
Urllib manual
Michael Foord's introduction to urllib

Possibly this is a DNS issue; try urlopen with the IP address of the web server you're accessing, e.g.
import urllib
URL="http://66.102.11.99" # www.google.com
f = urllib.urlopen(URL)
f.read()
If this succeeds, then it's probably a DNS issue rather than a proxy issue (but you should also check your proxy setup).

Looks like a DNS problem.
Since you are using Windows, you can try running this command
nslookup www.google.com
to check whether the web address can be resolved successfully.
If not, it is a network settings issue.
If it resolves OK, then we have to look at other possible causes.
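The same check can be done from Python itself; a small sketch using the standard socket module:

```python
import socket

def resolves(hostname):
    """Return True if hostname resolves to an address -- what nslookup checks."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

print(resolves('localhost'))        # a name that should always resolve
print(resolves('no-such.invalid'))  # .invalid is reserved to never resolve
```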

I was facing the same issue.
On my system the proxy is configured through a .PAC file, so I opened that file and took out the default proxy URL; for me it was http://168.219.61.250:8080/
The following test code worked for me:
import urllib2
proxy_support = urllib2.ProxyHandler({'http': 'http://168.219.61.250:8080/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://python.org/')
html = response.read()
print html
You might need to add some more code if your proxy requires authentication.
Hope this helps!!

Related

Python and proxy - urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

I tried to Google and search for similar questions on Stack Overflow, but still can't solve my problem.
I need my python script to perform http connections via proxy.
Below is my test script:
import urllib2, urllib
proxy = urllib2.ProxyHandler({'http': 'http://255.255.255.255:3128'})
opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://www.whatismyip.com/')
return_str = conn.read()
webpage = open('webpage.html', 'w')
webpage.write(return_str)
webpage.close()
This script works absolutely fine on my local computer (Windows 7, Python 2.7.3), but when I try to run it on the server, it gives me the following error:
Traceback (most recent call last):
File "proxy_auth.py", line 18, in <module>
conn = urllib2.urlopen('http://www.whatismyip.com/')
File "/home/myusername/python/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/home/myusername/python/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
I also tried to use the requests library and got the same error.
# testing request library
r = requests.get('http://www.whatismyip.com/', proxies={'http':'http://255.255.255.255:3128'})
If I don't set proxy, then the program works fine.
# this works fine
conn = urllib2.urlopen('http://www.whatismyip.com/')
I think the problem is that on my shared hosting account it is not possible to set an environment variable for proxy ... or something like that.
Are there any workarounds or alternative approaches that would let me set proxies for http connections? How should I modify my test script?
The problem was in closed ports.
I had to buy a dedicated IP before tech support could open the ports I needed.
Now my script works fine.
Conclusion: when you are on shared hosting, most ports are probably closed and you will have to contact tech support to open them.
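A quick way to check from Python whether an outbound TCP port is usable is to attempt a connection with a short timeout; a sketch (any host/port values below are placeholders):

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open('255.255.255.255', 3128) to probe the proxy from the question
```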

Repeated POST request is causing error "socket.error: (99, 'Cannot assign requested address')"

I have a web-service deployed in my box. I want to check the result of this service with various input. Here is the code I am using:
import sys
import httplib
import urllib
apUrl = "someUrl:somePort"
fileName = sys.argv[1]
conn = httplib.HTTPConnection(apUrl)
titlesFile = open(fileName, 'r')
try:
    for title in titlesFile:
        title = title.strip()
        params = urllib.urlencode({'search': 'abcd', 'text': title})
        conn.request("POST", "/somePath/", params)
        response = conn.getresponse()
        data = response.read().strip()
        print data + "\t" + title
        conn.close()
finally:
    titlesFile.close()
This code gives an error after the same number of lines is printed every time (28233). Error message:
Traceback (most recent call last):
File "testService.py", line 19, in ?
conn.request("POST", "/somePath/", params)
File "/usr/lib/python2.4/httplib.py", line 810, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.4/httplib.py", line 833, in _send_request
self.endheaders()
File "/usr/lib/python2.4/httplib.py", line 804, in endheaders
self._send_output()
File "/usr/lib/python2.4/httplib.py", line 685, in _send_output
self.send(msg)
File "/usr/lib/python2.4/httplib.py", line 652, in send
self.connect()
File "/usr/lib/python2.4/httplib.py", line 636, in connect
raise socket.error, msg
socket.error: (99, 'Cannot assign requested address')
I am using Python 2.4.3, and I am calling conn.close() as well. So why does this error occur?
This is not a Python problem.
In Linux kernel 2.4 the ephemeral port range runs from 32768 through 61000, so the number of available ports is 61000 - 32768 + 1 = 28233. From what I understood, because the web service in question is quite fast (< 5 ms, actually), all the ports get used up. The program then has to wait for a minute or two for the ports to close.
What I did was count the number of conn.close() calls; when the count reached 28000, I waited 90 seconds and reset the counter.
BIGYaN identified the problem correctly and you can verify that by calling "netstat -tn" right after the exception occurs. You will see very many connections with state "TIME_WAIT".
The alternative to waiting for port numbers to become available again is to simply use one connection for all requests. You are not required to call conn.close() after each call of conn.request(). You can simply leave the connection open until you are done with your requests.
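A sketch of that single-connection approach using Python 3's http.client (host, port, and path are placeholders); one keep-alive connection serves the whole batch, so only a single local port is tied up:

```python
import http.client

def post_all(host, port, path, bodies):
    """POST each body over one persistent HTTP/1.1 connection."""
    conn = http.client.HTTPConnection(host, port)
    results = []
    try:
        for body in bodies:
            conn.request("POST", path, body,
                         {"Content-Type": "application/x-www-form-urlencoded"})
            resp = conn.getresponse()
            results.append(resp.read())  # drain the body so the socket is reusable
    finally:
        conn.close()  # one close at the end, not one per request
    return results
```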
I too faced a similar issue while executing multiple POST requests using Python's requests library in Spark. To make it worse, I used multiprocessing on each executor to post to a server, so thousands of connections were created within seconds, each taking a few seconds to leave the TIME_WAIT state and release its port for the next set of connections.
Of all the solutions available on the internet that speak of disabling keep-alive or using requests.Session(), the one I found to work makes use of a 'Connection': 'close' header. You may need to put the header content on a separate line outside the post command, though.
headers = {
    'Connection': 'close'
}
with requests.Session() as session:
    response = session.post('https://xx.xxx.xxx.x/xxxxxx/x', headers=headers, files=files, verify=False)
    results = response.json()
    print(results)
This is my answer to the similar issue using the above solution.

In Python 3.2, I can open and read an HTTPS web page with http.client, but urllib.request is failing to open the same page

I want to open and read https://yande.re/ with urllib.request, but I'm getting an SSL error. I can open and read the page just fine using http.client with this code:
import http.client
conn = http.client.HTTPSConnection('www.yande.re')
conn.request('GET', 'https://yande.re/')
resp = conn.getresponse()
data = resp.read()
However, the following code using urllib.request fails:
import urllib.request
opener = urllib.request.build_opener()
resp = opener.open('https://yande.re/')
data = resp.read()
It gives me the following error: ssl.SSLError: [Errno 1] _ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list. Why can I open the page with HTTPSConnection but not opener.open?
Edit: Here's my OpenSSL version and the traceback from trying to open https://yande.re/
>>> import ssl; ssl.OPENSSL_VERSION
'OpenSSL 1.0.0a 1 Jun 2010'
>>> import urllib.request
>>> urllib.request.urlopen('https://yande.re/')
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
urllib.request.urlopen('https://yande.re/')
File "C:\Python32\lib\urllib\request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "C:\Python32\lib\urllib\request.py", line 369, in open
response = self._open(req, data)
File "C:\Python32\lib\urllib\request.py", line 387, in _open
'_open', req)
File "C:\Python32\lib\urllib\request.py", line 347, in _call_chain
result = func(*args)
File "C:\Python32\lib\urllib\request.py", line 1171, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Python32\lib\urllib\request.py", line 1138, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 1] _ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list>
>>>
What a coincidence! I'm having the same problem as you are, with an added complication: I'm behind a proxy. I found this bug report regarding https-not-working-with-urllib. Luckily, they posted a workaround.
import urllib.request
import ssl
##uncomment this code if you're behind a proxy
##https port is 443 but it doesn't work for me, used port 80 instead
##proxy_auth = '{0}://{1}:{2}#{3}'.format('https', 'username', 'password',
## 'proxy:80')
##proxies = { 'https' : proxy_auth }
##proxy = urllib.request.ProxyHandler(proxies)
##proxy_auth_handler = urllib.request.HTTPBasicAuthHandler()
##opener = urllib.request.build_opener(proxy, proxy_auth_handler,
## https_sslv3_handler)
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open('https://yande.re/')
data = resp.read().decode('utf-8')
print(data)
Btw, thanks for showing how to use http.client. I didn't know that there's another library that can be used to connect to the internet. ;)
This is due to a bug in the early 1.x OpenSSL implementation of elliptic curve cryptography. Take a closer look at the relevant part of the exception:
_ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
This is an error from the underlying OpenSSL library code, a result of mishandling the EC point format TLS extension. One workaround is to use the SSLv3 method instead of SSLv23; the other workaround is to use a cipher suite specification that disables all ECC cipher suites (I had good results with ALL:-ECDH; use the openssl ciphers command for testing). The real fix is to update OpenSSL.
The problem is due to the hostnames that you're giving in the two examples:
import http.client
conn = http.client.HTTPSConnection('www.yande.re')
conn.request('GET', 'https://yande.re/')
and...
import urllib.request
urllib.request.urlopen('https://yande.re/')
Note that in the first example you're asking the client to make a connection to the host www.yande.re, while in the second example urllib first parses the URL 'https://yande.re/' and then tries a request at the host yande.re.
Although www.yande.re and yande.re may resolve to the same IP address, from the perspective of the web server they are different virtual hosts. My guess is that you had an SNI configuration problem on the web server's side. Seeing as the original question was posted on May 21 and the current cert at yande.re starts May 28, I'm thinking you have already fixed this problem?
Try this:
import connection  # imports connection
import url

url = 'http://www.google.com/'
webpage = url.open(url)
try:
    connection.receive(webpage)
except:
    webpage = url.text('This webpage is not available!')
    connection.receive(webpage)

httplib python error

I'm trying to send an HTTP GET request via httplib, but I'm facing issues.
conn = httplib.HTTPConnection("10.30.111.13/View")
conn.request("GET", "/Default.aspx")
res = conn.getresponse()
if res.status == 200:
    print(res.status)
else:
    print("Something went terribly wrong")
I get the following error:
TypeError (cannot concatenate 'str' and 'int' objects).
If I put the following lines of code instead, it works with no problem:
conn = httplib.HTTPConnection("www.google.com")
conn.request("GET", "/")
EDIT: here is a more detailed log I managed to pull out of my third-party software (it restricts me in terms of Python usability):
File "<string>", line 3248, in initialization
File "C:\python22\lib\httplib.py", line 701, in request
self._send_request(method, url, body, headers)
File "C:\python22\lib\httplib.py", line 723, in _send_request
self.endheaders()
File "C:\python22\lib\httplib.py", line 695, in endheaders
self._send_output()
File "C:\python22\lib\httplib.py", line 581, in _send_output
self.send(msg)
File "C:\python22\lib\httplib.py", line 548, in send
self.connect()
File "C:\python22\lib\httplib.py", line 516, in connect
socket.SOCK_STREAM):
gaierror: (7, 'getaddrinfo failed')
I'm not someplace where I can test this now, but here's what I think:
You're passing only an IP address to a host field that's expecting a DNS address, not an IP address. That's why your second error listing says 'getaddrinfo' failed.
That said, I'm not sure how to use an IP address with httplib. Maybe try "http://10.30.111.13" instead. A good way to test it would be to replace your IP address above with Google's and see if you still get the error.
Maybe this will help -- sorry I can't say more!
I changed the IP address to a DNS name. I also removed the path/URI that was in the HTTPConnection() parameter. Now it works. Sorry for such an obvious question, guys.
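In other words, the host passed to HTTPConnection() must not carry a path; the path belongs in request(). The standard urlsplit function (urllib.parse in Python 3, urlparse in Python 2) can do the splitting, sketched here with the IP from the question:

```python
from urllib.parse import urlsplit

# Split a full URL into the host for HTTPConnection() and the path for request()
parts = urlsplit("http://10.30.111.13/View/Default.aspx")
print(parts.netloc)  # host to pass to HTTPConnection()
print(parts.path)    # path to pass to conn.request()
```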

python httplib Name or service not known

I'm trying to use httplib to send credit card information to authorize.net. When I try to post the request, I get the following traceback:
File "./lib/cgi_app.py", line 139, in run
res = method()
File "/var/www/html/index.py", line 113, in ProcessRegistration
conn.request("POST", "/gateway/transact.dll", mystring, headers)
File "/usr/local/lib/python2.7/httplib.py", line 946, in request
self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py", line 987, in _send_request
self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py", line 940, in endheaders
self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py", line 803, in _send_output
self.send(msg)
File "/usr/local/lib/python2.7/httplib.py", line 755, in send
self.connect()
File "/usr/local/lib/python2.7/httplib.py", line 1152, in connect
self.timeout, self.source_address)
File "/usr/local/lib/python2.7/socket.py", line 567, in create_connection
raise error, msg
gaierror: [Errno -2] Name or service not known
I build my request like so:
mystring = urllib.urlencode(cardHash)
headers = {"Content-Type": "text/xml", "Content-Length": str(len(mystring))}
conn = httplib.HTTPSConnection("secure.authorize.net:443", source_address=("myurl.com", 443))
conn.request("POST", "/gateway/transact.dll", mystring, headers)
To add another layer to this, it was working on our development server, which has Python 2.6's httplib, without the source_address parameter in httplib.HTTPSConnection.
Any help is greatly appreciated.
===========================================================
EDIT:
I can run it from the command line, so apparently this is some sort of permissions issue. Any ideas what permissions I would need to grant, and to which users, to make this happen? Possibly Apache can't open the port?
As an (obvious) heads-up, this same error can also be triggered by including the protocol in the host parameter. For example, this code:
conn = httplib.HTTPConnection("http://secure.authorize.net", 80, ....)
will also cause the "gaierror: [Errno -2] Name or service not known" error, even if all your networking setup is correct.
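On Python 3's http.client the same mistake fails fast at construction time, because the colon after the scheme makes the remainder of the host string parse as a port number:

```python
import http.client

try:
    http.client.HTTPConnection("http://secure.authorize.net")
except http.client.InvalidURL as exc:
    print(exc)  # nonnumeric port: '//secure.authorize.net'
```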
gaierror: [Errno -2] Name or service not known
This error often indicates a failure of your DNS resolver. Does ping secure.authorize.net return successful replies from the same server that receives the gaierror? Does the hostname have a typo in it?
The problem ultimately came down to the fact that SELinux was stopping Apache from using that port; disabling SELinux fixed the problem. I had an issue later where I didn't have /var/www/.python-eggs/, so MySQLdb was failing on import, but after a mkdir it was fixed.
Pass the port separately from the host:
conn = httplib.HTTPSConnection("secure.authorize.net", 443, ....)
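For what it's worth, the library also parses a numeric :port suffix in the host string itself, so both spellings end up with the same connection parameters; sketched with Python 3's http.client (nothing connects until a request is made):

```python
import http.client

# Same host and port either way; the constructor only stores them.
a = http.client.HTTPSConnection("secure.authorize.net", 443)
b = http.client.HTTPSConnection("secure.authorize.net:443")
print(a.host, a.port)
print(b.host, b.port)
```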
