urllib2 connection timed out error - python

I am trying to open a page using urllib2 but I keep getting connection timed out errors.
The line I am using is:
f = urllib2.urlopen(url)
The exact error is:
URLError: <urlopen error [Errno 110] Connection timed out>

urllib2 sends a default User-Agent (Python-urllib/x.y), and many sites block that.
Try sending a different User-Agent by creating a Request object and passing it to urlopen:
import urllib2
request = urllib2.Request('http://www.example.com/')
request.add_header('User-agent', 'Mozilla/5.0 (Linux i686)')
response = urllib2.urlopen(request)
Several detailed walk-throughs are available, such as http://www.doughellmann.com/PyMOTW/urllib2/

As a general strategy, open Wireshark and watch the traffic generated by urllib2.urlopen(url). You may be able to see where the error is coming from.
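If installing Wireshark is not an option, a lighter-weight sketch is to let urllib2 echo the HTTP exchange itself by building an opener with debuglevel=1; note this only shows anything once a TCP connection is established, so for a hard connect timeout Wireshark remains the better tool.
import urllib2

# Handlers built with debuglevel=1 print request and response headers to stdout.
# HTTPSHandler is only available when Python is built with SSL support.
opener = urllib2.build_opener(
    urllib2.HTTPHandler(debuglevel=1),
    urllib2.HTTPSHandler(debuglevel=1),
)
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.example.com/')
print f.getcode()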

Related

urllib2.urlopen timeout works only when connected to Internet

I'm working on Python 2.7 code that reads a value from an HTML page using the urllib2 library. I want the urllib2.urlopen call to time out after 5 seconds when there is no Internet and jump to the remaining code.
It works as expected when the computer is connected to a working Internet connection, and for testing, if I set timeout=0.1 it times out immediately without opening the URL, as expected. But when there is no Internet, the timeout does not work whether I set it to 0.1, 5, or any other value. It simply does not time out.
This is my code:
import urllib2
url = "https://alfahd.witorbit.net/fingerprint.php?n"
try:
    response = urllib2.urlopen(url, timeout=5).read()
    print response
except Exception as e:
    print e
Result when connected to the Internet with timeout value 5:
180
Result when connected to the Internet with timeout value 0.1:
<urlopen error timed out>
So the timeout seems to be working.
Result when NOT connected to the Internet, with any timeout value (it times out after about 40 seconds every time I open the URL, regardless of the value I set for timeout):
<urlopen error [Errno -3] Temporary failure in name resolution>
How can I make urllib2.urlopen time out when there is no Internet connectivity? Am I missing something? Please guide me to solve this issue. Thanks!
Because name resolution happens before the request is made, it's not subject to the timeout. You can prevent this error in name resolution by providing the IP for the host in your /etc/hosts file. For example, if the host is subdomain.example.com and the IP is 10.10.10.10, you would add the following line to the /etc/hosts file:
10.10.10.10 subdomain.example.com
Alternatively, you may be able to use the IP address directly; however, some web servers require the hostname, in which case you'll need to modify the hosts file to use the name offline (a sketch of the direct-IP approach follows).
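A minimal sketch of the direct-IP variant, assuming the hypothetical address 10.10.10.10 from above and plain HTTP (with HTTPS, certificate checks and SNI get in the way on newer Pythons): request the IP but keep the real hostname in the Host header, so no DNS lookup is needed.
import urllib2

ip = "10.10.10.10"                   # hypothetical address, resolved ahead of time
host = "subdomain.example.com"       # the name the server expects

request = urllib2.Request("http://%s/" % ip)
request.add_header("Host", host)     # urllib2 will not override a Host header you set
try:
    print urllib2.urlopen(request, timeout=5).read()
except urllib2.URLError as e:
    print e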

Python requests.get raises ConnectionError for HTTPS url

I'm trying this simple Python 2.7 code:
import requests
response = requests.get(url="https://sslbl.abuse.ch", verify=False)
print response
I'm using verify=False to skip SSL certificate verification.
I'm getting the following exception:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='sslbl.abuse.ch', port=443): Max retries exceeded with url: / (Caused by <class 'socket.error'>: [Errno 10054] An existing connection was forcibly closed by the remote host
If I try another https URL (like twitter.com), everything is OK.
What could be the problem? How can I get the response like a browser does?
UPDATE:
After upgrading the requests version I get the same ConnectionError, but some warnings were added:
C:\Python27\lib\site-packages\requests\packages\urllib3\util\ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
C:\Python27\lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
I do not use Python 2.7 for my tasks, but I tried to open the URL you provided with Python 3.2 (it should work for all Python 3.x, I assume). No exception was raised. This is what I did (the >>> prompts are omitted):
from urllib.request import urlopen
url = "https://sslbl.abuse.ch"
response = urlopen(url)
type(response)
<class 'http.client.HTTPResponse'>
Based on the example in the Python docs, see the output of this:
i = 0
with urlopen(url) as response:
    for line in response:
        line = line.decode('utf-8')
        if "Show more information about this SSL certificate" in line:
            i += 1
print(i)
1060
I suggest using Python 3.x. Hope this helps!
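As an aside on the warnings in the update: they typically mean the Python 2.7 build in use cannot send SNI, which can break some HTTPS sites while others still work; the urllib3 security page linked in the warnings suggests installing pyOpenSSL, ndg-httpsclient and pyasn1 to add it. A quick, hedged check of whether SNI is available at all (runs on both Python 2 and 3):
import ssl
print(getattr(ssl, "HAS_SNI", False))   # False (or missing) on builds that emit SNIMissingWarning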

HTTPS request via urllib2 fails behind NTLM proxy

Using Python's urllib2, I am trying to get data over HTTPS while behind a corporate NTLM proxy.
I run
proxy_url = 'http://user:pw@ntlmproxy:port/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('https://httpbin.org/ip')
myfile = f.read()
print myfile
but I get this error:
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:507:
EOF occurred in violation of protocol>
How can I fix this error?
Note 0: With the same code I can retrieve the unsecured HTTP equivalent http://httpbin.org/ip.
Note 1: From a normal browser I can access https://httpbin.org/ip (and other HTTPS sites) via the same corporate proxy.
Note 2: I was reading about many similar issues on the net and some suggested that it might be related to certificate verification, but urllib2 does not verify certificates anyway.
Note 3: Some people suggested monkeypatching in similar situations, but I guess there is no way to monkeypatch _ssl.c.
The problem is that Python's standard HTTP libraries do not fully speak Microsoft's proprietary NTLM authentication protocol.
I solved this by setting up a local NTLM-capable proxy (ntlmaps did the trick for me (*)), which handles the authentication against the corporate proxy, and pointing my Python code at this local proxy without any authentication credentials.
Additionally, I had to add a proxy handler entry for HTTPS to the code listed above. So I replaced the two lines
proxy_url = 'http://user:pw@ntlmproxy:port/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
with the two lines
proxy_url = 'http://localproxy:localport/'
proxy_url_https = 'https://localproxy:localport/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url_https})
Then the request works perfectly.
(*) ntlmaps is a Python program. For reasons specific to my personal environment, it was necessary for me that the proxy be a Python program.
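Put together, the resulting code looks roughly like this (localproxy and localport are placeholders for wherever the local NTLM proxy listens):
import urllib2

proxy_url = 'http://localproxy:localport/'            # placeholder: local ntlmaps address
proxy_url_https = 'https://localproxy:localport/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url_https})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
urllib2.install_opener(opener)

f = urllib2.urlopen('https://httpbin.org/ip')
print f.read()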

Python urllib2 giving "network unreachable error" if the URL is https

I am trying to fetch some URLs using the urllib2 library.
a = urllib2.urlopen("http://www.google.com")
ret = a.read()
The code above works fine and gives the expected result. But when I make the URL https, it gives a "network unreachable" error:
a = urllib2.urlopen("https://www.google.com")
urllib2.URLError: <urlopen error [Errno 101] Network is unreachable>
Is there any problem with SSL? My Python version is 2.6.5. I am also behind an academic proxy server; I have the settings in my bash file. Anyway, since http URLs open fine, the proxy shouldn't be the problem here.
Normally the issue in cases like this is that the proxy you are behind has an out-of-date or untrusted SSL certificate. urllib is fussier than most browsers when it comes to SSL, and this is why you might be getting this error.
The http URL didn't give an error because the http_proxy variable was already set. Setting https_proxy makes the error above disappear.
export http_proxy="http://{proxy-address}"
Set the same thing for https_proxy:
export https_proxy="http://{proxy-address}"
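To confirm what Python will actually pick up from the environment (a quick sketch; urllib2's default ProxyHandler uses the same lookup), check getproxies():
import urllib
print(urllib.getproxies())   # should list both an 'http' and an 'https' entry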

Using an HTTP PROXY - Python [duplicate]

I am familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
Generally urllib works fine; the problem is with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Extra info:
urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
I tried @Fenikso's answer but I'm getting this error now:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
Something more to try:
import urllib2
#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers={'User-agent' : 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
Edit 2014:
This seems to be a popular question and answer. However, today I would use the third-party requests module instead.
For one request just do:
import requests
r = requests.get("http://www.google.com",
proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests use a Session object so you do not have to add the proxies parameter to every request:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
I recommend you just use the requests module.
It is much easier than the built-in HTTP clients:
http://docs.python-requests.org/en/latest/index.html
Sample usage:
r = requests.get('http://www.thepage.com', proxies={"http":"http://myproxy:3129"})
thedata = r.content
Just wanted to mention that you may also have to set the https_proxy OS environment variable if https URLs need to be accessed.
In my case it was not obvious, and I spent hours discovering this.
My use case: Win 7, jython-standalone-2.5.3.jar, setuptools installation via ez_setup.py
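If editing the OS environment is awkward, a hedged alternative sketch is to set the variables from the script itself before the first request; urllib2's default opener reads them (via getproxies()) when the first request is made. The proxy address below is a placeholder:
import os
os.environ["http_proxy"] = "http://myproxy:3129"      # hypothetical proxy address
os.environ["https_proxy"] = "http://myproxy:3129"

import urllib2
print urllib2.urlopen("https://www.example.com").read()[:100]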
Python 3:
import urllib.request

url = "http://www.google.com"   # example URL; substitute your own
opener = urllib.request.FancyURLopener({"http": "http://127.0.0.1:8080"})
htmlsource = opener.open(url).read().decode("utf-8")
I encountered this on a Jython client. The server was only speaking TLS, while the client was requesting an SSL context:
javax.net.ssl.SSLContext.getInstance("SSL")
Once the client was switched to TLS (SSLContext.getInstance("TLS")), things started working.
