Via Python's urllib2 I try to get data over HTTPS while I am behind a corporate NTLM proxy.
I run
proxy_url = 'http://user:pw@ntlmproxy:port/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('https://httpbin.org/ip')
myfile = f.read()
print myfile
but I get as error
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:507:
EOF occurred in violation of protocol>
How can I fix this error?
Note 0: With the same code I can retrieve the unsecured HTTP equivalent http://httpbin.org/ip.
Note 1: From a normal browser I can access https://httpbin.org/ip (and other HTTPS sites) via the same corporate proxy.
Note 2: I was reading about many similar issues on the net and some suggested that it might be related to certificate verification, but urllib2 does not verify certificates anyway.
Note 3: Some people suggested monkeypatching in similar situations, but I guess there is no way to monkeypatch _ssl.c.
The problem is that Python's standard HTTP libraries do not fully speak Microsoft's proprietary NTLM authentication protocol.
I solved this by setting up a local NTLM-capable proxy - ntlmaps did the trick for me (*) - which handles the authentication against the corporate proxy, and by pointing my Python code at this local proxy without any authentication credentials.
Additionally, I had to add a proxy handler for HTTPS to the Python code listed above. So I replaced the two lines
proxy_url = 'http://user:pw@ntlmproxy:port/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
with the two lines
proxy_url = 'http://localproxy:localport/'
proxy_url_https = 'https://localproxy:localport/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url_https})
Then the request works perfectly.
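Putting it together, the request code then looks roughly like this (a sketch; localproxy and localport are placeholders for wherever your local ntlmaps instance is listening):
import urllib2

# Point both HTTP and HTTPS traffic at the local NTLM-capable proxy;
# ntlmaps handles the authentication against the corporate proxy.
proxy_url = 'http://localproxy:localport/'
proxy_url_https = 'https://localproxy:localport/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url_https})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
urllib2.install_opener(opener)

f = urllib2.urlopen('https://httpbin.org/ip')
print f.read()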
(*) ntlmaps is a Python program. For reasons specific to my personal environment it was necessary for me that the proxy be a Python program.
I have configured my server to serve only HTTPS, using a self-signed certificate I created. I have a client that has to validate the server's certificate and then download a file from the server.
How do I implement the validation in the client? Is there any code example?
My question is similar to this one: How can the SSL client validate the server's certificate?
but despite the fine explanation there, I didn't find any help.
So far, in my code I create a directory and then I download the file with urllib2:
[...] # imports

def dir_creation(path):
    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise

def file_download(url):
    ver_file = urllib2.urlopen(url)
    data = ver_file.read()
    with open(local_filename, "wb") as code:
        code.write(data)

dir_creation(path)
file_download(url)
Rather than configuring your server to present a self-signed certificate, you should use a self-signed certificate as a certificate authority to sign the server certificate. (How to do this is beyond the scope of your question, but I'm sure you can find help on Stack Overflow or elsewhere.)
Now you must configure your client to trust your certificate authority. In python (2.7.9 or later), you can do this using the ssl module:
import ssl
... # create socket
ctx = ssl.create_default_context(cafile=path_to_ca_certificate)
sslsock = ctx.wrap_socket(sock)
You can then transmit and read data on the secure socket. See the ssl module documentation for more explanation.
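For example, a minimal end-to-end sketch might look like the following (host is a placeholder for your server's name, and path_to_ca_certificate points at the CA certificate that signed the server certificate, as above):
import socket
import ssl

host = 'myserver.example.com'  # placeholder server name
ctx = ssl.create_default_context(cafile=path_to_ca_certificate)
sock = socket.create_connection((host, 443))
# server_hostname enables hostname checking against the certificate
sslsock = ctx.wrap_socket(sock, server_hostname=host)
sslsock.sendall('GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host)
print sslsock.recv(4096)
sslsock.close()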
The urllib2 API is simpler:
import urllib2
resp = urllib2.urlopen(url, cafile=path_to_ca_certificate)
resp_body = resp.read()
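Applied to the download function from your question, this could look roughly as follows (a sketch; the extra parameters are just placeholders for your own local_filename and CA path):
import urllib2

def file_download(url, local_filename, path_to_ca_certificate):
    # urlopen verifies the server certificate against the given CA file and
    # raises an error (URLError / ssl.CertificateError) if validation fails
    ver_file = urllib2.urlopen(url, cafile=path_to_ca_certificate)
    data = ver_file.read()
    with open(local_filename, "wb") as out_file:
        out_file.write(data)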
If you wish to use Requests, according to the documentation you can supply a path to the CA certificate as the argument to the verify parameter:
resp = requests.get(url, verify=path_to_ca_certificate)
I am trying to fetch some URLs using the urllib2 library.
a = urllib2.urlopen("http://www.google.com")
ret = a.read()
The code above works fine and gives the expected result. But when I make the URL https, it gives a "network unreachable" error:
a = urllib2.urlopen("https://www.google.com")
urllib2.URLError: <urlopen error [Errno 101] Network is unreachable>
Is there any problem with SSL? My Python version is 2.6.5. I am also behind an academic proxy server; I have the proxy settings in my bash file. Anyway, since HTTP is working, the proxy shouldn't be the problem here.
Normally the issue in cases like this is that the proxy you are behind has an out-of-date or untrusted SSL certificate. urllib is fussier than most browsers when it comes to SSL, which is why you might be getting this error.
The http URL didn't give an error because the http_proxy variable was already set. Setting https_proxy makes the above error disappear.
export http_proxy="http://{proxy-address}"
Set the same thing for https_proxy:
export https_proxy="http://{proxy-address}"
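As a quick sanity check (a small sketch, not required), you can verify from Python that both variables are being picked up:
import urllib
# Should now list both an 'http' and an 'https' entry pointing at your proxy
print urllib.getproxies()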
I'm trying to access a website with python through tor, but I'm having problems. I started my attempts with this thread and the one referenced in it: How to make urllib2 requests through Tor in Python?
First I tried the original code snippet:
import urllib2
proxy_handler = urllib2.ProxyHandler({"tcp":"http://127.0.0.1:9050"})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
Then I tried the modified code posted in one of the answers, which people said worked for them. Unfortunately, the code works in that it downloads the page, but it doesn't work because my IP address is still the same:
proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print opener.open('http://www.google.com').read()
I have TOR set up in the standard configuration, per the Ubuntu and TOR sites' respective documentation, and nmap shows the TOR SOCKS proxy running on port 9050:
9050/tcp open tor-socks
However, my IP address isn't changed when I run either of the above scripts. Is Python not respecting the HTTP environment variables, or is there a code problem that I'm missing?
TOR provides a SOCKS proxy. Since urllib2 can only handle HTTP proxies, you'll have to use a SOCKS implementation.
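A minimal sketch using the third-party PySocks (socks) module, assuming Tor's SOCKS proxy is listening on 127.0.0.1:9050, would be something like:
import socks
import socket
import urllib2

# Route every new socket through Tor's SOCKS proxy
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

# The reported IP should now be a Tor exit node, not your own
print urllib2.urlopen("http://httpbin.org/ip").read()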
I am familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
Generally urllib works fine, the problem is dealing with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Extra info:
urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
I tried @Fenikso's answer but I'm getting this error now:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
Something more to try:
import urllib2
#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers={'User-agent' : 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
Edit 2014:
This seems to be a popular question / answer. However, today I would use the third-party requests module instead.
For one request just do:
import requests
r = requests.get("http://www.google.com",
                 proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests, use a Session object so you do not have to add the proxies parameter to all your requests:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
I recommend you just use the requests module.
It is much easier than the built-in HTTP clients:
http://docs.python-requests.org/en/latest/index.html
Sample usage:
r = requests.get('http://www.thepage.com', proxies={"http":"http://myproxy:3129"})
thedata = r.content
Just wanted to mention that you may also have to set the https_proxy OS environment variable in case HTTPS URLs need to be accessed.
In my case it was not obvious to me, and I tried for hours to discover this.
My use case: Win 7, jython-standalone-2.5.3.jar, setuptools installation via ez_setup.py
Python 3:
import urllib.request
htmlsource = urllib.request.FancyURLopener({"http":"http://127.0.0.1:8080"}).open(url).read().decode("utf-8")
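FancyURLopener is deprecated in recent Python 3 releases; a rough equivalent with the current API would be (a sketch, same placeholder proxy address):
import urllib.request

proxy_handler = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})
opener = urllib.request.build_opener(proxy_handler)
htmlsource = opener.open(url).read().decode("utf-8")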
I encountered this on a Jython client.
The server was only talking TLS, but the client was using an SSL context:
javax.net.ssl.SSLContext.getInstance("SSL")
Once the client was switched to TLS, things started working.
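In Jython that roughly corresponds to requesting a TLS context instead (a sketch, assuming the javax.net.ssl classes are available on the classpath):
from javax.net.ssl import SSLContext
# "TLS" instead of "SSL" so the handshake can negotiate a TLS protocol version
ctx = SSLContext.getInstance("TLS")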
I am trying to access a website from behind a corporate firewall using the code below:
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')
I am getting the error
URLError: <urlopen error [Errno 11004] getaddrinfo failed>
I have tried different handlers (I also tried ProxyHandler in a slightly different way), but it doesn't seem to work.
Any clues as to what could be the reason for the error, and any different ways to supply the credentials and make it work?
If you are using a proxy and that proxy has a username and password (which many corporate proxies have), you need to set up a proxy handler with urllib2.
proxy_url = 'http://' + proxy_user + ':' + proxy_password + '@' + proxy_ip
proxy_support = urllib2.ProxyHandler({"http":proxy_url})
opener = urllib2.build_opener(proxy_support,urllib2.HTTPHandler)
urllib2.install_opener(opener)
HTTPBasicAuthHandler is used to provide credentials for the site you are going to access, not for going through the proxy. The above snippet might help you.
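Alternatively, instead of embedding the credentials in the URL, you could try urllib2's ProxyBasicAuthHandler (a sketch; proxy_ip, proxy_user and proxy_password are placeholders, and it only helps if the proxy accepts Basic authentication rather than NTLM):
import urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://' + proxy_ip, proxy_user, proxy_password)

proxy_handler = urllib2.ProxyHandler({'http': 'http://' + proxy_ip})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)

opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
urllib2.install_opener(opener)
print urllib2.urlopen('http://python.org').read()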
On Windows, I observed that Python uses the settings from IE's Internet Options -> LAN Settings.
So even if we use urllib2 to install an opener and specify the proxy_url, it will continue to use the IE settings.
It finally worked fine when I exported a system variable:
http_proxy=http://userid:pswd@proxyurl.com:port
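The same effect can be had from inside the script by setting the variables before the first request (a sketch; userid, pswd, proxyurl.com and port are placeholders):
import os
os.environ['http_proxy'] = 'http://userid:pswd@proxyurl.com:port'
os.environ['https_proxy'] = 'http://userid:pswd@proxyurl.com:port'

import urllib2
print urllib2.urlopen('http://www.google.com').read()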