Using an HTTP PROXY - Python [duplicate] - python

This question already has answers here:
Proxy with urllib2
(7 answers)
Closed 7 years ago.
I familiar with the fact that I should set the HTTP_RPOXY environment variable to the proxy address.
Generally urllib works fine, the problem is dealing with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Extra info:
urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
I tried #Fenikso answer but I'm getting this error now:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?

You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
Something more to try:
import urllib2
#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers={'User-agent' : 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
Edit 2014:
This seems to be a popular question / answer. However today I would use third party requests module instead.
For one request just do:
import requests
r = requests.get("http://www.google.com",
proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests use Session object so you do not have to add proxies parameter in all your requests:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)

I recommend you just use the requests module.
It is much easier than the built in http clients:
http://docs.python-requests.org/en/latest/index.html
Sample usage:
r = requests.get('http://www.thepage.com', proxies={"http":"http://myproxy:3129"})
thedata = r.content

Just wanted to mention, that you also may have to set the https_proxy OS environment variable in case https URLs need to be accessed.
In my case it was not obvious to me and I tried for hours to discover this.
My use case: Win 7, jython-standalone-2.5.3.jar, setuptools installation via ez_setup.py

Python 3:
import urllib.request
htmlsource = urllib.request.FancyURLopener({"http":"http://127.0.0.1:8080"}).open(url).read().decode("utf-8")

I encountered this on jython client.
The server was only talking TLS and the client using SSL context.
javax.net.ssl.SSLContext.getInstance("SSL")
Once the client was to TLS, things started working.

Related

SSL Error CERTIFICATE_VERIFY_FAILED with requests BUT NOT with urllib.request

If I try to use requests.get() to connect a HTTPS server (a Jenkins) I got SSL error CERTIFICATE_VERIFY_FAILED certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))
HTTPS connection are working fine if I use curl or any browser.
The HTTPS server is an internal server but use a SSL cert from DigiCert. It is a wildcard certificate and the same certificate is used for a lot of other servers (like IIS server) in my company, which are working fine together with requests.
If I use urllib package the HTTPS connection will be also fine.
I don't understand why requests doesn't work and I ask what can I do that requests is working?
And no! verify=false is not the solution ;-)
For the SSLContext in the second function I have to call method load_default_certs()
My system: Windows 10, Python 3.10, requests 2.28.1, urllib3 1.26.10, certifi 2022.6.15. Packages are installed today.
url = 'https://redmercury.acme.org/'
def use_requests(url):
import requests
try:
r = requests.get(url)
print(r)
except Exception as e:
print(e)
def use_magic_code_from_stackoverflow(url):
import urllib
import ssl
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
# ssl_context.verify_mode = ssl.CERT_REQUIRED
# ssl_context.check_hostname = True
ssl_context.load_default_certs() # WITHOUT I got SSL error(s)
# previous context
https_handler = urllib.request.HTTPSHandler(context=ssl_context)
opener = urllib.request.build_opener(https_handler)
ret = opener.open(url, timeout=2)
print(ret.status)
def use_urllib_requests(url):
import urllib.request
with urllib.request.urlopen(url) as response:
print(response.status)
use_requests(url) # SSL error
use_magic_code_from_stackoverflow(url) # server answers with 200
use_urllib_requests(url) # server answers with 200

Unable to make python requests over tor ConnectionRefusedError: [WinError 10061]

I am trying to make requests using python requests over tor, but i get the error "ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it".
Here is the code i am using:
import requests
def get_tor_session():
session = requests.session()
# Tor uses the 9050 port as the default socks port
session.proxies = {'http': 'socks5://127.0.0.1:9050',
'https': 'socks5://127.0.0.1:9050'}
return session
# Make a request through the Tor connection
# IP visible through Tor
session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
# Above should print an IP different than your public IP
# Following prints your normal public IP
print(requests.get("http://httpbin.org/ip").text)
I have tried disabling the firewall etc but can't seems to understand the problem, any help would be appreciated. I am using python 3.7 windows 10.
Thanks for all the help, yes as already mentioned in the comments, there was something wrong with the proxy.
I changed:
session.proxies = {'http': 'socks5://127.0.0.1:9050',
'https': 'socks5://127.0.0.1:9050'}
To:
session.proxies = {'http': 'socks5://127.0.0.1:9150',
'https': 'socks5://127.0.0.1:9150'}
90 to 91 in address and it worked !

HTTPS request via urllib2 fails behind NTLM proxy

Via Python's urllib2 I try to get data over HTTPS while I am behind a corporate NTLM proxy.
I run
proxy_url = ('http://user:pw#ntlmproxy:port/')
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
urllib2.install_opener(opener)
f = urllib2.urlopen('https://httpbin.org/ip')
myfile = f.read()
print myfile
but I get as error
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:507:
EOF occurred in violation of protocol>
How can I fix this error?
Note 0: With the same code I can retrieve the unsecured HTTP equivalent http://httpbin.org/ip.
Note 1: From a normal browser I can access https://httpbin.org/ip (and other HTTPS sites) via the same corporate proxy.
Note 2: I was reading about many similar issues on the net and some suggested that it might be related to certificate verification, but urllib2 does not verify certificates anyway.
Note 3: Some people suggested in similar situations monekypatching, but I guess, there is no way to monkeypatch _ssl.c.
The problem is that Python's standard HTTP libraries do not speak Microsoft's proprietary NTLM authentication protocol fully.
I solved this problem by setting up a local NTLM-capable proxy - ntlmaps did the trick for me.(*) - which providesthe authentication against the corporate proxy and point my python code to this local proxy without authentication credentials.
Additionally I had to add in the above listed python code a proxy_handler for HTTPS. So I replaced the two lines
proxy_url = 'http://user:pw#ntlmproxy:port/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
with the two lines
proxy_url = 'http://localproxy:localport/'
proxy_url_https = 'https://localproxy:localport/'
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url_https})
Then the request works perfectly.
(*) ntlmaps is a Python program. Due to some reasons in my personal environment it was for me necessary that the proxy is a python program.)

opening websites using urllib2 from behind corporate firewall - 11004 getaddrinfo failed

I am trying to access a website from behind corporate firewall using below:-
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')
Getting error
URLError: <urlopen error [Errno 11004] getaddrinfo failed>
I have tried with different handlers (tried ProxyHandler also in slightly different way), but doesn't seem to work.
Any clues to what could be the reason for error and any different ways to supply the credentials and make it work?
If you are using Proxy and that proxy has Username and Password (which many corporate proxies have), you need to set the proxy handler with urllib2.
proxy_url = 'http://' + proxy_user + ':' + proxy_password + '#' + proxy_ip
proxy_support = urllib2.ProxyHandler({"http":proxy_url})
opener = urllib2.build_opener(proxy_support,urllib2.HTTPHandler)
urllib2.install_opener(opener)
HTTPBasicAuthHandler is used to provide credentials for the site which you are going to access and not for going through the proxy. The above snippet might help you.
On Windows, I observed that python uses the IE Internet Options-> LAN Settings settings.
So even if we use urllib2 to install opener and specify the proxy_url, it would continue to use the IE settings.
It worked fine finally, when I exported a system variable:
http_proxy=http://userid:pswd#proxyurl.com:port

urllib2 connection timed out error

I am trying to open a page using urllib2 but i keep getting connection timed out errors.
The line which i am using is:
f = urllib2.urlopen(url)
exact error is:
URLError: <urlopen error [Errno 110] Connection timed out>
urllib2 respects robots.txt. Many sites block the default User-Agent.
Try adding a new User-Agent, by creating Request objects & using them as arguments for urlopen:
import urllib2
request = urllib2.Request('http://www.example.com/')
request.add_header('User-agent', 'Mozilla/5.0 (Linux i686)')
response = urllib2.urlopen(request)
Several detailed walk-throughs are available, such as http://www.doughellmann.com/PyMOTW/urllib2/
As a general strategy, open wireshark and watch the traffic generated by urllib2.urlopen(url). You may be able to see where the error is coming from.

Categories

Resources