I open URLs with:
site = urllib2.urlopen('http://google.com')
What I want to do is connect the same way but through a proxy.
I found a snippet somewhere that said:
site = urllib2.urlopen('http://google.com', proxies={'http':'127.0.0.1'})
but that didn't work either.
I know urllib2 has something like a proxy handler, but I can't recall that function.
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
You have to install a ProxyHandler:
urllib2.install_opener(
    urllib2.build_opener(
        urllib2.ProxyHandler({'http': '127.0.0.1'})
    )
)
urllib2.urlopen('http://www.google.com')
You can set proxies using environment variables.
import os
os.environ['http_proxy'] = '127.0.0.1'
os.environ['https_proxy'] = '127.0.0.1'
urllib2 will add the proxy handlers automatically this way. Note that you have to set proxies for the different protocols separately; otherwise requests for the other protocol will not go through the proxy, see below.
For example:
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
# the next (https) request will not go through the proxy
urllib2.urlopen('https://www.google.com')
Instead
proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way both http and https requests go through the proxy
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
To use the default system proxies (e.g. from the http_proxy environment variable), the following works for the current request (without installing it into urllib2 globally):
url = 'http://www.example.com/'
proxy = urllib2.ProxyHandler()
opener = urllib2.build_opener(proxy)
in_ = opener.open(url)
in_.read()
In addition to the accepted answer:
My script gave me an error:
File "c:\Python23\lib\urllib2.py", line 580, in proxy_open
if '#' in host:
TypeError: iterable argument required
The solution was to add http:// in front of the proxy string:
proxy = urllib2.ProxyHandler({'http': 'http://proxy.xy.z:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
One can also use requests to access a web page through a proxy. Python 3 code:
>>> import requests
>>> url = 'http://www.google.com'
>>> proxy = '169.50.87.252:80'
>>> requests.get(url, proxies={"http":proxy})
<Response [200]>
Proxies for more than one scheme can also be added (a Session-based sketch follows after this example).
>>> proxy1 = '169.50.87.252:80'
>>> proxy2 = '89.34.97.132:8080'
>>> requests.get(url, proxies={"http": proxy1, "https": proxy2})
<Response [200]>
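If every request in a script should go through the same proxies, a requests.Session can hold the settings once; a minimal sketch, with placeholder proxy addresses:
import requests
# Placeholder proxy addresses; replace with real ones.
session = requests.Session()
session.proxies.update({
    "http": "http://169.50.87.252:80",
    "https": "http://89.34.97.132:8080",
})
# Every request made through this session now uses the proxies above.
r = session.get('http://www.google.com')
print(r.status_code)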
In addition, you can set the proxy for the command-line session (on Windows):
Open a command prompt where you want to run your script,
netsh winhttp set proxy YourProxySERVER:yourProxyPORT
then run your script in that terminal (a quick Python check is sketched below).
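Note that netsh winhttp configures the WinHTTP proxy, which Python's urllib2/requests do not necessarily read on their own; a quick way to confirm the script really goes through the proxy is to compare the outbound IP an echo service reports. A rough sketch (the proxy address is a placeholder):
import os
import requests
# Placeholder proxy; replace with your actual proxy server and port.
os.environ['http_proxy'] = 'http://YourProxySERVER:yourProxyPORT'
os.environ['https_proxy'] = 'http://YourProxySERVER:yourProxyPORT'
# api.ipify.org echoes the address it sees; if the proxy is in use,
# this prints the proxy's public IP rather than your own.
print(requests.get('https://api.ipify.org').text)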
import requests
import socket
from unittest.mock import patch
orig_getaddrinfo = socket.getaddrinfo
def getaddrinfoIPv6(host, port, family=0, type=0, proto=0, flags=0):
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET6, type=type, proto=proto, flags=flags)

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv6):
    r = requests.get('http://icanhazip.com')
    print(r.text)
Instead of using an IPv4 proxy to connect to a website, I would like to connect using an IPv6 https proxy. I have scoured Google for answers and have not found any (that I understand)... The closest I have found is the code above (it does not use the IPv6 proxy; instead it uses my own IPv6 address). I am open to using something besides requests to do this; however, requests is preferred. I will be attempting to thread later on.
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
proxy = {"http":"http://username:password#[2604:0180:2:3b5:9ebc:64e9:166c:d9f9]", "https":"https://username:password#[2604:0180:2:3b5:9ebc:64e9:166c:d9f9]"}
url = "https://icanhazip.com"
r = requests.get(url, proxies=proxy, verify=False)
print(r.content)
If the code above does not work, try this instead:
import requests
proxy = {"http": "http://userame:password#168.235.109.30:18117", "https":"https://userame:password#168.235.109.30:18117"}
url = "https://icanhazip.com"
r = requests.get(url, proxies=proxy)
print(r.content)
This is my current provider for my IPv6 https proxy; however, they route IPv6 over IPv4 to their clients, which is why this code works and the code above does not (if you are using the same provider). If you are using a provider that supports IPv6 all by itself, then the code at the top should work for you.
You can use https://proxyturk.net/
Example curl command (a Python requests equivalent is sketched after the result):
curl -m 90 -x http://proxyUsername:proxyPassword@93.104.200.99:20000 http://api6.ipify.org
You will see a result like:
2a13:c206:2021:1522:9c5a:3ed5:156b:c1d0
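The same request can be made from Python with requests; a minimal sketch, reusing the placeholder credentials and endpoint from the curl example:
import requests
# Placeholder credentials and proxy address taken from the curl command above.
proxies = {
    "http": "http://proxyUsername:proxyPassword@93.104.200.99:20000",
    "https": "http://proxyUsername:proxyPassword@93.104.200.99:20000",
}
# api6.ipify.org returns the IPv6 address the request appears to come from.
r = requests.get('http://api6.ipify.org', proxies=proxies, timeout=90)
print(r.text)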
Here is my test code:
import requests
url = 'https://api.ipify.org/'
proxyapi = 'http://ip.11jsq.com/index.php/api/entry?method=proxyServer.generate_api_url&packid=1&fa=0&fetch_key=&qty=1&time=1&pro=&city=&port=1&format=txt&ss=1&css=&dt=1'
proxy = {'http' : 'http://{}'.format(requests.get(proxyapi).text)}
print ('Downloading with fresh proxy.', proxy)
resp = requests.get(url, proxies=proxy)
print ('Fresh proxy response status.', resp.status_code)
print (resp.text)
#terminal output
Downloading with fresh proxy. {'http': 'http://49.84.152.176:30311'}
Fresh proxy response status. 200
222.68.154.34  # my public IP address
There is no error message, and it seems that the requests lib never applies the proxy settings. The proxyapi is valid: I've checked the proxy in my web browser, and visiting https://api.ipify.org/ through it returns the desired proxy server's IP address.
I am using python 3.6.4 and requests 2.18.4.
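One thing worth checking: requests only routes a request through a proxy whose key in the proxies dict matches the URL scheme, and the dict above only has an 'http' entry while the target URL is https. A small sketch, assuming a proxy address in the same style as the output above, that sets both keys:
import requests
url = 'https://api.ipify.org/'
# Example address in the style of the output above; replace with a fresh one.
proxy_addr = 'http://49.84.152.176:30311'
proxies = {'http': proxy_addr, 'https': proxy_addr}
resp = requests.get(url, proxies=proxies)
print(resp.status_code)
print(resp.text)  # should now show the proxy's IP, not your own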
I'm using a proxy (behind a corporate firewall) to log in to an https domain. The SSL handshake doesn't seem to be going well:
CertificateError: hostname 'ats.finra.org:443' doesn't match 'ats.finra.org'
I'm using Python 2.7.9 with Mechanize, and I've gotten past all of the login, password, and security question screens, but it is getting hung up on the certificate verification.
Any help would be amazing. I've tried the workaround found here: Forcing Mechanize to use SSLv3
It doesn't work for my code, though.
If you want the code file I'd be happy to send.
You can avoid this error by monkey patching ssl:
import ssl
ssl.match_hostname = lambda cert, hostname: True
In my case the certificate's DNS name was ::1 (for local testing purposes) and hostname verification failed with
ssl.CertificateError: hostname '::1' doesn't match '::1'
To fix it somewhat properly I monkey patched ssl.match_hostname with
import ssl
ssl.match_hostname = lambda cert, hostname: hostname == cert['subjectAltName'][0][1]
This actually checks whether the hostnames match.
This bug in ssl.match_hostname appears in v2.7.9 (it's not in 2.7.5), and has to do with not stripping the port out of the hostname:port syntax. The following rewrite of ssl.match_hostname fixes the bug. Put this before your mechanize code:
import functools, re, urlparse
import ssl
old_match_hostname = ssl.match_hostname
@functools.wraps(old_match_hostname)
def match_hostname_bugfix_ssl_py_2_7_9(cert, hostname):
    m = re.search(r':\d+$', hostname)  # hostname:port
    if m is not None:
        o = urlparse.urlparse('https://' + hostname)
        hostname = o.hostname
    old_match_hostname(cert, hostname)
ssl.match_hostname = match_hostname_bugfix_ssl_py_2_7_9
The following mechanize code should now work:
import mechanize
import cookielib
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-Agent', 'Nutscrape 1.0')]
# Use this proxy
br.set_proxies({"http": "localhost:3128", "https": "localhost:3128"})
r = br.open('https://www.duckduckgo.com:443/')
html = br.response().read()
# Examine the html response from a browser
f = open('foo.html','w')
f.write(html)
f.close()
I am struggling with connections to https sites and files. We have NTLM network proxy authentication. HTTP connections work like a charm, but https gets stuck with this error:
pycurl.error: (27, "SSL: couldn't create a context: error:140A90A1:lib(20):func(169):reason(161)")
I tried setting SSL_VERIFYPEER to 0, but it doesn't work; the same with conn.setopt(pycurl.SSL_CIPHER_LIST, 'rsa_rc4_128_sha'). I want to download https://nbp.pl/kursy/xml/LastA.xml. Any clue?
The code:
conn=pycurl.Curl()
conn.setopt(pycurl.URL, url)
conn.setopt(pycurl.PROXY, proxy)
conn.setopt(pycurl.PROXYPORT,8080)
conn.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_NTLM)
conn.setopt(pycurl.PROXYUSERPWD, user)
conn.setopt(pycurl.WRITEFUNCTION, open(r'xml\\'+name+'.'+extension,'w+').write)
conn.perform()
conn.close()
I got around it by using CNTLM (a local proxy that handles the NTLM authentication) as a bypass.
The code:
proxy = urllib2.ProxyHandler({'https':'127.0.0.1:3128'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
u = urllib2.urlopen(url)
data = u.read()
fil = open(r'xml\\' + name + '.' + extension, 'w+')
fil.write(data)
fil.close()
Is there a way to tunnel HTTPS requests made with Python's httplib through Fiddler's proxy, without turning Fiddler into a reverse proxy?
I tried using this example (from here):
import httplib
c = httplib.HTTPSConnection('localhost',8118)
c.set_tunnel('twitter.com',443)
c.request('GET','/')
r1 = c.getresponse()
print r1.status,r1.reason
data1 = r1.read()
print len(data1)
but it returns:
<HTML><HEAD><META http-equiv="Content-Type" content="text/html; charset=UTF-8"><TITLE>Fiddler Echo Service</TITLE></HEAD><BODY STYLE="font-family: arial,sans-serif;"><H1>Fiddler Echo Service</H1><BR /><PRE>GET / HTTP/1.1
Host: localhost:8080
Accept-Encoding: identity
</PRE><BR><HR><UL><LI>To configure Fiddler as a reverse proxy instead of seeing this page, see FiddlerRoot certificate</UL></BODY></HTML>
EDIT:
This code works. The problem was related to the proxy settings in Internet Explorer. DO NOT set a proxy in those settings, otherwise the code above won't work.
I've tried it with urllib2 (see Proxy with urllib2) through a Burp proxy and it worked:
import urllib2
proxy = urllib2.ProxyHandler({'https': '127.0.0.1:8081'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
print urllib2.urlopen('https://www.google.com').read()
Your code seems to work just fine with a Burp Suite proxy.