I'm attempting to use Bing's Entity Search API; however, I need to go through a proxy in order to establish a connection.
My company has given me a proxy to use in the following format: 'http://username:password@webproxy.subdomain.website.com:8080'
I've tried using HTTPConnection.set_tunnel(host, port=None, headers=None) to no avail. Does anyone have any idea how to run the Python query below using the proxy I was provided?
import http.client, urllib.parse
import json

proxy = 'http://username:pwd@webproxy.subdomain.website.com:8080'
subscriptionKey = 'MY_KEY_HERE'
host = 'api.cognitive.microsoft.com'
path = '/bing/v7.0/entities'
mkt = 'en-US'
query = 'italian restaurants near me'
params = '?mkt=' + mkt + '&q=' + urllib.parse.quote(query)

def get_suggestions():
    headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", path + params, None, headers)
    response = conn.getresponse()
    return response.read()

result = get_suggestions()
print(json.dumps(json.loads(result), indent=4))
As a side note, I was able to run the following successfully with the proxy:
nltk.set_proxy('http://username:pwd@webproxy.subdomain.website.com:8080')
nltk.download('wordnet')
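For completeness, if you want to stay with http.client, one option that may work is to open the connection to the proxy itself and tunnel to the API host with set_tunnel, passing the credentials as a basic Proxy-Authorization header. This is only a sketch reusing the names from the snippet above (host, path, params, subscriptionKey); whether the company proxy accepts basic auth is an assumption:

import base64
import http.client
import json

proxy_host = 'webproxy.subdomain.website.com'
proxy_port = 8080
credentials = base64.b64encode(b'username:pwd').decode()  # assumes the proxy accepts basic auth

# Connect to the proxy first, then tunnel (CONNECT) to the API host on port 443.
conn = http.client.HTTPSConnection(proxy_host, proxy_port)
conn.set_tunnel(host, 443, headers={'Proxy-Authorization': 'Basic ' + credentials})
conn.request("GET", path + params, None, {'Ocp-Apim-Subscription-Key': subscriptionKey})
print(json.dumps(json.loads(conn.getresponse().read()), indent=4))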
I ended up using the requests package; it makes it very simple to channel your request through a proxy. Sample code below for reference:
import requests
import urllib.parse

proxies = {
    'http': 'http://username:pwd@webproxy.subdomain.website.com:8080',
    'https': 'https://username:pwd@webproxy.subdomain.website.com:8080'
}

url = 'https://api.cognitive.microsoft.com/bing/v7.0/entities/'

# query string parameters
params = 'mkt=' + 'en-US' + '&q=' + urllib.parse.quote(query)

# custom headers
headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}

# start session
session = requests.Session()

# persist proxy across requests
session.proxies = proxies

# make GET request
r = session.get(url, params=params, headers=headers)
If you use urllib2 or requests, the easiest way to configure a proxy is to set the HTTP_PROXY or HTTPS_PROXY environment variable:
export HTTPS_PROXY=https://user:pass@host:port
This way the library handles the proxy configuration for you, and if your code runs inside a container you can pass that information at runtime.
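For example, with requests a minimal sketch of relying on those variables could look like this (the proxy address and test URL are placeholders):

import os
import requests

# requests reads HTTP_PROXY/HTTPS_PROXY from the environment by default (trust_env=True)
os.environ['HTTPS_PROXY'] = 'http://user:pass@host:port'  # normally exported in the shell or container
print(requests.get('https://httpbin.org/ip').text)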
I'm building a small script to test certain proxies against the API.
It seems that the actual request isn't routed through the provided proxy. For example, the following request is considered valid and I get a response from the API.
import requests
r = requests.post("https://someapi.com", data=request_data,
                  proxies={"http": "http://999.999.999.999:1212"}, timeout=5)
print(r.text)
How come I get a response even though the proxy provided was invalid?
Your URL is https:// but your proxies dict only has an 'http' entry, so requests never routes the request through the proxy at all. You can define the proxies for every scheme like this:
import requests

pxy = "http://999.999.999.999:1212"
proxyDict = {
    'http': pxy,
    'https': pxy,
    'ftp': pxy,
    'SOCKS4': pxy
}

r = requests.post("https://someapi.com", data=request_data,
                  proxies=proxyDict, timeout=5)
print(r.text)
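As a rough, untested illustration of why the 'https' entry matters: with only an 'http' key, an https:// URL bypasses the proxy entirely and goes direct, which is why the original request still succeeded; once the 'https' key is present, requests actually tries the (invalid) proxy and the failure surfaces. request_data is replaced with an empty placeholder here:

import requests

request_data = {}  # placeholder payload
proxies = {
    'http': 'http://999.999.999.999:1212',
    'https': 'http://999.999.999.999:1212',
}

try:
    r = requests.post("https://someapi.com", data=request_data, proxies=proxies, timeout=5)
    print(r.text)
except requests.exceptions.RequestException as exc:
    print("Proxy could not be reached:", exc)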
I am trying to build a simple webbot in Python, on Windows, using MechanicalSoup. Unfortunately, I am sitting behind a (company-enforced) proxy. I could not find a way to provide a proxy to MechanicalSoup. Is there such an option at all? If not, what are my alternatives?
EDIT: Following Eytan's hint, I added proxies and verify to my code, which got me a step further, but I still cannot submit a form:
import mechanicalsoup
proxies = {
    'https': 'my.https.proxy:8080',
    'http': 'my.http.proxy:8080'
}
url = 'https://stackoverflow.com/'
browser = mechanicalsoup.StatefulBrowser()
front_page = browser.open(url, proxies=proxies, verify=False)
form = browser.select_form('form[action="/search"]')
form.print_summary()
form["q"] = "MechanicalSoup"
form.print_summary()
browser.submit(form, url=url)
The code hangs on the last line, and submit doesn't accept proxies as an argument.
It seems that proxies have to be specified at the session level. Then they are not required in browser.open, and submitting the form also works:
import mechanicalsoup
proxies = {
    'https': 'my.https.proxy:8080',
    'http': 'my.http.proxy:8080'
}
url = 'https://stackoverflow.com/'
browser = mechanicalsoup.StatefulBrowser()
browser.session.proxies = proxies # THIS IS THE SOLUTION!
front_page = browser.open(url, verify=False)
form = browser.select_form('form[action="/search"]')
form["q"] = "MechanicalSoup"
result = browser.submit(form, url=url)
result.status_code
returns 200 (i.e. "OK").
According to their docs, this should work:
browser.get(url, proxies=proxy)
Try passing the 'proxies' argument to your requests.
Trying to send a POST request with the cookies on my PC from a GET request.
#! /usr/bin/python
import re  # regex
import urllib
import urllib2

# GET request
x = urllib2.urlopen("http://www.example.com")  # GET Request
cookies = x.headers['set-cookie']  # to get the cookies from the GET request

url = 'http://example'  # to know the values type any password to know the cookies
values = {"username": "admin",
          "passwd": password,
          "lang": "",
          "option": "com_login",
          "task": "login",
          "return": "aW5kZXgucGhw"}

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
result = response.read()
cookies = response.headers['set-cookie']  # to get the last cookies from the POST request in this variable
Then I searched Google for how to send cookies inside the same POST request and found:
opener = urllib2.build_opener() # send the cookies
opener.addheaders.append(('Cookie', cookies)) # send the cookies
f = opener.open("http://example")
but I don't know exactly where I should put it in my code.
What I need to do exactly is: send a GET request, put the cookies from that request in a variable, then make a POST request with the cookies I got from the GET request.
If anyone knows the answer, I need an edit to my code.
Just create an HTTP opener with a cookie jar handler. Cookies will then be retrieved and passed along to the next request automatically. See:
import urllib2 as net
import cookielib
import urllib

cookiejar = cookielib.CookieJar()
cookiejar.clear_session_cookies()
opener = net.build_opener(net.HTTPCookieProcessor(cookiejar))

data = urllib.urlencode(values)
request = net.Request(url, data)  # pass the already-encoded data, do not encode it twice
response = opener.open(request)
Since the opener keeps the cookie jar, the cookies set by a previous response are sent automatically with the next request (POST or GET).
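A minimal sketch of that flow, reusing the names from the snippets above (url and values come from the question, so treat this as illustrative rather than tested):

# GET first: any Set-Cookie headers land in cookiejar automatically
opener.open("http://www.example.com")

# POST next: the same opener sends those cookies back with the request
response = opener.open(net.Request(url, urllib.urlencode(values)))
print(response.read())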
You should really look into the requests library Python has to offer. All you need to do is make a dictionary for your cookies' key/value pairs and pass it in as an argument.
Your entire code could be replaced by:
import requests

url = 'http://example'  # to know the values type any password to know the cookies
values = {"username": "admin",
          "passwd": password,
          "lang": "",
          "option": "com_login",
          "task": "login",
          "return": "aW5kZXgucGhw"}

session = requests.Session()
response = session.get(url, data=values)
cookies = session.cookies.get_dict()
response = requests.post(url, data=values, cookies=cookies)
The second piece of code is probably what you want, but it depends on the format of the response.
I was using the Mechanize module a while ago, and now I'm trying to use the Requests module.
(Python mechanize doesn't work when HTTPS and Proxy Authentication required)
I have to go through proxy-server when I access the Internet.
The proxy-server requires authentication. I wrote the following codes.
import requests
from requests.auth import HTTPProxyAuth
proxies = {"http":"192.168.20.130:8080"}
auth = HTTPProxyAuth("username", "password")
r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
The above code works well when the proxy-server requires basic authentication.
Now I want to know what I have to do when the proxy-server requires digest authentication.
HTTPProxyAuth seems not to be effective in digest authentication (r.status_code returns 407).
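As an aside, a quick way to see which scheme the proxy demands is to inspect the Proxy-Authenticate header on the 407 response; this sketch just reuses the proxy address from the snippet above:

import requests

proxies = {"http": "http://192.168.20.130:8080"}
r = requests.get("http://www.google.co.jp/", proxies=proxies)
if r.status_code == 407:
    # e.g. 'Digest realm="...", nonce="..."' for a digest-only proxy
    print(r.headers.get("Proxy-Authenticate"))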
No need to implement your own, in most cases!
Requests has built-in support for proxies with basic authentication:
proxies = { 'https' : 'https://user:password@proxyip:port' }
r = requests.get('https://url', proxies=proxies)
See more in the docs.
Or, in case you need digest authentication, HTTPDigestAuth may help.
Or you might need to extend it like yutaka2487 did below.
Note: you must use the IP of the proxy server, not its name!
I wrote a class that can be used for proxy authentication (based on digest auth).
I borrowed almost all the code from requests.auth.HTTPDigestAuth.
import requests
import requests.auth

class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
    def handle_407(self, r):
        """Takes the given response and tries digest-auth, if needed."""
        num_407_calls = r.request.hooks['response'].count(self.handle_407)
        s_auth = r.headers.get('Proxy-authenticate', '')
        if 'digest' in s_auth.lower() and num_407_calls < 2:
            self.chal = requests.auth.parse_dict_header(s_auth.replace('Digest ', ''))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.raw.release_conn()

            r.request.headers['Authorization'] = self.build_digest_header(r.request.method, r.request.url)
            r.request.send(anyway=True)
            _r = r.request.response
            _r.history.append(r)
            return _r

        return r

    def __call__(self, r):
        if self.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
        r.register_hook('response', self.handle_407)
        return r
Usage:
proxies = {
    "http": "192.168.20.130:8080",
    "https": "192.168.20.130:8080",
}
auth = HTTPProxyDigestAuth("username", "password")
# HTTP
r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code # 200 OK
# HTTPS
r = requests.get("https://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code # 200 OK
I've written a Python module (available here) which makes it possible to authenticate with an HTTP proxy using the digest scheme. It works when connecting to HTTPS websites (through monkey patching) and allows authenticating with the website as well. This should work with the latest requests library for both Python 2 and 3.
The following example fetches the webpage https://httpbin.org/ip through the HTTP proxy 1.2.3.4:8080, which requires HTTP digest authentication with user name user1 and password password1:
import requests
from requests_digest_proxy import HTTPProxyDigestAuth
s = requests.Session()
s.proxies = {
    'http': 'http://1.2.3.4:8080/',
    'https': 'http://1.2.3.4:8080/'
}
s.auth = HTTPProxyDigestAuth('user1', 'password1')
print(s.get('https://httpbin.org/ip').text)
Should the website require some kind of HTTP authentication, it can be specified in the HTTPProxyDigestAuth constructor this way:
# HTTP Basic authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
                             auth=requests.auth.HTTPBasicAuth('user1', 'password0'))
print(s.get('https://httpbin.org/basic-auth/user1/password0').text)

# HTTP Digest authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
                             auth=requests.auth.HTTPDigestAuth('user1', 'password0'))
print(s.get('https://httpbin.org/digest-auth/auth/user1/password0').text)
This snippet works for both types of requests (http and https). Tested on the current version of requests (2.23.0).
import re
import requests
from requests.utils import get_auth_from_url
from requests.auth import HTTPDigestAuth
from requests.utils import parse_dict_header
from urllib3.util import parse_url


def get_proxy_autorization_header(proxy, method):
    username, password = get_auth_from_url(proxy)
    auth = HTTPProxyDigestAuth(username, password)
    proxy_url = parse_url(proxy)
    proxy_response = requests.request(method, proxy_url, auth=auth)
    return proxy_response.request.headers['Proxy-Authorization']


class HTTPSAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        headers = {}
        proxy_auth_header = get_proxy_autorization_header(proxy, 'CONNECT')
        headers['Proxy-Authorization'] = proxy_auth_header
        return headers


class HTTPAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        return {}

    def add_headers(self, request, **kwargs):
        proxy = kwargs['proxies'].get('http', '')
        if proxy:
            proxy_auth_header = get_proxy_autorization_header(proxy, request.method)
            request.headers['Proxy-Authorization'] = proxy_auth_header


class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        if not hasattr(self._thread_local, 'init'):
            self._thread_local.init = True
            self._thread_local.last_nonce = ''
            self._thread_local.nonce_count = 0
            self._thread_local.chal = {}
            self._thread_local.pos = None
            self._thread_local.num_407_calls = None

    def handle_407(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.
        :rtype: requests.Response
        """
        # If response is not 407, do not auth
        if r.status_code != 407:
            self._thread_local.num_407_calls = 1
            return r

        s_auth = r.headers.get('proxy-authenticate', '')
        if 'digest' in s_auth.lower() and self._thread_local.num_407_calls < 2:
            self._thread_local.num_407_calls += 1
            pat = re.compile(r'digest ', flags=re.IGNORECASE)
            self._thread_local.chal = requests.utils.parse_dict_header(
                pat.sub('', s_auth, count=1))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.close()
            prep = r.request.copy()
            requests.cookies.extract_cookies_to_jar(prep._cookies, r.request, r.raw)
            prep.prepare_cookies(prep._cookies)

            prep.headers['Proxy-Authorization'] = self.build_digest_header(prep.method, prep.url)
            _r = r.connection.send(prep, **kwargs)
            _r.history.append(r)
            _r.request = prep
            return _r

        self._thread_local.num_407_calls = 1
        return r

    def __call__(self, r):
        # Initialize per-thread state, if needed
        self.init_per_thread_state()
        # If we have a saved nonce, skip the 407
        if self._thread_local.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
        r.register_hook('response', self.handle_407)
        self._thread_local.num_407_calls = 1
        return r
session = requests.Session()
session.proxies = {
    'http': 'http://username:password@proxyhost:proxyport',
    'https': 'http://username:password@proxyhost:proxyport'
}
session.trust_env = False
session.mount('http://', HTTPAdapterWithProxyDigestAuth())
session.mount('https://', HTTPSAdapterWithProxyDigestAuth())

response_http = session.get("http://ww3.safestyle-windows.co.uk/the-secret-door/")
print(response_http.status_code)

response_https = session.get("https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests")
print(response_https.status_code)
Generally, the problem of proxy authorization is also relevant for other types of authentication (NTLM, Kerberos) when connecting over HTTPS. And despite the large number of issues (since 2013, and maybe there are earlier ones that I did not find):
in requests: Digest Proxy Auth, NTLM Proxy Auth, Kerberos Proxy Auth
in urllib3: NTLM Proxy Auth, NTLM Proxy Auth
and many, many others, the problem is still not resolved.
The root of the problem is in the _tunnel function of the httplib (Python 2) / http.client (Python 3) module. In case of an unsuccessful connection attempt, it raises an OSError without returning the response code (407 in our case) and the additional data needed to build the authorization header. Lukasa gave an explanation here.
As long as there is no solution from the maintainers of urllib3 (or requests), we can only use various workarounds (for example, use the approach of @Tey' or do something like this). In my version of the workaround, we pre-prepare the necessary authorization data by sending a request to the proxy server and processing the received response.
You can use digest authentication by using requests.auth.HTTPDigestAuth instead of requests.auth.HTTPProxyAuth
For those of you that still end up here, there appears to be a project called requests-toolbelt that has this, plus other common but not built-in functionality of requests.
https://toolbelt.readthedocs.org/en/latest/authentication.html#httpproxydigestauth
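Going by the documentation linked above, a minimal sketch of wiring up the toolbelt class might look like this (the import path, proxy address and credentials are assumptions taken from the toolbelt docs, not verified here):

import requests
from requests_toolbelt.auth.http_proxy_digest import HTTPProxyDigestAuth

proxies = {
    "http": "http://192.168.20.130:8080",
    "https": "http://192.168.20.130:8080",
}
auth = HTTPProxyDigestAuth("username", "password")
r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
print(r.status_code)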
This works for me. Actually, I don't know about the security of user:password in this solution:
import requests
import os
http_proxyf = 'http://user:password@proxyip:port'
os.environ["http_proxy"] = http_proxyf
os.environ["https_proxy"] = http_proxyf
sess = requests.Session()
# maybe need sess.trust_env = True
print(sess.get('https://some.org').text)
import requests
import os

# in my case I had to add my local domain
proxies = {
    'http': 'proxy.myagency.com:8080',
    'https': 'user@localdomain:password@proxy.myagency.com:8080',
}

r = requests.get('https://api.github.com/events', proxies=proxies)
print(r.text)
Here is an answer that is not for HTTP Basic Authentication, for example a transparent proxy within an organization.
import requests

url = 'https://someaddress-behindproxy.com'
params = {'apikey': '123456789'}  # if you need params
proxies = {'https': 'https://proxyaddress.com:3128'}  # or some other port
response = requests.get(url, proxies=proxies, params=params)
I hope this helps someone.