Proxies with Python 'Requests' module
Just a short, simple one about the excellent Requests module for Python.
I can't seem to find in the documentation what the variable 'proxies' should contain. When I sent it a dict with a standard "IP:PORT" value, it was rejected, asking for two values.
So I guess (because this doesn't seem to be covered in the docs) that the first value is the IP and the second the port?
The docs mention this only:
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
So I tried this... what should I be doing?
proxy = { ip: port}
and should I convert these to some type before putting them in the dict?
r = requests.get(url,headers=headers,proxies=proxy)
The proxies dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests over the http, https, and ftp protocols:
http_proxy = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy = "ftp://10.10.1.10:3128"
proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
r = requests.get(url, headers=headers, proxies=proxies)
Deduced from the requests documentation:
Parameters:
method – method for the new Request object.
url – URL for the new Request object.
...
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
...
On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables:
export HTTP_PROXY=10.10.1.10:3128
export HTTPS_PROXY=10.10.1.11:1080
export FTP_PROXY=10.10.1.10:3128
On Windows:
set http_proxy=10.10.1.10:3128
set https_proxy=10.10.1.11:1080
set ftp_proxy=10.10.1.10:3128
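Requests picks these environment variables up automatically (a Session's trust_env is True by default), so no proxies argument is needed once they are exported. A small sketch, assuming the variables above are already set:
import requests
# With HTTP_PROXY / HTTPS_PROXY set in the environment,
# requests routes the traffic through them automatically.
r = requests.get("http://example.org")
# To ignore the environment proxy settings for a particular session:
s = requests.Session()
s.trust_env = False  # don't read proxies (or .netrc) from the environment
r = s.get("http://example.org")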
You can refer to the Requests proxy documentation for more details.
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "https://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
To use HTTP Basic Auth with your proxy, use the http://user:password@host.com/ syntax:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"
}
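One caveat worth adding: if the username or password itself contains reserved characters such as @ or :, it has to be percent-encoded before it goes into the proxy URL. A minimal sketch with hypothetical credentials, using the standard library:
from urllib.parse import quote
user = quote("user@domain", safe="")      # hypothetical username containing '@'
password = quote("p@ss:word", safe="")    # hypothetical password containing '@' and ':'
proxies = {
    "http": "http://" + user + ":" + password + "@10.10.1.10:3128/"
}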
I have found that urllib has some really good code to pick up the system's proxy settings, and they happen to be in the correct form to use directly. You can use it like this:
import urllib.request
...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())
It works really well, and urllib knows how to get the Mac OS X and Windows settings as well.
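For example, a quick way to see what it picks up on your machine (the output depends entirely on your system settings):
import urllib.request
# returns a dict mapping scheme to proxy URL,
# e.g. {'http': 'http://10.10.1.10:3128', ...} on a configured machine
print(urllib.request.getproxies())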
The accepted answer was a good start for me, but I kept getting the following error:
AssertionError: Not supported proxy scheme None
The fix was to specify the http:// scheme in the proxy URL, like this:
http_proxy = "http://194.62.145.248:8080"
https_proxy = "https://194.62.145.248:8080"
ftp_proxy = "10.10.1.10:3128"
proxyDict = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
I'd be interested as to why the original works for some people but not me.
Edit: I see the main answer is now updated to reflect this :)
If you'd like to persist cookies and session data, you'd best do it like this:
import requests
proxies = {
    'http': 'http://user:pass@10.10.1.0:3128',
    'https': 'https://user:pass@10.10.1.0:3128',
}
# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies
# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
Eight years late, but I like this approach:
import os
import requests
os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['NO_PROXY'] = os.environ['no_proxy'] = '127.0.0.1,localhost,.local'
r = requests.get('https://example.com') # , verify=False
The documentation gives a very clear example of proxies usage:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
What isn't documented, however, is that you can even configure proxies for individual URLs, even if the scheme is the same!
This comes in handy when you want to use different proxies for different websites you wish to scrape.
proxies = {
    'http://example.org': 'http://10.10.1.10:3128',
    'http://something.test': 'http://10.10.1.10:1080',
}
requests.get('http://something.test/some/url', proxies=proxies)
Additionally, requests.get essentially uses a requests.Session under the hood, so if you need more control, use it directly:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')
I use it to set a fallback (a default proxy) that handles all traffic that doesn't match the schemes/URLs specified in the dictionary:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.setdefault('http', 'http://127.0.0.1:9009')
session.proxies.update(proxies)
session.get('http://example.org')
I just made a proxy grabber that can also connect with the same grabbed proxy, without any input.
Here it is:
# Import modules
from termcolor import colored
from selenium import webdriver
import requests
import os
import time

# Proxy grab: scrape the IP:PORT list from sslproxies.org with headless Chrome
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)
driver.get("https://www.sslproxies.org/")
tbody = driver.find_element_by_tag_name("tbody")
cell = tbody.find_elements_by_tag_name("tr")
for column in cell:
    column = column.text.split(" ")
    print(colored(column[0] + ":" + column[1], 'yellow'))
driver.quit()
print("")
os.system('clear')
os.system('cls')

# Proxy connection: reuse the last grabbed proxy for the request
print(colored('Getting Proxies from grabber...', 'green'))
time.sleep(2)
os.system('clear')
os.system('cls')
proxy = {"http": "http://" + column[0] + ":" + column[1]}
url = 'https://mobile.facebook.com/login'
r = requests.get(url, proxies=proxy)
print("")
print(colored('Connecting using proxy', 'green'))
print("")
sts = r.status_code
Here is my basic class in Python for the requests module, with some proxy configs and a stopwatch!
import requests
import time

class BaseCheck():
    def __init__(self, url):
        # Proxy configuration (replace user:pw@proxy:8080 with your own)
        self.http_proxy = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy = "http://user:pw@proxy:8080"
        self.proxyDict = {
            "http": self.http_proxy,
            "https": self.https_proxy,
            "ftp": self.ftp_proxy
        }
        self.url = url

        # Stopwatch helpers
        def makearr(tsteps):
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = {'start': 0, 'end': 0}
            steps = tsteps
        makearr(['init', 'check'])

        def starttime(typ=""):
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()

    def __str__(self):
        return str(self.url)

    def getrequests(self):
        g = requests.get(self.url, proxies=self.proxyDict)
        print(g.status_code)
        print(g.content)
        print(self.url)
        stemps['init']['end'] = time.time()
        x = stemps['init']['end'] - stemps['init']['start']
        print(x)

test = BaseCheck(url='http://google.com')
test.getrequests()
It’s a bit late, but here is a wrapper class that simplifies scraping proxies and then making an HTTP POST or GET:
ProxyRequests
https://github.com/rootVIII/proxy_requests
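A rough usage sketch, assuming the API shown in the project's README (the class name and methods below come from there, so double-check against the repo):
from proxy_requests import ProxyRequests

r = ProxyRequests("https://api.ipify.org")  # URL to fetch through a scraped proxy
r.get()    # the wrapper picks a free proxy and performs the GET
print(r)   # response body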
Already tested; the following code works. You need to use HTTPProxyAuth.
import requests
from requests.auth import HTTPProxyAuth

USE_PROXY = True
proxy_user = "aaa"
proxy_password = "bbb"
http_proxy = "http://your_proxy_server:8080"
https_proxy = "http://your_proxy_server:8080"
proxies = {
    "http": http_proxy,
    "https": https_proxy
}

def test(name):
    print(f'Hi, {name}')
    # Create the session and set the proxies.
    session = requests.Session()
    if USE_PROXY:
        session.trust_env = False
        session.proxies = proxies
        session.auth = HTTPProxyAuth(proxy_user, proxy_password)
    r = session.get('https://www.stackoverflow.com')
    print(r.status_code)

if __name__ == '__main__':
    test('aaa')
I'm sharing some code that fetches proxies from the site "https://free-proxy-list.net" and stores the data in a file compatible with tools like "Elite Proxy Switcher" (format IP:PORT):
## PROXY_UPDATER - get free proxies from https://free-proxy-list.net/

from lxml.html import fromstring
import requests
from itertools import cycle
import traceback
import re

###################### FIND PROXIES ######################
def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url)
    parser = fromstring(response.text)
    proxies = set()
    for i in parser.xpath('//tbody/tr')[:299]:  # 299 proxies max
        proxy = ":".join([i.xpath('.//td[1]/text()')[0],
                          i.xpath('.//td[2]/text()')[0]])
        proxies.add(proxy)
    return proxies

###################### write to file in format IP:PORT ######################
try:
    proxies = get_proxies()
    f = open('proxy_list.txt', 'w')
    for proxy in proxies:
        f.write(proxy + '\n')
    f.close()
    print("DONE")
except:
    print("MAJOR ERROR")
Related
How do I make proxies work with requests?
When I try to run my code, I get an error and I can't understand why. Help!
import requests
import json

proxies = {
    "https": "189.113.217.35:49733",
    "http": "5.252.161.48:8080"
}

r = requests.get("https://groups.roblox.com/v1/groups/1", proxies=proxies)
j = r.json()
print(j)
I figured it out: the IP address didn't have access to the proxies.
It's pretty simple. I would create a session:
session = requests.Session()
then a proxies dict:
proxies = {
    'http': 'http://5.252.161.48:8080',
    'https': 'http://5.252.161.48:8080'
}
and inject the proxies into the session:
session.proxies.update(proxies)
How to apply proxy with authentication using http.client for API
I'm attempting to use Bing's Entity Search API, but I need to go through a proxy in order to establish a connection. My company has given me a proxy in the following format to use: 'http://username:password@webproxy.subdomain.website.com:8080'. I've tried using HTTPConnection.set_tunnel(host, port=None, headers=None) to no avail. Does anyone have any idea how to run the following Python query through the proxy I was provided?
import http.client, urllib.parse
import json

proxy = 'http://username:pwd@webproxy.subdomain.website.com:8080'
subscriptionKey = 'MY_KEY_HERE'
host = 'api.cognitive.microsoft.com'
path = '/bing/v7.0/entities'
mkt = 'en-US'
query = 'italian restaurants near me'
params = '?mkt=' + mkt + '&q=' + urllib.parse.quote(query)

def get_suggestions():
    headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", path + params, None, headers)
    response = conn.getresponse()
    return response.read()

result = get_suggestions()
print(json.dumps(json.loads(result), indent=4))
As a side note, I was able to run the following successfully with the proxy:
nltk.set_proxy('http://username:pwd@webproxy.subdomain.website.com:8080')
nltk.download('wordnet')
I ended up using the requests package; it makes it very simple to channel your request through a proxy. Sample code below for reference:
import requests
import urllib.parse

proxies = {
    'http': "http://username:pwd@webproxy.subdomain.website.com:8080",
    'https': "https://username:pwd@webproxy.subdomain.website.com:8080"
}

url = 'https://api.cognitive.microsoft.com/bing/v7.0/entities/'
# query string parameters (query and subscriptionKey as defined in the question)
params = 'mkt=' + 'en-US' + '&q=' + urllib.parse.quote(query)
# custom headers
headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}
# start session
session = requests.Session()
# persist proxy across requests
session.proxies = proxies
# make GET request
r = session.get(url, params=params, headers=headers)
If you use urllib2 or requests, the easiest way to configure a proxy is to set the HTTP_PROXY or HTTPS_PROXY environment variable:
export HTTPS_PROXY=https://user:pass@host:port
This way the library handles the proxy configuration for you, and if your code runs inside a container you can pass that information at runtime.
Using MechanicalSoup behind proxy
I am trying to build a simple webbot in Python, on Windows, using MechanicalSoup. Unfortunately, I am sitting behind a (company-enforced) proxy. I could not find a way to provide a proxy to MechanicalSoup. Is there such an option at all? If not, what are my alternatives?
EDIT: Following Eytan's hint, I added proxies and verify to my code, which got me a step further, but I still cannot submit a form:
import mechanicalsoup

proxies = {
    'https': 'my.https.proxy:8080',
    'http': 'my.http.proxy:8080'
}

url = 'https://stackoverflow.com/'

browser = mechanicalsoup.StatefulBrowser()
front_page = browser.open(url, proxies=proxies, verify=False)
form = browser.select_form('form[action="/search"]')
form.print_summary()
form["q"] = "MechanicalSoup"
form.print_summary()
browser.submit(form, url=url)
The code hangs in the last line, and submit doesn't accept proxies as an argument.
It seems that proxies have to be specified at the session level. Then they are not required in browser.open, and submitting the form also works:
import mechanicalsoup

proxies = {
    'https': 'my.https.proxy:8080',
    'http': 'my.http.proxy:8080'
}

url = 'https://stackoverflow.com/'

browser = mechanicalsoup.StatefulBrowser()
browser.session.proxies = proxies  # THIS IS THE SOLUTION!
front_page = browser.open(url, verify=False)
form = browser.select_form('form[action="/search"]')
form["q"] = "MechanicalSoup"
result = browser.submit(form, url=url)
result.status_code returns 200 (i.e. "OK").
According to their docs, this should work:
browser.get(url, proxies=proxy)
Try passing the 'proxies' argument to your requests.
python httplib: connect through proxy with authentication
I am trying to send a GET request through a proxy with authentication. I have the following existing code:
import httplib

username = 'myname'
password = '1234'
proxyserver = "136.137.138.139"
url = "http://google.com"

c = httplib.HTTPConnection(proxyserver, 83, timeout=30)
c.connect()
c.request("GET", url)
resp = c.getresponse()
data = resp.read()
print data
When running this code, I get an answer from the proxy saying that I must provide authentication, which is correct. In my code, I don't use the login and password. My problem is that I don't know how to use them! Any idea?
You can refer to this code if you specifically want to use httplib: https://gist.github.com/beugley/13dd4cba88a19169bcb0
But you could also use the easier requests module:
import requests

proxies = {
    "http": "http://username:password@proxyserver:port/",
    # "https": "https://username:password@proxyserver:port/",
}

url = 'http://google.com'
data = requests.get(url, proxies=proxies)
How to pass proxy-authentication (requires digest auth) by using python requests module
I was using the Mechanize module a while ago, and now I am trying to use the Requests module. (Python mechanize doesn't work when HTTPS and proxy authentication are required.) I have to go through a proxy server when I access the Internet. The proxy server requires authentication. I wrote the following code:
import requests
from requests.auth import HTTPProxyAuth

proxies = {"http": "192.168.20.130:8080"}
auth = HTTPProxyAuth("username", "password")

r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
The above code works well when the proxy server requires basic authentication. Now I want to know what I have to do when the proxy server requires digest authentication. HTTPProxyAuth seems not to be effective for digest authentication (r.status_code returns 407).
No need to implement your own! In most cases, Requests has built-in support for proxies. For basic authentication:
proxies = {'https': 'https://user:password@proxyip:port'}

r = requests.get('https://url', proxies=proxies)
See more in the docs.
Or, in case you need digest authentication, HTTPDigestAuth may help. Or you might need to try to extend it like yutaka2487 did below.
Note: must use the IP of the proxy server, not its name!
I wrote a class that can be used for proxy authentication (based on digest auth). I borrowed almost all the code from requests.auth.HTTPDigestAuth.
import requests
import requests.auth

class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
    def handle_407(self, r):
        """Takes the given response and tries digest-auth, if needed."""
        num_407_calls = r.request.hooks['response'].count(self.handle_407)
        s_auth = r.headers.get('Proxy-authenticate', '')
        if 'digest' in s_auth.lower() and num_407_calls < 2:
            self.chal = requests.auth.parse_dict_header(s_auth.replace('Digest ', ''))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.raw.release_conn()

            r.request.headers['Authorization'] = self.build_digest_header(r.request.method, r.request.url)
            r.request.send(anyway=True)
            _r = r.request.response
            _r.history.append(r)
            return _r
        return r

    def __call__(self, r):
        if self.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
        r.register_hook('response', self.handle_407)
        return r
Usage:
proxies = {
    "http": "192.168.20.130:8080",
    "https": "192.168.20.130:8080",
}
auth = HTTPProxyDigestAuth("username", "password")

# HTTP
r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code  # 200 OK

# HTTPS
r = requests.get("https://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code  # 200 OK
I've written a Python module (available here) which makes it possible to authenticate with an HTTP proxy using the digest scheme. It works when connecting to HTTPS websites (through monkey patching) and allows authenticating with the website as well. This should work with the latest requests library for both Python 2 and 3.
The following example fetches the webpage https://httpbin.org/ip through HTTP proxy 1.2.3.4:8080, which requires HTTP digest authentication using user name user1 and password password1:
import requests
from requests_digest_proxy import HTTPProxyDigestAuth

s = requests.Session()
s.proxies = {
    'http': 'http://1.2.3.4:8080/',
    'https': 'http://1.2.3.4:8080/'
}
s.auth = HTTPProxyDigestAuth('user1', 'password1')

print(s.get('https://httpbin.org/ip').text)
Should the website require some kind of HTTP authentication, this can be specified to the HTTPProxyDigestAuth constructor this way:
# HTTP Basic authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
                             auth=requests.auth.HTTPBasicAuth('user1', 'password0'))
print(s.get('https://httpbin.org/basic-auth/user1/password0').text)

# HTTP Digest authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
                             auth=requests.auth.HTTPDigestAuth('user1', 'password0'))
print(s.get('https://httpbin.org/digest-auth/auth/user1/password0').text)
This snippet works for both types of requests (http and https). Tested on the current version of requests (2.23.0).
import re
import requests
from requests.utils import get_auth_from_url
from requests.auth import HTTPDigestAuth
from requests.utils import parse_dict_header
from urllib3.util import parse_url

def get_proxy_autorization_header(proxy, method):
    username, password = get_auth_from_url(proxy)
    auth = HTTPProxyDigestAuth(username, password)
    proxy_url = parse_url(proxy)
    proxy_response = requests.request(method, proxy_url, auth=auth)
    return proxy_response.request.headers['Proxy-Authorization']

class HTTPSAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        headers = {}
        proxy_auth_header = get_proxy_autorization_header(proxy, 'CONNECT')
        headers['Proxy-Authorization'] = proxy_auth_header
        return headers

class HTTPAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        return {}

    def add_headers(self, request, **kwargs):
        proxy = kwargs['proxies'].get('http', '')
        if proxy:
            proxy_auth_header = get_proxy_autorization_header(proxy, request.method)
            request.headers['Proxy-Authorization'] = proxy_auth_header

class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        if not hasattr(self._thread_local, 'init'):
            self._thread_local.init = True
            self._thread_local.last_nonce = ''
            self._thread_local.nonce_count = 0
            self._thread_local.chal = {}
            self._thread_local.pos = None
            self._thread_local.num_407_calls = None

    def handle_407(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.
        :rtype: requests.Response
        """
        # If response is not 407, do not auth
        if r.status_code != 407:
            self._thread_local.num_407_calls = 1
            return r

        s_auth = r.headers.get('proxy-authenticate', '')

        if 'digest' in s_auth.lower() and self._thread_local.num_407_calls < 2:
            self._thread_local.num_407_calls += 1
            pat = re.compile(r'digest ', flags=re.IGNORECASE)
            self._thread_local.chal = requests.utils.parse_dict_header(
                pat.sub('', s_auth, count=1))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.close()
            prep = r.request.copy()
            requests.cookies.extract_cookies_to_jar(prep._cookies, r.request, r.raw)
            prep.prepare_cookies(prep._cookies)
            prep.headers['Proxy-Authorization'] = self.build_digest_header(prep.method, prep.url)
            _r = r.connection.send(prep, **kwargs)
            _r.history.append(r)
            _r.request = prep
            return _r

        self._thread_local.num_407_calls = 1
        return r

    def __call__(self, r):
        # Initialize per-thread state, if needed
        self.init_per_thread_state()
        # If we have a saved nonce, skip the 407
        if self._thread_local.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
        r.register_hook('response', self.handle_407)
        self._thread_local.num_407_calls = 1
        return r

session = requests.Session()
session.proxies = {
    'http': 'http://username:password@proxyhost:proxyport',
    'https': 'http://username:password@proxyhost:proxyport'
}
session.trust_env = False
session.mount('http://', HTTPAdapterWithProxyDigestAuth())
session.mount('https://', HTTPSAdapterWithProxyDigestAuth())

response_http = session.get("http://ww3.safestyle-windows.co.uk/the-secret-door/")
print(response_http.status_code)

response_https = session.get("https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests")
print(response_https.status_code)
Generally, the problem of proxy authorization is also relevant for other types of authentication (NTLM, Kerberos) when connecting using the HTTPS protocol. And despite the large number of issues (since 2013, and maybe there are earlier ones that I did not find):
in requests: Digest Proxy Auth, NTLM Proxy Auth, Kerberos Proxy Auth
in urllib3: NTLM Proxy Auth, NTLM Proxy Auth
and many, many others, the problem is still not resolved.
The root of the problem is in the function _tunnel of the module httplib (Python 2) / http.client (Python 3). In case of an unsuccessful connection attempt, it raises an OSError without returning a response code (407 in our case) and the additional data needed to build the authorization header. Lukasa gave an explanation here.
As long as there is no solution from the maintainers of urllib3 (or requests), we can only use various workarounds (for example, use the approach of @Tey' or do something like this). In my version of the workaround, we pre-prepare the necessary authorization data by sending a request to the proxy server and processing the received response.
You can use digest authentication by using requests.auth.HTTPDigestAuth instead of requests.auth.HTTPProxyAuth.
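A minimal sketch of what that suggestion looks like as code; note that whether the proxy accepts it depends on the proxy, since HTTPDigestAuth answers the server's 401 challenge rather than the proxy's 407:
import requests
from requests.auth import HTTPDigestAuth

proxies = {
    "http": "http://192.168.20.130:8080",
    "https": "http://192.168.20.130:8080",
}
auth = HTTPDigestAuth("username", "password")

r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
print(r.status_code)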
For those of you that still end up here, there appears to be a project called requests-toolbelt that has this plus other common but not built-in functionality of requests. https://toolbelt.readthedocs.org/en/latest/authentication.html#httpproxydigestauth
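A short sketch, assuming the import path from the toolbelt documentation linked above:
import requests
from requests_toolbelt.auth.http_proxy_digest import HTTPProxyDigestAuth

proxies = {
    "http": "http://192.168.20.130:8080",
    "https": "http://192.168.20.130:8080",
}
auth = HTTPProxyDigestAuth("username", "password")

r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
print(r.status_code)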
This works for me. Actually, I don't know about the security of user:password in this solution:
import requests
import os

http_proxyf = 'http://user:password@proxyip:port'
os.environ["http_proxy"] = http_proxyf
os.environ["https_proxy"] = http_proxyf

sess = requests.Session()
# maybe need sess.trust_env = True
print(sess.get('https://some.org').text)
import requests
import os

# in my case I had to add my local domain
proxies = {
    'http': 'proxy.myagency.com:8080',
    'https': 'user@localdomain:password@proxy.myagency.com:8080',
}

r = requests.get('https://api.github.com/events', proxies=proxies)
print(r.text)
Here is an answer that is not for HTTP Basic Authentication, for example for a transparent proxy within an organization:
import requests

url = 'https://someaddress-behindproxy.com'
params = {'apikey': '123456789'}  # if you need params
proxies = {'https': 'https://proxyaddress.com:3128'}  # or some other port

response = requests.get(url, proxies=proxies, params=params)
I hope this helps someone.