What is the proper way to use proxies with requests in Python?

Requests is not honoring the proxies argument.
I am missing something about making a request through a proxy with the Python requests library.
If I enable the OS system proxy, it works. But if I make the request using only the proxies setting of the requests module, the remote machine does not see the proxy set in requests and sees my real IP instead, as if no proxy were set.
The example below shows this effect. At the time of this post the proxy below is alive, but any working proxy should reproduce the effect.
import requests
proxy = {
    'http:': 'https://143.208.200.26:7878',
    'https:': 'http://143.208.200.26:7878'
}
data = requests.get(url='http://ip-api.com/json', proxies=proxy).json()
print('Ip: %s\nCity: %s\nCountry: %s' % (data['query'], data['city'], data['country']))
I also tried changing the proxy_dict format:
proxy = {
    'http:': '143.208.200.26:7878',
    'https:': '143.208.200.26:7878'
}
But it still has no effect.
I am using:
-Windows 10
-python 3.9.6
-urllib 1.25.8
Many thanks in advance for any response to help sort this out.

OK, it is working now!
Credit for the solution goes to Olvin Rogh. Thanks, Olvin, for your help and for pointing out my problem: I was adding a colon ":" inside the dictionary keys.
This code works now:
import json
import requests

# Keys are plain scheme names, without a trailing colon
PROXY = {'https': 'https://143.208.200.26:7878',
         'http': 'http://143.208.200.26:7878'}
with requests.Session() as session:
    session.proxies = PROXY
    r = session.get('http://ip-api.com/json')
    print(json.dumps(r.json(), indent=2))
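To illustrate why the trailing colon mattered, here is a minimal sketch of a scheme-based lookup. The helper find_proxy is hypothetical and only a simplified stand-in for what requests does internally (the real lookup also considers per-host and "all" keys); it uses only the standard library.

```python
from urllib.parse import urlparse


def find_proxy(url, proxies):
    """Hypothetical helper: return the proxy whose key equals the URL scheme."""
    scheme = urlparse(url).scheme  # 'http' or 'https', never 'http:'
    return proxies.get(scheme)


# Same proxy address as in the question, with and without the stray colon
broken = {'http:': 'http://143.208.200.26:7878'}
fixed = {'http': 'http://143.208.200.26:7878'}

print(find_proxy('http://ip-api.com/json', broken))
print(find_proxy('http://ip-api.com/json', fixed))
```

With the 'http:' key no entry matches the URL scheme, so the lookup finds nothing and the request would go out directly.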

Related

Proxy server doesn't change public IP with Python Requests

I'm running this script:
import requests
proxyDict = {"http" : 'http://81.93.73.28:8081'}
r = requests.get('http://ipinfo.io/ip', proxies=proxyDict)
r.status_code
r.headers['content-type']
r.encoding
print(r.text)
I've tried my own proxy server as well as several public servers. It still prints my current ip.
What am I doing wrong?
The problem seems to be with the proxy itself. I tried a random free proxy with your code and it worked. Your code also has a few issues: it accesses attributes (r.status_code, r.headers, r.encoding) without using the results, so those lines do nothing. Try the following code and proxy; it worked for me.
import requests
proxyDict = {"http": 'http://162.14.18.11:80'}
r = requests.get('http://ipinfo.io/ip', proxies=proxyDict)
print(r.status_code)
print(r.text)

requests: how to disable / bypass proxy

I am fetching a URL with:
r = requests.get("http://myserver.com")
As I can see in the 'access.log' of "myserver.com", the client's system proxy is used. But I want to disable using proxies at all with requests.
The only way I'm currently aware of for disabling proxies entirely is the following:
1. Create a session
2. Set session.trust_env to False
3. Create your request using that session
import requests
session = requests.Session()
session.trust_env = False
response = session.get('http://www.stackoverflow.com')
This is based on this comment by Lukasa and the (limited) documentation for requests.Session.trust_env.
Note: Setting trust_env to False also ignores the following:
- Authentication information from .netrc
- CA bundles defined in REQUESTS_CA_BUNDLE or CURL_CA_BUNDLE
If however you only want to disable proxies for a particular domain (like localhost), you can use the NO_PROXY environment variable:
import os
import requests
os.environ['NO_PROXY'] = 'stackoverflow.com'
response = requests.get('http://www.stackoverflow.com')
You can choose proxies for each request. From the docs:
import requests
proxies = {
"http": "http://10.10.1.10:3128",
"https": "http://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
So to disable the proxy, just set each one to the empty string:
import requests
proxies = {
"http": "",
"https": "",
}
requests.get("http://example.org", proxies=proxies)
Update: Switched from None to "", see comments.
The way to stop requests/urllib from proxying any requests is to set the no_proxy (or NO_PROXY) environment variable to *, e.g. in bash:
export no_proxy='*'
Or from Python:
import os
os.environ['no_proxy'] = '*'
This works because the urllib.request.getproxies function first checks for any proxies set in environment variables (e.g. http_proxy, HTTP_PROXY, https_proxy, HTTPS_PROXY) and, if none are set, checks for system-configured proxies using platform-specific calls (e.g. on macOS it uses the system scutil/configd interfaces, and on Windows it reads the Registry). As mentioned in the comments, if any proxy variables are set you can reset them as @udani suggested, or unset them like this from Python:
del os.environ['HTTP_PROXY']
Then, when urllib attempts to use any proxies, the ProxyHandler checks for the presence and value of the no_proxy environment variable, which can either be set to specific hostnames as mentioned above or to the special * value, in which case all hosts bypass the proxy.
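The behaviour described above can be observed with two helper functions that live in urllib.request. This is a small sketch rather than a full picture of the lookup: the proxy address is a placeholder, and only the environment-variable path is exercised, not the platform-specific system settings.

```python
import os
from urllib.request import getproxies_environment, proxy_bypass_environment

# Example values only; 10.10.1.10:3128 is a placeholder proxy address
os.environ['http_proxy'] = 'http://10.10.1.10:3128'
os.environ['no_proxy'] = '*'

# Proxies as urllib reads them from the environment; no_proxy is stored under 'no'
proxies = getproxies_environment()
print(proxies.get('http'))
print(proxies.get('no'))

# Truthy when the host should bypass the proxy; '*' matches every host
print(bool(proxy_bypass_environment('example.org')))
```

Replacing '*' with a comma-separated list of hostnames limits the bypass to those hosts.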
With Python3, jtpereyda's solution didn't work, but the following did:
proxies = {
"http": "",
"https": "",
}
The requests library respects proxy environment variables.
http://docs.python-requests.org/en/latest/user/advanced/#proxies
So try deleting environment variables HTTP_PROXY and HTTPS_PROXY.
import os
for k in list(os.environ.keys()):
    if k.lower().endswith('_proxy'):
        del os.environ[k]
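If the variables should only be hidden for part of a program, the same loop can be wrapped in a context manager that restores them afterwards. This is a sketch under that assumption; without_proxy_env is a hypothetical name, not part of requests or the standard library.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_proxy_env():
    """Hypothetical helper: temporarily remove every *_proxy environment variable."""
    saved = {k: v for k, v in os.environ.items() if k.lower().endswith('_proxy')}
    for k in saved:
        del os.environ[k]
    try:
        yield
    finally:
        # Put the original values back, even if the body raised
        os.environ.update(saved)


# Placeholder proxy address, for illustration only
os.environ['HTTP_PROXY'] = 'http://10.10.1.10:3128'
with without_proxy_env():
    print('HTTP_PROXY' in os.environ)  # variable is hidden inside the block
print(os.environ['HTTP_PROXY'])        # and restored afterwards
```

Like the loop above, this also removes NO_PROXY while the block runs, since its name ends in _proxy too.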
I implemented @jtpereyda's solution in our production codebase. It worked fine on normal successful HTTP requests (200 OK), but stopped working when the response was an HTTP redirect (301 Moved Permanently). Instead use:
requests.get("https://pypi.org/pypi/pillow/9.0.0/json", proxies={"http": "", "https": ""})
For comparison, this line raises a requests.exceptions.SSLError when behind a proxy (pypi.org tries to redirect us to Pillow with an uppercase P):
requests.get("https://pypi.org/pypi/pillow/9.0.0/json", proxies={"http": None, "https": None})
import requests
r = requests.post('https://localhost:44336/api/', data='', verify=False)
I faced the same issue when connecting to localhost to access my .NET backend from a Python script with the requests module.
I set verify to False, which disables the default SSL certificate verification.
P.S. The code above emits a warning, which can be suppressed as follows:
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
r = requests.post('https://localhost:44336/api/', data='', verify=False)
For those for whom no_proxy="*" doesn't work, try 0.0.0.0/32; that worked for me.

requests via a SOCKS proxy

How can I make an HTTP request via a SOCKS proxy (simply using ssh -D as the proxy)? I've tried using requests with SOCKS proxies but it doesn't appear to work (I saw this pull request). For example:
proxies = { "http": "socks5://localhost:9999/" }
r = requests.post( endpoint, data=request, proxies=proxies )
It'd be convenient to keep using the requests library, but I can also switch to urllib2 if that is known to work.
SOCKS support was added in requests 2.10.0, so this is now remarkably simple and very close to what you have.
Install requests[socks]:
$ pip install requests[socks]
Set up your proxies variable, and make use of it:
>>> import requests
>>> proxies = {
...     "http": "socks5://localhost:9999",
...     "https": "socks5://localhost:9999"
... }
>>> requests.get(
...     "https://api.ipify.org?format=json",
...     proxies=proxies
... ).json()
{u'ip': u'123.xxx.xxx.xxx'}
Two things to note: do not put a / at the end of the proxy URL, and you can use socks4:// as the scheme if the SOCKS server doesn't support SOCKS5.
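The notes above can be folded into a small builder for the proxies dict. This is only a sketch: socks_proxies and its parameters are hypothetical names, and the remote_dns option relies on the socks5h scheme, which requests documents for resolving hostnames on the proxy side.

```python
def socks_proxies(host, port, version=5, remote_dns=False):
    """Hypothetical helper: build a requests-style proxies dict for a SOCKS proxy."""
    if version not in (4, 5):
        raise ValueError("SOCKS version must be 4 or 5")
    scheme = "socks%d" % version
    if remote_dns and version == 5:
        scheme = "socks5h"  # let the proxy resolve hostnames
    url = "%s://%s:%d" % (scheme, host, port)  # no trailing slash
    return {"http": url, "https": url}


# Port 9999 matches the ssh -D example from the question
proxies = socks_proxies("localhost", 9999)
print(proxies)
```

The resulting dict can be passed as proxies=proxies to requests.get or requests.post, provided requests[socks] is installed.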
SOCKS support for requests is still pending. If you want, you can view my Github repository here to see my branch of the Socksipy library. This is the branch that is currently being integrated into requests; it will be some time before requests fully supports it, though.
https://github.com/Anorov/PySocks/
It should work okay with urllib2. Import sockshandler in your file, and follow the example inside of it. You'll want to create an opener like this:
opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS5, "localhost", 9050))
Then you can use opener.open(url) and it should tunnel through the proxy.

Using urllib2 via proxy

I am trying to use urllib2 through a proxy; however, after trying just about every variation of passing my verification details using urllib2, I either get a request that hangs forever and returns nothing or I get 407 Errors. I can connect to the web fine using my browser which connects to a prox-pac and redirects accordingly; however, I can't seem to do anything via the command line curl, wget, urllib2 etc. even if I use the proxies that the prox-pac redirects to. I tried setting my proxy to all of the proxies from the pac-file using urllib2, none of which work.
My current script looks like this:
import urllib2 as url
proxy = url.ProxyHandler({'http': 'username:password@my.proxy:8080'})
auth = url.HTTPBasicAuthHandler()
opener = url.build_opener(proxy, auth, url.HTTPHandler)
url.install_opener(opener)
url.urlopen("http://www.google.com/")
which throws HTTP Error 407: Proxy Authentication Required and I also tried:
import urllib2 as url
handlePass = url.HTTPPasswordMgrWithDefaultRealm()
handlePass.add_password(None, "http://my.proxy:8080", "username", "password")
auth_handler = url.HTTPBasicAuthHandler(handlePass)
opener = url.build_opener(auth_handler)
url.install_opener(opener)
url.urlopen("http://www.google.com")
which hangs like curl or wget timing out.
What do I need to do to diagnose the problem? How is it possible that I can connect via my browser but not from the command line on the same computer using what would appear to be the same proxy and credentials?
Might it be something to do with the router? if so, how can it distinguish between browser HTTP requests and command line HTTP requests?
Frustrations like this are what drove me to use Requests. If you're doing significant amounts of work with urllib2, you really ought to check it out. For example, to do what you wish to do using Requests, you could write:
import requests
from requests.auth import HTTPProxyAuth
proxy = {'http': 'http://my.proxy:8080'}
auth = HTTPProxyAuth('username', 'password')
r = requests.get('http://www.google.com/', proxies=proxy, auth=auth)
print(r.text)
Or you could wrap it in a Session object and every request will automatically use the proxy information (plus it will store & handle cookies automatically!):
s = requests.Session()
s.proxies = proxy
s.auth = auth
r = s.get('http://www.google.com/')
print(r.text)
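For readers who stay with the standard library on Python 3, where urllib2 became urllib.request, here is a sketch of the ProxyHandler approach with credentials embedded in the proxy URL. The proxy_url helper is hypothetical, and the credentials and host are the placeholders from the question; percent-encoding the credentials keeps characters such as @ or : in a password from breaking the URL.

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, build_opener


def proxy_url(user, password, host, port):
    """Hypothetical helper: build an http proxy URL with percent-encoded credentials."""
    return "http://%s:%s@%s:%d" % (
        quote(user, safe=""), quote(password, safe=""), host, port)


# Placeholder credentials and host, taken from the question
handler = ProxyHandler({"http": proxy_url("username", "password", "my.proxy", 8080)})
opener = build_opener(handler)

# opener.open("http://www.google.com/") would send the request through the proxy
print(handler.proxies["http"])
```

This covers basic proxy authentication only; a proxy that expects another scheme, such as NTLM, would still answer with 407.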

urllib2.urlopen through proxy fails after a few calls

Edit: after much fiddling, it seems urlgrabber succeeds where urllib2 fails, even when telling it to close the connection after each file. It seems there might be something wrong with the way urllib2 handles proxies, or with the way I use it!
Anyway, here is the simplest possible code to retrieve files in a loop:
import urlgrabber
for i in range(1, 100):
    url = "http://www.iana.org/domains/example/"
    urlgrabber.urlgrab(url, proxies={'http':'http://<user>:<password>@<proxy url>:<proxy port>'}, keepalive=1, close_connection=1, throttle=0)
Hello all !
I am trying to write a very simple python script to grab a bunch of files via urllib2.
This script needs to work through the proxy at work (my issue does not exist if grabbing files on the intranet, i.e. without the proxy).
Said script fails after a couple of requests with "HTTPError: HTTP Error 401: basic auth failed". Any idea why that might be ? It seems the proxy is rejecting my authentication, but why ? The first couple of urlopen requests went through correctly !
Edit: Adding a sleep of 10 seconds between requests to avoid some kind of throttling that might be done by the proxy did not change the results.
Here is a simplified version of my script (with identified information stripped, obviously):
import urllib2
passmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passmgr.add_password(None, '<proxy url>:<proxy port>', '<my user name>', '<my password>')
authinfo = urllib2.ProxyBasicAuthHandler(passmgr)
proxy_support = urllib2.ProxyHandler({"http" : "<proxy http address>"})
opener = urllib2.build_opener(authinfo, proxy_support)
urllib2.install_opener(opener)
for i in range(100):
    with open("e:/tmp/images/tst{}.htm".format(i), "w") as outfile:
        f = urllib2.urlopen("http://www.iana.org/domains/example/")
        outfile.write(f.read())
Thanks in advance !
You can minimize the number of connections by using the keepalive handler from the urlgrabber module.
import urllib2
from keepalive import HTTPHandler
keepalive_handler = HTTPHandler()
opener = urllib2.build_opener(keepalive_handler)
urllib2.install_opener(opener)
fo = urllib2.urlopen('http://www.python.org')
I am unsure that this will work correctly with your Proxy setup.
You may have to hack the keepalive module.
The proxy might be throttling your requests. I guess it thinks you look like a bot.
You could add a timeout, and see if that gets you through.
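If the proxy does reject requests intermittently, one option is to retry after a pause. The sketch below assumes that approach; fetch_with_retries is a hypothetical helper that takes the fetching callable as a parameter, so it is not tied to urllib2 or any other library.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay=0.0):
    """Hypothetical helper: call fetch(url), pausing `delay` seconds between failed tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # urllib.error.URLError and HTTPError derive from OSError
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error


# Stand-in for a real fetch function: fails twice, then succeeds
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise OSError("basic auth failed")
    return "page contents"

result = fetch_with_retries(flaky_fetch, "http://www.iana.org/domains/example/")
print(result, len(calls))
```

In real use, fetch would be something like a function wrapping urlopen with a timeout argument, and delay would be set to a few seconds.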
